EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and decoupled guidance


Zicheng Duan (The University of Adelaide), Yuxuan Ding (Qualcomm AI Research), Chenhui Gou (Monash University), Ziqin Zhou (The University of Adelaide), Ethan Smith (Leonardo.AI), Lingqiao Liu (The University of Adelaide)
The 36th British Machine Vision Conference

Abstract

Zero-shot personalized image generation models aim to produce images that align with both a given text prompt and a subject image, which requires the model to incorporate both sources of guidance. Existing methods often struggle to capture fine-grained subject details and frequently prioritize one form of guidance over the other, resulting in suboptimal subject encoding and imbalanced generation. In this study, we uncover key insights into overcoming these drawbacks, notably that 1) the choice of subject image encoder critically influences subject identity preservation and training efficiency, and 2) text and subject guidance should take effect at different denoising stages. Building on these insights, we introduce EZIGen, an approach with two main components: a fixed pre-trained Diffusion UNet that itself serves as the subject encoder, and a generation process that balances the two forms of guidance by separating the stages in which each dominates and revisiting certain time steps to bootstrap subject transfer quality. With these two components, EZIGen, initially built upon SD2.1-base, achieves state-of-the-art performance on multiple personalized generation benchmarks with a unified model while using 100 times less training data. Moreover, migrating our design to SDXL shows that EZIGen is a versatile, model-agnostic solution for personalized generation. The code and checkpoints are publicly available at https://github.com/ZichengDuan/EZIGen.
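
The idea of letting text and subject guidance dominate at different denoising stages, with some time steps revisited, can be illustrated with a toy schedule. This is only a minimal sketch under stated assumptions, not the released EZIGen implementation: the function name `build_guidance_schedule`, the parameters `split_step` and `revisit_rounds`, and the choice that text guidance dominates the early steps while subject guidance dominates the later ones are all hypothetical, since the abstract does not specify them.

```python
from typing import List, Tuple


def build_guidance_schedule(
    num_steps: int = 50,
    split_step: int = 20,     # hypothetical: steps before this index are text-dominant
    revisit_rounds: int = 2,  # hypothetical: how many passes over the subject stage
) -> List[Tuple[int, str]]:
    """Return (step_index, dominant_guidance) pairs in execution order.

    Step index 0 is the noisiest step. The ordering of the two stages is an
    assumption made for illustration only.
    """
    if not 0 <= split_step <= num_steps:
        raise ValueError("split_step must lie in [0, num_steps]")
    if revisit_rounds < 1:
        raise ValueError("revisit_rounds must be at least 1")

    # Stage 1: each step is visited once with text guidance dominant.
    schedule = [(t, "text") for t in range(split_step)]

    # Stage 2: the remaining steps are subject-dominant; passes after the
    # first one revisit the same time steps.
    for _ in range(revisit_rounds):
        schedule += [(t, "subject") for t in range(split_step, num_steps)]

    return schedule


if __name__ == "__main__":
    for step, mode in build_guidance_schedule(num_steps=6, split_step=2):
        print(step, mode)
```

In a real pipeline each entry would select which conditioning signal drives the denoising update at that step; here the schedule is returned as plain data so the stage separation and the revisiting are easy to inspect.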

Citation

@inproceedings{Duan_2025_BMVC,
author    = {Zicheng Duan and Yuxuan Ding and Chenhui Gou and Ziqin Zhou and Ethan Smith and Lingqiao Liu},
title     = {EZIGen: Enhancing zero-shot personalized image generation with precise subject encoding and decoupled guidance},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_558/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
