Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition


Chun-Hsiao Yeh (UC Berkeley), Ta-Ying Cheng (University of Oxford), He-Yen Hsieh (Harvard University), David Chuan-En Lin (Carnegie Mellon University), Yi Ma (University of Hong Kong), Andrew Markham (University of Oxford), Niki Trigoni (University of Oxford), H. T. Kung (Harvard University), Yubei Chen (UC Davis)
The 36th British Machine Vision Conference

Abstract

In this paper, we identify two major gaps in personalizing text-to-image diffusion models, i.e., placing personalized concepts into generated images: 1) Creating a high-quality multi-concept personalized dataset with detailed and aligned text descriptions is challenging. 2) Comprehensive metrics for evaluating multiple personalized concepts in an image are lacking. To overcome these challenges, we propose Gen4Gen, a novel generative data pipeline for creating a benchmark dataset (MyCanvas) that combines personalized concepts into complex compositions aligned with detailed text descriptions, aiming to benchmark and improve multi-concept personalization. In addition, we introduce comprehensive metrics (CP-CLIP / TI-CLIP) for evaluating the performance of multi-concept personalization models more effectively. Finally, we provide a simple yet effective baseline built on top of several personalization methods with empirical prompting strategies for future researchers to evaluate on the MyCanvas benchmark. By improving data quality, we can significantly increase multi-concept image generation quality without changing the model architecture or training algorithms, and we show that our work can simply be plugged into personalization approaches. We suggest that leveraging strong foundation models for dataset generation could benefit various computer vision tasks. The code and benchmark dataset are available at https://danielchyeh.github.io/Gen4Gen/.

Citation

@inproceedings{Yeh_2025_BMVC,
author    = {Chun-Hsiao Yeh and Ta-Ying Cheng and He-Yen Hsieh and David Chuan-En Lin and Yi Ma and Andrew Markham and Niki Trigoni and H. T. Kung and Yubei Chen},
title     = {Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_28/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
