Controllable Garment Generation with Multi-Modal Diffusion Guidance


Sanhita Pathak (IIT Delhi), Vinay Kaushik (IIIT Sonepat), Brejesh Lall (IIT Delhi)
The 36th British Machine Vision Conference

Abstract

We address the task of controllable garment image generation, where the goal is to synthesize realistic clothing images from incomplete and interpretable inputs such as sketches, color cues, text prompts, and fabric textures. This task poses unique challenges in aligning high-level semantics with spatial structure while preserving visual fidelity. We propose GenWear, a unified conditional diffusion framework built atop a frozen Paint-by-Example (PBE) denoising UNet, augmented with learnable modules for modality-specific control. We introduce SketchCtrl, a ControlNet-based module that injects multi-scale spatial features from sketches via zero-convolution, ensuring structural fidelity. ColorFiLM employs feature-wise linear modulation to steer global color tone without disrupting pretrained activations. A BLIP-2-based text adapter biases the sketch encoder with semantic priors to resolve ambiguities, while a cross-attentive texture adapter injects local fabric cues into the decoder for material realism. These modules operate synergistically, enabling disentangled control without modifying the core diffusion backbone. Unlike prior work, GenWear does not require a full reference image at test time, treating the sketch as both a spatial and a semantic guide. Experiments on the VITON-HD dataset demonstrate that our approach achieves state-of-the-art quality and controllability across diverse garment types and modalities.
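
The abstract names two generic mechanisms for adding control to a frozen backbone: zero-convolution injection (as in ControlNet) and feature-wise linear modulation (FiLM). The NumPy sketch below illustrates both in their textbook form. It is not the authors' implementation: the function names, tensor shapes, the RGB conditioning vector, and the linear maps `w_gamma` and `w_beta` are all assumptions made for illustration. It shows why such modules leave pretrained activations untouched at initialization: with zero-initialized weights, the output equals the backbone features exactly.

```python
import numpy as np

def film(features, color, w_gamma, w_beta):
    """Feature-wise linear modulation (generic form, assumed layout).

    features: (C, H, W) backbone activations
    color:    (3,) hypothetical RGB conditioning vector
    w_gamma, w_beta: (C, 3) hypothetical learnable linear maps
    """
    gamma = 1.0 + w_gamma @ color   # per-channel scale, identity when w_gamma == 0
    beta = w_beta @ color           # per-channel shift, zero when w_beta == 0
    return gamma[:, None, None] * features + beta[:, None, None]

def zero_conv_inject(backbone_feat, control_feat, weight, bias):
    """Add control features through a 1x1 convolution (generic form).

    weight: (C_out, C_in), bias: (C_out,). When both start at zero,
    the control branch contributes nothing until training updates them.
    """
    projected = np.einsum("oc,chw->ohw", weight, control_feat)
    return backbone_feat + projected + bias[:, None, None]

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
backbone = rng.standard_normal((C, H, W))     # stand-in for frozen UNet features
sketch_feat = rng.standard_normal((C, H, W))  # stand-in for sketch-encoder features
color = np.array([0.8, 0.1, 0.1])             # hypothetical color cue

# Zero initialization: both modules act as the identity on the backbone.
w_gamma = np.zeros((C, 3))
w_beta = np.zeros((C, 3))
conv_w = np.zeros((C, C))
conv_b = np.zeros(C)

out_init = zero_conv_inject(film(backbone, color, w_gamma, w_beta),
                            sketch_feat, conv_w, conv_b)
unchanged_at_init = np.allclose(out_init, backbone)

# After (simulated) training the weights are non-zero and the control takes effect.
conv_w_trained = 0.1 * np.eye(C)
out_trained = zero_conv_inject(backbone, sketch_feat, conv_w_trained, conv_b)
changed_after_training = not np.allclose(out_trained, backbone)
```

In this sketch the conditioning only ever enters as an additive or multiplicative correction around the identity, which is one common way to attach new controls while keeping a pretrained network's behavior as the starting point.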

Citation

@inproceedings{Pathak_2025_BMVC,
author    = {Sanhita Pathak and Vinay Kaushik and Brejesh Lall},
title     = {Controllable Garment Generation with Multi-Modal Diffusion Guidance},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_1072/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
