Learning from Synthetic Data for Visual Grounding


Ruozhen He (Rice University), Ziyan Yang (Rice University), Paola Cascante-Bonilla (Stony Brook University), Alexander C. Berg (University of California, Irvine), Vicente Ordonez (Rice University)
The 36th British Machine Vision Conference

Abstract

We introduce SynGround, a novel approach for learning representations for visual grounding using a combination of real data along with synthetic images, synthetic referring expressions, and corresponding bounding boxes. We explore various strategies to best generate additional image-text pairs and image-text-box triplets using pretrained models under different settings and varying degrees of reliance on real data. Through comparative analyses with synthetic, real, and web-crawled data, we identify factors that contribute to performance differences. We find that SynGround can improve the localization capabilities of a vision-and-language model and offers the potential for arbitrarily large-scale data generation. In particular, data generated with SynGround improves the pointing game accuracy of a pretrained ALBEF model by 4.81 and that of BLIP by 17.11 absolute percentage points on average across the RefCOCO+ and Flickr30k benchmarks.
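For readers unfamiliar with the metric quoted above, the following is a minimal sketch of pointing game accuracy as it is commonly defined: a prediction counts as a hit when the highest-scoring location of the model's heatmap falls inside the ground-truth box. The function names, the single-box-per-expression setup, and the inclusive pixel-coordinate box format are assumptions made for illustration; this page does not describe the paper's exact evaluation protocol.

```python
import numpy as np

def pointing_game_hit(heatmap, box):
    """Return True if the heatmap's peak lies inside the box.

    heatmap: 2D array of shape (H, W) with one score per pixel.
    box: (x_min, y_min, x_max, y_max) in pixel coordinates, inclusive
         (an assumed format, not taken from the paper).
    """
    # Location of the maximum score as (row, column) = (y, x).
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    x_min, y_min, x_max, y_max = box
    return bool(x_min <= x <= x_max and y_min <= y <= y_max)

def pointing_game_accuracy(heatmaps, boxes):
    """Percentage of (heatmap, box) pairs whose peak falls inside the box."""
    hits = [pointing_game_hit(h, b) for h, b in zip(heatmaps, boxes)]
    return 100.0 * sum(hits) / len(hits)
```

Under this definition, a gain of 4.81 absolute percentage points means that about 4.81 more of every 100 evaluated expressions have their heatmap peak inside the annotated box.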

Citation

@inproceedings{He_2025_BMVC,
author    = {Ruozhen He and Ziyan Yang and Paola Cascante-Bonilla and Alexander C. Berg and Vicente Ordonez},
title     = {Learning from Synthetic Data for Visual Grounding},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_22/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
