MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction


Yingshuang Zou (Tsinghua University), Yikang Ding (Megvii Technology), Chuanrui Zhang (Tsinghua University), Jiazhe Guo (Tsinghua University), Bohan Li (Shanghai Jiao Tong University), Xiaoyang Lyu (University of Hong Kong), Feiyang Tan (Mach Drive), Xiaojuan Qi (University of Hong Kong), Haoqian Wang (Tsinghua University)
The 36th British Machine Vision Conference

Abstract

Recent breakthroughs in radiance fields have significantly advanced 3D scene reconstruction and novel view synthesis (NVS) in autonomous driving. Nevertheless, critical limitations persist: reconstruction-based methods exhibit substantial performance deterioration under large viewpoint deviations from training trajectories, while generation-based techniques struggle with temporal coherence and precise scene controllability. To overcome these challenges, we present MuDG, an innovative framework that integrates a Multi-modal Diffusion model with Gaussian Splatting (GS) for Urban Scene Reconstruction. MuDG leverages aggregated LiDAR point clouds with RGB and geometric priors to condition a multi-modal video diffusion model, synthesizing photorealistic RGB, depth, and semantic outputs for novel viewpoints. This synthesis pipeline enables feed-forward NVS without computationally intensive per-scene optimization, and its outputs provide supervision to refine 3DGS representations for robust rendering under extreme viewpoint changes. Experiments on the Waymo Open Dataset demonstrate that MuDG outperforms existing methods in both reconstruction and synthesis quality.
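
The conditioning step described above (aggregated, colored LiDAR points viewed from a novel camera) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `render_sparse_condition`, the pinhole camera model, and the 4x4 world-to-camera matrix convention are all assumptions made for illustration. The sketch projects colored 3D points into a target view and keeps the nearest point per pixel, producing the kind of sparse RGB and depth maps that could serve as conditions for a generative model.

```python
import numpy as np


def render_sparse_condition(points_world, colors, K, T_world_to_cam, height, width):
    """Project colored 3D points into one camera; return sparse RGB and depth maps.

    Hypothetical helper for illustration only (not from the paper).
    points_world: (N, 3) points, colors: (N, 3) in [0, 1],
    K: (3, 3) pinhole intrinsics, T_world_to_cam: (4, 4) rigid transform.
    """
    # Move points into the camera frame using homogeneous coordinates.
    ones = np.ones((points_world.shape[0], 1))
    pts_cam = (T_world_to_cam @ np.hstack([points_world, ones]).T).T[:, :3]

    # Discard points behind (or on) the image plane.
    in_front = pts_cam[:, 2] > 1e-6
    pts_cam, colors = pts_cam[in_front], colors[in_front]

    # Pinhole projection to integer pixel coordinates.
    uvw = (K @ pts_cam.T).T
    u = np.floor(uvw[:, 0] / uvw[:, 2]).astype(int)
    v = np.floor(uvw[:, 1] / uvw[:, 2]).astype(int)
    z = pts_cam[:, 2]

    # Keep only points that land inside the image.
    inside = (u >= 0) & (u < width) & (v >= 0) & (v < height)
    u, v, z, colors = u[inside], v[inside], z[inside], colors[inside]

    # Z-buffer: sort near-to-far, then keep the first (nearest) hit per pixel.
    order = np.argsort(z, kind="stable")
    u, v, z, colors = u[order], v[order], z[order], colors[order]
    _, first = np.unique(v * width + u, return_index=True)

    rgb = np.zeros((height, width, 3), dtype=np.float32)
    depth = np.zeros((height, width), dtype=np.float32)  # 0 marks "no point"
    rgb[v[first], u[first]] = colors[first]
    depth[v[first], u[first]] = z[first]
    return rgb, depth
```

Pixels that receive no point stay at zero, so the maps are sparse wherever the aggregated cloud does not cover the novel view; in the framework described here, filling in such regions is left to the diffusion model.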

Citation

@inproceedings{Zou_2025_BMVC,
author    = {Yingshuang Zou and Yikang Ding and Chuanrui Zhang and Jiazhe Guo and Bohan Li and Xiaoyang Lyu and Feiyang Tan and Xiaojuan Qi and Haoqian Wang},
title     = {MuDG: Taming Multi-modal Diffusion with Gaussian Splatting for Urban Scene Reconstruction},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_628/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
