Lang4D: Weakly Supervised Learning of 4D Language Splatting


Mana Masuda (CyberAgent, Keio University), Taiki Sekii (CyberAgent)
The 36th British Machine Vision Conference

Abstract

Obtaining the abstract meaning of a scene as language embeddings at arbitrary positions in a time-varying 3D space, i.e., in 4D space, has recently become feasible. Such techniques are expected to enable applications such as content search and prediction in digital archives and digital twins. In this study, to acquire such a *4D language field*, we focus on two key challenges that arise when applying existing methods designed for static scenes: (1) extending these methods to 4D space and (2) leveraging video-level language embeddings obtained from vision-language video foundation models. To address both challenges simultaneously, we propose *Lang4D*, a novel method that uses video foundation models to model the temporal changes of language embeddings in 3D space in a data-driven manner. Lang4D feeds virtual viewpoint images, for which recognition accuracy is stable, into a video foundation model to obtain, for each video, a single language embedding shared across all times and pixels. It then weakly supervises the learning of fluctuations in the language embeddings projected onto the virtual viewpoints. For our experiments, we constructed a dataset specifically to evaluate the novel task of 4D language grounding and segmentation. Quantitative evaluations and ablation studies verify the effectiveness of the proposed method in addressing these two challenges.
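
The weak supervision described in the abstract can be illustrated with a toy objective. The sketch below is not taken from the paper: it assumes, purely for illustration, that each pixel at each time step has a rendered language embedding and that one video-level embedding acts as a shared weak label for all of them, scored by mean cosine distance. The function names `cosine_similarity` and `weak_video_level_loss` and the nested-list data layout are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def weak_video_level_loss(rendered, video_embedding):
    """Mean cosine distance between rendered embeddings and one video-level label.

    rendered[t][p] is the (hypothetical) language embedding rendered for
    pixel p at time step t; video_embedding is the single embedding shared
    by every time step and pixel of the video. This loss form is an
    assumption for illustration, not the objective used in the paper.
    """
    distances = [
        1.0 - cosine_similarity(pixel, video_embedding)
        for frame in rendered
        for pixel in frame
    ]
    return sum(distances) / len(distances)


# Two time steps, two pixels each, 2-D embeddings for readability.
frames = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[1.0, 0.0], [1.0, 0.0]],
]
video_label = [1.0, 0.0]
print(weak_video_level_loss(frames, video_label))
```

Because the label is shared by the whole video, a loss of this kind only pulls per-pixel embeddings toward a common target. Any time- or position-dependent deviation from that target has to be learned by the model itself, which is the sense in which the supervision is weak.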

Citation

@inproceedings{Masuda_2025_BMVC,
author    = {Mana Masuda and Taiki Sekii},
title     = {Lang4D: Weakly Supervised Learning of 4D Language Splatting},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_136/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
