OpenHuman4D: Open-Vocabulary 4D Human Parsing


Keito Suzuki (University of California, San Diego), Bang Du (University of California, San Diego), Runfa Li (University of California, San Diego), Kunyao Chen (Qualcomm Inc), Lei Wang (Qualcomm Inc), Peng Liu (Qualcomm Inc), Ning Bi (Qualcomm Inc), Truong Nguyen (University of California, San Diego)
The 36th British Machine Vision Conference

Abstract

Understanding dynamic 3D human representation has become increasingly critical in virtual and extended reality applications. However, existing human part segmentation methods are constrained by reliance on closed-set datasets and prolonged inference times, which significantly restrict their applicability. In this paper, we introduce the first 4D human parsing framework that simultaneously addresses these challenges by reducing the inference time and introducing open-vocabulary capabilities. Building upon state-of-the-art open-vocabulary 3D human parsing techniques, our approach extends the support to 4D human-centric video with three key innovations: 1) We adopt mask-based video object tracking to efficiently establish spatial and temporal correspondences, avoiding the necessity of segmenting all frames. 2) A novel Mask Validation module is designed to manage new target identification and mitigate tracking failures. 3) We propose a 4D Mask Fusion module, integrating memory-conditioned attention and logits equalization for robust embedding fusion. Extensive experiments demonstrate the effectiveness and flexibility of the proposed method on 4D human-centric parsing tasks, achieving up to 93.3% acceleration compared to the previous state-of-the-art method, which was limited to parsing fixed classes.

Citation

@inproceedings{Suzuki_2025_BMVC,
author    = {Keito Suzuki and Bang Du and Runfa Li and Kunyao Chen and Lei Wang and Peng Liu and Ning Bi and Truong Nguyen},
title     = {OpenHuman4D: Open-Vocabulary 4D Human Parsing},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_566/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
