DepthHMR: Leveraging Depth Around Humans for Multi-Human Mesh Generation


Nikhil Sharma (University of Illinois Chicago), Jiachen Tao (University of Illinois Chicago), Junyi Wu (University of Illinois Chicago), Yan Yan (University of Illinois Chicago)
The 36th British Machine Vision Conference

Abstract

We present DepthHMR, a novel one-stage framework for multi-human mesh reconstruction from a single RGB image. While recent DETR-style approaches have shown promising results, they primarily rely on image features that lack explicit 3D reasoning, leading to depth ambiguities in scenes with occlusion, scale variation, or distant individuals. To address this, DepthHMR integrates metric depth cues and explicit 3D positional reasoning into a unified one-stage DETR framework. Central to our approach is a Depth Around Human (DAH) module, which isolates human-centric depth cues from monocular depth maps. Unlike general scene depth estimation, DAH focuses specifically on human subjects, improving depth priors for distant and occluded humans. To enhance depth representation, we adopt a non-uniform depth discretization scheme, allocating denser bins in near-field regions and sparser bins at greater distances. This design enables more precise depth reasoning in human-centric zones. Our depth prediction branch, supervised with DAH-generated pseudo ground truth, enables the shared backbone to learn geometry-aware and appearance-aware features simultaneously. Building on these depth-informed representations, we propose depth-guided 3D query initialization followed by a depth-aware cross-attention decoder that refines SMPL mesh attributes for each query, where each query represents a person in the scene. Our model achieves state-of-the-art mesh reconstruction performance, reducing MVE by 2.8 mm on AGORA and 4.6 mm on 3DPW while requiring substantially lower-resolution inputs, enhancing both accuracy and efficiency.
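The non-uniform depth discretization described above (denser bins near the camera, sparser bins far away) can be sketched with a spacing-increasing, log-spaced binning scheme of the kind used in prior monocular depth estimation work. The abstract does not give the paper's exact scheme, so the function name, depth range, and bin count below are illustrative assumptions:

```python
import numpy as np

def sid_bin_edges(d_min: float, d_max: float, num_bins: int) -> np.ndarray:
    """Spacing-increasing discretization: log-spaced bin edges over
    [d_min, d_max], so bins are denser in the near field and sparser
    at greater distances. Parameters here are hypothetical, not the
    paper's actual configuration."""
    i = np.arange(num_bins + 1)
    return d_min * (d_max / d_min) ** (i / num_bins)

# Example: 64 bins spanning 0.5 m to 80 m.
edges = sid_bin_edges(0.5, 80.0, 64)
widths = np.diff(edges)
# Bin widths grow monotonically with distance, concentrating
# depth resolution where nearby humans appear.
assert np.all(np.diff(widths) > 0)
```

Uniform binning would waste resolution on distant ranges where metric depth is inherently noisier; a geometric progression like this keeps quantization error roughly proportional to depth instead.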

Citation

@inproceedings{Sharma_2025_BMVC,
author    = {Nikhil Sharma and Jiachen Tao and Junyi Wu and Yan Yan},
title     = {DepthHMR: Leveraging Depth Around Humans for Multi-Human Mesh Generation},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_328/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
