3D-WAG: Hierarchical Wavelet-Guided Autoregressive Generation for High-Fidelity 3D Shapes


Tejaswini Medi (Universität Mannheim), Arianna Rampini (Autodesk), Pradyumna Reddy (Autodesk), Pradeep Kumar Jayaraman (Autodesk), Margret Keuper (Universität Mannheim)
The 36th British Machine Vision Conference

Abstract

Autoregressive (AR) models excel at language and image generation, but their use in 3D generation is limited by high computational cost and resolution constraints. Existing 3D AR methods, which rely on voxel grids or implicit representations, produce long, redundant token sequences that limit high-fidelity 3D shape generation and incur high inference cost. To address these issues, we introduce 3D-WAG, a novel autoregressive framework that employs compact wavelet-based hierarchical representations for efficient and expressive 3D shape generation. By representing shapes in the wavelet domain, 3D-WAG uses a 3D vector-quantized variational autoencoder (VQVAE) to capture coarse-to-fine geometric details as multi-scale discrete token maps, enabling efficient AR modeling and detailed shape understanding. Unlike conventional next-token prediction, 3D-WAG formulates 3D shape generation as a next-scale token map prediction problem, achieving an inference time of 1.15 seconds per sample on a single NVIDIA H100 GPU, which is 15 times faster than the state-of-the-art diffusion-based 3D generation model UDiFF. Furthermore, 3D-WAG supports unconditional, class-conditional, and text-conditional shape generation. Experimental results on standard 3D benchmarks, including ShapeNet and DeepFashion3D, show that 3D-WAG outperforms state-of-the-art methods on metrics such as Minimum Matching Distance (MMD) and Coverage (COV), generating high-quality 3D shapes that accurately represent real-world data distributions.
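For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how MMD and COV are commonly computed between a set of generated shapes and a set of reference shapes, each given as a point cloud. It assumes the symmetric Chamfer distance as the underlying shape distance; the abstract does not state which distance or normalization the paper uses, so that choice, along with the function names `chamfer_distance` and `mmd_and_cov`, is an illustrative assumption and not the authors' evaluation code.

```python
import numpy as np


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point clouds a (N, 3) and b (M, 3).

    Assumption: mean squared nearest-neighbour distance, summed over both
    directions. Other conventions (unsquared, max instead of sum) exist.
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()


def mmd_and_cov(generated, reference):
    """Return (MMD, COV) for two lists of point clouds.

    MMD: for every reference shape, take the distance to its closest
         generated shape, then average over the reference set (lower is better).
    COV: match every generated shape to its closest reference shape and report
         the fraction of reference shapes matched at least once (higher is better).
    """
    # dist[i, j] = distance between generated shape i and reference shape j.
    dist = np.array(
        [[chamfer_distance(g, r) for r in reference] for g in generated]
    )
    mmd = dist.min(axis=0).mean()
    cov = np.unique(dist.argmin(axis=1)).size / len(reference)
    return float(mmd), float(cov)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three well-separated toy "shapes" standing in for a reference set.
    reference = [rng.normal(size=(64, 3)) + 10.0 * i for i in range(3)]
    # A collapsed generator that only ever produces the first shape.
    collapsed = [reference[0].copy(), reference[0].copy()]
    print(mmd_and_cov(reference, reference))
    print(mmd_and_cov(collapsed, reference))
```

The two metrics are complementary: a generator that collapses onto a single shape can still place samples close to some reference shapes, but it matches only a small fraction of the reference set and therefore scores low on COV, while MMD penalizes reference shapes that have no nearby generated sample.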

Citation

@inproceedings{Medi_2025_BMVC,
author    = {Tejaswini Medi and Arianna Rampini and Pradyumna Reddy and Pradeep Kumar Jayaraman and Margret Keuper},
title     = {3D-WAG: Hierarchical Wavelet-Guided Autoregressive Generation for High-Fidelity 3D Shapes},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_1064/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
