CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation


Hariprasath Govindarajan (Linköping University, Qualcomm Inc), Maciej Wozniak (KTH Royal Institute of Technology), Marvin Klingner (Qualcomm Inc), Camille Maurice (Qualcomm Inc), B Ravi Kiran (Qualcomm Inc), Senthil Yogamani (Qualcomm Inc)
The 36th British Machine Vision Conference

Abstract

Vision foundation models have revolutionized 2D camera-based perception by extracting generalized features for downstream tasks. Recent work applies self-supervised cross-modal knowledge distillation (KD) to transfer these capabilities to 3D LiDAR models, but often relies on complex losses, pseudo-semantic maps, or limits KD to semantic segmentation. We introduce CleverDistiller, a self-supervised, cross-modal 2D-to-3D KD framework with simple yet effective design choices. Our method uses a direct feature similarity loss and an MLP projection head to capture complex semantic dependencies without relying on pseudo-semantic maps or explicit semantic supervision. Additionally, we enhance the learned knowledge with a self-supervised occupancy prediction task, further improving 3D spatial reasoning. Experiments on autonomous driving benchmarks show that CleverDistiller achieves state-of-the-art performance in both semantic segmentation and 3D object detection, with up to 10% mIoU improvement, particularly when fine-tuning with limited data. Additionally, models pretrained with our approach show extreme robustness towards weather and sensor corruption as well as great domain generalization capabilities.
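
The core idea in the abstract, an MLP projection head combined with a direct feature similarity loss, can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not give the exact loss, so cosine similarity is assumed here, and every name, dimension, and the random data (`mlp_projection`, `feature_similarity_loss`, `point_feats`, `pixel_feats`) is hypothetical. It also assumes each 3D point has already been paired with one 2D teacher feature.

```python
import numpy as np


def mlp_projection(feats, w1, b1, w2, b2):
    """Two-layer MLP head: maps 3D student features to the 2D teacher's dimension."""
    hidden = np.maximum(feats @ w1 + b1, 0.0)  # ReLU non-linearity
    return hidden @ w2 + b2


def feature_similarity_loss(student, teacher, eps=1e-8):
    """Mean (1 - cosine similarity) over paired features (assumed loss form)."""
    s = student / (np.linalg.norm(student, axis=1, keepdims=True) + eps)
    t = teacher / (np.linalg.norm(teacher, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(s * t, axis=1)))


# Hypothetical sizes: 128 paired points, 16-d student and 24-d teacher features.
rng = np.random.default_rng(0)
n_points, d_3d, d_hidden, d_2d = 128, 16, 32, 24

point_feats = rng.normal(size=(n_points, d_3d))  # stand-in for LiDAR backbone output
pixel_feats = rng.normal(size=(n_points, d_2d))  # stand-in for frozen 2D teacher output

# Randomly initialised head parameters (these would be trained in practice).
w1 = rng.normal(scale=0.1, size=(d_3d, d_hidden))
b1 = np.zeros(d_hidden)
w2 = rng.normal(scale=0.1, size=(d_hidden, d_2d))
b2 = np.zeros(d_2d)

projected = mlp_projection(point_feats, w1, b1, w2, b2)
loss = feature_similarity_loss(projected, pixel_feats)
print(f"distillation loss: {loss:.4f}")
```

Under this assumed form the loss lies in the range [0, 2]: it approaches 0 when the projected student features point in the same direction as the teacher features, and 2 when they point in opposite directions.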

Citation

@inproceedings{Govindarajan_2025_BMVC,
author    = {Hariprasath Govindarajan and Maciej Wozniak and Marvin Klingner and Camille Maurice and B Ravi Kiran and Senthil Yogamani},
title     = {CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_479/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
