DualDistill: A Unified Cross-Modal Knowledge Distillation Framework for Camera-Based BEV Representation


Gaeun Kim (Seoul National University of Science and Technology), Daeil Han (Seoul National University of Science and Technology), Yeong Jun Koh (Chungnam National University), Hanul Kim (Seoul National University of Science and Technology)
The 36th British Machine Vision Conference

Abstract

Cross-modal knowledge distillation has drawn much attention as a way to improve camera-based bird's-eye-view (BEV) models, aiming to narrow the performance gap with their LiDAR-based counterparts. However, distilling knowledge from a LiDAR-based teacher is difficult due to the discrepancy between sensor modalities. In this work, we introduce DualDistill, a unified cross-modal knowledge distillation framework that addresses this challenge. We propose attention-guided orthogonal alignment (AOA) to align student features with the teacher's representations while preserving useful information. This alignment is integrated into multi-scale feature distillation with an adaptive region weighting scheme. In addition, we introduce cross-head response distillation (CRD) to enforce consistency in BEV representations by comparing the predictions of the teacher and the aligned student. We evaluate our method on the nuScenes dataset. Comprehensive experiments show that our method significantly improves camera-based BEV models and outperforms recent cross-modal knowledge distillation techniques. The code is available at https://github.com/Gaeun-Kimm/DualDistill.

Citation

@inproceedings{Kim_2025_BMVC,
author    = {Gaeun Kim and Daeil Han and Yeong Jun Koh and Hanul Kim},
title     = {DualDistill: A Unified Cross-Modal Knowledge Distillation Framework for Camera-Based BEV Representation},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_915/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
