UMM: A Unified Multi-Modal Model for Low-Level Vision Tasks with Dual-Driven Prompting


Ziqi Luo (Southern University of Science and Technology), Jinxiang Lai (The Hong Kong University of Science and Technology), Ruitao Chen (Southern University of Science and Technology), Jinyu Yang (tapall.ai), Bin-Bin Gao (Tencent), Qiang Nie (The Hong Kong University of Science and Technology), Jun Liu (Tencent YouTu Lab), Jinfan Wang (Southern University of Science and Technology), Feng Zheng (Southern University of Science and Technology)
The 36th British Machine Vision Conference

Abstract

Current research lacks exploration of unified multi-modal models for low-level vision tasks. Although some RGB-only unified models exist, they usually require multiple independent decoders or additional parameters for different tasks, and so fail to exploit the potential connections and shared knowledge among those tasks. In this paper, we propose a unified multi-modal model (UMM) that handles various low-level vision tasks with all parameters shared and one common decoder. The core of our model is the proposed dual-driven prompting paradigm, which employs multi-modal prompting to enhance the robustness of the model and task prompts to guide the model to extract features related to the specific task. Furthermore, we propose a task-aware fusion module (TFM), which guides multi-modal fusion through task prompts, enabling the model to focus on the key features of the specific task during the fusion process. Experimental results show that our unified model UMM achieves competitive performance on various multi-modal low-level vision tasks, including RGB-T glass detection, RGB-T low-light enhancement, RGB-D salient object detection and RGB-N drivable area detection.
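
The abstract does not specify how the TFM is built. As a rough illustration of what "guiding multi-modal fusion through task prompts" could look like, the sketch below blends an RGB feature vector with an auxiliary-modality feature vector using per-channel gates computed from a task prompt. It is not the authors' implementation: the function name task_aware_fuse, the sigmoid gating scheme, the one-hot prompts in TASK_PROMPTS and the hand-picked gate_weights are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def task_aware_fuse(rgb_feat, aux_feat, task_prompt, gate_weights):
    """Blend two per-channel feature vectors with task-dependent gates.

    Hypothetical scheme: gate_weights[c] is a weight vector (same length as
    task_prompt) for channel c; the gate decides how much of the RGB feature
    versus the auxiliary feature is kept in that channel.
    """
    if not (len(rgb_feat) == len(aux_feat) == len(gate_weights)):
        raise ValueError("channel counts must match")
    fused = []
    for rgb, aux, w in zip(rgb_feat, aux_feat, gate_weights):
        gate = sigmoid(sum(p * wi for p, wi in zip(task_prompt, w)))
        fused.append(gate * rgb + (1.0 - gate) * aux)
    return fused


# Hypothetical one-hot task prompts; a real model would likely learn embeddings.
TASK_PROMPTS = {
    "rgbt_glass_detection": [1.0, 0.0],
    "rgbd_salient_object_detection": [0.0, 1.0],
}

rgb = [1.0, 2.0]  # toy RGB features, one value per channel
aux = [3.0, 4.0]  # toy thermal/depth features, one value per channel
# Channel 0 responds strongly to the task; channel 1 ignores it (gate = 0.5).
gate_weights = [[10.0, -10.0], [0.0, 0.0]]

fused_glass = task_aware_fuse(rgb, aux, TASK_PROMPTS["rgbt_glass_detection"], gate_weights)
fused_sod = task_aware_fuse(rgb, aux, TASK_PROMPTS["rgbd_salient_object_detection"], gate_weights)
```

With these toy weights, channel 0 is dominated by the RGB feature for the first task and by the auxiliary feature for the second, while channel 1 averages the two modalities regardless of task. The parameters themselves are shared across tasks; only the prompt changes, which is the property the abstract emphasizes.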

Citation

@inproceedings{Luo_2025_BMVC,
author    = {Ziqi Luo and Jinxiang Lai and Ruitao Chen and Jinyu Yang and Bin-Bin Gao and Qiang Nie and Jun Liu and Jinfan Wang and Feng Zheng},
title     = {UMM: A Unified Multi-Modal Model for Low-Level Vision Tasks with Dual-Driven Prompting},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_392/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
