Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning

Wenyi Lian (Uppsala University), Patrick Micke (Uppsala University), Joakim Lindblad (Uppsala University), Nataša Sladoje (Uppsala University)

The 36th British Machine Vision Conference


Abstract

Vision Transformers (ViTs) have achieved remarkable success in standard RGB image analysis. However, applying ViTs to multi-channel imaging (MCI) data, e.g., for medical and remote sensing applications, remains a challenge. In particular, MCI data often consist of layers acquired from different modalities. Directly training ViTs on such data can obscure modality/channel-specific information and impair performance. In this paper, we propose Isolated Channel ViT (IC-ViT), a simple yet effective training framework for large-scale MCI data. By randomly sampling one channel per image per iteration, IC-ViT learns channel-specific representations without requiring multi-channel fusion during pretraining. These representations are later integrated during finetuning to capture cross-channel dependencies in downstream tasks. Experiments on various benchmarks, including JUMP-CP and CHAMMI for cell microscopy, and So2Sat-LCZ42 for satellite imaging, show that the proposed IC-ViT outperforms existing channel-adaptive approaches by 4–14%. Moreover, its efficient training makes it a suitable candidate for large-scale pretraining of foundation models on heterogeneous data. Our code is available at https://github.com/shermanlian/IC-ViT.
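
The sampling step described in the abstract (one randomly chosen channel per image per iteration) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is in the repository linked above. The function name `sample_isolated_channels`, the `(B, C, H, W)` batch layout, and the assumption that every image in the batch has the same number of channels are hypothetical choices made only for illustration.

```python
import numpy as np


def sample_isolated_channels(batch, rng):
    """Pick one random channel per image (hypothetical helper, not from the paper).

    batch: array of shape (B, C, H, W); assumes all images share C channels.
    Returns (single, idx): single has shape (B, 1, H, W) and idx has shape (B,),
    where idx[i] is the channel index drawn for image i.
    """
    num_images, num_channels = batch.shape[:2]
    # Draw an independent channel index for every image in the batch.
    idx = rng.integers(0, num_channels, size=num_images)
    # Advanced indexing selects channel idx[i] from image i, giving (B, H, W);
    # a singleton channel axis is then restored.
    single = batch[np.arange(num_images), idx][:, None, :, :]
    return single, idx


rng = np.random.default_rng(0)
batch = rng.normal(size=(4, 5, 8, 8))  # 4 images, 5 channels, 8x8 pixels
single, idx = sample_isolated_channels(batch, rng)
```

Calling the function once per training iteration gives each image a fresh channel draw. The returned `idx` is kept so that a model could, if needed, be told which channel it is seeing. How IC-ViT uses channel identity and how it handles datasets with differing channel counts are not specified in the abstract.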

Citation

@inproceedings{Lian_2025_BMVC,
author    = {Wenyi Lian and Patrick Micke and Joakim Lindblad and Nataša Sladoje},
title     = {Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_1132/paper.pdf}
}

Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
