Split Matching for Inductive Zero-shot Semantic Segmentation


Jialei Chen (Nagoya University), Xu Zheng (The Hong Kong University of Science and Technology (Guangzhou) (HKUST (GZ))), Dongyue Li (Nagoya University), Chong Yi (Nagoya University), Seigo Ito (Nagoya University), Danda Pani Paudel (Artificial Intelligence and Technology (INSAIT), Sofia University), Luc Van Gool (Artificial Intelligence and Technology (INSAIT), Sofia University), Hiroshi Murase (Nagoya University), Daisuke Deguchi (Nagoya University)
The 36th British Machine Vision Conference

Abstract

Zero-shot Semantic Segmentation (ZSS) targets the segmentation of unseen classes, i.e., classes not annotated during training. While fine-tuned vision-language models show promise, they often overfit to seen classes because unseen classes receive no supervision. Query-based methods offer strong potential by enabling object localization without explicit labels, but conventional approaches assume full supervision and thus tend to misclassify unseen classes as background in ZSS settings. To address this issue, we propose Split Matching (SM), a novel assignment strategy that decouples Hungarian matching into two components: one for seen classes in annotated regions and another for latent classes in unannotated regions (referred to as unseen candidates). Specifically, we split the queries into seen and candidate queries, enabling each group to be optimized independently according to its available supervision. To discover unseen candidates, we cluster CLIP dense features to generate pseudo masks and extract region-level embeddings using CLS tokens. Matching is then conducted separately for the two groups based on both class and mask similarity. Additionally, we introduce a Multi-scale Feature Enhancement (MFE) module that refines decoder features through residual multi-scale aggregation, improving the model’s ability to capture spatial details across resolutions. We also introduce a Random Query (RQ) strategy to further enhance performance after training. Our method is the first to introduce decoupled Hungarian matching under the inductive ZSS setting, and achieves 0.8% and 1.1% higher hIoU on two ZSS benchmarks.
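
The decoupled matching described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`pairwise_cost`, `split_match`), the cost form (negative cosine similarity plus one minus Dice overlap), the weights `w_cls` and `w_mask`, and the convention that the first `num_seen` queries form the seen group are all assumptions made for illustration. The only library routine relied on is SciPy's `linear_sum_assignment`, which solves the assignment problem that Hungarian matching addresses.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def pairwise_cost(q_emb, q_masks, t_emb, t_masks, w_cls=1.0, w_mask=1.0):
    """Cost between every query and every target (lower is better).

    Assumed cost form: negative cosine similarity of embeddings (class term)
    plus one minus the Dice overlap of masks (mask term).
    """
    eps = 1e-6
    q = q_emb / (np.linalg.norm(q_emb, axis=1, keepdims=True) + eps)
    t = t_emb / (np.linalg.norm(t_emb, axis=1, keepdims=True) + eps)
    cls_cost = -(q @ t.T)

    qm = q_masks.reshape(len(q_masks), -1)
    tm = t_masks.reshape(len(t_masks), -1)
    inter = qm @ tm.T
    dice = (2.0 * inter + eps) / (qm.sum(1)[:, None] + tm.sum(1)[None, :] + eps)
    return w_cls * cls_cost + w_mask * (1.0 - dice)


def split_match(q_emb, q_masks, num_seen, seen_targets, cand_targets):
    """Match seen queries to annotated targets and candidate queries to
    pseudo targets with two independent assignments.

    seen_targets / cand_targets are (embeddings, masks) tuples.
    Returns two lists of (global_query_index, target_index) pairs.
    """
    seen_cost = pairwise_cost(q_emb[:num_seen], q_masks[:num_seen], *seen_targets)
    cand_cost = pairwise_cost(q_emb[num_seen:], q_masks[num_seen:], *cand_targets)

    # Each group is solved on its own cost matrix, so a candidate query can
    # never be assigned to a seen-class target, and vice versa.
    s_rows, s_cols = linear_sum_assignment(seen_cost)
    c_rows, c_cols = linear_sum_assignment(cand_cost)

    seen_pairs = list(zip(s_rows.tolist(), s_cols.tolist()))
    # Shift candidate rows back to global query indices.
    cand_pairs = [(r + num_seen, c) for r, c in zip(c_rows.tolist(), c_cols.tolist())]
    return seen_pairs, cand_pairs


# Toy data: 4 queries on a 2x2 image; queries 0-1 are "seen", 2-3 "candidate".
q_emb = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)
q_masks = np.zeros((4, 2, 2))
q_masks[0, 0, 0] = q_masks[1, 0, 1] = q_masks[2, 1, 0] = q_masks[3, 1, 1] = 1.0

# One annotated seen target, and two pseudo targets standing in for clusters.
seen_targets = (np.array([[0, 1, 0]], dtype=float), q_masks[[1]])
cand_targets = (np.array([[1, 1, 0], [0, 0, 1]], dtype=float), q_masks[[3, 2]])

seen_pairs, cand_pairs = split_match(q_emb, q_masks, 2, seen_targets, cand_targets)
print(seen_pairs, cand_pairs)
```

In this toy setup, query 1 is paired with the single seen target, while queries 2 and 3 are paired with the pseudo targets they overlap. Queries left unmatched (query 0 here) are simply not returned; how they are supervised is outside the scope of this sketch.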

Citation

@inproceedings{Chen_2025_BMVC,
author    = {Jialei Chen and Xu Zheng and Dongyue Li and Chong Yi and Seigo Ito and Danda Pani Paudel and Luc Van Gool and Hiroshi Murase and Daisuke Deguchi},
title     = {Split Matching for Inductive Zero-shot Semantic Segmentation},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_43/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
