Cross-Modal Scene Semantic Alignment for Image Complexity Assessment


Yuqing Luo (Cardiff University), Yixiao Li (Beihang University), Jiang Liu (Cardiff University), Jun Fu (Cardiff University), Hadi Amirpour (University of Klagenfurt), Guanghui Yue (Shenzhen University), Baoquan Zhao (Sun Yat-sen University), Padraig Corcoran (Cardiff University), Hantao Liu (Cardiff University), Wei Zhou (Cardiff University)
The 36th British Machine Vision Conference

Abstract

Image complexity assessment (ICA) is a challenging task in perceptual evaluation due to the subjective nature of human perception and the inherent semantic diversity of real-world images. Existing ICA methods predominantly rely on hand-crafted features or shallow convolutional neural network features from a single visual modality, which are insufficient to capture the perceptual representations closely related to image complexity. Recently, cross-modal scene semantic information has been shown to play a crucial role in various computer vision tasks, particularly those involving perceptual understanding. However, the use of cross-modal scene semantic information for ICA remains unexplored. Therefore, in this paper, we propose a novel ICA method called Cross-Modal Scene Semantic Alignment (CM-SSA), which leverages scene semantic alignment from a cross-modal perspective to enhance ICA performance, making complexity predictions more consistent with subjective human perception. Specifically, the proposed CM-SSA consists of a complexity regression branch and a scene semantic alignment branch. The complexity regression branch estimates image complexity levels under the guidance of the scene semantic alignment branch, while the scene semantic alignment branch aligns images with corresponding text prompts that convey rich scene semantic information through pair-wise learning. Extensive experiments on several ICA datasets demonstrate that the proposed CM-SSA significantly outperforms state-of-the-art approaches. Code is available at https://github.com/XQ2K/First-Cross-Model-ICA.
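
As a rough illustration of the two-branch idea described in the abstract, the sketch below combines a regression loss on complexity scores with an image-to-text alignment loss. It is not the authors' implementation: the abstract does not specify the loss functions, so the mean squared error, the cosine-similarity cross-entropy, the `temperature` and `weight` parameters, and all function names here are assumptions chosen for illustration only.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def alignment_loss(image_embs, text_embs, temperature=0.07):
    """Assumed alignment objective: cross-entropy over image-to-text
    similarities, where image i is paired with text prompt i."""
    total = 0.0
    for i, img in enumerate(image_embs):
        logits = [cosine(img, txt) / temperature for txt in text_embs]
        # Numerically stable log-sum-exp over all prompts.
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]
    return total / len(image_embs)

def regression_loss(predicted, target):
    """Assumed regression objective: mean squared error between predicted
    and annotated complexity scores."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(predicted)

def joint_loss(image_embs, text_embs, predicted, target, weight=1.0):
    """Hypothetical combined objective: regression plus weighted alignment."""
    return regression_loss(predicted, target) + weight * alignment_loss(image_embs, text_embs)
```

In this sketch, image embeddings that match their paired prompts give a smaller alignment term than mismatched ones, so the alignment branch can act as an auxiliary signal alongside the regression term. How the paper actually couples the two branches should be taken from the paper and the linked repository.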

Citation

@inproceedings{Luo_2025_BMVC,
author    = {Yuqing Luo and Yixiao Li and Jiang Liu and Jun Fu and Hadi Amirpour and Guanghui Yue and Baoquan Zhao and Padraig Corcoran and Hantao Liu and Wei Zhou},
title     = {Cross-Modal Scene Semantic Alignment for Image Complexity Assessment},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_37/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
