Task Progressive Curriculum Learning for Robust Visual Question Answering


Ahmed Akl (Griffith University), Abdelwahed Khamis (CSIRO), Zhe Wang (Griffith University), Ali Cheraghian (CSIRO), Sara Khalifa (Queensland University of Technology), Kewen Wang (Griffith University)
The 36th British Machine Vision Conference

Abstract

Visual Question Answering (VQA) systems are notoriously brittle under distribution shifts and data scarcity. While previous solutions, such as ensemble methods and data augmentation, can improve performance in isolation, they fail to generalise well across in-distribution (IID), out-of-distribution (OOD), and low-data settings simultaneously. We argue that this limitation stems from suboptimal training strategies. Specifically, treating all training samples uniformly, without accounting for question difficulty or semantic structure, leaves models vulnerable to dataset biases, so they struggle to generalise beyond the training distribution. To address this issue, we introduce Task-Progressive Curriculum Learning (TPCL), a simple, model-agnostic framework that progressively trains VQA models using a curriculum built by jointly considering question type and difficulty. TPCL first groups questions by semantic type (e.g., yes/no, counting) and then orders the groups using a novel Optimal Transport-based difficulty measure. Without relying on data augmentation or explicit debiasing, TPCL improves generalisation across IID, OOD, and low-data regimes and achieves state-of-the-art performance on VQA-CP v2, VQA-CP v1, and VQA v2. It outperforms the most competitive robust VQA baselines by over 5% and 7% on VQA-CP v2 and v1, respectively, and boosts backbone performance by up to 28.5%.
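
The group-then-order idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes, purely for illustration, that each question has a scalar score (for example a loss) recorded at two training snapshots, that a type's difficulty is the 1-D Wasserstein-1 distance between those two score distributions, and that a smaller distance means an easier type. The names `wasserstein_1d` and `build_curriculum` are hypothetical.

```python
from collections import defaultdict


def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples.

    For equal-size samples this is the mean absolute difference
    between the sorted values.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equal-size samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def build_curriculum(samples, prev_scores, curr_scores):
    """Group samples by question type and order the groups by difficulty.

    samples:      list of (sample_id, question_type) pairs
    prev_scores:  dict sample_id -> score at an earlier snapshot (assumed signal)
    curr_scores:  dict sample_id -> score at a later snapshot (assumed signal)

    Returns (ordered list of (question_type, sample_ids), difficulty dict).
    """
    # Step 1: group questions by their semantic type.
    groups = defaultdict(list)
    for sample_id, question_type in samples:
        groups[question_type].append(sample_id)

    # Step 2: score each type with an Optimal Transport distance.
    # Assumption: the distance between the two score distributions
    # stands in for difficulty.
    difficulty = {
        qtype: wasserstein_1d(
            [prev_scores[s] for s in ids],
            [curr_scores[s] for s in ids],
        )
        for qtype, ids in groups.items()
    }

    # Step 3: order types from lowest to highest difficulty
    # (the direction is an assumption of this sketch).
    order = sorted(groups, key=lambda q: difficulty[q])
    return [(q, groups[q]) for q in order], difficulty


samples = [(1, "yes/no"), (2, "yes/no"), (3, "counting"), (4, "counting")]
prev_scores = {1: 0.5, 2: 0.7, 3: 1.0, 4: 2.0}
curr_scores = {1: 0.4, 2: 0.6, 3: 0.2, 4: 1.0}
curriculum, difficulty = build_curriculum(samples, prev_scores, curr_scores)
for question_type, ids in curriculum:
    print(question_type, ids, round(difficulty[question_type], 3))
```

A training loop would then expose the model to the ordered groups progressively. How groups are scheduled and whether earlier groups are revisited is left open here, since the abstract does not specify it.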

Citation

@inproceedings{Akl_2025_BMVC,
author    = {Ahmed Akl and Abdelwahed Khamis and Zhe Wang and Ali Cheraghian and Sara Khalifa and Kewen Wang},
title     = {Task Progressive Curriculum Learning for Robust Visual Question Answering},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_294/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
