Conformal Predictors for Efficient Video Text Spotting


Amor Ben Tanfous (Terminal Industries), Sankha Subhra Mukherjee (Terminal Industries), Neil M. Robertson (Terminal Industries)
The 36th British Machine Vision Conference

Abstract

Video text spotting can be achieved by first detecting and recognizing text instances at the frame level, then tracking these texts across video frames. A major challenge with this approach is the inconsistency of text recognition across frames, where an image text spotter may misrecognize text in some frames while correctly recognizing it in others. We argue that text recognition accuracy is correlated with the model's prediction uncertainty. To improve prediction accuracy in videos, we use a statistically rigorous approach to estimate the uncertainty of predicted texts and leverage tracking information to produce more confident and accurate results. To this end, we employ a pretrained state-of-the-art text spotter and apply the conformal prediction framework to estimate its prediction uncertainties. A second challenge addressed in this work is the inconsistent quality of predicted bounding boxes, which impacts text tracking performance. As a solution, we use a second conformal prediction approach that applies corrections to the predicted bounding boxes, providing guarantees with a predefined probability of success. Extensive experiments on four public datasets demonstrate consistent improvements over state-of-the-art methods, achieving IDF1 gains of 1.6% on DSText, 1.0% on ArTVideo, and 2.3% on BOVText. While being computationally lightweight, the proposed method integrates seamlessly into any pretrained matching-based video text spotter, enhancing overall performance. The code is available at https://github.com/OmarBent/CVTS.
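
As a rough illustration of the split conformal prediction idea the abstract refers to, the Python sketch below calibrates a margin on held-out data and then enlarges predicted boxes by that margin, so that the true box is covered with probability at least 1 - alpha under the usual exchangeability assumption. It is not the authors' implementation (see the linked repository for that). The function names `conformal_quantile`, `box_score`, and `expand_box`, the signed-margin nonconformity score, and the (x1, y1, x2, y2) box layout are all assumptions made for this example.

```python
import math
import numpy as np


def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores.

    Generic split conformal prediction recipe (hypothetical helper,
    not taken from the paper's code).
    """
    n = len(scores)
    # Rank of the order statistic that yields >= 1 - alpha coverage.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration samples to certify this alpha.
        return float("inf")
    return float(np.sort(np.asarray(scores, dtype=float))[k - 1])


def box_score(pred, gt):
    """Assumed nonconformity score: the largest amount (in pixels) by which
    the ground-truth box sticks out of the predicted box on any side.
    Negative values mean the prediction already contains the ground truth.
    """
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    return max(px1 - gx1, py1 - gy1, gx2 - px2, gy2 - py2)


def expand_box(box, margin):
    """Enlarge a predicted box by the calibrated margin on every side."""
    x1, y1, x2, y2 = box
    return (x1 - margin, y1 - margin, x2 + margin, y2 + margin)


# Calibration: (predicted box, ground-truth box) pairs from held-out frames.
# These numbers are made up for illustration only.
calibration = [
    ((10, 10, 50, 30), (8, 11, 53, 29)),
    ((20, 40, 90, 60), (19, 39, 91, 62)),
    ((5, 5, 25, 15), (5, 5, 25, 15)),
]
scores = [box_score(p, g) for p, g in calibration]
margin = conformal_quantile(scores, alpha=0.5)

# Inference: correct a new prediction with the calibrated margin.
corrected = expand_box((100, 100, 160, 120), margin)
```

The same quantile routine could in principle be applied to a recognition score (for example, one minus the spotter's confidence in the correct transcription) to obtain calibrated uncertainty estimates for predicted text, although how the paper defines its scores is not stated in the abstract.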

Citation

@inproceedings{Tanfous_2025_BMVC,
author    = {Amor Ben Tanfous and Sankha Subhra Mukherjee and Neil M. Robertson},
title     = {Conformal Predictors for Efficient Video Text Spotting},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_1118/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
