Verifier Matters: Enhancing Inference-Time Scaling for Video Diffusion Models


Lorenzo Baraldi (University of Pisa), Davide Bucciarelli (University of Pisa), Zifan Zeng (Technische Universität München), Chongzhe Zhang (Technische Universität Berlin), Qunli Zhang (Huawei Technologies Ltd.), Marcella Cornia (University of Modena and Reggio Emilia), Lorenzo Baraldi (University of Modena and Reggio Emilia), Feng Liu (Huawei Technologies Ltd.), Zheng Hu (Huawei Technologies Ltd.), Rita Cucchiara (University of Modena and Reggio Emilia)
The 36th British Machine Vision Conference

Abstract

Inference-time scaling has recently gained attention as an effective strategy for improving the performance of generative models without requiring additional training. Although this paradigm has been successfully applied in text and image generation tasks, its extension to video diffusion models remains relatively underexplored. Indeed, video generation presents unique challenges due to its spatiotemporal complexity, particularly in evaluating intermediate generated samples, a procedure that is required by inference-time scaling algorithms. In this work, we systematically investigate the role of the verifier: the scoring mechanism used to guide sampling. We show that current verifiers, when applied at early diffusion steps, face significant reliability challenges due to noisy samples. We further demonstrate that fine-tuning verifiers on partially denoised samples significantly improves early-stage evaluation and leads to gains in generation quality across multiple inference-time scaling algorithms, including Greedy Search, Beam Search, and a new Successive Halving baseline, which we adapt for the inference-time scaling setting.
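
The search procedure named in the abstract can be illustrated with a toy example. What follows is a minimal sketch of Successive Halving in an inference-time scaling loop, not the paper's implementation. The `Candidate` class, the `denoise` and `verifier` functions, and the even split of denoising steps across rounds are all hypothetical stand-ins chosen for illustration. The toy verifier sees each candidate's quality through its residual noise, which loosely mimics why scores at early diffusion steps can be unreliable.

```python
import math
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    """Toy stand-in for a partially denoised video (hypothetical)."""
    quality: float       # score the fully denoised sample would receive
    noise: float         # residual perturbation that distorts early scores
    steps_done: int = 0  # denoising steps already spent on this candidate


def denoise(c: Candidate, n_steps: int) -> Candidate:
    """Toy denoiser (assumption): every step halves the residual noise."""
    return Candidate(c.quality, c.noise * 0.5 ** n_steps, c.steps_done + n_steps)


def verifier(c: Candidate) -> float:
    """Toy verifier (assumption): true quality observed through the noise."""
    return c.quality + c.noise


def successive_halving(
    pool: List[Candidate],
    denoise_fn: Callable[[Candidate, int], Candidate],
    verifier_fn: Callable[[Candidate], float],
    total_steps: int,
) -> Candidate:
    """Denoise all survivors a little, score them, keep the best half, repeat."""
    if not pool:
        raise ValueError("pool must contain at least one candidate")
    rounds = max(1, math.ceil(math.log2(len(pool))))
    steps_per_round = total_steps // rounds
    survivors = list(pool)
    for _ in range(rounds):
        survivors = [denoise_fn(c, steps_per_round) for c in survivors]
        survivors.sort(key=verifier_fn, reverse=True)
        survivors = survivors[: max(1, math.ceil(len(survivors) / 2))]
    best = survivors[0]
    # Spend any steps left over from the integer division on the winner.
    return denoise_fn(best, total_steps - best.steps_done)


pool = [
    Candidate(quality=0.9, noise=-0.2),
    Candidate(quality=0.5, noise=3.0),
    Candidate(quality=0.7, noise=0.0),
    Candidate(quality=0.2, noise=0.1),
]
winner = successive_halving(pool, denoise, verifier, total_steps=8)
print(winner.quality, winner.steps_done)
```

In this toy pool, scoring the candidates before any denoising would favour the second one, whose large residual noise inflates its score. Scoring only after a few denoising steps lets the search recover the candidate with the highest underlying quality.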

Citation

@inproceedings{Baraldi_2025_BMVC,
author    = {Lorenzo Baraldi and Davide Bucciarelli and Zifan Zeng and Chongzhe Zhang and Qunli Zhang and Marcella Cornia and Lorenzo Baraldi and Feng Liu and Zheng Hu and Rita Cucchiara},
title     = {Verifier Matters: Enhancing Inference-Time Scaling for Video Diffusion Models},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_1006/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
