S$^2$V2V: Training-Free Video-to-Video with Sparse Points and Motion Guidance


Xinyu Zhang (University of Auckland), Zicheng Duan (The University of Adelaide), Dong Gong (University of New South Wales), Lingqiao Liu (The University of Adelaide)
The 36th British Machine Vision Conference

Abstract

Most existing training-free, motion-controllable video-to-video generation methods operate on attention maps or noise estimation for motion guidance. Despite recent progress, they often struggle to maintain temporal coherence across frames and to follow the guided motion accurately. In this paper, we propose S$^2$V2V, a novel training-free paradigm with double sparse guidance, to address the challenge of generating temporally consistent videos under motion guidance. Specifically, we introduce two types of explicit guidance extracted from the reference videos: (i) sparse point guidance, extracted from inter-frame motion correlation patterns at selected sparse key points, and (ii) sparsity-based motion guidance, obtained by refining motion correlation patterns in a local region around those sparse points. To further improve temporal consistency, we incorporate a sparse motion consistency loss during the denoising process, which encourages the generated motion representations to align with the reference guidance in the sparse regions. The gradient of this loss in latent space is then used to steer the generation toward precise motion control. Extensive experiments demonstrate that S$^2$V2V sets a new standard for efficient, temporally coherent video generation in various scenarios, particularly in local object motion and box-trajectory-guided motion.
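The loss-gradient guidance described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a simple frame-to-frame difference as a stand-in for the paper's motion correlation patterns, uses plain lists instead of diffusion latents, and involves no denoising model. All names (`motion`, `sparse_motion_loss`, `sparse_motion_grad`, `guide`, `points`) are hypothetical.

```python
def motion(latents):
    """Toy motion representation (an assumption): difference between consecutive frames."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(latents, latents[1:])]

def sparse_motion_loss(latents, ref_motion, points):
    """Mean squared motion error, evaluated only at the sparse point indices."""
    gen = motion(latents)
    total = sum((gen[t][p] - ref_motion[t][p]) ** 2
                for t in range(len(gen)) for p in points)
    return total / (len(gen) * len(points))

def sparse_motion_grad(latents, ref_motion, points):
    """Analytic gradient of sparse_motion_loss with respect to the latents."""
    gen = motion(latents)
    scale = 2.0 / (len(gen) * len(points))
    grad = [[0.0] * len(frame) for frame in latents]
    for t in range(len(gen)):
        for p in points:
            d = scale * (gen[t][p] - ref_motion[t][p])
            grad[t + 1][p] += d  # motion[t] depends on frame t+1 with sign +1
            grad[t][p] -= d      # and on frame t with sign -1
    return grad

def guide(latents, ref_motion, points, step_size=0.5, n_steps=20):
    """Repeatedly move the latents against the loss gradient."""
    for _ in range(n_steps):
        grad = sparse_motion_grad(latents, ref_motion, points)
        latents = [[z - step_size * g for z, g in zip(frame, gframe)]
                   for frame, gframe in zip(latents, grad)]
    return latents

# Four static frames of six positions each; the reference moves by +1 per frame.
latents = [[0.0] * 6 for _ in range(4)]
ref_motion = [[1.0] * 6 for _ in range(3)]
points = [1, 4]  # sparse positions where guidance is applied

before = sparse_motion_loss(latents, ref_motion, points)
guided = guide(latents, ref_motion, points)
after = sparse_motion_loss(guided, ref_motion, points)
print(before, after)
```

In this sketch only the latents at the sparse positions change, because the gradient is zero everywhere else. In an actual diffusion pipeline a guidance step of this kind would presumably be interleaved with the denoising updates, which the toy example omits.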

Citation

@inproceedings{Zhang_2025_BMVC,
author    = {Xinyu Zhang and Zicheng Duan and Dong Gong and Lingqiao Liu},
title     = {S$^2$V2V: Training-Free Video-to-Video with Sparse Points and Motion Guidance},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_257/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
