Identity-Motion Trade-offs in Text-to-Video Generation


Yuval Atzmon (NVIDIA), Rinon Gal (NVIDIA), Yoad Tewel (Tel Aviv University), Yoni Kasten (NVIDIA), Gal Chechik (Bar Ilan University)
The 36th British Machine Vision Conference

Abstract

Text-to-video diffusion models have shown remarkable progress in generating coherent video clips from textual descriptions. However, the interplay between motion, structure, and identity representations in these models remains under-explored. Here, we investigate how self-attention query ($Q$) features simultaneously govern motion, structure, and identity, and examine the challenges that arise when these representations interact. Our analysis reveals that $Q$ affects not only layout: during denoising, it also strongly affects subject identity, making it hard to transfer motion without also transferring identity. Understanding this dual role enabled us to control query feature injection (Q-injection) and demonstrate two applications: (1) a zero-shot motion transfer method, implemented with VideoCrafter2 and WAN 2.1, that is 10$\times$ more efficient than existing approaches, and (2) a training-free technique for consistent multi-shot video generation, where characters maintain identity across multiple video shots while Q-injection enhances motion fidelity. Project page: https://research.nvidia.com/labs/par/MotionByQueries/
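
To make the idea of Q-injection concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a single attention head operating on small lists of token vectors, and it assumes a simple schedule in which the source video's queries replace the target's queries only during early denoising steps. The function names and the `inject_until` parameter are hypothetical and do not come from the paper.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(q, k, v):
    """Scaled dot-product attention; q, k, v are lists of token vectors."""
    d = len(q[0])
    out = []
    for q_i in q:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(a * b for a, b in zip(q_i, k_j)) / math.sqrt(d) for k_j in k]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        out.append([sum(w * v_j[c] for w, v_j in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def attention_with_q_injection(q_target, k_target, v_target, q_source,
                               step, inject_until):
    """Hypothetical schedule: use the source queries for steps below
    `inject_until`, then fall back to the target's own queries."""
    q = q_source if step < inject_until else q_target
    return self_attention(q, k_target, v_target)
```

Under these assumptions, lowering `inject_until` limits how long the source queries influence the target generation. Because the abstract reports that $Q$ also carries subject identity, restricting injection in some such way is one plausible means of trading motion fidelity against identity leakage.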

Citation

@inproceedings{Atzmon_2025_BMVC,
author    = {Yuval Atzmon and Rinon Gal and Yoad Tewel and Yoni Kasten and Gal Chechik},
title     = {Identity-Motion Trade-offs in Text-to-Video Generation},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_159/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
