Llama Learns to Direct: DirectorLLM for Human-Centric Video Generation


Kunpeng Song (The State University of New Jersey), Tingbo Hou (GenAI at Meta), Zecheng He (GenAI at Meta), Haoyu Ma (GenAI at Meta), Jialiang Wang (GenAI at Meta), Animesh Sinha (GenAI at Meta), Sam Tsai (GenAI at Meta), Yaqiao Luo (GenAI at Meta), Xiaoliang Dai (GenAI at Meta), Li Chen (GenAI at Meta), Xide Xia (GenAI at Meta), Peizhao Zhang (GenAI at Meta), Peter Vajda (GenAI at Meta), Ahmed M. Elgammal (The State University of New Jersey), Felix Juefei-Xu (GenAI at Meta)
The 36th British Machine Vision Conference

Abstract

In this paper, we introduce DirectorLLM, a novel video generation model that employs a large language model (LLM) as the "director" to simulate human poses within videos. To enhance the authenticity of human motions in text-to-video models, we extend the LLM from a text generator to a video director and human motion simulator. We train DirectorLLM to generate detailed human poses that guide video generation. This offloads the simulation of human motion from the video generator to the LLM, which effectively creates informative outlines for human-centric scenes. The video renderer uses these pose signals as conditions, facilitating more realistic and prompt-following video generation. Experiments on automatic evaluation benchmarks and human evaluations show that our model outperforms existing ones, generating videos with higher human motion fidelity, improved prompt faithfulness, and enhanced rendered subject naturalness.

Citation

@inproceedings{Song_2025_BMVC,
author    = {Kunpeng Song and Tingbo Hou and Zecheng He and Haoyu Ma and Jialiang Wang and Animesh Sinha and Sam Tsai and Yaqiao Luo and Xiaoliang Dai and Li Chen and Xide Xia and Peizhao Zhang and Peter Vajda and Ahmed M. Elgammal and Felix Juefei-Xu},
title     = {Llama Learns to Direct: DirectorLLM for Human-Centric Video Generation},
booktitle = {36th British Machine Vision Conference 2025, {BMVC} 2025, Sheffield, UK, November 24-27, 2025},
publisher = {BMVA},
year      = {2025},
url       = {https://bmva-archive.org.uk/bmvc/2025/assets/papers/Paper_896/paper.pdf}
}


Copyright © 2025 The British Machine Vision Association and Society for Pattern Recognition
The British Machine Vision Conference is organised by The British Machine Vision Association and Society for Pattern Recognition. The Association is a Company limited by guarantee, No.2543446, and a non-profit-making body, registered in England and Wales as Charity No.1002307 (Registered Office: Dept. of Computer Science, Durham University, South Road, Durham, DH1 3LE, UK).
