X-Dyna: Expressive Dynamic
Human Image Animation

CVPR 2025

1USC     2ByteDance Inc.     3Stanford     4UCLA     5UCSD
*Equal contribution as second authors

All samples are directly generated by our model without any post-processing.

Abstract

We introduce X-Dyna, a novel zero-shot, diffusion-based pipeline for animating a single human image using facial expressions and body movements derived from a driving video. X-Dyna generates realistic, context-aware dynamics for both the subject and the surrounding environment. Building on prior approaches centered on human pose control, it addresses the key factors underlying the loss of dynamic details, enhancing the lifelike quality of human video animations. At the core of our approach is the Dynamics-Adapter, a lightweight module that effectively integrates reference appearance context into the spatial attention layers of the diffusion backbone while preserving the ability of the motion modules to synthesize fluid and intricate dynamic details. Beyond body pose control, we connect a local control module to our model to capture identity-disentangled facial expressions, enabling accurate expression transfer and enhanced realism in animated scenes. Together, these components form a unified framework that learns physical human motion and natural scene dynamics from a diverse blend of human and scene videos. Comprehensive qualitative and quantitative evaluations demonstrate that X-Dyna outperforms state-of-the-art methods, producing highly lifelike and expressive animations.

Method


Overview of our method. We leverage a pretrained diffusion UNet backbone for controlled human image animation, enabling expressive dynamic details and precise motion control. Our model achieves faithful transfer of body poses and facial expressions, as well as highly vivid and detailed dynamics for both the human and the scene.
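To make the Dynamics-Adapter idea concrete, below is a minimal, framework-agnostic sketch (NumPy, not the authors' implementation) of one common way to inject reference-image context into a spatial attention layer: reference tokens are appended only to the key/value streams, so the frame being denoised can attend to the reference appearance while its own queries, and hence its dynamics, remain untouched. All function and variable names here are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_with_reference(x, ref, w_q, w_k, w_v):
    """Spatial self-attention with reference-token injection (illustrative).

    x   : (n_frame_tokens, d)  tokens of the frame being denoised
    ref : (n_ref_tokens, d)    tokens carrying reference appearance context
    w_* : (d, d)               projection matrices

    Queries come only from the frame tokens; keys and values are formed
    from the concatenation of frame and reference tokens, so appearance
    context flows in without replacing the frame's own features.
    """
    q = x @ w_q
    kv_src = np.concatenate([x, ref], axis=0)   # frame + reference tokens
    k = kv_src @ w_k
    v = kv_src @ w_v
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))
    return attn @ v                             # shape: (n_frame_tokens, d)
```

In this sketch the output keeps the frame's token count, which is what lets such an adapter sit inside an existing UNet attention block without changing downstream shapes; the actual X-Dyna architecture should be taken from the paper and released code.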

Results

Comparison to Previous Works

Different Architecture Designs

Effectiveness of Mixed-Data Training

Demo from our method (Short videos)

Demo from our method (Long videos)

BibTeX

If you find X-Dyna useful for your research or applications, please cite X-Dyna using this BibTeX:

@misc{chang2025xdynaexpressivedynamichuman,
      title={X-Dyna: Expressive Dynamic Human Image Animation}, 
      author={Di Chang and Hongyi Xu and You Xie and Yipeng Gao and Zhengfei Kuang and Shengqu Cai and Chenxu Zhang and Guoxian Song and Chao Wang and Yichun Shi and Zeyuan Chen and Shijie Zhou and Linjie Luo and Gordon Wetzstein and Mohammad Soleymani},
      year={2025},
      eprint={2501.10021},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2501.10021}, 
}

Acknowledgement

We thank Quankai Gao, Qiangeng Xu, Shen Sang, and Tiancheng Zhi for their suggestions and discussions.

IP Statement

This work is intended for research purposes only. The images and videos used in these demos are from public sources. If any content infringes your rights or causes offense, please contact us and we will remove it promptly.