# DiffSHEG
## DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation (CVPR 2024 Official Repo)

[Junming Chen](https://jeremycjm.github.io)†1,2, [Yunfei Liu](http://liuyunfei.net/)2, [Jianan Wang](https://scholar.google.com/citations?user=mt5mvZ8AAAAJ&hl=en&inst=1381320739207392350)2, [Ailing Zeng](https://ailingzeng.site/)2, [Yu Li](https://yu-li.github.io/)\*2, [Qifeng Chen](https://cqf.io)\*1

1HKUST   2International Digital Economy Academy (IDEA)   
\*Corresponding authors   †Work done during an internship at IDEA

#### [Project Page](https://jeremycjm.github.io/proj/DiffSHEG/) · [Paper](https://arxiv.org/abs/2401.04747) · [Video](https://www.youtube.com/watch?v=HFaSd5do-zI)

![DiffSHEG Teaser](./assets/teaser_for_demo_cvpr.png)

## Environment

We have tested on Ubuntu 18.04 and 20.04.

```
cd assets
```

- Option 1: conda install
  ```
  conda env create -f environment.yml
  conda activate diffsheg
  ```
- Option 2: pip install
  ```
  conda create -n "diffsheg" python=3.9
  conda activate diffsheg
  pip install -r requirements.txt
  pip install torch==1.13.1+cu117 torchvision==0.14.1+cu117 torchaudio==0.13.1 --extra-index-url https://download.pytorch.org/whl/cu117
  ```
- Untar data.tar.gz to get the data statistics:
  ```
  tar zxvf data.tar.gz
  mv data ../
  ```

## Checkpoints

[Google Drive](https://drive.google.com/file/d/1JPoMOcGDrvkFt7QbN6sEyYAPOOWkVN0h/view)

## Inference on a Custom Audio

First, set the `--test_audio_path` argument in the bash scripts below to the path of your test audio. Note that the audio must be a .wav file; a conversion sketch follows the commands below.

- Use the model trained on the BEAT dataset:
  ```
  bash inference_custom_audio_beat.sh
  ```
- Use the model trained on the SHOW dataset:
  ```
  bash inference_custom_audio_talkshow.sh
  ```
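If your recording is not already a .wav file, you can convert it with ffmpeg. This is a minimal sketch: the input filename is hypothetical, and the 16 kHz mono settings are an assumption, so check what sample rate your chosen checkpoint expects.

```
# Hypothetical input file; -ar sets the sample rate and -ac 1 downmixes to mono.
# Adjust 16000 if your checkpoint was trained on audio at a different rate.
ffmpeg -i my_speech.mp3 -ar 16000 -ac 1 my_speech.wav
```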
## Training

Train on the BEAT dataset:

```
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
OMP_NUM_THREADS=10 CUDA_VISIBLE_DEVICES=0,1,2,3,4 python -u runner.py \
    --dataset_name beat \
    --name beat_diffsheg \
    --batch_size 2500 \
    --num_epochs 1000 \
    --save_every_e 20 \
    --eval_every_e 40 \
    --n_poses 34 \
    --ddim \
    --multiprocessing-distributed \
    --dist-url 'tcp://127.0.0.1:6666'
```
Train on the SHOW dataset:

```
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
OMP_NUM_THREADS=10 CUDA_VISIBLE_DEVICES=0,1,2,3,4 python -u runner.py \
    --dataset_name talkshow \
    --name talkshow_diffsheg \
    --batch_size 950 \
    --num_epochs 4000 \
    --save_every_e 20 \
    --eval_every_e 40 \
    --n_poses 88 \
    --classifier_free \
    --multiprocessing-distributed \
    --dist-url 'tcp://127.0.0.1:6667' \
    --ddim \
    --max_eval_samples 200
```
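The commands above assume five GPUs (`CUDA_VISIBLE_DEVICES=0,1,2,3,4`). Below is a minimal single-GPU sketch for BEAT, assuming `--multiprocessing-distributed` also runs with one visible device and that a proportionally smaller `--batch_size` fits in memory; the batch size of 500 and the experiment name are assumptions, not tested settings.

```
# Single-GPU sketch: one visible device, batch size scaled to roughly 1/5
# of the 5-GPU setting (500 is an untested assumption; tune to your GPU memory).
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
OMP_NUM_THREADS=10 CUDA_VISIBLE_DEVICES=0 python -u runner.py \
    --dataset_name beat \
    --name beat_diffsheg_1gpu \
    --batch_size 500 \
    --num_epochs 1000 \
    --save_every_e 20 \
    --eval_every_e 40 \
    --n_poses 34 \
    --ddim \
    --multiprocessing-distributed \
    --dist-url 'tcp://127.0.0.1:6666'
```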
## Testing
Test on the BEAT dataset. Set `--name` and `--ckpt` to the experiment name and checkpoint file of the BEAT model you trained or downloaded:

```
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
OMP_NUM_THREADS=10 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -u runner.py \
    --dataset_name beat \
    --name beat_diffsheg \
    --PE pe_sinu \
    --n_poses 34 \
    --multiprocessing-distributed \
    --dist-url 'tcp://127.0.0.1:8888' \
    --ckpt <your_beat_checkpoint>.tar \
    --mode test_arbitrary_len \
    --ddim \
    --timestep_respacing ddim25 \
    --overlap_len 10
```
Test on the SHOW dataset:

```
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
OMP_NUM_THREADS=10 CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python -u runner.py \
    --dataset_name talkshow \
    --name talkshow_GesExpr_unify_addHubert_encodeHubert_mdlpIncludeX_condRes_LN_ClsFree \
    --PE pe_sinu \
    --n_poses 88 \
    --multiprocessing-distributed \
    --dist-url 'tcp://127.0.0.1:8889' \
    --classifier_free \
    --cond_scale 1.25 \
    --ckpt ckpt_e2599.tar \
    --mode test_arbitrary_len \
    --ddim \
    --timestep_respacing ddim25 \
    --overlap_len 10
```
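In the SHOW command above, `--timestep_respacing ddim25` runs DDIM sampling with 25 denoising steps and `--cond_scale 1.25` sets the classifier-free guidance strength. As a rough sketch of the speed/fidelity trade-off, you could raise both; the `ddim50` and `1.5` values below are untested assumptions, not recommended settings.

```
# Sketch: more denoising steps (ddim50) and stronger guidance (1.5);
# both values are untested assumptions -- expect roughly 2x slower sampling.
PYTHONPATH="$(dirname $0)/..":$PYTHONPATH \
OMP_NUM_THREADS=10 CUDA_VISIBLE_DEVICES=0 python -u runner.py \
    --dataset_name talkshow \
    --name talkshow_GesExpr_unify_addHubert_encodeHubert_mdlpIncludeX_condRes_LN_ClsFree \
    --PE pe_sinu \
    --n_poses 88 \
    --multiprocessing-distributed \
    --dist-url 'tcp://127.0.0.1:8890' \
    --classifier_free \
    --cond_scale 1.5 \
    --ckpt ckpt_e2599.tar \
    --mode test_arbitrary_len \
    --ddim \
    --timestep_respacing ddim50 \
    --overlap_len 10
```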
## Visualization

After running in test or test-custom-audio mode, the gesture and expression results will be saved in the ./results directory.

### BEAT

1. Open `assets/beat_visualize.blend` with the latest Blender on your local computer.
2. Specify the audio, BVH (for gesture), JSON (for expression), and video saving paths in the script in Blender.
3. (Optional) Click Window --> Toggle System Console to check the visualization progress.
4. Run the script in Blender.

### SHOW

Please refer to the [TalkSHOW](https://github.com/yhw-yhw/TalkSHOW) code for the visualization of our generated motion.

## Acknowledgement

Our implementation is partially based on [BEAT](https://github.com/PantoMatrix/BEAT), [TalkSHOW](https://github.com/yhw-yhw/TalkSHOW), and [MotionDiffuse](https://github.com/mingyuan-zhang/MotionDiffuse/tree/main).

## Citation

If you use our code or find this repo useful, please consider citing our paper:

```
@inproceedings{chen2024diffsheg,
  title     = {DiffSHEG: A Diffusion-Based Approach for Real-Time Speech-driven Holistic 3D Expression and Gesture Generation},
  author    = {Chen, Junming and Liu, Yunfei and Wang, Jianan and Zeng, Ailing and Li, Yu and Chen, Qifeng},
  booktitle = {CVPR},
  year      = {2024}
}
```