# AR-Diffusion

**Repository Path**: sheldongchen/AR-Diffusion

## Basic Information

- **Project Name**: AR-Diffusion
- **Description**: No description available
- **Primary Language**: Unknown
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-09-11
- **Last Updated**: 2025-09-11

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion

![image](https://code.byted.org/sunmingzhen.triz/3d_tokenizer/blob/release/fig1.png)

![image](https://code.byted.org/sunmingzhen.triz/3d_tokenizer/blob/release/fig2.png)

# Environment

Python 3.9

```
pip install -r requirements.txt
```

# Infer Scripts

Ensure all model checkpoints are placed in the `experiments` folder.

1. Testing the reconstruction performance of the VAE.

```
bash shell_scripts/stage1_vae_scripts/face_vae_infer.sh
```

2. Testing the generation performance of AR-Diffusion.

```
DATA_FILE_NAME=facelatent
MODEL_FILE_NAME=diff_ardiff_vtattn_x0pred_nvae_midt
VALIDDATA_FILE=facevidflatten
TRAINDATA_FILE=$DATA_FILE_NAME
bash shell_scripts/stage2_ardiff_scripts/face_gen/infer_base_script.sh $TRAINDATA_FILE $VALIDDATA_FILE $MODEL_FILE_NAME 2.0 16 5
```

# Train Scripts

1. Training the VAE on video frames.

**Note:** Please download the `tokenizer_titok_l32.bin` file from https://huggingface.co/TrizZZZ/ar_diffusion and put it in the root folder before training the VAE.

```
bash shell_scripts/stage1_vae_scripts/sky_vae_train.sh
```

2. Finetuning the VAE on videos with temporal causal attention.

```
bash shell_scripts/stage1_vae_scripts/sky_vae_train_ftwt.sh
```

3. Extracting video latents with the VAE to speed up training of the AR-Diffusion model.

Enable (uncomment) the `bash shell_scripts/base_vae/infer_savelatent_script.sh` line in `shell_scripts/stage1_vae_scripts/sky_vae_infer.sh`.

4. Training the AR-Diffusion model.
```
DATA_FILE_NAME=skyvidlatent
MODEL_FILE_NAME=diff_ardiff_vtattn_x0pred_nvae_midt
bash shell_scripts/stage2_ardiff_scripts/sky_gen/train_base_script.sh ${DATA_FILE_NAME} ${MODEL_FILE_NAME}
```

# Checkpoints

Checkpoints of the VAE and AR-Diffusion models for the Sky-Timelapse, TaiChi-HD, UCF101, and FaceForensics datasets have been uploaded to the Hugging Face Hub: https://huggingface.co/TrizZZZ/ar_diffusion

# More Samples

More video samples can be viewed at: https://anonymouss765.github.io/AR-Diffusion

# Citation

TODO

```
@inproceedings{ardiff,
  title={AR-Diffusion: Asynchronous Video Generation with Auto-Regressive Diffusion},
  author={Sun, Mingzhen and Wang, Weining and Li, Gen and Liu, Jiawei and Sun, Jiahui and Feng, Wanquan and Lao, Shanshan and Zhou, SiYu and He, Qian and Liu, Jing},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}
```
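
# Pre-flight Check (Optional Sketch)

The Infer Scripts and Train Scripts sections assume that checkpoints sit in the `experiments` folder and that `tokenizer_titok_l32.bin` sits in the root folder. A minimal sketch of a pre-flight check for both conditions is given below. The helper names `require_file` and `require_nonempty_dir` are hypothetical and not part of this repository. Checking only that `experiments` is non-empty is an assumption, since the exact checkpoint file names are not listed here.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; they are NOT part of the AR-Diffusion repository.

# Succeeds if the given path is an existing regular file; otherwise prints a hint.
require_file() {
  if [ ! -f "$1" ]; then
    echo "missing file: $1" >&2
    return 1
  fi
}

# Succeeds if the given path is an existing directory with at least one entry.
# Assumption: a non-empty folder is taken as a proxy for "checkpoints are present".
require_nonempty_dir() {
  if [ ! -d "$1" ] || [ -z "$(ls -A "$1")" ]; then
    echo "missing or empty directory: $1" >&2
    return 1
  fi
}

# Example usage from the repository root (commented out so sourcing has no side effects):
# require_file tokenizer_titok_l32.bin && bash shell_scripts/stage1_vae_scripts/sky_vae_train.sh
# require_nonempty_dir experiments && bash shell_scripts/stage1_vae_scripts/face_vae_infer.sh
```

Chaining the check with `&&` means the training or inference script only starts when the prerequisite is present, so a missing file shows up as a one-line message rather than a failure partway through the script.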