# DeCo

**Repository Path**: monkeycc/DeCo

## Basic Information

- **Project Name**: DeCo
- **Description**: No description available
- **Primary Language**: Python
- **License**: Not specified
- **Default Branch**: main
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2025-12-03
- **Last Updated**: 2025-12-03

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation

Zehong Ma¹,³†, Longhui Wei³‡\*, Shuai Wang², Shiliang Zhang¹\*, Qi Tian³

¹State Key Laboratory of Multimedia Information Processing,
School of Computer Science, Peking University, ²Nanjing University, ³Huawei Inc.

(† Work was done during an internship at Huawei. ‡ Project leader. \* Corresponding author.)

[![hf_paper](https://img.shields.io/badge/🤗-Paper%20In%20HF-red.svg)](https://huggingface.co/papers/2511.19365) [![arXiv](https://img.shields.io/badge/Arxiv-2511.19365-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2511.19365) [![Home Page](https://img.shields.io/badge/Project--blue.svg)](https://zehong-ma.github.io/DeCo/) [![Huggingface](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Online_Demo-green)](https://14467288703cf06a3c.gradio.live) [![github](https://img.shields.io/github/stars/Zehong-Ma/DeCo.svg?style=social)](https://github.com/Zehong-Ma/DeCo/)

Figure 1: Visualization of 512x512 images generated by our DeCo.

## 🫖 Introduction

We introduce DeCo, a novel frequency-decoupled framework for pixel diffusion, in which a lightweight pixel decoder models high-frequency signals, freeing the DiT to specialize in low-frequency semantic modeling.

- We achieve **1.62 FID** on the ImageNet 256x256 benchmark with DeCo-XL/16.
- We achieve **2.22 FID** on the ImageNet 512x512 benchmark with DeCo-XL/16.
- We achieve a **0.86 overall score** on the GenEval benchmark with DeCo-XXL/16.
- We achieve an **81.4 average score** on the DPG benchmark with DeCo-XXL/16.
- **If you like our project, please kindly give us a star ⭐ on GitHub.**

## DCT Spectral Analysis

(a) DCT energy distribution of DiT outputs and predicted pixel velocities. Compared with the baseline, DeCo suppresses high-frequency signals in the DiT outputs while preserving strong high-frequency energy in the pixel velocity, confirming effective frequency decoupling. The distribution is computed on 10K images across all diffusion steps using a DCT with an 8x8 block size. (b) FID comparison between DeCo and the baseline. DeCo reaches 2.57 FID in 400k iterations, 10× faster than the baseline.
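The block-wise spectral analysis described in the caption can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes a single-channel image given as a 2-D list of floats, an orthonormal DCT-II, and non-overlapping 8x8 blocks, and the helper names (`dct_matrix`, `dct2`, `block_dct_energy`) are hypothetical. How the original analysis normalizes or aggregates energy across images and diffusion steps is not stated in this README.

```python
import math


def dct_matrix(n=8):
    """Orthonormal DCT-II basis: row k is the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows


def matmul(a, b):
    """Plain matrix product of two 2-D lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def dct2(block, basis):
    """2-D DCT of a square block, computed as C @ block @ C^T."""
    basis_t = [list(col) for col in zip(*basis)]
    return matmul(matmul(basis, block), basis_t)


def block_dct_energy(image, block=8):
    """Mean squared DCT coefficient per frequency bin over all full blocks."""
    height, width = len(image), len(image[0])
    if height < block or width < block:
        raise ValueError("image is smaller than one block")
    basis = dct_matrix(block)
    energy = [[0.0] * block for _ in range(block)]
    count = 0
    # Walk over non-overlapping blocks; partial blocks at the border are skipped.
    for top in range(0, height - block + 1, block):
        for left in range(0, width - block + 1, block):
            patch = [row[left:left + block] for row in image[top:top + block]]
            coeffs = dct2(patch, basis)
            for u in range(block):
                for v in range(block):
                    energy[u][v] += coeffs[u][v] ** 2
            count += 1
    return [[e / count for e in row] for row in energy]


# A constant image carries energy only in the DC bin energy[0][0].
flat_image = [[1.0] * 16 for _ in range(16)]
flat_energy = block_dct_energy(flat_image)
```

Entry `[u][v]` of the result is the average energy at vertical frequency `u` and horizontal frequency `v`, so larger indices correspond to higher frequencies. Applying the same function to DiT outputs and to predicted pixel velocities would give two spectra that can be compared bin by bin.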

## 🧩 Visualizations

+ Visualization of more images generated by our text-to-image DeCo.

+ Visualization of 256x256 images generated by our class-to-image DeCo.

## 🎉 Checkpoints

| Dataset     | Epoch | Model      | Params | FID  | HuggingFace |
|-------------|-------|------------|--------|------|-------------|
| ImageNet256 | 320   | DeCo-XL/16 | 682M   | 1.90 | [🤗](https://huggingface.co/zehongma/DeCo/blob/main/imagenet256_epoch320.ckpt) |
| ImageNet256 | 600   | DeCo-XL/16 | 682M   | 1.78 | [🤗](https://huggingface.co/zehongma/DeCo/blob/main/imagenet256_epoch600.ckpt) |
| ImageNet256 | 800   | DeCo-XL/16 | 682M   | 1.62 | [🤗](https://huggingface.co/zehongma/DeCo/blob/main/imagenet256_epoch800.ckpt) |
| ImageNet512 | 340   | DeCo-XL/16 | 682M   | 2.22 | [🤗](https://huggingface.co/zehongma/DeCo/blob/main/imagenet512_epoch340.ckpt) |

| Dataset       | Model       | Params | GenEval | DPG  | HuggingFace |
|---------------|-------------|--------|---------|------|-------------|
| Text-to-Image | DeCo-XXL/16 | 1.1B   | 0.86    | 81.4 | [🤗](https://huggingface.co/zehongma/DeCo/blob/main/t2i_DeCo.ckpt) |

## 🔥 Online Demos

![](./docs/static/images/demo.jpg)

We provide online demos for DeCo-XXL/16 (text-to-image) on HuggingFace Spaces.

HF spaces: [https://14467288703cf06a3c.gradio.live](https://14467288703cf06a3c.gradio.live)

To host the local gradio demo, run the following command:

```bash
# for text-to-image applications
python app.py --config ./configs_t2i/sft_res512.yaml --ckpt_path=./ckpts/t2i_DeCo.ckpt
```

## 🤖 Usages

In class-to-image (ImageNet) experiments, we use the [ADM evaluation suite](https://github.com/openai/guided-diffusion/tree/main/evaluations) to report FID. In text-to-image experiments, we use the BLIP3o dataset as the training set and use GenEval and DPG to collect metrics.

+ Environments

```bash
# for installation (recommend python 3.10)
pip install -r requirements.txt
```

+ Inference

```bash
# for inference
python main.py predict -c ./configs_c2i/DeCo_XL.yaml --ckpt_path=XXX.ckpt
```

+ Train

```bash
# for c2i training
# Please modify the ImageNet1k path in the config file before training.
python main.py fit -c ./configs_c2i/DeCo_XL.yaml

# for 512x512 continued pretraining
python main.py fit -c ./configs_c2i/DeCo_XL_512.yaml --ckpt_path=/path/to/256/checkpoint/at/320/epochs
```

```bash
# for t2i training
python main.py fit -c ./configs_t2i/pretraining_res256.yaml
python main.py fit -c ./configs_t2i/pretraining_res512.yaml --ckpt_path=./ckpts/pretrain256.ckpt
python main.py fit -c ./configs_t2i/sft_res512.yaml --ckpt_path=./ckpts/pretrain512.ckpt
```

## 💐 Acknowledgement

This repository is built on [PixNerd](https://github.com/MCG-NJU/PixNerd) and [DDT](https://github.com/MCG-NJU/DDT). Thanks for their contributions and for [Shuai Wang](https://github.com/WANGSSSSSSS)'s support!

## 📖 Citation

If you find DeCo useful in your research or applications, please consider giving us a star ⭐ and citing it with the following BibTeX entry.

```
@misc{ma2025decofrequencydecoupledpixeldiffusion,
      title={DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation},
      author={Zehong Ma and Longhui Wei and Shuai Wang and Shiliang Zhang and Qi Tian},
      year={2025},
      eprint={2511.19365},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.19365},
}
```
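A note on fetching the checkpoints listed in the Checkpoints section: those links point at Hugging Face `blob` pages, which render an HTML file viewer. The raw file is served from the same path with `resolve` in place of `blob`. The sketch below is one possible way to script the download with only the standard library; the helper names `blob_to_resolve` and `download_checkpoint` are hypothetical and not part of this repository, and the destination directory is assumed to exist already.

```python
import urllib.request


def blob_to_resolve(url: str) -> str:
    """Turn a Hugging Face 'blob' page URL into the matching 'resolve' file URL."""
    marker = "/blob/"
    if marker not in url:
        raise ValueError(f"not a Hugging Face blob URL: {url}")
    # Replace only the first occurrence so file names containing 'blob' survive.
    return url.replace(marker, "/resolve/", 1)


def download_checkpoint(blob_url: str, dest: str) -> str:
    """Download the checkpoint behind blob_url to dest and return dest."""
    urllib.request.urlretrieve(blob_to_resolve(blob_url), dest)
    return dest


# Example call (performs a large network download, so it is left commented out):
# download_checkpoint(
#     "https://huggingface.co/zehongma/DeCo/blob/main/t2i_DeCo.ckpt",
#     "./ckpts/t2i_DeCo.ckpt",
# )
```

The resulting local path can then be passed to `--ckpt_path` in the commands shown under Usages.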