##############################################################################
#                               DiffusionBench                               #
#           Because ImageNet evaluation alone is no longer enough!           #
##############################################################################
We have released a very preliminary technical report for DiffusionBench v0.1! We plan to update it heavily going forward, add new contributors and authors, and work with the community to make DiffusionBench more robust. Please join us!
Please refer to docs/contributors.md and docs/contributing.md for further details.
This repo contains the unified codebase for DiffusionBench. It supports training and evaluation across different generation tasks (ImageNet, T2I, ...) through a single interface. Please see the sections below for the detailed structure. Come join us!
Text-to-image samples at 256×256 from models trained for 200K iterations using DiffusionBench.
# install uv project manager (if you don't already have it)
curl -LsSf https://astral.sh/uv/install.sh | sh
# install dependencies
uv sync
# prepare data (choose one value for --data: all, imagenet, t2i, or eval)
uv run python scripts/prepare.py --data {all,imagenet,t2i,eval}
# download pretrained models
uv run hf download diffusion-bench/diffusion-bench --local-dir pretrained_models --exclude .gitattributes
Reproduction flow: Stage 1 → Stage 2. Set these environment variables first (used for the output directory and W&B logging):
export EXPERIMENT_NAME=<run-name>
export ENTITY=<wandb-entity>
export PROJECT=<wandb-project>
export WANDB_KEY=<key>
Stage 1. Train the RAE tokenizer:
uv run torchrun --standalone --nproc_per_node=8 \
src/train_stage1.py \
--config [STAGE1_CONFIG_PATH] \
--results-dir results/stage1 --precision bf16 --compile --wandb
Stage 2. Train the diffusion model on VAE/RAE/Pixel space:
uv run torchrun --standalone --nproc_per_node=8 \
src/train.py \
--config [STAGE2_CONFIG_PATH] \
--results-dir results/stage2 --precision bf16 --compile --wandb
Stage 2 training configs run online evaluation during training (the eval: block). For standalone evaluation of a released checkpoint, use the sampling/ configs. Each embeds stage_2.ckpt (pointing into pretrained_models/) and the eval-time guidance, so the weights load automatically:
export EXPERIMENT_NAME=<run-name>
# stage 1 reconstruction (rFID/PSNR/SSIM/LPIPS)
uv run torchrun --nproc_per_node=8 src/offline_eval_stage1.py --config [STAGE1_CONFIG_PATH]
# stage 2 generation (FID/IS, GenEval/DPGBench/...)
uv run torchrun --nproc_per_node=8 src/offline_eval.py --config [STAGE2_CONFIG_PATH]
configs/
├── stage1/
└── stage2/
    ├── training/
    │   ├── imagenet/
    │   └── t2i/
    └── sampling/
        ├── imagenet/
        └── t2i/
Stage 2 spans VAE (11), RAE (6), REG (4), and Pixel (3) families, identical across ImageNet and T2I. Swap any config between tasks with a single path change. The sampling/ set mirrors training/ but adds the trained checkpoint and eval-time guidance, so it runs offline eval directly.
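The path swap described above can be sketched in a few lines of Python. This is a minimal illustration, not part of the DiffusionBench codebase: the helper names `switch_task` and `switch_set` and the file name `example.yaml` are hypothetical, and the sketch assumes a config keeps the same file name across tasks and across the training/ and sampling/ sets.

```python
from pathlib import PurePosixPath

# Directory names taken from the configs/ tree above.
TASKS = ("imagenet", "t2i")
CONFIG_SETS = ("training", "sampling")


def _swap_dir(config_path: str, choices: tuple, new: str) -> str:
    """Replace the single directory component that belongs to `choices`."""
    if new not in choices:
        raise ValueError(f"{new!r} is not one of {choices}")
    path = PurePosixPath(config_path)
    dirs = list(path.parts[:-1])  # every component except the file name
    hits = [i for i, part in enumerate(dirs) if part in choices]
    if len(hits) != 1:
        raise ValueError(f"expected exactly one of {choices} in {config_path!r}")
    dirs[hits[0]] = new
    return str(PurePosixPath(*dirs, path.name))


def switch_task(config_path: str, task: str) -> str:
    """Point a stage 2 config path at another task (hypothetical helper)."""
    return _swap_dir(config_path, TASKS, task)


def switch_set(config_path: str, config_set: str) -> str:
    """Map a training config path to its sampling counterpart, or back."""
    return _swap_dir(config_path, CONFIG_SETS, config_set)


if __name__ == "__main__":
    # example.yaml is a placeholder, not a real config in the repo.
    train_cfg = "configs/stage2/training/imagenet/example.yaml"
    print(switch_task(train_cfg, "t2i"))
    print(switch_set(train_cfg, "sampling"))
```

The helpers refuse paths that contain none or more than one of the expected directory names, so a stage 1 path is rejected instead of being silently returned unchanged.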
For ImageNet, pick the CFG-off baseline ([STAGE2_CONFIG_PATH].yaml) or the per-model best-CFG variant ([STAGE2_CONFIG_PATH]-cfg<scale>-t0.0-0.9.yaml).
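As an illustration of the naming pattern above, the following sketch derives a best-CFG variant file name from a baseline config path. It is a hedged example only: `cfg_variant` is a hypothetical helper, `example.yaml` and the scale `1.5` are placeholders, and the `-t0.0-0.9` suffix is copied verbatim from the pattern without interpreting it. The scale is passed as a string so the sketch does not guess how the repo formats it.

```python
from pathlib import PurePosixPath

# Copied verbatim from the documented file-name pattern.
FIXED_SUFFIX = "-t0.0-0.9"


def cfg_variant(baseline_config: str, scale: str) -> str:
    """Build '<stem>-cfg<scale>-t0.0-0.9.yaml' next to the baseline config."""
    path = PurePosixPath(baseline_config)
    if path.suffix != ".yaml":
        raise ValueError(f"expected a .yaml config, got {baseline_config!r}")
    if not scale:
        raise ValueError("scale must be a non-empty string")
    # path.stem drops only the final '.yaml', so dots elsewhere are preserved.
    return str(path.with_name(f"{path.stem}-cfg{scale}{FIXED_SUFFIX}.yaml"))


if __name__ == "__main__":
    # Placeholder path and scale; check the sampling/imagenet/ directory
    # for the variants that actually exist.
    print(cfg_variant("configs/stage2/sampling/imagenet/example.yaml", "1.5"))
```

Because the best scale is per model, the resulting name should be checked against the files actually shipped in the sampling/ directory rather than assumed to exist.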
| Category | Methods |
|---|---|
| Latent Space | Pixel Space; RAE (30+ representation encoders): DINOv2, SigLIP2, WebSSL, PE, LangPE, and more; RAEv2 (30+ representation encoders): DINOv2, SigLIP2, WebSSL, PE, LangPE, etc.; VAE (10+ VAEs): FLUX.2, FLUX.1, SD3.5, VA-VAE, E2E-VAE, and more |
| Output Prediction | x-prediction, v-prediction |
| Transport | Rectified-Flow, MeanFlow, Improved-MeanFlow, Pixel-MeanFlow, Drifting |
| Loss | Flow Matching, REPA, iREPA |
| Architecture | LightningDiT, JiT, DDT |
| Tasks | ImageNet: class-conditional generation; T2I: text-to-image generation |
| Evaluation | ImageNet: FID, IS; T2I: GenEval, DPGBench, GenAIBench, VQAScore |
| Training Backend | DDP, FSDP [TODO] |
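
The Output Prediction and Transport rows can be related with a small worked example. The sketch below is not taken from the DiffusionBench code. It assumes one common rectified-flow convention, x_t = (1 - t) x + t ε with data x at t = 0 and noise ε at t = 1; the repository may use the opposite time direction or a different parameterization. All function names are hypothetical, and plain Python lists stand in for tensors.

```python
import math


def interpolate(x, eps, t):
    """Point on the straight line between data x (t=0) and noise eps (t=1)."""
    return [(1.0 - t) * xi + t * ei for xi, ei in zip(x, eps)]


def v_target(x, eps):
    """Velocity of that line, d x_t / dt = eps - x (constant in t)."""
    return [ei - xi for xi, ei in zip(x, eps)]


def x_from_v(x_t, v, t):
    """Convert a v-prediction into an x-prediction: x = x_t - t * v."""
    return [xti - t * vi for xti, vi in zip(x_t, v)]


def v_from_x(x_t, x, t):
    """Convert an x-prediction into a v-prediction: v = (x_t - x) / t."""
    if t <= 0.0:
        raise ValueError("t must be positive; x_t equals x at t = 0")
    return [(xti - xi) / t for xti, xi in zip(x_t, x)]


if __name__ == "__main__":
    x, eps, t = [1.0, -2.0], [0.5, 0.0], 0.25
    x_t = interpolate(x, eps, t)
    v = v_target(x, eps)
    # With exact targets, the two parameterizations round-trip.
    recovered = x_from_v(x_t, v, t)
    print(all(math.isclose(a, b) for a, b in zip(recovered, x)))
```

Under this convention the two output types carry the same information for any t > 0, which is why a model trained with one can be evaluated in terms of the other; they differ in how errors are weighted across timesteps.
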
| Feature | Status | Details |
|---|---|---|
| Coding Agents | Yes | Agent-compatible. See skills/ for setup and workflow skills. |
| AutoResearch | [TODO] | AutoResearch integration is planned (not yet available). |
We welcome contributions! Please refer to docs/contributors.md and docs/contributing.md for further details.
The codebase is built upon some amazing projects. We thank the authors for making their work publicly available.