[ECCV2026] CompoSIA: Composing Driving Worlds through Disentangled Control for Adversarial Scenario Generation

Yifan Zhan¹*, Zhengqing Chen²*‡, Qingjie Wang²*, Zhuo He³, Muyao Niu¹, Xiaoyang Guo²,
Wei Yin², Weiqiang Ren², Qian Zhang², Yinqiang Zheng¹†

¹The University of Tokyo, ²Horizon Robotics, ³University of Glasgow

*Equal Contribution, ‡Project Lead, †Corresponding Author

📌 Release Status

Paper
Inference code
Model weights
Training code

🌍 Overview

CompoSIA is a compositional driving video simulator designed for fine-grained adversarial scenario generation through disentangled control of:

Structure 🚗: object layout and trajectory placement
Identity 🎨: appearance editing from a single reference image
Action 🎮: ego-motion and controllable traffic dynamics

✨ Key Features

Disentangled structure, identity, and action control
Pose-agnostic identity injection
Hierarchical dual-branch action conditioning
Scenario generation for planner stress testing

🛠️ Installation

Create a Python environment and install the project dependencies:

conda create -n composia python=3.10 -y
conda activate composia

cd CompoSIA
pip install -r requirements.txt

requirements.txt installs PyTorch 2.7.1 with CUDA 12.8 wheels by default. If your CUDA driver stack is different, install the matching PyTorch build first, then install the remaining dependencies.

The default evaluation path does not require the optional metrics packages. Install them only if you enable the corresponding metric:

# Required only when validation_kwargs.eval_metrics contains "met3r"
pip install git+https://github.com/mohammadasim98/met3r

# Required only when validation_kwargs.eval_metrics includes a VBench-based metric
pip install vbench
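Before enabling an optional metric, it can help to confirm that its package is importable. The following is a minimal sketch and is not part of the repository. It assumes the packages import as `met3r` and `vbench`, and that VBench-based metric names contain the substring "vbench"; the actual metric names are defined by the config and may differ. The helper name `missing_metric_packages` is hypothetical.

```python
from importlib.util import find_spec

# Assumed mapping from a metric-name substring to an importable module name.
# Both the substrings and the module names are illustrative assumptions.
OPTIONAL_METRIC_MODULES = {
    "met3r": "met3r",
    "vbench": "vbench",
}


def missing_metric_packages(eval_metrics, is_installed=None):
    """Return sorted module names needed by eval_metrics but not importable."""
    if is_installed is None:
        # find_spec returns None when a top-level module cannot be found.
        is_installed = lambda name: find_spec(name) is not None
    missing = set()
    for metric in eval_metrics:
        for key, module in OPTIONAL_METRIC_MODULES.items():
            if key in metric.lower() and not is_installed(module):
                missing.add(module)
    return sorted(missing)


if __name__ == "__main__":
    # Pass the list you intend to use for validation_kwargs.eval_metrics.
    print(missing_metric_packages(["met3r"]))
```

An empty list means every optional package required by the listed metrics was found in the current environment.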

📦 Model Weights

CompoSIA uses the public Wan2.1 T2V 1.3B checkpoint as the base model, together with the released CompoSIA transformer and VAE weights.

Expected layout:

models/
├── Wan2.1-T2V-1.3B/
│   ├── config.json
│   ├── diffusion_pytorch_model.safetensors
│   ├── models_t5_umt5-xxl-enc-bf16.pth
│   ├── Wan2.1_VAE.pth
│   └── google/
│       └── umt5-xxl/
│           ├── special_tokens_map.json
│           ├── spiece.model
│           ├── tokenizer.json
│           └── tokenizer_config.json
├── composia/
│   └── composia-transformer.pt
└── vae/
    └── composia-vae.pkl

Download links:

Base model: Wan-AI/Wan2.1-T2V-1.3B
CompoSIA weights: SUDOKISUI/CompoSIA
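After downloading, a quick check against the expected layout can catch misplaced files before a long evaluation run. This is a minimal sketch, not a script shipped with the repository; the file list is copied from the layout above, and the helper name `missing_weight_files` is hypothetical.

```python
from pathlib import Path

# Relative paths taken from the expected layout under models/.
EXPECTED_FILES = [
    "Wan2.1-T2V-1.3B/config.json",
    "Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
    "Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
    "Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
    "Wan2.1-T2V-1.3B/google/umt5-xxl/special_tokens_map.json",
    "Wan2.1-T2V-1.3B/google/umt5-xxl/spiece.model",
    "Wan2.1-T2V-1.3B/google/umt5-xxl/tokenizer.json",
    "Wan2.1-T2V-1.3B/google/umt5-xxl/tokenizer_config.json",
    "composia/composia-transformer.pt",
    "vae/composia-vae.pkl",
]


def missing_weight_files(models_root="models"):
    """Return the expected files that are absent under models_root."""
    root = Path(models_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


if __name__ == "__main__":
    for rel in missing_weight_files():
        print(f"missing: models/{rel}")
```

The check only tests for file presence; it does not validate checkpoint contents.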

🗂️ nuScenes Data

The released metadata files are hosted in SUDOKISUI/CompoSIA:

mkdir -p nuScenes-metadata-full/nuscenes_mmdet3d-12Hz

huggingface-cli download SUDOKISUI/CompoSIA \
  nuscenes_interp_12Hz_infos_val_with_bid.pkl \
  --local-dir nuScenes-metadata-full/nuscenes_mmdet3d-12Hz

For images, download nuScenes from the official nuScenes website and unpack it so the sample images are available under:

nuScenes/origin/
└── samples/
    └── CAM_FRONT/
        └── ...

The default config reads:

samples_path: "./nuScenes/origin"
ann_path: "./nuScenes-metadata-full/nuscenes_mmdet3d-12Hz/nuscenes_interp_12Hz_infos_val_with_bid.pkl"

If your nuScenes or metadata files are stored elsewhere, update these two paths in config/wan_unified.yaml.
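To confirm that the two configured paths point at usable data, a small check along the following lines may help. It is a sketch under the assumption that only the `samples/CAM_FRONT` directory and the metadata `.pkl` file need to exist, as described above; the function name `check_data_paths` is hypothetical and the repository may validate more than this.

```python
from pathlib import Path


def check_data_paths(samples_path, ann_path):
    """Return a list of problems; an empty list means both paths look usable."""
    problems = []
    # The images are expected under <samples_path>/samples/CAM_FRONT.
    cam_front = Path(samples_path) / "samples" / "CAM_FRONT"
    if not cam_front.is_dir():
        problems.append(f"missing image directory: {cam_front}")
    if not Path(ann_path).is_file():
        problems.append(f"missing metadata file: {ann_path}")
    return problems


if __name__ == "__main__":
    # Values copied from the default config shown above.
    issues = check_data_paths(
        "./nuScenes/origin",
        "./nuScenes-metadata-full/nuscenes_mmdet3d-12Hz/"
        "nuscenes_interp_12Hz_infos_val_with_bid.pkl",
    )
    print(issues or "data paths look fine")
```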

🚀 Evaluation

Run the default evaluation script after preparing weights and data:

CUDA_VISIBLE_DEVICES=0 bash run_eval.sh

The script uses:

MODEL_NAME=${MODEL_NAME:-models/Wan2.1-T2V-1.3B}
EVAL_CKPT=${EVAL_CKPT:-models/composia/composia-transformer.pt}
VAE_PATH=${VAE_PATH:-models/vae/composia-vae.pkl}

To set the paths explicitly (the values below match the Hugging Face release filenames and the script defaults), run:

CUDA_VISIBLE_DEVICES=0 \
MODEL_NAME=models/Wan2.1-T2V-1.3B \
EVAL_CKPT=models/composia/composia-transformer.pt \
VAE_PATH=models/vae/composia-vae.pkl \
bash run_eval.sh
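The `${VAR:-default}` form used in the script falls back to the default when the variable is unset or empty. The sketch below mirrors that resolution in Python so the effective paths can be inspected before launching; the defaults are copied from the lines quoted above, and the helper name `resolve_eval_paths` is hypothetical.

```python
import os

# Defaults copied from run_eval.sh as quoted above.
DEFAULTS = {
    "MODEL_NAME": "models/Wan2.1-T2V-1.3B",
    "EVAL_CKPT": "models/composia/composia-transformer.pt",
    "VAE_PATH": "models/vae/composia-vae.pkl",
}


def resolve_eval_paths(env=None):
    """Mimic ${VAR:-default}: use the default when VAR is unset or empty."""
    env = os.environ if env is None else env
    return {name: env.get(name) or default for name, default in DEFAULTS.items()}


if __name__ == "__main__":
    for name, value in resolve_eval_paths().items():
        print(f"{name}={value}")
```

Overriding a single variable, for example `EVAL_CKPT`, leaves the other two at their defaults.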

Generated videos and logs are written under logs/test/validation_res_final/.

The evaluation modes are configured in config/composia_unified_i2v_eval.yaml. By default, this file enables several action, bbox, and identity-editing modes. To run a smaller smoke test, reduce validation_kwargs.max_validation_samples or keep only one entry under validation_kwargs.val_modes.
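The smoke-test reduction described above can be sketched as a small transformation on the loaded config. This assumes the YAML file has already been parsed into a plain dictionary and that `validation_kwargs.val_modes` is either a list or a mapping; the real schema may differ, and both the function name `make_smoke_config` and the mode names in the example are hypothetical.

```python
import copy


def make_smoke_config(config, max_samples=2):
    """Return a copy of config limited to one validation mode and few samples."""
    cfg = copy.deepcopy(config)  # leave the caller's config untouched
    kwargs = cfg.setdefault("validation_kwargs", {})
    kwargs["max_validation_samples"] = max_samples
    modes = kwargs.get("val_modes")
    if isinstance(modes, list) and modes:
        kwargs["val_modes"] = modes[:1]  # keep only the first entry
    elif isinstance(modes, dict) and modes:
        first_key = next(iter(modes))
        kwargs["val_modes"] = {first_key: modes[first_key]}
    return cfg


if __name__ == "__main__":
    # Hypothetical mode names, for illustration only.
    full = {
        "validation_kwargs": {
            "max_validation_samples": 100,
            "val_modes": ["mode_a", "mode_b", "mode_c"],
        }
    }
    print(make_smoke_config(full))
```

Writing the reduced config to a separate file keeps the default evaluation settings intact.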

🙏 Acknowledgements

CompoSIA builds on the open-source video generation and autonomous driving research ecosystem. Our base generative model is built upon Wan2.1, and our implementation benefits from the VideoX-Fun codebase.

We also thank NVIDIA Cosmos for inspiring components of our projection pipeline, and the developers of Hugging Face Diffusers, Accelerate, and Transformers for their model and inference tooling.

Our evaluation and data processing are built around the nuScenes dataset. We also acknowledge MEt3R and VBench for open-source video evaluation tools.

🧪 Citation

If you find our work useful, please cite it as:

@article{zhan2026composing,
  title={Composing Driving Worlds through Disentangled Control for Adversarial Scenario Generation},
  author={Zhan, Yifan and Chen, Zhengqing and Wang, Qingjie and He, Zhuo and Niu, Muyao and Guo, Xiaoyang and Yin, Wei and Ren, Weiqiang and Zhang, Qian and Zheng, Yinqiang},
  journal={arXiv preprint arXiv:2603.12864},
  year={2026}
}
