Yifever20002/CompoSIA

[ECCV2026] CompoSIA: Composing Driving Worlds through Disentangled Control for Adversarial Scenario Generation

Yifan Zhan1*, Zhengqing Chen2*‡, Qingjie Wang2*, Zhuo He3, Muyao Niu1, Xiaoyang Guo2
Wei Yin2, Weiqiang Ren2, Qian Zhang2, Yinqiang Zheng1†

1The University of Tokyo    2Horizon Robotics    3University of Glasgow

*Equal Contribution    ‡Project Lead    †Corresponding Author


arXiv Project Page


📌 Release Status

  • Paper
  • Inference code
  • Model weights
  • Training code

🌍 Overview

CompoSIA is a compositional driving video simulator designed for fine-grained adversarial scenario generation through disentangled control of:

  • Structure 🚗: object layout and trajectory placement
  • Identity 🎨: appearance editing from a single reference image
  • Action 🎮: ego-motion and controllable traffic dynamics

✨ Key Features

  • Disentangled structure, identity, and action control
  • Pose-agnostic identity injection
  • Hierarchical dual-branch action conditioning
  • Scenario generation for planner stress testing

🛠️ Installation

Create a Python environment and install the project dependencies:

conda create -n composia python=3.10 -y
conda activate composia

cd CompoSIA
pip install -r requirements.txt

requirements.txt installs PyTorch 2.7.1 with CUDA 12.8 wheels by default. If your CUDA driver stack is different, install the matching PyTorch build first, then install the remaining dependencies.
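
To see which build is already installed before deciding whether to reinstall, the PyTorch version string can be inspected. A minimal sketch, assuming the usual wheel convention in which a local suffix such as `+cu128` encodes the CUDA version; `split_torch_version` is a hypothetical helper and not part of CompoSIA:

```python
import re


def split_torch_version(version: str):
    """Split a version string such as '2.7.1+cu128' into (release, cuda).

    cuda is a dotted string such as '12.8', or None when the string carries
    no 'cuXYZ' tag (for example a CPU-only build).
    """
    release, _, local = version.partition("+")
    # The last digit is the CUDA minor version, the digits before it the major.
    match = re.fullmatch(r"cu(\d+)(\d)", local)
    cuda = f"{match.group(1)}.{match.group(2)}" if match else None
    return release, cuda
```

With PyTorch installed, `split_torch_version(torch.__version__)` shows whether the environment matches the 2.7.1 / CUDA 12.8 default described above.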

The default evaluation path does not require the optional metrics packages. Install them only if you enable the corresponding metric:

# Required only when validation_kwargs.eval_metrics contains "met3r"
pip install git+https://github.com/mohammadasim98/met3r

# Required only when validation_kwargs.eval_metrics contains a VBench-related metric
pip install vbench

📦 Model Weights

CompoSIA uses the public Wan2.1 T2V 1.3B checkpoint as its base model, together with the released CompoSIA transformer and VAE weights.

Expected layout:

models/
├── Wan2.1-T2V-1.3B/
│   ├── config.json
│   ├── diffusion_pytorch_model.safetensors
│   ├── models_t5_umt5-xxl-enc-bf16.pth
│   ├── Wan2.1_VAE.pth
│   └── google/
│       └── umt5-xxl/
│           ├── special_tokens_map.json
│           ├── spiece.model
│           ├── tokenizer.json
│           └── tokenizer_config.json
├── composia/
│   └── composia-transformer.pt
└── vae/
    └── composia-vae.pkl
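
Before launching evaluation, it can help to confirm that the layout above is complete. The following is a sketch rather than part of the repository: the file list is copied from the tree above, and `missing_model_files` is a hypothetical helper name.

```python
from pathlib import Path

# Relative paths copied from the expected layout above.
EXPECTED_FILES = [
    "Wan2.1-T2V-1.3B/config.json",
    "Wan2.1-T2V-1.3B/diffusion_pytorch_model.safetensors",
    "Wan2.1-T2V-1.3B/models_t5_umt5-xxl-enc-bf16.pth",
    "Wan2.1-T2V-1.3B/Wan2.1_VAE.pth",
    "Wan2.1-T2V-1.3B/google/umt5-xxl/special_tokens_map.json",
    "Wan2.1-T2V-1.3B/google/umt5-xxl/spiece.model",
    "Wan2.1-T2V-1.3B/google/umt5-xxl/tokenizer.json",
    "Wan2.1-T2V-1.3B/google/umt5-xxl/tokenizer_config.json",
    "composia/composia-transformer.pt",
    "vae/composia-vae.pkl",
]


def missing_model_files(models_root="models"):
    """Return the expected files that are not present under models_root."""
    root = Path(models_root)
    return [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
```

Calling `missing_model_files()` from the repository root returns an empty list once every file is in place; any other result names the files still to be downloaded.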

Download links:

🗂️ nuScenes Data

The released metadata files are hosted in the SUDOKISUI/CompoSIA repository on Hugging Face:

mkdir -p nuScenes-metadata-full/nuscenes_mmdet3d-12Hz

huggingface-cli download SUDOKISUI/CompoSIA \
  nuscenes_interp_12Hz_infos_val_with_bid.pkl \
  --local-dir nuScenes-metadata-full/nuscenes_mmdet3d-12Hz

For images, download nuScenes from the official nuScenes website and unpack it so the sample images are available under:

nuScenes/origin/
└── samples/
    └── CAM_FRONT/
        └── ...

The default config reads:

samples_path: "./nuScenes/origin"
ann_path: "./nuScenes-metadata-full/nuscenes_mmdet3d-12Hz/nuscenes_interp_12Hz_infos_val_with_bid.pkl"

If your nuScenes or metadata files are stored elsewhere, update these two paths in config/wan_unified.yaml.
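
The two paths can be checked before a run. This is a minimal sketch, assuming only the directory layout and default values shown above; `check_nuscenes_paths` is a hypothetical helper, and it does not parse the YAML config itself.

```python
from pathlib import Path


def check_nuscenes_paths(
    samples_path="./nuScenes/origin",
    ann_path="./nuScenes-metadata-full/nuscenes_mmdet3d-12Hz/"
             "nuscenes_interp_12Hz_infos_val_with_bid.pkl",
):
    """Return a list of human-readable problems; an empty list means both paths look fine."""
    problems = []
    # The layout above places front-camera images under samples/CAM_FRONT.
    cam_front = Path(samples_path) / "samples" / "CAM_FRONT"
    if not cam_front.is_dir():
        problems.append(f"missing image directory: {cam_front}")
    if not Path(ann_path).is_file():
        problems.append(f"missing metadata file: {ann_path}")
    return problems
```

If the data lives elsewhere, pass the same values you wrote into the config, for example `check_nuscenes_paths("/data/nuScenes/origin", "/data/meta/infos.pkl")` (both paths here are placeholders).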

🚀 Evaluation

Run the default evaluation script after preparing weights and data:

CUDA_VISIBLE_DEVICES=0 bash run_eval.sh

The script uses:

MODEL_NAME=${MODEL_NAME:-models/Wan2.1-T2V-1.3B}
EVAL_CKPT=${EVAL_CKPT:-models/composia/composia-transformer.pt}
VAE_PATH=${VAE_PATH:-models/vae/composia-vae.pkl}
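
The `${VAR:-default}` form means each variable can be overridden from the environment and falls back to the default when it is unset or empty. The sketch below mirrors that rule in Python for illustration only; it does not read `run_eval.sh`, and `resolve_eval_paths` is a hypothetical name.

```python
import os

# Defaults copied from the three lines above.
DEFAULTS = {
    "MODEL_NAME": "models/Wan2.1-T2V-1.3B",
    "EVAL_CKPT": "models/composia/composia-transformer.pt",
    "VAE_PATH": "models/vae/composia-vae.pkl",
}


def resolve_eval_paths(environ=None):
    """Resolve the three paths the way `${VAR:-default}` does in the shell."""
    env = os.environ if environ is None else environ
    # `:-` falls back when the variable is unset *or* empty, hence `or`.
    return {name: env.get(name) or default for name, default in DEFAULTS.items()}
```

For example, `resolve_eval_paths({"EVAL_CKPT": "ckpt/other.pt"})` keeps the default base model and VAE paths and replaces only the checkpoint (the path is a placeholder).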

With the Hugging Face release filenames, which match the defaults above, the equivalent explicit invocation is:

CUDA_VISIBLE_DEVICES=0 \
MODEL_NAME=models/Wan2.1-T2V-1.3B \
EVAL_CKPT=models/composia/composia-transformer.pt \
VAE_PATH=models/vae/composia-vae.pkl \
bash run_eval.sh

Generated videos and logs are written under logs/test/validation_res_final/.

The evaluation modes are configured in config/composia_unified_i2v_eval.yaml. By default, this file enables several action, bbox, and identity-editing modes. To run a smaller smoke test, reduce validation_kwargs.max_validation_samples or keep only one entry under validation_kwargs.val_modes.
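
The smoke-test reduction can be sketched on an already loaded config. This assumes the config parses to nested dictionaries with the two keys named above; whether `val_modes` is a list or a mapping is not stated here, so the hypothetical helper `make_smoke_config` handles both.

```python
import copy


def make_smoke_config(config, max_samples=1):
    """Return a copy of config limited to a few samples and a single validation mode."""
    cfg = copy.deepcopy(config)  # leave the caller's config untouched
    val = cfg["validation_kwargs"]
    val["max_validation_samples"] = max_samples
    modes = val.get("val_modes")
    if isinstance(modes, dict) and modes:
        first = next(iter(modes))
        val["val_modes"] = {first: modes[first]}  # keep only the first mapping entry
    elif isinstance(modes, list) and modes:
        val["val_modes"] = modes[:1]  # keep only the first list entry
    return cfg
```

Loading and writing the YAML file is left to your YAML library of choice (for example `yaml.safe_load` and `yaml.safe_dump` from PyYAML); editing the file by hand achieves the same result.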


🙏 Acknowledgements

CompoSIA builds on the open-source video generation and autonomous driving research ecosystem. Our base generative model is built upon Wan2.1, and our implementation benefits from the VideoX-Fun codebase.

We also thank NVIDIA Cosmos for inspiring components of our projection pipeline, and the developers of Hugging Face Diffusers, Accelerate, and Transformers for their model and inference tooling.

Our evaluation and data processing are built around the nuScenes dataset. We also acknowledge MEt3R and VBench for open-source video evaluation tools.


🧪 Citation

If you find our work useful, please cite it as:

@article{zhan2026composing,
  title={Composing Driving Worlds through Disentangled Control for Adversarial Scenario Generation},
  author={Zhan, Yifan and Chen, Zhengqing and Wang, Qingjie and He, Zhuo and Niu, Muyao and Guo, Xiaoyang and Yin, Wei and Ren, Weiqiang and Zhang, Qian and Zheng, Yinqiang},
  journal={arXiv preprint arXiv:2603.12864},
  year={2026}
}
