CVPR 2026
Yiming Wang1,2, Qihang Zhang3,2*, Shengqu Cai2*, Tong Wu2†, Jan Ackermann2†, Zhengfei Kuang2†, Yang Zheng2†, Frano Rajič1†, Siyu Tang1, Gordon Wetzstein2
1ETH Zurich 2Stanford University 3CUHK
*,† Equal contribution
Given a single input video, BulletTime generates new videos with decoupled control over world time and camera pose — enabling bullet-time effects such as moving the camera while freezing or slowing scene dynamics.
This project builds on CogVideoX1.5-5B. For environment setup and issues with upstream pretrained weights (e.g., VAE), please refer to the official CogVideoX repository.
We also provide a conda environment tested with Python 3.10 and PyTorch 2.5.1:
conda env create -f environment.yaml
conda activate bullettime
Given a single input video, BulletTime synthesizes new videos with user-specified control over camera pose and world time.
Download the pretrained BulletTime checkpoint from Hugging Face:
- Pretrained checkpoints: 19reborn/bulletTime_ckpt
We use example videos from ReCamMaster as reference inputs. Organize your data directory as follows:
your_data/
├── metadata.csv    # columns: file_name, text
└── videos/
    └── {video_name}.mp4
metadata.csv should list each video path (relative to the data root) and its text prompt. See the released dataset for a complete example. For videos without captions, you can generate prompts automatically from the input videos using video understanding models (e.g., MiniCPM-V).
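As a starting point, the layout above could be populated with a small helper like the following. This is a minimal sketch, not part of the repository: the function names `write_metadata` and `check_metadata` are hypothetical, and the placeholder prompt must be replaced with real captions (written by hand or produced by a captioning model).

```python
import csv
from pathlib import Path


def write_metadata(data_root, default_prompt=""):
    """Write metadata.csv listing every .mp4 under <data_root>/videos.

    Paths are stored relative to data_root. `default_prompt` is only a
    placeholder; replace it with a real caption for each video.
    """
    root = Path(data_root)
    videos = sorted((root / "videos").glob("*.mp4"))
    out_path = root / "metadata.csv"
    with out_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["file_name", "text"])  # column names from the layout above
        for video in videos:
            writer.writerow([video.relative_to(root).as_posix(), default_prompt])
    return out_path


def check_metadata(data_root):
    """Return the video paths listed in metadata.csv that are missing on disk."""
    root = Path(data_root)
    with (root / "metadata.csv").open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return [r["file_name"] for r in rows if not (root / r["file_name"]).is_file()]
```

Calling `write_metadata("your_data")` followed by `check_metadata("your_data")` should return an empty list when every listed video exists.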
BulletTime requires separate control signals for camera pose and world time. Example files are provided in example_scripts/example_data/; the expected formats are:
| Control | Format | Example |
|---|---|---|
| Camera trajectory | JSON with per-frame c2w matrices | example_scripts/example_data/camera.json |
| Camera intrinsics | JSON with focal length and principal point | example_scripts/example_data/intrinsic.json |
| World time trajectory | .npy or .txt, per-frame world time of shape [T, 1] | example_scripts/example_data/slow_motion.npy |
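For illustration, a world-time trajectory of shape [T, 1] could be built with NumPy as sketched below. Two things here are assumptions rather than facts from this README: that world time is expressed in source-frame units (it may instead be normalized), and that T is 81 to match the frame count in the evaluation script. Inspect example_scripts/example_data/slow_motion.npy to confirm the convention before relying on this. The function names and output file name are hypothetical.

```python
import numpy as np

NUM_FRAMES = 81  # assumption: matches the frame count used by the eval script


def slow_motion(num_frames, speed=0.5):
    """World time advances at `speed` times the normal rate (assumed frame units)."""
    return (np.arange(num_frames, dtype=np.float32) * speed).reshape(-1, 1)


def freeze_at(num_frames, freeze_frame):
    """World time plays normally, then holds at `freeze_frame` (bullet-time freeze)."""
    t = np.arange(num_frames, dtype=np.float32)
    return np.minimum(t, float(freeze_frame)).reshape(-1, 1)


if __name__ == "__main__":
    timing = freeze_at(NUM_FRAMES, freeze_frame=40)
    print(timing.shape)  # one world-time value per output frame
    np.save("freeze_at_40.npy", timing)  # hypothetical output file name
```

The resulting file can then be passed wherever a timing file is expected.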
You can also design custom trajectories with the interactive tools below.
We also provide browser-based tools for designing control signals:
- example_scripts/camera_control.html: interactive camera trajectory editor (exports camera JSON)
- example_scripts/time_control.html: time-curve tuner (exports timing .npy)
Open either file locally in a browser to visualize and export trajectories.
Edit the user-editable paths at the top of example_scripts/eval.sh:
- DATA_ROOT: directory containing metadata.csv and input videos
- CHECKPOINT: local path to the downloaded BulletTime checkpoint
- OUTPUT_DIR: directory to save generated videos
- GPU_IDS: GPU id(s) to use
The script also sets the control-signal paths below; update them if needed:
- --validation_camera_dir: camera JSON file or directory of .json files (default: example_scripts/example_data/camera.json)
- --validation_timing_dir: timing .npy/.txt file or directory (default: example_scripts/example_data/slow_motion.npy)
Then launch from the repo root:
bash example_scripts/eval.sh
Key flags (already configured in example_scripts/eval.sh):
- --do_validation true: run generation
- --only_validation true: skip training and run inference only
- --data_flag "real_world" / --validation_data_flag "real_world"
- --num_inference_steps 50: diffusion sampling steps
- --train_resolution "81x384x640": output resolution as frames x height x width
The model generates one output video for each (camera, timing) pair. Results are saved to {output_dir}/my_validate/.
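The --train_resolution value packs frames, height, and width into one string. When preparing control signals, it can help to unpack it, for example to read off the frame count. The helper below is a hypothetical sketch and not part of the repository.

```python
def parse_resolution(value):
    """Split a 'frames x height x width' string such as '81x384x640' into integers."""
    parts = value.lower().split("x")
    if len(parts) != 3:
        raise ValueError(f"expected 'frames x height x width', got {value!r}")
    frames, height, width = (int(p) for p in parts)
    return frames, height, width


if __name__ == "__main__":
    frames, height, width = parse_resolution("81x384x640")
    print(frames, height, width)
```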
Download the training dataset from Hugging Face:
- Training dataset: 19reborn/bulletTime_dataset
In example_scripts/train.sh, set:
- DATA_ROOT: path to training data (use --data_flag "simulator" for the released synthetic dataset)
- VALIDATION_DIR: path to validation data (can be the same as DATA_ROOT, or real-world videos prepared as in the Inference section above)
- OUTPUT_DIR: directory to save checkpoints and logs
Make sure DATA_ROOT and VALIDATION_DIR point to valid local paths.
Edit the distributed settings at the top of example_scripts/train.sh:
- NUM_PROCESSES: number of processes (typically equals the number of GPUs)
- GPU_IDS: comma-separated GPU ids (e.g., "0,1")
For DeepSpeed settings, edit finetune/accelerate_config.yaml.
Then launch from the repo root:
bash example_scripts/train.sh
We also provide the source code to generate our synthetic dataset with disentangled 4D controls: 19reborn/BulletTime_dataSimulator.
In this release, camera and time AdaLN are applied only before attention; the post-attention AdaLN blocks are omitted. We found that this design achieves comparable performance with roughly half the added parameters.
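To make the idea concrete, here is a conceptual NumPy sketch of conditioning applied once before attention. It is not the released implementation: the names `adaln` and `pre_attention_block`, the projection weights, and the tensor shapes are all illustrative assumptions, and feed-forward layers, multi-head details, and separate camera/time branches are left out.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize each token over its feature dimension (no learned affine)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def adaln(x, cond, w_scale, w_shift):
    """Adaptive LayerNorm: normalize x, then scale and shift it from `cond`.

    x:    [tokens, dim] hidden states
    cond: [tokens, cond_dim] per-token conditioning (e.g. camera or time embedding)
    """
    scale = cond @ w_scale  # [tokens, dim]
    shift = cond @ w_shift  # [tokens, dim]
    return layer_norm(x) * (1.0 + scale) + shift


def pre_attention_block(x, cond, params, attention):
    """Residual attention block with conditioning injected only before attention."""
    h = adaln(x, cond, params["w_scale"], params["w_shift"])
    x = x + attention(h)
    # No second AdaLN after attention, so its projection weights are not needed.
    return x
```

With zero-initialized `w_scale` and `w_shift`, `adaln` reduces to plain layer normalization, so such a block starts out behaving like an unconditioned one.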
We thank the authors of CogVideoX and ReCamMaster for their open-source contributions.
If you find this repository useful for your research, please cite:
@inproceedings{wang2026bullettime,
title = {Bullettime: Decoupled control of time and camera pose for video generation},
author = {Wang, Yiming and Zhang, Qihang and Cai, Shengqu and Wu, Tong and Ackermann, Jan and Kuang, Zhengfei and Zheng, Yang and Raji{\v{c}}, Frano and Tang, Siyu and Wetzstein, Gordon},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
pages = {18319--18330},
year = {2026}
}