BulletTime: Decoupled Control of Time and Camera Pose for Video Generation

CVPR 2026

Yiming Wang¹,², Qihang Zhang³,²*, Shengqu Cai²*, Tong Wu²†, Jan Ackermann²†, Zhengfei Kuang²†, Yang Zheng²†, Frano Rajič¹†, Siyu Tang¹, Gordon Wetzstein²

¹ETH Zurich ²Stanford University ³CUHK

*,† Equal contribution

Given a single input video, BulletTime generates new videos with decoupled control over world time and camera pose — enabling bullet-time effects such as moving the camera while freezing or slowing scene dynamics.

teaser_bullettime.mp4

Setup

This project builds on CogVideoX1.5-5B. For environment setup and issues with upstream pretrained weights (e.g., VAE), please refer to the official CogVideoX repository.

We also provide a conda environment tested with Python 3.10 and PyTorch 2.5.1:

conda env create -f environment.yaml
conda activate bullettime

Inference

Given a single input video, BulletTime synthesizes new videos with user-specified control over camera pose and world time.

1. Prepare checkpoints

Download the pretrained BulletTime checkpoint from Hugging Face:

Pretrained checkpoints: 19reborn/bulletTime_ckpt

2. Prepare input video

We use example videos from ReCamMaster as reference inputs. Organize your data directory as follows:

your_data/
├── metadata.csv          # columns: file_name, text
└── videos/
    └── {video_name}.mp4

metadata.csv should list each video path (relative to the data root) and its text prompt. See the released dataset for a complete example. For videos without captions, you can generate prompts automatically from the input videos using video understanding models (e.g., MiniCPM-V).
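The layout above can be produced with a short script. This is a minimal sketch, not part of the repository: the function name `write_metadata`, the `captions` mapping, and the placeholder prompt are all illustrative assumptions, and real prompts should replace the placeholder before running inference.

```python
import csv
from pathlib import Path


def write_metadata(data_root, captions=None, default_text="a video"):
    """Write metadata.csv listing every .mp4 under <data_root>/videos.

    `captions` is a hypothetical mapping from video stem to text prompt;
    videos without an entry receive `default_text` (a placeholder only).
    """
    root = Path(data_root)
    captions = captions or {}
    rows = []
    for video in sorted((root / "videos").glob("*.mp4")):
        # Store the path relative to the data root, e.g. "videos/clip.mp4".
        rel_path = video.relative_to(root).as_posix()
        rows.append({
            "file_name": rel_path,
            "text": captions.get(video.stem, default_text),
        })

    with (root / "metadata.csv").open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["file_name", "text"])
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Calling `write_metadata("your_data", captions={"clip": "a dog running on a beach"})` would write one row per video, with the header `file_name,text`.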

3. Prepare camera and time controls

BulletTime requires separate control signals for camera pose and world time. Example files are provided in example_scripts/example_data/; the expected formats are:

| Control | Format | Example |
| --- | --- | --- |
| Camera trajectory | JSON with per-frame c2w matrices | `example_scripts/example_data/camera.json` |
| Camera intrinsics | JSON with focal length and principal point | `example_scripts/example_data/intrinsic.json` |
| World time trajectory | `.npy` or `.txt`, per-frame world time of shape `[T, 1]` | `example_scripts/example_data/slow_motion.npy` |

You can also design custom trajectories with the interactive tools below.
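A world time trajectory can also be written directly with NumPy. The sketch below only guarantees the `[T, 1]` shape stated above; the unit of the values is an assumption (source-frame units, so a speed of 1.0 would replay the input timing), and `make_time_curve`, `speed`, and `freeze_at` are hypothetical names. Inspect the bundled `slow_motion.npy` to confirm the convention the model expects. The default of 81 frames matches the `81x384x640` resolution used in `example_scripts/eval.sh`.

```python
import numpy as np


def make_time_curve(num_frames=81, speed=0.5, freeze_at=None):
    """Return a [T, 1] float32 array of per-frame world time.

    Assumption: world time is measured in source-frame units, so speed=0.5
    advances the scene at half rate. Verify against slow_motion.npy.
    """
    t = np.arange(num_frames, dtype=np.float32) * speed
    if freeze_at is not None:
        # Hold world time constant from frame `freeze_at` onward.
        t[freeze_at:] = t[freeze_at]
    return t.reshape(-1, 1)


if __name__ == "__main__":
    # Half-speed motion that freezes from output frame 40 onward.
    curve = make_time_curve(num_frames=81, speed=0.5, freeze_at=40)
    np.save("custom_time.npy", curve)
    print(curve.shape)
```

The saved file can then be passed wherever a timing `.npy` is expected.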

Interactive control tools

We also provide browser-based tools for designing control signals:

example_scripts/camera_control.html — interactive camera trajectory editor (export camera JSON)
example_scripts/time_control.html — time-curve tuner (export timing .npy)

Open either file locally in a browser to visualize and export trajectories.
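As a scripted alternative to the browser editor, per-frame c2w matrices can be generated in Python. This is a sketch under stated assumptions: it uses OpenCV-style camera axes (x right, y down, z forward), an arbitrary scene scale, and a plain nested list when serializing. The key layout, axis convention, and scale that BulletTime expects are not documented here, so match them to `example_scripts/example_data/camera.json` before use. `look_at_c2w` and `orbit_trajectory` are hypothetical helper names.

```python
import json

import numpy as np


def look_at_c2w(eye, target, world_up=(0.0, 1.0, 0.0)):
    """Build a 4x4 camera-to-world matrix for a camera at `eye` facing `target`.

    Assumption: OpenCV-style camera axes (x right, y down, z forward).
    """
    eye = np.asarray(eye, dtype=np.float64)
    forward = np.asarray(target, dtype=np.float64) - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, np.asarray(world_up, dtype=np.float64))
    right /= np.linalg.norm(right)
    down = np.cross(forward, right)

    c2w = np.eye(4)
    c2w[:3, 0] = right
    c2w[:3, 1] = down
    c2w[:3, 2] = forward
    c2w[:3, 3] = eye
    return c2w


def orbit_trajectory(num_frames=81, radius=2.0, sweep_deg=30.0):
    """Per-frame c2w matrices for a horizontal arc around the origin."""
    angles = np.deg2rad(np.linspace(0.0, sweep_deg, num_frames))
    return [
        look_at_c2w((radius * np.sin(a), 0.0, radius * np.cos(a)), (0.0, 0.0, 0.0))
        for a in angles
    ]


if __name__ == "__main__":
    poses = orbit_trajectory()
    # Hypothetical layout: a bare list of 4x4 matrices, one per frame.
    print(json.dumps([m.tolist() for m in poses])[:80])
```

Each matrix has an orthonormal rotation block and the camera position in its last column, which makes the result easy to convert if the repository uses a different axis convention.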

4. Run inference

Edit the user-editable paths at the top of example_scripts/eval.sh:

DATA_ROOT: directory containing metadata.csv and input videos
CHECKPOINT: local path to the downloaded BulletTime checkpoint
OUTPUT_DIR: directory to save generated videos
GPU_IDS: GPU id(s) to use

The script also sets the control-signal paths below; update them if needed:

--validation_camera_dir: camera JSON file or directory of .json files (default: example_scripts/example_data/camera.json)
--validation_timing_dir: timing .npy / .txt file or directory (default: example_scripts/example_data/slow_motion.npy)

Then launch from the repo root:

bash example_scripts/eval.sh

Key flags (already configured in example_scripts/eval.sh):

--do_validation true — run generation
--only_validation true — skip training and run inference only
--data_flag "real_world" / --validation_data_flag "real_world"
--num_inference_steps 50 — diffusion sampling steps
--train_resolution "81x384x640" — output resolution as frames x height x width

The model generates one output video for each (camera, timing) pair. Results are saved to {output_dir}/my_validate/.
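To estimate how many videos a run will produce, the control files can be enumerated ahead of time. The sketch below assumes that "each (camera, timing) pair" means every camera file combined with every timing file; the repository's actual pairing logic is not shown in this README, so treat that as an assumption. All function names are hypothetical, and the directory scan does no filtering beyond file extension (for example, it would also pick up an intrinsics JSON stored next to the trajectories).

```python
from itertools import product
from pathlib import Path


def parse_resolution(spec):
    """Split a "frames x height x width" string such as "81x384x640"."""
    frames, height, width = (int(v) for v in spec.lower().split("x"))
    return frames, height, width


def list_controls(path, suffixes):
    """Return control files for a path that is one file or a directory."""
    p = Path(path)
    if p.is_dir():
        return sorted(f for f in p.iterdir() if f.suffix in suffixes)
    return [p]


def control_pairs(camera_path, timing_path):
    """Assumed pairing: every camera file with every timing file."""
    cameras = list_controls(camera_path, {".json"})
    timings = list_controls(timing_path, {".npy", ".txt"})
    return list(product(cameras, timings))


if __name__ == "__main__":
    print(parse_resolution("81x384x640"))
    pairs = control_pairs(
        "example_scripts/example_data/camera.json",
        "example_scripts/example_data/slow_motion.npy",
    )
    print(len(pairs))
```

With the default single camera file and single timing file, this yields exactly one pair per input video.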

Training

1. Training dataset

Download the training dataset from Hugging Face:

Training dataset: 19reborn/bulletTime_dataset

2. Configure data and output paths

In example_scripts/train.sh, set:

DATA_ROOT: path to training data (use --data_flag "simulator" for the released synthetic dataset)
VALIDATION_DIR: path to validation data (can be the same as DATA_ROOT, or real-world videos prepared as in the Inference section above)
OUTPUT_DIR: directory to save checkpoints and logs

Make sure DATA_ROOT and VALIDATION_DIR point to valid local paths.

3. Launch training

Edit the distributed settings at the top of example_scripts/train.sh:

NUM_PROCESSES: number of processes (typically equals the number of GPUs)
GPU_IDS: comma-separated GPU ids (e.g., "0,1")

For DeepSpeed settings, edit finetune/accelerate_config.yaml.

Then launch from the repo root:

bash example_scripts/train.sh

Dataset Simulator

We also provide the source code to generate our synthetic dataset with disentangled 4D controls: 19reborn/BulletTime_dataSimulator.

Note on Implementation

In this release, the camera and time AdaLN modulations are applied only before attention; the post-attention AdaLN blocks are omitted. We found that this design achieves comparable performance with roughly half the added parameters.
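For readers unfamiliar with AdaLN, the NumPy sketch below illustrates the general idea of pre-attention-only conditioning. It is not the released implementation: the tensor shapes, the parameter names, and the choice to sum the camera and time modulations are all assumptions made for illustration. It uses the standard AdaLN form, LayerNorm(x) * (1 + scale) + shift, where scale and shift are linear projections of a per-frame condition embedding.

```python
import numpy as np


def layer_norm(x, eps=1e-6):
    """Normalize over the last (channel) axis, without learned affine terms."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def scale_shift(cond, w, b):
    """Project a [T, C] condition to per-frame (scale, shift), each [T, D]."""
    return np.split(cond @ w + b, 2, axis=-1)


def conditioned_block(x, cam, time, params, attention):
    """Residual attention block with AdaLN applied before attention only.

    x: [T, N, D] tokens; cam, time: [T, C] per-frame embeddings.
    Assumption: camera and time modulations are summed; the real model
    may combine them differently.
    """
    s_cam, b_cam = scale_shift(cam, params["w_cam"], params["b_cam"])
    s_time, b_time = scale_shift(time, params["w_time"], params["b_time"])
    scale = (s_cam + s_time)[:, None, :]   # broadcast over tokens in a frame
    shift = (b_cam + b_time)[:, None, :]
    h = layer_norm(x) * (1.0 + scale) + shift
    # No second modulation after attention: only one projection per
    # condition is needed, instead of a pre- and a post-attention pair.
    return x + attention(h)
```

With all projection parameters set to zero, the block reduces to a plain pre-norm residual block, which is a common way to initialize added conditioning layers so that they do not disturb a pretrained backbone.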

Acknowledgements

We thank the authors of CogVideoX and ReCamMaster for their open-source contributions.

Citation

If you find this repository useful for your research, please cite:

@inproceedings{wang2026bullettime,
  title   = {{BulletTime}: Decoupled Control of Time and Camera Pose for Video Generation},
  author  = {Wang, Yiming and Zhang, Qihang and Cai, Shengqu and Wu, Tong and Ackermann, Jan and Kuang, Zhengfei and Zheng, Yang and Raji{\v{c}}, Frano and Tang, Siyu and Wetzstein, Gordon},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages   = {18319--18330},
  year    = {2026}
}
