sv-pp/SceneVersepp

Lifting Unlabeled Internet-level Data for 3D Scene Understanding
CVPR 2026

Project Page | arXiv | Dataset

SceneVerse++ teaser

TL;DR

Annotated 3D scene data is scarce. We build an automated data engine that lifts web videos into structured 3D supervision (instance-level point clouds, object layouts, spatial VQA, and vision-language navigation) and show experimentally that the generated data has strong potential to supplement existing training data across 3D scene understanding tasks.

What's in this repo

This is the public release of the training code and data pipeline from the paper.

| Directory | Purpose |
| --- | --- |
| PQ3D/ | 3D instance segmentation training |
| SpatialLM/ | 3D object detection training |
| data_processing/ | Video download, frame extraction, camera-pose visualization for the SVPP dataset |

Quick start

1. Get the dataset

huggingface-cli download bigai/SceneVersepp --repo-type dataset --local-dir ./svpp
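
If you prefer to fetch the dataset from Python instead of the CLI, the same download can be sketched with `huggingface_hub.snapshot_download`. This is a minimal sketch, not part of the repository: the helper names `snapshot_kwargs` and `download_svpp` are hypothetical, and the optional `allow_patterns` filter is only useful once you know the file layout of the dataset repo, which this README does not describe.

```python
def snapshot_kwargs(local_dir="./svpp", allow_patterns=None):
    """Build the keyword arguments for snapshot_download (hypothetical helper)."""
    kwargs = {
        "repo_id": "bigai/SceneVersepp",  # dataset repo named in the CLI command above
        "repo_type": "dataset",
        "local_dir": local_dir,
    }
    if allow_patterns is not None:
        # Optional glob filter; the patterns depend on the dataset layout,
        # which is not documented here.
        kwargs["allow_patterns"] = allow_patterns
    return kwargs


def download_svpp(local_dir="./svpp", allow_patterns=None):
    """Download the dataset snapshot and return the local folder path."""
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**snapshot_kwargs(local_dir, allow_patterns))
```

Calling `download_svpp()` should mirror the CLI command above; it requires `pip install huggingface_hub` and network access.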

2. Set up the data-processing environment

The scripts in data_processing/ (video download, frame extraction, pose visualization) use a lightweight environment defined by requirements.txt:

conda create -n svpp python=3.10 -y
conda activate svpp
pip install -r requirements.txt

The training stacks under PQ3D/ and SpatialLM/ each have their own heavier environments. See their respective READMEs.

3. Process the raw videos

# Download YouTube videos referenced by each scene's data_info.json
python data_processing/download_videos.py ./svpp

# Extract raw and cropped frames into images/ and crop_images/
python data_processing/extract_images.py ./svpp

# (Optional) Visualize camera poses for one scene with Open3D
python data_processing/view_camera_poses.py ./svpp --scene-name bedroom_100_3o5KSzfdOSE
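
To check how far processing has progressed, a small helper along these lines may be useful. It is a sketch under two assumptions that the README does not confirm: that each scene lives in its own directory under `./svpp` containing `data_info.json`, `images/`, and `crop_images/`, and that scene names follow `<room>_<index>_<11-character YouTube ID>`, as the example `bedroom_100_3o5KSzfdOSE` suggests. The functions `parse_scene_name` and `scene_status` are hypothetical and not part of the repository.

```python
from pathlib import Path

YOUTUBE_ID_LEN = 11  # standard length of a YouTube video ID


def parse_scene_name(scene_name):
    """Split '<room>_<index>_<video id>' into (room, index, video_id).

    The naming scheme is an assumption inferred from the example scene name.
    """
    if len(scene_name) <= YOUTUBE_ID_LEN + 1 or scene_name[-YOUTUBE_ID_LEN - 1] != "_":
        raise ValueError(f"unexpected scene name: {scene_name!r}")
    video_id = scene_name[-YOUTUBE_ID_LEN:]
    # Parse from the right, since room names may themselves contain underscores.
    room, sep, index = scene_name[: -YOUTUBE_ID_LEN - 1].rpartition("_")
    if not sep or not index.isdigit():
        raise ValueError(f"unexpected scene name: {scene_name!r}")
    return room, int(index), video_id


def scene_status(root):
    """Report, for each scene directory, which frame folders already exist."""
    status = {}
    for scene_dir in sorted(Path(root).iterdir()):
        # Under the assumed layout, a scene is a directory holding data_info.json.
        if not scene_dir.is_dir() or not (scene_dir / "data_info.json").is_file():
            continue
        status[scene_dir.name] = {
            "images": (scene_dir / "images").is_dir(),
            "crop_images": (scene_dir / "crop_images").is_dir(),
        }
    return status
```

For example, `scene_status("./svpp")` returns a dictionary keyed by scene name, so scenes still missing `crop_images/` can be spotted before starting training.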

4. Train

Each training stack is independent and ships with its own README.md:

  • PQ3D/README.md — segmentation data generation and two-stage training
  • SpatialLM/README.md — layout generation, pretraining, fine-tuning, inference, and evaluation

Citation

@inproceedings{chen2026lifting,
  title     = {Lifting Unlabeled Internet-level Data for 3D Scene Understanding},
  author    = {Chen, Yixin and Zhang, Yaowei and Yu, Huangyue and He, Junchao and Wang, Yan and Huang, Jiangyong and Shen, Hongyu and Ni, Junfeng and Wang, Shaofei and Jia, Baoxiong and Zhu, Song-Chun and Huang, Siyuan},
  booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year      = {2026}
}

Acknowledgements

This repository builds on:
