Sicheng Zuo

I am a third-year Ph.D. student in the i-VisionGroup in the Department of Automation, Tsinghua University, advised by Prof. Jiwen Lu. In 2023, I received my B.S. degree from the Department of Automation, Tsinghua University. I am interested in computer vision and deep learning. My current research focuses on autonomous driving and vision foundation models.

Email / Google Scholar / GitHub

News

2026-02: One paper on 3D dense reconstruction is accepted to CVPR 2026.

2025-09: One paper on 3D occupancy prediction is accepted to NeurIPS 2025.

2025-06: One paper on embodied 3D occupancy prediction is accepted to ICCV 2025.

2025-02: One paper on 3D occupancy prediction is accepted to CVPR 2025.

2024-07: One paper on image representation learning is accepted to ECCV 2024.

Publications

*Equal contribution. †Project leader.

DVGT: Driving Visual Geometry Transformer
Sicheng Zuo*, Zixun Xie*, Wenzhao Zheng†, Shaoqing Xu, Fang Li, Shengyin Jiang, Long Chen, Zhi-Xin Yang, Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026.
[arXiv] [Code] [Project Page]
DVGT is a universal driving geometry model that reconstructs metric-scaled dense 3D point maps directly from unposed multi-view images, significantly outperforming existing SOTA methods and generalizing across diverse camera setups and driving scenarios.

QuadricFormer: Scene as Superquadrics for 3D Semantic Occupancy Prediction
Sicheng Zuo*, Wenzhao Zheng†, Xiaoyong Han, Longchao Yang, Yong Pan, Jiwen Lu
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025.
[arXiv] [Code] [Project Page]
QuadricFormer proposes geometrically expressive superquadrics as scene primitives, enabling efficient and powerful object-centric representation of driving scenes.

GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction
Sicheng Zuo*, Wenzhao Zheng†, Yuanhui Huang, Jie Zhou, Jiwen Lu
IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2025.
[arXiv] [Code]
GaussianWorld reformulates 3D occupancy prediction as a 4D occupancy forecasting problem conditioned on the current sensor input and proposes a Gaussian World Model to exploit the scene evolution for perception.

EmbodiedOcc: Embodied 3D Occupancy Prediction for Vision-based Online Scene Understanding
Yuqi Wu, Wenzhao Zheng†, Sicheng Zuo, Yuanhui Huang, Jie Zhou, Jiwen Lu
IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
[arXiv] [Code] [Project Page]
EmbodiedOcc formulates an embodied 3D occupancy prediction task and employs a Gaussian-based framework to accomplish it.

SpatialFormer: Towards Generalizable Vision Transformers with Explicit Spatial Understanding
Han Xiao*, Wenzhao Zheng*, Sicheng Zuo, Peng Gao, Jie Zhou, Jiwen Lu
European Conference on Computer Vision (ECCV), 2024.
[Paper]
SpatialFormer proposes an efficient vision transformer architecture with explicit spatial understanding for generalizable image representation learning.

PointOcc: Cylindrical Tri-Perspective View for Point-based 3D Semantic Occupancy Prediction
Sicheng Zuo*, Wenzhao Zheng*, Yuanhui Huang, Jie Zhou, Jiwen Lu
arXiv, 2023.
[arXiv] [Code] [Explainer in Chinese]
As the first 2D-projection-based method for 3D semantic occupancy prediction, PointOcc outperforms all other methods by a large margin while running much faster.
