Yiming Huang

I'm a PhD student (Fall 2024 - present) in Computer and Information Science at the University of Pennsylvania, affiliated with the GRASP Lab. I am grateful to be advised by Prof. Lingjie Liu.

My research interests span vision, language, and robotics, with a focus on the perception and modeling of human behavior in 3D environments.

I received my master's degree in Robotics from the GRASP Lab at the University of Pennsylvania, where I had the honor of working with Prof. Lingjie Liu, Prof. Jianbo Shi, and Prof. Mark Yatskar. I received my bachelor's degree in Mathematics and Computer Science from NYU Shanghai.

Email / GitHub / Google Scholar / LinkedIn

Research

(* indicates equal contribution)

	ModSkill: Physical Character Skill Modularization
	Yiming Huang, Zhiyang Dou, Lingjie Liu
	ICCV 2025
	arxiv / website
	We introduce a novel skill learning framework, ModSkill, that decouples complex full-body skills into compositional, modular skills for independent body parts, leveraging body structure-inspired inductive bias to enhance skill learning performance.

	PhysHMR: Learning Humanoid Control Policies from Vision for Physically Plausible Human Motion Reconstruction
	Qiao Feng, Yiming Huang, Yufu Wang, Jiatao Gu, Lingjie Liu
	SIGGRAPH Asia 2025 (Conditionally Accepted)
	PhysHMR learns a visual-to-action policy that directly predicts control signals from visual input for physically plausible motion reconstruction.

	PhysCtrl: Generative Physics for Controllable and Physics-Grounded Video Generation
	Chen Wang, Chuhao Chen, Yiming Huang, Zhiyang Dou, Yuan Liu, Jiatao Gu, Lingjie Liu
	NeurIPS 2025
	website
	PhysCtrl achieves controllable and physics-grounded video generation from an initial force input.

	Vid2Sim: Generalizable, Video-based Reconstruction of Appearance, Geometry and Physics for Mesh-free Simulation
	Chuhao Chen, Zhiyang Dou, Chen Wang, Yiming Huang, Anjun Chen, Qiao Feng, Jiatao Gu, Lingjie Liu
	CVPR 2025
	arxiv / code / website
	Vid2Sim achieves high-quality, simulation-ready reconstruction of appearance, geometry, and physics from multi-view videos.

	CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
	Yiming Huang, Weilin Wan, Yue Yang, Chris Callison-Burch, Mark Yatskar, Lingjie Liu
	ECCV 2024
	arxiv / code / website
	We present CoMo, a unified framework for fine-grained, text-driven human motion generation and editing using discrete and semantically meaningful pose codes.

	Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives
	Kristen Grauman, Andrew Westbury, (et al., including Yiming Huang)
	CVPR 2024 (Oral)
	arxiv / video / website
	We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset and benchmark challenge, centered around simultaneously captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair).

	DiffusionPhase: Motion Diffusion in Frequency Domain
	Weilin Wan*, Yiming Huang, Shutong Wu, Taku Komura, Wenping Wang, Dinesh Jayaraman, Lingjie Liu
	arXiv 2023*
	arxiv / video / code / website
	We propose a network encoder that converts motion sequences into periodic signals and a conditional diffusion model for predicting periodic motion parameters based on text descriptions and the starting pose, enabling the generation of a broader variety of high-quality longer motion sequences.