My research interests span vision, language, and robotics, with a focus on perceiving and modeling human behavior in 3D environments.
We introduce a novel skill learning framework, ModSkill, that decouples complex full-body skills into compositional, modular skills for independent body parts, leveraging body structure-inspired inductive bias to enhance skill learning performance.
PhysHMR: Learning Humanoid Control Policies from Vision for Physically Plausible Human Motion Reconstruction
Vid2Sim achieves high-quality, simulation-ready reconstruction of appearance, geometry, and physics from multi-view videos.
CoMo: Controllable Motion Generation through Language Guided Pose Code Editing
Yiming Huang, Weilin Wan, Yue Yang, Chris Callison-Burch, Mark Yatskar, Lingjie Liu
ECCV 2024
We present CoMo, a unified framework for fine-grained, text-driven human motion generation and editing using discrete and semantically meaningful pose codes.
Kristen Grauman, Andrew Westbury, et al. (including Yiming Huang)
CVPR 2024 (Oral)
We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset and benchmark challenge centered on simultaneously captured egocentric and exocentric video of skilled human activities (e.g., sports, music, dance, bike repair).
We propose a network encoder that converts motion sequences into periodic signals and a conditional diffusion model for predicting periodic motion parameters based on text descriptions and the starting pose, enabling the generation of a broader variety of high-quality longer motion sequences.