Hi there! I am an incoming PhD student at The Chinese University of Hong Kong, where I am advised by Weiyang Liu. I am currently interning at Tongyi Lab. Previously, I had a wonderful time at the University of Electronic Science and Technology of China (undergraduate) and Shanghai AI Laboratory.
I am deeply interested in the fundamental problems of multimodal sequence modeling (e.g., efficient attention and position encoding) and in optimizing training and inference efficiency. Lately, I have enjoyed developing simple yet effective algorithms that scale efficiently on modern ML systems. Some papers are highlighted.
*Equal Contribution
‡Project Lead
†Corresponding Author
Orthogonal Model Merging
Sihan Yang, Kexuan Shi, Weiyang Liu
ICML 2026
Homepage | Code | Paper | arXiv
We introduce a geometrically principled framework that shifts the integration of expert models from Euclidean space to the Riemannian manifold of the orthogonal group, effectively maintaining model performance across diverse tasks and mitigating catastrophic forgetting.
MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence
Sihan Yang*, Runsen Xu*‡, Yiman Xie, Sizhe Yang, Mo Li, Jingli Lin, Chenming Zhu, Xiaochen Chen, Haodong Duan, Xiangyu Yue, Dahua Lin, Tai Wang†, Jiangmiao Pang†
ICLR 2026
Homepage | Dataset | Paper | arXiv | Code
We introduce a challenging, diverse, and comprehensive multi-image spatial reasoning benchmark, manually annotated by six 3D vision experts, which additionally supports thorough evaluation of reasoning processes.
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang, Runsen Xu, Chenhang Cui, Tai Wang, Dahua Lin, Jiangmiao Pang
ICCV 2025
Paper | arXiv | Code
Improving Alignment in LVLMs with Debiased Self-Judgment
Sihan Yang*, Chenhang Cui*, Zihao Zhao, Yiyang Zhou, Weilong Yan, Ying Wei, Huaxiu Yao
EMNLP 2025 Findings
Paper | arXiv | Dataset | Code