Changbin Zhang

张长彬

Ph.D. Candidate

The University of Hong Kong

zhangchbin@gmail.com

I'm currently a final-year Ph.D. candidate working with Dr. Yujie Zhong and Prof. Kai Han at The University of Hong Kong (HKU). Before that, I spent wonderful years at Nankai University (NKU), where I was supervised by Prof. Ming-Ming Cheng and received my M.Eng. degree in Computer Science.

I am currently working on research related to RL / Agentic RL, on-policy distillation (OPD), Multi-Agent Systems, and self-evolving agents. I have published 5 first-author and 2 co-first-author papers in CCF-A venues, including 1 ESI Highly Cited Paper (Top 1%) and 1 CVPR Highlight paper (Top 2.8%). My work has received over 2,300 citations on Google Scholar. I also won a 🥈 Silver Medal in the ACM-ICPC Asia Regional Contest. I am currently on the job market, seeking roles in post-training and agents.

Publications

ICML 2026

iVGR: Internalizing Visually Grounded Reasoning for MLLMs with Reinforcement Learning
Changbin Zhang, Yujie Zhong, Qiang Zhang and Kai Han
ICML, 2026
[arXiv] [code] [project page] [models]
A third paradigm of visual reasoning that internalizes visual grounding into textual CoT via RL, going beyond OpenAI's "think with images" and DeepSeek's "think with visual primitives".

Reinforcement Learning for MLLMs · Think with visual primitives
Under Review

Mr. DETR++: Instructive Multi-Route Training for Detection Transformers with Mixture-of-Experts
Changbin Zhang, Yujie Zhong and Kai Han
Under Review, 2025
[arXiv] [code] [project page] [HuggingFace Demo]
A flexible multi-route MoE training framework that boosts detection transformers to SOTA across object detection, instance segmentation, and panoptic segmentation.

MoE Foundation Model
CVPR 2025

Mr. DETR: Instructive Multi-Route Training for Detection Transformers
Changbin Zhang, Yujie Zhong and Kai Han
CVPR, 2025
[arXiv] [code] [project page] [HuggingFace Demo]
Ranked #1 on the COCO 2017 val leaderboard in April 2025.

Visual Perception Foundation Model
CVPR 2025

v-CLR: View-Consistent Learning for Open-World Instance Segmentation
Changbin Zhang, Jinhong Ni, Yujie Zhong and Kai Han
CVPR, 2025, Highlight (Top 2.8% of submissions)
[arXiv] [code] [project page] [HuggingFace Demo]
View-consistent learning achieves SOTA on open-world instance segmentation.

Visual Perception Foundation Model
CVPR 2023

Endpoints Weight Fusion for Class Incremental Semantic Segmentation
Jia-Wen Xiao*, Changbin Zhang*, Jiekang Feng, Xialei Liu, Joost van de Weijer and Ming-Ming Cheng
CVPR, 2023
[IEEE/CVF] [code]
An elegant method that balances stability and plasticity for class-incremental segmentation.

Visual Perception Foundation Model
CVPR 2022

Representation Compensation Networks for Continual Semantic Segmentation
Changbin Zhang*, Jia-Wen Xiao*, Xialei Liu, Yingcong Chen and Ming-Ming Cheng
CVPR, 2022
[IEEE/CVF] [arXiv] [Chinese translation] [code]
A re-parameterization approach that decouples old and new knowledge, achieving SOTA continual segmentation with zero inference overhead.

Visual Perception Foundation Model
IEEE TIP 2021

Delving Deep into Label Smoothing
Changbin Zhang*, Peng-Tao Jiang*, Qibin Hou, Yunchao Wei, Qi Han, Zhen Li and Ming-Ming Cheng
IEEE Transactions on Image Processing (IEEE TIP), 2021
[IEEE Xplore] [arXiv] [Chinese translation] [code]
An online label smoothing method that mines the model's own prediction statistics to craft more reliable soft labels.

Visual Perception Foundation Model
IEEE TIP 2021

LayerCAM: Exploring Hierarchical Class Activation Maps For Localization
Peng-Tao Jiang*, Changbin Zhang*, Qibin Hou, Ming-Ming Cheng and Yunchao Wei
IEEE Transactions on Image Processing (IEEE TIP), 2021
[IEEE Xplore] [Chinese translation] [code] [ESI Highly Cited Paper (Top 1%)]
The first method to extract class activation maps from any network layer, revealing hierarchical localization.

Visual Perception Foundation Model
ICCV 2021

Personalized Image Semantic Segmentation
Yu Zhang, Changbin Zhang, Peng-Tao Jiang, Ming-Ming Cheng and Mao Feng
ICCV, 2021
[IEEE/CVF] [arXiv] [Chinese translation] [code]

Visual Perception Foundation Model
IEEE TPAMI 2021

Deep Hough Transform for Semantic Line Detection
Kai Zhao*, Qi Han*, Changbin Zhang, Jun Xu and Ming-Ming Cheng
IEEE Transactions on Pattern Analysis and Machine Intelligence (IEEE TPAMI), 2021
[IEEE Xplore] [arXiv] [Chinese translation] [code] [ESI Highly Cited Paper (Top 1%)]

Visual Perception Foundation Model

Experience

ByteDance · Internship · 2026.05 ~ present · Beijing
SAIC Motor Autonomous · Engineer · 2022.07 ~ 2023.01 · Shanghai, working closely with Dr. Peixuan Li
NIO Autonomous · Internship · 2022.01 ~ 2022.04 · Beijing
DJI · Internship · 2021.06 ~ 2021.08 · Shenzhen

Honors & Awards

💰 National Scholarship · 2017 / 2018 · 16,000 CNY
💰 Nankai - SK Hynix Scholarship · 2020 / 2021 · 42,000 CNY
🥈 Silver Medal, International Collegiate Programming Contest (ACM-ICPC) Asia Regional Contest · 2017
🥇 First Prize, Jiangsu Province Collegiate Programming Contest (JSCPC) · 2017