AI Infrastructure · MLSys · Compilers
I build and optimize the systems that make large models fast — from compiler-level kernel work up to distributed inference serving.
- Multimodal & LLM Inference Infrastructure (main focus) — performance engineering for multimodal serving on SGLang-omni, alongside LLM serving stacks (SGLang, vLLM): model integration, scheduling, memory efficiency, and throughput/latency optimization.
- RL Infrastructure — systems and tooling for reinforcement learning workloads: training/inference orchestration, rollout, and scaling.
- Kernel Compiler Optimization — compiler-driven kernel optimization for ML workloads: codegen, graph-level transformations, and automatic kernel generation/tuning (Triton, CUDA) on NVIDIA Hopper (H100) and Blackwell (B200).
- 📧 Email: chongyue.cc@gmail.com
- 💼 LinkedIn: Chenchen Hong
- 🐦 X / Twitter: @HaydenCC
- ✍️ Blog: hayden727.github.io
Feel free to reach out on WeChat: hayden-gai.


