🚀 DFlash now runs on SGLang's new default speculative-decoding engine, Spec V2.
⚡️ Hitting >4.3× baseline throughput (1.5× over native MTP) on Qwen 3.5 397B-A17B. Same quality, more speed!
⭐ github.com/z-lab/dflash
📢 Excited to share that I’ll be joining @UCSanDiego @HDSIUCSD as an Assistant Professor in January 2026! My lab will focus on efficient AI.
I'm recruiting PhD students through HDSI/CSE in the Fall 2024 cycle and am also looking for RAs/interns! More info at zhijianliu.com.
Extending an LLM's context window normally requires expensive training. Our new method, LongLoRA, fine-tunes with sparse local attention but keeps dense attention at inference time. This let us extend LLaMA2-70B's context length to 32K on a single 8×A100 machine. 🧵
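The tweet compresses the mechanism into one sentence. As a rough illustration only, here is a minimal sketch assuming the shifted-group sparse attention pattern (S²-Attn) described in the LongLoRA paper, in simplified form: during fine-tuning each token attends only within its local group, half of the heads use groups shifted by half a group so information crosses group boundaries, and at inference every head falls back to dense causal attention. The helper names (`local_mask`, `head_masks`, `allowed_pairs`) are hypothetical, masks are plain Python lists rather than GPU kernels, and the shifted groups here just offset the boundaries instead of rolling tokens with wrap-around.

```python
def local_mask(seq_len, group_size, shift=0, causal=True):
    """mask[i][j] is True if query i may attend to key j.

    Tokens fall into contiguous groups of `group_size`; a nonzero `shift`
    offsets the group boundaries (simplified: no wrap-around roll).
    """
    def group(t):
        return (t + shift) // group_size

    return [[group(i) == group(j) and (j <= i or not causal)
             for j in range(seq_len)] for i in range(seq_len)]


def dense_mask(seq_len, causal=True):
    """Full (causal) attention, as used at inference time."""
    return [[j <= i or not causal for j in range(seq_len)] for i in range(seq_len)]


def head_masks(seq_len, num_heads, group_size, train=True):
    """Per-head masks: sparse local groups while fine-tuning, dense at inference."""
    if not train:
        return [dense_mask(seq_len) for _ in range(num_heads)]
    half = group_size // 2
    # First half of the heads uses aligned groups, second half shifted groups,
    # so information can flow across group boundaries.
    return [local_mask(seq_len, group_size, shift=0 if h < num_heads // 2 else half)
            for h in range(num_heads)]


def allowed_pairs(mask):
    """Number of (query, key) pairs the mask keeps."""
    return sum(sum(row) for row in mask)


train = head_masks(8, num_heads=2, group_size=4, train=True)
infer = head_masks(8, num_heads=2, group_size=4, train=False)
print([allowed_pairs(m) for m in train])  # [20, 16]
print([allowed_pairs(m) for m in infer])  # [36, 36]
```

With 8 tokens, groups of 4 and 2 heads, the aligned head keeps 20 of the 36 causal pairs and the shifted head keeps 16, while the inference-time masks keep all 36. The savings grow with sequence length, since local attention scales with the group size rather than the full context.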
Thrilled to announce that our papers on FlatFormer and SparseViT have been accepted to #CVPR2023! They showcase remarkable efficiency improvements for transformers in 2D and 3D vision. Stay tuned for more updates!
We are excited to share our latest research, BEVFusion (bevfusion.mit.edu), that achieves SOTA performance on nuScenes for 3D object detection and BEV map segmentation, with half the computation cost. We will soon release our code at github.com/mit-han-lab/be…. Stay tuned!
Deeply grateful to my amazing advisor (@songhan_mit) for his unwavering support through my PhD and beyond. Huge thanks to my mentors (@KurtKeutzer, @drmapavone, @ShenlongWang, Philip Harris, and many others) for their guidance. Thanks to everyone who has been part of this journey!
Super excited to receive the Qualcomm Innovation Fellowship 2021. Thanks for the great support from my advisor @SongHan_MIT and collaborators!
qualcomm.com/research/resea…
Until I join UC San Diego, I'll be working at @NVIDIA Research on building efficient foundation models. I'll still be based in Boston, so please say hi if you're around. I'd love to catch up!
FlatFormer bridges the 3-4x latency gap between point cloud transformers and sparse convolutional models. The key idea is to trade spatial proximity for better computational regularity. This is a joint effort with Xinyu Yang, @haotiant1998, Shang Yang, and @songhan_mit.
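One way to picture "trading spatial proximity for computational regularity" is the following minimal sketch, assuming the equal-size grouping idea from the FlatFormer paper in a simplified 2D form. Instead of forming one attention group per occupied window (group sizes then vary with point density), the points are sorted window by window and the sorted list is cut into chunks of exactly `group_size` points. Every group has the same shape, at the cost that a group may straddle neighboring windows. The function names are hypothetical, the within-window sort key is an arbitrary choice, and a real implementation would pad the last group and run the attention itself on the GPU.

```python
from collections import defaultdict


def window_of(p, window):
    """Integer (x, y) index of the square window containing point p."""
    return (int(p[0] // window), int(p[1] // window))


def equal_window_groups(points, window):
    """Baseline: one group per occupied window; sizes depend on point density."""
    groups = defaultdict(list)
    for p in points:
        groups[window_of(p, window)].append(p)
    return list(groups.values())


def equal_size_groups(points, window, group_size):
    """Sort points window by window, then cut into fixed-size chunks.

    Chunks may span several windows (less spatially compact) but all have
    the same size except possibly the last one (more regular to compute).
    """
    ordered = sorted(points, key=lambda p: (window_of(p, window), p[0], p[1]))
    return [ordered[i:i + group_size] for i in range(0, len(ordered), group_size)]


pts = [(0.1, 0.1), (0.2, 0.3), (0.5, 0.5), (0.9, 0.1),
       (1.5, 0.2), (2.1, 0.4), (2.2, 0.9), (2.5, 0.5)]
print([len(g) for g in equal_window_groups(pts, 1.0)])   # [4, 1, 3]
print([len(g) for g in equal_size_groups(pts, 1.0, 4)])  # [4, 4]
```

The second fixed-size group mixes the single point from window (1, 0) with the three points from window (2, 0): the groups are less spatially compact, but uniform sizes make them easy to batch.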
Dear efficientml.ai students,
Congratulations on completing the TinyML and Efficient Deep Learning course! I hope you found the course informative and valuable for learning about the challenges and techniques of deploying neural networks on mobile devices.