Beidi Chen
609 posts
@BeidiChen
Asst. Prof @CarnegieMellon, @amazon Scholar, Prev: Visiting Researcher @Meta, Postdoc @Stanford, Ph.D. @RiceUniversity, Large-Scale ML, a fan of Dota2.
infini-ai-lab.cmu.edu
Joined November 2011
411 Following · 16.6K Followers
  • Beidi Chen
    @BeidiChen
    Jul 17, 2022
    Excited to share some life updates 🥳📢: I'll be starting as an Assistant Professor @CarnegieMellon @CMU_ECE in Fall 2023. Until then, I'll be a visiting researcher at @Meta @metaai. I'm heading to #ICML2022 tmr!!! DM if you want to catch up 😃☕️🍱...
  • Beidi Chen
    @BeidiChen
    Nov 7, 2024
    🥳We're recruiting PhD students at CMU for Fall 2025! If you are interested in machine-learning algorithms and systems (🔑Keywords: new model arch, LLM reasoning, long-context modeling, efficiency, etc.), please mention my name in your application~ 👇 Application links: (Dec 15)
    ece.cmu.edu
    Admissions - Electrical and Computer Engineering - College of Engineering - Carnegie Mellon...
    Review the requirements for admission into Carnegie Mellon’s Department of Electrical and Computer Engineering. We encourage prospective students to visit campus and tour the department.
  • Beidi Chen
    @BeidiChen
    Nov 27, 2024
    Replying to @bubbleboi and @beidi
    Not Llama, 30K commits for developing MagicPig for Llama to counter NVIDIA's monopoly 😉 @chenzhuoming911 : github.com/Infini-AI-Lab/…
  • Beidi Chen
    @BeidiChen
    Nov 27, 2024
    🫢 oops someone discovered our secret summer proj to counter NVIDIA's monopoly @chenzhuoming911 github.com/Infini-AI-Lab/…
    Quoting:
    bubble boi
    Thru
    @bubbleboi
    Nov 27, 2024
    Most insane github I've ever seen in my life lol.
  • Beidi Chen
    @BeidiChen
    Mar 13, 2024
    📢 Announcing our new speculative decoding framework Sequoia ❗️❗️❗️ It can now serve Llama2-70B on one RTX4090 with half-second/token latency (exact❗️no approximation) 🤔Sounds slow as a sloth 🦥🦥🦥??? Fun fact😛: DeepSpeed -> 5.3s / token; 8 x A100: 25ms / token (costs 8 x
  • Beidi Chen
    @BeidiChen
    Dec 10, 2021
    Can sparse training achieve wall-clock time speed up on GPU? Yes! Simple and static #sparsity -> 2.5x faster🚀 training MLP-Mixer, ViT, and GPT-2 medium from scratch with NO drop in accuracy. arxiv.org/abs/2110.15343 (#NeurIPS2021) arxiv.org/abs/2112.00029 [1/6]
  • Beidi Chen
    @BeidiChen
    Feb 14, 2025
    ⏰📢After years of working on long-context efficiency, I’ve started to doubt whether it’s truly necessary (many of you have probably noticed the decline of interest in long-context LLMs). Despite strong models like Gemini, short context + retrieval often does the trick: faster, cheaper, and
    Quoting:
    Infini-AI-Lab
    @InfiniAILab
    Feb 14, 2025
    🚀 RAG vs. Long-Context LLMs: The Real Battle ⚔️ 🤯Turns out, simple-to-build RAG can match million-dollar long-context LLMs (LC LLMs) on most existing benchmarks. 🤡So, do we even need long-context models? YES. Because today’s benchmarks are flawed: ⛳ Too Simple –
  • Beidi Chen
    @BeidiChen
    Apr 24, 2024
    ❓Wanna host a Llama2-7B-128K (14GB weights + 64GB KV cache) at home🤔 📢 Introducing TriForce! 🚀Lossless Ultra-Fast Long Seq Generation: training-free Spec Dec! 🌟 🔥 TriForce serves with 0.1s/token on 2 RTX4090s + CPU, only 2x slower than on an A100 (~55ms on chip), 8x faster
  • Beidi Chen
    @BeidiChen
    Dec 10, 2022
    📢My group at @CMU_ECE is looking for Ph.D. students in #Algorithms #MLSys (deadline Dec 15)! Let’s shed new light on classical algorithms, make ML more accessible to the general community, and advance interdisciplinary research (science?!) together! 🙏Plz help spread the word.
  • Beidi Chen
    @BeidiChen
    Jul 10, 2025
    I’ve been asked many times lately, by students working on test-time scaling with slightly modified attention or generation workflows (customized reward models/search), which repo to use. HF is a bit too time-consuming, esp. with tons of token generation, and SGLang/vLLM is a bit hard
    Quoting:
    Infini-AI-Lab
    @InfiniAILab
    Jul 10, 2025
    🧵 Glad to introduce LiteSys, the inference framework we used in 📄 Kinetics: Rethinking Test-Time Scaling Laws (arxiv.org/abs/2506.05333) to evaluate test-time scaling (32K+ generated tokens) at scale. If you are: ✅ Looking for an inference framework that's easy to extend. 🐢
  • Beidi Chen
    @BeidiChen
    Oct 7, 2025
    📢🔥 New off-policy RL for LLMs: now training a 32B model with 200+ stale steps for the first time, while still matching on-policy accuracy 💪 A big step toward scalable & decentralized agent training 😉
    Quoting:
    Infini-AI-Lab
    @InfiniAILab
    Oct 7, 2025
    🤔Can we train RL on LLMs with extremely stale data? 🚀Our latest study says YES! Stale data can be as informative as on-policy data, unlocking more scalable, efficient asynchronous RL for LLMs. We introduce M2PO, an off-policy RL algorithm that keeps training stable and
  • Beidi Chen
    @BeidiChen
    Jul 29, 2023
    Did you know the KV cache can easily take 160GB on Llama2-70B (e.g., 8K sequence length and batch size 64), even though it uses grouped-query attention? Come and see our preliminary work on how to use a super simple cache eviction policy to reduce this bottleneck! There are huge opportunities in this space 🫵🏻
    Quoting:
    Zhenyu (Allen) Zhang
    xAI
    @KyriectionZhang
    Jul 29, 2023
    We will present H2O tomorrow in the poster session of ES-FoMo Workshop #ICML2023 at 1:00 p.m. - 2:00 p.m. (Sat. 29 July). Please join us and chat!
  • Beidi Chen
    @BeidiChen
    May 3, 2024
    📢 Our new work LESS leverages the observation that attention in pretrained LLMs has an intrinsically sparse + low-rank structure. ☝️So at inference time, we can decompose the KV cache into a constant-size sparse cache and RNN states (because low-rank attention is an RNN). This also explains why the recent
    Quoting:
    Harry Dong
    @Real_HDong
    May 3, 2024
    Upgrade your LLM KV cache eviction policy with LESS, our method to retain local and global information during generation with pretrained LLMs! Excited to share this at ICML! Paper: arxiv.org/abs/2402.09398 w/ @Xinyu2ML, @KyriectionZhang , Zhangyang Wang, Yuejie Chi, @BeidiChen
  • Beidi Chen
    @BeidiChen
    Aug 21, 2024
    🤯This study explains my year-long confusion about why the #GPT4 leak says OpenAI deployed speculative decoding in their serving last June (by @dylan522p @SemiAnalysis_), because I thought SD was only useful for small batches... Surprisingly, speculative decoding can bring more benefits when
    Quoting:
    AK
    @_akhaliq
    Aug 21, 2024
    MagicDec Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding discuss: huggingface.co/papers/2408.11… Large Language Models (LLMs) have become more prevalent in long-context applications such as interactive chatbots, document analysis, and

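The KV-cache sizes quoted in the TriForce post (64GB for Llama2-7B-128K) and the H2O post (160GB for Llama2-70B at 8K sequence length and batch size 64) can be reproduced with back-of-the-envelope arithmetic. The sketch below is a minimal check, assuming fp16/bf16 storage (2 bytes per element), the public Llama 2 shapes (80 layers with 8 KV heads for 70B; 32 layers with 32 KV heads for 7B; head dimension 128 in both), and that the 128K variant keeps the Llama-2-7B architecture. The helper name `kv_cache_bytes` is hypothetical, and the results come out in GiB.

```python
# Back-of-the-envelope KV-cache sizing. Assumes fp16/bf16 (2 bytes per element)
# and the public Llama 2 shapes; the helper name is hypothetical.

GiB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch_size, bytes_per_elem=2):
    """Bytes needed to cache keys and values for every layer, KV head, and token."""
    # Leading factor 2: one cached tensor for keys and one for values.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch_size * bytes_per_elem


# Llama2-70B: 80 layers, 8 KV heads (grouped-query attention), head_dim 128.
llama2_70b = kv_cache_bytes(n_layers=80, n_kv_heads=8, head_dim=128,
                            seq_len=8 * 1024, batch_size=64)

# Llama2-7B-shaped model: 32 layers, 32 KV heads (no GQA), head_dim 128, one 128K sequence.
llama2_7b_128k = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128,
                                seq_len=128 * 1024, batch_size=1)

print(f"Llama2-70B, 8K tokens x batch 64: {llama2_70b / GiB:.0f} GiB")     # 160 GiB
print(f"Llama2-7B, 128K tokens x batch 1: {llama2_7b_128k / GiB:.0f} GiB")  # 64 GiB
```

Without grouped-query attention, all 64 of Llama2-70B's attention heads would need cached keys and values, making the same workload 8x larger; the 160GB figure is what remains even after that reduction.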

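The H2O post mentions a "super simple cache eviction policy." H2O stands for Heavy-Hitter Oracle, and the idea as commonly described is to keep a few "heavy-hitter" tokens (those that have accumulated the most attention so far) plus a window of the most recent tokens, evicting everything else. The following is a simplified single-head NumPy sketch under that assumption, not the H2O implementation; `h2o_keep_indices`, `decode_step`, the accumulate-then-evict order, and the budget values are all hypothetical choices for illustration.

```python
import numpy as np


def h2o_keep_indices(acc_scores, num_heavy, num_recent):
    """Positions to keep: the `num_recent` newest tokens plus the `num_heavy`
    older tokens that have received the most accumulated attention."""
    n = acc_scores.shape[0]
    if n <= num_heavy + num_recent:
        return np.arange(n)
    split = n - num_recent
    # Rank older tokens by accumulated attention, largest first (stable on ties).
    heavy = np.argsort(-acc_scores[:split], kind="stable")[:num_heavy]
    recent = np.arange(split, n)
    return np.sort(np.concatenate([heavy, recent]))


def decode_step(keys, values, acc, q, k_new, v_new, num_heavy, num_recent):
    """Toy single-head decode step: append the new token, attend, accumulate, evict."""
    keys = np.vstack([keys, k_new])
    values = np.vstack([values, v_new])
    acc = np.append(acc, 0.0)
    logits = keys @ q / np.sqrt(q.shape[0])
    weights = np.exp(logits - logits.max())
    weights /= weights.sum()
    out = weights @ values
    acc = acc + weights                      # attention each cached token just received
    keep = h2o_keep_indices(acc, num_heavy, num_recent)
    return out, keys[keep], values[keep], acc[keep]


# Usage: hypothetical budget of 2 heavy hitters + 3 recent tokens, head dim 8.
rng = np.random.default_rng(0)
d = 8
keys, values, acc = rng.normal(size=(6, d)), rng.normal(size=(6, d)), np.zeros(6)
for _ in range(4):
    q, k_new, v_new = rng.normal(size=d), rng.normal(size=d), rng.normal(size=d)
    out, keys, values, acc = decode_step(keys, values, acc, q, k_new, v_new, 2, 3)
print(keys.shape)  # (5, 8)
```

Because the budget is fixed (here 2 + 3 tokens), the cache stops growing with sequence length; the trade-off is that an evicted token can never be attended to again.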

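Sequoia and TriForce advertise exact, lossless generation, and MagicDec also builds on speculative decoding. That guarantee rests on the standard speculative-sampling acceptance rule: accept a drafted token x with probability min(1, p(x)/q(x)), where p is the target model's distribution and q the draft model's, and on rejection resample from the normalized residual max(0, p - q). The sketch below shows only this single-token rule on toy distributions; Sequoia's tree-structured verification and TriForce's hierarchical drafting are not modeled, and `verify_draft_token` is a hypothetical name.

```python
import numpy as np


def verify_draft_token(p, q, x, rng):
    """Speculative-sampling check for one drafted token x ~ q.

    p: target-model probabilities over the vocabulary
    q: draft-model probabilities that x was sampled from
    Returns (token, accepted); the returned token is distributed exactly as p.
    """
    # Accept with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p[x] / q[x]):
        return int(x), True
    # On rejection, resample from the normalized residual max(0, p - q).
    residual = np.maximum(p - q, 0.0)
    residual /= residual.sum()
    return int(rng.choice(len(p), p=residual)), False


# Usage: toy 3-token vocabulary where the draft model is badly calibrated.
rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])   # target model
q = np.array([0.2, 0.3, 0.5])   # draft model
counts = np.zeros(3)
accepted = 0
n = 50_000
for _ in range(n):
    x = rng.choice(3, p=q)                        # draft proposes
    token, ok = verify_draft_token(p, q, x, rng)  # target verifies
    counts[token] += 1
    accepted += ok
freq = counts / n
print(freq)          # close to [0.6, 0.3, 0.1], i.e. p rather than q
print(accepted / n)  # acceptance rate, close to sum(min(p, q)) = 0.6
```

The output frequencies track p, not q, which is the sense in which speculative decoding is "exact, no approximation": the draft model only affects speed (through the acceptance rate), never the output distribution.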

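The LESS post notes that low-rank attention "is an RNN." One standard way to see this is linear attention, where the softmax is replaced by a positive feature map: causal attention can then be computed either over the full, growing history or from a fixed-size running state. The sketch below compares both forms; the feature map elu(x) + 1 and the function names are illustrative assumptions, and this is not the LESS method itself, which pairs such a recurrent state with a sparse cache.

```python
import numpy as np


def phi(x):
    """Positive feature map elu(x) + 1 (an illustrative choice)."""
    return np.where(x > 0, x + 1.0, np.exp(x))


def linear_attention_parallel(Q, K, V):
    """Causal linear attention over the full T x T score matrix (grows with T)."""
    scores = np.tril(phi(Q) @ phi(K).T)          # causal mask; no softmax
    return (scores @ V) / scores.sum(axis=1, keepdims=True)


def linear_attention_recurrent(Q, K, V):
    """Same outputs from a fixed-size running state, i.e. an RNN."""
    S = np.zeros((K.shape[1], V.shape[1]))        # running sum of phi(k_t) v_t^T
    z = np.zeros(K.shape[1])                      # running sum of phi(k_t)
    outputs = []
    for q_t, k_t, v_t in zip(phi(Q), phi(K), V):
        S += np.outer(k_t, v_t)
        z += k_t
        outputs.append((q_t @ S) / (q_t @ z))
    return np.array(outputs)


rng = np.random.default_rng(0)
T, d_k, d_v = 6, 4, 3
Q, K, V = rng.normal(size=(T, d_k)), rng.normal(size=(T, d_k)), rng.normal(size=(T, d_v))
print(np.allclose(linear_attention_parallel(Q, K, V),
                  linear_attention_recurrent(Q, K, V)))  # True
```

The recurrent state `S` has shape d_k x d_v regardless of sequence length, which is what lets it stand in for the part of the KV cache that would otherwise grow with every generated token.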