Linden Li
Applied Compute
155 posts
Linden Li
Applied Compute
@lindensli
Co-Founder @appliedcompute. Previously scaling @OpenAI, @DbrxMosaicAI, @NVIDIA
Stanford, CA
linden-li.github.io
Joined April 2021
869
Following
2,878
Followers
  • Pinned
    Linden Li
    Applied Compute
    @lindensli
    Apr 8
    We’ve been heads down on our mission to build Specific Intelligence for the enterprise. Today we’re announcing $80M in new funding to help us get there even faster. We’re hiring across engineering and research and working to deploy AI systems that improve the more you use them.
    Applied Compute
    @appliedcompute
    Apr 8
    Article
    Applied Compute Raises $80M to Help Enterprises Advance from Generalized to Specific Intelligence
    Models keep getting smarter, but there's a massive gap between raw intelligence and actual productivity on specific tasks inside companies. Delivering real value requires knowing how to perform those...
    13K
  • Linden Li
    Applied Compute
    @lindensli
    Sep 30, 2022
    @abhi_venigalla and I turned @karpathy’s minGPT into a GPT-3 quality model with 30 billion parameters—projected to cost only $450k to train. The code to do so is public: it's easily readable and can be launched on however many GPUs you want. Here’s how:
    Image
  • Linden Li
    Applied Compute
    @lindensli
    Aug 12, 2022
    Recently, @abhi_venigalla and I trained GPTs from scratch to see if we could train LLMs like well-resourced companies do. Here’s what I learned going from 125 million parameters to 1.3 billion. Spoiler: training costs are within reach now. And it’s about to get a lot cheaper.
  • Linden Li
    Applied Compute
    @lindensli
    Dec 9, 2023
    I’ll be giving a talk tomorrow at NeurIPS about the fundamentals of LLM inference. The talk will start by developing a first-principles, systems approach to reasoning about the inference workload and conclude with a survey of the current state of the art. Some concepts covered:
    61K
  • Linden Li
    Applied Compute
    @lindensli
    Oct 13, 2023
    Inference performance has typically been reported as one number: tokens/sec. This single number tells an incomplete story, since inference consists of two steps with dramatically different profiles: prefill and decoding. As a result, we think there are two metrics to care about when
    Image
    17K
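A minimal sketch of why a single tokens/sec figure can mislead, assuming a simple model in which a request costs one prefill pass plus a fixed time per decoded token. The function name, request shapes, and timings are hypothetical and chosen only for illustration; they are not measurements, and the two metrics the truncated post goes on to name are not reproduced here.

```python
def blended_tokens_per_sec(prompt_tokens, output_tokens, prefill_s, decode_s_per_token):
    """Return one blended throughput number: all tokens over total wall time."""
    total_s = prefill_s + output_tokens * decode_s_per_token
    return (prompt_tokens + output_tokens) / total_s


# Hypothetical request A: long prompt, short completion.
long_prompt = blended_tokens_per_sec(2000, 50, prefill_s=0.5, decode_s_per_token=0.03)
# Hypothetical request B: short prompt, long completion, same per-token decode speed.
long_output = blended_tokens_per_sec(50, 500, prefill_s=0.05, decode_s_per_token=0.03)

print(f"A: {long_prompt:.0f} tokens/sec, B: {long_output:.0f} tokens/sec")
```

Both requests decode at the same per-token speed, yet the blended number differs widely because prefill processes the whole prompt in one pass while decoding emits one token at a time.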
  • Linden Li
    Applied Compute
    @lindensli
    Oct 29, 2025
    Introducing @appliedcompute. We build Specific Intelligence, to create the first generation of agent workforces. It’s been an incredible six months since @ypatil125, @rhythmrg, and I left OpenAI to work on this problem together. We’ve brought together an insanely talent-dense
    Applied Compute
    @appliedcompute
    Oct 29, 2025
    Generalists are useful, but it’s not enough to be smart. Advances come from specialists, whether human or machine. To have an edge, agents need specific expertise, within specific companies, built on models trained on specific data. We call this Specific Intelligence. It's
    Image
    29K
  • Linden Li
    Applied Compute
    @lindensli
    Aug 12, 2022
    Replying to @lindensli
    I did this all on the @MosaicML cloud, which made training these models a lot faster and easier than I expected. Check out our blog post with all these findings at: mosaicml.com/blog/billion-p…
  • Linden Li
    Applied Compute
    @lindensli
    Dec 13, 2023
    I've posted my slides from a recent talk delivered at NeurIPS about the fundamentals of transformer inference onto my website here: linden-li.github.io/posts/inferenc…. Hope it's helpful and happy to answer any questions!
    Linden Li
    Applied Compute
    @lindensli
    Dec 9, 2023
    I’ll be giving a talk tomorrow at NeurIPS about the fundamentals of LLM inference. The talk will start by developing a first-principles, systems approach to reasoning about the inference workload and conclude with a survey of the current state of the art. Some concepts covered:
    19K
  • Linden Li
    Applied Compute
    @lindensli
    Mar 27, 2024
    Excited to release DBRX, a 132 billion parameter mixture of experts language model with 36 billion active parameters. It’s not only a super capable model, but has many nice properties at inference time because of its MoE architecture. Long context (up to 32K tokens), large batch
    9.2K
  • Linden Li
    Applied Compute
    @lindensli
    Aug 12, 2022
    Replying to @lindensli
    The price tag (*for now*): ~$4800 on 4 nodes of AWS p4d on-demand to train a 1.3B GPT model on 20B tokens (according to compute-optimal scaling from the @DeepMind Chinchilla paper). This is just with vanilla HuggingFace models without any optimizations.
    Image
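A rough back-of-the-envelope sketch of the arithmetic behind a figure like this. Two things here are assumptions not stated in the post: the common approximation that training takes about 6 FLOPs per parameter per token, and an on-demand price of roughly $32.77 per p4d.24xlarge node-hour. The function names are illustrative, and the derived wall-clock time is an implication of those assumptions, not a reported result.

```python
GPUS_PER_NODE = 8        # a p4d.24xlarge instance carries 8 A100 GPUs
P4D_HOURLY_USD = 32.77   # ASSUMPTION: approximate on-demand price per node-hour


def training_flops(params, tokens):
    """Approximate total training compute with the 6 * N * D rule of thumb."""
    return 6.0 * params * tokens


def implied_wall_clock_hours(total_cost_usd, nodes, hourly_usd=P4D_HOURLY_USD):
    """Hours of wall-clock time that a given budget buys on `nodes` nodes."""
    return total_cost_usd / (nodes * hourly_usd)


flops = training_flops(1.3e9, 20e9)              # 1.3B parameters, 20B tokens
hours = implied_wall_clock_hours(4800, nodes=4)  # ~$4800 on 4 nodes
gpu_hours = hours * 4 * GPUS_PER_NODE

print(f"{flops:.2e} FLOPs, {hours:.1f} h on 4 nodes, {gpu_hours:.0f} GPU-hours")
```

Swapping in a different hourly price or node count shows how sensitive the total is to each input.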
  • Linden Li
    Applied Compute
    @lindensli
    Sep 30, 2022
    Replying to @lindensli
    Here's the final cost table:
    Image
  • Linden Li
    Applied Compute
    @lindensli
    Aug 12, 2022
    Replying to @lindensli
    On a 1.3B parameter model, 4 nodes means a 3.9x gain over one node. On 16 nodes, it’s 14.4x.
    Image
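The speedups quoted above can be turned into a scaling efficiency, meaning the fraction of ideal linear speedup actually achieved. This is a small sketch using only the two figures from the post; the function name is illustrative.

```python
def scaling_efficiency(speedup, nodes):
    """Fraction of ideal linear speedup achieved relative to a single node."""
    return speedup / nodes


# (nodes, speedup over one node) as quoted in the post.
reported = [(4, 3.9), (16, 14.4)]
efficiencies = {nodes: scaling_efficiency(speedup, nodes) for nodes, speedup in reported}

for nodes, eff in efficiencies.items():
    print(f"{nodes} nodes: {eff:.1%} of linear scaling")
```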
  • Linden Li
    Applied Compute
    @lindensli
    Aug 12, 2022
    Replying to @lindensli
    But we observed that the performance penalty isn’t as harsh as you might think. Instead, we found near-linear strong scaling: fixing the global batch size and training on more GPUs led to proportional increases in training throughput.
    Image
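To make the strong-scaling setup concrete: with the global batch size fixed, each added GPU shrinks the per-GPU share of the batch, which is why a throughput penalty might be expected. This is a sketch only; the global batch size of 512 and the GPU counts are hypothetical values, not the configuration used in the thread.

```python
def per_gpu_batch(global_batch, num_gpus):
    """Per-GPU batch size under strong scaling (global batch held fixed)."""
    if global_batch % num_gpus != 0:
        raise ValueError("global batch must divide evenly across GPUs")
    return global_batch // num_gpus


GLOBAL_BATCH = 512  # hypothetical value for illustration

shares = {gpus: per_gpu_batch(GLOBAL_BATCH, gpus) for gpus in (8, 32, 128)}
for gpus, share in shares.items():
    print(f"{gpus} GPUs -> {share} samples per GPU per step")
```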
  • Linden Li
    Applied Compute
    @lindensli
    Sep 30, 2022
    Replying to @lindensli
    @DeepMind’s Chinchilla scaling laws found that large models like GPT3-175B and Gopher-280B could be trained using fewer parameters, but more data. They present an equation that can project the expected pretraining loss for a given number of model params and tokens.
    Image
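A minimal sketch of the kind of equation described above, assuming it is the parametric fit from the Chinchilla paper, L(N, D) = E + A / N^alpha + B / D^beta, with that paper's reported constants. The constants and the model sizes and token counts in the example come from the published papers rather than from this thread, so treat them as assumptions.

```python
# ASSUMPTION: constants of the parametric fit reported in the Chinchilla paper.
E, A, B = 1.69, 406.4, 410.7
ALPHA, BETA = 0.34, 0.28


def projected_loss(params, tokens):
    """Projected pretraining loss for a model with `params` parameters trained on `tokens` tokens."""
    return E + A / params**ALPHA + B / tokens**BETA


# Larger model on less data versus smaller model on more data (published configurations).
gopher_like = projected_loss(280e9, 300e9)      # 280B parameters, 300B tokens
chinchilla_like = projected_loss(70e9, 1.4e12)  # 70B parameters, 1.4T tokens

print(f"280B/300B tokens: {gopher_like:.3f}")
print(f"70B/1.4T tokens:  {chinchilla_like:.3f}")
```

Under this fit the smaller model trained on more tokens projects to a lower loss, which matches the claim that fewer parameters with more data can do better.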