Linden Li (@lindensli) / X

155 posts

Co-Founder @appliedcompute. Previously scaling @OpenAI, @DbrxMosaicAI, @NVIDIA

Stanford, CA

Joined April 2021

Pinned
Linden Li
@lindensli
Apr 8
We’ve been heads down on our mission to build Specific Intelligence for the enterprise. Today we’re announcing $80M in new funding to help us get there even faster. We’re hiring across engineering and research and working to deploy AI systems that improve the more you use them.
Applied Compute
@appliedcompute
Apr 8
Article
Applied Compute Raises $80M to Help Enterprises Advance from Generalized to Specific Intelligence
Models keep getting smarter, but there's a massive gap between raw intelligence and actual productivity on specific tasks inside companies. Delivering real value requires knowing how to perform those...
Linden Li
@lindensli
Sep 30, 2022
@abhi_venigalla and I turned @karpathy’s minGPT into a GPT-3 quality model with 30 billion parameters—projected to cost only $450k to train. The code to do so is public: it's easily readable and can be launched on however many GPUs you want. Here’s how:
Linden Li
@lindensli
Aug 12, 2022
Recently, @abhi_venigalla and I trained GPTs from scratch to see if we could train LLMs like well-resourced companies do. Here’s what I learned going from 125 million parameters to 1.3 billion. Spoiler: training costs are within reach now. And it’s about to get a lot cheaper.
Linden Li
@lindensli
Dec 9, 2023
I’ll be giving a talk tomorrow at NeurIPS about the fundamentals of LLM inference. The talk will start by developing a first-principles, systems approach to reasoning about the inference workload and conclude with a survey of the current state of the art. Some concepts covered:
Linden Li
@lindensli
Oct 13, 2023
Inference performance has typically been reported as one number: tokens/sec. This single number tells an incomplete story, since inference consists of two steps with dramatically different profiles: prefill and decoding. As a result, we think there are two metrics to care about when
Linden Li
@lindensli
Oct 29, 2025
Introducing @appliedcompute. We build Specific Intelligence, to create the first generation of agent workforces. It’s been an incredible six months since @ypatil125, @rhythmrg, and I left OpenAI to work on this problem together. We’ve brought together an insanely talent-dense
Applied Compute
@appliedcompute
Oct 29, 2025
Generalists are useful, but it’s not enough to be smart. Advances come from specialists, whether human or machine. To have an edge, agents need specific expertise, within specific companies, built on models trained on specific data. We call this Specific Intelligence. It's
Linden Li
@lindensli
Aug 12, 2022
Replying to @lindensli
I did this all on the @MosaicML cloud, which made training these models a lot faster and easier than I expected. Check out our blog post with all these findings at: mosaicml.com/blog/billion-p…
Linden Li
@lindensli
Dec 13, 2023
I've posted my slides from a recent talk delivered at NeurIPS about the fundamentals of transformer inference onto my website here: linden-li.github.io/posts/inferenc…. Hope it's helpful and happy to answer any questions!
Linden Li
@lindensli
Mar 27, 2024
Excited to release DBRX, a 132 billion parameter mixture of experts language model with 36 billion active parameters. It’s not only a super capable model, but has many nice properties at inference time because of its MoE architecture. Long context (up to 32K tokens), large batch
Linden Li
@lindensli
Aug 12, 2022
Replying to @lindensli
The price tag (*for now*): ~$4800 on 4 nodes of AWS p4d on-demand to train a 1.3B GPT model on 20B tokens (according to compute-optimal scaling from the @DeepMind Chinchilla paper). This is just with vanilla HuggingFace models without any optimizations.
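As a rough sanity check on the figure above, the sketch below converts the ~$4800 total into implied wall-clock hours and estimates training compute with the common 6 * N * D rule of thumb. The hourly price per p4d node (32.77 USD) and the 6 * N * D approximation are assumptions that do not come from the post; on-demand prices vary by region and over time, so treat the output as an order-of-magnitude estimate only.

```python
# Back-of-the-envelope check of the quoted training cost.
# ASSUMPTIONS (not from the post): the on-demand price per p4d node and the
# 6 * N * D rule of thumb for training FLOPs.

PARAMS = 1.3e9               # model size, from the post
TOKENS = 20e9                # training tokens, from the post
TOTAL_COST_USD = 4800.0      # approximate cost, from the post
NODES = 4                    # cluster size, from the post
PRICE_PER_NODE_HOUR = 32.77  # assumed on-demand USD/hour per node


def training_flops(params: float, tokens: float) -> float:
    """Approximate training compute: about 6 FLOPs per parameter per token."""
    return 6.0 * params * tokens


def implied_hours(total_cost: float, nodes: int, price_per_node_hour: float) -> float:
    """Wall-clock hours implied by a total bill on a fixed-size cluster."""
    return total_cost / (nodes * price_per_node_hour)


if __name__ == "__main__":
    flops = training_flops(PARAMS, TOKENS)
    hours = implied_hours(TOTAL_COST_USD, NODES, PRICE_PER_NODE_HOUR)
    print(f"Approximate training compute: {flops:.2e} FLOPs")
    print(f"Implied wall-clock time: {hours:.1f} hours on {NODES} nodes")
```

Swapping in a different hourly price or node count shows how sensitive the total is to cluster pricing.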
Linden Li
@lindensli
Sep 30, 2022
Replying to @lindensli
Here's the final cost table:
Linden Li
@lindensli
Aug 12, 2022
Replying to @lindensli
On a 1.3B parameter model, 4 nodes means a 3.9x gain over one node. On 16 nodes, it’s 14.4x.
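One way to read these numbers is as scaling efficiency, meaning the measured speedup divided by the ideal linear speedup. The sketch below computes that ratio from the two data points in the post; the function name is illustrative and not taken from any library.

```python
# Scaling efficiency = measured speedup / ideal (linear) speedup.
# The speedups come from the post; the helper name is illustrative.

SPEEDUPS = {4: 3.9, 16: 14.4}  # nodes -> throughput gain over one node


def scaling_efficiency(nodes: int, speedup: float) -> float:
    """Fraction of ideal linear scaling achieved on `nodes` nodes."""
    if nodes <= 0:
        raise ValueError("nodes must be positive")
    return speedup / nodes


if __name__ == "__main__":
    for nodes, speedup in SPEEDUPS.items():
        eff = scaling_efficiency(nodes, speedup)
        print(f"{nodes:>2} nodes: {speedup}x speedup, {eff:.1%} of linear")
```

With the figures quoted, this works out to 97.5% of linear on 4 nodes and 90% on 16 nodes.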
Linden Li
@lindensli
Aug 12, 2022
Replying to @lindensli
But we observed that the performance penalty isn’t as harsh as what you might think. Instead, we found near-linear strong scaling: fixing the global batch size and training on more GPUs led to proportional increases in training throughput.
Linden Li
@lindensli
Sep 30, 2022
Replying to @lindensli
@DeepMind’s Chinchilla scaling laws found that large models like GPT3-175B and Gopher-280B could be trained using fewer parameters, but more data. They present an equation that can project the expected pretraining loss for a given number of model params and tokens.
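For reference, the equation in question is usually written as L(N, D) = E + A / N^alpha + B / D^beta, where N is the parameter count and D is the number of training tokens. The sketch below evaluates it with the fitted constants as I recall them from the Chinchilla paper (E = 1.69, A = 406.4, B = 410.7, alpha = 0.34, beta = 0.28); these values and the two example configurations are assumptions that are not stated in the post, so check them against the paper before relying on the numbers.

```python
# Parametric loss from the Chinchilla paper: L(N, D) = E + A/N^alpha + B/D^beta.
# ASSUMPTION: the constants below are the paper's fitted values as commonly
# quoted; they are not given in the post.

E = 1.69        # irreducible loss term
A = 406.4       # coefficient on the parameter-count term
B = 410.7       # coefficient on the token-count term
ALPHA = 0.34    # exponent on parameters
BETA = 0.28     # exponent on tokens


def predicted_loss(params: float, tokens: float) -> float:
    """Projected pretraining loss for `params` parameters and `tokens` tokens."""
    if params <= 0 or tokens <= 0:
        raise ValueError("params and tokens must be positive")
    return E + A / params**ALPHA + B / tokens**BETA


if __name__ == "__main__":
    # Illustrative configurations (token counts are assumptions, not from the post):
    # a larger model on less data versus a smaller model on more data.
    large_less_data = predicted_loss(175e9, 300e9)
    small_more_data = predicted_loss(70e9, 1.4e12)
    print(f"175B params, 300B tokens: {large_less_data:.3f}")
    print(f" 70B params, 1.4T tokens: {small_more_data:.3f}")
```

Under these assumed constants, the smaller model trained on more tokens has the lower projected loss, which is the trade-off the post describes.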