Components of a coding agent: a little write-up on the building blocks behind coding agents, from repo context and tool use to memory and delegation.
Link: magazine.sebastianraschka.com/p/components-o…
Looks like the first open source equivalent of ChatGPT has arrived: github.com/lucidrains/PaL…
I.e., an implementation of RLHF (Reinforcement Learning from Human Feedback) on top of Google’s 540-billion-parameter PaLM architecture.
When I started LLMs-from-scratch I just hoped it might help a few people learn.
Just saw that the GitHub repo has now been forked 10k times!
More than the stars, the best part is seeing thousands of people actually use and build on the code ☺️
Maybe a hot take, but what about the following advice to the next gen:
Don't get an AI degree; the curriculum will be outdated before you graduate. Instead, study math, stats, or physics as your foundation, and stay current with AI through code-focused books, blogs, and papers.
Couldn't resist.
Here's a pure PyTorch from-scratch re-implementation of Gemma 3 270M in a Jupyter Notebook (uses about 1.49 GB RAM): github.com/rasbt/LLMs-fro…
Gemma 3 270M! Great to see another awesome, small open-weight LLM for local tinkering.
Here's a side-by-side comparison with Qwen3. Biggest surprise: it only has 4 attention heads!
"Simplifying Transformer Blocks" ranks easily among my favorite research papers that I've read this year.
Here, the authors look into how the standard transformer block, essential to LLMs, can be simplified without compromising convergence properties and downstream task
The Llama 3.2 1B and 3B models are my favorite LLMs -- small but very capable.
If you want to understand what the architectures look like under the hood, I implemented them from scratch (one of the best ways to learn): github.com/rasbt/LLMs-fro…
Just reorganized and uploaded all the TensorFlow and PyTorch models and methods I implemented for teaching in a fresh GitHub repo -- 80 Jupyter notebooks in total :) github.com/rasbt/deeplear…
I’ve been working on something new:
📚 Build a Reasoning Model (From Scratch).
The first chapters just went live!
(The book will cover topics from inference-time scaling to reinforcement learning)
"What Matters In Transformers?" is an interesting paper (arxiv.org/abs/2406.15786) that finds you can actually remove half of the attention layers in LLMs like Llama without noticeably reducing modeling performance.
The concept is relatively simple. The authors delete attention
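To make the idea concrete, here is a minimal PyTorch sketch of removing the attention sublayer from some transformer blocks so that only the residual path and the MLP remain. The `Block` class, the `drop_attention` helper, and the layer indices are illustrative assumptions on my part; they are not the paper's code, and the paper's way of choosing which layers to remove is not reproduced here.

```python
import torch
import torch.nn as nn


class Block(nn.Module):
    """Pre-norm transformer block whose attention sublayer can be removed."""

    def __init__(self, dim, num_heads):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x):
        if self.attn is not None:
            h = self.norm1(x)
            # Causal masking is omitted to keep the sketch short.
            attn_out, _ = self.attn(h, h, h, need_weights=False)
            x = x + attn_out
        # Without attention, x simply passes through on the residual path.
        x = x + self.mlp(self.norm2(x))
        return x


def drop_attention(blocks, layer_ids):
    """Remove the attention sublayer from the given blocks (hypothetical helper)."""
    for i in layer_ids:
        blocks[i].attn = None


def count_params(module):
    return sum(p.numel() for p in module.parameters())


def run(blocks, x):
    for block in blocks:
        x = block(x)
    return x


torch.manual_seed(0)
blocks = nn.ModuleList([Block(dim=64, num_heads=4) for _ in range(4)])
x = torch.randn(2, 8, 64)  # (batch, tokens, embedding dim)

params_before = count_params(blocks)
drop_attention(blocks, layer_ids=[1, 3])  # arbitrary placeholder choice
params_after = count_params(blocks)

out = run(blocks, x)
print(out.shape, params_before, params_after)
```

The output shape is unchanged because each sublayer only adds to the residual stream, which is what makes this kind of layer removal structurally possible in the first place. Whether quality holds up after removal depends on which layers are dropped, and this sketch says nothing about that.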
One of the best ways to understand LLMs is to code one from scratch!
Last summer, I started working on a new book, "Build a Large Language Model (from Scratch)": manning.com/books/build-a-…
I'm excited to share that the first chapters are now available via Manning's early access
My top-10 study list if I were learning machine learning again:
1. Python
2. Intro Data Science
3. Intro Machine Learning
4. Version Control
5. Intro Algos & Data Structures
6. Intro Linear Algebra
7. Intro Calculus
8. Deep Learning
9. Intro Proba & Stats
10. Parallel Computing
Just put together a short Jupyter notebook with tips and tricks for reducing memory usage when loading larger and larger models (like LLMs) in PyTorch: github.com/rasbt/LLMs-fro…
(PS: This is an LLM example but the same concepts apply to any PyTorch model)