Sebastian Raschka (@rasbt) / X

19.8K posts

Sebastian Raschka

@rasbt

ML/AI research engineer. Ex stats professor. Author of "Build a Large Language Model From Scratch" (amzn.to/4fqvn0D) & reasoning (mng.bz/lZ5B)

United States

Joined October 2012

Pinned
Sebastian Raschka
@rasbt
Apr 4
Components of a coding agent: a little write-up on the building blocks behind coding agents, from repo context and tool use to memory and delegation. Link: magazine.sebastianraschka.com/p/components-o…
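As a rough illustration only, here is a toy Python sketch of the loop that ties tool use and memory together in a coding agent: a model proposes an action, the harness runs the matching tool, and the result is appended to a running transcript. The `scripted_model` function is a hypothetical stand-in for an LLM call. The tool names and the in-memory repo are made up for this example, and delegation to sub-agents is not shown. It is not the design from the linked write-up.

```python
from typing import Callable

# Hypothetical tool registry: maps a tool name to a plain Python function.
TOOLS: dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function as a tool under its own name."""
    TOOLS[fn.__name__] = fn
    return fn


# Made-up in-memory "repository" standing in for real repo context.
FAKE_REPO = {"README.md": "# demo", "main.py": "print('hi')"}


@tool
def list_files() -> str:
    return "\n".join(sorted(FAKE_REPO))


@tool
def read_file(path: str) -> str:
    return FAKE_REPO.get(path, f"error: {path} not found")


def scripted_model(memory: list[tuple[str, str]]) -> dict:
    """Hypothetical stand-in for an LLM: picks the next action from the transcript."""
    if not memory:
        return {"tool": "list_files", "args": {}}
    if len(memory) == 1:
        return {"tool": "read_file", "args": {"path": "main.py"}}
    return {"tool": "finish", "answer": memory[-1][1]}


def run_agent(model: Callable[[list], dict], max_steps: int = 5):
    """Ask the model for an action, run the tool, record the result, repeat."""
    memory: list[tuple[str, str]] = []  # transcript of (tool name, tool output)
    for _ in range(max_steps):
        action = model(memory)
        if action["tool"] == "finish":
            return action["answer"], memory
        result = TOOLS[action["tool"]](**action["args"])
        memory.append((action["tool"], result))
    return None, memory  # step budget used up without a final answer


answer, transcript = run_agent(scripted_model)
```

The `max_steps` cap is a simple guard against an agent that never decides to finish; a real harness would also feed the transcript back into the model's prompt.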
Sebastian Raschka
@rasbt
Dec 28, 2022
Looks like the first open source equivalent of ChatGPT has arrived: github.com/lucidrains/PaL… I.e., an implementation of RLHF (Reinforcement Learning from Human Feedback) on top of Google’s 540 billion parameter PaLM architecture
GitHub - lucidrains/PaLM-rlhf-pytorch: Implementation of RLHF (Reinforcement Learning with Human...
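In a typical RLHF pipeline (as popularized by InstructGPT), a reward model is first trained on pairs of responses where humans preferred one over the other, and the policy is then optimized against that reward model, for example with PPO. The sketch below shows only the pairwise reward-model loss, -log(sigmoid(r_chosen - r_rejected)), in plain Python. It is a generic illustration, not code from the linked repository.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Bradley-Terry style loss: -log(sigmoid(r_chosen - r_rejected)).

    Uses the identity -log(sigmoid(d)) = softplus(-d) together with a stable
    softplus, so large reward gaps do not overflow math.exp.
    """
    x = r_rejected - r_chosen  # this is -d
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mean_pairwise_loss(pairs: list[tuple[float, float]]) -> float:
    """Average loss over (chosen_reward, rejected_reward) pairs."""
    return sum(pairwise_reward_loss(c, r) for c, r in pairs) / len(pairs)


# The loss shrinks as the reward model scores the preferred response higher.
print(mean_pairwise_loss([(2.0, -1.0), (0.5, 0.4)]))
```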
Sebastian Raschka
@rasbt
Sep 13, 2025
When I started LLMs-from-scratch I just hoped it might help a few people learn. Just saw that the GitHub repo has now been forked 10k times! More than the stars, the best part is seeing thousands of people actually use and build on the code ☺️
Sebastian Raschka
@rasbt
Jul 12, 2025
Kimi K2 is basically DeepSeek V3 but with fewer heads and more experts:
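"More experts" refers to the mixture-of-experts (MoE) layers. The minimal sketch below shows a generic top-k router: router logits become probabilities, only the k highest-scoring experts run for a token, and their outputs are mixed by the renormalized weights. This is a simplified softmax router for illustration; the gating used in DeepSeek V3 and Kimi K2 differs in details, and the toy experts are hypothetical scalar functions.

```python
import math
from typing import Callable


def softmax(logits: list[float]) -> list[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def route(router_logits: list[float], top_k: int) -> list[tuple[int, float]]:
    """Return (expert index, weight) for the top_k experts; weights sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x: float, experts: list[Callable[[float], float]],
                router_logits: list[float], top_k: int) -> float:
    """Only the selected experts are evaluated; the others are skipped entirely."""
    return sum(w * experts[i](x) for i, w in route(router_logits, top_k))


# Toy experts: expert k multiplies its input by k (hypothetical, for illustration).
experts = [lambda x, k=k: k * x for k in range(4)]
print(route([0.1, 2.0, -1.0, 1.5], top_k=2))  # experts 1 and 3 are selected
print(moe_forward(1.0, experts, [0.1, 2.0, -1.0, 1.5], top_k=2))
```

Because only `top_k` experts run per token, adding more experts grows the total parameter count while the per-token expert compute stays tied to `top_k`.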
Sebastian Raschka
@rasbt
Feb 9, 2025
Maybe a hot take, but what about the following advice to the next gen: Don't get an AI degree; the curriculum will be outdated before you graduate. Instead, study math, stats, or physics as your foundation, and stay current with AI through code-focused books, blogs, and papers.
Sebastian Raschka
@rasbt
Aug 17, 2025
Couldn't resist. Here's a pure PyTorch from-scratch re-implementation of Gemma 3 270M in a Jupyter Notebook (uses about 1.49 GB RAM): github.com/rasbt/LLMs-fro…
Sebastian Raschka
@rasbt
Aug 14, 2025
Gemma 3 270M! Great to see another awesome, small open-weight LLM for local tinkering. Here's a side-by-side comparison with Qwen3. The biggest surprise: it only has 4 attention heads!
Sebastian Raschka
@rasbt
Nov 27, 2023
"Simplifying Transformer Blocks" ranks easily among my favorite research papers that I've read this year. Here, the authors look into how the standard transformer block, essential to LLMs, can be simplified without compromising convergence properties and downstream task
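One of the simplifications the paper considers is replacing the standard sequential block, where the MLP sees the attention sub-block's output, with a "parallel" formulation (also used in models such as GPT-J and PaLM) where both sub-blocks read the same input. The sketch below contrasts the two in plain Python on a single token vector. `attn` and `mlp` are placeholder functions, one shared `layer_norm` stands in for the separate norm layers a real block would have, and the paper's other changes (such as removing skip connections or value/projection parameters) are not shown.

```python
import math
from typing import Callable

Vec = list[float]


def layer_norm(x: Vec, eps: float = 1e-5) -> Vec:
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def add(a: Vec, b: Vec) -> Vec:
    return [u + v for u, v in zip(a, b)]


def sequential_block(x: Vec, attn: Callable[[Vec], Vec], mlp: Callable[[Vec], Vec]) -> Vec:
    # Standard pre-LN block: the MLP consumes the attention sub-block's output.
    h = add(x, attn(layer_norm(x)))
    return add(h, mlp(layer_norm(h)))


def parallel_block(x: Vec, attn: Callable[[Vec], Vec], mlp: Callable[[Vec], Vec]) -> Vec:
    # Parallel block: both sub-blocks read the same normalized input,
    # and their outputs are added to the residual stream together.
    n = layer_norm(x)
    return add(add(x, attn(n)), mlp(n))


# Placeholder sub-blocks (not real attention or MLP layers).
attn = lambda v: [a * a for a in v]
mlp = lambda v: [max(a, 0.0) for a in v]
x = [1.0, -2.0, 3.0]
print(sequential_block(x, attn, mlp))
print(parallel_block(x, attn, mlp))
```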
Sebastian Raschka
@rasbt
Oct 5, 2024
The Llama 3.2 1B and 3B models are my favorite LLMs -- small but very capable. If you want to understand what the architectures look like under the hood, I implemented them from scratch (one of the best ways to learn): github.com/rasbt/LLMs-fro…
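As one concrete example of a Llama building block, Llama models use RMSNorm instead of LayerNorm. A minimal plain-Python sketch is below; the `eps` default is an assumption (implementations pick different values), and a real implementation would operate on tensors and learn `weight` during training.

```python
import math


def rms_norm(x: list[float], weight: list[float], eps: float = 1e-5) -> list[float]:
    """RMSNorm: divide by the root-mean-square of x, then apply a per-feature scale.

    Unlike LayerNorm, there is no mean subtraction and no bias term.
    """
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]


out = rms_norm([3.0, 4.0], weight=[1.0, 1.0], eps=0.0)
print(out)  # the mean of the squared outputs is 1.0, up to rounding
```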
Sebastian Raschka
@rasbt
Jun 5, 2019
Just reorganized and uploaded all the TensorFlow and PyTorch models and methods I implemented for teaching in a fresh GitHub repo -- 80 Jupyter notebooks in total :) github.com/rasbt/deeplear…
Sebastian Raschka
@rasbt
Aug 28, 2025
I’ve been working on something new: 📚 Build a Reasoning Model (From Scratch). The first chapters just went live! (The book will cover topics from inference-time scaling to reinforcement learning)
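One widely used inference-time scaling technique is self-consistency: sample several reasoning traces for the same prompt and return the most frequent final answer. The sketch below assumes the sampled completions are already available as strings and uses a deliberately naive, hypothetical `extract_final_number` helper (it takes the last number in the text). It is not taken from the book.

```python
import re
from collections import Counter
from typing import Optional


def extract_final_number(text: str) -> Optional[str]:
    """Hypothetical answer extractor: returns the last number in the text, if any."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    return numbers[-1] if numbers else None


def self_consistency(completions: list[str]) -> Optional[str]:
    """Majority vote over extracted answers; ties go to the earliest answer seen."""
    answers = [a for a in map(extract_final_number, completions) if a is not None]
    if not answers:
        return None
    counts = Counter(answers)
    best = max(counts.values())
    return next(a for a in answers if counts[a] == best)


samples = [
    "6 * 7 = 42, so the answer is 42",
    "I think it is 41",
    "Therefore the result is 42",
    "not sure",
]
print(self_consistency(samples))  # 42
```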
Sebastian Raschka
@rasbt
Oct 22, 2024
"What Matters In Transformers?" is an interesting paper (arxiv.org/abs/2406.15786) that finds you can actually remove half of the attention layers in LLMs like Llama without noticeably reducing modeling performance. The concept is relatively simple. The authors delete attention
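The core idea can be sketched as a redundancy score per attention sub-layer: if a sub-layer's output is almost identical to its input (cosine similarity close to 1), it changes the hidden states very little and is a candidate for removal. The plain-Python sketch below assumes one representative (input, output) hidden-state pair per layer has already been collected, for example averaged over some calibration data. The exact scoring and collection procedure in the paper may differ, and `layers_to_drop` is a hypothetical helper.

```python
import math

Vec = list[float]


def cosine_similarity(a: Vec, b: Vec) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def layers_to_drop(layer_io: list[tuple[Vec, Vec]], n_drop: int) -> list[int]:
    """Return indices of the n_drop layers whose output is most similar to their input."""
    # Importance = 1 - similarity: a layer that barely transforms its input scores low.
    importance = [1.0 - cosine_similarity(inp, out) for inp, out in layer_io]
    least_important = sorted(range(len(layer_io)), key=lambda i: importance[i])[:n_drop]
    return sorted(least_important)


# Toy (input, output) pairs for four hypothetical attention layers.
layer_io = [
    ([1.0, 0.0], [1.0, 0.01]),  # nearly unchanged
    ([1.0, 0.0], [0.0, 1.0]),   # strongly transformed
    ([1.0, 1.0], [1.0, 1.02]),  # nearly unchanged
    ([0.0, 1.0], [1.0, 1.0]),   # moderately transformed
]
print(layers_to_drop(layer_io, n_drop=2))  # [0, 2]
```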
Sebastian Raschka
@rasbt
Dec 17, 2023
One of the best ways to understand LLMs is to code one from scratch! Last summer, I started working on a new book, "Build a Large Language Model (from Scratch)": manning.com/books/build-a-… I'm excited to share that the first chapters are now available via Manning's early access
Build a Large Language Model (From Scratch) - Sebastian Raschka
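To illustrate the kind of component such a from-scratch build involves, here is a tiny plain-Python sketch of single-head scaled dot-product attention with a causal mask, so each position only attends to itself and earlier positions. It assumes the query, key, and value vectors have already been computed; the trainable projection matrices, multiple heads, dropout, and batching that a real implementation needs are omitted, and this is not the code from the book.

```python
import math

Vec = list[float]


def softmax(xs: Vec) -> Vec:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention(queries: list[Vec], keys: list[Vec], values: list[Vec]) -> list[Vec]:
    """Single-head scaled dot-product attention with a causal mask."""
    d_k = len(keys[0])
    outputs = []
    for i, q in enumerate(queries):
        # Causal mask: position i only scores keys at positions 0..i.
        scores = [sum(a * b for a, b in zip(q, keys[j])) / math.sqrt(d_k) for j in range(i + 1)]
        weights = softmax(scores)
        # Context vector: attention-weighted sum of the visible value vectors.
        outputs.append([
            sum(weights[j] * values[j][c] for j in range(i + 1))
            for c in range(len(values[0]))
        ])
    return outputs


# With all-zero queries and keys, every visible position gets equal weight,
# so each output is the running mean of the value vectors seen so far.
zeros = [[0.0, 0.0]] * 3
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(causal_attention(zeros, zeros, values))  # approximately [[1, 2], [2, 3], [3, 4]]
```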
Sebastian Raschka
@rasbt
Oct 26, 2022
My top-10 study list if I were learning machine learning again:
1. Python
2. Intro Data Science
3. Intro Machine Learning
4. Version Control
5. Intro Algos & Data Structures
6. Intro Linear Algebra
7. Intro Calculus
8. Deep Learning
9. Intro Probability & Stats
10. Parallel Computing
Sebastian Raschka
@rasbt
Oct 14, 2024
Just put together a short Jupyter notebook with tips and tricks for reducing memory usage when loading larger and larger models (like LLMs) in PyTorch: github.com/rasbt/LLMs-fro… (PS: This is an LLM example but the same concepts apply to any PyTorch model)
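A back-of-the-envelope check helps when thinking about such memory savings: the weights alone take (number of parameters) × (bytes per parameter). The sketch below does that arithmetic for a 270M-parameter model like the Gemma 3 270M mentioned earlier. It ignores activations, KV cache, framework overhead, and temporary copies made while loading (for example, holding a full state dict in memory alongside the initialized model), which is what loading tricks aim to reduce, so real peak usage is higher.

```python
# Bytes per parameter for common storage dtypes.
BYTES_PER_PARAM = {
    "float32": 4,
    "bfloat16": 2,
    "float16": 2,
    "int8": 1,
}


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for dtype in ("float32", "bfloat16"):
    print(dtype, round(weight_memory_gb(270e6, dtype), 2), "GB")
# float32 1.08 GB
# bfloat16 0.54 GB
```

Halving the bytes per parameter halves the weight memory, which is why loading directly in a lower-precision dtype is one of the simplest savings.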