Tim Dettmers (@Tim_Dettmers) / X

3,840 posts

Tim Dettmers

@Tim_Dettmers

Creator of bitsandbytes. Professor @CarnegieMellon and Research Scientist @allen_ai. I blog about deep learning and PhD life at timdettmers.com.

Pittsburgh, PA

timdettmers.com/about

Joined October 2012

Pinned
Tim Dettmers
@Tim_Dettmers
Jul 30, 2024
After 7 months on the job market, I am happy to announce:
- I joined @allen_ai
- Professor at @CarnegieMellon from Fall 2025
- New bitsandbytes maintainer @Titus_vK
My main focus will be to strengthen open-source for real-world problems and bring the best AI to laptops 🧵
Tim Dettmers
@Tim_Dettmers
May 24, 2023
QLoRA: 4-bit finetuning of LLMs is here! With it comes Guanaco, a chatbot on a single GPU, achieving 99% ChatGPT performance on the Vicuna benchmark:
Paper: arxiv.org/abs/2305.14314
Code+Demo: github.com/artidoro/qlora
Samples: colab.research.google.com/drive/1kK6xasH…
Colab: colab.research.google.com/drive/17XEqL1J…
Tim Dettmers
@Tim_Dettmers
Nov 12, 2024
This is the most important paper in a long time. It shows with strong evidence we are reaching the limits of quantization. The paper says this: the more tokens you train on, the more precision you need. This has broad implications for the entire field and the future of GPUs 🧵
Tanishq Kumar
@tanishqkumar07
Nov 11, 2024
[1/7] New paper alert! Heard about the BitNet hype or that Llama-3 is harder to quantize? Our new work studies both! We formulate scaling laws for precision, across both pre and post-training arxiv.org/pdf/2411.04330. TLDR; - Models become harder to post-train quantize as they
Tim Dettmers
@Tim_Dettmers
Oct 8, 2021
I am excited to share my latest work: 8-bit optimizers – a replacement for regular optimizers. Faster 🚀, 75% less memory 🪶, same performance 📈, no hyperparam tuning needed 🔢. 🧵/n
Paper: arxiv.org/abs/2110.02861
Library: github.com/facebookresear…
Video: youtube.com/watch?v=IxrlHA…
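One way to read the "75% less memory" figure is as a statement about optimizer state. The sketch below is a back-of-the-envelope calculation, not code from the library. It assumes an Adam-style optimizer that keeps two states per parameter, a hypothetical 1B-parameter model, and it ignores the small overhead of any quantization constants.

```python
def adam_state_bytes(num_params: int, bits_per_state: int) -> int:
    """Bytes used by the two per-parameter Adam states (first and second moment)."""
    states_per_param = 2  # assumption: Adam-style optimizer
    return num_params * states_per_param * bits_per_state // 8


num_params = 1_000_000_000  # hypothetical 1B-parameter model

full = adam_state_bytes(num_params, 32)   # 32-bit optimizer states
small = adam_state_bytes(num_params, 8)   # 8-bit optimizer states
saving = 1 - small / full

print(f"32-bit states: {full / 1e9:.1f} GB")
print(f"8-bit states:  {small / 1e9:.1f} GB")
print(f"saving:        {saving:.0%}")
```

Under these assumptions the state shrinks from 8 bytes to 2 bytes per parameter, which matches the 75% quoted in the post. Weights, gradients, and activations are not counted here.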
Tim Dettmers
@Tim_Dettmers
Sep 8, 2025
It feels like the coding agent frontier is now open-weights: GLM 4.5 costs only $3/month and is on par with Sonnet. Kimi K2.1 Turbo is 3x the speed and 7x cheaper than Opus 4.1, but as good. Kimi K2.1 feels clean. The best model for me. GPT-5 is only good for complicated specs -- too slow.
Tim Dettmers
@Tim_Dettmers
May 6, 2023
Replying to @karpathy
Super excited to push this even further:
- Next week: bitsandbytes 4-bit closed beta that allows you to finetune 30B/65B LLaMA models on a single 24/48 GB GPU (no degradation vs full fine-tuning in 16-bit)
- Two weeks: Full release of code, paper, and a collection of 65B models
Tim Dettmers
@Tim_Dettmers
Jun 6, 2023
We present SpQR, which allows lossless LLM inference at 4.75 bits with a 15% speedup. You can run a 33B LLM on a single 24GB GPU fully lossless. SpQR works by isolating sensitive weights with higher precision and roughly doubles improvements from GPTQ: arxiv.org/abs/2306.03078 🧵
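The claim that a 33B model fits on a 24GB GPU at 4.75 bits can be sanity-checked with simple arithmetic. This is a rough sketch of weight storage only: it assumes exactly 33e9 parameters, counts 1 GB as 1e9 bytes, and ignores activations, the KV cache, and framework overhead, all of which need additional memory in practice.

```python
def weight_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


params_33b = 33e9   # assumption: exactly 33 billion parameters
gpu_gb = 24         # GPU memory quoted in the post

spqr_gb = weight_gb(params_33b, 4.75)  # average bits per weight quoted in the post
fp16_gb = weight_gb(params_33b, 16)    # 16-bit baseline for comparison

print(f"4.75-bit weights: {spqr_gb:.1f} GB (fits in {gpu_gb} GB: {spqr_gb < gpu_gb})")
print(f"16-bit weights:   {fp16_gb:.1f} GB (fits in {gpu_gb} GB: {fp16_gb < gpu_gb})")
```

Under these assumptions the 4.75-bit weights take about 19.6 GB, leaving a few GB of headroom on a 24GB card, while the 16-bit weights take 66 GB and do not fit.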
Tim Dettmers
@Tim_Dettmers
Aug 17, 2022
We release LLM.int8(), the first 8-bit inference method that saves 2x memory and does not degrade performance for 175B models by exploiting emergent properties. Read More:
Paper: arxiv.org/abs/2208.07339
Software: huggingface.co/blog/hf-bitsan…
Emergence: timdettmers.com/2022/08/17/llm…
Tim Dettmers
@Tim_Dettmers
Apr 8, 2020
How can you successfully train transformers on small datasets like PTB and WikiText-2? Are LSTMs better on small datasets? I ran 339 experiments worth 568 GPU hours and came up with some answers. I do not have time to write a blog post, so here is a Twitter thread instead. 1/n
Tim Dettmers
@Tim_Dettmers
Dec 26, 2024
Reading the report, this is such clean engineering under resource constraints. The DeepSeek team directly engineered solutions to known problems under hardware constraints. All of this looks so elegant -- no fancy "academic" solutions, just pure, solid engineering. Respect 👏
DeepSeek
@deepseek_ai
Dec 26, 2024
🚀 Introducing DeepSeek-V3! Biggest leap forward yet: ⚡ 60 tokens/second (3x faster than V2!) 💪 Enhanced capabilities 🛠 API compatibility intact 🌍 Fully open-source models & papers 🐋 1/n
Tim Dettmers
@Tim_Dettmers
Aug 10, 2022
We release the public beta for bnb-int8 🟪 for all @huggingface 🤗 models, which allows for Int8 inference without performance degradation up to scales of 176B params 📈. You can run OPT-175B/BLOOM-176B easily on a single machine 🖥️. You can try it here: docs.google.com/document/d/1Jx… 1/n
Tim Dettmers
@Tim_Dettmers
Sep 7, 2020
Updated GPU recommendations for the new Ampere RTX 30 series are live! Performance benchmarks, architecture details, Q&A of frequently asked questions, and detailed explanations of how GPUs and Tensor Cores work for those that want to learn more: timdettmers.com/2020/09/07/whi…
Tim Dettmers
@Tim_Dettmers
Sep 12, 2025
I should really write a blog post about how attention sinks relate to outliers and information processing in transformers. Almost all data is out there in papers, and if you pull things together it is easier to understand what is going on
tensorqt
@tensorqt
Aug 24, 2025
attention sinks may be a bias in causal transformers. as some of you know, i've been writing a long blogpost on attention and its properties as a message-passing operation on graphs. while doing so, i figured i might have found an explanation for which attention sinks may be an
Tim Dettmers
@Tim_Dettmers
Jan 16, 2023
In the RTX 40 post, I introduce a GPU recommendation chart and discuss the new Tensor Memory Accelerator (TMA) and FP8 computation. Overall, RTX 40s are faster for inference and shine through their FP8 performance but are inefficient for 16-bit training. timdettmers.com/2023/01/16/whi…