Jonas Geiping
667 posts
Jonas Geiping
@jonasgeiping
Machine Learning Researcher in Tübingen at the ELLIS Institute & Max-Planck for Intelligent Systems // Working on Safety & Efficiency of modern ML
Tübingen AI Center, Germany
jonasgeiping.github.io
Joined September 2021
890
Following
5,269
Followers
  • Pinned
    Jonas Geiping
    @jonasgeiping
    May 13
    We’re training models wrong and it’s due to chatGPT. Even the modern coding agents used daily still use message-based exchanges: They send messages to users, to themselves (CoT) and to tools, and receive messages in turn. This bottlenecks even very intelligent agents to a single
    158K
  • Jonas Geiping
    @jonasgeiping
    Feb 10, 2025
    Ok, so I can finally talk about this! We spent the last year (actually a bit longer) training an LLM with recurrent depth at scale. The model has an internal latent space in which it can adaptively spend more compute to think longer. I think the tech report ...🐦‍⬛
    370K
  • Jonas Geiping
    @jonasgeiping
    Feb 10, 2025
    Replying to @jonasgeiping
    What is it doing when it thinks longer? We find evidence for pretty advanced structures in latent space, such as the tendency to use orbitals (see picture) to compute arithmetic tasks and reason about sentence structure. So, this model really is a 🔄 shape-rotator 🔄
    25K
  • Jonas Geiping
    @jonasgeiping
    Feb 10, 2025
    Replying to @jonasgeiping
    You can find the model here: huggingface.co/tomg-group-umd… The code here: github.com/seal-rg/recurr… and the tech report here: arxiv.org/abs/2502.05171 All data is public, and intermediate checkpoints are available!
    tomg-group-umd/huginn-0125 · Hugging Face
    From huggingface.co
    15K
  • Jonas Geiping
    @jonasgeiping
    Jun 30, 2025
    (Structured) Model pruning is a nice tool when you really need to deploy a model that is a *bit* smaller, but don't want to deploy a bigger hammer like quantization. We recently published an improved *automated* model pruning method, surprisingly based on model merging:
    36K
  • Jonas Geiping
    @jonasgeiping
    Feb 10, 2025
    Replying to @jonasgeiping
    has something for everyone, new model architecture, optimizer details, AMD training (we trained on 4096 AMD GPUs), our data pipeline, pretraining details, and lots of analysis! Here are a few of my highlights:
    23K
  • Jonas Geiping
    @jonasgeiping
    Jan 6, 2023
    Last week, @tomgoldsteincs and I finally put our paper on cramming BERT training into limited resources on arXiv: arxiv.org/abs/2212.14034. Here are some remaining thoughts from my side: 1/9
    arxiv.org
    Cramming: Training a Language Model on a Single GPU in One Day
    Recent trends in language modeling have focused on increasing performance through scaling, and have resulted in an environment where training language models is out of reach for most researchers...
    39K
  • Jonas Geiping
    @jonasgeiping
    Feb 10, 2025
    Replying to @jonasgeiping
    First, the model (with 3.5B params), even though trained semi-optimally, and for 800B tokens, is competitive with 7B open-source models trained for 2-3T tokens (OLMo-v1) - but we can't beat the new OLMo data recipe (yet). This is pretty exciting, for our first large-scale run
    23K
  • Jonas Geiping
    @jonasgeiping
    Oct 19, 2023
    Happy to announce that I've joined the ELLIS Institute and the Max-Planck for Intelligent Systems in Tübingen as a group leader! I'm excited to take a deep dive into the safety, security and efficiency of machine learning in the next years, working with both institutes 🦭.
    Town square Tübingen
    MPI-IS
    Intelligent Systems
    @MPI_IS
    Oct 19, 2023
    The first Group Leaders join our ELLIS Institute #Tübingen gGmbH as Hector Endowed Fellows: @orvieto_antonio, @wielandbr, @CeleMenDu, @jonasgeiping They will conduct cutting-edge #research in close collaboration with us and the Tübingen #AI Center: bit.ly/3PYCCSF
    50K
  • Jonas Geiping
    @jonasgeiping
    Feb 10, 2025
    Replying to @jonasgeiping
    What is pretty exciting is that simply by training with our arch and objective, a separation emerges from scale - the model's latents converge quicker for some tokens in a sentence than others. In this figure the model takes more time to think about the key parts of the text:
    18K
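
The idea that latents converge faster for some tokens than for others can be illustrated with a toy convergence check. This is only a sketch under stated assumptions, not the model's actual exit criterion: `steps_until_converged`, the tolerance `tol`, the cap `max_steps`, and the scalar "contraction rate" updates are all hypothetical stand-ins for a recurrent block applied to a per-token latent state.

```python
def steps_until_converged(update, state, tol=1e-3, max_steps=64):
    """Apply `update` repeatedly until the latent state stops changing.

    Returns (number of steps used, final state). Hypothetical helper:
    `update` stands in for one pass through a recurrent block.
    """
    for step in range(1, max_steps + 1):
        new_state = update(state)
        # Largest coordinate-wise change between consecutive iterates.
        change = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if change < tol:
            return step, state
    return max_steps, state


def make_toy_update(rate):
    """Toy stand-in for a recurrent block: a contraction with the given rate."""
    return lambda s: [rate * x + 1.0 for x in s]


# An "easy" token contracts quickly, a "hard" token contracts slowly.
easy_steps, _ = steps_until_converged(make_toy_update(0.1), [0.0, 0.0])
hard_steps, _ = steps_until_converged(make_toy_update(0.9), [0.0, 0.0])
print(easy_steps, hard_steps)
```

In this toy setting the slowly contracting state needs many more iterations (here it hits the step cap) than the quickly contracting one, which mirrors the per-token difference in "thinking time" described in the post.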
  • Jonas Geiping
    @jonasgeiping
    Jul 19, 2024
    Modern LLMs have large vocab sizes and long seq lengths which leads to an annoying peak in memory due to logit activations... .... so, I wasted some time last month writing a fused triton kernel to do nn.Linear + nn.CrossEntropyLoss without a memory peak ⬇️
    23K
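
The memory peak mentioned above comes from materializing the full (tokens × vocab) logit matrix before the loss. A minimal forward-only sketch of the underlying idea, in plain Python rather than Triton, is to compute logits one chunk of tokens at a time and reduce each chunk to its loss contribution immediately. This is not the author's kernel: the function name `chunked_linear_cross_entropy`, the `chunk_size` parameter, and the list-of-lists layout are assumptions for illustration, and a real fused kernel would also handle the backward pass.

```python
import math


def chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=2):
    """Mean cross-entropy of the logits hidden @ weight^T against targets.

    hidden:  N vectors of length D (one per token)
    weight:  V vectors of length D (one per vocabulary entry, no bias)
    targets: N class indices in [0, V)
    Only chunk_size x V logits exist at any one time.
    """
    total = 0.0
    for start in range(0, len(hidden), chunk_size):
        rows = hidden[start:start + chunk_size]
        # Logits for this chunk only.
        logits = [[sum(h * w for h, w in zip(row, w_row)) for w_row in weight]
                  for row in rows]
        for row_logits, target in zip(logits, targets[start:start + chunk_size]):
            # Numerically stable log-sum-exp, then the negative log-likelihood.
            peak = max(row_logits)
            log_norm = peak + math.log(sum(math.exp(x - peak) for x in row_logits))
            total += log_norm - row_logits[target]
    return total / len(hidden)


hidden = [[0.5, -1.0], [1.5, 0.25], [-0.75, 2.0]]
weight = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]
targets = [0, 3, 1]
print(chunked_linear_cross_entropy(hidden, weight, targets, chunk_size=2))
```

The result does not depend on `chunk_size`; only the size of the temporary logits buffer does, which is the trade-off the post is pointing at.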
  • Jonas Geiping
    @jonasgeiping
    Jun 1, 2023
    How can you watermark the output of a diffusion model? Ideally, with a method that can be easily incorporated into existing pipelines, is invisible, and is robust to image manipulations? We look at this question in ***Tree-Ring Watermarks: Fingerprints for Diffusion Images***
    78K
  • Jonas Geiping
    @jonasgeiping
    Feb 10, 2025
    Replying to @jonasgeiping
    We had enough compute for only a single shot to train at scale (and that is the model we've published). On reasoning tasks like GSM8k, the model is pretty competitive, even compared to other pretrained open-source models, even though we have done no post/mid-training...
    20K
  • Jonas Geiping
    @jonasgeiping
    Feb 28, 2025
    Modern LLMs are strong coders, as measured, for example, by their Codeforces rating. But are they also as capable of finding subtle bugs? We look at debugging Codeforces submissions, and find that finding errors (and falsifying) is still very challenging. More details here:
    Shiven Sinha
    @shiven_sinha
    Feb 28, 2025
    AI can generate correct-seeming hypotheses (and papers!). Brandolini's law states BS is harder to refute than generate. Can LMs falsify incorrect solutions? o3-mini (high) scores just 9% on our new benchmark REFUTE. Verification is not necessarily easier than generation 🧵
    18K