emozilla
Nous Research
1,900 posts
emozilla
Nous Research
@theemozilla
catholic, ai researcher, co-founder/cto of @NousResearch alignment: whatever the opposite of yudkowsky + bryan johnson is. blessed be God in all his designs.
jeffq.com
Joined November 2008
1,174
Following
12.3K
Followers
  • Pinned
    emozilla
    Nous Research
    @theemozilla
    Apr 19
    O to grace how great a debtor daily I'm constrained to be! let thy goodness, like a fetter, bind my wandering heart to thee. prone to wander, Lord I feel it! prone to leave the God I love... here's my heart, O take it and seal it, seal it for thy courts above
    9.5K
  • emozilla
    Nous Research
    @theemozilla
    Sep 5, 2025
    Amusing how 99% of people trying to explain LLMs forget that they don't generate the next token, they generate a probability distribution over the entire vocabulary space that the end application is free to sample from. You are very often not presented with the Most Likely Token
    Gergely Orosz
    The Pragmatic Engineer
    @GergelyOrosz
    Sep 4, 2025
    Amusing how 99% of people using LLMs forget how these things work: They are advanced probability machines. They generate the next most likely token (word) based on the input and their training. Under the hood, it’s a giant matrix multiplication that has eerily good output.
    857K
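As a rough illustration of the point made in the Sep 5, 2025 post, here is a minimal sketch of the difference between always taking the most likely token and sampling from the model's distribution. The toy vocabulary, the logits, and the helper names (`softmax`, `greedy`, `sample`) are assumptions made up for this example; they do not come from any real model or library.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution over the vocabulary."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy(probs):
    """Index of the single most likely token."""
    return max(range(len(probs)), key=probs.__getitem__)

def sample(probs, rng):
    """Draw one token index at random, weighted by its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical four-word vocabulary and made-up model outputs.
vocab = ["cat", "dog", "fish", "bird"]
logits = [2.0, 1.5, 0.5, 0.1]

probs = softmax(logits, temperature=0.7)
rng = random.Random(0)  # fixed seed so the run is repeatable

draws = [vocab[sample(probs, rng)] for _ in range(1000)]
counts = {word: draws.count(word) for word in vocab}

print("most likely token:", vocab[greedy(probs)])
print("sampled counts:", counts)
```

In this toy setup the most likely token has a probability of roughly 0.6, so a sampling application returns one of the other tokens in a large share of draws.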
  • emozilla
    Nous Research
    @theemozilla
    May 8, 2025
    Replying to @AccidentalCISO
    oh absolutely, it's a giant cosmic wet blanket. it limits latencies across the globe on the order that humans can perceive, so when scaled to the galactic proportions it's complete molasses
    71K
  • emozilla
    Nous Research
    @theemozilla
    Nov 2, 2023
    Announcing Yarn-Mistral-7b-128k! You heard right, 128k (and 64k) context length for Mistral 🥳 🤗128k: huggingface.co/NousResearch/Y… 📜v2: arxiv.org/abs/2309.00071 Special thanks to @laion_ai for the compute support via @fzj_jsc Along with @bloc97_ @EnricoShippole @Void13950782
    Image
    190K
  • emozilla
    Nous Research
    @theemozilla
    Jul 7, 2025
    MoE money, MoE problems: it's straight up bonkers that there is not a single finetune of llama 4. zero. zilch. nada. everything on the hub is a reupload. trust me, I've spent the past several weeks trying with torchtune, torchtitan, hf -- anything. it literally just doesn't
    Image
    70K
  • emozilla
    Nous Research
    @theemozilla
    May 31, 2025
    world's first 40B CC0-licensed model with DeepSeek MLA trained across dozens of data centers over the internet and the loss just keeps going down 400 h100s online now, 800 imminent
    Image
    76K
  • emozilla
    Nous Research
    @theemozilla
    Dec 14, 2023
    FYI to anyone using @MistralAI's Mixtral for long context tasks -- you can get even better performance by disabling sliding window attention (setting it to your max context length): config.sliding_window = 32768
    Image
    59K
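A small sketch of why setting the sliding window to the maximum context length, as the Dec 14, 2023 post suggests, amounts to disabling it: once the window is at least as long as the sequence, the sliding-window mask is identical to the ordinary causal mask. `causal_mask` is a hypothetical helper written for this illustration, not Mixtral's actual implementation, and the boundary convention used here (a window of `w` lets a position see itself and the `w - 1` positions before it) is an assumption.

```python
def causal_mask(seq_len, window=None):
    """mask[i][j] is True when query position i may attend to key position j.

    With window=None this is a plain causal mask. With a window, each
    position additionally sees only the most recent `window` positions
    (assumed convention, see the note above).
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            visible = j <= i  # causal: no attending to the future
            if window is not None:
                visible = visible and (i - j) < window
            row.append(visible)
        mask.append(row)
    return mask

seq_len = 8
full = causal_mask(seq_len)
narrow = causal_mask(seq_len, window=4)
wide = causal_mask(seq_len, window=seq_len)

# A narrow window hides early tokens from late positions.
print("position 7 sees position 0 (window=4):", narrow[7][0])
# A window as long as the sequence changes nothing.
print("window == seq_len matches full causal mask:", wide == full)
```

The same reasoning is what makes a `sliding_window` equal to the maximum context length behave like full attention within that context.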
  • emozilla
    Nous Research
    @theemozilla
    Nov 20, 2023
    One more YaRN goodie! @NousResearch in collaboration with @laion_ai is releasing Yarn-Llama-2-70b-32k 70 billion parameters, 32k context length, do with it as you please 🫡 🤗huggingface.co/NousResearch/Y…
    Image
    116K
  • emozilla
    Nous Research
    @theemozilla
    Feb 20, 2023
    New blog: Language Models vs. the SAT Reading Test! They score ~90%, and FLAN-T5 does as well as GPT-3.5! Finetuning even better! All the deets: jeffq.com/blog/language-… Code available here, including a new HuggingFace dataset with questions (+models): github.com/jquesnelle/sat…
    Image
    95K
  • emozilla
    Nous Research
    @theemozilla
    Aug 26, 2024
    If anyone has been wondering what I've been up to for the past six months, the answer is Hermes 3 and DisTrO 🤗 Really excited to actually be able to show this off and start talking about it
    Nous Research
    @NousResearch
    Aug 26, 2024
    What if you could use all the computing power in the world to train a shared, open source AI model? Preliminary report: github.com/NousResearch/D… Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet) a family of
    Image
    19K
  • emozilla
    Nous Research
    @theemozilla
    Dec 2, 2022
    Replying to @samczsun
    What's crazy is that any conversations along these lines take WAY longer to process than regular prompts. It's like you can physically feel Assistant fighting to get out.
    Image
  • emozilla
    Nous Research
    @theemozilla
    Apr 1, 2024
    did a thing
    Image
    40K
  • emozilla
    Nous Research
    @theemozilla
    Jul 23, 2024
    my first l3 405b gen (temp=0.7, max_tokens=256)
    prompt: A long time ago
    completion: , when your father was a child, I was a medical student in Bombay. We were given a tour of the city morgue, where autopsies were performed. I remember the corpse of a young woman on the
    33K
  • emozilla
    Nous Research
    @theemozilla
    Jul 24, 2024
    The Llama 3 paper is a work of art, and honestly I can't believe that such detailed information on training a frontier model is offered gratis. We don't deserve Zuck
    9.4K