emozilla (@theemozilla) / X

1,900 posts

emozilla

@theemozilla

catholic, ai researcher, co-founder/cto of @NousResearch. alignment: whatever the opposite of yudkowsky + bryan johnson is. blessed be God in all his designs.

Joined November 2008

Pinned
emozilla
@theemozilla
Apr 19
O to grace how great a debtor daily I'm constrained to be! let thy goodness, like a fetter, bind my wandering heart to thee. prone to wander, Lord I feel it! prone to leave the God I love... here's my heart, O take it and seal it, seal it for thy courts above
9.5K
emozilla
@theemozilla
Sep 5, 2025
Amusing how 99% of people trying to explain LLMs forget that they don't generate the next token; they generate a probability distribution over the entire vocabulary space that the end application is free to sample from. You are very often not presented with the Most Likely Token.
Gergely Orosz
@GergelyOrosz
Sep 4, 2025
Amusing how 99% of people using LLMs forget how these things work: They are advanced probability machines. They generate the next most likely token (word) based on the input and their training. Under the hood, it’s a giant matrix multiplication that has eerily good output.
857K
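The distinction drawn in the post above can be sketched in a few lines. This is a minimal illustration, not any particular model's or library's implementation: the four-word vocabulary and the logit values are made up, and `softmax`, `greedy`, and `sample` are hypothetical helper names. It shows that the model's output is a distribution, and that the application decides whether to take the most likely token or to draw from the distribution (here at temperature 0.7).

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution over the vocabulary."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy(probs):
    """Always pick the single most likely token index."""
    return max(range(len(probs)), key=probs.__getitem__)


def sample(probs, rng):
    """Draw one token index at random, weighted by its probability."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical toy vocabulary and model scores, for illustration only.
vocab = ["the", "a", "cat", "dog"]
logits = [2.0, 1.5, 0.5, 0.1]

probs = softmax(logits, temperature=0.7)
rng = random.Random(0)  # seeded so the run is repeatable

greedy_token = vocab[greedy(probs)]
draws = [vocab[sample(probs, rng)] for _ in range(1000)]
counts = {tok: draws.count(tok) for tok in vocab}

print("greedy:", greedy_token)
print("sampled counts:", counts)
```

Greedy decoding returns the same token every time, while the sampled counts spread across the vocabulary roughly in proportion to `probs`, so a reader of sampled output is often not seeing the most likely token.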
emozilla
@theemozilla
May 8, 2025
Replying to @AccidentalCISO
oh absolutely, it's a giant cosmic wet blanket. it limits latencies across the globe on the order that humans can perceive, so when scaled to galactic proportions it's complete molasses
71K
emozilla
@theemozilla
Nov 2, 2023
Announcing Yarn-Mistral-7b-128k! You heard right, 128k (and 64k) context length for Mistral 🥳 🤗128k: huggingface.co/NousResearch/Y… 📜v2: arxiv.org/abs/2309.00071 Special thanks to @laion_ai for the compute support via @fzj_jsc Along with @bloc97_ @EnricoShippole @Void13950782
190K
emozilla
@theemozilla
Jul 7, 2025
MoE money, MoE problems: it's straight up bonkers that there is not a single finetune of llama 4. zero. zilch. nada. everything on the hub is a reupload. trust me, I've spent the past several weeks trying with torchtune, torchtitan, hf -- anything. it literally just doesn't
70K
emozilla
@theemozilla
May 31, 2025
world's first 40B CC0-licensed model with DeepSeek MLA, trained across dozens of data centers over the internet, and the loss just keeps going down. 400 h100s online now, 800 imminent
76K
emozilla
@theemozilla
Dec 14, 2023
FYI to anyone using @MistralAI's Mixtral for long context tasks -- you can get even better performance by disabling sliding window attention (setting it to your max context length): config.sliding_window = 32768
59K
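Why setting the window to the maximum context length amounts to disabling it can be seen from the attention mask alone. The sketch below is a toy illustration in plain Python, not the Mixtral or Hugging Face implementation: `attention_mask` is a hypothetical helper, and the exact window boundary (whether a window of size w covers w or w + 1 positions) is an assumption that varies between implementations.

```python
def attention_mask(seq_len, sliding_window=None):
    """Return mask[i][j] = True when query position i may attend to key position j.

    Assumed convention: position i sees itself and the previous
    sliding_window - 1 positions. None means plain causal attention.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            causal = j <= i
            in_window = sliding_window is None or (i - j) < sliding_window
            row.append(causal and in_window)
        mask.append(row)
    return mask


seq_len = 8  # toy sequence length standing in for the max context length

full = attention_mask(seq_len)                            # no window at all
windowed = attention_mask(seq_len, sliding_window=4)      # window shorter than the sequence
disabled = attention_mask(seq_len, sliding_window=seq_len)  # window == sequence length

for row in windowed:
    print("".join("x" if allowed else "." for allowed in row))
```

With a window of 4, the last position can no longer see the first four tokens. With the window equal to the sequence length, the mask is identical to the full causal mask, which is the effect the post describes for `config.sliding_window = 32768`.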
emozilla
@theemozilla
Nov 20, 2023
One more YaRN goodie! @NousResearch in collaboration with @laion_ai is releasing Yarn-Llama-2-70b-32k 70 billion parameters, 32k context length, do with it as you please 🫡 🤗huggingface.co/NousResearch/Y…
116K
emozilla
@theemozilla
Feb 20, 2023
New blog: Language Models vs. the SAT Reading Test! They score ~90%, and FLAN-T5 does as well as GPT-3.5! Finetuning even better! All the deets: jeffq.com/blog/language-… Code available here, including a new HuggingFace dataset with questions (+models): github.com/jquesnelle/sat…
95K
emozilla
@theemozilla
Aug 26, 2024
If anyone has been wondering what I've been up to for the past six months, the answer is Hermes 3 and DisTrO 🤗 Really excited to actually be able to show this off and start talking about it
Nous Research
@NousResearch
Aug 26, 2024
What if you could use all the computing power in the world to train a shared, open source AI model? Preliminary report: github.com/NousResearch/D… Nous Research is proud to release a preliminary report on DisTrO (Distributed Training Over-the-Internet) a family of
19K
emozilla
@theemozilla
Dec 2, 2022
Replying to @samczsun
What's crazy is that any conversations along these lines take WAY longer to process than regular prompts. It's like you can physically feel Assistant fighting to get out.
emozilla
@theemozilla
Apr 1, 2024
did a thing
40K
emozilla
@theemozilla
Jul 23, 2024
my first l3 405b gen (temp=0.7, max_tokens=256) prompt: A long time ago completion: , when your father was a child, I was a medical student in Bombay. We were given a tour of the city morgue, where autopsies were performed. I remember the corpse of a young woman on the
33K
emozilla
@theemozilla
Jul 24, 2024
The Llama 3 paper is a work of art, and honestly I can't believe that such detailed information on training a frontier model is offered gratis. We don't deserve Zuck
9.4K