Nora Belrose (@norabelrose) / X

Nora Belrose

10.1K posts

@norabelrose

AIs aren't people, they're tools we should use wisely. Head of interpretability research at @AiEleuther, but tweets are my own views, not Eleuther's.

Everywhere all at once

Joined April 2016

Pinned
Nora Belrose
@norabelrose
Oct 11, 2025
If we only care about appearances, outcomes, and results then AI will replace humans everywhere. If we care about the process used to create things then humans will still have a meaningful role in the future. The idea that ends can be detached from means is the root of many evils.
2.8K
Nora Belrose
@norabelrose
Dec 5, 2024
If deep learning can predict weather better than an explicit physics simulation, does that mean that deep learning is more "fundamental" than physics? Or that nothing is fundamental?
Google DeepMind
@GoogleDeepMind
Dec 4, 2024
Today in @Nature, we’re presenting GenCast: our new AI weather model which gives us the probabilities of different weather conditions up to 15 days ahead with state-of-the-art accuracy. ☁️⚡ Here’s how the technology works. 🧵 goo.gle/49trAOv
641K
Nora Belrose
@norabelrose
Jun 7, 2023
Ever wanted to mindwipe an LLM? Our method, LEAst-squares Concept Erasure (LEACE), provably erases all linearly-encoded information about a concept from neural net activations. It does so surgically, inflicting minimal damage to other concepts. 🧵
arxiv.org
LEACE: Perfect linear concept erasure in closed form
Concept erasure aims to remove specified features from an embedding. It can improve fairness (e.g. preventing a classifier from using gender or race) and interpretability (e.g. removing a concept...
299K
Nora Belrose
@norabelrose
Oct 21, 2024
This is a great paper. It points out:
1. Humans do not even approximately behave according to rational choice theory
2. There is no reason to think advanced AI will "inevitably" maximize some utility function
3a. Human preferences are derivative / constructed, so aligning AI by
xuan (ɕɥɛn / sh-yen)
@xuanalogue
Sep 3, 2024
Should AI be aligned with human preferences, rewards, or utility functions? Excited to finally share a preprint that @MicahCarroll @FranklinMatija @hal_ashton & I have worked on for almost 2 years, arguing that AI alignment has to move beyond the preference-reward-utility nexus!
158K
Nora Belrose
@norabelrose
Mar 15, 2023
Ever wonder how a language model decides what to say next? Our method, the tuned lens (arxiv.org/abs/2303.08112), can trace an LM’s prediction as it develops from one layer to the next. It's more reliable and applies to more models than prior state-of-the-art. 🧵
155K
Nora Belrose
@norabelrose
Mar 14, 2024
Second hand rumor: Sam Altman thinks GPT-4.5 will automate 100 million jobs globally
369K
Nora Belrose
@norabelrose
Oct 13, 2024
i'm trans, and i'm annoyed with both sides of the trans "issue" these days. for years the trans movement argued something like:
1. gender is an essential real thing
2. your gender is whatever you say it is
3. the law should force everyone to accept your stated gender
Dr Jordan B Peterson
@jordanbpeterson
May 16, 2024
12x the suicide rate post "gender affirming" surgery. The butchers and liars were murderously wrong. The Cass report indicated this. Canada and the US are still enabling this. That's you @POTUS and @JustinTrudeau and it is utterly barbarous and inexcusable. Putting children to the
106K
Nora Belrose
@norabelrose
Oct 11, 2024
If you make a drawing in the weight matrices of your neural network at initialization, it will likely still be visible at the end of training arxiv.org/abs/2012.02550
185K
Nora Belrose
@norabelrose
Dec 11, 2024
How do a neural network's final parameters depend on its initial ones? In this new paper, we answer this question by analyzing the training Jacobian, the matrix of derivatives of the final parameters with respect to the initial parameters. arxiv.org/abs/2412.07003
62K
Nora Belrose
@norabelrose
Nov 29, 2023
Introducing AI Optimism: a philosophy of hope, freedom, and fairness for all. We strive for a future where everyone is empowered by AIs under their own control. In our first post, we argue AI is easy to control, and will get more controllable over time.
AI is easy to control
From optimists.ai
363K
Nora Belrose
@norabelrose
Sep 25, 2022
It seems pretty likely that "fake emulations" of people, or AIs trained on boatloads of lifelogging data to imitate a person, will be feasible well before we have safe and reliable mind uploading tech. The implications of this are pretty weird.
Nora Belrose
@norabelrose
Dec 15, 2023
Replying to @gmiller
The terrorism argument against open source AI also applies to anything that increases the effective intelligence of humans: the internet, public education, nutrition, etc. It's a fully general argument against human empowerment.
593K
Nora Belrose
@norabelrose
Dec 28, 2023
I don’t really care what the current law on this is, but we should be working to destroy copyright as thoroughly as possible so I am on OpenAI’s side in this case.
1M
Nora Belrose
@norabelrose
Dec 15, 2024
Willow is zero evidence that there is a quantum multiverse. Every major interpretation of quantum mechanics, including all those that don't posit many worlds (relational quantum mechanics, QBism, etc.), predicts that quantum computing should be possible, equally strongly.
Tsarathustra
@tsarnick
Dec 14, 2024
Marc Andreessen says the implication of Google's quantum computer is that it is performing computation across many parallel universes and therefore the multiverse is real
72K