Nora Belrose
10.1K posts
Nora Belrose
@norabelrose
AIs aren't people, they're tools we should use wisely. Head of interpretability research at @AiEleuther, but tweets are my own views, not Eleuther's.
Everywhere all at once
Joined April 2016
114 Following
11.2K Followers

  • Pinned
    Nora Belrose
    @norabelrose
    Oct 11, 2025
    If we only care about appearances, outcomes, and results then AI will replace humans everywhere
    If we care about the process used to create things then humans will still have a meaningful role in the future
    The idea that ends can be detached from means is the root of many evils
    2.8K
  • Nora Belrose
    @norabelrose
    Dec 5, 2024
    If deep learning can predict weather better than an explicit physics simulation, does that mean that deep learning is more "fundamental" than physics? Or that nothing is fundamental?
    Google DeepMind
    @GoogleDeepMind
    Dec 4, 2024
    Today in @Nature, we’re presenting GenCast: our new AI weather model which gives us the probabilities of different weather conditions up to 15 days ahead with state-of-the-art accuracy. ☁️⚡ Here’s how the technology works. 🧵 goo.gle/49trAOv
    641K
  • Nora Belrose
    @norabelrose
    Jun 7, 2023
    Ever wanted to mindwipe an LLM? Our method, LEAst-squares Concept Erasure (LEACE), provably erases all linearly-encoded information about a concept from neural net activations. It does so surgically, inflicting minimal damage to other concepts. 🧵
    arxiv.org
    LEACE: Perfect linear concept erasure in closed form
    Concept erasure aims to remove specified features from an embedding. It can improve fairness (e.g. preventing a classifier from using gender or race) and interpretability (e.g. removing a concept...
    299K
  • Nora Belrose
    @norabelrose
    Oct 21, 2024
    This is a great paper. It points out:
    1. Humans do not even approximately behave according to rational choice theory
    2. There is no reason to think advanced AI will "inevitably" maximize some utility function
    3a. Human preferences are derivative / constructed, so aligning AI by
    xuan (ɕɥɛn / sh-yen)
    @xuanalogue
    Sep 3, 2024
    Should AI be aligned with human preferences, rewards, or utility functions? Excited to finally share a preprint that @MicahCarroll @FranklinMatija @hal_ashton & I have worked on for almost 2 years, arguing that AI alignment has to move beyond the preference-reward-utility nexus!
    158K
  • Nora Belrose
    @norabelrose
    Mar 15, 2023
    Ever wonder how a language model decides what to say next? Our method, the tuned lens (arxiv.org/abs/2303.08112), can trace an LM’s prediction as it develops from one layer to the next. It's more reliable and applies to more models than prior state-of-the-art. 🧵
    155K
  • Nora Belrose
    @norabelrose
    Mar 14, 2024
    Second hand rumor: Sam Altman thinks GPT-4.5 will automate 100 million jobs globally
    369K
  • Nora Belrose
    @norabelrose
    Oct 13, 2024
    i'm trans, and i'm annoyed with both sides of the trans "issue" these days. for years the trans movement argued something like:
    1. gender is an essential real thing
    2. your gender is whatever you say it is
    3. the law should force everyone to accept your stated gender
    Dr Jordan B Peterson
    Peterson Academy
    @jordanbpeterson
    May 16, 2024
    12x the suicide rate post "gender affirming" surgery
    The butchers and liars were murderously wrong
    The Cass report indicated this
    Canada and the US are still enabling this
    That's you @POTUS and @JustinTrudeau and it is utterly barbarous and inexcusable
    Putting children to the
    106K
  • Nora Belrose
    @norabelrose
    Oct 11, 2024
    If you make a drawing in the weight matrices of your neural network at initialization, it will likely still be visible at the end of training arxiv.org/abs/2012.02550
    185K
  • Nora Belrose
    @norabelrose
    Dec 11, 2024
    How do a neural network's final parameters depend on its initial ones? In this new paper, we answer this question by analyzing the training Jacobian, the matrix of derivatives of the final parameters with respect to the initial parameters. arxiv.org/abs/2412.07003
    62K
  • Nora Belrose
    @norabelrose
    Nov 29, 2023
    Introducing AI Optimism: a philosophy of hope, freedom, and fairness for all. We strive for a future where everyone is empowered by AIs under their own control. In our first post, we argue AI is easy to control, and will get more controllable over time.
    AI is easy to control
    From optimists.ai
    363K
  • Nora Belrose
    @norabelrose
    Sep 25, 2022
    It seems pretty likely that "fake emulations" of people, or AIs trained on boatloads of lifelogging data to imitate a person, will be feasible well before we have safe and reliable mind uploading tech. The implications of this are pretty weird.
  • Nora Belrose
    @norabelrose
    Dec 15, 2023
    Replying to @gmiller
    The terrorism argument against open source AI also applies to anything that increases the effective intelligence of humans: the internet, public education, nutrition, etc. It's a fully general argument against human empowerment.
    593K
  • Nora Belrose
    @norabelrose
    Dec 28, 2023
    I don’t really care what the current law on this is, but we should be working to destroy copyright as thoroughly as possible so I am on OpenAI’s side in this case.
    1M
  • Nora Belrose
    @norabelrose
    Dec 15, 2024
    Willow is zero evidence that there is a quantum multiverse. Every major interpretation of quantum mechanics, including all those that don't posit many worlds (relational quantum mechanics, QBism, etc.), predicts that quantum computing should be possible, equally strongly.
    Tsarathustra
    @tsarnick
    Dec 14, 2024
    Marc Andreessen says the implication of Google's quantum computer is that it is performing computation across many parallel universes and therefore the multiverse is real
    72K