Ankesh Anand
1,042 posts
Ankesh Anand
@ankesh_anand
Research scientist @googledeepmind (Gemini Thinking, Post-Training), prev phd @milamontreal. RL for Gemini 2.5, Gemini 3.0 and IMO DeepThink.
ankeshanand.com
Joined December 2011
655
Following
8,715
Followers
  • Pinned
    Ankesh Anand
    @ankesh_anand
    Nov 18, 2025
    Gemini 3 Pro is out, very exciting to be able to push the frontier with this one! There was never a dull day post-training this model, I hope the combination of a strong base model with sota reasoning is evident! This is obviously a big leap compared to 2.5 Pro, but I am excited
    Image
    20K
  • Ankesh Anand
    @ankesh_anand
    Mar 25, 2025
    shoutout to the believers!
    Image
    201K
  • Ankesh Anand
    @ankesh_anand
    Jan 29, 2025
    The DeepSeek discourse is simultaneously under-crediting and over-crediting them for what they achieved. So, some quick thoughts:
    177K
  • Ankesh Anand
    @ankesh_anand
    Mar 30, 2025
    MathArena results for gemini-2.5-pro
    Image
    71K
  • Ankesh Anand
    @ankesh_anand
    Nov 5, 2019
    ICLR papers with perfect scores (all 8s, total 11 papers):
    1. openreview.net/forum?id=Bygzb… "FreeLB: Enhanced Adversarial Training for Language Understanding"
    2. openreview.net/forum?id=BJlrF… "BackPACK: Packing more into Backprop"
    openreview.net
    FreeLB: Enhanced Adversarial Training for Natural Language...
    Adversarial training, which minimizes the maximal risk for label-preserving input perturbations, has proved to be effective for improving the generalization of language models. In this work, we...
  • Ankesh Anand
    @ankesh_anand
    Sep 20, 2022
    Excited to share that I've joined @DeepMind full-time as a Research Scientist. It's an inspiring place with a super ambitious mission, and I am looking forward to being a part of it. I'll be based in London, so if you're around, I would love to catch up ☕️!
    london skyline as seen from the waterloo bridge
  • Ankesh Anand
    @ankesh_anand
    Jul 21, 2025
    We can finally share this now: A Gemini model trained with new RL techniques and scaled-up inference-time compute has achieved gold-medal level performance at IMO 2025! 🥇
    Image
    37K
  • Ankesh Anand
    @ankesh_anand
    Jan 29, 2020
    New blog post: Contrastive Self-Supervised Learning. Contrastive methods learn representations by encoding what makes two things similar or different. I find them very promising and go over some recent works such as DIM, CPC, AMDIM, CMC, MoCo etc.
    Image
    ankeshanand.com
    Contrastive Self-Supervised Learning
    Contrastive self-supervised learning techniques are a promising class of methods that build representations by learning to encode what makes two things similar or different.
  • Ankesh Anand
    @ankesh_anand
    Nov 30, 2017
    Introducing HoME: a Household Multimodal Environment for AI agents.
    - 45,000 diverse 3D houses
    - Vision, Audio, Physics and Semantic (text) info
    - OpenAI Gym integration
    Paper: arxiv.org/abs/1711.11017
    Repo: github.com/HoME-Platform/…
    Site: home-platform.github.io
    Image
  • Ankesh Anand
    @ankesh_anand
    Dec 19, 2024
    Excited to share an early preview of our gemini 2.0 flash thinking model with all its raw thoughts visible. Here's the model trying to solve a Putnam 2024 problem with multiple approaches, and then self-verifying that its answer was correct.
    Image
    80K
  • Ankesh Anand
    @ankesh_anand
    Apr 2, 2025
    📈📈📈
    Image
    Image
    Mislav Balunović
    @mbalunovic
    Apr 2, 2025
    Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve a non-trivial amount of points (24.4%). The speed of progress is really mind-blowing.
    84K
  • Ankesh Anand
    @ankesh_anand
    Jan 18, 2021
    The RL formalism is powerful in its generality, but poses a hard problem: how can we design agents that learn efficiently & generalize well, given only sensory info and a reward signal? Self-supervision might be the answer, join us at the ICLR workshop: sslrlworkshop.github.io
    Image
  • Ankesh Anand
    @ankesh_anand
    May 12, 2022
    Key takeaway from Gato: If we can build specialized AI agents for 100s/1000s of tasks, it's now pretty straightforward to make a general agent that can do it all in a single model. Just tokenize data from all the tasks and feed into a transformer. Another blessing of scale!
    Google DeepMind
    @GoogleDeepMind
    May 12, 2022
    Gato 🐈 a scalable generalist agent that uses a single transformer with exactly the same weights to play Atari, follow text instructions, caption images, chat with people, control a real robot arm, and more: dpmd.ai/Gato Paper: dpmd.ai/Gato-paper 1/
    Image
  • Ankesh Anand
    @ankesh_anand
    Nov 8, 2021
    Model-based RL promises generalization by design, but do MBRL agents like MuZero generalize better than model-free, and benefit from self-supervision? The answer is yes! MuZero+SSL gets SotA on Procgen with 10x less data, implicit meta-RL on MetaWorld: arxiv.org/abs/2111.01587
    Image
    Image
