Ankesh Anand

1,042 posts

@ankesh_anand

Research scientist @googledeepmind (Gemini Thinking, Post-Training), prev phd @milamontreal. RL for Gemini 2.5, Gemini 3.0 and IMO DeepThink.

Joined December 2011

Pinned
Ankesh Anand
@ankesh_anand
Nov 18, 2025
Gemini 3 Pro is out, very exciting to be able to push the frontier with this one! There was never a dull day post-training this model, I hope the combination of a strong base model with SOTA reasoning is evident! This is obviously a big leap compared to 2.5 Pro, but I am excited
Ankesh Anand
@ankesh_anand
Mar 25, 2025
shoutout to the believers!
Ankesh Anand
@ankesh_anand
Jan 29, 2025
The DeepSeek discourse is simultaneously under-crediting and over-crediting them for what they achieved. So, some quick thoughts:
Ankesh Anand
@ankesh_anand
Mar 30, 2025
MathArena results for gemini-2.5-pro
Ankesh Anand
@ankesh_anand
Nov 5, 2019
ICLR papers with perfect scores (all 8s, total 11 papers):
1. openreview.net/forum?id=Bygzb… "FreeLB: Enhanced Adversarial Training for Language Understanding"
2. openreview.net/forum?id=BJlrF… "BackPACK: Packing more into Backprop"
openreview.net
FreeLB: Enhanced Adversarial Training for Natural Language...
Adversarial training, which minimizes the maximal risk for label-preserving input perturbations, has proved to be effective for improving the generalization of language models. In this work, we...
Ankesh Anand
@ankesh_anand
Sep 20, 2022
Excited to share that I've joined @DeepMind full-time as a Research Scientist. It's an inspiring place with a super ambitious mission, and I am looking forward to being a part of it. I'll be based in London, so if you're around, I would love to catch up ☕️!
Ankesh Anand
@ankesh_anand
Jul 21, 2025
We can finally share this now: A Gemini model trained with new RL techniques and scaled-up inference-time compute has achieved gold-medal level performance at IMO 2025! 🥇
Ankesh Anand
@ankesh_anand
Jan 29, 2020
New blog post: Contrastive Self-Supervised Learning. Contrastive methods learn representations by encoding what makes two things similar or different. I find them very promising and go over some recent works such as DIM, CPC, AMDIM, CMC, MoCo etc.
ankeshanand.com
Contrastive Self-Supervised Learning
Contrastive self-supervised learning techniques are a promising class of methods that build representations by learning to encode what makes two things similar or different.
Ankesh Anand
@ankesh_anand
Nov 30, 2017
Introducing HoME: a Household Multimodal Environment for AI agents.
- 45,000 diverse 3D houses
- Vision, Audio, Physics and Semantic (text) info
- OpenAI Gym integration
Paper: arxiv.org/abs/1711.11017
Repo: github.com/HoME-Platform/…
Site: home-platform.github.io
Ankesh Anand
@ankesh_anand
Dec 19, 2024
Excited to share an early preview of our gemini 2.0 flash thinking model with all its raw thoughts visible. Here's the model trying to solve a Putnam 2024 problem with multiple approaches, and then self-verifying that its answer was correct.
Ankesh Anand
@ankesh_anand
Apr 2, 2025
📈📈📈
Mislav Balunović
@mbalunovic
Apr 2, 2025
Big update to our MathArena USAMO evaluation: Gemini 2.5 Pro, which was released *the same day* as our benchmark, is the first model to achieve a non-trivial number of points (24.4%). The speed of progress is really mind-blowing.
Ankesh Anand
@ankesh_anand
Jan 18, 2021
The RL formalism is powerful in its generality, but poses a hard problem: how can we design agents that learn efficiently & generalize well, given only sensory info and a reward signal? Self-supervision might be the answer, join us at the ICLR workshop: sslrlworkshop.github.io
Ankesh Anand
@ankesh_anand
May 12, 2022
Key takeaway from Gato: If we can build specialized AI agents for 100s/1000s of tasks, it's now pretty straightforward to make a general agent that can do it all in a single model. Just tokenize data from all the tasks and feed into a transformer. Another blessing of scale!
Google DeepMind
@GoogleDeepMind
May 12, 2022
Gato🐈a scalable generalist agent that uses a single transformer with exactly the same weights to play Atari, follow text instructions, caption images, chat with people, control a real robot arm, and more: dpmd.ai/Gato Paper: dpmd.ai/Gato-paper 1/
Ankesh Anand
@ankesh_anand
Nov 8, 2021
Model-based RL promises generalization by design, but do MBRL agents like MuZero generalize better than model-free, and benefit from self-supervision? The answer is yes! MuZero+SSL gets SotA on Procgen with 10x less data, implicit meta-RL on MetaWorld: arxiv.org/abs/2111.01587