Vaishnavh Nagarajan (@_vaishnavh) / X

Vaishnavh Nagarajan

@_vaishnavh

Foundations of AI. I like simple & minimal examples and creative ideas. I also like thinking about going beyond the next token 🧮🧸 Google DeepMind | PhD, CMU

New York, NY

Joined June 2017

Pinned
Vaishnavh Nagarajan
@_vaishnavh
Jun 19
In my next blogpost, I write about how I view technical communication: it's like trying to communicate an escape route to someone without a map but with a catch: you're not with them. You only have a walkie-talkie. Also, they're in panic.
Vaishnavh Nagarajan
@_vaishnavh
Mar 12, 2024
🗣️ “Next-token predictors can’t plan!” ⚔️ “False! Every distribution is expressible as product of next-token probabilities!” 🗣️ In work w/ @GregorBachmann1, we carefully flesh out this emerging, fragmented debate & articulate a key new failure. 🔴
arxiv.org
The pitfalls of next-token prediction
Can a mere next-token predictor faithfully model human intelligence? We crystallize this emerging concern and correct popular misconceptions surrounding it, and advocate a simple multi-token...
Vaishnavh Nagarajan
@_vaishnavh
Dec 9, 2019
Thrilled that our paper w/ @zicokolter on generalization in deep learning has been selected for the Outstanding New Directions Paper Award at #NeurIPS2019. Extremely grateful to the selection committee, reviewers & many others who provided useful feedback to improve our paper.
Hugo Larochelle
@hugo_larochelle
Dec 8, 2019
Want to know which NeurIPS papers were selected for an award, and how the selection was done? Check out our latest blog post on the subject: medium.com/@NeurIPSConf/n…
Vaishnavh Nagarajan
@_vaishnavh
Oct 22, 2021
I guess it's time to let Twitterverse know that I successfully defended my thesis! arxiv.org/abs/2110.08922 It was deeply rewarding to put this document together as it made me reflect on many aspects of my PhD journey, both technical & personal. Really happy to share it!
arxiv.org
Explaining generalization in deep learning: progress and fundamental limits
This dissertation studies a fundamental open challenge in deep learning theory: why do deep networks generalize well even while being overparameterized, unregularized and fitting the training data...
Vaishnavh Nagarajan
@_vaishnavh
Jul 9, 2024
Looking forward to presenting our #ICML paper advocating multi-token prediction and correcting what it really means to say "next-token prediction cannot do what humans do" --- which is often argued poorly. @GregorBachmann1 and I just updated the camera ready version on arxiv.
Vaishnavh Nagarajan
@_vaishnavh
Oct 30, 2020
“Understanding the failure modes of out-of-distribution generalization”, new paper w/ @bneyshabur and @AJAndreassen at Google arxiv.org/abs/2010.15775 We explain why classifiers rely on spurious correlations (e.g. bkgd.) that hold only in training. 1/
Vaishnavh Nagarajan
@_vaishnavh
Jun 29, 2021
New work w/ @yidingjiang @_christinabaek & @zicokolter on estimating the generalization error of neural networks by looking at "disagreement rates" on unlabeled data + a theory of why this works really well arxiv.org/abs/2106.13799 1/
Vaishnavh Nagarajan
@_vaishnavh
Jul 9, 2019
Excited to share our new blog post **w/ code** (downloadable as Jupyter notebook) locuslab.github.io/2019-07-09-uni… highlighting why current approaches to deriving generalization bounds in deep learning may be severely limited. arxiv.org/abs/1902.04742
Vaishnavh Nagarajan
@_vaishnavh
Dec 5, 2021
I've always felt uncomfortable seeing criteria like "(highly) motivated/passionate", "(exceptionally) talented" and "strong" [background in X] appear in calls for PhD/postdoc applications. (1/)
Vaishnavh Nagarajan
@_vaishnavh
Dec 11, 2023
Is your student a bit disobedient? 🙅 This may be a good thing! Our new paper on knowledge distillation argues why rebellious students are not just good, but can even be better than the teacher! 🧑‍🎓>>> 🧑‍🏫 arxiv.org/abs/2301.12923 #neurips 1/
Vaishnavh Nagarajan
@_vaishnavh
Jun 2, 2025
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue: → LLMs are limited in creativity since they learn to predict the next token → creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
Vaishnavh Nagarajan
@_vaishnavh
Jan 8, 2019
Uploaded our (with @zicokolter) NeurIPS 17(!) workshop spotlight paper "Generalization in Deep Networks: The Role of Distance from Initialization". We argued why it's important to take into account the initialization to explain generalization. arxiv.org/abs/1901.01672
Vaishnavh Nagarajan
@_vaishnavh
Mar 3, 2021
If you're worried about spurious correlations in ML models, check out my CMU AI seminar talk based on ICLR'21 work w/ @AJAndreassen and @bneyshabur on "understanding the failure modes of out-of-distribution generalization". arxiv.org/abs/2010.15775 1/3 youtube.com/watch?v=DhPMq_…
arxiv.org
Understanding the Failure Modes of Out-of-Distribution Generalization
Empirical studies suggest that machine learning models often rely on features, such as the background, that may be spuriously correlated with the label only during training time, resulting in poor...
Vaishnavh Nagarajan
@_vaishnavh
Jun 13, 2025
Wrote my first blog post! I wanted to share a powerful yet under-recognized way to develop emotional maturity as a researcher: making it a habit to read about the ✨past✨ and learn from it to make sense of the present