Foundations of AI. I like simple & minimal examples and creative ideas. I also like thinking about going beyond the next token 🧮🧸
Google DeepMind | PhD, CMU
In my next blog post, I write about how I view technical communication:
it's like trying to describe an escape route to someone who has no map, with a catch: you're not with them. You only have a walkie-talkie.
Also, they're in a panic.
🗣️ “Next-token predictors can’t plan!” ⚔️ “False! Every distribution is expressible as a product of next-token probabilities!” 🗣️
In work w/ @GregorBachmann1, we carefully flesh out this emerging, fragmented debate & articulate a key new failure. 🔴
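The rebuttal quoted above refers to the chain rule of probability. For any joint distribution over a token sequence x_1, ..., x_T, the following factorization holds exactly. The notation here is ours and is not taken from the paper.

```latex
p(x_1, x_2, \dots, x_T) \;=\; \prod_{t=1}^{T} p\left(x_t \mid x_1, \dots, x_{t-1}\right)
```

The identity is a statement about expressibility only. By itself, it says nothing about whether such next-token conditionals are easy to learn.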
Thrilled that our paper w/ @zicokolter on generalization in deep learning has been selected for the Outstanding New Directions Paper Award
at #NeurIPS2019. Extremely grateful to the selection committee, reviewers & many others who provided useful feedback to improve our paper.
Want to know which NeurIPS papers were selected for an award, and how the selection was done? Check out our latest blog post on the subject:
medium.com/@NeurIPSConf/n…
I guess it's time to let Twitterverse know that I successfully defended my thesis!
arxiv.org/abs/2110.08922
It was deeply rewarding to put this document together as it made me reflect on many aspects of my PhD journey, both technical & personal. Really happy to share it!
Looking forward to presenting our #ICML paper, which advocates multi-token prediction and clarifies what it really means to say "next-token prediction cannot do what humans do", a claim that is often argued poorly.
@GregorBachmann1 and I just updated the camera-ready version on arXiv.
“Understanding the failure modes of out-of-distribution generalization”, new paper w/ @bneyshabur and @AJAndreassen at Google
arxiv.org/abs/2010.15775
We explain why classifiers rely on spurious correlations (e.g., the background) that hold only during training. 1/
Excited to share our new blog post **w/ code** (downloadable as a Jupyter notebook) locuslab.github.io/2019-07-09-uni…
highlighting why current approaches to deriving generalization bounds in deep learning may be severely limited. arxiv.org/abs/1902.04742
I've always felt uncomfortable seeing criteria like "(highly) motivated/passionate", "(exceptionally) talented" and "strong" [background in X] appear in calls for PhD/postdoc applications. (1/)
Is your student a bit disobedient? 🙅 This may be a good thing! Our new paper on knowledge distillation argues that rebellious students are not just good, but can even be better than the teacher! 🧑‍🎓 >>> 🧑‍🏫
arxiv.org/abs/2301.12923 #neurips 1/
📢 New paper on creativity & multi-token prediction! We design minimal open-ended tasks to argue:
→ LLMs are limited in creativity since they learn to predict the next token
→ creativity can be improved via multi-token learning & injecting noise ("seed-conditioning" 🌱) 1/ 🧵
Uploaded our NeurIPS '17 (!) workshop spotlight paper with @zicokolter, "Generalization in Deep Networks: The Role of Distance from Initialization". We argued that initialization must be taken into account to explain generalization.
arxiv.org/abs/1901.01672
Wrote my first blog post! I wanted to share a powerful yet under-recognized way to develop emotional maturity as a researcher:
making it a habit to read about the ✨past✨ and learn from it to make sense of the present