Vasu Shyam (@vasud3vshyam) / X

632 posts

Vasu Shyam

@vasud3vshyam

Currently working as a machine learning researcher at a Silicon Valley startup. Former physics postdoc at Stanford and Branco Weiss fellow.

San Francisco CA

Joined January 2023

Pinned
Vasu Shyam
@vasud3vshyam
Aug 12, 2024
Ever looked at the attention operation and said "hang on, that's a one-point function!"?
631K
Vasu Shyam
@vasud3vshyam
Aug 12, 2024
Replying to @vasud3vshyam
Well, if the answer to any of these questions was "no", then consider reading arxiv.org/pdf/2408.04093, which I co-authored with @J_Pilault, @nshepperd1, @BerenMillidge, and @QuentinAnthon15
Jonathan Pilault
@J_Pilault
Aug 12, 2024
Zyphra is proud to release Tree Attention, a fast inference method for extremely large sequence lengths
• 8x faster inference speed vs. Ring Attention
• 2x less peak memory
• low data communication volumes
Paper: arxiv.org/abs/2408.04093
Code: github.com/Zyphra/tree_at…
A 🧵
17K
Vasu Shyam
@vasud3vshyam
Aug 12, 2024
Replying to @vasud3vshyam
Having noticed that, did you then write down the generating function?
15K
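For anyone wondering what that generating function might look like, here is a minimal sketch. It assumes the single-query form F(zeta) = log sum_a exp(q·k_a + zeta·v_a), which is my reading of the linked paper, and checks numerically that its gradient with respect to zeta at zeta = 0 matches ordinary softmax attention. The function names and toy vectors are illustrative, and the gradient is taken by central finite differences rather than automatic differentiation so that the example needs only the standard library.

```python
import math

def attention(q, keys, values):
    """Plain softmax attention for a single query vector."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total
            for j in range(dim)]

def generating_function(zeta, q, keys, values):
    """Assumed form: F(zeta) = log sum_a exp(q.k_a + zeta.v_a)."""
    exponents = [
        sum(qi * ki for qi, ki in zip(q, k))
        + sum(zi * vi for zi, vi in zip(zeta, v))
        for k, v in zip(keys, values)
    ]
    m = max(exponents)
    return m + math.log(sum(math.exp(e - m) for e in exponents))

def grad_at_zero(q, keys, values, eps=1e-6):
    """Central finite-difference gradient of F in zeta, evaluated at zeta = 0."""
    dim = len(values[0])
    grad = []
    for j in range(dim):
        plus = [eps if i == j else 0.0 for i in range(dim)]
        minus = [-eps if i == j else 0.0 for i in range(dim)]
        diff = (generating_function(plus, q, keys, values)
                - generating_function(minus, q, keys, values))
        grad.append(diff / (2 * eps))
    return grad

# Toy data, chosen only for illustration.
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

print(attention(q, keys, values))
print(grad_at_zero(q, keys, values))
```

The two printed vectors should agree up to finite-difference error, which is the sense in which attention is a first derivative (a "one-point function") of a log-sum-exp.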
Vasu Shyam
@vasud3vshyam
Aug 12, 2024
Replying to @vasud3vshyam
And then did you see how much faster than Ring Attention this method ends up being for decoding?
12K
Vasu Shyam
@vasud3vshyam
Aug 12, 2024
Replying to @vasud3vshyam
Then, did you happen to recall that thanks to automatic differentiation (timvieira.github.io/blog/post/2016…) the time complexity to compute the gradient of a function is roughly equivalent to the complexity of computing the function itself?
14K
Vasu Shyam
@vasud3vshyam
Jan 10, 2024
Finally managed to retire early from my professional physics research career (I know, I know, in time it would have retired me). Eagerly looking forward to going full crackpot as an amateur.
11K
Vasu Shyam
@vasud3vshyam
Aug 12, 2024
Replying to @vasud3vshyam
Consequently, did you realize that the efficient tree reductions here can be done on a DGX cluster via NCCL Allreduce in a topology-aware, efficiently overlapped manner?
12K
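To make the tree-reduction idea concrete, here is a single-process sketch and not the paper's implementation. It assumes each device holds one chunk of keys and values, computes a local (max, sum-of-exponentials, weighted-value-sum) triple, and that the triples are merged pairwise with an associative combine. On a real cluster that merge would be a collective such as an NCCL Allreduce; here a plain Python loop stands in for it, and all names are hypothetical.

```python
import math

def local_stats(q, keys, values):
    """Per-chunk partial result: (max score, sum of exp, exp-weighted value sum)."""
    scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    dim = len(values[0])
    num = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return (m, sum(weights), num)

def combine(a, b):
    """Associative merge of two partial results, rescaled to a shared max."""
    m = max(a[0], b[0])
    sa, sb = math.exp(a[0] - m), math.exp(b[0] - m)
    denom = a[1] * sa + b[1] * sb
    num = [x * sa + y * sb for x, y in zip(a[2], b[2])]
    return (m, denom, num)

def tree_reduce(parts):
    """Pairwise reduction in about log2(n) rounds; a stand-in for an Allreduce."""
    while len(parts) > 1:
        merged = [combine(parts[i], parts[i + 1])
                  for i in range(0, len(parts) - 1, 2)]
        if len(parts) % 2:  # carry an unpaired chunk into the next round
            merged.append(parts[-1])
        parts = merged
    return parts[0]

def tree_attention(q, key_chunks, value_chunks):
    """Single-query attention over chunked keys/values via the tree reduction."""
    parts = [local_stats(q, k, v) for k, v in zip(key_chunks, value_chunks)]
    _, denom, num = tree_reduce(parts)
    return [x / denom for x in num]

# Toy data: five key/value pairs split across three simulated devices.
q = [0.5, -1.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0], [0.5, 0.5]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]

chunked = tree_attention(q, [keys[0:2], keys[2:4], keys[4:5]],
                         [values[0:2], values[2:4], values[4:5]])
unchunked = tree_attention(q, [keys], [values])  # one chunk, no merging
print(chunked)
print(unchunked)
```

Because the combine step is associative, the chunks can be merged in any pairing order, which is what lets the reduction follow whatever tree the cluster topology favors.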
Vasu Shyam
@vasud3vshyam
Aug 12, 2024
Replying to @vasud3vshyam
And how much lower the peak memory is?
11K
Vasu Shyam
@vasud3vshyam
Mar 9, 2024
youtu.be/ZCIho8geEfI?si… Podcast is back! Thanks @quantum_geoff and Suvrat Raju for participating!
27K
Vasu Shyam
@vasud3vshyam
Aug 12, 2024
Replying to @vasud3vshyam
If you thought this was interesting, wait till you see what else my incredible colleagues at @ZyphraAI are up to!
12K
Vasu Shyam
@vasud3vshyam
Aug 12, 2024
Replying to @vasud3vshyam
OK, that was a fun 15 minutes of Twitter fame. Now for the downfall: the tweet at the top's got an embarrassing typo. Bonus points for whoever catches it.
8.4K
Vasu Shyam
@vasud3vshyam
Aug 23, 2023
John Donoghue @JFdonoghue1033 kindly joined my podcast yesterday and clearly explained in what sense we already have quantum gravity at low energies:
4.7K
Vasu Shyam
@vasud3vshyam
Aug 14, 2024
Replying to @ylecun
Thanks for sharing! Another little trick that might amuse you is that we identified a function which upon minimization produces the forward pass of the attention block:
2.1K
Vasu Shyam
@vasud3vshyam
Jan 26, 2023
Replying to @CburgesCliff
Thanks for this, Cliff. What's a good reference that you'd recommend to learn about the current status of the Baryon Asymmetry problem?
1.1K