I'm a third-year PhD student in computer science at Cornell, where I'm fortunate to be advised by Raaz Dwivedi and Kilian Q. Weinberger.
I'm currently working on using distribution compression (a.k.a. "thinning") to speed up large language model training and inference, as well as the benchmarking of AI agents.
TL;DR: Express provides state-of-the-art guarantees for causal attention, an efficient I/O-aware Triton implementation, and practical improvements for prefill, cache compression, and decoding.
TL;DR: Developed a new analysis of thinning algorithms that adapts to low-rank structure, enabling faster dot-product attention in Transformers (Thinformer), stochastic gradient descent (KH-SGD), and deep kernel hypothesis testing (DeepCTT).