Albert Gong

I'm a third-year PhD student in computer science at Cornell, where I'm fortunate to be advised by Raaz Dwivedi and Kilian Q. Weinberger. I'm currently working on using distribution compression (a.k.a. "thinning") to speed up training and inference of large language models, and on benchmarking AI agents.

Previously, I was an undergrad at Yale, where I had the privilege of working with Andre Wibisono, Zhong Shao, and Cormac O'Dea.

Email / Google Scholar / LinkedIn / GitHub

Research

* = equal contribution

Express Language Modeling
Albert Gong, Annabelle Michael Carrell, Raaz Dwivedi, Lester Mackey
arXiv preprint, 2026
Code / arXiv

Tl;dr—Express provides state-of-the-art causal attention guarantees, an efficient I/O-aware Triton implementation, and practical improvements for prefill, cache compression, and decoding.

Learning from Synthetic Data Improves Multi-hop Reasoning
Anmol Kabra, Yilun Yin, Albert Gong, Kamilė Stankevičiūtė, Dongyoung Go, Johann Lee, Katie Z. Luo, Carla P. Gomes, Kilian Q. Weinberger
ICLR, 2026
Code / arXiv

Tl;dr—RL fine-tuning LLMs on synthetic data improves real-world multi-hop reasoning by teaching knowledge composition skills.

N²: A Unified Python Package and Test Bench for Nearest Neighbor-Based Matrix Completion
Caleb Chin, Aashish Khubchandani, Harshvardhan Maskara, Kyuseong Choi, Jacob Feitelberg, Albert Gong, Manit Paul, Tathagata Sadhukhan, Anish Agarwal, Raaz Dwivedi
arXiv preprint, 2025
Code / arXiv / Poster (CODEML Workshop)

Tl;dr—Introduced the N² package and N²-Bench test bench for nearest neighbor-based matrix completion.

PhantomWiki: On-Demand Datasets for Reasoning and Retrieval Evaluation
Albert Gong*, Kamilė Stankevičiūtė*, Chao Wan*, Anmol Kabra*, Raphael Thesmar, Johann Lee, JT Klenke, Carla P. Gomes, Kilian Q. Weinberger
ICML, 2025 (Oral presentation at ICML Workshop on Long Context Foundation Models)
Code / arXiv / Poster / Slides

Tl;dr—Created a framework to automatically generate both the document corpus and question-answer pairs for benchmarking RAG and agentic workflows.

Low-Rank Thinning
Annabelle Michael Carrell, Albert Gong, Abhishek Shetty, Raaz Dwivedi, Lester Mackey
ICML, 2025
Code (see below) / arXiv

Tl;dr—Developed new analysis of thinning algorithms that adapts to low-rank structures, enabling faster dot-product attention in Transformers (Thinformer), stochastic gradient descent (KH-SGD), and deep kernel hypothesis testing (DeepCTT).

Supervised Kernel Thinning
Albert Gong, Kyuseong Choi, Raaz Dwivedi
NeurIPS, 2024
Code / arXiv / Poster / Slides / Video

Tl;dr—Used distribution compression to speed up kernel smoothing and kernel ridge regression.

Source code adapted from Jon Barron's website.