Rigorous Interpretation Is a Form of Evaluation
If held to scientific standards of falsifiability, reproducibility, and predictability, interpretation can become evaluation.

I’m a third-year ML/AI PhD student at USC, working with Dani Yogatama and Yan Liu. I’m currently visiting Harvard, where I work with David Alvarez-Melis and Naomi Saphra. My work is supported by the Viterbi School of Engineering Graduate Fellowship and Coefficient Giving’s Technical AI Safety Research Grant.
I’m broadly interested in training, reasoning, and interpretability: how we make sense of models, and how doing so might uncover the underlying science of large-scale models. In particular, I aim to (1) predict training and (2) predict failures.
Currently, I’m exploring dynamical-systems and physics-inspired approaches to analyzing small-scale toy learning problems, as well as predicting and controlling larger-scale training from developmental perspectives.

We introduce a large-scale, complexity-annotated, verified dataset of CoT-like reasoning traces, showing that current LLMs still struggle with structured inference.