He He (@hhexiy) / X

NLP researcher. Assistant Professor at NYU CS & CDS.

Joined December 2016

Pinned
He He
@hhexiy
Mar 25
Article
What research looks like with agents
I recently gave Codex a real research problem and let it run for hours. The result surprised me. My original goal was modest: I mostly wanted to see how long I could make it run productively on my...
He He
@hhexiy
Dec 14, 2024
Unbelievable. This quote is blatantly false and unnecessary for the argument. And she surely had expected the backlash with the patronizing NOTE. This is racism, not "cultural generalization". @NeurIPSConf
Jiao Sun
@sunjiao123sun_
Dec 14, 2024
Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡
He He
@hhexiy
Jun 5, 2018
Excited to join NYU!
Yann LeCun
@ylecun
Jun 5, 2018
Welcome to NYU He He! facebook.com/yann.lecun/pos…
He He
@hhexiy
Aug 17, 2021
@kchonyc and I are hiring a post-doc. Come help us figure out how an agent can learn by reading manuals and watching videos! Looking for expertise in multimodal reasoning, few-shot learning, QA/dialogue. Get in touch or apply at apply.interfolio.com/92494
He He
@hhexiy
Oct 14, 2025
Reward hacking means the model is making less effort than expected: it finds the answer long before its fake CoT is finished. TRACE uses this idea to detect hacking when CoT monitoring fails. Work led by @XinpengWang_ @nitishjoshi23 and @rico_angell👇
Xinpeng Wang
@XinpengWang_
Oct 7, 2025
‼️Your model may be secretly exploiting your imperfect reward function without telling you in the CoT! How to detect such 'implicit' reward hacking if the model is hiding it?🧐 We introduce TRACE🕵, a method based on a simple premise: hacking is easier than solving the actual
He He
@hhexiy
Dec 12, 2023
Have LLMs mastered deductive reasoning? Check out PrOntoQA-OOD, a synthetic dataset using a complete set of deduction rules. arxiv.org/abs/2305.15269 Stop by the poster on Wed at 10:45-12:45 and ask Abu Saparov all about reasoning (w or w/o LLMs)! #NeurIPS2023
He He
@hhexiy
Jun 4, 2025
Automating AI research is bottlenecked by verification speed (running experiments takes time). Our new paper explores whether LLMs can tell which ideas will work before executing them, and they appear to have better research intuition than human researchers.
Jiaxin Wen
@jiaxinwen22
Jun 3, 2025
Most promising-looking AI research ideas don’t pan out, but testing them burns through compute and labor. Can LMs predict idea success without running any experiments? We show that they do it better than human experts!
He He
@hhexiy
Apr 14, 2023
Thanks @_jasonwei for a fantastic and timely lecture! We had a full house and a half-hour discussion. Stay tuned for @hwchung27's lecture on RLHF in two weeks (nyu-cs2590.github.io/spring2023/cal…)!
Jason Wei
@_jasonwei
Apr 13, 2023
I gave an invited lecture at New York University for @hhexiy's class! I covered three ideas driving the LLM revolution: scaling, emergence, and reasoning. I tried to frame them in a way that reveals why large LMs are special in the history of AI. Slides: docs.google.com/presentation/d…
He He
@hhexiy
Dec 11, 2023
If you are interested in truthfulness/interpretability of LLMs, chat with @javirandor at #NeurIPS2023 !
Javier Rando
@javirandor
Dec 6, 2023
🧵 New paper: “Personas as a Way to Model Truthfulness in Language Models” We introduce empirical evidence suggesting LLMs may use “personas” to model truthfulness and improve generalization. arxiv.org/abs/2310.18168
He He
@hhexiy
Jan 19, 2024
Congratulations again @thtrieu_ ! Thanks for bringing me on this quest and can't wait to see the next rabbit you pull!
NYU Courant
@NYU_Courant
Jan 19, 2024
Congratulations to Trieu Trinh (@thtrieu_ ) on the launch of AlphaGeometry, an AI system capable of solving Olympiad-level geometry problems. Advised by He He (@hhexiy), Dr. Trinh defended his doctoral dissertation on this topic just last week!
He He
@hhexiy
Oct 24, 2025
@haizelabs is one of the few truly tackling the hard problem of LLM eval and oversight. Excited to support their mission!
Leonard Tang
@leonardtang_
Oct 24, 2025
We are thrilled to welcome Professor He He @hhexiy as an advisor to the Haize Labs team! Professor He leads a group at NYU focused on evaluation, scalable oversight, human–AI collaboration, and reasoning.
He He
@hhexiy
Jan 29, 2022
It’d be great if the ARR @ReviewAcl meta review provides two scores, one on significance of ideas/results and one on revisions needed. The two are kind of conflated now; what should be the score of a perfectly-executed, low-impact paper?
He He
@hhexiy
Sep 15, 2021
New work on OOD detection with @uditarora09 & @WillHuang93! OODs are notoriously hard to define. We try to construct realistic pairs of ID/OOD sets and find that they reveal distinct failure modes of different detection methods.
Udit Arora
@uditarora09
Sep 15, 2021
1/5 New paper @emnlpmeeting! “Types of Out-of-distribution Texts and How to Detect Them” with @WillHuang93 and @hhexiy: arxiv.org/abs/2109.06827. TL;DR: Our results call for an explicit definition of OOD examples when evaluating different detection methods.
He He
@hhexiy
Sep 20, 2024
Check out Jiaxin's work on how RLHFed models excel at impressing humans, not at the actual tasks!
Jiaxin Wen
@jiaxinwen22
Sep 20, 2024
RLHF is a popular method. It makes your human eval score better and Elo rating 🚀🚀. But really❓Your model might be “cheating” you! 😈😈 We show that LLMs can learn to mislead human evaluators via RLHF. 🧵below