He He
154 posts
He He
@hhexiy
NLP researcher. Assistant Professor at NYU CS & CDS.
hhexiy.github.io
Joined December 2016
418
Following
8,215
Followers
  • Pinned
    He He
    @hhexiy
    Mar 25
    Article
    What research looks like with agents
    I recently gave Codex a real research problem and let it run for hours. The result surprised me. My original goal was modest: I mostly wanted to see how long I could make it run productively on my...
    118K
  • He He
    @hhexiy
    Dec 14, 2024
    Unbelievable. This quote is blatantly false and unnecessary for the argument. And she surely had expected the backlash with the patronizing NOTE. This is racism, not "cultural generalization". @NeurIPSConf
    Jiao Sun
    @sunjiao123sun_
    Dec 14, 2024
    Mitigating racial bias from LLMs is a lot easier than removing it from humans! Can’t believe this happened at the best AI conference @NeurIPSConf We have ethical reviews for authors, but missed it for invited speakers? 😡
    24K
  • He He
    @hhexiy
    Jun 5, 2018
    Excited to join NYU!
    Yann LeCun
    @ylecun
    Jun 5, 2018
    Welcome to NYU He He! facebook.com/yann.lecun/pos…
  • He He
    @hhexiy
    Aug 17, 2021
    @kchonyc and I are hiring a post-doc. Come help us figure out how an agent can learn by reading manuals and watching videos! Looking for expertise in multimodal reasoning, few-shot learning, QA/dialogue. Get in touch or apply at apply.interfolio.com/92494
  • He He
    @hhexiy
    Oct 14, 2025
    Reward hacking means the model is making less effort than expected: it finds the answer long before its fake CoT is finished. TRACE uses this idea to detect hacking when CoT monitoring fails. Work led by @XinpengWang_ @nitishjoshi23 and @rico_angell👇
    Xinpeng Wang
    @XinpengWang_
    Oct 7, 2025
    ‼️Your model may be secretly exploiting your imperfect reward function without telling you in the CoT! How to detect such 'implicit' reward hacking if the model is hiding it?🧐 We introduce TRACE🕵, a method based on a simple premise: hacking is easier than solving the actual
    24K
  • He He
    @hhexiy
    Dec 12, 2023
    Have LLMs mastered deductive reasoning? Check out PrOntoQA-OOD, a synthetic dataset using a complete set of deduction rules. arxiv.org/abs/2305.15269 Stop by the poster on Wed at 10:45-12:45 and ask Abu Saparov all about reasoning (w or w/o LLMs)! #NeurIPS2023
    25K
  • He He
    @hhexiy
    Jun 4, 2025
    Automating AI research is bottlenecked by verification speed (running experiments takes time). Our new paper explores whether LLMs can tell which ideas will work before executing them, and they appear to have better research intuition than human researchers.
    Jiaxin Wen
    @jiaxinwen22
    Jun 3, 2025
    Most promising-looking AI research ideas don’t pan out, but testing them burns through compute and labor. Can LMs predict idea success without running any experiments? We show that they do it better than human experts!
    11K
  • He He
    @hhexiy
    Apr 14, 2023
    Thanks @_jasonwei for a fantastic and timely lecture! We had a full house and half an hour discussion. Stay tuned for @hwchung27 's lecture on RLHF in two weeks (nyu-cs2590.github.io/spring2023/cal…)!
    Jason Wei
    @_jasonwei
    Apr 13, 2023
    I gave an invited lecture at New York University for @hhexiy's class! I covered three ideas driving the LLM revolution: scaling, emergence, and reasoning. I tried to frame them in a way that reveals why large LMs are special in the history of AI. Slides: docs.google.com/presentation/d…
    42K
  • He He
    @hhexiy
    Dec 11, 2023
    If you are interested in truthfulness/interpretability of LLMs, chat with @javirandor at #NeurIPS2023 !
    Javier Rando
    @javirandor
    Dec 6, 2023
    🧵 New paper: “Personas as a Way to Model Truthfulness in Language Models” We introduce empirical evidence suggesting LLMs may use “personas” to model truthfulness and improve generalization. arxiv.org/abs/2310.18168
    15K
  • He He
    @hhexiy
    Jan 19, 2024
    Congratulations again @thtrieu_ ! Thanks for bringing me on this quest and can't wait to see the next rabbit you pull!
    NYU Courant
    @NYU_Courant
    Jan 19, 2024
    Congratulations to Trieu Trinh (@thtrieu_ ) on the launch of AlphaGeometry, an AI system capable of solving Olympiad-level geometry problems. Advised by He He (@hhexiy), Dr. Trinh defended his doctoral dissertation on this topic just last week!
    18K
  • He He
    @hhexiy
    Oct 24, 2025
    @haizelabs is one of the few truly tackling the hard problem of LLM eval and oversight. Excited to support their mission!
    Leonard Tang
    @leonardtang_
    Oct 24, 2025
    We are thrilled to welcome Professor He He @hhexiy as an advisor to the Haize Labs team! Professor He leads a group at NYU focused on evaluation, scalable oversight, human–AI collaboration, and reasoning.
    9.1K
  • He He
    @hhexiy
    Jan 29, 2022
    It’d be great if the ARR @ReviewAcl meta review provides two scores, one on significance of ideas/results and one on revisions needed. The two are kind of conflated now; what should be the score of a perfectly-executed, low-impact paper?
  • He He
    @hhexiy
    Sep 15, 2021
    New work on OOD detection with @uditarora09 & @WillHuang93! OODs are notoriously hard to define. We try to construct realistic pairs of ID/OOD sets and find that they reveal distinct failure modes of different detection methods.
    Udit Arora
    @uditarora09
    Sep 15, 2021
    1/5 New paper @emnlpmeeting! “Types of Out-of-distribution Texts and How to Detect Them” with @WillHuang93 and @hhexiy: arxiv.org/abs/2109.06827. TL;DR: Our results call for an explicit definition of OOD examples when evaluating different detection methods.
  • He He
    @hhexiy
    Sep 20, 2024
    Check out Jiaxin's work on how RLHFed model excels at impressing humans, not the actual tasks!
    Jiaxin Wen
    @jiaxinwen22
    Sep 20, 2024
    RLHF is a popular method. It makes your human eval score better and Elo rating 🚀🚀. But really❓Your model might be “cheating” you! 😈😈 We show that LLMs can learn to mislead human evaluators via RLHF. 🧵below
    5.6K
