Alex Robey
721 posts
@AlexRobey23
AI safety research @thinkymachines. Formerly @mldcmu @penn @swarthmore
San Francisco, CA
arobey1.github.io
Joined July 2020
1,475 Following · 1,293 Followers
  • Pinned
    Alex Robey
    @AlexRobey23
    Oct 17, 2024
    Chatbots like ChatGPT can be jailbroken to output harmful text. But what about robots? Can AI-controlled robots be jailbroken to perform harmful actions in the real world? Our new paper finds that jailbreaking AI-controlled robots isn't just possible. It's alarmingly easy. 🧵
    111K
  • Alex Robey
    @AlexRobey23
    Jun 1, 2022
    Excited to introduce our #ICML2022 paper “Probabilistically Robust Learning: Balancing Average- and Worst-case Performance” 🚀 We propose a new, high-probability notion of robustness for machine learning models. Code: github.com/arobey1/advben… Paper: arxiv.org/abs/2202.01136 1/n
  • Alex Robey
    @AlexRobey23
    Jul 28, 2023
    Excited to share that our paper “Adversarial Training Should Be Cast As a Non-Zero-Sum Game” won the *𝐛𝐞𝐬𝐭 𝐩𝐚𝐩𝐞𝐫 𝐚𝐰𝐚𝐫𝐝* at the AdvML workshop at #ICML2023! 🚀 Paper: arxiv.org/abs/2306.11035 Talk: Friday at 10am in Ballroom A Want to know more? Check out this 🧵
    24K
  • Alex Robey
    @AlexRobey23
    Dec 18, 2024
    After rejections at ICLR, ICML, and NeurIPS, I'm happy to report that "Jailbreaking Black Box LLMs in Twenty Queries" (i.e., the PAIR paper) has been accepted at @satml_conf! 🚀 A quick 🧵 summarizing some thoughts a year on from PAIR's release.
    20K
  • Alex Robey
    @AlexRobey23
    Oct 24, 2023
    Adversarial input prompts can jailbreak LLMs. To address this threat, meet SmoothLLM, a defense algorithm that reduces the success rates of popular jailbreaks to below 1%. 🚀 Paper: arxiv.org/abs/2310.03684 Website/Blog: debugml.github.io/smooth-llm Code: github.com/arobey1/smooth…
    33K
  • Alex Robey
    @AlexRobey23
    Jun 16, 2021
    Introducing "Model-Based Domain Generalization." We use semi-infinite constrained learning and duality to derive a new scheme for domain generalization that improves by as much as 30% on well-known benchmarks 🚀. Paper: arxiv.org/abs/2102.11436 Code: github.com/arobey1/mbdg 1/n
    arxiv.org
    Model-Based Domain Generalization
    Despite remarkable success in a variety of applications, it is well-known that deep learning can fail catastrophically when presented with out-of-distribution data. Toward addressing this...
  • Alex Robey
    @AlexRobey23
    Sep 26, 2022
    Our latest #NeurIPS2022 paper combines quantile optimization with tools from causal inference to achieve probable domain generalization. Check it out on arxiv!
    Julius von Kügelgen
    @JKugelgen
    Sep 26, 2022
    Replying to @JKugelgen
    "Probable Domain Generalization via Quantile Risk Minimization" (arxiv.org/abs/2207.09944) introduces QRM for generalizing to unseen domains with high probability; QRM can also recover the causal predictor. @CianEastwood @AlexRobey23 S Singh @HamedSHassani @pappasg69 @bschoelkopf
  • Alex Robey
    @AlexRobey23
    Dec 6, 2021
    Looking forward to presenting two papers tomorrow (Tuesday) at #NeurIPS2021: * Adversarial Robustness with Semi-Infinite Constrained Learning @ 11:30am EST (poster session 1) * Model-Based Domain Generalization @ 7:30pm EST (poster session 2)
  • Alex Robey
    @AlexRobey23
    Jan 18, 2024
    Our paper on **non-zero-sum adversarial training** will appear at #ICLR2024 🚀🚀🚀 For details, see this thread 🧵
    Alex Robey
    @AlexRobey23
    Jul 28, 2023
    Excited to share that our paper “Adversarial Training Should Be Cast As a Non-Zero-Sum Game” won the *𝐛𝐞𝐬𝐭 𝐩𝐚𝐩𝐞𝐫 𝐚𝐰𝐚𝐫𝐝* at the AdvML workshop at #ICML2023! 🚀 Paper: arxiv.org/abs/2306.11035 Talk: Friday at 10am in Ballroom A Want to know more? Check out this 🧵
    4.7K
  • Alex Robey
    @AlexRobey23
    Oct 22, 2024
    I'm grateful to have received the Adversarial ML Rising Star Award! 🚀 @AdvMLFrontiers is a fantastic venue. Many thanks to the award committee @pinyuchenTW @uiuc_aisecure @sijialiu17 @cho_jui_hsieh and to the workshop organizers!
    Pin-Yu Chen
    @pinyuchenTW
    Oct 22, 2024
    Please join me in congratulating this year's #AdvML Rising Star Award winners, @AlexRobey23 & @xuandongzhao, for their research accomplishments in AI robustness and safety. Their award talks will be presented at @AdvMLFrontiers @NeurIPSConf 2024 Details: sites.google.com/view/advml/adv…
    2.8K
  • Alex Robey
    @AlexRobey23
    Jul 20, 2022
    Looking forward to presenting on Probabilistic Robustness at #ICML2022 today (Wednesday)! 🚀 The talk will be at 5pm EDT in the DL: Algorithms session (icml.cc/virtual/2022/s…). And I'll also be at Poster #500 directly afterward.
    Alex Robey
    @AlexRobey23
    Jun 1, 2022
    Excited to introduce our #ICML2022 paper “Probabilistically Robust Learning: Balancing Average- and Worst-case Performance” 🚀 We propose a new, high-probability notion of robustness for machine learning models. Code: github.com/arobey1/advben… Paper: arxiv.org/abs/2202.01136 1/n
  • Alex Robey
    @AlexRobey23
    Nov 30, 2022
    Interested in domain generalization, causality, or robust optimization? If so, stop by our poster tomorrow in Hall J (#711) at 4pm CST!
    Cian Eastwood
    @CianEastwood
    Oct 18, 2022
    Excited to introduce our #NeurIPS2022 paper “Probable Domain Generalization via Quantile Risk Minimization”🚀 We propose a new probabilistic framework for the problem of domain/out-of-distribution generalization. Paper: arxiv.org/abs/2207.09944 Code: github.com/cianeastwood/q… 1/n
  • Alex Robey
    @AlexRobey23
    Dec 10, 2024
    I'll be in Vancouver at #NeurIPS2024 all week! Excited to present new results on jailbreaking LLMs & robots. Reach out if you'd like to chat about anything related to AI safety, security, evals, or optimization!
    5K
  • Alex Robey
    @AlexRobey23
    Oct 17, 2024
    Replying to @AlexRobey23
    If that doesn't scare you, check out the Thermonator—a robot dog with a *flamethrower*. The Thermonator is built on top of the Unitree Go2, costs < $10k, and can be controlled by ChatGPT. Here's IShowSpeed showing what this robot can do.
    Dexerto
    @Dexerto
    Sep 2, 2024
    IShowSpeed’s robot dog shot flames at him x.com/copiumx/status…
    4.1K