Alex Robey (@AlexRobey23) / X

Alex Robey

@AlexRobey23

AI safety research @thinkymachines. Formerly @mldcmu @penn @swarthmore

San Francisco, CA

Joined July 2020

Pinned
Alex Robey
@AlexRobey23
Oct 17, 2024
Chatbots like ChatGPT can be jailbroken to output harmful text. But what about robots? Can AI-controlled robots be jailbroken to perform harmful actions in the real world? Our new paper finds that jailbreaking AI-controlled robots isn't just possible. It's alarmingly easy. 🧵
Alex Robey
@AlexRobey23
Jun 1, 2022
Excited to introduce our #ICML2022 paper “Probabilistically Robust Learning: Balancing Average- and Worst-case Performance” 🚀 We propose a new, high-probability notion of robustness for machine learning models. Code: github.com/arobey1/advben… Paper: arxiv.org/abs/2202.01136 1/n
Alex Robey
@AlexRobey23
Jul 28, 2023
Excited to share that our paper “Adversarial Training Should Be Cast As a Non-Zero-Sum Game” won the *𝐛𝐞𝐬𝐭 𝐩𝐚𝐩𝐞𝐫 𝐚𝐰𝐚𝐫𝐝* at the AdvML workshop at #ICML2023! 🚀 Paper: arxiv.org/abs/2306.11035 Talk: Friday at 10am in Ballroom A Want to know more? Check out this 🧵
Alex Robey
@AlexRobey23
Dec 18, 2024
After rejections at ICLR, ICML, and NeurIPS, I'm happy to report that "Jailbreaking Black Box LLMs in Twenty Queries" (i.e., the PAIR paper) has been accepted at @satml_conf! 🚀 A quick 🧵 summarizing some thoughts a year on from PAIR's release.
Alex Robey
@AlexRobey23
Oct 24, 2023
Adversarial input prompts can jailbreak LLMs. To address this threat, meet SmoothLLM, a defense algorithm that reduces the success rates of popular jailbreaks to below 1%. 🚀 Paper: arxiv.org/abs/2310.03684 Website/Blog: debugml.github.io/smooth-llm Code: github.com/arobey1/smooth…
Alex Robey
@AlexRobey23
Jun 16, 2021
Introducing "Model-Based Domain Generalization." We use semi-infinite constrained learning and duality to derive a new scheme for domain generalization that improves by as much as 30% on well-known benchmarks 🚀. Paper: arxiv.org/abs/2102.11436 Code: github.com/arobey1/mbdg 1/n
Alex Robey
@AlexRobey23
Sep 26, 2022
Our latest #NeurIPS2022 paper combines quantile optimization with tools from causal inference to achieve probable domain generalization. Check it out on arxiv!
Julius von Kügelgen
@JKugelgen
Sep 26, 2022
Replying to @JKugelgen
"Probable Domain Generalization via Quantile Risk Minimization" (arxiv.org/abs/2207.09944) introduces QRM for generalizing to unseen domains with high probability; QRM can also recover the causal predictor. @CianEastwood @AlexRobey23 S Singh @HamedSHassani @pappasg69 @bschoelkopf
Alex Robey
@AlexRobey23
Dec 6, 2021
Looking forward to presenting two papers tomorrow (Tuesday) at #NeurIPS2021: * Adversarial Robustness with Semi-Infinite Constrained Learning @ 11:30am EST (poster session 1) * Model-Based Domain Generalization @ 7:30pm EST (poster session 2)
Alex Robey
@AlexRobey23
Jan 18, 2024
Our paper on **non-zero-sum adversarial training** will appear at #ICLR2024 🚀🚀🚀 For details, see this thread 🧵
Alex Robey
@AlexRobey23
Jul 28, 2023
Excited to share that our paper “Adversarial Training Should Be Cast As a Non-Zero-Sum Game” won the *𝐛𝐞𝐬𝐭 𝐩𝐚𝐩𝐞𝐫 𝐚𝐰𝐚𝐫𝐝* at the AdvML workshop at #ICML2023! 🚀 Paper: arxiv.org/abs/2306.11035 Talk: Friday at 10am in Ballroom A Want to know more? Check out this 🧵
Alex Robey
@AlexRobey23
Oct 22, 2024
I'm grateful to have received the Adversarial ML Rising Star Award! 🚀 @AdvMLFrontiers is a fantastic venue. Many thanks to the award committee @pinyuchenTW @uiuc_aisecure @sijialiu17 @cho_jui_hsieh and to the workshop organizers!
Pin-Yu Chen
@pinyuchenTW
Oct 22, 2024
Please join me in congratulating this year's #AdvML Rising Star Award winners, @AlexRobey23 & @xuandongzhao, for their research accomplishments in AI robustness and safety. Their award talks will be presented at @AdvMLFrontiers @NeurIPSConf 2024 Details: sites.google.com/view/advml/adv…
Alex Robey
@AlexRobey23
Jul 20, 2022
Looking forward to presenting on Probabilistic Robustness at #ICML2022 today (Wednesday)! 🚀 The talk will be at 5pm EDT in the DL: Algorithms session (icml.cc/virtual/2022/s…). And I'll also be at Poster #500 directly afterward.
Alex Robey
@AlexRobey23
Jun 1, 2022
Excited to introduce our #ICML2022 paper “Probabilistically Robust Learning: Balancing Average- and Worst-case Performance” 🚀 We propose a new, high-probability notion of robustness for machine learning models. Code: github.com/arobey1/advben… Paper: arxiv.org/abs/2202.01136 1/n
Alex Robey
@AlexRobey23
Nov 30, 2022
Interested in domain generalization, causality, or robust optimization? If so, stop by our poster tomorrow in Hall J (#711) at 4pm CST!
Cian Eastwood
@CianEastwood
Oct 18, 2022
Excited to introduce our #NeurIPS2022 paper “Probable Domain Generalization via Quantile Risk Minimization”🚀 We propose a new probabilistic framework for the problem of domain/out-of-distribution generalization. Paper: arxiv.org/abs/2207.09944 Code: github.com/cianeastwood/q… 1/n
Alex Robey
@AlexRobey23
Dec 10, 2024
I'll be in Vancouver at #NeurIPS2024 all week! Excited to present new results on jailbreaking LLMs & robots. Reach out if you'd like to chat about anything related to AI safety, security, evals, or optimization!
Alex Robey
@AlexRobey23
Oct 17, 2024
Replying to @AlexRobey23
If that doesn't scare you, check out the Thermonator—a robot dog with a *flamethrower*. The Thermonator is built on top of the Unitree Go2, costs < $10k, and can be controlled by ChatGPT. Here's IShowSpeed showing what this robot can do.
Dexerto
@Dexerto
Sep 2, 2024
IShowSpeed’s robot dog shot flames at him x.com/copiumx/status…