Ilia Shumailov🦔 (@iliaishacked) / X

1,093 posts

@iliaishacked

Now: @Meta, Past: {CEO @aisequrity, Senior Scientist @GoogleDeepMind, JRF @ChCh_Oxford @UniofOxford, Fellow @VectorInst, PhD @Cambridge_Uni}

[email protected]

iliaishacked.github.io

Joined December 2017

Pinned
Ilia Shumailov🦔
@iliaishacked
Jan 28
Article
SEQURITY.AI makes MoltBot unhackable with indirect prompt injections (and much more)
Friends, we love everything MoltBot. To ensure your creations stay safe as they scale, we are introducing Sequrity Control – a new way to secure your AI on demand. Integration is as simple as adding a...
5.6K
Ilia Shumailov🦔
@iliaishacked
Oct 17, 2023
Our team is looking for student researchers to study things at the intersection of ML, Security, Safety, and Privacy. To express interest please fill in the form:
docs.google.com
DeepMind Student Intern Interest Form
With this form we are trying to measure interest in in-person (!) student internships at Google DeepMind, London. During the internship the student will be expected to conduct research at the...
70K
Ilia Shumailov🦔
@iliaishacked
Jul 20, 2023
Is censorship of LLMs even possible? Our recent work applies classic computability theory to LLMs and shows that, in general, LLM censorship is impossible. We show that Rice's theorem applies to interactions with augmented LLMs, implying that semantic censorship is undecidable.
42K
Ilia Shumailov🦔
@iliaishacked
May 27, 2023
What happens when data generated by one LLM becomes training data for another LLM? It turns out that models start forgetting the real distribution, and as the process repeats, models develop dementia. cl.cam.ac.uk/~is410/Papers/…
65K
Ilia Shumailov🦔
@iliaishacked
Jun 3, 2025
🤯 Our new @GoogleDeepMind paper reveals a vulnerability in the AI supply chain. Our paper, "Cascading Adversarial Bias," shows how tiny, malicious changes to a large "teacher" language model can create amplified biases in smaller "student" models after distillation.
23K
Ilia Shumailov🦔
@iliaishacked
Dec 9, 2023
Replying to @ibab
We actually studied what happens in the limit here — variance is lost and models degenerate
arxiv.org
The Curse of Recursion: Training on Generated Data Makes Models Forget
Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such...
42K
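A minimal sketch of the variance loss mentioned in the reply above, under a toy assumption that is not taken from the paper: a one-dimensional Gaussian is refit on a small sample drawn from the previous generation's fit. The function name `recursive_fit` and its parameters (`generations`, `n_samples`, `seed`) are hypothetical and chosen only for illustration.

```python
import random
import statistics


def recursive_fit(generations=200, n_samples=10, seed=0):
    """Repeatedly fit a Gaussian to samples drawn from the previous fit.

    Toy illustration only (an assumption, not the paper's setup).
    Returns the fitted standard deviation at every generation.
    """
    rng = random.Random(seed)
    # Generation 0: "real" data from a standard normal distribution.
    data = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    stds = []
    for _ in range(generations + 1):
        mu = statistics.fmean(data)
        sigma = statistics.pstdev(data)  # maximum-likelihood estimate
        stds.append(sigma)
        # The next generation only ever sees samples from the fitted model.
        data = [rng.gauss(mu, sigma) for _ in range(n_samples)]
    return stds


if __name__ == "__main__":
    stds = recursive_fit()
    print(f"generation 0 std: {stds[0]:.3f}")
    print(f"generation {len(stds) - 1} std: {stds[-1]:.3e}")
```

With a small sample per generation, the fitted standard deviation tends to shrink over successive generations, which is one simple way to see how variance can be lost when a model is trained only on its predecessor's output.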
Ilia Shumailov🦔
@iliaishacked
Nov 4, 2024
📢 New security risk for Mixture-of-Experts (MoE)! 📢 @GoogleDeepMind research reveals a new kind of vulnerability that could leak user prompts in MoE models. Our "MoE Tiebreak Leakage" attack exploits the Expert Choice Routing strategy. arxiv.org/pdf/2410.22884
27K
Ilia Shumailov🦔
@iliaishacked
Jun 2, 2025
Our new @GoogleDeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.
19K
Ilia Shumailov🦔
@iliaishacked
Jul 21, 2025
Just saw our Nature paper on model collapse passed 500k accesses. To put that in perspective, the Nobel-winning AlphaFold paper has 2.3M accesses—only 4.6x more. I wanted to reflect back on it and progress broadly in the past year.
17K
Ilia Shumailov🦔
@iliaishacked
Jul 3, 2024
Unlearning, originally for privacy, today is often discussed as a content-regulation tool: if my model doesn't know X, it is safe. We argue that unlearning provides an illusion of safety, since adversaries can inject malicious knowledge back into the models. arxiv.org/pdf/2407.00106
19K
Ilia Shumailov🦔
@iliaishacked
Jul 14, 2025
My friends, I want to organise a Secure AI Club in London -- a gig for people interested in (practical!) AI Security. Not just academic toy setups, but actually making systems reliable. Trying to gauge interest, please sign up here:
docs.google.com
Secure AI Club London Meetup
I want to organise a Secure AI Club meetup in London. Need to figure out how many people will come to find an appropriate venue.
15K
Ilia Shumailov🦔
@iliaishacked
Jun 10, 2025
Are modern large language models (LLMs) vulnerable to privacy attacks that can determine if given data was used for training? Models and datasets are quite large, so what should we even expect? Our new paper looks into this exact question. 🧵 (1/10)
21K
Ilia Shumailov🦔
@iliaishacked
Jul 17, 2025
Replying to @elder_plinius
We actually theoretically kinda describe this in 6.4 in arxiv.org/pdf/2503.18813; it's a kind of polymorphic crypter
17K
Ilia Shumailov🦔
@iliaishacked
Apr 11, 2025
Folks, our @GoogleDeepMind team is cooking exciting security and privacy tooling and we need your help. We are looking to hire more folks! Please reach out to me with your CV if you want to contribute to making Gemini secure
Andreas Terzis
@aterzis
Apr 11, 2025
1/3 🚨 AGI agents are venturing into untrusted territories, but current LLMs face vulnerabilities like prompt injections. How do we ensure their safety? 🤔
17K