Ilia Shumailov🦔
1,093 posts
Now: @Meta, Past: {CEO @aisequrity, Senior Scientist @GoogleDeepMind, JRF @ChCh_Oxford @UniofOxford, Fellow @VectorInst, PhD @Cambridge_Uni}
- Our team is looking for student researchers to study things at the intersection of ML, Security, Safety, and Privacy. To express interest please fill in the form:
- Is censorship of LLMs even possible? Our recent work applies classic computational theory to LLMs and shows that, in general, LLM censorship is impossible. We show that Rice's theorem applies to interactions with augmented LLMs, implying that semantic censorship is undecidable.
- What happens when data generated by one LLM becomes training data for another LLM? Turns out that models start forgetting the real distribution, and as the process repeats, models develop dementia. cl.cam.ac.uk/~is410/Papers/…
- 🤯 Our new @GoogleDeepMind paper reveals a vulnerability in the AI supply chain. Our paper, "Cascading Adversarial Bias," shows how tiny, malicious changes to a large "teacher" language model can create amplified biases in smaller "student" models after distillation.
- Replying to @ibab: We actually studied what happens in the limit here: variance is lost and models degenerate.
- 📢 New security risk for Mixture-of-Experts (MoE)! 📢 @GoogleDeepMind research reveals a new kind of vulnerability that could leak user prompts in MoE models. Our "MoE Tiebreak Leakage" attack exploits the Expert Choice Routing strategy. arxiv.org/pdf/2410.22884
- Our new @GoogleDeepMind paper, "Lessons from Defending Gemini Against Indirect Prompt Injections," details our framework for evaluating and improving robustness to prompt injection attacks.
- Just saw our Nature paper on model collapse passed 500k accesses. To put that in perspective, the Nobel-winning AlphaFold paper has 2.3M accesses—only 4.6x more. I wanted to reflect back on it and progress broadly in the past year.
- Unlearning, originally for privacy, today is often discussed as a content-regulation tool: if my model doesn't know X, it is safe. We argue that unlearning provides an illusion of safety, since adversaries can inject malicious knowledge back into the models. arxiv.org/pdf/2407.00106
- My friends, I want to organise a Secure AI Club in London, a gig for people interested in (practical!) AI Security. Not just academic toy setups, but actually making systems reliable. Trying to gauge interest, please sign up here:
- Are modern large language models (LLMs) vulnerable to privacy attacks that can determine if given data was used for training? Models and datasets are quite large, so what should we even expect? Our new paper looks into this exact question. 🧵 (1/10)
- Replying to @elder_plinius: We actually theoretically kinda describe this in 6.4 in arxiv.org/pdf/2503.18813, it's a kind of polymorphic crypter.
- Folks, our @GoogleDeepMind team is cooking exciting security and privacy tooling and we need your help. We are looking to hire more folks! Please reach out to me with your CV if you want to contribute to making Gemini secure.
- 1/3 🚨 AGI agents are venturing into untrusted territories, but current LLMs face vulnerabilities like prompt injections. How do we ensure their safety? 🤔












