Deep learning research involves a lot of hypeparameter tuning
Vitaly 🇺🇦 Feldman
342 posts
Proving theorems about machine learning. Worried about data privacy. Research scientist at @Apple
- If reading the NeurIPS/ICML reviews makes you wish your ML theory paper had been reviewed by experts, you are welcome to submit to ALT 2021! Deadline Sep 30. Check out the CFP, amazing PC and invited speakers at
- Sorry to rain on this parade, but from a quick look at this paper I see that the analysis of privacy guarantees makes no sense: the authors apparently do not realize that their (unsubstantiated) assumption implies stronger privacy guarantees than what they prove from it.
- If you are interested in hearing some of the latest and greatest work on learning theory plus invited talks by Joelle Pineau and @KonstDaskalakis and a tutorial by Shay Moran, check out ALT 2021 on March 16-19. Registration is free!
- Happy to announce that Jan Vondrak and I finally proved that uniform stability leads to generalization with high probability and almost no overhead in the rate. So no more excuses for showing that SGD and friends generalize only in expectation.
- Nice empirical work, although the authors appear unaware that the same formal notion of memorization was introduced, formally studied and empirically evaluated in arxiv.org/abs/2012.06421 (STOC 2021). There is even a recent follow-up, arxiv.org/abs/2506.01855 (COLT 2025, to appear). Quoted post: "new paper from our work at Meta! **GPT-style language models memorize 3.6 bits per param** we compute capacity by measuring total bits memorized, using some theory from Shannon (1953) shockingly, the memorization-datasize curves look like this: ___________ / / (🧵)" Linked paper (arxiv.org): When is Memorization of Irrelevant Training Data Necessary for...
- The whole point of @icmlconf (and especially awards) is to curate the attention of the ML community. Directing that attention to fundamentally flawed work on data privacy can do major damage to the area. So it's the duty of privacy experts to report on the flaws they discover 🧵
- New and insightful paper on memorization of private training data by GPT-2: arxiv.org/abs/2012.07805 And coincidentally, on why such memorization can be necessary for achieving near-optimal accuracy: arxiv.org/abs/2012.06421 (w/ @markmbun, Gavin Brown, Adam Smith, @_kunal_talwar_)
- 1/? Recent work shows that privacy and model compression have a "disparate effect" on subgroups with different frequencies (arxiv.org/abs/1911.05248 arxiv.org/abs/1905.12101). The explanation is interesting and relevant to ML more generally. It is also simple enough for tweeting:
- I'm pretty sure some of my NeurIPS reviews were generated by GPT-3 with "in the style of Reviewer 2" in the prompt
- If you ever wondered (1) why NNs memorize noisy examples and sometimes even interpolate the dataset, (2) how to reconcile these phenomena with generalization, (3) how to scalably compute influences of all the training examples. More visualizations at pluskid.github.io/influence-memo… Quoted post: "What Neural Networks Memorize and Why: Discovering the Long Tail via Influence Estimation deepai.org/publication/wh… by Vitaly Feldman et al. #Statistics #Outlier"
- Time to test that bleach theory
- ALT 2021 accepted papers are up: algorithmiclearningtheory.org/alt2021/accept… A huge thank you to the entire PC and especially to my co-chair Katrina Ligett! Still can't believe we managed to stick to the original timeline despite the record number of submissions and the stresses of the pandemic.






