Damien Ferbach
103 posts
Damien Ferbach
@damien_ferbach
PhD at @Mila_Quebec. Opinions are my own and do not represent any affiliated organizations.
Montréal, Québec
damienferbach.github.io
Joined February 2022
203 Following · 673 Followers
  • Pinned
    Damien Ferbach
    @damien_ferbach
    May 26, 2025
    It's very difficult to improve the *exponent* in scaling laws for loss vs compute, especially by changing the optimizer! Our new paper shows that scaling momentum correctly can *provably* improve the scaling exponent on a theoretical model. Empirically, it works on LSTMs too!
    56K
  • Damien Ferbach
    @damien_ferbach
    Nov 9, 2023
    Recent work by @SamuelAinsworth provides empirical evidence of low-loss linear paths between networks modulo permutations. In our work with amazing collaborators @baptistegoujaud, @gauthier_gidel, and @AymericD4, we give theoretical insights into this phenomenon! arxiv.org/pdf/2310.19103…
    11K
  • Damien Ferbach
    @damien_ferbach
    Sep 25, 2024
    I am delighted to share that our paper has been accepted at #NeurIPS as a spotlight! 🚀 A huge thanks to my amazing collaborators @Qu3ntinB, @bose_joey and my supervisor @gauthier_gidel!
    Damien Ferbach
    @damien_ferbach
    Jul 24, 2024
    Retraining generative models solely on their own synthetic data leads to model collapse. But what if the data were curated? With @Qu3ntinB, @bose_joey, and @gauthier_gidel, we show that retraining on curated data implicitly optimizes for a reward model! 🚀 arxiv.org/pdf/2407.09499
    5.9K
  • Damien Ferbach
    @damien_ferbach
    Dec 11, 2024
    Come check out our poster now! Poster 2510, East Ballroom. @bose_joey @gauthier_gidel @Qu3ntinB @Mila_Quebec
    1.8K
  • Damien Ferbach
    @damien_ferbach
    May 26, 2025
    Replying to @damien_ferbach
    Title: Dimension-adapted Momentum Outscales SGD
    Link: arxiv.org/pdf/2505.16098
    Work done with amazing collaborators @_katieeverett @gauthier_gidel @poseypaquet @cypaquette
    Related 🧵: x.com/_katieeverett/… x.com/_katieeverett/…
    Katie Everett
    @_katieeverett
    May 25, 2025
    There were so many great replies to this thread, let's do a Part 2! For scaling laws between loss and compute, where loss = a * flops ^ b + c, which factors change primarily the constant (a) and which factors can actually change the exponent (b)? x.com/_katieeverett/…
    3K
  • Damien Ferbach
    @damien_ferbach
    May 26, 2025
    Replying to @damien_ferbach
    Our paper derives momentum schedules that are functions of both the model dimension and data distribution.
    * On our theoretical model, this provably improves the scaling law exponents in many regimes!
    * And, this exponent improvement holds on LSTM experiments on C4.
    2.9K
  • Damien Ferbach
    @damien_ferbach
    Dec 9, 2024
    I will be at NeurIPS in Vancouver this week to present our work on self-consuming generative models. arxiv.org/abs/2407.09499 Please reach out to talk about high-dimensional optimization and generative models!
    arxiv.org
    Self-Consuming Generative Models with Curated Data Provably...
    The rapid progress in generative models has resulted in impressive leaps in generation quality, blurring the lines between synthetic and real data. Web-scale datasets are now prone to the...
    1.1K
  • Damien Ferbach
    @damien_ferbach
    May 26, 2025
    Replying to @damien_ferbach
    🚨Takeaway: Depending on the data complexity, the compute-optimal training regime of our proposed Dimension-adapted Momentum method requires undertraining or overtraining relative to the Chinchilla law!🚨
    2.1K
  • Damien Ferbach
    @damien_ferbach
    May 26, 2025
    Replying to @damien_ferbach
    For optimal performance, optimizer hyperparameters such as the learning rate, batch size, and even epsilon should be scaled quantities. But there has been very little study of *how to scale momentum*. We often treat momentum-related hparams (e.g. beta1, beta2) as constants.
    1.3K
  • Damien Ferbach
    @damien_ferbach
    May 26, 2025
    Replying to @damien_ferbach
    Our framework can be generalized beyond DANA-constant and DANA-decay to a whole space called DANA, defined by the LR scaling gamma_3(t) = d^{-kappa_2}(1+t)^{-kappa_3}. DANA-constant and DANA-decay are extremal points of the stability boundary. DANA-decay is optimal in this class.
    1.2K
  • Damien Ferbach
    @damien_ferbach
    May 26, 2025
    Replying to @damien_ferbach
    None of the usual suspects improve the scaling exponent on this problem: classic momentum, Nesterov, Schedule-Free SGD, and Adam do not outscale SGD. (Note especially that adaptive optimizers aren't useful on PLRF: the problem is in some sense "too simple to need Adam".)
    1K
  • Damien Ferbach
    @damien_ferbach
    May 26, 2025
    Replying to @damien_ferbach
    Wrap-up: we proved on the PLRF model that outscaling is possible with correct HPs:
    ❌ SGD-M scales like SGD
    🟧 DANA-constant outscales but needs a very small LR
    ✅ DANA-decay scales best with the proper power-law LR schedule.
    DANA-decay is a very promising optimizer at scale!
    931
  • Damien Ferbach
    @damien_ferbach
    Oct 15, 2024
    Dreaming of pushing the boundaries in AI? 🌟 Apply for MSc or PhD at Mila, a world-leading research hub, for Fall 2025! 🚀 #AI #Research #Mila #PhD #MSc
    Mila - Institut québécois d'IA
    @Mila_Quebec
    Oct 14, 2024
    Mila's annual supervision request process opens on October 15 to receive MSc and PhD applications for Fall 2025 admission! Join our community! More information here mila.quebec/en/prospective…
    649
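
Two formulas are quoted in the posts above: the scaling law loss = a * flops ^ b + c and the LR scaling gamma_3(t) = d^{-kappa_2}(1+t)^{-kappa_3}. The short Python sketch below simply evaluates them. The function names and every numeric value are hypothetical, chosen only for illustration; nothing here is taken from the paper, and this is not an implementation of the DANA optimizer. It only shows why changing the constant a leaves the log-log slope of the reducible loss untouched, while the exponent b sets that slope.

```python
import math


def scaling_law_loss(flops, a, b, c):
    """Loss predicted by the power law quoted above: loss = a * flops ** b + c."""
    return a * flops ** b + c


def loglog_slope(flops_1, flops_2, a, b, c):
    """Slope of log(loss - c) versus log(flops) between two compute budgets.

    For an exact power law this equals the exponent b, whatever a is.
    """
    reducible_1 = scaling_law_loss(flops_1, a, b, c) - c
    reducible_2 = scaling_law_loss(flops_2, a, b, c) - c
    return (math.log(reducible_2) - math.log(reducible_1)) / (
        math.log(flops_2) - math.log(flops_1)
    )


def gamma_3(t, d, kappa_2, kappa_3):
    """LR scaling quoted above: d ** (-kappa_2) * (1 + t) ** (-kappa_3)."""
    return d ** (-kappa_2) * (1 + t) ** (-kappa_3)


if __name__ == "__main__":
    # Hypothetical coefficients, for illustration only.
    for a in (2.0, 20.0):
        slope = loglog_slope(1e6, 1e8, a=a, b=-0.5, c=0.1)
        print(f"a={a}: log-log slope of reducible loss = {slope:.3f}")

    # Hypothetical dimension and exponents, for illustration only.
    for t in (0, 9, 99):
        print(f"t={t}: gamma_3 = {gamma_3(t, d=100, kappa_2=0.5, kappa_3=1.0):.5f}")
```

Under these assumed values, multiplying a by ten shifts the curve but does not change the slope, which is the distinction between "changing the constant" and "changing the exponent" drawn in the quoted thread.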
