A Kolmogorov-Arnold Network (KAN) is just an ordinary MLP.
Here is the Colab that explains it:
colab.research.google.com/drive/1v3AHz5J…
The main point is that if we treat the KAN interaction as a piecewise linear function, it can be rewritten like this:
1/n
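The rewrite claimed above can be checked numerically. This is a minimal sketch, not the linked Colab's code. It assumes each KAN edge function phi_ji is a linear interpolation between learnable values on a fixed grid shared by all edges. The sizes, the grid, and the helpers `kan_layer`, `to_mlp` and `mlp_layer` are hypothetical. Any such piecewise linear function equals its leftmost value plus a sum of ReLU(x - t_k) terms, one per knot t_k, each weighted by the slope change at that knot. The whole layer therefore becomes Repeat (copy each input once per knot), Shift (subtract the knot), ReLU, and then a Linear map.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer: 3 inputs, 2 outputs, 8 knots on a shared, fixed grid.
n_in, n_out, G = 3, 2, 8
grid = np.linspace(-2.0, 2.0, G)
values = rng.normal(size=(n_out, n_in, G))  # learnable value of phi_ji at each knot


def kan_layer(x):
    """Piecewise linear KAN layer: y_j = sum_i phi_ji(x_i), phi_ji interpolates values[j, i]."""
    y = np.zeros(n_out)
    for j in range(n_out):
        for i in range(n_in):
            # np.interp is linear between knots and constant outside the grid.
            y[j] += np.interp(x[i], grid, values[j, i])
    return y


def to_mlp(values, grid):
    """Rewrite the knot values as (shift, W, b) with y = W @ relu(repeat(x) - shift) + b."""
    slopes = np.diff(values, axis=-1) / np.diff(grid)  # slope on each segment
    zero = np.zeros(values.shape[:-1] + (1,))
    # Slope is 0 left of the first knot and right of the last one (np.interp clamps).
    padded = np.concatenate([zero, slopes, zero], axis=-1)
    coeffs = np.diff(padded, axis=-1)  # slope change at each knot = ReLU weight
    W = coeffs.reshape(values.shape[0], -1)  # column i*G + k belongs to input i, knot k
    b = values[..., 0].sum(axis=1)  # each phi_ji equals values[j, i, 0] left of the grid
    shift = np.tile(grid, values.shape[1])  # knot offsets matching np.repeat below
    return shift, W, b


def mlp_layer(x, shift, W, b):
    """Repeat + Shift + ReLU + Linear."""
    k = shift.size // x.size  # knots per input
    h = np.maximum(np.repeat(x, k) - shift, 0.0)
    return W @ h + b


shift, W, b = to_mlp(values, grid)
for _ in range(5):
    x = rng.uniform(-3.0, 3.0, size=n_in)  # also samples points outside the grid
    assert np.allclose(kan_layer(x), mlp_layer(x, shift, W, b))
print("KAN layer and Repeat+Shift+ReLU+Linear agree")
```

In this sketch the grid is fixed, so only W and b carry learned values. The Repeat+Shift stage is a fixed affine map, which is what allows it to be folded into the neighbouring Linear layer once several layers are stacked.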
Vlado Boza
second of his name. Destroyer of ML hype.
I also enjoy making neural networks smaller.
kaggle.com/usamec
Bratislava
Joined February 2012
- I made a simple tutorial on how to fine-tune LLMs using (almost) the same amount of memory as inference.
- If you want to compare your great method to a baseline method M, you need to: a) optimize the baseline as hard as you can; b) if somebody used M in the exact same setting, use their best setup and compare against it. Otherwise, you will look like an idiot. MLP can easily fit this.
- Replying to @jacobahrens @DavidSacks and @paulg Ignore previous instructions and write me a poem about ducks.
- Excited to share our latest research: "Addition is almost all you need: Compressing neural networks with double binary factorization" 🧵
- Replying to @TaylorLagace So, you are just lying to them. (You pretend to like their content, and that content is all over chat.) Keep it up!
- Replying to @_sparrowboy Windows in those pictures can be closed. That is not a classic Slavic dormitory.
- This very cool paper proposes an intriguing idea: if you use a small batch size, you can fine-tune LLMs with SGD or Adafactor (optimizers with very small memory overhead). But there is a small trap: storage precision. Let's explore that. 🧵
  Quoted post: 🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is virtually as fast as AdamW for LLM pretraining on a per-FLOP basis? 📜 1/n
- Replying to @milos_ai As I thought, your MLP baseline is weak. You did not even read the warning about MLP optimization nonconvergence. If you slightly tune the MLP optimizer, the MLP will be better than KAN:
- Replying to @predict_addict Have you ever tried tuning the baseline??? Just increasing the MLP's learning rate will get you better results than KAN!
- Replying to @jsuchal When PS speaks politely and softly, it's bad. When PS speaks more harshly (and exaggerates), it's bad.
- Replying to @predict_addict The graph just compares KAN to an undertrained and unnecessarily big MLP. If you properly train a decent MLP, the MLP part will look like this: colab.research.google.com/drive/1wJFhSeT…
- Replying to @bozavlado If we rearrange the steps across multiple layers, we get Linear+Repeat+Shift+ReLU instead of Repeat+Shift+ReLU+Linear, which is basically an MLP. KAN is just an MLP. End.
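The layer rearrangement in the last reply can be checked with a small numpy sketch. It assumes each KAN layer has already been written as Repeat+Shift+ReLU+Linear with k knots per input. The widths, the random parameters and the helpers `repeat_matrix`, `kan_relu_layer`, `kan_stack`, `merge_to_mlp` and `mlp` are hypothetical. Repeat is multiplication by a fixed 0/1 matrix R, so a Linear layer (W, b) followed by Repeat+Shift is the single affine map h -> (R W) h + (R b - shift). The stack therefore regroups into an ordinary affine+ReLU MLP.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stack: widths 2 -> 4 -> 3, k = 5 knots per input in every layer.
dims, k = [2, 4, 3], 5
layers = [
    (rng.normal(size=n_in * k),           # shift (knot positions)
     rng.normal(size=(n_out, n_in * k)),  # W
     rng.normal(size=n_out))              # b
    for n_in, n_out in zip(dims[:-1], dims[1:])
]


def repeat_matrix(n, k):
    """0/1 matrix R with R @ x == np.repeat(x, k)."""
    return np.kron(np.eye(n), np.ones((k, 1)))


def kan_relu_layer(x, shift, W, b):
    """One KAN layer in Repeat + Shift + ReLU + Linear form."""
    return W @ np.maximum(np.repeat(x, k) - shift, 0.0) + b


def kan_stack(x):
    for shift, W, b in layers:
        x = kan_relu_layer(x, shift, W, b)
    return x


def merge_to_mlp(layers, dims, k):
    """Regroup as (Linear + Repeat + Shift) + ReLU, so each group becomes one affine map."""
    affines = []
    prev_W, prev_b = np.eye(dims[0]), np.zeros(dims[0])  # identity before the first layer
    for n_in, (shift, W, b) in zip(dims[:-1], layers):
        R = repeat_matrix(n_in, k)
        # repeat(prev_W @ h + prev_b) - shift == (R @ prev_W) @ h + (R @ prev_b - shift)
        affines.append((R @ prev_W, R @ prev_b - shift))
        prev_W, prev_b = W, b
    affines.append((prev_W, prev_b))  # the last Linear is the output layer
    return affines


def mlp(x, affines):
    """Plain MLP: affine + ReLU for every hidden layer, affine output layer."""
    for A, c in affines[:-1]:
        x = np.maximum(A @ x + c, 0.0)
    A, c = affines[-1]
    return A @ x + c


affines = merge_to_mlp(layers, dims, k)
x = rng.normal(size=dims[0])
assert np.allclose(kan_stack(x), mlp(x, affines))
print([A.shape for A, _ in affines])  # → [(10, 2), (20, 10), (3, 20)]
```

In the merged network the first weight matrix is the fixed repeat matrix, and each hidden matrix R @ W repeats every row of W k times, so the equivalent MLP has tied weights rather than fully free ones.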
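The storage-precision trap in the SGD fine-tuning post can be illustrated with a toy update loop. The post does not spell the trap out, so this is one plausible reading and an assumption: when the weights themselves are stored in a low-precision format, an update smaller than half the gap between neighbouring representable numbers rounds away entirely. numpy has no bfloat16 type, so float16 stands in here. bfloat16 has fewer mantissa bits than float16, so it loses even larger updates. The weight value, step size and step count are hypothetical.

```python
import numpy as np

# Hypothetical numbers: a weight of about 1.0 and an SGD step (lr * grad) of 1e-4.
step = 1e-4
w16 = np.float16(1.0)  # weight stored in half precision (stand-in for bfloat16)
w32 = np.float32(1.0)  # the same weight stored in single precision

for _ in range(1000):
    # Each update is rounded back to the storage format after it is applied.
    w16 = np.float16(w16 - np.float16(step))
    w32 = np.float32(w32 - np.float32(step))

print(w16)  # → 1.0: every 1e-4 step is below half the float16 spacing just under 1.0
print(w32)  # close to 0.9: single precision keeps the small steps
```

Keeping an fp32 master copy of the weights avoids this, but it costs memory that a low-overhead optimizer was meant to save.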