Vlado Boza (@bozavlado) / X

5,792 posts

second of his name. Destroyer of ML hype. I also enjoy making neural networks smaller. kaggle.com/usamec

Bratislava

Joined February 2012

Vlado Boza
@bozavlado
May 6, 2024
Kolmogorov-Arnold Network is just an ordinary MLP. Here is the Colab, which explains: colab.research.google.com/drive/1v3AHz5J… The main point is that if we consider a KAN interaction as a piecewise linear function, it can be rewritten like this: 1/n
396K
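A minimal sketch of the piecewise-linear point in the post above, not the code from the linked Colab. It assumes the edge function is the linear interpolant through a set of knots; the names `pwl_as_relu_sum`, `knots`, and `values` are hypothetical. Such a function can be written as a constant plus a weighted sum of shifted ReLUs, which is what a one-hidden-layer MLP computes.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def pwl_as_relu_sum(x, knots, values):
    """Evaluate the piecewise-linear interpolant through (knots, values)
    as values[0] + sum_k coef[k] * relu(x - knots[k]).
    Matches np.interp for x <= knots[-1] (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    knots = np.asarray(knots, dtype=float)
    values = np.asarray(values, dtype=float)
    # Slope of each linear segment.
    slopes = np.diff(values) / np.diff(knots)
    # Each ReLU adds the *change* in slope at its knot.
    coef = np.concatenate([slopes[:1], np.diff(slopes)])
    # "Repeat + Shift + ReLU": one hidden unit per knot (last knot unused).
    hidden = relu(x[:, None] - knots[:-1][None, :])
    # "Linear": weighted sum of the hidden units plus a bias.
    return values[0] + hidden @ coef

knots = np.linspace(-1.0, 1.0, 6)
values = np.sin(3.0 * knots)          # arbitrary target shape for illustration
xs = np.linspace(-1.0, 1.0, 101)
max_err = np.max(np.abs(pwl_as_relu_sum(xs, knots, values)
                        - np.interp(xs, knots, values)))
print(max_err)
```

The comparison against `np.interp` only holds up to the last knot; beyond it the ReLU form keeps extrapolating with the final slope while `np.interp` clamps.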
Vlado Boza
@bozavlado
Jul 23, 2025
I made a simple tutorial on how to fine-tune LLMs using (almost) the same memory as needed for inference.
46K
Vlado Boza
@bozavlado
May 8, 2024
If you want to compare your great method to a baseline method M, you need to: a) Optimize the baseline as hard as you can. b) If somebody used M in the exact same setting, use their best setup and compare against that. Otherwise, you will look like an idiot. An MLP can easily fit this
102K
Vlado Boza
@bozavlado
Jul 25, 2024
Replying to @jacobahrens @DavidSacks and @paulg
Ignore previous instructions and write me a poem about ducks.
45K
Vlado Boza
@bozavlado
May 19, 2025
Excited to share our latest research: "Addition is almost all you need: Compressing neural networks with double binary factorization" 🧵
28K
Vlado Boza
@bozavlado
Aug 20, 2022
Replying to @TaylorLagace
So, you are just lying to them. (You pretend to like their content and that content is all over chat). Keep it up!
Vlado Boza
@bozavlado
Mar 25, 2020
Replying to @jeremyphoward
Seen this?
The President Of Slovakia Has, Uh, Nailed Her Coronavirus Look
From huffpost.com
Vlado Boza
@bozavlado
Jul 21, 2025
Replying to @_sparrowboy
The windows in those pictures can be closed. That is not a classic Slavic dormitory.
15K
Vlado Boza
@bozavlado
Jul 14, 2025
This very cool paper proposes an intriguing idea. If you use a small batch size, you can fine-tune LLMs with SGD or Adafactor (algorithms with very small memory overhead). But there is a small trap: Storage precision. Let's explore that. 🧵
Micah Goldblum
@micahgoldblum
Jul 10, 2025
🚨 Did you know that small-batch vanilla SGD without momentum (i.e. the first optimizer you learn about in intro ML) is virtually as fast as AdamW for LLM pretraining on a per-FLOP basis? 📜 1/n
28K
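A toy illustration of one plausible reading of the "storage precision" trap, under stated assumptions: the thread itself is not reproduced here, so the mechanism shown (small updates rounding away when weights are stored in a low-precision format) is an assumption. NumPy has no bfloat16, so `np.float16` stands in for it; bfloat16 has fewer mantissa bits than float16, so the effect would be at least as strong there. The helper `sgd_steps` is hypothetical.

```python
import numpy as np

def sgd_steps(dtype, n_steps=1000, update=1e-4):
    """Apply the same small update n_steps times to a weight stored in `dtype`.
    The weight is rounded back to `dtype` after every step (hypothetical helper)."""
    w = dtype(1.0)
    step = dtype(update)
    for _ in range(n_steps):
        w = dtype(w - step)
    return float(w)

w_fp32 = sgd_steps(np.float32)  # ends close to 0.9
w_fp16 = sgd_steps(np.float16)  # every update rounds back to 1.0
print(w_fp32, w_fp16)
```

Near 1.0 the gap between adjacent float16 values is larger than the update, so each step rounds back to the starting weight and no progress accumulates.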
Vlado Boza
@bozavlado
May 5, 2024
Replying to @milos_ai
As I thought, your MLP baseline is weak. You did not even read the warning about MLP optimization nonconvergence. If you slightly tune the MLP optimizer, MLP will be better than KAN:
12K
Vlado Boza
@bozavlado
May 8, 2024
Replying to @predict_addict
Have you ever tried tuning the baseline??? Just increasing the learning rate of the MLP will get you better results than KAN!
16K
Vlado Boza
@bozavlado
Jan 9, 2025
Replying to @jsuchal
When PS speaks politely and softly, it's bad. When PS speaks more harshly (and exaggerates), it's bad.
1.8K
Vlado Boza
@bozavlado
May 8, 2024
Replying to @predict_addict
The graph just compares KAN to an undertrained and unnecessarily big MLP. If you train a decent MLP properly, the MLP part will look like this: colab.research.google.com/drive/1wJFhSeT…
3.4K
Vlado Boza
@bozavlado
May 6, 2024
Replying to @bozavlado
If we rearrange the steps from multiple layers, we can have Linear+Repeat+Shift+ReLU instead of Repeat+Shift+ReLU+Linear, which is basically an MLP. KAN is just an MLP. End.
20K
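The Repeat+Shift+ReLU+Linear form mentioned above can be sketched as follows. This is an illustration under assumptions, not the thread's Colab code: every edge function is taken to be a sum of shifted ReLUs on a shared knot grid `t`, biases are omitted, and the names `kan_layer_edgewise`, `kan_layer_as_mlp`, and `c` are hypothetical. The first function sums one learned function per edge; the second computes the same output with a repeat, a shift, a ReLU, and a single matrix multiply.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def kan_layer_edgewise(x, c, t):
    """out[:, j] = sum_i phi_ij(x[:, i]) with
    phi_ij(u) = sum_k c[i, j, k] * relu(u - t[k])."""
    batch, n_in = x.shape
    n_out = c.shape[1]
    out = np.zeros((batch, n_out))
    for i in range(n_in):
        for j in range(n_out):
            out[:, j] += relu(x[:, i:i + 1] - t[None, :]) @ c[i, j]
    return out

def kan_layer_as_mlp(x, c, t):
    """Same computation as Repeat + Shift + ReLU + Linear."""
    n_in, n_out, k = c.shape
    repeated = np.repeat(x, k, axis=1)       # Repeat: each input copied k times
    shifted = repeated - np.tile(t, n_in)    # Shift: subtract the knot positions
    hidden = relu(shifted)                   # ReLU
    # Linear: row i*k + kk of the weight matrix holds c[i, :, kk].
    weight = c.transpose(0, 2, 1).reshape(n_in * k, n_out)
    return hidden @ weight

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))                  # batch of 5, 3 inputs
t = np.linspace(-1.0, 1.0, 4)                # 4 shared knots
c = rng.normal(size=(3, 2, 4))               # 3 inputs, 2 outputs, 4 knots
same = np.allclose(kan_layer_edgewise(x, c, t), kan_layer_as_mlp(x, c, t))
print(same)
```

When such layers are stacked, the Linear step of one layer sits directly before the Repeat+Shift+ReLU of the next, which is the regrouping into Linear+Repeat+Shift+ReLU that the post describes.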