Over the weekend, I managed to beat the NanoGPT world record by 0.3s, by untying Value Embeddings (VE) 😊
There’s a growing trend toward sparse embeddings (DeepSeek Engram, Meta STEM), and @karpathy himself has extensively ablated VEs in nanochat.
New NanoGPT Speedrun WR at 99.0 (-0.3s) from @photon_mz, with an update from 3 to 5 value embeddings, enabling 1.5% fewer training steps! The trend of fewer steps with higher sparsity continues.
github.com/KellerJordan/m…