Finally, GPT-2 in 2 minutes :)
New sub-2-minute NanoGPT Speedrun WR at 119.3s (-2.9s) led by @varunneal! It includes multi-token prediction, untying embed/lm_head mid-training, and updating CWD to not decay zero-grad embeds. The speedrun limit appears further away the closer it gets. github.com/KellerJordan/m…