Outperform GPT-3 with @karpathy's llm.c using just 1/3 of the training tokens ✨
Another day has passed, and I trained GPT-2 (124M) with llm.c for 150B tokens, achieving 35.5% accuracy on HellaSwag. This surpasses the 33.7% accuracy that the GPT-3 paper reports after training for 300B tokens. It matched the
Apparently today is the 4th anniversary of GPT-3!
arxiv.org/abs/2005.14165
Which I am accidentally celebrating by re-training the smallest model in the miniseries right now :). HellaSwag 33.7 (Appendix H) almost reached this a few steps ago (though this is only 45% of the