Karpathy's observations on Llama 3

This part really stands out to me:

The LLMs we work with all the time are significantly undertrained by a factor of maybe 100-1000X or more, nowhere near their point of convergence. Actually, I really hope people carry forward the trend and start training and releasing even more long-trained, even smaller models.

https://twitter.com/karpathy/status/1781028605709234613?s=61&t=7Au1gTKP-zUn18LipYk8xQ
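To put the "undertrained" claim in perspective, here is a quick back-of-the-envelope sketch in Python. It assumes Meta's reported ~15T training tokens for Llama 3 8B and the common Chinchilla rule of thumb of roughly 20 tokens per parameter; these numbers are my own framing, not from the tweet. Even training ~90x past the compute-optimal point, Karpathy's point is that the model is still nowhere near convergence.

# Back-of-the-envelope comparison (assumed figures, not from the tweet):
# Llama 3 8B's reported ~15T training tokens vs. the Chinchilla
# compute-optimal rule of thumb of ~20 tokens per parameter.

params = 8e9              # Llama 3 8B parameter count
tokens_trained = 15e12    # reported training corpus size (~15 trillion tokens)

chinchilla_optimal = 20 * params           # ~0.16T tokens
ratio = tokens_trained / chinchilla_optimal

print(f"Chinchilla-optimal tokens: {chinchilla_optimal / 1e12:.2f}T")
print(f"Actual tokens trained:     {tokens_trained / 1e12:.0f}T")
print(f"Trained ~{ratio:.0f}x past the compute-optimal point")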