Overfitting

Also known as: overfit, memorization, overfitted model

Tags: research, beginner

What is Overfitting?

Overfitting occurs when a machine learning model learns the training data too well, memorizing specific examples including their noise and idiosyncrasies rather than learning the underlying patterns. An overfitted model performs excellently on its training data but fails to generalize to new, unseen data. It is one of the most fundamental challenges in machine learning and a concept every practitioner must understand.

The Generalization Trade-Off

Think of overfitting like a student who memorizes every answer on a practice exam but cannot solve novel problems on the real test. The model has learned what to say for specific inputs rather than understanding why. The goal of machine learning is generalization: the ability to perform well on data the model has never seen. Overfitting is the failure mode in which a model sacrifices generalization for training performance. The tension between fitting the training data closely enough to learn useful patterns, but not so closely that the model memorizes noise, is central to the entire discipline.
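The gap between training and test performance can be made concrete with a toy experiment (a sketch, not from the source; all data and model choices here are illustrative): fit noisy points drawn from a straight line with two models, a degree-7 polynomial that passes exactly through every training point (pure memorization) and a simple least-squares line.

```python
import random

random.seed(0)

def true_fn(x):
    # The underlying pattern the model should learn.
    return 2 * x + 1

# 8 noisy training points and 8 noisy test points at slightly shifted inputs.
train_x = [i / 4 for i in range(8)]        # 0.0 .. 1.75
train_y = [true_fn(x) + random.gauss(0, 0.3) for x in train_x]
test_x = [0.1 + i / 4 for i in range(8)]   # 0.1 .. 1.85
test_y = [true_fn(x) + random.gauss(0, 0.3) for x in test_x]

def interpolate(x, xs, ys):
    # Degree-7 Lagrange polynomial through all 8 training points:
    # zero training error, i.e. the model memorizes the noise.
    total = 0.0
    for i in range(len(xs)):
        term = ys[i]
        for j in range(len(xs)):
            if j != i:
                term *= (x - xs[j]) / (xs[i] - xs[j])
        total += term
    return total

def fit_line(xs, ys):
    # Least-squares line: too simple to memorize individual points.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

def mse(preds, ys):
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)

slope, intercept = fit_line(train_x, train_y)
memo_train = mse([interpolate(x, train_x, train_y) for x in train_x], train_y)
memo_test = mse([interpolate(x, train_x, train_y) for x in test_x], test_y)
line_train = mse([slope * x + intercept for x in train_x], train_y)
line_test = mse([slope * x + intercept for x in test_x], test_y)

print(f"memorizer: train MSE {memo_train:.6f}, test MSE {memo_test:.2f}")
print(f"line:      train MSE {line_train:.4f}, test MSE {line_test:.4f}")
```

The memorizer's training error is essentially zero while its test error is far worse than the line's, which is exactly the signature of overfitting.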

Causes and Prevention

Overfitting is more likely when the model has too many parameters relative to the amount of training data, when training runs for too many epochs, or when the data lacks diversity. Common prevention techniques include regularization (adding penalties for model complexity), dropout (randomly disabling neurons during training), early stopping (halting training before the model begins memorizing), data augmentation (artificially increasing dataset diversity), and cross-validation (testing on held-out portions of data).
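Of the techniques above, early stopping is perhaps the simplest to sketch. The following is a minimal illustration (names, patience value, and the simulated loss curve are all assumptions, not from the source): track validation loss each epoch and halt once it has failed to improve for a fixed number of epochs.

```python
def train_with_early_stopping(val_losses, patience=5):
    # Early stopping on a stream of per-epoch validation losses: remember the
    # best epoch seen so far and halt after `patience` consecutive epochs
    # without improvement. `val_losses` stands in for a real training loop.
    best_val, best_epoch, since_best = float("inf"), -1, 0
    for epoch, v in enumerate(val_losses):
        if v < best_val:
            best_val, best_epoch, since_best = v, epoch, 0
        else:
            since_best += 1
            if since_best >= patience:
                break
    return best_epoch, best_val

# A typical overfitting curve (simulated): validation loss falls, bottoms out,
# then rises as the model starts memorizing the training set.
curve = [1.0 + (epoch - 10) ** 2 / 100 for epoch in range(40)]
stop_epoch, stop_loss = train_with_early_stopping(curve)
```

On this curve the loop halts a few epochs after the minimum at epoch 10 and reports that epoch's weights as the ones to keep, rather than training all 40 epochs into the memorization regime.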

Overfitting in the LLM Era

Large language models have a more nuanced relationship with overfitting. Their massive parameter counts would traditionally suggest extreme overfitting risk, but scaling laws have shown that sufficiently large models trained on sufficiently large datasets actually generalize better. The risk resurfaces during fine-tuning, when a foundation model is adapted to a small specialized dataset and can quickly memorize it. Careful learning rate scheduling, augmentation of small datasets, and evaluation on held-out data remain essential practices.
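As one concrete example of careful learning rate scheduling, fine-tuning runs often use a linear warmup followed by linear decay to zero. A minimal sketch (the function name and every number here are illustrative assumptions, not a recommendation from the source):

```python
def finetune_lr(step, total_steps=1000, warmup_steps=100, peak_lr=2e-5):
    # Linear warmup to a small peak learning rate, then linear decay to zero.
    # Keeping peak_lr small limits how fast the model can drift toward
    # memorizing a small fine-tuning dataset.
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * max(0, total_steps - step) / (total_steps - warmup_steps)
```

The schedule starts at zero, peaks at `peak_lr` at the end of warmup, and returns to zero by the final step.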