Scaling Laws
SKAY-ling lawz
Definition
Scaling laws describe the empirical relationship between model performance and three key variables: model size (parameters), dataset size, and compute budget. The core insight: loss falls predictably, following smooth power laws, as you scale these factors.
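As a rough illustration, the Chinchilla-style form of this relationship is L(N, D) = E + A / N^alpha + B / D^beta, where N is parameters and D is training tokens. The sketch below evaluates that form in Python; the default constants are rounded versions of the fits reported in the Chinchilla paper and should be treated as illustrative, not authoritative.

```python
# Minimal sketch of a Chinchilla-style scaling law:
#   L(N, D) = E + A / N^alpha + B / D^beta
# Default constants are rounded from published fits; treat them as illustrative.

def predicted_loss(n_params: float, n_tokens: float,
                   E: float = 1.7, A: float = 400.0, B: float = 410.0,
                   alpha: float = 0.34, beta: float = 0.28) -> float:
    """Estimate training loss from parameter count N and token count D."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Doubling either N or D shaves off a predictable (and shrinking) slice of loss.
print(predicted_loss(70e9, 1.4e12))   # roughly Chinchilla-scale N and D
print(predicted_loss(140e9, 1.4e12))  # 2x the parameters, same data
```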
The Scaling Era (2020-2025)
From GPT-3 to GPT-4, the dominant strategy was simple: make everything bigger.
- More parameters
- More training data
- More compute
This worked remarkably well, leading to dramatic capability improvements with each generation.
Signs of Diminishing Returns
Key figures are now questioning whether scaling alone can keep delivering gains:
"Is the belief really that if you just 100x the scale everything would be transformed? I don't think that's true." — Ilya Sutskever
"There's a lot of room between exponential and asymptotic." — Demis Hassabis
The New Formula
Demis Hassabis describes DeepMind's approach:
"We operate on 50% scaling, 50% innovation. Both are required for AGI."
What's Changing
- Pre-training data is finite - we're running out of high-quality text
- Returns aren't exponential - improvements are incremental, not revolutionary
- Research matters again - breakthroughs require innovation, not just resources
The Eras of AI
Ilya Sutskever's framing:
- 2012-2020: Research era (deep learning breakthroughs)
- 2020-2025: Scaling era (bigger is better)
- 2025+: Return to research (new paradigms needed)
Related Terms
- Pre-training - The phase where scaling matters most
- Chinchilla - The paper that derived compute-optimal ratios of parameters to training tokens (see the sketch below)
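To give a feel for what "compute-optimal ratios" means in practice, here is a back-of-the-envelope sketch. It assumes the common approximation that training FLOPs are about 6 * N * D and uses the roughly 20-tokens-per-parameter rule implied by the Chinchilla fits; both numbers are approximations, not exact constants from the paper.

```python
import math

# Back-of-the-envelope Chinchilla-style split of a compute budget:
# training FLOPs C ~= 6 * N * D, with an optimal data budget of roughly
# 20 tokens per parameter. Both figures are approximations.

TOKENS_PER_PARAM = 20.0  # approximate compute-optimal ratio

def compute_optimal_split(flops_budget: float) -> tuple[float, float]:
    """Return (params, tokens) that roughly exhaust C ~= 6*N*D
    while keeping D ~= 20 * N."""
    n_params = math.sqrt(flops_budget / (6.0 * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens

# Example: a budget of ~5.8e23 FLOPs lands near 70B parameters
# and 1.4T tokens, close to the Chinchilla model itself.
params, tokens = compute_optimal_split(5.8e23)
print(f"{params:.2e} parameters, {tokens:.2e} tokens")
```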

