Scaling Laws

Definition

Scaling laws describe the empirical relationship between model performance (usually measured as test loss) and three variables: model size (parameter count), dataset size, and compute budget. The key insight is that performance improves predictably, roughly as a power law, as these factors are scaled up.
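
To make "improves predictably" concrete, here is a minimal Python sketch that assumes loss follows a simple power law in parameter count, L(N) = (n_c / N) ** alpha. The function names and the constants n_c and alpha are illustrative placeholders, not values taken from any particular paper. The sketch generates losses from the assumed law, then recovers the exponent with a straight-line fit in log-log space.

```python
import math

def power_law_loss(n_params: float, n_c: float = 1e14, alpha: float = 0.08) -> float:
    """Hypothetical scaling law: L(N) = (n_c / N) ** alpha (constants are placeholders)."""
    return (n_c / n_params) ** alpha

def fit_power_law(sizes, losses):
    """Least-squares fit of log L = log a - alpha * log N. Returns (a, alpha)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(loss) for loss in losses]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope

# Synthetic "measurements" at four model sizes, generated from the assumed law.
sizes = [1e6, 1e7, 1e8, 1e9]
losses = [power_law_loss(n) for n in sizes]

a, alpha = fit_power_law(sizes, losses)
print(f"fitted alpha = {alpha:.3f}")
```

Under this assumed form, every 10x increase in parameters multiplies the loss by the same factor (10 ** -alpha). That regularity is what makes extrapolation to larger models possible, and it also means each further gain of the same size requires ten times more parameters.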

The Scaling Era (2020-2025)

From GPT-3 to GPT-4, the dominant strategy was simple: make everything bigger.

More parameters
More training data
More compute

This worked remarkably well, leading to dramatic capability improvements with each generation.

Signs of Diminishing Returns

Prominent researchers now question whether scaling alone can keep delivering these gains:

"Is the belief really that if you just 100x the scale everything would be transformed? I don't think that's true." — Ilya Sutskever

"There's a lot of room between exponential and asymptotic." — Demis Hassabis

The New Formula

Demis Hassabis describes DeepMind's approach:

"We operate on 50% scaling, 50% innovation. Both are required for AGI."

What's Changing

Pre-training data is finite - we're running out of high-quality text
Returns aren't exponential - improvements are incremental, not revolutionary
Research matters again - breakthroughs require innovation, not just resources

The Eras of AI

Ilya Sutskever's framing:

2012-2020: Research era (deep learning breakthroughs)
2020-2025: Scaling era (bigger is better)
2025+: Return to research (new paradigms needed)

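Related Concepts

Pre-training - The phase where scaling matters most
Chinchilla - The 2022 DeepMind paper showing that, for a fixed compute budget, model size and training tokens should be scaled in roughly equal proportion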
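
As a rough illustration of the Chinchilla entry, the sketch below splits a compute budget between parameters and training tokens. It assumes two commonly quoted rules of thumb, neither of which is stated in this article: training compute of about 6 * N * D FLOPs for N parameters and D tokens, and about 20 training tokens per parameter. The function name, its default, and the example budget are hypothetical.

```python
import math

def compute_optimal_split(flops: float, tokens_per_param: float = 20.0):
    """Return (params, tokens), assuming flops = 6 * params * tokens
    and tokens = tokens_per_param * params. Both are rules of thumb."""
    params = math.sqrt(flops / (6.0 * tokens_per_param))
    tokens = tokens_per_param * params
    return params, tokens

budget = 1e24  # hypothetical training budget in FLOPs
params, tokens = compute_optimal_split(budget)
print(f"params ~ {params:.3g}, tokens ~ {tokens:.3g}")
```

Under these assumptions, quadrupling the compute budget doubles both the parameter count and the token count, rather than putting all of the extra compute into model size.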