Scaling Laws

SKAY-ling lawz

research · intermediate

Definition

Scaling laws describe the empirical relationship between model performance and three key variables: model size (parameters), dataset size, and compute budget. The famous insight: performance improves predictably as you scale these factors.
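
As a point of reference (not stated in this article), the foundational Kaplan et al. (2020) paper fit these relationships as a power law in each variable:

$$
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad
L(D) \approx \left(\frac{D_c}{D}\right)^{\alpha_D}, \qquad
L(C) \approx \left(\frac{C_c}{C}\right)^{\alpha_C}
$$

where $L$ is test loss, $N$ is parameter count, $D$ is dataset size in tokens, and $C$ is compute; the fitted exponents were roughly $\alpha_N \approx 0.076$, $\alpha_D \approx 0.095$, and $\alpha_C \approx 0.05$ (approximate empirical fits from that paper).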

The Scaling Era (2020-2025)

From GPT-3 to GPT-4, the dominant strategy was simple: make everything bigger.

  • More parameters
  • More training data
  • More compute

This worked remarkably well, leading to dramatic capability improvements with each generation.
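
To make "more of everything" concrete, here is a minimal sketch assuming the compute-optimal recipe from the Chinchilla paper (Hoffmann et al., 2022), which refined the original scaling laws. The C ≈ 6·N·D FLOPs approximation and the ~20 tokens-per-parameter ratio are standard rules of thumb; the function name and budget below are illustrative, not from this article:

```python
def compute_optimal_split(flops_budget: float, tokens_per_param: float = 20.0):
    """Split a training-compute budget between model size and data.

    Assumes the common approximation C ~ 6 * N * D training FLOPs and the
    Chinchilla-style rule of thumb D ~ 20 * N tokens (both are empirical
    rules of thumb, not exact laws).
    """
    # Substitute D = tokens_per_param * N into C = 6 * N * D and solve for N.
    n_params = (flops_budget / (6.0 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

# Hypothetical example: a 1e24-FLOP training budget.
n, d = compute_optimal_split(1e24)
print(f"~{n:.1e} parameters trained on ~{d:.1e} tokens")
# -> roughly 9e10 parameters and 1.8e12 tokens
```

The takeaway matches the article's framing: under a fixed compute budget, parameters and data are scaled together, so "more compute" implies more of both.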

Signs of Diminishing Returns

Key figures are now questioning whether scaling alone can keep delivering gains:

“Is the belief really that if you just 100x the scale everything would be transformed? I don’t think that’s true.” — Ilya Sutskever

“There’s a lot of room between exponential and asymptotic.” — Demis Hassabis

The New Formula

Demis Hassabis describes DeepMind’s approach:

“We operate on 50% scaling, 50% innovation. Both are required for AGI.”

What’s Changing

  1. Pre-training data is finite - we’re running out of high-quality text
  2. Returns aren’t exponential - improvements are incremental, not revolutionary
  3. Research matters again - breakthroughs require innovation, not just resources

The Eras of AI

Ilya Sutskever’s framing:

  • 2012-2020: Research era (deep learning breakthroughs)
  • 2020-2025: Scaling era (bigger is better)
  • 2025+: Return to research (new paradigms needed)

Mentioned In

Ilya Sutskever

Is the belief really that if you just 100x the scale everything would be transformed? I don't think that's true.

Demis Hassabis

There's a lot of room between exponential and asymptotic. We operate on 50% scaling, 50% innovation.

Nathan Lambert

I still think most of the compute is going in at pre-training... I would say we will see a $2,000 subscription this year.

Jared Kaplan

Kaplan, co-author of the foundational scaling laws paper, describes how the scaling hypothesis — that bigger models predictably get better — drove the creation of Anthropic and remains central to its research strategy.

Dario Amodei

The scaling laws just tell you that if you put in the ingredients to the chemical reaction — data and model size — what you get out is intelligence. Intelligence is the product of a chemical reaction.

Satya Nadella

Nadella credits Dario Amodei's scaling laws paper at OpenAI as the pivotal moment. When OpenAI switched from RL to natural language and scaling, Microsoft recognized the regime change and invested $1B.

Geoffrey Hinton

Every time they made the neural net bigger and gave it more data, it got better in a very predictable way. You could predict ahead of time it's going to get this much better. It's an open question whether that's petering out.