Backpropagation

/ˌbækˌprɒpəˈɡeɪʃən/

Also known as: backprop, back-propagation, backward propagation of errors

research · intermediate

What is Backpropagation?

Backpropagation is the fundamental algorithm that enables neural networks to learn. Short for “backward propagation of errors,” it works by computing how much each connection in a network contributed to an error, then adjusting those connections to reduce the error next time. Without backpropagation, modern AI as we know it — from image recognition to large language models — would not exist.

The algorithm was independently discovered multiple times: by Seppo Linnainmaa in Finland in 1970, by Paul Werbos at Harvard in his 1974 PhD thesis, and in related form by the control theorists Bryson and Ho for spacecraft trajectory problems. But it was David Rumelhart’s group in San Diego, working with Geoffrey Hinton and Ronald Williams, that demonstrated in the mid-1980s that backpropagation could learn meaningful internal representations, including the meanings of words: a result published in Nature in 1986 that launched the modern deep learning era.

How It Works

Geoffrey Hinton explains backpropagation using a physics analogy: imagine attaching a piece of elastic between a neural network’s actual output and the desired output. The elastic creates a force pulling the output toward the correct answer. Backpropagation sends that force backward through the network’s layers, telling each neuron and connection how to adjust.

More technically:

  1. Forward pass: Input data flows through the network, producing an output
  2. Error calculation: The difference between the output and the correct answer is computed
  3. Backward pass: Using calculus (the chain rule), the algorithm computes how much each connection contributed to the error
  4. Weight update: Each connection strength is adjusted to reduce the error

The key insight is that a single backward sweep, costing roughly as much as one forward pass, yields the gradient for every connection at once. All connection strengths can then be updated simultaneously, making the process vastly more efficient than perturbing weights one at a time and re-measuring the error.
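The four steps above can be sketched in plain NumPy for a tiny two-layer network. The architecture (2-4-1, sigmoid activations), the XOR toy task, the learning rate, and the step count below are illustrative choices, not anything the algorithm prescribes:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy task (illustrative): learn XOR with a 2-4-1 sigmoid network.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

lr = 0.5
losses = []
for _ in range(2000):
    # 1. Forward pass: input flows through the network, producing an output.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # 2. Error calculation: mean squared error against the correct answers.
    err = out - y
    losses.append(float((err ** 2).mean()))

    # 3. Backward pass: the chain rule assigns each connection its share
    #    of the error, layer by layer, from the output back to the input.
    d_out = (2.0 / len(X)) * err * out * (1 - out)  # dLoss/d(output pre-activation)
    dW2, db2 = h.T @ d_out, d_out.sum(axis=0)
    d_h = (d_out @ W2.T) * h * (1 - h)              # dLoss/d(hidden pre-activation)
    dW1, db1 = X.T @ d_h, d_h.sum(axis=0)

    # 4. Weight update: step each connection strength against its gradient.
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(losses[0], "->", losses[-1])  # error shrinks as training proceeds
```

Modern frameworks automate step 3 with automatic differentiation, but the computation they perform is exactly this backward sweep.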

Key Characteristics

  • Enables supervised learning: The network needs labeled data (correct answers) to learn
  • Works on multiple layers: Before backpropagation, researchers could only train the last layer of a network
  • Requires differentiable functions: The math only works if each operation in the network is smooth enough for calculus
  • Scales with compute: The algorithm had existed since the 1970s but needed modern data and computing power to reach its potential
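The differentiability requirement has a concrete payoff: the gradient backpropagation computes analytically must match what you get by nudging a weight and re-running the network. A minimal check on a single sigmoid neuron (the specific numbers here are arbitrary choices for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One-neuron "network": loss(w) = (sigmoid(w * x) - y)^2 for a single example.
x, y, w = 1.5, 1.0, 0.3

def loss(w):
    return (sigmoid(w * x) - y) ** 2

# Analytic gradient, as backpropagation would compute it via the chain rule:
# dLoss/dw = 2 * (s - y) * s * (1 - s) * x, where s = sigmoid(w * x).
s = sigmoid(w * x)
analytic = 2.0 * (s - y) * s * (1 - s) * x

# Numerical gradient by central finite differences: nudge w and re-evaluate.
eps = 1e-6
numeric = (loss(w + eps) - loss(w - eps)) / (2 * eps)

print(analytic, numeric)  # the two estimates agree closely
```

If the network contained a non-smooth operation at this point, the two numbers would diverge and the backward pass would have nothing meaningful to propagate.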

Why Backpropagation Matters

As Hinton told StarTalk: “It turns out it was the magic answer to everything if you have enough data and enough compute power.” The algorithm is the engine behind every major AI system in use today. When organizations deploy AI agents, every prediction those agents make traces back to weights trained by backpropagation.

The algorithm’s history also illustrates a broader lesson about AI: the theoretical foundations often exist decades before practical breakthroughs. Backpropagation waited for data and compute to catch up with the math.

Historical Context

  • 1970: Linnainmaa publishes the reverse mode of automatic differentiation (Finland)
  • 1974: Werbos applies the idea to neural-network training in his Harvard PhD thesis
  • 1986: Rumelhart, Hinton, and Williams publish in Nature, showing learned word representations
  • 2012: AlexNet (trained with backpropagation) wins ImageNet, launching the deep learning revolution

Mentioned In


Geoffrey Hinton

“You want to take that force imposed by the elastic on that output neuron and send it backwards to the neurons in the layer before. That’s called backpropagation.”