Do LLMs Understand? Yann LeCun vs. DeepMind's Adam Brown
Meta's Yann LeCun and DeepMind's Adam Brown debate whether LLMs truly understand, exploring neural architecture, sample efficiency, and AI's limits.
Two of the world's leading AI researchers sit down for a candid debate on the most contested question in AI today: do these systems actually understand anything?
How LeCun and Brown Frame the Understanding Question
This debate crystallizes the core philosophical and technical divide running through AI research right now. On one side, Adam Brown from DeepMind argues that LLMs do understand - not perfectly, but genuinely. On the other, Yann LeCun contends their understanding is "superficial" because it's not grounded in physical reality. The space between those positions is more revealing than either extreme.
The most revealing moment comes early when the moderator asks a binary question: "Do LLMs understand?" Brown says yes. LeCun says "sort of." That gradient between binary positions is where the truth lives.
LeCun's central argument is grounded in information theory and sample efficiency. He points out that training a competitive LLM requires around 30 trillion tokens - roughly 10^14 bytes of text data. That's effectively all the freely available text on the internet, representing roughly half a million years of human reading time. Compare that to visual data: those same 10^14 bytes correspond to just 16,000 hours of video - roughly what a four-year-old child has seen in their entire waking life (assuming about 2 MB/s through the optic nerve).
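The back-of-envelope arithmetic is easy to check. Below is a minimal sketch that reproduces the comparison; only the headline figures (30 trillion tokens, 16,000 hours, 2 MB/s) come from the debate, while the bytes-per-token, reading-speed, and hours-per-day values are illustrative assumptions.

```python
# Back-of-envelope check of LeCun's data-volume comparison.
# Headline figures (30T tokens, 16,000 hours, 2 MB/s) are from the debate;
# bytes-per-token, reading speed, and hours-per-day are illustrative assumptions.

TOKENS = 30e12               # ~30 trillion training tokens
BYTES_PER_TOKEN = 3          # assumption: rough average for subword tokenizers
text_bytes = TOKENS * BYTES_PER_TOKEN                         # ~0.9e14 bytes

HOURS_AWAKE = 16_000         # waking hours of a ~4-year-old
OPTIC_NERVE_BYTES_PER_SEC = 2e6                               # ~2 MB/s
visual_bytes = HOURS_AWAKE * 3600 * OPTIC_NERVE_BYTES_PER_SEC # ~1.15e14 bytes

WORDS_PER_MINUTE = 250       # assumption: typical adult reading speed
BYTES_PER_WORD = 5           # assumption: ~5 characters per word
reading_minutes = text_bytes / (WORDS_PER_MINUTE * BYTES_PER_WORD)
reading_years = reading_minutes / 60 / (8 * 365)              # 8 hours a day

print(f"text corpus:   {text_bytes:.1e} bytes")
print(f"child's video: {visual_bytes:.1e} bytes")
print(f"reading time:  {reading_years:,.0f} years")           # ~400,000+ years
```

Both data budgets land within a factor of two of 10^14 bytes, which is precisely LeCun's point: the same byte count buys either the internet's text or four years of one child's visual experience.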
This isn't just about data volume. It's about information density and grounding. A child learning physics doesn't need to read millions of descriptions of falling objects. They see things fall, they drop things, they build intuitive models of gravity, inertia, and causality through continuous, high-dimensional sensory experience. LLMs only have language - a symbolic compression of reality, not reality itself.
Brown counters with a crucial insight: sample efficiency isn't everything. A cat learns to walk in a week; a human takes a year. That doesn't make the cat smarter than a human or an LLM. What matters is ultimate capability, not learning speed. And on almost every metric that counts - accumulated knowledge, problem-solving range, linguistic sophistication - LLMs have already surpassed cat intelligence and are pushing well beyond human performance on specific tasks.
His evidence is compelling. At the 2025 International Mathematical Olympiad, Google's system scored better than all but the top dozen human contestants on the planet. These were completely novel problems, not pattern matching against training data. The system combined different mathematical ideas in ways it had never seen before. That's not memorization - it's genuine reasoning at an elevated level of abstraction.
The interpretability argument is particularly interesting. Brown points out that we actually have better access to LLM neurons than human neurons. We can freeze them, replay them, prod them, and trace exactly what's happening. When you feed an LLM a math problem, mechanistic interpretability research reveals actual computational circuits forming to solve it - circuits the model learned to build on its own while being trained to predict the next token. It didn't memorize math answers; it learned how to do math.
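In practice, that access looks like instrumenting the network directly. Here is a minimal sketch on a toy PyTorch model - not an actual LLM, and not any particular interpretability paper's method - showing how intermediate activations can be frozen and replayed with forward hooks; the layer sizes and names are placeholders.

```python
import torch
import torch.nn as nn

# Toy stand-in for an LLM layer stack; a real model would be a transformer.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))

activations = {}

def capture(name):
    """Return a hook that stores a layer's output so it can be inspected later."""
    def hook(module, inputs, output):
        activations[name] = output.detach().clone()
    return hook

# "Prod" the network: record what the hidden layer does on a given input.
model[1].register_forward_hook(capture("hidden"))

x = torch.randn(1, 16)
model(x)

# Unlike biological neurons, every value is frozen and replayable at will.
print(activations["hidden"].shape)      # torch.Size([1, 32])
print(activations["hidden"][0, :5])     # first five hidden activations
```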
LeCun doesn't dispute the interpretability evidence. His critique is more subtle. He's saying yes, LLMs can accumulate knowledge and perform superhuman feats on linguistic tasks. But they fundamentally lack the grounded, physical understanding that comes from embodied learning. They don't have common sense in the way humans understand it - the intuitive physics of how objects interact, how actions have consequences, how the world actually works beyond its linguistic description.
The chess analogy cuts both ways. Brown is right that sample efficiency ultimately didn't matter: AlphaZero needed far more games than any human grandmaster to reach superhuman performance, and it still won. LeCun is right that, measured by learning efficiency, computers still "suck at chess" compared to humans - and that gap matters when we talk about general intelligence.
The real disagreement isn't about current LLM capabilities. It's about what's required to reach human-level or animal-level general intelligence. LeCun's position: you cannot get there through text alone. You need world models trained on continuous, high-dimensional data like video. You need systems that can predict consequences in abstract representation spaces, not just predict the next token.
His evidence is stark: we have LLMs that pass the bar exam and solve college-level calculus, but we still don't have domestic robots that can learn to clean a kitchen or self-driving cars that learn to drive in 20 hours like a teenager. The methods that work for text don't scale to embodied intelligence.
Brown's position is more optimistic about the current trajectory. LLMs are already demonstrating emergent capabilities that weren't explicitly programmed - mathematical reasoning, creative problem-solving, sophisticated conversational understanding. As we scale compute, data, and architectural innovations, these capabilities will continue to expand.
The consciousness question is telling. Both say no (or "probably not"). LeCun is absolute: "absolutely not." Brown hedges: "probably not, for appropriate definitions of consciousness." Neither believes we're on the precipice of doomsday - both say "renaissance" is more likely than robot overlords.
What makes this debate so valuable is that both researchers are deeply technical, deeply informed, and fundamentally disagree about what understanding requires. LeCun's background in computer vision, convolutional networks, and now world models shapes his conviction that intelligence requires grounded, embodied learning. Brown's work at DeepMind on systems like AlphaGo and now Gemini demonstrates what's possible when you scale up pattern matching to unprecedented levels.
The throughline in LeCun's argument - from his famous "machine learning sucks" slide to his new startup AMI focused on world models - is that deep learning and backpropagation are fantastic, but we need to combine them with fundamentally different training paradigms. Not next-token prediction on text, but joint embedding predictive architectures (JEPA) trained on video and other high-bandwidth sensory data.
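To make the contrast with next-token prediction concrete, here is a minimal sketch of the JEPA idea under toy assumptions: the model predicts the embedding of a masked target from the embedding of its visible context, so the loss lives in representation space rather than pixel or token space. The encoder sizes, predictor, and optimizer below are illustrative placeholders, not LeCun's actual architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Illustrative toy modules; real JEPA models (e.g. V-JEPA) use vision
# transformers over masked image/video patches. Dimensions are arbitrary.
context_encoder = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 64))
target_encoder = nn.Sequential(nn.Linear(256, 128), nn.ReLU(), nn.Linear(128, 64))
predictor = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 64))

optimizer = torch.optim.Adam(
    list(context_encoder.parameters()) + list(predictor.parameters()), lr=1e-3
)

# One training step on a fake batch: the "context" is the visible part of a
# frame or clip, the "target" is the masked part the model must anticipate.
context_patch = torch.randn(8, 256)
target_patch = torch.randn(8, 256)

with torch.no_grad():                       # stop-gradient on the target branch
    target_embedding = target_encoder(target_patch)

predicted_embedding = predictor(context_encoder(context_patch))

# The loss lives in representation space, not pixel space: the model is
# rewarded for predicting abstract features of the missing piece, not pixels.
loss = F.mse_loss(predicted_embedding, target_embedding)

optimizer.zero_grad()
loss.backward()
optimizer.step()

# In practice the target encoder is an exponential moving average of the
# context encoder to prevent representation collapse (omitted here).
```

That design choice carries the argument: by predicting abstract features rather than raw inputs, the model can discard unpredictable low-level detail and keep only what helps it anticipate how the world evolves.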
The question isn't binary. LLMs do understand - they extract patterns, build internal representations, perform reasoning. But their understanding is constrained by the poverty of their training signal. Language is humanity's compressed, symbolic representation of reality. It's lossy compression. You can recover a lot from it - more than most people expected - but you can't recover everything.
12 Insights From LeCun and Brown on AI Understanding
- The core divide: Brown argues LLMs genuinely understand through pattern matching at elevated abstraction; LeCun argues their understanding is superficial without physical grounding
- Information density gap: 10^14 bytes trains an LLM on all internet text OR a vision model on what a 4-year-old has seen (16,000 hours of visual data at 2 MB/s)
- Sample efficiency vs. ultimate capability: Cats learn to walk faster than humans, but that doesn't make them smarter - what matters is final performance
- Mathematical reasoning: 2025 IMO results show LLMs solving novel problems at top-dozen-human level by combining concepts, not just pattern matching training data
- Interpretability advantage: We have better access to LLM neurons than human neurons - can freeze, replay, and trace computational circuits forming during problem-solving
- Grounding problem: LLMs pass bar exams but we still don't have robots that learn household tasks or self-driving cars that learn in 20 hours like teenagers
- Chess analogy: AlphaZero needed more games than human grandmasters to reach superhuman performance - proves both "sample inefficiency" and "ultimate superiority"
- Consciousness consensus: Both researchers agree LLMs are not conscious (or "probably not") despite understanding debate
- Future outlook: Both predict "renaissance" over "doomsday" - neither fears robot overlords, both see transformative positive potential
- LeCun's path forward: World models trained on high-dimensional continuous data (video) using JEPA architectures, not just text-based next-token prediction
- Mechanistic interpretability: LLMs spontaneously develop internal computational circuits to solve math problems while being trained only to predict next tokens
- The binary trap: The "do they understand" question demands a gradient answer - LeCun's "sort of" is more accurate than either yes or no
What This Means for AI Research and World Models
Do LLMs understand? "Sort of" is the honest answer. They extract patterns and perform reasoning at elevated abstraction, but their understanding is constrained by training on language - humanity's lossy compression of reality. You can recover a lot from text, but not physical intuition. That's why we have bar-exam-passing models but no kitchen-cleaning robots.


