Why Karpathy's "Ghosts vs Animals" Framing Matters

This is Andrej Karpathy at his most philosophical - not teaching neural networks, but wrestling with what we're actually building. The "ghosts, not animals" framing is provocative and important.

The core insight: LLMs emerged from a fundamentally different optimization process than biological intelligence. Animals are evolved - they come with a huge amount of built-in hardware. A zebra runs minutes after birth. That's not reinforcement learning; it's millions of years of evolution encoding the equivalent of weights into DNA through some mechanism we don't understand. LLMs, by contrast, are trained by imitating internet documents. They're "ethereal spirit entities" - fully digital, mimicking humans, starting from a completely different point in the space of possible intelligences.

"Decade of agents, not year of agents" is Karpathy pushing back on lab hype. He's been in AI for 15 years, watched predictions fail repeatedly, and has calibrated intuitions. The problems are tractable but difficult. When would you actually hire Claude as an intern? You wouldn't today because it just doesn't work reliably enough. That gap will take a decade to close.

Pre-training as "crappy evolution" is a useful mental model. Evolution gives animals a starting point with built-in algorithms and representations. Pre-training does something analogous but through a practically achievable process - pattern completion on internet documents. The interesting nuance is that pre-training does two things at once: (1) it picks up knowledge, and (2) it boots up intelligence circuits by observing algorithmic patterns in the data. Karpathy thinks the knowledge part might actually be holding models back - making them rely too much on memorization rather than reasoning.

The compression difference explains a lot. Llama 3's weights retain about 0.07 bits for each of the 15 trillion tokens in its training set. The KV cache during inference stores about 320 kilobytes per token - a 35 million fold difference. Anything in the weights is a "hazy recollection." Anything in context is working memory, directly accessible. This explains why in-context learning feels more intelligent than what's baked into weights.
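The arithmetic behind those figures can be reproduced with a short back-of-the-envelope script. This is only a sketch under assumptions the text does not state: it assumes the 70B-parameter Llama 3 model, 16-bit weights, and a KV cache with 80 layers, 8 key/value heads, a head dimension of 128, and 16-bit entries. It also compares raw storage, which is only a rough proxy for information.

```python
# Back-of-the-envelope check of the weights-vs-context comparison.
# ASSUMPTIONS (not stated in the text): Llama 3 70B, 16-bit weights,
# 80 layers, 8 key/value heads, head dimension 128, 16-bit cache entries.

PARAMS = 70e9            # assumed parameter count
BITS_PER_PARAM = 16      # assumed weight precision
TRAINING_TOKENS = 15e12  # from the text: 15 trillion tokens

N_LAYERS = 80            # assumed
N_KV_HEADS = 8           # assumed (grouped-query attention)
HEAD_DIM = 128           # assumed
BYTES_PER_VALUE = 2      # assumed 16-bit cache entries


def weight_bits_per_training_token() -> float:
    """Total bits in the weights divided by the number of training tokens."""
    return PARAMS * BITS_PER_PARAM / TRAINING_TOKENS


def kv_cache_bytes_per_token() -> int:
    """One key vector and one value vector per KV head, per layer."""
    return 2 * N_LAYERS * N_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE


def context_to_weights_ratio() -> float:
    """How many times more bits a context token gets than a training token."""
    return kv_cache_bytes_per_token() * 8 / weight_bits_per_training_token()


if __name__ == "__main__":
    print(f"weights: {weight_bits_per_training_token():.3f} bits per training token")
    print(f"KV cache: {kv_cache_bytes_per_token() / 1024:.0f} KiB per context token")
    print(f"ratio: {context_to_weights_ratio() / 1e6:.1f} million")
```

Under these assumptions the script gives roughly 0.07 bits per training token, 320 KiB per context token, and a ratio of about 35 million, which matches the figures quoted above.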

8 Insights From Karpathy on LLMs and Agent Development

"Ghosts, not animals" - LLMs are digital entities mimicking humans, not evolved intelligences with built-in hardware
Decade of agents, not year - Current agents are impressive but cognitively lacking; reliable "AI employees" are 10 years out
Pre-training is crappy evolution - A practically achievable way to get starting representations, but very different from biological optimization
Knowledge might hurt - Models that rely less on memorized knowledge and more on reasoning might be better at novel problems
Working memory vs hazy recollection - The KV cache (context) holds roughly 35 million times more information per token than the weights retain per training token
In-context learning may run internal gradient descent - Some papers suggest attention layers implement something like optimization
Missing brain parts - Transformer ≈ cortical tissue, reasoning traces ≈ prefrontal cortex, but many structures remain unexplored
Early agent attempts were premature - Universe project (2016) failed because models lacked representational power; had to get LLMs first
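
The idea that in-context learning may resemble gradient descent can be illustrated with a toy case. The sketch below is not Karpathy's example and is not taken from any particular paper's code; the data, function names, and learning rate are all made up for illustration. It shows that one gradient-descent step on a least-squares problem, started from zero weights, gives the same prediction as unnormalized linear attention (no softmax) with keys set to the inputs and values set to the targets.

```python
# Toy check: one gradient-descent step on least squares, started from zero
# weights, predicts the same value as unnormalized linear attention.
# All data and names here are hypothetical, chosen only for illustration.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def gd_one_step_prediction(xs, ys, query, lr):
    """Take one gradient step on 0.5 * sum_i (w . x_i - y_i)^2 from w = 0."""
    dim = len(query)
    w = [0.0] * dim
    grad = [0.0] * dim
    for x, y in zip(xs, ys):
        err = dot(w, x) - y          # equals -y because w starts at zero
        for j in range(dim):
            grad[j] += err * x[j]
    w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return dot(w, query)


def linear_attention_prediction(xs, ys, query, lr):
    """Keys are x_i, values are y_i, no softmax; lr is a fixed output scale."""
    return lr * sum(y * dot(x, query) for x, y in zip(xs, ys))


if __name__ == "__main__":
    xs = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]   # in-context inputs
    ys = [1.0, -2.0, 0.5]                        # in-context targets
    query = [2.0, 1.0]
    lr = 0.1
    print(gd_one_step_prediction(xs, ys, query, lr))
    print(linear_attention_prediction(xs, ys, query, lr))
```

The two printed values agree up to floating-point rounding. This holds only for this simplified setting (a linear model, a single step, no softmax); whether real transformers do anything similar internally is the open question the insight above refers to.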

What This Means for AI Architecture

We're not building artificial humans - we're building something entirely new. LLMs are "ghosts" that emerged from imitating text, not "animals" shaped by evolution. Understanding this difference is essential for building systems that complement rather than poorly imitate human intelligence.

Andrej Karpathy: We're Building Ghosts, Not Animals
