architecture

World Models

Pronunciation

wurld MOD-els

Definition

World models are AI systems that learn to simulate and predict how the physical world works - including spatial dynamics, intuitive physics, and cause-effect relationships that can't be learned from text alone.

Why It Matters

Current language models learn from text, which captures a lot about the world but misses embodied knowledge - how objects fall, how forces interact, how space works. World models aim to fill this gap.

Key Concepts

Beyond Language

"Language is richer than we thought, but spatial dynamics, intuitive physics, and sensorimotor experience can't be captured in text." — Demis Hassabis

Genie + SIMA

Google DeepMind's approach: drop AI agents (SIMA) into AI-generated worlds (Genie) and let them interact, which yields an effectively unlimited supply of training environments.

"The two AIs are kind of interacting in the minds of each other."
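The loop described here can be sketched in a few lines. This is a toy illustration only, not DeepMind's code or API: `ToyWorld`, `generate_world`, `random_agent`, and `collect_episodes` are hypothetical names, the "generated world" is a 1-D corridor, and the "agent" acts at random. The point is the shape of the loop: one component produces a fresh world, the other acts in it, and every interaction becomes training data.

```python
import random
from dataclasses import dataclass


@dataclass
class ToyWorld:
    """Hypothetical stand-in for a generated world: a 1-D corridor with a goal."""
    length: int
    goal: int
    position: int = 0

    def step(self, action):
        # action is -1 (left) or +1 (right); the corridor ends clamp the position.
        self.position = max(0, min(self.length - 1, self.position + action))
        done = self.position == self.goal
        reward = 1.0 if done else 0.0
        return self.position, reward, done


def generate_world(rng):
    """Stand-in for the world generator: sample a new corridor on each call."""
    length = rng.randint(5, 10)
    return ToyWorld(length=length, goal=rng.randint(1, length - 1))


def random_agent(observation, rng):
    """Stand-in for the agent: ignores the observation and picks a direction."""
    return rng.choice([-1, 1])


def collect_episodes(num_worlds, max_steps, seed=0):
    """Run the agent in freshly generated worlds and record every transition."""
    rng = random.Random(seed)
    transitions = []
    for _world_index in range(num_worlds):
        world = generate_world(rng)
        obs = world.position
        for _step_index in range(max_steps):
            action = random_agent(obs, rng)
            next_obs, reward, done = world.step(action)
            transitions.append((obs, action, next_obs, reward))
            obs = next_obs
            if done:
                break
    return transitions
```

Calling `collect_episodes(num_worlds=3, max_steps=20)` returns a list of `(observation, action, next_observation, reward)` tuples. In a real system both stand-ins would be learned models, and the collected transitions would feed the agent's training.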

Physics Accuracy

Generated videos may look realistic but aren't physics-accurate enough for robotics. True world models need to predict physical outcomes correctly.
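One way to make "physics-accurate" concrete is to compare a predicted trajectory against a known physical law. The sketch below is an illustrative assumption, not a benchmark used by any of the systems named here: it checks a hypothetical constant-speed predictor against ideal free fall (no air resistance) and reports the worst-case height error.

```python
G = 9.81  # gravitational acceleration in m/s^2


def free_fall_height(y0, t):
    """Ground truth: height of an object dropped from rest at height y0."""
    return y0 - 0.5 * G * t * t


def constant_speed_height(y0, speed, t):
    """Hypothetical 'plausible-looking' predictor: the object sinks at constant speed."""
    return y0 - speed * t


def max_abs_error(predict, truth, times):
    """Largest absolute gap between prediction and ground truth over the given times."""
    return max(abs(predict(t) - truth(t)) for t in times)


times = [0.1 * i for i in range(11)]  # 0.0 s to 1.0 s in 0.1 s steps
error = max_abs_error(
    lambda t: constant_speed_height(10.0, 4.905, t),
    lambda t: free_fall_height(10.0, t),
    times,
)
print(f"worst-case height error: {error:.3f} m")
```

With these numbers the predictor agrees with free fall at the first and last frame, yet is off by roughly 1.23 m halfway through. A clip can start and end in the right place and still describe motion that a robot could not rely on.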

Applications

  • Robotics: Agents need intuitive physics to navigate real environments
  • Planning: Understanding cause and effect enables better long-term reasoning
  • Simulation: Training in simulated worlds before deploying in reality
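The planning use case can be sketched as "imagine several action sequences inside the model, then pick the best one". The example below is a minimal random-shooting planner under stated assumptions: `toy_dynamics` is a hand-written stand-in for a learned world model (a point mass with a 0.1 s time step), and the cost function and all names are hypothetical.

```python
import random


def toy_dynamics(state, action):
    """Hypothetical stand-in for a learned model: point mass, time step 0.1 s."""
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    return (pos, vel)


def rollout_cost(model, state, actions, target):
    """Imagine the action sequence inside the model and score the final state."""
    for action in actions:
        state = model(state, action)
    pos, vel = state
    # Penalise distance to the target and any leftover velocity.
    return (pos - target) ** 2 + 0.1 * vel ** 2


def plan(model, state, target, horizon=10, candidates=200, seed=0):
    """Random shooting: sample action sequences, keep the cheapest imagined one."""
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(candidates):
        actions = [rng.uniform(-1.0, 1.0) for _ in range(horizon)]
        cost = rollout_cost(model, state, actions, target)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost


actions, cost = plan(toy_dynamics, state=(0.0, 0.0), target=0.3)
```

No action is ever executed in the real world while planning; every candidate is evaluated inside the model. This is why the plan is only as good as the model's physics.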

Current Limitations

  • Generated video can look realistic without reliably obeying physics
  • Models lack grounded understanding of spatial relationships
  • Online learning (continuing to learn after deployment) is still missing

Mentioned In

Jensen Huang at 00:45:00

"Cosmos is an open frontier world foundation model for physical AI, pre-trained on internet scale video, real driving and robotics data and 3D simulation. It turns compute into data."

Demis Hassabis at 00:15:00

"Language is richer than we thought, but spatial dynamics, intuitive physics, and sensorimotor experience can't be captured in text."

Related Terms