architecture

JEPA

Pronunciation

/ˈdʒepə/

Also known as: Joint Embedding Predictive Architecture, I-JEPA, V-JEPA

What is JEPA?

Joint Embedding Predictive Architecture (JEPA) is Yann LeCun's proposed framework for building more human-like AI systems. First outlined in his 2022 paper "A Path Towards Autonomous Machine Intelligence," JEPA represents an alternative to the autoregressive approach used by LLMs.

The key insight: Predict abstract representations, not raw pixels or tokens. This allows the system to ignore irrelevant details while focusing on semantic understanding.

How JEPA Works

Traditional generative models (like GPT) predict the next token or pixel directly. JEPA takes a different approach:

Encode parts of an input into abstract representations (embeddings)
Predict the embedding of one part from another part
Learn by comparing predicted embeddings to actual embeddings

This happens in "embedding space" rather than "pixel/token space"—a crucial distinction that eliminates the need to model irrelevant details.
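The three steps above can be sketched in a few lines of plain Python. This is a toy illustration under loud assumptions, not Meta's implementation: the encoders are single linear layers with tanh, the predictor is a linear map, and only the predictor is trained while both encoders stay frozen. All names (`encode`, `predict`, `predictor_step`, `demo`) are hypothetical. A real JEPA also trains the encoders and needs a mechanism to stop all embeddings collapsing to the same value, which this sketch leaves out.

```python
import math
import random


def encode(weights, x):
    """Toy encoder: one linear layer followed by tanh. Returns an embedding."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in weights]


def predict(predictor, s_context):
    """Toy predictor: a linear map from the context embedding to a target embedding."""
    return [sum(p * s for p, s in zip(row, s_context)) for row in predictor]


def embedding_loss(s_predicted, s_target):
    """Mean squared error measured in embedding space, not in input space."""
    return sum((p - t) ** 2 for p, t in zip(s_predicted, s_target)) / len(s_target)


def predictor_step(predictor, s_context, s_target, lr=0.5):
    """One gradient-descent step on the predictor weights only (encoders frozen)."""
    s_predicted = predict(predictor, s_context)
    dim = len(s_target)
    return [
        [p - lr * 2.0 * (s_predicted[i] - s_target[i]) * s / dim
         for p, s in zip(row, s_context)]
        for i, row in enumerate(predictor)
    ]


def random_matrix(rows, cols, rng):
    """Random weights in [-1, 1], used as stand-ins for learned parameters."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def demo(steps=20, seed=0):
    """Run the three steps on one toy input and return the loss at each step."""
    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(8)]
    x_context, x_target = x[:4], x[4:]               # two parts of the same input
    context_encoder = random_matrix(3, 4, rng)
    target_encoder = random_matrix(3, 4, rng)
    predictor = random_matrix(3, 3, rng)
    s_context = encode(context_encoder, x_context)   # step 1: encode the visible part
    s_target = encode(target_encoder, x_target)      # step 1: encode the part to predict
    losses = []
    for _ in range(steps):
        s_predicted = predict(predictor, s_context)              # step 2: predict the embedding
        losses.append(embedding_loss(s_predicted, s_target))     # step 3: compare embeddings
        predictor = predictor_step(predictor, s_context, s_target)
    return losses
```

Calling `demo()` returns a list of losses that shrinks as the predictor learns to map the context embedding onto the target embedding. Note that the raw values in `x_target` never appear in the loss; only their embedding does.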

Why Not Generative Models?

LeCun argues that generative models (autoregressive LLMs, diffusion models) have fundamental limitations:

Computational waste: Predicting every pixel/token, even irrelevant ones
Uncertainty handling: Struggle with multiple valid futures
Brittleness: Sensitive to exact input formulations

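JEPA is designed to handle uncertainty in embedding space: the encoder can discard details that cannot be predicted, and LeCun's proposal gives the predictor a latent variable so that one input can map to several plausible outcomes.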
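One way to write this down is the latent-variable formulation from LeCun's 2022 paper. The notation below is a paraphrase, and the symbols are assumptions of this sketch: s_x and s_y are the embeddings of the observed part x and the part to predict y, Pred is the predictor, z is a latent variable, and D is a distance in embedding space.

```latex
E(x, y, z) = D\big(s_y,\ \mathrm{Pred}(s_x, z)\big),
\qquad
F(x, y) = \min_{z} E(x, y, z)
```

Varying z yields different predicted embeddings for the same x, so a pair (x, y) gets low energy whenever some value of z explains y.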

I-JEPA (Images)

Meta's Image-based JEPA learns by:

Taking an image and masking parts of it
Predicting the embedding of masked regions from visible regions
Comparing predicted vs. actual embeddings
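The masking step can be illustrated by splitting a grid of image patches into a visible context set and a masked target block. This is a simplified sketch: it samples a single rectangular target block, and the grid size, block size, and the function names `sample_block` and `split_patches` are all hypothetical choices for illustration rather than the settings Meta used.

```python
import random


def sample_block(grid_h, grid_w, block_h, block_w, rng):
    """Return the patch indices of one rectangular block at a random position."""
    top = rng.randrange(grid_h - block_h + 1)
    left = rng.randrange(grid_w - block_w + 1)
    return {
        row * grid_w + col
        for row in range(top, top + block_h)
        for col in range(left, left + block_w)
    }


def split_patches(grid_h, grid_w, block_h, block_w, seed=0):
    """Split all patch indices into (context, target) with no overlap."""
    rng = random.Random(seed)
    target = sample_block(grid_h, grid_w, block_h, block_w, rng)
    context = set(range(grid_h * grid_w)) - target   # visible patches only
    return sorted(context), sorted(target)
```

For example, `split_patches(6, 6, 2, 3)` masks a 2x3 block of a 6x6 patch grid. The context indices would go to the context encoder, and the predictor would be asked for the embeddings at the target indices.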

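Results: Meta reported that a 632M-parameter vision transformer trained on 16 A100 GPUs in under 72 hours achieved state-of-the-art low-shot classification on ImageNet with only 12 labeled examples per class. According to Meta, other methods typically take 2 to 10 times more GPU-hours and achieve worse error rates when trained on the same amount of data.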

V-JEPA (Video)

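V-JEPA extends the architecture to video: it predicts masked spatio-temporal regions of a clip in embedding space rather than reconstructing pixels.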

"V-JEPA is a step toward a more grounded understanding of the world so machines can achieve more generalized reasoning and planning." — Yann LeCun

V-JEPA 2 has been successfully applied to robotics planning, demonstrating how JEPA can serve as a world model for real-world decision making.

Key Advantages

Aspect	Generative Models	JEPA
Prediction target	Raw pixels/tokens	Abstract embeddings
Irrelevant details	Must model everything	Can ignore noise
Uncertainty	Single output	Multiple valid outcomes
Efficiency	High compute	More efficient
Semantic focus	Surface patterns	Deeper meaning

JEPA vs. Transformers

JEPA is not an alternative to transformers—many JEPA implementations use transformer modules. It's an alternative to autoregressive generation as a learning paradigm, regardless of the underlying architecture.

The Vision

LeCun positions JEPA as the core of his vision for achieving human-level reasoning:

World model: JEPA learns how the world works
Planning: Use the world model to simulate action consequences
Reasoning: Navigate complex decision spaces

This contrasts with the "scale up LLMs" approach dominant in the industry.
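The planning idea above, using a world model to simulate the consequences of actions, can be sketched as follows. Everything here is a toy assumption: `world_model` stands in for a learned predictor and simply shifts the state embedding by the action, and the planner is plain random shooting (sample action sequences, simulate each one, keep the best). This is the simplest planner that fits the description and is not presented as the method used in V-JEPA 2.

```python
import random


def world_model(state, action):
    """Hypothetical learned dynamics: predict the next state embedding."""
    return [s + a for s, a in zip(state, action)]


def rollout(state, actions):
    """Simulate a sequence of actions entirely in embedding space."""
    for action in actions:
        state = world_model(state, action)
    return state


def distance(a, b):
    """Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def plan(state, goal, horizon=3, candidates=200, seed=0):
    """Random-shooting planner: return the sampled action sequence whose
    simulated final embedding lands closest to the goal embedding."""
    rng = random.Random(seed)
    best_actions, best_cost = None, float("inf")
    for _ in range(candidates):
        actions = [[rng.uniform(-1.0, 1.0) for _ in state] for _ in range(horizon)]
        cost = distance(rollout(state, actions), goal)
        if cost < best_cost:
            best_actions, best_cost = actions, cost
    return best_actions, best_cost
```

A call such as `plan([0.0, 0.0], [2.0, 1.0])` returns three 2-dimensional actions and the predicted remaining distance to the goal. No real action is executed during the search; every candidate is evaluated by the model alone.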

Yann LeCun - Chief AI Scientist at Meta, JEPA architect
World Models - What JEPA aims to build