technical

GDP val

Pronunciation

/ˌdʒiː diː ˈpiː væl/

Also known as: GDP evaluation, GDP benchmark, knowledge work benchmark

What is GDP val?

GDP val is an AI evaluation benchmark introduced by OpenAI in September 2025 to measure how well AI models perform on economically valuable, real-world knowledge work tasks. The name derives from Gross Domestic Product (GDP), because the benchmark draws its tasks from the occupations and industries that contribute most to economic output.

Unlike traditional AI benchmarks that test abstract reasoning or standardized-test performance, where frontier models have largely saturated human-level scores, GDP val focuses on practical professional deliverables.

Key Characteristics

Real Work Products: Tasks produce actual deliverables like legal briefs, engineering blueprints, customer support conversations, nursing plans, slides, spreadsheets, and multimedia.

Expert Evaluation: Experienced professionals from the relevant occupations compare AI outputs against human-produced work in blind reviews, without knowing which deliverable came from which source.

Comprehensive Scope: The full dataset includes 1,300+ specialized tasks across 44 occupations.

Context-Rich Tasks: Unlike simple prompts, GDP val tasks include reference files and context, mimicking real work scenarios.
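The blind comparison described under Expert Evaluation can be sketched in a few lines of Python. This is a minimal illustration, not OpenAI's grading pipeline: the helper names (`blind_pair`, `unblind`, `win_or_tie_rate`) are hypothetical, and the sketch assumes each grader either picks one of two anonymized deliverables or calls a tie, with ties counted alongside wins (OpenAI reports its headline GDP val figures as a win-or-tie rate).

```python
import random
from collections import Counter

# Hypothetical sketch of blind pairwise grading; not OpenAI's actual implementation.

def blind_pair(model_output: str, expert_output: str, rng: random.Random):
    """Show the two deliverables in random order as slots 'A' and 'B'.

    Returns the shown pair plus a hidden key mapping each slot back to its
    source, so the grader never learns which output came from the model.
    """
    slots = [("model", model_output), ("expert", expert_output)]
    rng.shuffle(slots)  # random order per task keeps the comparison blind
    shown = {"A": slots[0][1], "B": slots[1][1]}
    key = {"A": slots[0][0], "B": slots[1][0]}
    return shown, key

def unblind(choice: str, key: dict) -> str:
    """Map a grader's choice ('A', 'B', or 'tie') to 'model', 'expert', or 'tie'."""
    return "tie" if choice == "tie" else key[choice]

def win_or_tie_rate(outcomes) -> float:
    """Share of comparisons where the model output was preferred or judged equal."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no comparisons to score")
    return (counts["model"] + counts["tie"]) / total
```

For example, `win_or_tie_rate(["model", "expert", "tie", "model"])` returns `0.75`: the model was preferred twice and tied once across four comparisons. Shuffling the slot order per task is what keeps the review blind; the hidden key is consulted only after the grader has made a choice.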

Why GDP val Matters

GDP val represents a shift in how AI progress is measured. Traditional IQ-style benchmarks have become saturated: frontier models already match or exceed top human performance on standardized tests. GDP val instead measures:

  1. Economic Impact: Direct connection to tasks that drive GDP
  2. Professional Competition: Head-to-head comparison with industry experts
  3. Practical Value: Real deliverables, not abstract problem-solving

As Wharton professor Ethan Mollick noted, GPT-5.2's 71% GDP val score means the model now beats or ties human experts 71% of the time on tasks requiring 4–8 hours of work.

Historical Context

OpenAI introduced GDP val in September 2025 and, notably, published results showing that Anthropic's Claude outperformed OpenAI's own best model at launch, a rare display of transparency about its competitive position.

By December 2025, GPT-5.2 Thinking achieved 71% on GDP val, up from 39% for GPT-5.1 Thinking, which had been released just one month earlier: a rapid gain in knowledge work capability.

Mentioned In

Paul Roetzer at 00:14:00

"GDP val basically measures how good AI is at real-world knowledge work tasks, spanning legal briefs, engineering blueprints, customer support, and nursing plans."

Paul Roetzer at 00:16:00

"GPT-5.2 thinking achieved a score of roughly 71%, up from 39% for GPT-5.1 thinking which came out in November."

Sam Altman at 00:28:00

"That eval is like 40 something different verticals that a business has to do. Make a PowerPoint, do this legal analysis, write up this little web app... a coworker that you can assign an hour's worth of tasks to and get something you like better back 74 or 70% of the time."
