OpenAI Codex Code Review: How the Model Catches Bugs Your Team Might Miss
Roma and Maya show how Codex reviews PRs by browsing the full codebase, writing test code to verify hypotheses, and fixing issues on request.
How Codex Reviews Code by Writing Code
This is OpenAI demonstrating Codex Code Review - the feature that automatically reviews your PRs. Maya from the alignment team and Roma walk through how it works and why it matters for AI safety.
"Human verification is becoming the bottleneck." As AI capabilities grow and coding agents produce more code, you need verification to scale proportionally. That's the alignment motivation behind code review models - making sure "verification abilities are scaling as fast as AI capabilities."
It's not static analysis. The model has access to the full repository, not just the diff. It can track down dependencies, understand broader codebase context, and - critically - write Python code to test its own hypotheses. "It decided to form some hypothesis and write some Python code to test the hypothesis and check whether it's actually correct."
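To make that hypothesis-and-verify loop concrete, here is a hypothetical sketch of the pattern; the helper function and the bug are invented for illustration and are not taken from the video:

```python
# Hypothetical illustration of the review pattern described above:
# form a hypothesis about a suspected bug, then run code to check it
# instead of reasoning about the diff statically.

def chunk(items, size):
    # Helper as it might appear in a PR under review (invented example;
    # the stop value should be len(items), so the last chunk can be lost).
    return [items[i:i + size] for i in range(0, len(items) - 1, size)]

# Hypothesis: the final element is silently dropped whenever it would
# start a new chunk on its own.
items = list(range(7))
flattened = [x for c in chunk(items, 3) for x in c]
print("dropped items:", sorted(set(items) - set(flattened)))  # -> [6]
```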
Trained for high precision. They specifically trained for bugs "that actually matter and people would be willing to fix in real life" while aiming for a very low rate of incorrect comments. The evaluation: a much lower false-positive rate than previous models, but "the most important evaluation is just people using it in practice."
Already catching real bugs at OpenAI. It saved them from "critical training run bugs that would potentially delay important model releases" and from configuration issues not visible in the diff alone. Codex also flagged a React/CSS bug in a contribution Alex, the Codex PM, made to the VS Code extension - he replied "@Codex fair enough, fix it up."
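That mention-driven workflow generalizes: an @Codex comment on the PR can trigger a review with custom instructions, or ask for a fix after one. The exact comment wording below is illustrative; only the @Codex mention and the quoted reply come from the video:

```text
@codex review this PR - pay extra attention to the CSS changes in the extension panel

  (Codex posts a review comment flagging the layout bug)

@codex fair enough, fix it up
```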
agents.md for custom instructions. The model looks for agents.md in your codebase for custom code review guidelines. You can specify what to pay attention to, what to ignore, even the response style. Maya's example: "I wanted Codex to tell me every time I make a bug that I'm still an amazing programmer."
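As a sketch, a repo-level agents.md might carry review guidance like the following; the section name and wording are illustrative, and only the file name plus the idea of specifying focus areas, ignores, and response style come from the video:

```markdown
## Code review guidelines

- Pay extra attention to concurrency and configuration changes.
- Ignore formatting-only diffs; a separate linter handles those.
- Response style: when you flag a bug, remind me that I'm still an
  amazing programmer.
```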
CLI review before you push. /review in Codex CLI reviews your local changes before they make it to GitHub - catch bugs before your co-workers even see the PR.
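In practice that looks roughly like the session below; the /review slash command is the one mentioned in the video, while the surrounding shell steps are an assumed sketch of typical Codex CLI usage:

```bash
# From the repository you are working in, start Codex CLI...
codex

# ...then, at the Codex prompt, review your local changes
# before they ever reach GitHub:
/review
```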
10 Features of OpenAI's AI Code Review System
- Verification must scale with capabilities - Alignment motivation: as agents produce more code, review must keep pace
- Full repo access, not just diff - Tracks dependencies, understands broader context
- Writes code to test hypotheses - Not static analysis; actively verifies assumptions
- High precision training - Lower false positive rate than previous models
- Real bugs caught at OpenAI - Training run bugs, config issues, cross-codebase contributions
- @Codex comments - Can trigger review manually with custom instructions
- agents.md support - Add repo-specific review guidelines
- CLI /review command - Review local changes before pushing
- "Fix it up" workflow - After review, ask Codex to fix the issue it found
- Draft PR technique - Review in draft stage before requesting human review
What AI Code Review Means for Software Quality
As AI writes more code, human verification becomes the bottleneck. Code review that writes code to test its own hypotheses - not just static analysis - is an alignment bet: verification must scale as fast as generation. It is already catching bugs that would have delayed OpenAI's training runs.


