
Harness · OpenAI

Codex.

OpenAI's agentic harness. The one we reach for when a mission crosses modalities — text, screenshots, audio, video — or needs to drive software that has no clean API.

What it is

Codex is OpenAI's agentic harness — the runtime that wraps GPT-5-class models with planning loops, tool use, browser automation, and computer use. Originally pitched as a coding agent, it has grown into the general-purpose harness for any mission that requires "use this software until the task is finished."
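The planning-loop-plus-tool-use shape described above can be sketched in a few lines. Everything here (the tool table, the `run_model` stub, the message format) is illustrative only, not Codex's actual internals:

```python
# Minimal agentic-loop sketch: the model proposes tool calls until it decides
# the task is finished. Codex's real loop layers planning, sandboxing, and
# multimodal inputs on top of this basic shape.

def run_model(history):
    # Stand-in for a GPT-5-class model call. Returns either a tool request
    # or a final answer. Here we hard-code one tool call, then finish.
    if not any(turn["role"] == "tool" for turn in history):
        return {"tool": "web_search", "args": {"query": "codex harness"}}
    return {"final": "done: summarized search results"}

TOOLS = {
    "web_search": lambda args: f"results for {args['query']!r}",
}

def agent_loop(task, max_turns=8):
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        action = run_model(history)
        if "final" in action:                 # model decided the task is finished
            return action["final"]
        output = TOOLS[action["tool"]](action["args"])  # execute the requested tool
        history.append({"role": "tool", "content": output})
    raise RuntimeError("turn budget exhausted")
```

The harness's value is everything around this loop: which tools exist, how their output is sandboxed, and when the model is allowed to stop.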

Models it runs

  • GPT-5.5 — released April 23, 2026. Unified text/image/audio/video architecture in a single model. Read the model card.
  • GPT-5.5 Pro — same model with extended inference-time reasoning. The choice when accuracy matters more than latency.
  • GPT-5.3 Codex — the previous coding flagship from February 2026.
  • GPT-5 — the original GPT-5 base model.

What makes it distinct

  • Unified multimodal. One call processes text, screenshots, audio, and video — no stitching together GPT + Whisper + Sora behind the agent.
  • Computer use. Operates software, fills forms, drives a browser end-to-end. Strongest of the three harnesses on this dimension.
  • Knowledge work. Researching online, analyzing data, creating documents and spreadsheets — moving across tools until the task is done.
  • Web search built in. Real-time fact-checking without a separate connector.
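"One call processes text, screenshots, audio, and video" means the request itself mixes modalities as parts of a single message. A sketch of that payload shape, following the OpenAI content-part style — the exact field names for GPT-5.5 are an assumption here:

```python
# Sketch: one user message carrying several modalities as content parts.
# Part type names ("input_text", "input_image", "input_audio") are assumed,
# modeled on OpenAI's content-part convention.

def build_multimodal_message(text, image_url=None, audio_b64=None):
    parts = [{"type": "input_text", "text": text}]
    if image_url:
        parts.append({"type": "input_image", "image_url": image_url})
    if audio_b64:
        parts.append({"type": "input_audio",
                      "audio": {"data": audio_b64, "format": "wav"}})
    return {"role": "user", "content": parts}

msg = build_multimodal_message(
    "What changed between these two dashboard screenshots?",
    image_url="https://example.com/dash-after.png",
)
```

The point of the unified architecture is that no second model (a transcriber, a vision encoder) sits between the agent and these parts.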

Capabilities at a glance

  • Sub-agents: yes — parent agents call spawn_agent, message child agents, and await completion.
  • Skills: yes — system skills and per-project skills with per-tool approval modes.
  • MCP servers: yes — both as client (config.toml, parallel tool calls toggle) and as server (codex mcp-server for other MCP clients to consume Codex tools).
  • Hooks: partial — only a notify hook fires on turn completion; no full lifecycle event system.
  • Slash commands: yes — /apps, /exec, /sandbox, /mcp, /debug, plus user-defined via skills and AGENTS.md.
  • Permissions / sandboxing: three modes (read-only, workspace-write, danger-full-access); OS-level sandboxing via seatbelt (macOS), landlock (Linux), AppContainer (Windows).
  • Plugins: yes — marketplace system with OpenAI-curated and bundled marketplaces; installable from remote sources.
  • Multi-model: OpenAI Responses API, Amazon Bedrock, Ollama, and any OpenAI-compatible endpoint via config.toml.
  • Sessions: persisted to disk via SQLite; codex exec --ephemeral for stateless runs; rollout-trace bundle for diagnostics.
  • Surfaces: Ratatui TUI, headless CLI, web app, desktop app, IDE extensions (VS Code/Cursor/Windsurf), and an HTTP App Server.
  • Headless / SDK: yes — codex exec, the App Server protocol, plus TypeScript and Python client SDKs.
  • License: Apache 2.0; full Rust source on github.com/openai/codex.
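Several of the capabilities above meet in a single headless invocation. A sketch of assembling one — the sandbox mode names and `codex exec --ephemeral` come from the list above, but the `--sandbox` flag spelling is an assumption:

```python
# Sketch: building the argv for a headless Codex run without executing it.
# --ephemeral skips session persistence (stateless run); the three sandbox
# modes mirror the permissions list above. Flag names are assumptions.
import shlex

def codex_exec_cmd(prompt, sandbox="workspace-write", ephemeral=False):
    if sandbox not in {"read-only", "workspace-write", "danger-full-access"}:
        raise ValueError(f"unknown sandbox mode: {sandbox}")
    argv = ["codex", "exec", "--sandbox", sandbox]
    if ephemeral:
        argv.append("--ephemeral")  # nothing written to the session store
    argv.append(prompt)
    return argv

cmd = codex_exec_cmd("run the test suite and summarize failures", ephemeral=True)
shell_line = shlex.join(cmd)  # copy-pasteable form for CI scripts
```

The same run could be driven through the App Server protocol or the client SDKs instead; the CLI form is just the shortest path to automation.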

How TeamDay uses it

Selecting Codex as a TeamDay agent's harness unlocks every GPT-5-family model. The day OpenAI ships a new variant, the dropdown updates — no integration release on our side.

  1. Open an agent → Settings → Harness → Codex.
  2. Pick the model. Default is GPT-5.5 for most missions; Pro for high-accuracy work.
  3. Attach MCP servers — including the media MCP for visual generation.
  4. Run a mission. Codex agents play well alongside Claude Code agents on the same workspace.
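The four steps above amount to a small configuration. TeamDay's actual agent schema is not shown here, so every key and model slug below is a hypothetical illustration of the same choices:

```python
# Hypothetical sketch of the agent settings the four steps produce.
# Keys and model slugs are invented for illustration; only the choices
# themselves (harness, model, MCP servers) come from the steps above.

def make_agent_config(model="gpt-5.5", mcp_servers=()):
    allowed = {"gpt-5.5", "gpt-5.5-pro", "gpt-5.3-codex", "gpt-5"}
    if model not in allowed:
        raise ValueError(f"unsupported model for the Codex harness: {model}")
    return {
        "harness": "codex",                 # step 1: harness selection
        "model": model,                     # step 2: any GPT-5-family model
        "mcp_servers": list(mcp_servers),   # step 3: e.g. a media MCP for visuals
    }

config = make_agent_config(mcp_servers=["media"])
```

Because the model field is just a name, a newly shipped GPT-5 variant needs only a new entry in the allowed set, matching the "dropdown updates" claim above.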

When to pick Codex

  • Mixed-modality work — process a customer call, transcribe it, extract action items, draft the follow-up email — all in a single mission.
  • Computer-use heavy work — drive software, fill forms, navigate dashboards.
  • Knowledge work that ranges across tools (research → spreadsheet → document → email).
  • Anything where you want the agent to watch a screen and react.

How it's benchmarked

Codex is evaluated on Terminal-Bench (tbench.ai) — the standard suite for measuring how well a model-plus-harness combination completes real terminal tasks end-to-end. The leaderboard tracks how each new GPT-5 release moves Codex's score on Terminal-Bench Pro and Verified.

When to pick something else

  • Claude Code — for long-horizon coding and missions requiring self-verification.
  • Gemini CLI — when you need 2M context or Google-stack integration.