
Harness · OpenAI

Codex.

OpenAI's agentic harness. The one we reach for when a mission crosses modalities — text, screenshots, audio, video — or needs to drive software that has no clean API.

What it is

Codex is OpenAI's agentic harness — the runtime that wraps GPT-5-class models with planning loops, tool use, browser automation, and computer use. Originally pitched as a coding agent, it has grown into the general-purpose harness for any mission that requires "use this software until the task is finished."
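The planning-loop-plus-tool-use shape described above can be sketched in a few lines. Everything here (the tool table, the `run_model` stub, the message format) is illustrative only, not Codex's actual internals:

```python
# Minimal agentic-loop sketch: the model proposes tool calls until it decides
# the task is finished. Codex's real loop layers planning, sandboxing, and
# multimodal inputs on top of this basic shape.

def run_model(history):
    # Stand-in for a GPT-5-class model call. Returns either a tool request
    # or a final answer. Here we hard-code one tool call, then finish.
    if not any(turn["role"] == "tool" for turn in history):
        return {"tool": "web_search", "args": {"query": "codex harness"}}
    return {"final": "done: summarized search results"}

TOOLS = {
    "web_search": lambda args: f"results for {args['query']!r}",
}

def agent_loop(task, max_turns=8):
    history = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        action = run_model(history)
        if "final" in action:                 # model decided the task is finished
            return action["final"]
        output = TOOLS[action["tool"]](action["args"])  # execute the requested tool
        history.append({"role": "tool", "content": output})
    raise RuntimeError("turn budget exhausted")
```

The harness's value is everything around this loop: which tools exist, how their output is sandboxed, and when the model is allowed to stop.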

Models it runs

  • GPT-5.5 — released April 23, 2026. Unified text/image/audio/video architecture in a single model. Read the model card.
  • GPT-5.5 Pro — same model with extended inference-time reasoning. The choice when accuracy matters more than latency.
  • GPT-5.3 Codex — the previous coding flagship from February 2026.
  • GPT-5 — the original GPT-5 base model.

What makes it distinct

  • Unified multimodal. One call processes text, screenshots, audio, and video — no stitching together GPT + Whisper + Sora behind the agent.
  • Computer use. Operates software, fills forms, drives a browser end-to-end. Strongest of the three harnesses on this dimension.
  • Knowledge work. Researching online, analyzing data, creating documents and spreadsheets — moving across tools until the task is done.
  • Web search built in. Real-time fact-checking without a separate connector.
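"One call processes text, screenshots, audio, and video" means the request itself mixes modalities as parts of a single message. A sketch of that payload shape, following the OpenAI content-part style — the exact field names for GPT-5.5 are an assumption here:

```python
# Sketch: one user message carrying several modalities as content parts.
# Part type names ("input_text", "input_image", "input_audio") are assumed,
# modeled on OpenAI's content-part convention.

def build_multimodal_message(text, image_url=None, audio_b64=None):
    parts = [{"type": "input_text", "text": text}]
    if image_url:
        parts.append({"type": "input_image", "image_url": image_url})
    if audio_b64:
        parts.append({"type": "input_audio",
                      "audio": {"data": audio_b64, "format": "wav"}})
    return {"role": "user", "content": parts}

msg = build_multimodal_message(
    "What changed between these two dashboard screenshots?",
    image_url="https://example.com/dash-after.png",
)
```

The point of the unified architecture is that no second model (a transcriber, a vision encoder) sits between the agent and these parts.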

Capabilities at a glance

  • Sub-agents: yes — parent agents call spawn_agent, message child agents, and await completion.
  • Skills: yes — system skills and per-project skills with per-tool approval modes.
  • MCP servers: yes — both as client (config.toml, parallel tool calls toggle) and as server (codex mcp-server for other MCP clients to consume Codex tools).
  • Hooks: partial — only a notify hook fires on turn completion; no full lifecycle event system.
  • Slash commands: yes — /apps, /exec, /sandbox, /mcp, /debug, plus user-defined via skills and AGENTS.md.
  • Permissions / sandboxing: three modes (read-only, workspace-write, danger-full-access); OS-level sandboxing via seatbelt (macOS), landlock (Linux), AppContainer (Windows).
  • Plugins: yes — marketplace system with OpenAI-curated and bundled marketplaces; installable from remote sources.
  • Multi-model: OpenAI Responses API, Amazon Bedrock, Ollama, and any OpenAI-compatible endpoint via config.toml.
  • Sessions: persisted to disk via SQLite; codex exec --ephemeral for stateless runs; rollout-trace bundle for diagnostics.
  • Surfaces: Ratatui TUI, headless CLI, web app, desktop app, IDE extensions (VS Code/Cursor/Windsurf), and an HTTP App Server.
  • Headless / SDK: yes — codex exec, the App Server protocol, plus TypeScript and Python client SDKs.
  • License: Apache 2.0; full Rust source on github.com/openai/codex.
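Several of the capabilities above meet in a single headless invocation. A sketch of assembling one — the sandbox mode names and `codex exec --ephemeral` come from the list above, but the `--sandbox` flag spelling is an assumption:

```python
# Sketch: building the argv for a headless Codex run without executing it.
# --ephemeral skips session persistence (stateless run); the three sandbox
# modes mirror the permissions list above. Flag names are assumptions.
import shlex

def codex_exec_cmd(prompt, sandbox="workspace-write", ephemeral=False):
    if sandbox not in {"read-only", "workspace-write", "danger-full-access"}:
        raise ValueError(f"unknown sandbox mode: {sandbox}")
    argv = ["codex", "exec", "--sandbox", sandbox]
    if ephemeral:
        argv.append("--ephemeral")  # nothing written to the session store
    argv.append(prompt)
    return argv

cmd = codex_exec_cmd("run the test suite and summarize failures", ephemeral=True)
shell_line = shlex.join(cmd)  # copy-pasteable form for CI scripts
```

The same run could be driven through the App Server protocol or the client SDKs instead; the CLI form is just the shortest path to automation.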

How TeamDay uses it

Selecting Codex as a TeamDay agent's harness unlocks every GPT-5-family model. The day OpenAI ships a new variant, the dropdown updates — no integration release on our side.

  1. Open an agent → Settings → Harness → Codex.
  2. Pick the model. Default is GPT-5.5 for most missions; Pro for high-accuracy work.
  3. Attach MCP servers — including the media MCP for visual generation.
  4. Run a mission. Codex agents play well alongside Claude Code agents on the same workspace.
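The four steps above amount to a small configuration. TeamDay's actual agent schema is not shown here, so every key and model slug below is a hypothetical illustration of the same choices:

```python
# Hypothetical sketch of the agent settings the four steps produce.
# Keys and model slugs are invented for illustration; only the choices
# themselves (harness, model, MCP servers) come from the steps above.

def make_agent_config(model="gpt-5.5", mcp_servers=()):
    allowed = {"gpt-5.5", "gpt-5.5-pro", "gpt-5.3-codex", "gpt-5"}
    if model not in allowed:
        raise ValueError(f"unsupported model for the Codex harness: {model}")
    return {
        "harness": "codex",                 # step 1: harness selection
        "model": model,                     # step 2: any GPT-5-family model
        "mcp_servers": list(mcp_servers),   # step 3: e.g. a media MCP for visuals
    }

config = make_agent_config(mcp_servers=["media"])
```

Because the model field is just a name, a newly shipped GPT-5 variant needs only a new entry in the allowed set, matching the "dropdown updates" claim above.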

When to pick Codex

  • Mixed-modality work — process a customer call, transcribe it, extract action items, draft the follow-up email — all in a single mission.
  • Computer-use heavy work — drive software, fill forms, navigate dashboards.
  • Knowledge work that ranges across tools (research → spreadsheet → document → email).
  • Anything where you want the agent to watch a screen and react.

How it's benchmarked

Codex is evaluated on Terminal-Bench (tbench.ai) — the standard suite for measuring how well a model-plus-harness combination completes real terminal tasks end-to-end. The leaderboard tracks how each new GPT-5 release moves Codex's score on Terminal-Bench Pro and Verified.

When to pick something else

  • Claude Code — for long-horizon coding and missions requiring self-verification.
  • Gemini CLI — when you need 2M context or Google-stack integration.