Agent Transcript
/ˈeɪdʒənt ˈtrænskrɪpt/
What is an Agent Transcript?
An agent transcript (also called a trace or trajectory) is the complete record of everything an AI agent did during an execution—every model output, tool call, reasoning step, intermediate result, and environmental interaction.
If an agent is a worker, the transcript is its detailed timesheet showing not just what it accomplished, but exactly how it got there.
Why Transcripts Matter
Agents are black boxes without transcripts. You see the input and output, but what happened in between? When an agent:
- Fails to complete a task—why?
- Produces an unexpected result—what went wrong?
- Succeeds brilliantly—what should we replicate?
The transcript tells the story.
What's in a Transcript?
A comprehensive agent transcript includes:
Model Interactions
[Turn 1] User: "Schedule a meeting with Sarah for next week"
[Turn 1] Assistant thinking: "I need to check Sarah's availability first..."
[Turn 1] Assistant response: "Let me check Sarah's calendar."
Tool Calls and Results
[Turn 1] Tool call: get_calendar(user="sarah@example.com", range="next_week")
[Turn 1] Tool result: {
"available_slots": ["Mon 2pm", "Wed 10am", "Thu 3pm"]
}
Reasoning and Planning
[Turn 2] Assistant thinking: "Sarah has 3 available slots. I should
ask the user which works best, or check their calendar too..."
State Changes
[Turn 3] Tool call: create_event(
title="Meeting with Sarah",
time="Wed 10am",
attendees=["user@example.com", "sarah@example.com"]
)
[Turn 3] State change: Calendar event #12345 created
Errors and Recovery
[Turn 4] Tool call: send_invite(event_id="12345")
[Turn 4] Error: "Rate limit exceeded, retry in 30s"
[Turn 4] Recovery: Waiting 30 seconds...
[Turn 5] Tool call: send_invite(event_id="12345")
[Turn 5] Success: Invites sent
Transcript vs. Outcome
A critical distinction:
| Concept | Definition | Example |
|---|---|---|
| Transcript | What the agent did and said | "I've scheduled your meeting" |
| Outcome | What actually changed in the world | Calendar event exists, invites sent |
Never trust the transcript alone. Agents can claim they did something without actually doing it. Always verify outcomes independently.
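In practice, that means checking the system of record rather than the agent's summary. A minimal sketch, assuming a hypothetical calendar_client with a get_event lookup:

```python
# Minimal sketch: verify the claimed outcome against the system of record.
# `calendar_client` and `get_event` are hypothetical stand-ins for whatever
# system the agent was acting on.

def verify_meeting_created(calendar_client, event_id, expected_attendees):
    """Return True only if the event really exists and matches the agent's claim."""
    event = calendar_client.get_event(event_id)   # ground truth, not the transcript
    if event is None:
        return False                              # agent said "scheduled", but nothing exists
    return set(expected_attendees) <= set(event["attendees"])
```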
Using Transcripts
Debugging Failures
When an agent fails, walk the transcript (a minimal scan sketch follows this list):
- Find the divergence point: Where did the agent go wrong?
- Check tool call validity: Were parameters correct?
- Examine reasoning: Did the agent misunderstand the task?
- Identify recovery failures: Did error handling work?
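A minimal sketch of the first step, assuming turns shaped like the structured-logging example under Transcript Storage Patterns, plus a hypothetical "error" field on each tool result:

```python
# Minimal sketch: locate the first turn where a tool call failed.
# Assumes turns shaped like the structured-logging example later on this page,
# with a hypothetical "error" field on each tool result.

def find_divergence_point(turns):
    """Return the first turn containing a failed tool call, or None."""
    for turn in turns:
        for result in turn.get("tool_results", []):
            if result.get("error"):
                return turn
    return None

# Usage: turn = find_divergence_point(session["turns"]); then read that turn's
# thinking and tool calls to see whether parameters or reasoning went wrong.
```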
Evaluating Quality
Transcripts enable nuanced evaluation (a small metrics sketch follows the list):
- Efficiency: Did the agent take unnecessary steps?
- Reasoning quality: Was the logic sound?
- Tool selection: Did it use the right tools?
- Error handling: Did it recover gracefully?
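Some of these signals fall out of the transcript directly. A sketch of coarse metrics under the same assumed turn shape (each tool call is assumed to carry a "name" field):

```python
# Minimal sketch: coarse quality signals computed straight from a transcript.
# Same assumed turn shape as above; each tool call is assumed to carry a "name".
from collections import Counter

def transcript_metrics(turns):
    tool_usage = Counter(
        call["name"] for turn in turns for call in turn.get("tool_calls", [])
    )
    errors = sum(
        1 for turn in turns for result in turn.get("tool_results", [])
        if result.get("error")
    )
    return {
        "turns": len(turns),                  # efficiency: fewer steps is usually better
        "tool_calls": sum(tool_usage.values()),
        "tool_usage": dict(tool_usage),       # tool selection: which tools, how often
        "errors": errors,                     # error handling: how many recoveries were needed
    }
```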
Training and Improvement
Good transcripts become training data:
- Successful patterns to reinforce
- Failure patterns to avoid
- Edge cases to handle better
Compliance and Audit
For enterprise agents:
- What actions were taken on whose behalf?
- What data was accessed?
- Were policies followed?
Transcript Storage Patterns
Structured Logging
{
"session_id": "abc123",
"turns": [
{
"turn_id": 1,
"timestamp": "2025-01-12T10:30:00Z",
"model_input": "...",
"model_output": "...",
"tool_calls": [...],
"tool_results": [...],
"thinking": "..."
}
]
}
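A minimal writer for this shape might append each completed turn to a JSON Lines file. The field names below mirror the example above; the path and call signature are assumptions:

```python
# Minimal sketch: append one turn per line to a JSON Lines transcript file.
import json
from datetime import datetime, timezone

def log_turn(path, session_id, turn_id, model_input, model_output,
             tool_calls, tool_results, thinking=""):
    record = {
        "session_id": session_id,
        "turn_id": turn_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_input": model_input,
        "model_output": model_output,
        "tool_calls": tool_calls,
        "tool_results": tool_results,
        "thinking": thinking,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")    # one turn per line: cheap appends, easy parsing
```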
Event Streams
10:30:00 | USER_INPUT | "Schedule meeting..."
10:30:01 | MODEL_THINKING | "I need to check..."
10:30:02 | TOOL_CALL | get_calendar(...)
10:30:03 | TOOL_RESULT | {available_slots: [...]}
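A minimal emitter for this kind of stream, using the same pipe-delimited layout:

```python
# Minimal sketch: emit pipe-delimited events in the layout shown above.
import sys
from datetime import datetime, timezone

def emit(event_type, payload, stream=sys.stdout):
    ts = datetime.now(timezone.utc).strftime("%H:%M:%S")
    stream.write(f"{ts} | {event_type} | {payload}\n")

emit("USER_INPUT", "Schedule meeting...")
emit("TOOL_CALL", "get_calendar(user='sarah@example.com', range='next_week')")
```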
Human-Readable Logs
=== Agent Session abc123 ===
User: Schedule a meeting with Sarah for next week
Agent thinking: I need to check Sarah's availability first.
Let me query her calendar.
[Calling get_calendar for sarah@example.com]
Result: 3 slots available (Mon 2pm, Wed 10am, Thu 3pm)
Transcript Best Practices
Capture Everything
- Don't filter "unimportant" steps—you don't know what matters until you debug
- Include thinking/reasoning even if not shown to users
- Log timing information for performance analysis (see the timing sketch after this list)
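A sketch of the timing point, wrapping any tool function so its duration is recorded alongside its result (the wrapper name is illustrative):

```python
# Minimal sketch: time a tool call so its duration lands in the transcript.
import time

def timed_tool_call(tool_fn, *args, **kwargs):
    """Call any tool function and return its result plus a duration in milliseconds."""
    start = time.monotonic()
    result = tool_fn(*args, **kwargs)
    duration_ms = round((time.monotonic() - start) * 1000, 1)
    return {"result": result, "duration_ms": duration_ms}
```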
Make it Searchable
- Index by session, user, timestamp, tool, and error type (a minimal index sketch follows this list)
- Enable correlation across related sessions
- Support both full-text and structured queries
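A minimal sketch of such an index using SQLite; the column names are illustrative, not a standard schema:

```python
# Minimal sketch: a flat SQLite index over transcript events for fast lookups.
import sqlite3

conn = sqlite3.connect("transcripts.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS events (
        session_id TEXT,
        user_id    TEXT,
        timestamp  TEXT,
        tool       TEXT,
        error_type TEXT,
        body       TEXT   -- raw event text, queryable with LIKE for simple full-text search
    )
""")

# Example query: every rate-limit error a given user hit, newest first.
rows = conn.execute(
    "SELECT session_id, timestamp, tool FROM events "
    "WHERE user_id = ? AND error_type = ? ORDER BY timestamp DESC",
    ("user-42", "rate_limit"),
).fetchall()
```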
Protect Sensitive Data
- Redact PII before long-term storage (raw text may be kept briefly for immediate debugging); see the redaction sketch below
- Implement retention policies
- Control access to production transcripts
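A minimal redaction pass, masking only e-mail addresses; real deployments need much broader PII coverage:

```python
# Minimal sketch: mask e-mail addresses in transcript text before it is stored.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(text):
    return EMAIL.sub("[REDACTED_EMAIL]", text)

print(redact('Tool call: get_calendar(user="sarah@example.com")'))
# Tool call: get_calendar(user="[REDACTED_EMAIL]")
```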
Enable Replay
- Store enough context to reproduce issues (a minimal replay sketch follows this list)
- Link to relevant environment state
- Support "what-if" analysis
The Transcript Review Skill
Anthropic emphasizes that reviewing transcripts is a critical skill:
"Reading transcripts is essential for confirming that graders are working correctly and that evaluations are actually testing the intended behaviors."
Learning to read agent transcripts quickly and identify failure patterns is foundational for anyone building or operating AI agents.
Related Reading
- AI Agents - Systems that produce transcripts
- Agent Evaluation - Using transcripts for testing
- Agent Harness - Infrastructure that records transcripts
