Agent Transcript

Pronunciation

/ˈeɪdʒənt ˈtrænskrɪpt/

Also known as: agent trace, execution trace, agent trajectory, agent log

What is an Agent Transcript?

An agent transcript (also called a trace or trajectory) is the complete record of everything an AI agent did during an execution—every model output, tool call, reasoning step, intermediate result, and environmental interaction.

If an agent is a worker, the transcript is its detailed timesheet showing not just what it accomplished, but exactly how it got there.

Why Transcripts Matter

Without a transcript, an agent is a black box: you see the input and the output, but not what happened in between. When an agent:

  • Fails to complete a task—why?
  • Produces an unexpected result—what went wrong?
  • Succeeds brilliantly—what should we replicate?

The transcript tells the story.

What's in a Transcript?

A comprehensive agent transcript includes:

Model Interactions

[Turn 1] User: "Schedule a meeting with Sarah for next week"
[Turn 1] Assistant thinking: "I need to check Sarah's availability first..."
[Turn 1] Assistant response: "Let me check Sarah's calendar."

Tool Calls and Results

[Turn 1] Tool call: get_calendar(user="[email protected]", range="next_week")
[Turn 1] Tool result: {
  "available_slots": ["Mon 2pm", "Wed 10am", "Thu 3pm"]
}

Reasoning and Planning

[Turn 2] Assistant thinking: "Sarah has 3 available slots. I should
ask the user which works best, or check their calendar too..."

State Changes

[Turn 3] Tool call: create_event(
  title="Meeting with Sarah",
  time="Wed 10am",
  attendees=["[email protected]", "[email protected]"]
)
[Turn 3] State change: Calendar event #12345 created

Errors and Recovery

[Turn 4] Tool call: send_invite(event_id="12345")
[Turn 4] Error: "Rate limit exceeded, retry in 30s"
[Turn 4] Recovery: Waiting 30 seconds...
[Turn 5] Tool call: send_invite(event_id="12345")
[Turn 5] Success: Invites sent
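
The retry sequence in the Errors and Recovery example could be recorded by a small wrapper like the one below. This is a minimal Python sketch, not a reference implementation: `call_with_retry`, the event field names, and the use of `RuntimeError` to signal a tool failure are all assumptions made for illustration.

```python
import time
from typing import Callable


def call_with_retry(tool: Callable[[], str], transcript: list,
                    max_attempts: int = 3, wait_s: float = 30.0,
                    sleep: Callable[[float], None] = time.sleep) -> str:
    """Call a tool, recording every attempt, error, and wait in the transcript."""
    for attempt in range(1, max_attempts + 1):
        transcript.append({"type": "tool_call", "attempt": attempt})
        try:
            result = tool()
        except RuntimeError as err:
            transcript.append({"type": "error", "message": str(err)})
            if attempt == max_attempts:
                raise  # out of attempts: surface the failure to the caller
            transcript.append({"type": "recovery", "wait_s": wait_s})
            sleep(wait_s)
        else:
            transcript.append({"type": "success", "result": result})
            return result
    raise ValueError("max_attempts must be at least 1")


# Hypothetical tool that fails once with a rate limit, then succeeds.
_calls = {"n": 0}

def flaky_send_invite() -> str:
    _calls["n"] += 1
    if _calls["n"] == 1:
        raise RuntimeError("Rate limit exceeded, retry in 30s")
    return "Invites sent"


log = []
# The sleep is replaced with a no-op so the demo does not actually wait.
result = call_with_retry(flaky_send_invite, log, sleep=lambda s: None)
```

Because the wrapper appends to the transcript before and after every attempt, the error and the wait are preserved even when the final call succeeds.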

Transcript vs. Outcome

A critical distinction:

Concept | Definition | Example
Transcript | What the agent did and said | "I've scheduled your meeting"
Outcome | What actually changed in the world | Calendar event exists, invites sent

Never trust the transcript alone. Agents can claim they did something without actually doing it. Always verify outcomes independently.
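
One way to check an outcome independently is to query the system of record rather than the agent's own words. The sketch below assumes a hypothetical in-memory `calendar_store` standing in for a real calendar backend; the function name and the `invites_sent` field are illustrative, not part of any real API.

```python
def verify_meeting_outcome(transcript_claim: str, calendar_store: dict,
                           event_id: str) -> dict:
    """Compare what the agent said with what the calendar actually holds."""
    event = calendar_store.get(event_id)
    return {
        # Crude keyword check on the claim; a real grader would be stricter.
        "agent_claimed_success": "scheduled" in transcript_claim.lower(),
        "event_exists": event is not None,
        "invites_sent": bool(event and event.get("invites_sent")),
    }


# In-memory stand-in for the real calendar backend (an assumption).
calendar_store = {"12345": {"title": "Meeting with Sarah", "invites_sent": True}}

claim = "I've scheduled your meeting"
verified = verify_meeting_outcome(claim, calendar_store, "12345")
unverified = verify_meeting_outcome(claim, calendar_store, "99999")
```

In the second call the agent's claim is identical, but no such event exists, which is exactly the gap between transcript and outcome.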

Using Transcripts

Debugging Failures

When an agent fails:

  1. Find the divergence point: Where did the agent go wrong?
  2. Check tool call validity: Were parameters correct?
  3. Examine reasoning: Did the agent misunderstand the task?
  4. Identify recovery failures: Did error handling work?

Evaluating Quality

Transcripts enable nuanced evaluation:

  • Efficiency: Did the agent take unnecessary steps?
  • Reasoning quality: Was the logic sound?
  • Tool selection: Did it use the right tools?
  • Error handling: Did it recover gracefully?
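
Some of these dimensions can be approximated with simple counts before a human or model grader looks at reasoning quality. The following sketch assumes events are dicts with hypothetical `type` and `tool` keys, and treats repeated calls to the same tool as a rough signal of possible inefficiency, not proof of it.

```python
from collections import Counter


def summarize(events: list) -> dict:
    """Compute rough efficiency and error-handling signals from a transcript."""
    type_counts = Counter(e["type"] for e in events)
    tool_counts = Counter(e["tool"] for e in events if e["type"] == "tool_call")
    return {
        "tool_calls": type_counts["tool_call"],
        "errors": type_counts["error"],
        # Tools called more than once: retries or possibly redundant steps.
        "repeated_calls": {t: n for t, n in tool_counts.items() if n > 1},
    }


# Sample events modeled on the scheduling example (field names are assumptions).
events = [
    {"type": "tool_call", "tool": "create_event"},
    {"type": "tool_call", "tool": "send_invite"},
    {"type": "error", "tool": "send_invite"},
    {"type": "tool_call", "tool": "send_invite"},
    {"type": "success", "tool": "send_invite"},
]

summary = summarize(events)
```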

Training and Improvement

Good transcripts become training data:

  • Successful patterns to reinforce
  • Failure patterns to avoid
  • Edge cases to handle better

Compliance and Audit

For enterprise agents:

  • What actions were taken on whose behalf?
  • What data was accessed?
  • Were policies followed?

Transcript Storage Patterns

Structured Logging

{
  "session_id": "abc123",
  "turns": [
    {
      "turn_id": 1,
      "timestamp": "2025-01-12T10:30:00Z",
      "model_input": "...",
      "model_output": "...",
      "tool_calls": [...],
      "tool_results": [...],
      "thinking": "..."
    }
  ]
}
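
A record with this shape could be produced by a small recorder such as the one below. It is a sketch using only the Python standard library; the `Turn` and `Session` class names and the `add_turn` method are assumptions, while the field names mirror the JSON above.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass
class Turn:
    turn_id: int
    timestamp: str
    model_input: str
    model_output: str
    tool_calls: list = field(default_factory=list)
    tool_results: list = field(default_factory=list)
    thinking: str = ""


@dataclass
class Session:
    session_id: str
    turns: list = field(default_factory=list)

    def add_turn(self, model_input: str, model_output: str, **extra) -> Turn:
        """Append a turn with the next turn_id and a UTC ISO 8601 timestamp."""
        turn = Turn(
            turn_id=len(self.turns) + 1,
            timestamp=datetime.now(timezone.utc).isoformat(),
            model_input=model_input,
            model_output=model_output,
            **extra,
        )
        self.turns.append(turn)
        return turn

    def to_json(self) -> str:
        # asdict() recurses into the nested Turn dataclasses.
        return json.dumps(asdict(self), indent=2)


session = Session("abc123")
session.add_turn(
    "Schedule a meeting with Sarah for next week",
    "Let me check Sarah's calendar.",
    thinking="I need to check Sarah's availability first...",
)
serialized = session.to_json()
```

Note that `isoformat()` renders UTC as `+00:00` rather than the `Z` suffix shown in the example record.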

Event Streams

10:30:00 | USER_INPUT | "Schedule meeting..."
10:30:01 | MODEL_THINKING | "I need to check..."
10:30:02 | TOOL_CALL | get_calendar(...)
10:30:03 | TOOL_RESULT | {available_slots: [...]}
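
Lines in this pipe-delimited format are easy to parse back into structured events. The sketch below assumes exactly three fields per line, with any further `|` characters belonging to the payload; the `Event` type and `parse_event` function are illustrative names.

```python
from typing import NamedTuple


class Event(NamedTuple):
    timestamp: str
    kind: str
    payload: str


def parse_event(line: str) -> Event:
    """Split 'time | KIND | payload' into fields, keeping '|' inside the payload."""
    timestamp, kind, payload = (part.strip() for part in line.split("|", 2))
    return Event(timestamp, kind, payload)


stream = [
    '10:30:00 | USER_INPUT | "Schedule meeting..."',
    '10:30:01 | MODEL_THINKING | "I need to check..."',
    "10:30:02 | TOOL_CALL | get_calendar(...)",
    "10:30:03 | TOOL_RESULT | {available_slots: [...]}",
]

parsed = [parse_event(line) for line in stream]
tool_calls = [e for e in parsed if e.kind == "TOOL_CALL"]
```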

Human-Readable Logs

=== Agent Session abc123 ===
User: Schedule a meeting with Sarah for next week

Agent thinking: I need to check Sarah's availability first.
Let me query her calendar.

[Calling get_calendar for [email protected]]
Result: 3 slots available (Mon 2pm, Wed 10am, Thu 3pm)

Transcript Best Practices

Capture Everything

  • Don't filter "unimportant" steps—you don't know what matters until you debug
  • Include thinking/reasoning even if not shown to users
  • Log timing information for performance analysis

Make it Searchable

  • Index by session, user, timestamp, tool, error type
  • Enable correlation across related sessions
  • Support both full-text and structured queries
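
As a rough illustration of these indexing ideas, the sketch below loads events into an in-memory SQLite database using Python's built-in `sqlite3` module. The table layout, column names, and sample rows are assumptions, and the `LIKE` query is only a stand-in for real full-text search (for example, SQLite's FTS5 extension where it is available).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE events (
           session_id TEXT, user_id TEXT, ts TEXT,
           tool TEXT, error_type TEXT, body TEXT)"""
)
# Indexes for the structured lookups: by session and time, by tool and error.
conn.execute("CREATE INDEX idx_events_session ON events (session_id, ts)")
conn.execute("CREATE INDEX idx_events_tool_error ON events (tool, error_type)")

# Illustrative rows only; values are invented for this sketch.
rows = [
    ("abc123", "user-1", "2025-01-12T10:30:02Z", "get_calendar", None,
     "get_calendar(range=next_week)"),
    ("abc123", "user-1", "2025-01-12T10:30:05Z", "send_invite", "rate_limit",
     "Rate limit exceeded, retry in 30s"),
    ("abc123", "user-1", "2025-01-12T10:30:35Z", "send_invite", None,
     "Invites sent"),
]
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)", rows)


def find_errors(conn: sqlite3.Connection, tool: str) -> list:
    """Structured query: all errors raised by one tool, oldest first."""
    cur = conn.execute(
        "SELECT session_id, ts, error_type FROM events "
        "WHERE tool = ? AND error_type IS NOT NULL ORDER BY ts",
        (tool,),
    )
    return cur.fetchall()


def search_text(conn: sqlite3.Connection, needle: str) -> list:
    """Substring search over event bodies (a simple stand-in for full-text search)."""
    cur = conn.execute(
        "SELECT session_id, ts FROM events WHERE body LIKE ? ORDER BY ts",
        (f"%{needle}%",),
    )
    return cur.fetchall()


invite_errors = find_errors(conn, "send_invite")
rate_limit_hits = search_text(conn, "Rate limit")
```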

Protect Sensitive Data

  • Redact PII before long-term storage; keep unredacted data only briefly, for immediate debugging
  • Implement retention policies
  • Control access to production transcripts
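
Redaction can be applied as a pass over the transcript before it is written to long-term storage. The sketch below handles only email addresses, using a deliberately simple regular expression that will miss some valid addresses; the `redact` function and the placeholder text are assumptions, and real PII detection needs to cover far more than emails.

```python
import re

# Simplified email pattern for illustration; not a full RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(value):
    """Recursively replace email addresses in strings, lists, and dicts."""
    if isinstance(value, str):
        return EMAIL_RE.sub("[REDACTED_EMAIL]", value)
    if isinstance(value, list):
        return [redact(item) for item in value]
    if isinstance(value, dict):
        return {key: redact(item) for key, item in value.items()}
    return value  # numbers, booleans, None pass through unchanged


# Sample tool call with made-up addresses on reserved example domains.
tool_call = {
    "name": "create_event",
    "args": {"attendees": ["sarah@example.com", "user@example.org"], "count": 2},
}

safe_tool_call = redact(tool_call)
```

The function returns a new structure and leaves the original untouched, so the unredacted copy can be kept in short-lived storage if immediate debugging requires it.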

Enable Replay

  • Store enough context to reproduce issues
  • Link to relevant environment state
  • Support "what-if" analysis
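
One possible approach to replay is to serve recorded tool results back to the agent instead of calling live tools, and to override a single result for "what-if" analysis. The sketch below assumes a hypothetical `ReplayTools` class and a recorded list of `{"tool": ..., "result": ...}` entries; it is not tied to any particular agent framework.

```python
from typing import Optional


class ReplayTools:
    """Serve recorded tool results in order instead of calling live tools."""

    def __init__(self, recorded: list, overrides: Optional[dict] = None):
        self._recorded = recorded
        self._overrides = overrides or {}  # step index -> substitute result
        self._cursor = 0

    def call(self, name: str, **kwargs):
        entry = self._recorded[self._cursor]
        if entry["tool"] != name:
            # The agent took a different path than the recorded run.
            raise RuntimeError(
                f"replay diverged at step {self._cursor}: "
                f"expected {entry['tool']}, got {name}"
            )
        result = self._overrides.get(self._cursor, entry["result"])
        self._cursor += 1
        return result


# Recorded results modeled on the scheduling example.
recorded = [
    {"tool": "get_calendar",
     "result": {"available_slots": ["Mon 2pm", "Wed 10am", "Thu 3pm"]}},
    {"tool": "create_event", "result": {"event_id": "12345"}},
]

replay = ReplayTools(recorded)
slots = replay.call("get_calendar", range="next_week")

# What-if: the same run, but Sarah has no free slots at step 0.
what_if = ReplayTools(recorded, overrides={0: {"available_slots": []}})
empty_slots = what_if.call("get_calendar", range="next_week")
```

The divergence check matters: if the agent under replay calls a different tool than the recording, the stored results no longer apply and the replay should stop rather than return misleading data.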

The Transcript Review Skill

Anthropic emphasizes that reviewing transcripts is a critical skill:

"Reading transcripts is essential for confirming that graders are working correctly and that evaluations are actually testing the intended behaviors."

Learning to read agent transcripts quickly and identify failure patterns is foundational for anyone building or operating AI agents.

Mentioned In

The transcript is the complete record of a trial, including outputs, tool calls, reasoning, intermediate results, and any other interactions.

Anthropic Engineering
