Agent Transcript
/ˈeɪdʒənt ˈtrænskrɪpt/
What is an Agent Transcript?
An agent transcript (also called a trace or trajectory) is the complete record of everything an AI agent did during an execution—every model output, tool call, reasoning step, intermediate result, and environmental interaction.
If an agent is a worker, the transcript is its detailed timesheet showing not just what it accomplished, but exactly how it got there.
Why Transcripts Matter
Agents are black boxes without transcripts. You see the input and output, but what happened in between? When an agent:
- Fails to complete a task—why?
- Produces an unexpected result—what went wrong?
- Succeeds brilliantly—what should we replicate?
The transcript tells the story.
What's in a Transcript?
A comprehensive agent transcript includes:
Model Interactions
[Turn 1] User: "Schedule a meeting with Sarah for next week"
[Turn 1] Assistant thinking: "I need to check Sarah's availability first..."
[Turn 1] Assistant response: "Let me check Sarah's calendar."
Tool Calls and Results
[Turn 1] Tool call: get_calendar(user="sarah@example.com", range="next_week")
[Turn 1] Tool result: {
"available_slots": ["Mon 2pm", "Wed 10am", "Thu 3pm"]
}
Reasoning and Planning
[Turn 2] Assistant thinking: "Sarah has 3 available slots. I should
ask the user which works best, or check their calendar too..."
State Changes
[Turn 3] Tool call: create_event(
title="Meeting with Sarah",
time="Wed 10am",
attendees=["user@example.com", "sarah@example.com"]
)
[Turn 3] State change: Calendar event #12345 created
Errors and Recovery
[Turn 4] Tool call: send_invite(event_id="12345")
[Turn 4] Error: "Rate limit exceeded, retry in 30s"
[Turn 4] Recovery: Waiting 30 seconds...
[Turn 5] Tool call: send_invite(event_id="12345")
[Turn 5] Success: Invites sent
Transcript vs. Outcome
A critical distinction:
| Concept | Definition | Example |
|---|---|---|
| Transcript | What the agent did and said | "I've scheduled your meeting" |
| Outcome | What actually changed in the world | Calendar event exists, invites sent |
Never trust the transcript alone. Agents can claim they did something without actually doing it. Always verify outcomes independently.
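In practice, that means checking the system of record rather than the agent's summary. A minimal sketch, assuming a hypothetical calendar_client with a get_event lookup:

```python
# Minimal sketch: verify the claimed outcome against the system of record.
# `calendar_client` and `get_event` are hypothetical stand-ins for whatever
# system the agent was acting on.

def verify_meeting_created(calendar_client, event_id, expected_attendees):
    """Return True only if the event really exists and matches the agent's claim."""
    event = calendar_client.get_event(event_id)   # ground truth, not the transcript
    if event is None:
        return False                              # agent said "scheduled", but nothing exists
    return set(expected_attendees) <= set(event["attendees"])
```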
Using Transcripts
Debugging Failures
When an agent fails, walk the transcript (a minimal scan sketch follows this list):
- Find the divergence point: Where did the agent go wrong?
- Check tool call validity: Were parameters correct?
- Examine reasoning: Did the agent misunderstand the task?
- Identify recovery failures: Did error handling work?
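A minimal sketch of the first step, assuming turns shaped like the structured-logging example under Transcript Storage Patterns, plus a hypothetical "error" field on each tool result:

```python
# Minimal sketch: locate the first turn where a tool call failed.
# Assumes turns shaped like the structured-logging example later on this page,
# with a hypothetical "error" field on each tool result.

def find_divergence_point(turns):
    """Return the first turn containing a failed tool call, or None."""
    for turn in turns:
        for result in turn.get("tool_results", []):
            if result.get("error"):
                return turn
    return None

# Usage: turn = find_divergence_point(session["turns"]); then read that turn's
# thinking and tool calls to see whether parameters or reasoning went wrong.
```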
Evaluating Quality
Transcripts enable nuanced evaluation (a small metrics sketch follows the list):
- Efficiency: Did the agent take unnecessary steps?
- Reasoning quality: Was the logic sound?
- Tool selection: Did it use the right tools?
- Error handling: Did it recover gracefully?
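Some of these signals fall out of the transcript directly. A sketch of coarse metrics under the same assumed turn shape (each tool call is assumed to carry a "name" field):

```python
# Minimal sketch: coarse quality signals computed straight from a transcript.
# Same assumed turn shape as above; each tool call is assumed to carry a "name".
from collections import Counter

def transcript_metrics(turns):
    tool_usage = Counter(
        call["name"] for turn in turns for call in turn.get("tool_calls", [])
    )
    errors = sum(
        1 for turn in turns for result in turn.get("tool_results", [])
        if result.get("error")
    )
    return {
        "turns": len(turns),                  # efficiency: fewer steps is usually better
        "tool_calls": sum(tool_usage.values()),
        "tool_usage": dict(tool_usage),       # tool selection: which tools, how often
        "errors": errors,                     # error handling: how many recoveries were needed
    }
```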
Training and Improvement
Good transcripts become training data:
- Successful patterns to reinforce
- Failure patterns to avoid
- Edge cases to handle better
Compliance and Audit
For enterprise agents:
- What actions were taken on whose behalf?
- What data was accessed?
- Were policies followed?
Transcript Storage Patterns
Structured Logging
{
"session_id": "abc123",
"turns": [
{
"turn_id": 1,
"timestamp": "2025-01-12T10:30:00Z",
"model_input": "...",
"model_output": "...",
"tool_calls": [...],
"tool_results": [...],
"thinking": "..."
}
]
}
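A minimal writer for this shape might append each completed turn to a JSON Lines file. The field names below mirror the example above; the path and call signature are assumptions:

```python
# Minimal sketch: append one turn per line to a JSON Lines transcript file.
import json
from datetime import datetime, timezone

def log_turn(path, session_id, turn_id, model_input, model_output,
             tool_calls, tool_results, thinking=""):
    record = {
        "session_id": session_id,
        "turn_id": turn_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "model_input": model_input,
        "model_output": model_output,
        "tool_calls": tool_calls,
        "tool_results": tool_results,
        "thinking": thinking,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")    # one turn per line: cheap appends, easy parsing
```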
Event Streams
10:30:00 | USER_INPUT | "Schedule meeting..."
10:30:01 | MODEL_THINKING | "I need to check..."
10:30:02 | TOOL_CALL | get_calendar(...)
10:30:03 | TOOL_RESULT | {available_slots: [...]}
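A minimal emitter for this kind of stream, using the same pipe-delimited layout:

```python
# Minimal sketch: emit pipe-delimited events in the layout shown above.
import sys
from datetime import datetime, timezone

def emit(event_type, payload, stream=sys.stdout):
    ts = datetime.now(timezone.utc).strftime("%H:%M:%S")
    stream.write(f"{ts} | {event_type} | {payload}\n")

emit("USER_INPUT", "Schedule meeting...")
emit("TOOL_CALL", "get_calendar(user='sarah@example.com', range='next_week')")
```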
Human-Readable Logs
=== Agent Session abc123 ===
User: Schedule a meeting with Sarah for next week
Agent thinking: I need to check Sarah's availability first.
Let me query her calendar.
[Calling get_calendar for sarah@example.com]
Result: 3 slots available (Mon 2pm, Wed 10am, Thu 3pm)
Transcript Best Practices
Capture Everything
- Don't filter "unimportant" steps—you don't know what matters until you debug
- Include thinking/reasoning even if not shown to users
- Log timing information for performance analysis (see the timing sketch after this list)
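A sketch of the timing point, wrapping any tool function so its duration is recorded alongside its result (the wrapper name is illustrative):

```python
# Minimal sketch: time a tool call so its duration lands in the transcript.
import time

def timed_tool_call(tool_fn, *args, **kwargs):
    """Call any tool function and return its result plus a duration in milliseconds."""
    start = time.monotonic()
    result = tool_fn(*args, **kwargs)
    duration_ms = round((time.monotonic() - start) * 1000, 1)
    return {"result": result, "duration_ms": duration_ms}
```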
Make it Searchable
- Index by session, user, timestamp, tool, and error type (a minimal index sketch follows this list)
- Enable correlation across related sessions
- Support both full-text and structured queries
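A minimal sketch of such an index using SQLite; the column names are illustrative, not a standard schema:

```python
# Minimal sketch: a flat SQLite index over transcript events for fast lookups.
import sqlite3

conn = sqlite3.connect("transcripts.db")
conn.execute("""
    CREATE TABLE IF NOT EXISTS events (
        session_id TEXT,
        user_id    TEXT,
        timestamp  TEXT,
        tool       TEXT,
        error_type TEXT,
        body       TEXT   -- raw event text, queryable with LIKE for simple full-text search
    )
""")

# Example query: every rate-limit error a given user hit, newest first.
rows = conn.execute(
    "SELECT session_id, timestamp, tool FROM events "
    "WHERE user_id = ? AND error_type = ? ORDER BY timestamp DESC",
    ("user-42", "rate_limit"),
).fetchall()
```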
Protect Sensitive Data
- Redact PII before long-term storage (raw text may be kept briefly for immediate debugging); see the redaction sketch below
- Implement retention policies
- Control access to production transcripts
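A minimal redaction pass, masking only e-mail addresses; real deployments need much broader PII coverage:

```python
# Minimal sketch: mask e-mail addresses in transcript text before it is stored.
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact(text):
    return EMAIL.sub("[REDACTED_EMAIL]", text)

print(redact('Tool call: get_calendar(user="sarah@example.com")'))
# Tool call: get_calendar(user="[REDACTED_EMAIL]")
```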
Enable Replay
- Store enough context to reproduce issues (a minimal replay sketch follows this list)
- Link to relevant environment state
- Support "what-if" analysis
The Transcript Review Skill
Anthropic emphasizes that reviewing transcripts is a critical skill:
"Reading transcripts is essential for confirming that graders are working correctly and that evaluations are actually testing the intended behaviors."
Learning to read agent transcripts quickly and identify failure patterns is foundational for anyone building or operating AI agents.
Related Reading
- AI Agents - Systems that produce transcripts
- Agent Evaluation - Using transcripts for testing
- Agent Harness - Infrastructure that records transcripts
