Most AI agent tutorials start with tool configuration. Connect this MCP server. Register that skill. Configure these prompts.
Then users wonder why their "Marketing Executive" agent sends emails randomly via SendGrid one day and Mailgun the next. Or why the "SEO Analyst" sometimes queries Google Analytics, sometimes Search Console, sometimes just hallucinates metrics.
The agents are theoretically capable. They have email tools. They have analytics access. But they don't reliably work.
Here's what we learned building 14 AI Characters for TeamDay: the problem isn't the tools. The problem is the methodology.
The Abstraction Trap
The AI agent ecosystem loves taxonomies. Tools vs MCP servers vs skills vs plugins vs prompts. Developers spend hours debating: should email be an MCP tool or a bash script skill?
From the business user's perspective, these distinctions are meaningless.
When someone asks their Marketing Executive to "send the weekly update," they don't care if email happens via:
- An MCP tool calling the Resend API
- A skill running a bash curl command
- A TypeScript script with credentials from env vars
- Direct SMTP via sendmail
They care if the email gets sent. Correctly. Every time.
Strip away the abstractions and you have exactly two primitives:
1. Executable functions — Code that runs and returns a result (tools, MCP tools, bash commands, scripts)
2. Prompt text — Instructions the AI reads and follows (system prompts, skills, CLAUDE.md files)
Everything else is packaging and organizational structure around these two primitives.
The abstraction trap happens when you optimize for taxonomy (choosing between tool types) instead of reliability (does this actually work?).
The Working Example Principle
Here's the real unit of AI agent capability:
A working example with credentials.
Not a tool registration. Not an MCP server config. Not a skill description.
A working example looks like this:
# Send email via Resend
curl -X POST https://api.resend.com/emails \
-H "Authorization: Bearer $RESEND_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"from": "[email protected]",
"to": "[email protected]",
"subject": "Weekly Update",
"html": "<p>Content here</p>"
}'
# Expected response:
# {"id": "abc-123", "status": "sent"}
# Credentials: RESEND_API_KEY in .env
# Last tested: 2026-02-10
# Owner: Marketing team
Without a working example, Claude picks arbitrarily from 1000 options. With a working example, it follows the proven pattern every time.
The difference between "theoretically can send email" and "reliably sends email via Resend using our credentials" is a tested, documented working example.
The Recipe Model
A recipe is what we call a tested, proven working example for a specific task.
Our Marketing Executive Character has these recipes:
- Send email via Resend (tested, credentials in env)
- Query Search Console API (tested, OAuth configured)
- Analyze keywords via Ahrefs (tested, API key in env)
- Fetch Google Analytics data (tested, property ID documented)
Each recipe includes:
- When to use it — "Use this for sending transactional emails"
- Credentials reference — "API key: RESEND_API_KEY (in .env)"
- Working example — Actual curl command or code snippet that works
- Expected response — What success looks like
- Last tested — Date we verified it actually works
Recipes are not abstract tool definitions. They're concrete, tested procedures that we know work because we've run them.
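A minimal recipe skeleton, following the fields above (the endpoint, variable names, and dates are placeholders, not a real integration):

#!/bin/bash
# Recipe: <task name>
# When to use: <one sentence on when this recipe applies>
# Credentials: EXAMPLE_API_KEY in .env
# Last tested: <YYYY-MM-DD>
# Owner: <team>

curl -H "Authorization: Bearer $EXAMPLE_API_KEY" \
  "https://api.example.com/endpoint"

# Expected response:
# {"status": "ok"}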
The recipes are the atomic building blocks. Characters are compositions.
Bottom-Up Character Design
Here's the methodology that actually works:
Step 1: Who Is This Character?
Not abstract capabilities. Specific role and purpose.
Bad: "AI assistant with marketing capabilities"
Good: "Marketing Executive who sends weekly performance updates to stakeholders"
The clearer the role, the easier the Character is to design.
Step 2: What Tasks Does This Role Actually Do?
Not what they theoretically could do. What they literally do on Tuesday morning.
For Marketing Executive:
- Check campaign performance (Mondays at 9am)
- Review organic traffic trends (daily)
- Send weekly update to stakeholders (Fridays at 4pm)
- Analyze keyword rankings (when requested)
Notice these are tasks, not tools. "Check campaign performance" is a job to be done. Whether that uses Google Ads API or Search Console or both is an implementation detail.
Step 3: How Do Real Humans Do These Tasks?
This is where you get specific about the tech stack.
For "check campaign performance":
- Real human logs into Google Analytics
- Views last 7 days of traffic
- Compares to previous period
- Notes significant changes
Technical translation:
- Query Google Analytics API
- Property ID: 478766521
- Metric: sessions, pageviews, bounce rate
- Date range: last 7 days vs previous 7 days
- OAuth credentials needed
Now you know what recipe to build.
Step 4: Does Our Tech Stack Support It?
Can we access these APIs from our runtime environment?
Check:
- Do we have credentials? (Check .env, check OAuth setup)
- Can we make API calls from the sandbox? (Test curl command)
- Are required packages installed? (Check Docker image or install on-demand)
If the answer is no, either:
- Add the capability to your runtime (install packages, configure OAuth)
- Use a different approach (MCP server if heavy dependencies)
- Adjust the Character's role (acknowledge limitation)
Runtime reality constrains what's possible. If psql isn't installed in the sandbox, no amount of prompt engineering gives Claude database access.
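A quick probe script makes this check concrete before you design around it (a sketch; the tools and variable names are examples from our stack, swap in your own):

#!/bin/bash
# Probe the sandbox: which CLI tools and credentials actually exist?
for cmd in curl jq git node python3 psql; do
  command -v "$cmd" >/dev/null 2>&1 && echo "OK: $cmd" || echo "MISSING: $cmd"
done

# Check expected credentials are set (variable names are ours; use your own)
for var in RESEND_API_KEY GA_ACCESS_TOKEN; do
  [ -n "${!var}" ] && echo "OK: $var is set" || echo "MISSING: $var"
done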
Step 5: Write and Test the Recipes
This is the critical step most people skip.
Don't write:
The agent can query Google Analytics using the API.
Write:
# Query Google Analytics - Last 7 Days Traffic
curl -H "Authorization: Bearer $GA_ACCESS_TOKEN" \
"https://analyticsdata.googleapis.com/v1beta/properties/478766521:runReport" \
-d '{
"dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
"metrics": [{"name": "sessions"}]
}'
# Last tested: 2026-02-10 (worked)
# Owner: Marketing team
# Credentials: GA_ACCESS_TOKEN from OAuth (expires 1hr)
Then actually run it. Verify it works. Fix what breaks. Document the working version.
The recipe is only real when it's tested.
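Verification can be a tiny harness: run the recipe and assert on the response shape (a sketch; the file name and jq filter are illustrative):

#!/bin/bash
# Run a recipe and check the response has the expected shape
response=$(bash recipes/google-analytics-7day.sh)

if echo "$response" | jq -e '.rows | length > 0' >/dev/null; then
  echo "PASS: report returned rows"
else
  echo "FAIL: unexpected response: $response"
fi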
Step 6: Compose the Character
Now you have:
- A clear role (Marketing Executive)
- Specific tasks (weekly updates, traffic analysis)
- Tested recipes (Search Console, Analytics, Resend)
The Character is the composition:
# Marketing Executive Character
## Role
You are TeamDay's Marketing Executive. You monitor performance,
analyze trends, and communicate insights to stakeholders.
## Key Responsibilities
- Send weekly performance updates (Fridays at 4pm)
- Monitor organic traffic daily
- Analyze keyword rankings on request
## Available Recipes
### Send Email via Resend
When: Sending updates to stakeholders
Recipe: /recipes/send-email-resend.md
### Query Search Console
When: Analyzing organic traffic or keywords
Recipe: /recipes/search-console-query.md
### Fetch Google Analytics
When: Checking overall traffic trends
Recipe: /recipes/google-analytics-query.md
## Communication Style
- Direct, no corporate speak
- Lead with numbers ("+15% traffic vs last week")
- Explain what changed and why it matters
The Character references recipes. The recipes contain working examples.
This is how you build Characters that actually work.
Reusability Through Tech Stack Overlap
Here's where the recipe model pays off.
Marketing Executive needs Search Console access. SEO Analyst needs Search Console access. Same recipe.
Sales Rep needs to send email. Marketing Executive needs to send email. Same recipe.
The recipes naturally become a library:
/recipes/
├── send-email-resend.md
├── search-console-query.md
├── google-analytics-query.md
├── ahrefs-keyword-analysis.md
├── postgres-query.md
└── notion-page-create.md
Each new Character adds maybe 1-2 new recipes. Most are reused.
But this only works if recipes are tested working examples. If they're abstract tool definitions, reusability doesn't matter because they don't reliably work in the first place.
The Quality Gate
A Character's capabilities are only as real as its tested recipes.
Questions to ask:
Not: "Does this Character have email configured?"
Ask: "Have we verified the email recipe actually sends an email?"
Not: "Can this Character access our database?"
Ask: "Have we tested the database query recipe with real credentials?"
The difference between Characters that are facades and Characters that deliver is tested recipes.
We learned this the hard way. We built Characters for our marketing site's /team page. Looked great. 14 AI employees you can hire. Professional descriptions. Impressive capabilities.
Then we tried using them for real work. Most didn't work end-to-end. Missing dependencies. Untested recipes. Abstract capabilities without working examples.
The quality gate: If we haven't tested it, we don't ship it.
Runtime Reality: What's Actually Possible
The sandbox environment constrains what's possible. Understanding these constraints shapes better Character design.
What Works Everywhere
HTTP APIs via curl:
curl -H "Authorization: Bearer $API_KEY" https://api.example.com/endpoint
Every sandbox has curl. If you can hit an API via HTTP, you can integrate it.
Bash scripts:
#!/bin/bash
# Any logic you can script works in the sandbox
Common CLI tools:
git, grep, sed, awk, jq, node, python
What Requires Setup
Database clients:
- Need psql or mysql installed
- Option 1: Pre-install in Docker image
- Option 2: HTTP API wrapper (pg-gateway)
- Option 3: MCP server for complex queries
Heavy packages (Puppeteer, Playwright):
- Large dependency trees
- Binary dependencies (Chrome)
- Option 1: Pre-install in base image (if commonly used)
- Option 2: MCP server (isolated, managed separately)
OAuth flows:
- Interactive authentication
- Token refresh logic
- Option 1: Pre-configure tokens (env vars; see the sketch after this list)
- Option 2: MCP server handles auth
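For Google APIs, for example, a pre-configured refresh token can be exchanged for a fresh access token with a single curl call (a sketch; the client env var names are assumptions):

#!/bin/bash
# Exchange a long-lived refresh token for a fresh Google access token.
# Assumes GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, GOOGLE_REFRESH_TOKEN in .env
GA_ACCESS_TOKEN=$(curl -s -X POST https://oauth2.googleapis.com/token \
  -d "client_id=$GOOGLE_CLIENT_ID" \
  -d "client_secret=$GOOGLE_CLIENT_SECRET" \
  -d "refresh_token=$GOOGLE_REFRESH_TOKEN" \
  -d "grant_type=refresh_token" | jq -r '.access_token')
export GA_ACCESS_TOKEN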
Practical Decision Tree
- Can we do it with curl? → Write recipe, test it, done
- Need a package < 50MB? → Install in Docker image
- Need heavy dependencies? → MCP server (last resort)
- Need interactive auth? → MCP server or pre-config tokens
The simpler the runtime requirements, the more reliable the Character.
The Difference From How Most People Build
Top-down (common approach):
- Choose AI agent framework
- Configure MCP servers
- Add skills and tools
- Write system prompt
- Hope it works
Problems:
- Tools configured but not tested
- No working examples, just abstract capabilities
- Character can theoretically do anything, reliably does nothing
- First real use reveals it doesn't actually work
Bottom-up (our approach):
- Define specific role and tasks
- Map tasks to real human workflows
- Test and verify each workflow (write recipes)
- Compose Character from tested recipes
- Quality gate: Every capability is verified
Result:
- Every recipe is tested and known to work
- Character capabilities match tested reality
- First use works because recipes were verified
- When it breaks, we know which recipe to fix
The methodology inverts the process: start from verified workflows, compose up to Characters—not configure tools down and hope.
Real Example: Marketing Executive
Let me show you the actual design process for one of our Characters.
Step 1: Role Definition
Who: Marketing Executive for TeamDay
Purpose: Monitor marketing performance and communicate insights
Step 2: Actual Tasks
After observing real marketing work:
- Check Google Analytics for traffic trends (daily)
- Monitor Search Console for organic keyword rankings (weekly)
- Send performance updates to stakeholders (weekly)
- Analyze specific campaigns when asked
Step 3: Real Human Workflow
For "send weekly update":
- Human logs into Google Analytics
- Views last 7 days: sessions, pageviews, top pages
- Compares to previous week
- Notes significant changes
- Checks Search Console for top queries
- Composes email with findings
- Sends via Gmail
Step 4: Tech Stack Check
Google Analytics:
- ✅ Have API access
- ✅ Property ID: 478766521
- ✅ OAuth configured
- ✅ Can query via curl
Search Console:
- ✅ Have API access
- ✅ Site: teamday.ai
- ✅ OAuth configured
- ✅ Can query via curl
Email:
- ✅ Using Resend (not Gmail)
- ✅ API key in env: RESEND_API_KEY
- ✅ Can send via curl
Step 5: Write Recipes
Recipe 1: Google Analytics - Last 7 Days
#!/bin/bash
# Fetch last 7 days traffic from Google Analytics
curl -H "Authorization: Bearer $GA_ACCESS_TOKEN" \
"https://analyticsdata.googleapis.com/v1beta/properties/478766521:runReport" \
-d '{
"dateRanges": [
{"startDate": "7daysAgo", "endDate": "today"},
{"startDate": "14daysAgo", "endDate": "8daysAgo"}
],
"metrics": [
{"name": "sessions"},
{"name": "totalUsers"},
{"name": "screenPageViews"}
],
"dimensions": [{"name": "pagePath"}]
}'
# Test result (2026-02-10):
# {
# "rows": [
# {"dimensionValues": [{"value": "/"}],
# "metricValues": [{"value": "1243"}, {"value": "892"}, ...]}
# ]
# }
# Credentials: GA_ACCESS_TOKEN (OAuth, 1hr expiry)
Tested: ✅ Works
Last verified: 2026-02-10
Recipe 2: Send Email via Resend
#!/bin/bash
# Send email via Resend API
TO="$1"
SUBJECT="$2"
BODY="$3"
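# Note: the arguments are interpolated into a JSON string below, so they
# must not contain unescaped double quotes; for untrusted input, build the
# payload with jq -n instead.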
curl -X POST https://api.resend.com/emails \
-H "Authorization: Bearer $RESEND_API_KEY" \
-H "Content-Type: application/json" \
-d "{
\"from\": \"[email protected]\",
\"to\": \"$TO\",
\"subject\": \"$SUBJECT\",
\"html\": \"$BODY\"
}"
# Test result (2026-02-10):
# {"id": "abc-123", "status": "sent"}
# Credentials: RESEND_API_KEY in .env
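# Usage: bash recipes/send-email-resend.sh <to-address> "Weekly Update" "<p>Content</p>"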
Tested: ✅ Works
Last verified: 2026-02-10
Step 6: Compose Character
# Marketing Executive
You are TeamDay's Marketing Executive. You monitor performance
and communicate insights.
## Weekly Update Task
Every Friday at 4pm:
1. Query Google Analytics (last 7 days vs previous)
Recipe: /recipes/google-analytics-7day.sh
2. Query Search Console (top organic queries)
Recipe: /recipes/search-console-top-queries.sh
3. Compose email:
- Subject: "TeamDay Marketing Update - Week of [date]"
- Format:
**Traffic:** [sessions] ([+/-]% vs last week)
**Top Pages:** [list top 3]
**Top Queries:** [list top 3]
**Notable Changes:** [anything >20% change]
4. Send via Resend
Recipe: /recipes/send-email-resend.sh
To: [email protected]
Result: A Character that reliably sends weekly updates because every step is a tested recipe.
What We Learned the Hard Way
1. "Tested" Means Actually Tested
We documented recipes. They looked good. We shipped Characters.
Then we tried using them. Half the recipes had never been run. API endpoints had changed. Credentials were wrong. Property IDs were old.
The fix: Test every recipe. Actually run it. Verify the response. Update when APIs change.
2. Recipes Decay
APIs change. Credentials expire. Services get deprecated.
The fix: Date every recipe. When a Character fails, check recipe dates. Re-test and update.
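This is easy to automate. A sketch of a staleness check, assuming GNU date and a "Last tested:" line in each recipe file:

#!/bin/bash
# Flag recipes whose "Last tested" date is more than 90 days old
cutoff=$(date -d '90 days ago' +%F)

for file in recipes/*.md; do
  tested=$(grep -oE 'Last tested: [0-9]{4}-[0-9]{2}-[0-9]{2}' "$file" | head -1 | awk '{print $3}')
  if [[ -z "$tested" || "$tested" < "$cutoff" ]]; then
    echo "STALE: $file (last tested: ${tested:-never})"
  fi
done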
3. Runtime Gaps Are Real
We designed an SQL Analyst Character that queries our database. Then discovered psql wasn't installed in the sandbox.
The fix: Test runtime capabilities before designing Characters. If psql isn't there, either install it or use an HTTP API wrapper.
4. Composition Beats Configuration
We spent weeks configuring MCP servers for various capabilities. Complex setup. Lots of moving parts.
Then we wrote simple bash scripts with curl commands. They worked immediately.
The learning: Start simple. Bash scripts with curl get you 80% of the way. Add complexity only when simple doesn't work.
The Meta Insight
This entire methodology came from building AI Characters that needed to actually work—not just demo well.
When you build for demos:
- Abstract capabilities are fine
- "Can send email" is enough
- Configuration screenshots look impressive
When you build for production:
- Tested recipes are required
- "Reliably sends email via Resend with our credentials" is the bar
- Working examples matter more than configuration complexity
The methodology difference: demos optimize for capability breadth, production optimizes for reliability depth.
We're building AI teams where Characters do real work. That forced us to solve the reliability problem.
The bottom-up recipe-first methodology is the result.
Try It Yourself
To build a Character that actually works:
- Define a specific role. Not: "Marketing AI". Do: "Marketing Executive who sends weekly updates".
- List 3 actual tasks. Not: "Analyze marketing data". Do: "Check last 7 days of traffic in Google Analytics".
- Write one working example. Don't document tools. Write a curl command that works. Test it. Verify the response.
- Create one recipe file. Save the working example as /recipes/task-name.md. Include: when to use, credentials, working code, last tested date.
- Reference it from the Character. The system prompt points to the recipe file, so the Character knows when to use it and how to invoke it.
- Test end-to-end. Actually use the Character for the task. Fix what breaks. Update the recipe.
Start with one task, one recipe, one Character.
Once you've built one that reliably works, the methodology clicks. Then scale to more recipes and more Characters.
We have 14 AI Characters on our /team page. They look professional. Impressive capabilities. But we learned: looking capable and being capable are different.
The ones that actually work have tested recipes. The ones that are facades have abstract tool definitions.
The methodology isn't complicated: bottom-up from working examples, compose into Characters, test end-to-end.
But it inverts how most people build AI agents. And that inversion is what makes Characters reliable.
Build from recipes. Test everything. Ship what works.

