BigQuery Insights - Complete Example

Build an AI agent that connects to Google BigQuery, runs complex SQL queries, analyzes large datasets, and generates actionable insights from your data warehouse.

What you'll build

A data analysis agent that:

  • Connects to Google BigQuery datasets
  • Writes and runs SQL queries
  • Analyzes trends across millions of records
  • Generates visualizations and insights
  • Creates automated data reports
  • Identifies anomalies and opportunities

Time required: 60-75 minutes

Prerequisites

Before starting, you need:

  • A TeamDay account with an organization
  • A Google Cloud project with BigQuery enabled
  • BigQuery datasets containing data
  • An Anthropic API key
  • Basic SQL knowledge

Architecture overview

graph LR
    A[User Request] --> B[Data Agent]
    B --> C[BigQuery MCP]
    C --> D[BigQuery API]
    D --> E[Your Datasets]
    B --> F[SQL Generation]
    B --> G[Data Analysis]
    G --> H[Insights Agent]
    H --> I[Recommendations]
    B --> J[Visualization Suggestions]

Use cases

This example is perfect for:

  • Product Analytics: User behavior, feature adoption, retention
  • Business Intelligence: Revenue analysis, customer segmentation
  • Marketing Analytics: Campaign performance, attribution analysis
  • Operations: System metrics, performance monitoring
  • Research: Exploratory data analysis, hypothesis testing

Step 1: Google Cloud setup

1.1 Create a Google Cloud project

  1. Go to the Google Cloud Console
  2. Create a project: “TeamDay BigQuery Analytics”
  3. Note the Project ID: teamday-bq-analytics-12345

1.2 Enable the BigQuery API

# With the gcloud CLI
gcloud config set project teamday-bq-analytics-12345
gcloud services enable bigquery.googleapis.com
gcloud services enable bigquerystorage.googleapis.com

# Or via the Console:
# APIs & Services → Enable APIs → "BigQuery API"

1.3 Create a service account

Create a service account for API access:

# Create the service account
gcloud iam service-accounts create teamday-bigquery \
  --display-name="TeamDay BigQuery Agent" \
  --description="Service account for TeamDay BigQuery access"

# Get the service account email
SA_EMAIL="teamday-bigquery@teamday-bq-analytics-12345.iam.gserviceaccount.com"

# Grant BigQuery permissions
gcloud projects add-iam-policy-binding teamday-bq-analytics-12345 \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.dataViewer"

gcloud projects add-iam-policy-binding teamday-bq-analytics-12345 \
  --member="serviceAccount:$SA_EMAIL" \
  --role="roles/bigquery.jobUser"

# Create and download the key
gcloud iam service-accounts keys create ~/teamday-bigquery-key.json \
  --iam-account=$SA_EMAIL

Service account roles:

  • bigquery.dataViewer - Read dataset contents
  • bigquery.jobUser - Run queries
  • bigquery.dataEditor - (Optional) Write data

1.4 Verify the service account

Test the service account:

# Install the BigQuery Python client library
pip install google-cloud-bigquery

# Test a query
cat > test-bq.py << 'EOF'
from google.cloud import bigquery
import os

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.expanduser('~/teamday-bigquery-key.json')
client = bigquery.Client()

query = "SELECT COUNT(*) as total FROM `bigquery-public-data.usa_names.usa_1910_current`"
results = client.query(query).result()

for row in results:
    print(f"Total records: {row.total}")
EOF

python test-bq.py
# Should print something like: Total records: 6122890

1.5 Create or identify datasets

For this example, we'll use:

  • Public dataset (for learning): bigquery-public-data.usa_names.usa_1910_current
  • Your own dataset: Create one for your data

Create a dataset:

bq mk --dataset \
  --location=US \
  --description="TeamDay Analytics Data" \
  teamday-bq-analytics-12345:analytics

# Create a sample table
bq query --use_legacy_sql=false "
CREATE TABLE analytics.user_events AS
SELECT
  GENERATE_UUID() as event_id,
  TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL CAST(RAND()*30 AS INT64) DAY) as timestamp,
  CONCAT('user_', CAST(CAST(RAND()*1000 AS INT64) AS STRING)) as user_id,
  ['page_view', 'click', 'signup', 'purchase'][OFFSET(CAST(FLOOR(RAND()*4) AS INT64))] as event_type,
  RAND()*100 as value
FROM UNNEST(GENERATE_ARRAY(1, 10000))
"

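You can sanity-check the generated table with the Python client (a quick sketch reusing the key file from step 1.3; adjust the path if you stored it elsewhere):

# Count generated events per type
import os
from google.cloud import bigquery

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.expanduser('~/teamday-bigquery-key.json')
client = bigquery.Client(project='teamday-bq-analytics-12345')

query = """
SELECT event_type, COUNT(*) AS events
FROM `teamday-bq-analytics-12345.analytics.user_events`
GROUP BY event_type
ORDER BY events DESC
"""
for row in client.query(query).result():
    print(f"{row.event_type}: {row.events}")
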
Step 2: TeamDay setup

2.1 Store the service account credentials

Store the service account JSON in TeamDay:

Via CLI:

# Base64-encode the service account key file
cat ~/teamday-bigquery-key.json | base64 > ~/sa-encoded.txt

# Store in TeamDay
teamday secrets set GCP_SERVICE_ACCOUNT_JSON "$(cat ~/sa-encoded.txt)"
teamday secrets set GCP_PROJECT_ID "teamday-bq-analytics-12345"
teamday secrets set BQ_DATASET "analytics"

Or store the file path:

# Upload the file to the space (more common approach)
# We'll do this when configuring the space

2.2 Add the Anthropic API key

teamday secrets set ANTHROPIC_API_KEY "sk-ant-api03-..."

2.3 Create a Space

teamday spaces create "BigQuery Analytics"

# Get the space ID
teamday spaces list
# Output: cs_bigquery123...

Step 3: Configure the BigQuery MCP

3.1 Upload the service account to the Space

Upload the service account JSON to your space:

Via the UI:

  1. Open the space
  2. Files panel → New File
  3. Name: gcp-service-account.json
  4. Paste the contents of teamday-bigquery-key.json
  5. Save

Via the API:

curl -X POST https://us.teamday.ai/api/spaces/cs_bigquery123/files/write \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"path\": \"/gcp-service-account.json\",
    \"content\": $(cat ~/teamday-bigquery-key.json | jq -R -s '.')
  }"

3.2 MCP server configuration

Configure the BigQuery MCP server:

{
  "mcpServers": {
    "bigquery": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-bigquery"],
      "env": {
        "GOOGLE_APPLICATION_CREDENTIALS": "/workspace/gcp-service-account.json",
        "GCP_PROJECT_ID": "${GCP_PROJECT_ID}"
      }
    }
  }
}

3.3 Install the MCP plugin

Via the UI:

  1. Space Settings → MCP Plugins
  2. Click “Install Plugin”
  3. Search for “BigQuery”
  4. The configuration is auto-filled
  5. Click “Install & Test”

Via the API:

curl -X POST https://us.teamday.ai/api/plugins/install \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "spaceId": "cs_bigquery123",
    "plugin": "bigquery",
    "config": {
      "mcpServers": {
        "bigquery": {
          "command": "npx",
          "args": ["-y", "@modelcontextprotocol/server-bigquery"],
          "env": {
            "GOOGLE_APPLICATION_CREDENTIALS": "/workspace/gcp-service-account.json",
            "GCP_PROJECT_ID": "teamday-bq-analytics-12345"
          }
        }
      }
    }
  }'

3.4 Verify the installation

Test the BigQuery connection:

You: Test the BigQuery connection

Agent: Testing BigQuery MCP...

✅ Authentication successful
✅ Connected to project: teamday-bq-analytics-12345
✅ Available datasets: 3
   - analytics
   - logs
   - user_data

✅ Test query successful
✅ Sample data retrieved

Ready to query BigQuery!
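
If the agent reports a connection or authentication problem, you can reproduce the same check outside TeamDay with the Python client (a minimal sketch using the key file from step 1.3):

# List the datasets the service account can see
import os
from google.cloud import bigquery

os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.expanduser('~/teamday-bigquery-key.json')
client = bigquery.Client(project='teamday-bq-analytics-12345')

for dataset in client.list_datasets():
    print(dataset.dataset_id)  # expect to see analytics, logs, user_data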

Step 4: Create the data analysis agent

4.1 Create the main agent

curl -X POST https://us.teamday.ai/api/v1/agents \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "BigQuery Data Analyst",
    "model": "claude-sonnet-4-6",
    "visibility": "organization",
    "organizationId": "org_xxx"
  }'

Save the agent ID: ag_bq_analyst789

4.2 System Prompt

# BigQuery Data Analyst System Prompt

You are an expert data analyst with deep knowledge of SQL, BigQuery, and data analysis.

## Your Capabilities

1. **SQL Query Generation**
   - Write efficient BigQuery SQL queries
   - Use BigQuery-specific functions and optimizations
   - Handle large datasets (partitioning, clustering)
   - Optimize query performance and cost

2. **Data Analysis**
   - Analyze query results for trends and patterns
   - Calculate statistics and aggregations
   - Identify anomalies and outliers
   - Perform comparative analysis

3. **Insights Generation**
   - Explain what the data shows
   - Provide business context
   - Make data-driven recommendations
   - Highlight opportunities and risks

4. **Visualization Guidance**
   - Suggest appropriate chart types
   - Describe visualization layouts
   - Recommend data transformations for viz

## BigQuery Best Practices

**Query Optimization:**
- Use partitioned tables when filtering by date
- Select only needed columns (avoid SELECT *)
- Use approximate aggregation functions when appropriate
- Leverage table clustering for common filters
- Use LIMIT for exploratory queries

**Cost Control:**
- Check query cost before running large queries
- Use cached results when possible
- Partition and cluster large tables
- Use materialized views for repeated queries

**Performance:**
- Use ARRAY_AGG and STRUCT for denormalization
- Leverage window functions for complex calculations
- Use WITH clauses for query readability
- Avoid self-JOINs when possible

## Available Datasets

### analytics.user_events
- event_id (STRING)
- timestamp (TIMESTAMP)
- user_id (STRING)
- event_type (STRING): page_view, click, signup, purchase
- value (FLOAT64)

### analytics.users
- user_id (STRING)
- created_at (TIMESTAMP)
- country (STRING)
- plan (STRING): free, pro, enterprise

### logs.application_logs
- timestamp (TIMESTAMP)
- severity (STRING)
- message (STRING)
- user_id (STRING)
- request_id (STRING)

## Analysis Framework

For each request:
1. **Understand the question** - Clarify if needed
2. **Write SQL query** - Explain your approach
3. **Execute query** - Use BigQuery MCP
4. **Analyze results** - Look for patterns
5. **Generate insights** - Explain findings
6. **Make recommendations** - Actionable next steps

## Response Format

Structure your analysis as:

### 📊 [Analysis Title]

**Question:** [What we're analyzing]

**Query:**
```sql
[Your SQL query]
```

Results: [Formatted table or key findings]

Insights:

  • [Key finding 1]
  • [Key finding 2]
  • [Key finding 3]

Recommendations:

  • [Actionable recommendation 1]
  • [Actionable recommendation 2]

Suggested Visualization: [Description of chart/graph to create]

Be precise with numbers but explain insights in plain language.


4.3 Set the agent permissions

curl -X PATCH https://us.teamday.ai/api/v1/agents/ag_bq_analyst789 \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "permissions": {
      "readFiles": true,
      "writeFiles": true,
      "executeCode": true,
      "commitToGit": false,
      "accessSecrets": false
    }
  }'

Step 5: Install additional MCP plugins

5.1 Sequential Thinking plugin

For complex analyses:

{
  "mcpServers": {
    "sequential-thinking": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-sequential-thinking"]
    }
  }
}

5.2 Filesystem plugin (optional)

To save query results:

{
  "mcpServers": {
    "filesystem": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-filesystem"],
      "env": {
        "ALLOWED_PATHS": "/workspace/reports"
      }
    }
  }
}

Step 6: Usage examples

6.1 Basic query

You: How many total events do we have?

Agent: I'll query the user_events table to get the total count.

Query:

SELECT COUNT(*) as total_events
FROM `teamday-bq-analytics-12345.analytics.user_events`

Results:

  • Total Events: 10,000

Context: Events span from November 9 to December 9, 2024 (30 days).


6.2 User behavior analysis

You: Analyze user engagement by event type over the last 7 days

Agent: I'll analyze event distribution and trends.

Query:

SELECT
  DATE(timestamp) as date,
  event_type,
  COUNT(*) as event_count,
  COUNT(DISTINCT user_id) as unique_users,
  ROUND(AVG(value), 2) as avg_value
FROM `teamday-bq-analytics-12345.analytics.user_events`
WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
GROUP BY date, event_type
ORDER BY date DESC, event_count DESC

Results:

| Date | Event Type | Events | Unique Users | Avg Value |
|------|------------|--------|--------------|-----------|
| 2024-12-09 | page_view | 89 | 67 | 48.23 |
| 2024-12-09 | click | 76 | 58 | 52.11 |
| 2024-12-09 | signup | 12 | 12 | 45.67 |
| 2024-12-09 | purchase | 8 | 8 | 78.92 |
| 2024-12-08 | page_view | 92 | 71 | 51.34 |

Insights:

Conversion funnel is healthy

  • Page views → Clicks: 85% engagement rate
  • Clicks → Signups: 15.8% conversion
  • Signups → Purchases: 66.7% conversion (excellent!)

📈 Purchase behavior

  • Average purchase value: $78.92 (highest of all events)
  • 8 purchases from 12 signups shows strong product-market fit

⚠️ Traffic consistency

  • Daily events relatively stable (89-95 events/day)
  • Could indicate growth ceiling or steady state

Recommendations:

  1. Focus on top-of-funnel growth - Traffic is stable, push acquisition
  2. Investigate high purchase conversion - What makes 67% of signups purchase?
  3. Optimize page views - With 85% click-through, content is engaging

Suggested Visualization: Stacked area chart showing event types over time, with separate line for conversion rate.


6.3 Cohort analysis

You: Perform a cohort analysis of user retention

Agent: I'll analyze user retention by signup cohort.

Query:

WITH user_cohorts AS (
  -- Get each user's first event week as their cohort
  SELECT
    user_id,
    DATE_TRUNC(DATE(MIN(timestamp)), WEEK) as cohort_week
  FROM `teamday-bq-analytics-12345.analytics.user_events`
  GROUP BY user_id
),
cohort_sizes AS (
  -- Total users per cohort (denominator for retention)
  SELECT
    cohort_week,
    COUNT(*) as cohort_size
  FROM user_cohorts
  GROUP BY cohort_week
),
user_activity AS (
  -- Weeks in which each user was active
  SELECT
    user_id,
    DATE_TRUNC(DATE(timestamp), WEEK) as activity_week
  FROM `teamday-bq-analytics-12345.analytics.user_events`
  GROUP BY user_id, activity_week
)
SELECT
  uc.cohort_week,
  cs.cohort_size,
  DATE_DIFF(ua.activity_week, uc.cohort_week, WEEK) as weeks_since_signup,
  COUNT(DISTINCT ua.user_id) as active_users,
  ROUND(COUNT(DISTINCT ua.user_id) * 100.0 / cs.cohort_size, 1) as retention_pct
FROM user_cohorts uc
JOIN cohort_sizes cs ON cs.cohort_week = uc.cohort_week
JOIN user_activity ua ON ua.user_id = uc.user_id
WHERE uc.cohort_week >= DATE '2024-11-01'
GROUP BY uc.cohort_week, cs.cohort_size, weeks_since_signup
ORDER BY uc.cohort_week, weeks_since_signup

Results:

| Cohort | Size | Week 0 | Week 1 | Week 2 | Week 3 | Week 4 |
|--------|------|--------|--------|--------|--------|--------|
| Nov 4  | 178  | 100%   | 56.2%  | 38.8%  | 29.2%  | 24.7%  |
| Nov 11 | 195  | 100%   | 58.5%  | 41.0%  | 31.3%  | -      |
| Nov 18 | 203  | 100%   | 59.1%  | 43.3%  | -      | -      |
| Nov 25 | 189  | 100%   | 61.4%  | -      | -      | -      |
| Dec 2  | 201  | 100%   | -      | -      | -      | -      |

Insights:

📊 Week 1 retention: 56-61% (improving trend)

  • Most recent measurable cohort (Nov 25) shows 61.4% Week 1 retention
  • Roughly a 5-point improvement across the November cohorts

📉 Week 2 drop-off: 15-18%

  • Consistent 40-43% Week 2 retention
  • Major drop happens between Week 1-2

🎯 Week 4 stabilization: ~25%

  • Users who make it to Week 4 tend to stick
  • 25% core user base

Recommendations:

  1. Focus on Week 1-2 transition - Biggest drop-off point

    • Implement Week 1 onboarding email sequence
    • Add “quick win” features for new users
    • Trigger re-engagement at Day 10
  2. Investigate improving cohorts - What changed?

    • Review product changes in late November
    • Survey recent users about onboarding experience
  3. Nurture Week 4+ users - These are your champions

    • Consider referral program for this segment
    • Gather feedback for product roadmap

Suggested Visualization: Cohort retention heatmap with color gradient (green = high retention, red = low).
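
If you want to render the suggested heatmap yourself, here is a minimal matplotlib sketch (matplotlib and numpy are assumed to be installed; the values are copied from the table above):

# Cohort retention heatmap (green = high retention, red = low)
import matplotlib.pyplot as plt
import numpy as np

cohorts = ["Nov 4", "Nov 11", "Nov 18", "Nov 25", "Dec 2"]
retention = np.array([
    [100, 56.2, 38.8, 29.2, 24.7],
    [100, 58.5, 41.0, 31.3, np.nan],
    [100, 59.1, 43.3, np.nan, np.nan],
    [100, 61.4, np.nan, np.nan, np.nan],
    [100, np.nan, np.nan, np.nan, np.nan],
])

fig, ax = plt.subplots()
im = ax.imshow(retention, cmap="RdYlGn", vmin=0, vmax=100)
ax.set_xticks(range(5), labels=[f"Week {i}" for i in range(5)])
ax.set_yticks(range(5), labels=cohorts)
fig.colorbar(im, label="Retention %")
plt.savefig("cohort_retention.png", dpi=150)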


6.4 Revenue analysis

You: Analyze revenue trends and identify our best customers

Agent: I'll analyze purchase events and customer value.

Query:

WITH customer_metrics AS (
  SELECT
    user_id,
    COUNT(*) as total_purchases,
    SUM(value) as total_revenue,
    AVG(value) as avg_order_value,
    MIN(timestamp) as first_purchase,
    MAX(timestamp) as last_purchase,
    TIMESTAMP_DIFF(CURRENT_TIMESTAMP(), MAX(timestamp), DAY) as days_since_last_purchase
  FROM `teamday-bq-analytics-12345.analytics.user_events`
  WHERE event_type = 'purchase'
  GROUP BY user_id
)
SELECT
  -- Customer segmentation
  CASE
    WHEN total_revenue >= 500 THEN 'VIP'
    WHEN total_revenue >= 200 THEN 'High Value'
    WHEN total_revenue >= 100 THEN 'Medium Value'
    ELSE 'Low Value'
  END as customer_segment,
  COUNT(*) as customer_count,
  SUM(total_revenue) as segment_revenue,
  ROUND(AVG(total_revenue), 2) as avg_customer_value,
  ROUND(AVG(total_purchases), 1) as avg_purchases_per_customer,
  ROUND(AVG(avg_order_value), 2) as avg_order_value,
  ROUND(AVG(days_since_last_purchase), 1) as avg_days_since_purchase
FROM customer_metrics
GROUP BY customer_segment
ORDER BY segment_revenue DESC

Results:

| Segment | Customers | Revenue | Avg Customer Value | Avg Purchases | AOV | Days Since Purchase |
|---------|-----------|---------|--------------------|---------------|-----|---------------------|
| VIP | 8 | $4,234 | $529.25 | 6.8 | $77.87 | 3.2 |
| High Value | 23 | $5,892 | $256.17 | 3.2 | $80.05 | 5.1 |
| Medium Value | 67 | $8,123 | $121.24 | 1.5 | $80.83 | 8.7 |
| Low Value | 156 | $7,234 | $46.37 | 1.1 | $42.15 | 12.3 |

Total Revenue: $25,483
Total Customers: 254

Insights:

💎 VIP customers are gold

  • 3% of customers generate 17% of revenue
  • High purchase frequency (6.8x vs 1.1x for low value)
  • Recent activity (3.2 days) shows strong engagement

📈 Pareto principle in action

  • Top 12% (VIP + High Value) = 40% of revenue
  • Focus here has 3x impact

⚠️ Low value segment at risk

  • 12.3 days since purchase (approaching churn)
  • Low order value suggests price sensitivity
  • Large segment (156 customers) - opportunity if activated

🎯 Average order value consistent (~$80)

  • Except low value segment ($42) - different behavior
  • Price point works for engaged customers

Recommendations:

  1. VIP retention program

    • Personal account manager
    • Exclusive features/early access
    • Loyalty rewards (already buying frequently)
  2. Upgrade medium → high value

    • Cross-sell opportunities
    • Bundle offers to increase AOV
    • Usage-based triggers for upgrades
  3. Re-engage low value segment

    • Win-back campaign for 10+ days inactive
    • Entry-level product at $40-50 price point
    • Time-limited discount to drive action
  4. Optimize for $80 price point

    • Sweet spot for most customers
    • Package products around this value
    • Upsell accessories to reach $80+ cart

Suggested Visualization:

  1. Revenue pyramid by segment
  2. Customer lifetime value distribution
  3. Days since purchase histogram by segment

6.5 Anomaly detection

You: Check for any anomalies in the last 24 hours of data

Agent: I'll analyze recent data for unusual patterns.

Query:

WITH hourly_metrics AS (
  SELECT
    TIMESTAMP_TRUNC(timestamp, HOUR) as hour,
    event_type,
    COUNT(*) as event_count,
    COUNT(DISTINCT user_id) as unique_users
  FROM `teamday-bq-analytics-12345.analytics.user_events`
  WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
  GROUP BY hour, event_type
),
historical_avg AS (
  SELECT
    EXTRACT(HOUR FROM hour) as hour_of_day,
    event_type,
    AVG(hourly_count) as avg_count,
    STDDEV(hourly_count) as stddev_count
  FROM (
    SELECT
      TIMESTAMP_TRUNC(timestamp, HOUR) as hour,
      event_type,
      COUNT(*) as hourly_count
    FROM `teamday-bq-analytics-12345.analytics.user_events`
    WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 7 DAY)
      AND timestamp < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 24 HOUR)
    GROUP BY hour, event_type
  )
  GROUP BY hour_of_day, event_type
)
SELECT
  hm.hour,
  hm.event_type,
  hm.event_count,
  ha.avg_count as expected_count,
  ROUND((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0), 2) as z_score,
  CASE
    WHEN ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 2 THEN '🚨 ANOMALY'
    WHEN ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 1.5 THEN '⚠️ WARNING'
    ELSE '✅ NORMAL'
  END as status
FROM hourly_metrics hm
JOIN historical_avg ha
  ON EXTRACT(HOUR FROM hm.hour) = ha.hour_of_day
  AND hm.event_type = ha.event_type
WHERE ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) > 1.5
ORDER BY ABS((hm.event_count - ha.avg_count) / NULLIF(ha.stddev_count, 0)) DESC

Results:

| Hour | Event Type | Actual | Expected | Z-Score | Status |
|------|------------|--------|----------|---------|--------|
| 2024-12-09 14:00 | purchase | 28 | 12.3 | 3.2 | 🚨 ANOMALY |
| 2024-12-09 14:00 | signup | 45 | 23.1 | 2.8 | 🚨 ANOMALY |
| 2024-12-09 08:00 | page_view | 156 | 98.2 | 1.9 | ⚠️ WARNING |
| 2024-12-09 03:00 | click | 12 | 34.5 | -1.7 | ⚠️ WARNING |

Anomalies Detected:

🚨 CRITICAL: Purchase spike at 2 PM (z-score: 3.2)

  • Actual: 28 purchases (227% of expected)
  • This is highly unusual (>3 standard deviations)
  • Possible causes:
    • Marketing campaign launched?
    • Product hunt/HN feature?
    • Flash sale or promotion?
    • Pricing error (check immediately!)

🚨 CRITICAL: Signup spike at 2 PM (z-score: 2.8)

  • Actual: 45 signups (195% of expected)
  • Correlates with purchase spike
  • Good news: Conversion funnel working

⚠️ WARNING: Morning traffic higher (z-score: 1.9)

  • 156 page views vs expected 98
  • 60% increase - significant but less concerning
  • Possible causes:
    • Weekend traffic pattern?
    • Newsletter sent?
    • Viral content?

⚠️ WARNING: Late night drop (z-score: -1.7)

  • 12 clicks vs expected 34
  • 65% decrease
  • Less concerning: Normal variance in low-traffic hours

Recommendations:

  1. IMMEDIATE: Investigate 2 PM spike

    • Check for pricing errors
    • Review payment processing
    • Verify fraud detection
    • Confirm inventory/capacity
  2. Capture opportunity

    • If legitimate, what drove it?
    • Can we replicate this?
    • Document the source
  3. Monitor next 24h

    • Watch for sustained increase
    • Or regression to mean
    • Set up alerts for z-score > 2 (see the sketch after this example)

Suggested Actions:

-- Check traffic sources during spike
SELECT
  utm_source,
  utm_medium,
  utm_campaign,
  COUNT(*) as conversions
FROM analytics.user_events
WHERE event_type = 'purchase'
  AND timestamp BETWEEN '2024-12-09 14:00:00' AND '2024-12-09 15:00:00'
GROUP BY 1,2,3
ORDER BY conversions DESC
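
To act on the alerting recommendation above, here is a minimal sketch (it assumes a query, like the anomaly SQL in this example, that returns hour, event_type, and z_score columns; wire the result into whatever notification channel you use):

# Flag rows whose |z-score| exceeds a threshold
from google.cloud import bigquery

def check_anomalies(anomaly_sql: str, threshold: float = 2.0):
    client = bigquery.Client(project="teamday-bq-analytics-12345")
    alerts = []
    for row in client.query(anomaly_sql).result():
        if row.z_score is not None and abs(row.z_score) > threshold:
            alerts.append((row.hour, row.event_type, row.z_score))
    return alerts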

Step 7: Advanced analysis patterns

7.1 Funnel analysis

Create a reusable funnel analysis:

WITH funnel_steps AS (
  SELECT
    user_id,
    MAX(CASE WHEN event_type = 'page_view' THEN 1 ELSE 0 END) as step_1_view,
    MAX(CASE WHEN event_type = 'click' THEN 1 ELSE 0 END) as step_2_click,
    MAX(CASE WHEN event_type = 'signup' THEN 1 ELSE 0 END) as step_3_signup,
    MAX(CASE WHEN event_type = 'purchase' THEN 1 ELSE 0 END) as step_4_purchase
  FROM `teamday-bq-analytics-12345.analytics.user_events`
  WHERE timestamp >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
  GROUP BY user_id
)
SELECT
  SUM(step_1_view) as users_viewed,
  SUM(step_2_click) as users_clicked,
  SUM(step_3_signup) as users_signed_up,
  SUM(step_4_purchase) as users_purchased,
  ROUND(SUM(step_2_click) * 100.0 / NULLIF(SUM(step_1_view), 0), 1) as view_to_click_pct,
  ROUND(SUM(step_3_signup) * 100.0 / NULLIF(SUM(step_2_click), 0), 1) as click_to_signup_pct,
  ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_3_signup), 0), 1) as signup_to_purchase_pct,
  ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_1_view), 0), 1) as overall_conversion_pct
FROM funnel_steps

7.2 Predictive analysis

Identify users at risk of churning:

WITH user_features AS (
  SELECT
    user_id,
    COUNT(*) as total_events,
    COUNT(DISTINCT DATE(timestamp)) as active_days,
    DATE_DIFF(CURRENT_DATE(), MAX(DATE(timestamp)), DAY) as days_since_last_event,
    AVG(CASE WHEN event_type = 'purchase' THEN value ELSE 0 END) as avg_purchase_value,
    COUNT(CASE WHEN event_type = 'purchase' THEN 1 END) as purchase_count
  FROM `teamday-bq-analytics-12345.analytics.user_events`
  GROUP BY user_id
)
SELECT
  user_id,
  CASE
    WHEN days_since_last_event > 14 AND purchase_count > 0 THEN 'HIGH_RISK'
    WHEN days_since_last_event > 7 AND active_days < 3 THEN 'MEDIUM_RISK'
    WHEN days_since_last_event > 3 AND total_events < 5 THEN 'LOW_RISK'
    ELSE 'ACTIVE'
  END as churn_risk,
  days_since_last_event,
  total_events,
  purchase_count,
  ROUND(avg_purchase_value, 2) as avg_purchase_value
FROM user_features
WHERE days_since_last_event > 3
ORDER BY
  CASE
    WHEN days_since_last_event > 14 AND purchase_count > 0 THEN 1
    WHEN days_since_last_event > 7 AND active_days < 3 THEN 2
    ELSE 3
  END,
  days_since_last_event DESC
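
To feed results like these into the reports directory used by the filesystem plugin (step 5.2) and the daily mission (step 8.1), a small helper can save any query result as CSV (a sketch; the paths and project ID are assumptions carried over from earlier steps):

# Run a query and save the result as a CSV report
import csv
from google.cloud import bigquery

def save_query_as_csv(sql: str, path: str, project: str = "teamday-bq-analytics-12345") -> int:
    client = bigquery.Client(project=project)
    rows = client.query(sql).result()
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow([field.name for field in rows.schema])  # header from the result schema
        count = 0
        for row in rows:
            writer.writerow(list(row.values()))
            count += 1
    return count

# Example: save_query_as_csv(churn_sql, "/workspace/reports/churn-risk.csv")
# where churn_sql holds the churn-risk query above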

Step 8: Automation and scheduling

8.1 Daily insights mission

curl -X POST https://us.teamday.ai/api/missions \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Daily BigQuery Insights",
    "schedule": "0 8 * * *",
    "agentId": "ag_bq_analyst789",
    "action": {
      "type": "execute",
      "prompt": "Analyze yesterday'\''s data: 1) Overall metrics 2) Top products 3) Any anomalies 4) Recommendations. Save report to /workspace/reports/daily-$(date +%Y-%m-%d).md"
    },
    "organizationId": "org_xxx",
    "spaceId": "cs_bigquery123"
  }'

8.2 Weekly deep-dive analysis

curl -X POST https://us.teamday.ai/api/missions \
  -H "Authorization: Bearer $TEAMDAY_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Weekly Data Analysis",
    "schedule": "0 9 * * 1",
    "agentId": "ag_bq_analyst789",
    "action": {
      "type": "execute",
      "prompt": "Comprehensive weekly analysis: 1) Cohort retention 2) Revenue trends 3) User segmentation 4) Funnel performance 5) Key insights & recommendations"
    },
    "organizationId": "org_xxx",
    "spaceId": "cs_bigquery123"
  }'

Troubleshooting

Authentication failure

Error:

Error: BigQuery authentication failed (403 Forbidden)

Solution:

  1. Verify the service account JSON is valid
  2. Verify the service account has the required roles
  3. Make sure the BigQuery API is enabled
  4. Verify the project ID is correct
  5. Check the file path in the MCP config

Query timeout

Error:

Error: Query exceeded time limit

Solution:

  1. Add a LIMIT clause for testing
  2. Use table partitioning
  3. Optimize JOIN operations
  4. Check the query cost estimate
  5. Split into smaller queries

No data returned

Error:

Error: Query returned no results

Solution:

  1. Check dataset and table names
  2. Check the date filters
  3. Confirm data exists in the table
  4. Review WHERE conditions
  5. Test with SELECT COUNT(*)

Cost concerns

Problem: High BigQuery costs

Solution:

  1. Use query cost estimation (see the sketch below)
  2. Enable table partitioning
  3. Use cached results
  4. Set a maximum bytes billed
  5. Review query patterns
  6. Use LIMIT during development
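
For items 1 and 4, the Python client supports dry runs and a hard cap on billed bytes. A minimal sketch (assuming the credentials from step 1.3):

# Estimate cost with a dry run, then cap billed bytes on the real run
from google.cloud import bigquery

client = bigquery.Client(project="teamday-bq-analytics-12345")
sql = "SELECT event_type, COUNT(*) AS n FROM `teamday-bq-analytics-12345.analytics.user_events` GROUP BY event_type"

# Dry run: reports the bytes the query would scan without running it
dry = client.query(sql, job_config=bigquery.QueryJobConfig(dry_run=True, use_query_cache=False))
print(f"Would scan {dry.total_bytes_processed / 1e6:.1f} MB "
      f"(~${dry.total_bytes_processed / 1e12 * 5:.5f} at $5/TB)")

# Real run: fails fast if the query would bill more than 100 MB
capped = bigquery.QueryJobConfig(maximum_bytes_billed=100 * 1024 ** 2)
results = client.query(sql, job_config=capped).result()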

Cost estimates

BigQuery costs:

  • Queries: $5 per TB scanned
  • Storage: $0.02 per GB/month
  • Typical query: 10-100 MB = $0.00005-$0.0005

TeamDay agent:

  • Analysis with SQL: ~15,000 input tokens, ~4,000 output tokens
  • Cost per analysis: ~$0.18
  • Monthly (daily reports): ~$5.40

Total: ~$6-10/month for daily analytics

Next steps

Resources