Lab 079: Deep Research Agents - Multi-Step Knowledge Synthesis
What You'll Learn
- How Deep Research Agents use a multi-agent pipeline for knowledge synthesis
- The Planner → Researcher → Writer → Reviewer architecture and each role's responsibilities
- How citation tracking ensures every claim maps back to a source
- How to analyze a 14-step research trace with agent roles, token usage, and quality scores
- How to identify bottlenecks, token distribution, and quality patterns across the pipeline
Introduction
Deep Research Agents implement a multi-step pipeline for producing well-sourced, comprehensive research reports. Instead of a single LLM generating an entire report, the work is divided across four specialized agents.
The Pipeline
```
┌──────────┐     ┌────────────┐     ┌──────────┐     ┌──────────┐
│ Planner  │────►│ Researcher │────►│  Writer  │────►│ Reviewer │
└──────────┘     └────────────┘     └──────────┘     └──────────┘
     │                 │                 │                │
Decomposes       Gathers info       Synthesizes      Reviews &
query into       from sources       findings into    provides
sub-questions    with citations     prose report     feedback
```
| Agent | Role | Key Output |
|---|---|---|
| Planner | Decomposes the research question into sub-questions and creates a research plan | Sub-questions, search strategy |
| Researcher | Executes searches, reads sources, extracts key findings with citations | Findings with source citations |
| Writer | Synthesizes findings into a coherent, well-structured report | Draft report with inline citations |
| Reviewer | Reviews the draft for accuracy, completeness, and citation quality | Feedback, quality score, approval/revision |
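The table above describes how work is handed from one agent to the next. The following is a minimal, self-contained Python sketch of that four-stage flow. The agent functions are hypothetical stubs that return canned data instead of calling an LLM or a search API. The `Finding` and `Review` types are assumptions made for illustration and are not part of the lab's files.

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    """Hypothetical unit of research output: one claim plus its supporting source."""
    claim: str
    source: str


@dataclass
class Review:
    """Hypothetical Reviewer verdict."""
    score: float
    approved: bool
    feedback: list[str] = field(default_factory=list)


def planner(question: str) -> list[str]:
    # Stub: a real Planner would ask an LLM to decompose the question.
    return [f"{question} (background)", f"{question} (current state)"]


def researcher(sub_question: str) -> list[Finding]:
    # Stub: a real Researcher would search, read sources, and extract findings.
    return [Finding(claim=f"Finding about {sub_question}", source=f"src:{sub_question}")]


def writer(findings: list[Finding]) -> str:
    # Synthesize findings into prose, keeping an inline citation for every claim.
    return "\n".join(f"{f.claim} [{f.source}]" for f in findings)


def reviewer(draft: str, findings: list[Finding]) -> Review:
    # Approve only if every source gathered by the Researcher appears in the draft.
    missing = [f.source for f in findings if f.source not in draft]
    score = 1.0 - len(missing) / max(len(findings), 1)
    return Review(score=score, approved=not missing,
                  feedback=[f"missing citation: {m}" for m in missing])


def run_pipeline(question: str) -> tuple[str, Review]:
    sub_questions = planner(question)
    findings = [f for sq in sub_questions for f in researcher(sq)]
    draft = writer(findings)
    return draft, reviewer(draft, findings)


draft, review = run_pipeline("How do deep research agents work?")
print(draft)
print(review.approved, review.score)
```

In a real system, each stub would wrap a model or tool call. A Reviewer that rejects the draft would send its feedback back to the Writer for another revision pass.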
Citation Tracking
Every claim in the final report must trace back to a source. The pipeline tracks:
- sources_cited: Number of unique sources cited in each step
- quality_score: The agent's self-assessed quality of the output (0.0–1.0)
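The rule that "every claim must trace back to a source" can be checked in a few lines of standard-library Python. The claim and source structures below are hypothetical, including the IDs such as S1 and the cites field. The lab's trace itself records only per-step counts. The sketch also shows why unique sources and total citations differ: a source cited by two claims counts once in the unique set but twice in the total.

```python
# Hypothetical source registry built up by the Researcher.
sources = {
    "S1": "https://example.org/paper-a",
    "S2": "https://example.org/report-b",
}

# Hypothetical claims from a draft, each listing the source IDs that support it.
claims = [
    {"text": "Claim one", "cites": ["S1"]},
    {"text": "Claim two", "cites": ["S1", "S2"]},
    {"text": "Claim three", "cites": []},
]


def audit_citations(claims: list[dict], sources: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (claims with no citation, cited IDs missing from the registry)."""
    uncited = [c["text"] for c in claims if not c["cites"]]
    dangling = sorted({sid for c in claims for sid in c["cites"] if sid not in sources})
    return uncited, dangling


uncited, dangling = audit_citations(claims, sources)
unique_sources = {sid for c in claims for sid in c["cites"]}
total_citations = sum(len(c["cites"]) for c in claims)

print("Uncited claims:", uncited)              # ['Claim three']
print("Unknown source IDs:", dangling)         # []
print("Unique sources:", len(unique_sources))  # 2
print("Total citations:", total_citations)     # 3
```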
The Scenario
You are a Research Team Lead evaluating a deep research agent system. You have a 14-step research trace (research_trace.csv) from a completed research run. Your job: analyze the trace to understand agent behavior, token usage, quality patterns, and identify optimization opportunities.
Mock Data
This lab uses a mock research trace CSV. The data represents a realistic deep research run with 14 steps across 4 agent roles, including planning, multi-source research, writing, and iterative review.
Prerequisites
| Requirement | Why |
|---|---|
| Python 3.10+ | Run the analysis scripts |
| pandas library | Data manipulation |
📦 Supporting Files
Download these files before starting the lab
Save all files to a lab-079/ folder in your working directory.
| File | Description | Download |
|---|---|---|
| broken_research.py | Bug-fix exercise (3 bugs + self-tests) | 📥 Download |
| research_trace.csv | 14-step research trace with agent roles, tokens, and quality scores | 📥 Download |
Step 1: Understand the Trace Format
Each row in the trace represents one step in the research pipeline:
| Column | Description |
|---|---|
| step_id | Sequential step number (1–14) |
| agent_role | Which agent executed this step: planner, researcher, writer, or reviewer |
| action | What the agent did (e.g., decompose_query, search_sources, write_section) |
| tokens_used | Number of tokens consumed in this step |
| sources_cited | Number of sources cited in this step's output |
| quality_score | Quality assessment of this step's output (0.0–1.0) |
| duration_sec | Time taken for this step, in seconds |
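Before you analyze the trace, it can help to check that it matches the column table above. The following pandas sketch is one way to do that. The specific checks are assumptions drawn from the column descriptions: step_id is sequential, roles are known, quality_score falls within 0.0–1.0, and counts are non-negative. The two sample rows are made up for illustration and are not taken from research_trace.csv.

```python
import io

import pandas as pd

EXPECTED_COLUMNS = ["step_id", "agent_role", "action", "tokens_used",
                    "sources_cited", "quality_score", "duration_sec"]
VALID_ROLES = {"planner", "researcher", "writer", "reviewer"}


def validate_trace(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found in a research trace (empty if none)."""
    missing = [c for c in EXPECTED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if df["step_id"].tolist() != list(range(1, len(df) + 1)):
        problems.append("step_id is not sequential starting at 1")
    unknown = set(df["agent_role"]) - VALID_ROLES
    if unknown:
        problems.append(f"unknown agent roles: {sorted(unknown)}")
    if not df["quality_score"].between(0.0, 1.0).all():
        problems.append("quality_score outside 0.0-1.0")
    for col in ["tokens_used", "sources_cited", "duration_sec"]:
        if (df[col] < 0).any():
            problems.append(f"negative values in {col}")
    return problems


# Two made-up rows for illustration only; the second has an out-of-range score.
sample = io.StringIO(
    "step_id,agent_role,action,tokens_used,sources_cited,quality_score,duration_sec\n"
    "1,planner,decompose_query,800,0,0.9,4.2\n"
    "2,researcher,search_sources,2500,3,1.2,12.0\n"
)
problems_found = validate_trace(pd.read_csv(sample))
print(problems_found)  # ['quality_score outside 0.0-1.0']
```

To check the real file, call validate_trace(pd.read_csv("lab-079/research_trace.csv")). An empty list means none of these checks found a problem.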
Step 2: Load and Explore the Trace
```python
import pandas as pd

df = pd.read_csv("lab-079/research_trace.csv")

print(f"Total steps: {len(df)}")
print(f"Agent roles: {df['agent_role'].value_counts().to_dict()}")
print(f"Total tokens: {df['tokens_used'].sum():,}")
print(f"Total sources cited: {df['sources_cited'].sum()}")
print("\nFull trace:")
print(df[["step_id", "agent_role", "action", "tokens_used", "sources_cited", "quality_score"]].to_string(index=False))
```
Expected output:
```
Total steps: 14
Agent roles: {'researcher': 6, 'writer': 4, 'reviewer': 2, 'planner': 2}
Total tokens: (varies)
Total sources cited: 10
```
Step 3: Analyze Token Usage by Agent
```python
print("Token usage by agent role:\n")
for role, group in df.groupby("agent_role"):
    total_tokens = group["tokens_used"].sum()
    avg_tokens = group["tokens_used"].mean()
    steps = len(group)
    print(f"  {role:>12s}: {total_tokens:>7,} tokens across {steps} steps (avg {avg_tokens:,.0f}/step)")
print(f"\nTotal tokens: {df['tokens_used'].sum():,}")

# Token share by agent
total_tokens = df["tokens_used"].sum()
print("\nToken distribution:")
for role, group in df.groupby("agent_role"):
    share = group["tokens_used"].sum() / total_tokens * 100
    bar = "█" * int(share / 2)  # one block per 2% of total tokens
    print(f"  {role:>12s}: {share:>5.1f}% {bar}")
```
Optimization Insight
The Researcher typically consumes the most tokens because it processes multiple sources per sub-question. To reduce costs, consider caching source extractions and limiting the number of sources per sub-question.
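The two suggestions in the insight can be combined. You can keep only the top-ranked sources and memoize the per-source extraction, so that a source shared by several sub-questions is processed only once. The following standard-library sketch uses functools.lru_cache. The names extract_findings and research are hypothetical stand-ins for whatever fetch-and-summarize call your Researcher makes, and the limit of 3 sources is an assumption.

```python
from functools import lru_cache

extraction_calls = 0  # counts uncached extractions, for illustration only


@lru_cache(maxsize=256)
def extract_findings(url: str) -> tuple[str, ...]:
    """Hypothetical extractor; a real one would fetch the page and call an LLM."""
    global extraction_calls
    extraction_calls += 1
    # Return a tuple so callers cannot mutate the cached value.
    return (f"key point from {url}",)


def research(ranked_urls: list[str], max_sources: int = 3) -> list[str]:
    """Collect findings from the top-ranked sources, reusing cached extractions."""
    findings: list[str] = []
    for url in ranked_urls[:max_sources]:
        findings.extend(extract_findings(url))
    return findings


urls = [f"https://example.org/source-{i}" for i in range(1, 6)]
research(urls)  # first sub-question: 3 extractions
research(urls)  # second sub-question with the same top sources: all cache hits

print(extraction_calls)                    # 3
print(extract_findings.cache_info().hits)  # 3
```

lru_cache keys on the exact URL string, so normalizing URLs before lookup would raise the hit rate. The cache also lives only for the current process; to reuse extractions across runs, you would need a persistent store.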
Step 4: Analyze Citation Flow
```python
print("Citation flow through the pipeline:\n")
for _, row in df.iterrows():
    cited = "📄" * row["sources_cited"] if row["sources_cited"] > 0 else "-"
    print(f"  Step {row['step_id']:>2}: [{row['agent_role']:>10s}] {row['action']:<25s} sources={row['sources_cited']} {cited}")

total_sources = df["sources_cited"].sum()
print(f"\nTotal sources cited across all steps: {total_sources}")

# Sources by agent role
print("\nSources cited by role:")
for role, group in df.groupby("agent_role"):
    print(f"  {role:>12s}: {group['sources_cited'].sum()} sources")
```
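A running total makes it easier to see where in the pipeline citations accumulate. The following sketch adds one with pandas cumsum. citation_flow is a hypothetical helper, and the four-row frame is made up for illustration. sources_cited counts unique sources within a single step, so a source cited again in a later step is counted again. The running total therefore measures citations rather than distinct sources.

```python
import pandas as pd


def citation_flow(df: pd.DataFrame) -> pd.DataFrame:
    """Add a running total of sources_cited and its share of the overall total."""
    flow = df[["step_id", "agent_role", "sources_cited"]].copy()
    flow["cumulative_sources"] = flow["sources_cited"].cumsum()
    # Assumes at least one citation overall; a zero total would divide by zero.
    flow["share_of_total"] = flow["cumulative_sources"] / flow["sources_cited"].sum()
    return flow


# Made-up four-step trace for illustration only.
toy = pd.DataFrame({
    "step_id": [1, 2, 3, 4],
    "agent_role": ["planner", "researcher", "writer", "reviewer"],
    "sources_cited": [0, 3, 1, 0],
})
flow = citation_flow(toy)
print(flow.to_string(index=False))
```

On the lab data, call citation_flow(df) after Step 2 to see which steps account for most of the citations.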
Step 5: Quality Analysis
```python
print("Quality scores by agent role:\n")
for role, group in df.groupby("agent_role"):
    avg_q = group["quality_score"].mean()
    min_q = group["quality_score"].min()
    max_q = group["quality_score"].max()
    print(f"  {role:>12s}: avg={avg_q:.2f}  min={min_q:.2f}  max={max_q:.2f}")

# Find the lowest-quality step
worst_step = df.loc[df["quality_score"].idxmin()]
print("\nLowest quality step:")
print(f"  Step {worst_step['step_id']}: [{worst_step['agent_role']}] {worst_step['action']}")
print(f"  Quality: {worst_step['quality_score']}")
print(f"  Tokens: {worst_step['tokens_used']}")

# Find the highest-quality step
best_step = df.loc[df["quality_score"].idxmax()]
print("\nHighest quality step:")
print(f"  Step {best_step['step_id']}: [{best_step['agent_role']}] {best_step['action']}")
print(f"  Quality: {best_step['quality_score']}")
```
Quality Variance
Watch for quality drops in later Researcher steps. A drop often indicates source exhaustion, meaning the agent is finding lower-quality sources for harder sub-questions. Consider adding a quality threshold that triggers a re-search with alternative queries.
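A quality threshold can start as a simple filter over the trace. The following pandas sketch flags Researcher steps whose quality_score falls below a cut-off. The value 0.7 and the helper name are assumptions, and the rows are made up. In a live pipeline, each flagged step would be the trigger for a re-search with alternative queries.

```python
import pandas as pd

QUALITY_THRESHOLD = 0.7  # hypothetical cut-off; tune it for your own pipeline


def steps_needing_research(df: pd.DataFrame, threshold: float = QUALITY_THRESHOLD) -> pd.DataFrame:
    """Return Researcher steps whose quality_score falls below the threshold."""
    mask = (df["agent_role"] == "researcher") & (df["quality_score"] < threshold)
    return df.loc[mask, ["step_id", "action", "quality_score"]]


# Made-up rows for illustration only.
toy = pd.DataFrame({
    "step_id": [2, 3, 4, 5],
    "agent_role": ["researcher", "researcher", "researcher", "writer"],
    "action": ["search_sources", "search_sources", "search_sources", "write_section"],
    "quality_score": [0.9, 0.8, 0.6, 0.5],
})
flagged = steps_needing_research(toy)
print(flagged.to_string(index=False))  # only step 4; the writer row is ignored
```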
Step 6: Build the Research Analysis Report
```python
total_sources = df["sources_cited"].sum()  # recomputed so this step does not depend on Step 4
writer_tokens = df[df["agent_role"] == "writer"]["tokens_used"].sum()
researcher_steps = len(df[df["agent_role"] == "researcher"])
total_duration = df["duration_sec"].sum()

report = f"""# 📊 Deep Research Trace Analysis

## Pipeline Summary

| Metric | Value |
|--------|-------|
| Total Steps | {len(df)} |
| Total Tokens | {df['tokens_used'].sum():,} |
| Total Sources Cited | {total_sources} |
| Total Duration | {total_duration:.0f}s ({total_duration/60:.1f} min) |
| Avg Quality | {df['quality_score'].mean():.2f} |

## Agent Breakdown

| Role | Steps | Tokens | Sources | Avg Quality |
|------|-------|--------|---------|-------------|
"""
for role in ["planner", "researcher", "writer", "reviewer"]:
    group = df[df["agent_role"] == role]
    report += f"| {role} | {len(group)} | {group['tokens_used'].sum():,} | {group['sources_cited'].sum()} | {group['quality_score'].mean():.2f} |\n"

report += f"""
## Key Findings

- **Researcher** executed {researcher_steps} steps, the most of any agent role
- **Writer** consumed {writer_tokens:,} tokens for synthesis
- **Total sources cited**: {total_sources} across the pipeline
- **Quality** {'improved' if df.iloc[-1]['quality_score'] > df.iloc[0]['quality_score'] else 'varied'} through the pipeline

## Optimization Recommendations

1. **Cache source extractions** to reduce Researcher token usage
2. **Parallelize sub-question research**, since sub-questions can be researched independently
3. **Add quality gates** between pipeline stages
4. **Limit sources per sub-question** to the top 3 most relevant
"""

print(report)
# UTF-8 so the emoji in the report can be written on any platform
with open("lab-079/research_analysis.md", "w", encoding="utf-8") as f:
    f.write(report)
print("💾 Saved to lab-079/research_analysis.md")
```
🐛 Bug-Fix Exercise
The file lab-079/broken_research.py contains 3 bugs that produce incorrect research analysis. Can you find and fix them all?
Run the script's self-tests (for example, with python lab-079/broken_research.py) to see which ones fail.
You should see 3 failed tests. Each test corresponds to one bug:
| Test | What it checks | Hint |
|---|---|---|
| Test 1 | Total sources cited | Should sum sources_cited, not count rows |
| Test 2 | Writer token count | Should filter agent_role == "writer", not "researcher" |
| Test 3 | Researcher step count | Should count rows where agent_role == "researcher", not sum tokens |
Fix all 3 bugs, then re-run the script. When you see "All passed!", you're done.
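If you get stuck, each hint in the table corresponds to a common pandas slip-up. The following sketch contrasts each wrong pattern with the intended one on a made-up four-row frame. It is not the code in broken_research.py, whose bugs may be written differently.

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "agent_role": ["planner", "researcher", "researcher", "writer"],
    "tokens_used": [500, 2000, 1800, 1500],
    "sources_cited": [0, 2, 3, 1],
})

# Test 1: total sources is the sum of the column, not the number of rows.
wrong_sources = len(toy)                    # 4 rows
total_sources = toy["sources_cited"].sum()  # 0 + 2 + 3 + 1 = 6

# Test 2: filter on the writer role (not researcher) before summing tokens.
wrong_writer_tokens = toy.loc[toy["agent_role"] == "researcher", "tokens_used"].sum()  # 3800
writer_tokens = toy.loc[toy["agent_role"] == "writer", "tokens_used"].sum()            # 1500

# Test 3: count researcher rows instead of summing their tokens.
wrong_steps = toy.loc[toy["agent_role"] == "researcher", "tokens_used"].sum()  # 3800
researcher_steps = int((toy["agent_role"] == "researcher").sum())              # 2

print(total_sources, writer_tokens, researcher_steps)
```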
🧠 Knowledge Check
Q1 (Multiple Choice): What is the primary advantage of a multi-agent pipeline over a single-LLM approach for research?
- A) It uses fewer tokens overall
- B) Each agent specializes in one task, enabling better quality and traceability
- C) It requires only one model deployment
- D) It eliminates the need for citations
✅ Answer
Correct: B) Each agent specializes in one task, enabling better quality and traceability
By splitting research into planning, searching, writing, and reviewing, each agent can be optimized for its specific task. The Researcher can focus on source quality, the Writer on prose coherence, and the Reviewer on factual accuracy. This specialization typically produces higher-quality output than a single end-to-end generation.
Q2 (Multiple Choice): Why is citation tracking important in deep research agents?
- A) It reduces token usage
- B) It ensures every claim maps back to a source, enabling verification and trust
- C) It makes the report longer
- D) It is required by the LLM's terms of service
✅ Answer
Correct: B) It ensures every claim maps back to a source, enabling verification and trust
Citation tracking creates an auditable chain from each claim in the final report back to its source. This chain lets reviewers verify factual accuracy, lets users explore primary sources, and helps organizations maintain research integrity, which is critical for high-stakes applications like legal, medical, or financial research.
Q3 (Run the Lab): What is the total number of sources cited across all steps?
Run the Step 4 analysis on research_trace.csv and sum the sources_cited column.
✅ Answer
10 sources
The sum of all sources_cited values across the 14 steps equals 10. Most sources are cited during Researcher steps, with some additional citations added during the Writer's synthesis.
Q4 (Run the Lab): How many total tokens did the Writer agent consume?
Run the Step 3 analysis and find the total tokens for the writer role.
✅ Answer
Sum of tokens_used where agent_role == "writer"
The Writer's total token count includes all writing and synthesis steps. Filter the trace for agent_role == "writer" and sum the tokens_used column to get the exact value.
Q5 (Run the Lab): How many steps did the Researcher agent execute?
Count the rows where agent_role == "researcher".
✅ Answer
6 steps
The Researcher executed 6 steps, the most of any agent role. This makes sense because the Researcher handles multiple sub-questions from the Planner, and each sub-question may require several search and extraction steps.
Summary
| Topic | What You Learned |
|---|---|
| Deep Research Agents | Multi-agent pipeline for knowledge synthesis with citation tracking |
| Pipeline Architecture | Planner → Researcher → Writer → Reviewer with specialized roles |
| Citation Tracking | Every claim maps back to a source across the pipeline |
| Token Distribution | Researcher typically uses the most tokens; Writer synthesizes; Reviewer validates |
| Quality Patterns | Quality varies by step β later research steps may show source exhaustion |
| Optimization | Cache sources, parallelize research, add quality gates |
Next Steps
- Lab 034: Multi-Agent with Semantic Kernel (build the agents themselves)
- Lab 067: GraphRAG (enhance research with knowledge graph retrieval)
- Lab 033: Agent Observability (monitor deep research pipelines in production)
- Lab 076: Microsoft Agent Framework (implement pipelines with MAF Graph Workflows)