Lab 081: Agentic Coding Tools – Claude Code vs Copilot CLI
What You'll Learn
- What agentic coding tools are – AI assistants that operate directly in your terminal with full codebase context
- Compare Claude Code and GitHub Copilot CLI across 10 real-world developer tasks
- Understand how each tool handles code understanding, generation, debugging, and git workflows
- Measure time savings versus manual approaches for common development tasks
- Debug a broken comparison analysis script by fixing 3 bugs
Introduction
A new category of developer tools has emerged: agentic coding assistants that run in your terminal, read your entire codebase, and execute multi-step tasks autonomously. Unlike IDE-based copilots that suggest single lines or blocks, these tools can search codebases, write tests, create commits, refactor modules, and debug failing pipelines – all from a single natural-language prompt.
Two leading tools in this space are:
| Tool | Vendor | How It Works |
|---|---|---|
| Claude Code | Anthropic | Terminal agent that reads your codebase, executes commands, and edits files directly |
| GitHub Copilot CLI | GitHub | Terminal agent integrated with GitHub ecosystem, runs commands and edits files |
Both tools share a common pattern: they accept a natural-language task, analyze your codebase for context, plan an approach, and execute it – often in a single interaction.
The Scenario
You are a Tech Lead at OutdoorGear Inc. evaluating terminal-based coding assistants for your engineering team. You've benchmarked both tools across 10 representative developer tasks and now need to analyze the results to make a recommendation.
No Tool Installation Required
This lab analyzes a pre-recorded benchmark dataset comparing task completion times and success rates. You don't need Claude Code or Copilot CLI installed – all analysis is done locally with pandas.
Prerequisites
| Requirement | Why |
|---|---|
| Python 3.10+ | Run analysis scripts |
| pandas library | DataFrame operations |
📦 Supporting Files
Download these files before starting the lab
Save all files to a lab-081/ folder in your working directory.
| File | Description | Download |
|---|---|---|
| broken_tools.py | Bug-fix exercise (3 bugs + self-tests) | 📥 Download |
| coding_tools_comparison.csv | Dataset – 10 tasks compared across tools | 📥 Download |
Step 1: Understanding Agentic Coding Tools
Both Claude Code and Copilot CLI follow a similar agent loop:
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ User Prompt  │────▶│   Codebase   │────▶│    Plan &    │
│  (terminal)  │     │   Analysis   │     │   Execute    │
└──────────────┘     └──────────────┘     └──────────────┘
                                                 │
                ┌───────────────┐                │
                │  Edit files,  │◀───────────────┘
                │  run commands │
                └───────────────┘
Key capabilities shared by both tools:
| Capability | Description |
|---|---|
| Codebase understanding | Read and reason about project structure, dependencies, and patterns |
| Code generation | Write new code (functions, tests, modules) aligned with project conventions |
| Debugging | Analyze errors, trace issues, and apply fixes |
| Git workflows | Stage changes, create commits with conventional messages, manage branches |
| Refactoring | Restructure code while preserving behavior |
| Code review | Review changes and suggest improvements |
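The sketch below makes this loop concrete. Every name in it (Step, gather_context, plan, execute, the placeholder file paths) is illustrative and not the actual API of either tool; it only shows the prompt, analyze, plan, execute shape of the agent loop.

```python
# A minimal sketch of the agent loop above. All names are illustrative --
# neither Claude Code nor Copilot CLI exposes this API.
from dataclasses import dataclass

@dataclass
class Step:
    action: str  # e.g. "read_file", "edit_file", "run_command"
    target: str  # a file path or a shell command

def gather_context(prompt: str) -> list[str]:
    """Stand-in for codebase analysis: pick files relevant to the prompt."""
    return ["src/api/routes.py", "tests/test_routes.py"]  # placeholder paths

def plan(prompt: str, context: list[str]) -> list[Step]:
    """Stand-in for the model's planning step."""
    return [
        Step("read_file", context[0]),
        Step("edit_file", context[0]),
        Step("run_command", "pytest -q"),
    ]

def execute(step: Step) -> None:
    # A real agent would read/edit the file or run the command here.
    print(f"{step.action}: {step.target}")

def agent_loop(prompt: str) -> None:
    for step in plan(prompt, gather_context(prompt)):
        execute(step)

agent_loop("Find all API endpoints in the project")
```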
Step 2: Load the Benchmark Dataset
The dataset contains 10 tasks benchmarked across both tools and manual completion:
import pandas as pd
tasks = pd.read_csv("lab-081/coding_tools_comparison.csv")
print(f"Total tasks: {len(tasks)}")
print(f"Categories: {sorted(tasks['category'].unique())}")
print(f"\nDataset preview:")
print(tasks[["task_id", "task_description", "category"]].to_string(index=False))
Expected output:
Total tasks: 10
Categories: ['code_generation', 'code_review', 'code_understanding', 'codebase_search', 'debugging', 'devops', 'git_workflow', 'migration', 'refactoring', 'scaffolding']
| task_id | task_description | category |
|---|---|---|
| T01 | Explain a complex function in the codebase | code_understanding |
| T02 | Find all API endpoints in the project | codebase_search |
| ... | ... | ... |
| T10 | Debug a failing CI pipeline | devops |
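Optionally, sanity-check that the CSV contains the columns the later steps rely on. A quick check (the column names are taken from the code in Steps 3-5):

```python
# Verify the dataset has the columns used in Steps 3-5.
expected_cols = {
    "task_id", "task_description", "category",
    "claude_code_success", "copilot_cli_success",
    "claude_code_time_sec", "copilot_cli_time_sec",
    "manual_time_sec", "tool_advantage",
}
missing = expected_cols - set(tasks.columns)
assert not missing, f"Missing columns: {missing}"
print("Schema OK")
```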
Step 3: Compare Success Rates
Calculate success rates for each tool:
for col in ["claude_code_success", "copilot_cli_success"]:
tasks[col] = tasks[col].astype(str).str.lower() == "true"
cc_success = tasks["claude_code_success"].sum()
cp_success = tasks["copilot_cli_success"].sum()
total = len(tasks)
print(f"Claude Code: {cc_success}/{total} = {cc_success/total*100:.0f}%")
print(f"Copilot CLI: {cp_success}/{total} = {cp_success/total*100:.0f}%")
failed_cp = tasks[tasks["copilot_cli_success"] == False]
if len(failed_cp) > 0:
print(f"\nCopilot CLI failures:")
print(failed_cp[["task_id", "task_description", "category"]].to_string(index=False))
Expected output:
Claude Code: 10/10 = 100%
Copilot CLI: 9/10 = 90%
Copilot CLI failures:
task_id task_description category
T10 Debug a failing CI pipeline devops
Insight
Claude Code completed all 10 tasks successfully (100%). Copilot CLI completed 9 out of 10 (90%), failing only on T10 – debugging a failing CI pipeline, which requires deep context about CI configuration, environment variables, and build systems.
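As a follow-up, you can count the tasks where both tools succeeded. Note the & (AND) operator: treating "both succeeded" as | (OR) is one of the bugs you'll fix in the exercise at the end of this lab.

```python
# Tasks where BOTH tools succeeded -- & (AND), not | (OR).
both = (tasks["claude_code_success"] & tasks["copilot_cli_success"]).sum()
print(f"Both tools succeeded: {both}/{total} = {both/total*100:.0f}%")
# With this dataset: 9/10 = 90% (only T10 separates the tools)
```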
Step 4: Compare Completion Times
Analyze how fast each tool completes tasks:
cc_avg = tasks["claude_code_time_sec"].mean()
cp_avg = tasks["copilot_cli_time_sec"].mean()
manual_avg = tasks["manual_time_sec"].mean()
print(f"Average completion time:")
print(f" Claude Code: {cc_avg:.1f}s")
print(f" Copilot CLI: {cp_avg:.1f}s")
print(f" Manual: {manual_avg:.1f}s")
print(f"\nSpeedup over manual:")
print(f" Claude Code: {manual_avg/cc_avg:.0f}x faster")
print(f" Copilot CLI: {manual_avg/cp_avg:.0f}x faster")
Expected output:
Average completion time:
Claude Code: 20.5s
Copilot CLI: 24.5s
Manual: 1005.0s
Speedup over manual:
Claude Code: 49x faster
Copilot CLI: 41x faster
print("\nPer-task comparison:")
for _, t in tasks.iterrows():
faster = "Claude Code" if t["claude_code_time_sec"] < t["copilot_cli_time_sec"] else "Copilot CLI"
print(f" {t['task_id']} ({t['category']:>20}): CC={t['claude_code_time_sec']:>3}s "
f"CP={t['copilot_cli_time_sec']:>3}s β {faster}")
Insight
Claude Code is faster on average (20.5s vs 24.5s). The only task where Copilot CLI was faster was T06 (git workflow) – creating a conventional commit message – likely due to tighter GitHub integration.
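Averages can be pulled up by the longest tasks (scaffolding runs 45-50s in this dataset). As a quick robustness check, compare medians as well:

```python
# Medians are less sensitive to a few long-running tasks.
print(f"Median Claude Code: {tasks['claude_code_time_sec'].median():.1f}s")  # 19.0s here
print(f"Median Copilot CLI: {tasks['copilot_cli_time_sec'].median():.1f}s")  # 23.5s here
print(f"Median manual:      {tasks['manual_time_sec'].median():.1f}s")
```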
Step 5: Analyze by Task Category
Compare tool performance across different task types:
print("Performance by category:")
for _, row in tasks.iterrows():
cc_status = "β
" if row["claude_code_success"] else "β"
cp_status = "β
" if row["copilot_cli_success"] else "β"
print(f" {row['category']:>20}: CC {cc_status} ({row['claude_code_time_sec']:>3}s) "
f"CP {cp_status} ({row['copilot_cli_time_sec']:>3}s) "
f"Advantage: {row['tool_advantage']}")
Expected output:
    code_understanding: CC ✅ (  8s) CP ✅ ( 12s) Advantage: 10x faster
       codebase_search: CC ✅ (  5s) CP ✅ (  8s) Advantage: 40x faster
       code_generation: CC ✅ ( 25s) CP ✅ ( 30s) Advantage: 20x faster
             debugging: CC ✅ ( 18s) CP ✅ ( 22s) Advantage: 45x faster
           refactoring: CC ✅ ( 35s) CP ✅ ( 40s) Advantage: 30x faster
          git_workflow: CC ✅ (  4s) CP ✅ (  3s) Advantage: 8x faster
           code_review: CC ✅ ( 15s) CP ✅ ( 20s) Advantage: 35x faster
           scaffolding: CC ✅ ( 45s) CP ✅ ( 50s) Advantage: 75x faster
             migration: CC ✅ ( 30s) CP ✅ ( 35s) Advantage: 55x faster
                devops: CC ✅ ( 20s) CP ❌ ( 25s) Advantage: 45x faster
Both tools provide massive speedups over manual work (8x to 75x faster), with the biggest gains in scaffolding and codebase search tasks.
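You can recompute that speedup range directly from the timing columns rather than reading it off the pre-computed tool_advantage strings. This assumes tool_advantage was derived as manual time divided by the faster tool's time; if the CSV used a different definition, the numbers may differ slightly.

```python
# Per-task speedup of the faster tool over manual completion.
best_time = tasks[["claude_code_time_sec", "copilot_cli_time_sec"]].min(axis=1)
speedup = tasks["manual_time_sec"] / best_time
print(f"Speedup range: {speedup.min():.0f}x to {speedup.max():.0f}x")
```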
Step 6: Making a Recommendation
Summarize the comparison:
print("=== Tool Comparison Summary ===\n")
print(f"{'Metric':<30} {'Claude Code':>12} {'Copilot CLI':>12}")
print("-" * 56)
print(f"{'Success Rate':<30} {'100%':>12} {'90%':>12}")
print(f"{'Avg Time (s)':<30} {cc_avg:>12.1f} {cp_avg:>12.1f}")
print(f"{'Tasks Won (speed)':<30} {'9':>12} {'1':>12}")
print(f"{'Manual Speedup':<30} {f'{manual_avg/cc_avg:.0f}x':>12} {f'{manual_avg/cp_avg:.0f}x':>12}")
Recommendation
Both tools deliver exceptional productivity gains. Claude Code edges ahead in this benchmark with perfect success rate and faster average times. Copilot CLI excels at git workflows and offers tighter GitHub integration. For teams already in the GitHub ecosystem, Copilot CLI is a natural choice; for maximum reliability across diverse tasks, Claude Code is the stronger option.
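If your team weights reliability and speed differently, you can collapse the summary into a single score per tool. A minimal sketch follows; the 70/30 weights are purely illustrative assumptions, not part of the benchmark, so tune them to your own priorities.

```python
# Illustrative composite score: weighted success rate plus relative speed.
# The 0.7 / 0.3 weights are assumptions -- adjust to your team's priorities.
def tool_score(success_rate, avg_time, best_avg, w_success=0.7, w_speed=0.3):
    return w_success * success_rate + w_speed * (best_avg / avg_time)

best_avg = min(cc_avg, cp_avg)
print(f"Claude Code: {tool_score(cc_success / total, cc_avg, best_avg):.2f}")
print(f"Copilot CLI: {tool_score(cp_success / total, cp_avg, best_avg):.2f}")
```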
🐛 Bug-Fix Exercise
The file lab-081/broken_tools.py has 3 bugs in the analysis functions. Can you find and fix them all?
Run the self-tests to see which ones fail:
python lab-081/broken_tools.py
You should see 3 failed tests. Each test corresponds to one bug:
| Test | What it checks | Hint |
|---|---|---|
| Test 1 | Average speedup calculation | Should compute speedup from Claude Code times, not Copilot CLI times |
| Test 2 | Both-tools success rate | Should use AND (&) not OR (|) for "both succeeded" |
| Test 3 | Fastest tool detection | Comparison operator is reversed |
Fix all 3 bugs, then re-run. When you see All passed!, you're done!
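If you get stuck, the corrected logic should look roughly like the sketch below. The function names and signatures here are hypothetical; the actual ones in broken_tools.py may differ, so treat this as a guide to the fixes, not a drop-in replacement.

```python
# Hypothetical shapes of the three fixed functions -- names may not
# match the actual broken_tools.py.

def average_speedup(df):
    # Bug 1 fix: speedup over manual uses Claude Code times, not Copilot CLI's
    return (df["manual_time_sec"] / df["claude_code_time_sec"]).mean()

def both_success_rate(df):
    # Bug 2 fix: "both succeeded" needs AND (&), not OR (|)
    return (df["claude_code_success"] & df["copilot_cli_success"]).mean()

def fastest_tool(row):
    # Bug 3 fix: the comparison operator was reversed
    if row["claude_code_time_sec"] < row["copilot_cli_time_sec"]:
        return "Claude Code"
    return "Copilot CLI"
```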
🧠 Knowledge Check
Q1 (Multiple Choice): What distinguishes agentic coding tools from traditional IDE-based copilots?
- A) They only work with Python code
- B) They operate in the terminal, read entire codebases, and execute multi-step tasks autonomously
- C) They require a GPU to run locally
- D) They only suggest single-line completions
✅ Reveal Answer
Correct: B) They operate in the terminal, read entire codebases, and execute multi-step tasks autonomously
Unlike IDE-based copilots that suggest code completions within an editor, agentic coding tools like Claude Code and Copilot CLI run in the terminal, analyze your full project structure, and can perform complex multi-step tasks – searching codebases, writing tests, creating commits, and debugging pipelines – all from a single natural-language prompt.
Q2 (Multiple Choice): What is the primary advantage of agentic coding tools over manual development?
- A) They produce bug-free code every time
- B) They eliminate the need for code review
- C) They dramatically reduce time for common tasks (often 10x–75x faster)
- D) They replace the need for version control
✅ Reveal Answer
Correct: C) They dramatically reduce time for common tasks (often 10x–75x faster)
The benchmark shows speedups ranging from 8x (git workflows) to 75x (scaffolding) compared to manual completion. While the tools don't produce perfect code every time and code review remains important, the time savings for routine tasks are substantial.
Q3 (Run the Lab): What is Claude Code's success rate across all 10 tasks?
Load 📥 coding_tools_comparison.csv and count claude_code_success == True.
✅ Reveal Answer
100% (10/10)
Claude Code successfully completed all 10 tasks in the benchmark, including code understanding, generation, debugging, refactoring, git workflows, code review, scaffolding, migration, and DevOps tasks.
Q4 (Run the Lab): What is Copilot CLI's success rate, and which task did it fail?
Count copilot_cli_success == True and identify the failed task.
✅ Reveal Answer
90% (9/10) – failed T10 (Debug a failing CI pipeline)
Copilot CLI succeeded on 9 out of 10 tasks. The only failure was T10 – debugging a failing CI pipeline – which requires deep context about CI configuration, environment variables, and build system interactions.
Q5 (Run the Lab): Which tool is fastest overall based on average completion time?
Compute claude_code_time_sec.mean() and copilot_cli_time_sec.mean().
✅ Reveal Answer
Claude Code (20.5s avg vs 24.5s avg)
Claude Code's average completion time is 20.5 seconds compared to Copilot CLI's 24.5 seconds. Claude Code was faster on 9 out of 10 tasks; Copilot CLI was faster only on T06 (git workflow, 3s vs 4s).
Summary
| Topic | What You Learned |
|---|---|
| Agentic Coding Tools | Terminal-based AI assistants that read codebases and execute multi-step tasks |
| Claude Code | 100% success rate, 20.5s average, strongest at complex tasks |
| Copilot CLI | 90% success rate, 24.5s average, excels at git workflows |
| Time Savings | Both tools provide 8x–75x speedup over manual development |
| Task Categories | Both handle code understanding, generation, review, and refactoring well |
| Recommendation | Claude Code for reliability; Copilot CLI for GitHub integration |
Next Steps
- Lab 082 – Agent Guardrails: NeMo & Azure Content Safety
- Try both tools on your own codebase to see which fits your workflow best