
Lab 081: Agentic Coding Tools - Claude Code vs Copilot CLI

Level: L100 Path: All paths Time: ~45 min 💰 Cost: Free

What You'll Learn

  • What agentic coding tools are β€” AI assistants that operate directly in your terminal with full codebase context
  • Compare Claude Code and GitHub Copilot CLI across 10 real-world developer tasks
  • Understand how each tool handles code understanding, generation, debugging, and git workflows
  • Measure time savings versus manual approaches for common development tasks
  • Debug a broken comparison analysis script by fixing 3 bugs

Introduction

A new category of developer tools has emerged: agentic coding assistants that run in your terminal, read your entire codebase, and execute multi-step tasks autonomously. Unlike IDE-based copilots that suggest single lines or blocks, these tools can search codebases, write tests, create commits, refactor modules, and debug failing pipelines, all from a single natural-language prompt.

Two leading tools in this space are:

| Tool | Vendor | How It Works |
|------|--------|--------------|
| Claude Code | Anthropic | Terminal agent that reads your codebase, executes commands, and edits files directly |
| GitHub Copilot CLI | GitHub | Terminal agent integrated with the GitHub ecosystem; runs commands and edits files |

Both tools share a common pattern: they accept a natural-language task, analyze your codebase for context, plan an approach, and execute it, often in a single interaction.
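That accept-analyze-plan-execute pattern can be sketched as a tiny loop. Everything below (function names, the keyword-matching "analysis", the step list) is purely illustrative and is not either tool's actual internals:

```python
# Illustrative agent loop; every name here is hypothetical and is
# NOT Claude Code's or Copilot CLI's real implementation.

def gather_context(prompt: str, files: dict) -> list:
    """Keep files whose contents mention a substantive prompt word."""
    words = {w for w in prompt.lower().split() if len(w) > 3}
    return [path for path, text in files.items()
            if any(w in text.lower() for w in words)]

def plan_steps(prompt: str, relevant: list) -> list:
    """Turn the task into an ordered list of step descriptions."""
    return [f"read {path}" for path in relevant] + [f"apply: {prompt}"]

def run_agent(prompt: str, files: dict) -> list:
    relevant = gather_context(prompt, files)  # 1. codebase analysis
    return plan_steps(prompt, relevant)       # 2. plan (execution stubbed out)

steps = run_agent("add a test for parse_config",
                  {"config.py": "def parse_config(): return {}",
                   "README.md": "project docs"})
print(steps)
```

The real tools replace the keyword matching with an LLM and actually run the planned steps (editing files, invoking shell commands), but the loop shape is the same.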

The Scenario

You are a Tech Lead at OutdoorGear Inc. evaluating terminal-based coding assistants for your engineering team. You've benchmarked both tools across 10 representative developer tasks and now need to analyze the results to make a recommendation.

No Tool Installation Required

This lab analyzes a pre-recorded benchmark dataset comparing task completion times and success rates. You don't need Claude Code or Copilot CLI installed; all analysis is done locally with pandas.

Prerequisites

| Requirement | Why |
|-------------|-----|
| Python 3.10+ | Run analysis scripts |
| pandas library | DataFrame operations (`pip install pandas`) |

Quick Start with GitHub Codespaces

Open in GitHub Codespaces

All dependencies are pre-installed in the devcontainer.

📦 Supporting Files

Download these files before starting the lab

Save all files to a lab-081/ folder in your working directory.

| File | Description | Download |
|------|-------------|----------|
| broken_tools.py | Bug-fix exercise (3 bugs + self-tests) | 📥 Download |
| coding_tools_comparison.csv | Dataset: 10 tasks compared across tools | 📥 Download |

Step 1: Understanding Agentic Coding Tools

Both Claude Code and Copilot CLI follow a similar agent loop:

┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│  User Prompt │────▶│  Codebase    │────▶│  Plan &      │
│  (terminal)  │     │  Analysis    │     │  Execute     │
└──────────────┘     └──────────────┘     └──────────────┘
                                                │
                     ┌──────────────┐           │
                     │  Edit files, │◀──────────┘
                     │  run commands│
                     └──────────────┘

Key capabilities shared by both tools:

| Capability | Description |
|------------|-------------|
| Codebase understanding | Read and reason about project structure, dependencies, and patterns |
| Code generation | Write new code (functions, tests, modules) aligned with project conventions |
| Debugging | Analyze errors, trace issues, and apply fixes |
| Git workflows | Stage changes, create commits with conventional messages, manage branches |
| Refactoring | Restructure code while preserving behavior |
| Code review | Review changes and suggest improvements |

Step 2: Load the Benchmark Dataset

The dataset contains 10 tasks benchmarked across both tools and manual completion:

import pandas as pd

tasks = pd.read_csv("lab-081/coding_tools_comparison.csv")
print(f"Total tasks: {len(tasks)}")
print(f"Categories: {sorted(tasks['category'].unique())}")
print("\nDataset preview:")
print(tasks[["task_id", "task_description", "category"]].to_string(index=False))

Expected output:

Total tasks: 10
Categories: ['code_generation', 'code_review', 'code_understanding', 'codebase_search', 'debugging', 'devops', 'git_workflow', 'migration', 'refactoring', 'scaffolding']
task_id task_description category
T01 Explain a complex function in the codebase code_understanding
T02 Find all API endpoints in the project codebase_search
... ... ...
T10 Debug a failing CI pipeline devops

Step 3: Compare Success Rates

Calculate success rates for each tool:

# Normalize success flags to booleans (works whether the CSV column
# was parsed as bool or as text like "true"/"false")
for col in ["claude_code_success", "copilot_cli_success"]:
    tasks[col] = tasks[col].astype(str).str.lower() == "true"

cc_success = tasks["claude_code_success"].sum()
cp_success = tasks["copilot_cli_success"].sum()
total = len(tasks)

print(f"Claude Code:  {cc_success}/{total} = {cc_success/total*100:.0f}%")
print(f"Copilot CLI:  {cp_success}/{total} = {cp_success/total*100:.0f}%")

failed_cp = tasks[~tasks["copilot_cli_success"]]
if len(failed_cp) > 0:
    print("\nCopilot CLI failures:")
    print(failed_cp[["task_id", "task_description", "category"]].to_string(index=False))

Expected output:

Claude Code:  10/10 = 100%
Copilot CLI:   9/10 =  90%

Copilot CLI failures:
 task_id                  task_description category
     T10 Debug a failing CI pipeline   devops

Insight

Claude Code completed all 10 tasks successfully (100%). Copilot CLI completed 9 out of 10 (90%), failing only on T10 (debugging a failing CI pipeline), which requires deep context about CI configuration, environment variables, and build systems.


Step 4: Compare Completion Times

Analyze how fast each tool completes tasks:

cc_avg = tasks["claude_code_time_sec"].mean()
cp_avg = tasks["copilot_cli_time_sec"].mean()
manual_avg = tasks["manual_time_sec"].mean()

print("Average completion time:")
print(f"  Claude Code:  {cc_avg:.1f}s")
print(f"  Copilot CLI:  {cp_avg:.1f}s")
print(f"  Manual:       {manual_avg:.1f}s")

print("\nSpeedup over manual:")
print(f"  Claude Code:  {manual_avg/cc_avg:.0f}x faster")
print(f"  Copilot CLI:  {manual_avg/cp_avg:.0f}x faster")

Expected output:

Average completion time:
  Claude Code:  20.5s
  Copilot CLI:  24.5s
  Manual:       1005.0s

Speedup over manual:
  Claude Code:  49x faster
  Copilot CLI:  41x faster

You can also break the comparison down per task:

print("\nPer-task comparison:")
for _, t in tasks.iterrows():
    faster = "Claude Code" if t["claude_code_time_sec"] < t["copilot_cli_time_sec"] else "Copilot CLI"
    print(f"  {t['task_id']} ({t['category']:>20}): CC={t['claude_code_time_sec']:>3}s  "
          f"CP={t['copilot_cli_time_sec']:>3}s  β†’ {faster}")

Insight

Claude Code is faster on average (20.5s vs 24.5s). The only task where Copilot CLI was faster is T06 (git workflow: creating a conventional commit message), likely due to tighter GitHub integration.
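The per-task winners can also be tallied in a single vectorized step instead of a row loop. A sketch using a few stand-in rows (column names follow the dataset used above; the values here are not the full benchmark):

```python
import pandas as pd

# Sample rows standing in for the benchmark dataset
tasks = pd.DataFrame({
    "task_id": ["T01", "T06", "T10"],
    "claude_code_time_sec": [8, 4, 20],
    "copilot_cli_time_sec": [12, 3, 25],
})

# Boolean comparison per row, mapped to tool names, then tallied
winner = (tasks["claude_code_time_sec"] < tasks["copilot_cli_time_sec"]).map(
    {True: "Claude Code", False: "Copilot CLI"}
)
print(winner.value_counts())
```

On the real dataset this yields the 9-to-1 split reported in Step 6's summary table.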


Step 5: Analyze by Task Category

Compare tool performance across different task types:

print("Performance by category:")
for _, row in tasks.iterrows():
    cc_status = "βœ…" if row["claude_code_success"] else "❌"
    cp_status = "βœ…" if row["copilot_cli_success"] else "❌"
    print(f"  {row['category']:>20}: CC {cc_status} ({row['claude_code_time_sec']:>3}s)  "
          f"CP {cp_status} ({row['copilot_cli_time_sec']:>3}s)  "
          f"Advantage: {row['tool_advantage']}")

Expected output:

  code_understanding: CC ✅ ( 8s)  CP ✅ (12s)  Advantage: 10x faster
     codebase_search: CC ✅ ( 5s)  CP ✅ ( 8s)  Advantage: 40x faster
     code_generation: CC ✅ (25s)  CP ✅ (30s)  Advantage: 20x faster
           debugging: CC ✅ (18s)  CP ✅ (22s)  Advantage: 45x faster
         refactoring: CC ✅ (35s)  CP ✅ (40s)  Advantage: 30x faster
        git_workflow: CC ✅ ( 4s)  CP ✅ ( 3s)  Advantage: 8x faster
         code_review: CC ✅ (15s)  CP ✅ (20s)  Advantage: 35x faster
         scaffolding: CC ✅ (45s)  CP ✅ (50s)  Advantage: 75x faster
           migration: CC ✅ (30s)  CP ✅ (35s)  Advantage: 55x faster
              devops: CC ✅ (20s)  CP ❌ (25s)  Advantage: 45x faster

Both tools provide massive speedups over manual work (8x to 75x faster), with the biggest gains in scaffolding and codebase search tasks.
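With only one task per category, a row loop is fine; on a larger benchmark with several tasks per category you would typically aggregate with groupby instead. A sketch with hypothetical rows (not the lab's dataset):

```python
import pandas as pd

# Hypothetical multi-task-per-category data, for illustration only
tasks = pd.DataFrame({
    "category": ["debugging", "debugging", "refactoring"],
    "claude_code_time_sec": [18, 22, 35],
    "copilot_cli_time_sec": [22, 26, 40],
})

# Mean completion time per tool, per category
by_cat = tasks.groupby("category")[
    ["claude_code_time_sec", "copilot_cli_time_sec"]
].mean()
print(by_cat)
```

The same pattern extends to success rates: `.mean()` on a boolean column gives the per-category success fraction.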


Step 6: Making a Recommendation

Summarize the comparison:

print("=== Tool Comparison Summary ===\n")
print(f"{'Metric':<30} {'Claude Code':>12} {'Copilot CLI':>12}")
print("-" * 56)
print(f"{'Success Rate':<30} {'100%':>12} {'90%':>12}")
print(f"{'Avg Time (s)':<30} {cc_avg:>12.1f} {cp_avg:>12.1f}")
print(f"{'Tasks Won (speed)':<30} {'9':>12} {'1':>12}")
print(f"{'Manual Speedup':<30} {f'{manual_avg/cc_avg:.0f}x':>12} {f'{manual_avg/cp_avg:.0f}x':>12}")

Recommendation

Both tools deliver exceptional productivity gains. Claude Code edges ahead in this benchmark with perfect success rate and faster average times. Copilot CLI excels at git workflows and offers tighter GitHub integration. For teams already in the GitHub ecosystem, Copilot CLI is a natural choice; for maximum reliability across diverse tasks, Claude Code is the stronger option.
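If you want the recommendation to rest on a single number, you can blend reliability and speed into a score. The weights and the 60-second speed budget below are arbitrary placeholders (not part of the lab); adjust them to your team's priorities:

```python
def tool_score(success_rate: float, avg_time_sec: float,
               success_weight: float = 0.7) -> float:
    """Blend reliability and speed into one score; higher is better.
    Speed is normalized against an arbitrary 60-second budget."""
    speed = max(0.0, 1 - avg_time_sec / 60)
    return success_weight * success_rate + (1 - success_weight) * speed

# Benchmark figures from Steps 3-4
cc = tool_score(1.0, 20.5)   # Claude Code: 100% success, 20.5s avg
cp = tool_score(0.9, 24.5)   # Copilot CLI: 90% success, 24.5s avg
print(f"Claude Code: {cc:.3f}  Copilot CLI: {cp:.3f}")
```

A lower `success_weight` shifts the score toward raw speed; with these placeholder weights, Claude Code comes out ahead, matching the narrative recommendation.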


πŸ› Bug-Fix ExerciseΒΆ

The file lab-081/broken_tools.py has 3 bugs in the analysis functions. Can you find and fix them all?

Run the self-tests to see which ones fail:

python lab-081/broken_tools.py

You should see 3 failed tests. Each test corresponds to one bug:

| Test | What it checks | Hint |
|------|----------------|------|
| Test 1 | Average speedup calculation | Should compute speedup from Claude Code times, not Copilot CLI times |
| Test 2 | Both-tools success rate | Should use AND (`&`), not OR (`\|`), for "both succeeded" |
| Test 3 | Fastest tool detection | Comparison operator is reversed |
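The contents of broken_tools.py aren't reproduced here, but the three hint patterns look like this in isolation. The frame and variable names below are a stand-in sketch, not the actual file:

```python
import pandas as pd

# Tiny stand-in frame (NOT the real dataset or broken_tools.py contents)
df = pd.DataFrame({
    "claude_code_success": [True, True],
    "copilot_cli_success": [True, False],
    "claude_code_time_sec": [10, 20],
    "copilot_cli_time_sec": [13, 19],
    "manual_time_sec": [600, 900],
})

# Bug 1 pattern: the speedup must divide by the Claude Code column
speedup = (df["manual_time_sec"] / df["claude_code_time_sec"]).mean()

# Bug 2 pattern: "both succeeded" needs & (AND), not | (OR)
both_rate = (df["claude_code_success"] & df["copilot_cli_success"]).mean()

# Bug 3 pattern: "fastest" means the strictly smaller mean time (< not >)
fastest = ("Claude Code"
           if df["claude_code_time_sec"].mean() < df["copilot_cli_time_sec"].mean()
           else "Copilot CLI")
```

The actual fixes in broken_tools.py will use that file's own function and column names; the operators are what the self-tests check.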

Fix all 3 bugs, then re-run. When you see All passed!, you're done!


🧠 Knowledge Check

Q1 (Multiple Choice): What distinguishes agentic coding tools from traditional IDE-based copilots?
  • A) They only work with Python code
  • B) They operate in the terminal, read entire codebases, and execute multi-step tasks autonomously
  • C) They require a GPU to run locally
  • D) They only suggest single-line completions
✅ Reveal Answer

Correct: B) They operate in the terminal, read entire codebases, and execute multi-step tasks autonomously

Unlike IDE-based copilots that suggest code completions within an editor, agentic coding tools like Claude Code and Copilot CLI run in the terminal, analyze your full project structure, and can perform complex multi-step tasks (searching codebases, writing tests, creating commits, and debugging pipelines) from a single natural-language prompt.

Q2 (Multiple Choice): What is the primary advantage of agentic coding tools over manual development?
  • A) They produce bug-free code every time
  • B) They eliminate the need for code review
  • C) They dramatically reduce time for common tasks (often 10x–75x faster)
  • D) They replace the need for version control
✅ Reveal Answer

Correct: C) They dramatically reduce time for common tasks (often 10x–75x faster)

The benchmark shows speedups ranging from 8x (git workflows) to 75x (scaffolding) compared to manual completion. While the tools don't produce perfect code every time and code review remains important, the time savings for routine tasks are substantial.

Q3 (Run the Lab): What is Claude Code's success rate across all 10 tasks?

Load 📥 coding_tools_comparison.csv and count claude_code_success == True.

✅ Reveal Answer

100% (10/10)

Claude Code successfully completed all 10 tasks in the benchmark, including code understanding, generation, debugging, refactoring, git workflows, code review, scaffolding, migration, and DevOps tasks.

Q4 (Run the Lab): What is Copilot CLI's success rate, and which task did it fail?

Count copilot_cli_success == True and identify the failed task.

✅ Reveal Answer

90% (9/10), failing T10 (Debug a failing CI pipeline)

Copilot CLI succeeded on 9 out of 10 tasks. The only failure was T10 (debugging a failing CI pipeline), which requires deep context about CI configuration, environment variables, and build system interactions.

Q5 (Run the Lab): Which tool is fastest overall based on average completion time?

Compute claude_code_time_sec.mean() and copilot_cli_time_sec.mean().

✅ Reveal Answer

Claude Code (20.5s avg vs 24.5s avg)

Claude Code's average completion time is 20.5 seconds compared to Copilot CLI's 24.5 seconds. Claude Code was faster on 9 out of 10 tasks; Copilot CLI was faster only on T06 (git workflow, 3s vs 4s).


Summary

| Topic | What You Learned |
|-------|------------------|
| Agentic Coding Tools | Terminal-based AI assistants that read codebases and execute multi-step tasks |
| Claude Code | 100% success rate, 20.5s average, strongest at complex tasks |
| Copilot CLI | 90% success rate, 24.5s average, excels at git workflows |
| Time Savings | Both tools provide 8x–75x speedup over manual development |
| Task Categories | Both handle code understanding, generation, review, and refactoring well |
| Recommendation | Claude Code for reliability; Copilot CLI for GitHub integration |

Next Steps

  • Lab 082 β€” Agent Guardrails: NeMo & Azure Content Safety
  • Try both tools on your own codebase to see which fits your workflow best