Lab 056: Federated M365 Copilot Connectors with MCP

Level: L300 | Path: All paths | Time: ~90 min | 💰 Cost: Free (uses mock comparison data; no M365 tenant required)

What You'll Learn

  • The difference between synced (indexed) connectors and federated (real-time) connectors in Microsoft 365 Copilot
  • How MCP can act as a federated connector, providing real-time data access without indexing
  • How citations work in federated vs synced connectors
  • OAuth and compliance considerations for regulated data (healthcare, legal, finance)
  • When to choose each connector type based on latency, freshness, and compliance requirements

Introduction

Microsoft 365 Copilot uses connectors to bring external data into the Copilot experience. There are two fundamental architectures:

Connector Type | How It Works | Data Location
Synced (Indexed) | Crawls and copies data into the Microsoft Search index | Data stored on Microsoft servers
Federated (Real-Time) | Queries the source system at runtime; no data is copied | Data stays in the source system

Each approach has trade-offs:

Dimension | Federated | Synced
Latency | Higher (real-time query) | Lower (pre-indexed)
Data Freshness | Always current (0 sec) | Depends on crawl schedule
Compliance | Data never leaves source | Data copied to Microsoft servers
Offline Access | Requires source availability | Works even if source is down

The Scenario

OutdoorGear Inc. needs to connect multiple data sources to Microsoft 365 Copilot:

  • Product catalog and order history: can be indexed (synced) for fast search
  • Patient medical records, employee salary data, and legal contracts: regulated data that must never leave the source system (federated only)
  • Real-time stock prices and shipping tracking: need the freshest data possible

Your job is to analyze a comparison dataset of 20 queries (10 federated, 10 synced) and determine when each connector type is the right choice.

MCP as a Federated Connector

An MCP server can serve as a federated connector for M365 Copilot. The MCP server queries the source system in real time and returns results with citations; no data is indexed or stored on Microsoft servers. This makes MCP well suited to regulated data that must comply with HIPAA, GDPR, or SOX requirements.
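As a rough illustration of the pattern described above, the sketch below models a handler that queries a source at request time and attaches a citation to every result. It is plain Python and does not use any MCP SDK; `SOURCE_RECORDS`, `CitedResult`, `federated_search`, and the `example.com` URL scheme are hypothetical names invented for this example.

```python
from dataclasses import dataclass

# Hypothetical stand-in for the live source system (for example, a contracts database).
SOURCE_RECORDS = {
    "C-100": "Supplier agreement, renewal clause 4.2",
    "C-200": "Retail lease, termination clause 9.1",
}


@dataclass
class CitedResult:
    text: str
    citation_url: str  # link back to the record in the source system


def federated_search(keyword: str) -> list[CitedResult]:
    """Query the source at request time; nothing is cached or indexed."""
    results = []
    for record_id, text in SOURCE_RECORDS.items():
        if keyword.lower() in text.lower():
            # Assumed URL scheme, for illustration only.
            url = f"https://contracts.example.com/records/{record_id}"
            results.append(CitedResult(text=text, citation_url=url))
    return results


hits = federated_search("renewal")
for hit in hits:
    print(f"{hit.text} [source: {hit.citation_url}]")
```

Because the function reads `SOURCE_RECORDS` on every call and keeps no copy, any change in the source shows up in the next query, which is the property the federated approach relies on.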

Prerequisites

Requirement | Why
Python 3.10+ | Analyze connector comparison data
pandas library | DataFrame operations

pip install pandas

Quick Start with GitHub Codespaces

Open in GitHub Codespaces

All dependencies are pre-installed in the devcontainer.

📦 Supporting Files

Download these files before starting the lab

Save all files to a lab-056/ folder in your working directory.

File | Description | Download
broken_connector.py | Bug-fix exercise (3 bugs + self-tests) | 📥 Download
connector_comparison.csv | Dataset | 📥 Download

Step 1: Understanding Connector Types

Synced (Indexed) Connectors

Synced connectors crawl a data source on a schedule and copy the content into the Microsoft Search index:

┌─────────────┐    Crawl     ┌──────────────┐    Index    ┌─────────────┐
│  Source     │ ──────────►  │  Microsoft   │ ─────────►  │  Copilot    │
│  System     │  (schedule)  │  Graph       │  (fast)     │  Search     │
│             │              │  Connector   │             │             │
└─────────────┘              └──────────────┘             └─────────────┘
  • ✅ Fast queries: data is pre-indexed
  • ✅ Works offline: source system can be down
  • ❌ Stale data: depends on crawl frequency
  • ❌ Compliance risk: data is copied to Microsoft servers

Federated (Real-Time) Connectors

Federated connectors query the source system at runtime; no data is ever copied:

┌─────────────┐   Real-time   ┌──────────────┐   Results    ┌─────────────┐
│  Source     │ ◄──────────►  │  Federated   │ ──────────►  │  Copilot    │
│  System     │     query     │  Connector   │  + citation  │  Search     │
│             │               │  (MCP Server)│              │             │
└─────────────┘               └──────────────┘              └─────────────┘
  • ✅ Always fresh: queries live data
  • ✅ Compliant: data never leaves the source
  • ✅ Citations: responses include source links
  • ❌ Higher latency: real-time query overhead
  • ❌ Source dependency: requires source system availability
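The freshness difference between the two architectures can be seen in a toy model. The sketch below is an illustration only: `SyncedConnector`, `FederatedConnector`, and the stock value are made-up names and data, and a plain dict stands in for the source system.

```python
# Toy model: the "source system" is a dict whose values change over time.
source = {"hiking-boots": 12}  # units in stock (made-up value)


class SyncedConnector:
    """Copies the source into an index when crawl() runs; queries read the copy."""

    def __init__(self, source: dict):
        self._source = source
        self._index = {}

    def crawl(self) -> None:
        # Snapshot of the source; it starts going stale from this point on.
        self._index = dict(self._source)

    def query(self, key: str):
        return self._index.get(key)


class FederatedConnector:
    """Keeps no copy; every query reads the source directly."""

    def __init__(self, source: dict):
        self._source = source

    def query(self, key: str):
        return self._source.get(key)


synced = SyncedConnector(source)
federated = FederatedConnector(source)
synced.crawl()

source["hiking-boots"] = 7  # the source changes after the last crawl

synced_answer = synced.query("hiking-boots")
federated_answer = federated.query("hiking-boots")
print(synced_answer)     # 12 (stale until the next crawl)
print(federated_answer)  # 7 (current)
```

The synced connector keeps answering from its snapshot even if the source goes away, which is the offline benefit listed above; the federated connector trades that for current data.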

Step 2: Load the Comparison Dataset

The dataset contains 20 rows: 10 queries, each run through both a federated and a synced connector:

import pandas as pd

df = pd.read_csv("lab-056/connector_comparison.csv")
print(f"Total queries: {len(df)}")
print(f"Connector types: {df['connector_type'].unique().tolist()}")
print(f"Columns: {list(df.columns)}")
print(f"\nFirst 6 rows:")
print(df.head(6).to_string(index=False))

Expected output:

Total queries: 20
Connector types: ['federated', 'synced']
Columns: ['query_id', 'query_text', 'connector_type', 'latency_ms', 'results_count',
           'data_freshness_sec', 'data_size_kb', 'compliant']

First 6 rows:
query_id                  query_text connector_type  latency_ms  results_count  data_freshness_sec  data_size_kb  compliant
     Q01       Show all hiking boots      federated         450              5                   0            12       True
     Q02       Show all hiking boots         synced         120              5                3600            12       True
     Q03       Find tents under $300      federated         520              3                   0             8       True
     Q04       Find tents under $300         synced          95              3                7200             8       True
     Q05 Customer order history C001      federated         680              4                   0            15       True
     Q06 Customer order history C001         synced         150              4                1800            15       True
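Before comparing connector types, it can help to confirm that every query really appears once per connector type. The following is a small standard-library sketch of that check, using only the six rows shown above; the variable names are arbitrary.

```python
from collections import defaultdict

# The six rows shown above: (query_id, query_text, connector_type)
rows = [
    ("Q01", "Show all hiking boots", "federated"),
    ("Q02", "Show all hiking boots", "synced"),
    ("Q03", "Find tents under $300", "federated"),
    ("Q04", "Find tents under $300", "synced"),
    ("Q05", "Customer order history C001", "federated"),
    ("Q06", "Customer order history C001", "synced"),
]

# Collect the set of connector types seen for each query text.
types_by_query = defaultdict(set)
for _query_id, query_text, connector_type in rows:
    types_by_query[query_text].add(connector_type)

# A query is "unpaired" if it is missing either connector type.
unpaired = [q for q, types in types_by_query.items() if types != {"federated", "synced"}]
print(f"Distinct queries: {len(types_by_query)}")
print(f"Unpaired queries: {unpaired}")
```

The same check could be run over the full CSV; an empty unpaired list means the per-type averages in the next step compare like with like.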

Step 3: Compare Latency vs Freshness

Analyze the performance trade-offs between connector types:

3a: Average Latency by Type

fed = df[df["connector_type"] == "federated"]
syn = df[df["connector_type"] == "synced"]

avg_fed_latency = fed["latency_ms"].mean()
avg_syn_latency = syn["latency_ms"].mean()
ratio = avg_fed_latency / avg_syn_latency

print(f"Average federated latency: {avg_fed_latency:.0f} ms")
print(f"Average synced latency:    {avg_syn_latency:.1f} ms")
print(f"Federated/Synced ratio:    {ratio:.1f}×")

Expected output:

Average federated latency: 473 ms
Average synced latency:    109.8 ms
Federated/Synced ratio:    4.3×

3b: Freshness Comparison

print("Data freshness (seconds since last update):")
print(f"  Federated average: {fed['data_freshness_sec'].mean():.0f} sec (always 0, real-time)")
print(f"  Synced average:    {syn['data_freshness_sec'].mean():.0f} sec")
print(f"  Synced max:        {syn['data_freshness_sec'].max():.0f} sec ({syn['data_freshness_sec'].max()/3600:.1f} hours)")

Expected output:

Data freshness (seconds since last update):
  Federated average: 0 sec (always 0, real-time)
  Synced average:    3660 sec
  Synced max:        14400 sec (4.0 hours)
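Whether a given staleness matters depends on the use case. The sketch below turns that idea into a simple threshold check; the `MAX_AGE_SEC` thresholds and the `fresh_enough` helper are illustrative assumptions, not values or functions from the dataset or the lab files.

```python
# Maximum acceptable data age per use case, in seconds.
# These thresholds are illustrative assumptions, not values from the dataset.
MAX_AGE_SEC = {
    "product catalog": 24 * 3600,  # a day-old catalog is assumed acceptable
    "stock prices": 60,            # prices older than a minute are assumed unusable
}


def fresh_enough(data_freshness_sec: int, use_case: str) -> bool:
    """Return True if data of this age is acceptable for the use case."""
    return data_freshness_sec <= MAX_AGE_SEC[use_case]


synced_max = 14400  # worst synced freshness reported above (4.0 hours)

catalog_ok = fresh_enough(synced_max, "product catalog")
prices_ok = fresh_enough(synced_max, "stock prices")
federated_prices_ok = fresh_enough(0, "stock prices")

print(f"Synced catalog OK:       {catalog_ok}")
print(f"Synced stock prices OK:  {prices_ok}")
print(f"Federated stock prices:  {federated_prices_ok}")
```

Under these assumed thresholds, a four-hour-old index is fine for the catalog but not for stock prices, which matches the scenario's split between indexed and real-time sources.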

3c: Latency Distribution

print("Latency ranges:")
for ctype, group in df.groupby("connector_type"):
    print(f"  {ctype}: {group['latency_ms'].min()}–{group['latency_ms'].max()} ms "
          f"(median: {group['latency_ms'].median():.0f} ms)")

Expected output:

Latency ranges:
  federated: 290–680 ms (median: 465 ms)
  synced: 88–150 ms (median: 105 ms)
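The federated figures can be cross-checked without pandas. This sketch uses the ten federated latencies listed in the Knowledge Check answer for Q3 and Python's standard `statistics` module; only the variable names are invented here.

```python
import statistics

# The ten federated latencies (ms) listed in the Knowledge Check answer for Q3.
federated_latency_ms = [450, 520, 680, 380, 550, 420, 610, 290, 480, 350]

mean_ms = statistics.mean(federated_latency_ms)
median_ms = statistics.median(federated_latency_ms)
low_ms = min(federated_latency_ms)
high_ms = max(federated_latency_ms)

print(f"mean={mean_ms} ms, median={median_ms} ms, range={low_ms}–{high_ms} ms")
```

With an even number of values, the median is the average of the two middle values (450 and 480), which is why it lands on 465 rather than on one of the observed latencies.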

Step 4: Compliance Analysis

Determine which queries involve regulated data that cannot be indexed:

4a: Non-Compliant Queries

non_compliant = df[df["compliant"] == False]
print(f"Non-compliant queries: {len(non_compliant)}")
print(f"\nDetails:")
print(non_compliant[["query_id", "query_text", "connector_type"]].to_string(index=False))

Expected output:

Non-compliant queries: 3

Details:
query_id               query_text connector_type
     Q10  Patient medical records         synced
     Q12      Employee salary data         synced
     Q18    Legal contract clauses         synced

4b: Why Synced Is Non-Compliant for Regulated Data

# Compare federated vs synced for the same regulated queries
regulated_queries = ["Patient medical records", "Employee salary data", "Legal contract clauses"]
for query_text in regulated_queries:
    rows = df[df["query_text"] == query_text]
    fed_row = rows[rows["connector_type"] == "federated"].iloc[0]
    syn_row = rows[rows["connector_type"] == "synced"].iloc[0]
    print(f"\n{query_text}:")
    print(f"  Federated: compliant={fed_row['compliant']}, latency={fed_row['latency_ms']}ms, freshness={fed_row['data_freshness_sec']}s")
    print(f"  Synced:    compliant={syn_row['compliant']}, latency={syn_row['latency_ms']}ms, freshness={syn_row['data_freshness_sec']}s")

Expected output:

Patient medical records:
  Federated: compliant=True, latency=550ms, freshness=0s
  Synced:    compliant=False, latency=130ms, freshness=3600s

Employee salary data:
  Federated: compliant=True, latency=420ms, freshness=0s
  Synced:    compliant=False, latency=105ms, freshness=1800s

Legal contract clauses:
  Federated: compliant=True, latency=480ms, freshness=0s
  Synced:    compliant=False, latency=115ms, freshness=7200s

Compliance Is Non-Negotiable

A synced connector copies data to Microsoft servers during indexing. For regulated data (HIPAA, GDPR, SOX), that copy violates the data residency and sovereignty requirements assumed in this lab. A federated connector (for example, an MCP server) keeps data in the source system: query results are returned at runtime and never stored.
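The price of staying compliant is extra latency. As a quick worked calculation, the sketch below takes the federated and synced latencies for the three regulated queries from the Step 4b output and computes the overhead per query; the variable names are arbitrary.

```python
# (query_text, federated latency in ms, synced latency in ms), from the Step 4b output
regulated = [
    ("Patient medical records", 550, 130),
    ("Employee salary data", 420, 105),
    ("Legal contract clauses", 480, 115),
]

# Extra milliseconds paid per query for keeping the data in the source system.
extra_ms = {query: fed - syn for query, fed, syn in regulated}
average_extra_ms = sum(extra_ms.values()) / len(extra_ms)

for query, extra in extra_ms.items():
    print(f"{query}: +{extra} ms")
print(f"Average overhead: {average_extra_ms:.0f} ms")
```

For these three queries the overhead is a few hundred milliseconds each, which the lab treats as an acceptable cost because the synced alternative is not an option for this data.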


Step 5: When to Use Each Connector Type

Based on the analysis, here are the decision criteria:

Decision Matrix

Criterion | Use Federated | Use Synced
Regulated data (HIPAA, GDPR, SOX) | ✅ Required | ❌ Non-compliant
Real-time freshness needed | ✅ Always current | ❌ Stale (crawl delay)
Low latency critical | ❌ ~473 ms avg | ✅ ~110 ms avg
Source may be offline | ❌ Requires source | ✅ Works from index
Large result sets | ❌ Runtime cost | ✅ Pre-indexed
Infrequently changing data | ⚠️ Overkill | ✅ Crawl catches updates
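One way to read the matrix is as an ordered set of rules, with compliance checked first. The function below is a minimal sketch of that reading; `choose_connector` and its two parameters are hypothetical and cover only the first two rows of the matrix, with synced as the default for everything else.

```python
def choose_connector(regulated: bool, needs_realtime: bool) -> str:
    """Pick a connector type; compliance is checked before anything else."""
    if regulated:
        # Regulated data must stay in the source system, whatever the latency cost.
        return "federated"
    if needs_realtime:
        # Stale answers are assumed to be worse than slow ones for this data.
        return "federated"
    # Default: unregulated data that tolerates crawl delay benefits from the index.
    return "synced"


catalog_choice = choose_connector(regulated=False, needs_realtime=False)
records_choice = choose_connector(regulated=True, needs_realtime=False)
tracking_choice = choose_connector(regulated=False, needs_realtime=True)

print(f"Product catalog:         {catalog_choice}")
print(f"Patient medical records: {records_choice}")
print(f"Shipping tracking:       {tracking_choice}")
```

Checking `regulated` first reflects the lab's position that compliance is not traded against latency or offline access; the remaining rows of the matrix would only refine the choice among unregulated sources.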

OutdoorGear Recommendations

# Each recommendation has the form "<connector type> - <reason>"
recommendations = {
    "Product catalog": "Synced - low latency, not regulated, changes infrequently",
    "Order history": "Synced - historical data, benefits from indexing",
    "Patient medical records": "Federated - HIPAA regulated, must not leave source",
    "Employee salary data": "Federated - PII/compensation data, compliance required",
    "Real-time stock prices": "Federated - must be current, stale data is worse than slow",
    "Legal contracts": "Federated - SOX regulated, data sovereignty required",
    "Product reviews": "Synced - public data, benefits from fast search",
    "Shipping tracking": "Federated - real-time status updates needed",
}

print("OutdoorGear Connector Recommendations:")
for source, rec in recommendations.items():
    # Split once on the separator to isolate the connector type from the reason
    connector_type, reason = rec.split(" - ", 1)
    icon = "🔄" if connector_type == "Federated" else "📦"
    print(f"  {icon} {connector_type}  {source}: {reason}")

🐛 Bug-Fix Exercise

The file lab-056/broken_connector.py has 3 bugs in the connector analysis functions. Can you find and fix them all?

Run the self-tests to see which ones fail:

python lab-056/broken_connector.py

You should see 3 failed tests. Each test corresponds to one bug:

Test | What it checks | Hint
Test 1 | Average freshness by type | Should return data_freshness_sec, not latency_ms
Test 2 | Non-compliant count | Should count compliant == False, not compliant == True
Test 3 | Latency ratio | Should compute federated / synced, not synced / federated

Fix all 3 bugs, then re-run. When you see 🎉 All 3 tests passed, you're done!


🧠 Knowledge Check

Q1 (Multiple Choice): What is the primary advantage of a federated connector over a synced connector?
  • A) Lower latency for all query types
  • B) Real-time data freshness with no indexing; data never leaves the source
  • C) Better support for offline access
  • D) Simpler authentication setup
✅ Reveal Answer

Correct: B) Real-time data freshness with no indexing; data never leaves the source

Federated connectors query the source system at runtime, ensuring results are always current (0-second freshness). Because no data is copied or indexed, it remains in the source system, which keeps it compliant with data residency requirements (HIPAA, GDPR, SOX).

Q2 (Multiple Choice): Why are synced connectors non-compliant for regulated data like patient medical records?
  • A) Synced connectors don't support encryption
  • B) Data is copied to Microsoft servers during indexing, violating data residency requirements
  • C) Synced connectors cannot handle large datasets
  • D) Synced connectors don't support OAuth authentication
✅ Reveal Answer

Correct: B) Data is copied to Microsoft servers during indexing, violating data residency requirements

When a synced connector crawls a data source, it copies the content to Microsoft's search index. For regulated data (HIPAA patient records, GDPR personal data, SOX financial data), this violates data sovereignty and residency requirements. The data must remain in the source system, and only federated connectors guarantee this.

Q3 (Run the Lab): What is the average latency for federated connector queries?

Filter connector_comparison.csv by connector_type == "federated" and compute latency_ms.mean().

✅ Reveal Answer

473 ms

The 10 federated queries have latencies: 450, 520, 680, 380, 550, 420, 610, 290, 480, 350. Sum = 4730, average = 4730 ÷ 10 = 473 ms.

Q4 (Run the Lab): How many synced queries are non-compliant?

Filter for connector_type == "synced" and compliant == False.

✅ Reveal Answer

3

Three synced queries are non-compliant: Q10 (Patient medical records), Q12 (Employee salary data), and Q18 (Legal contract clauses). These involve regulated data that must not be copied to external servers.

Q5 (Run the Lab): What is the approximate federated-to-synced latency ratio?

Divide the average federated latency by the average synced latency.

✅ Reveal Answer

≈ 4.3×

Average federated latency = 473 ms. Average synced latency ≈ 110 ms. Ratio = 473 ÷ 110 ≈ 4.3×. Federated queries take about 4.3 times as long as synced queries, which is the trade-off for real-time freshness and compliance.


Summary

Topic | What You Learned
Connector Types | Synced (indexed, fast, stale) vs Federated (real-time, compliant, slower)
MCP as Connector | MCP servers can serve as federated connectors for M365 Copilot
Compliance | Regulated data requires federated connectors; synced copies data to Microsoft
Latency Trade-off | Federated ≈ 4.3× slower but always fresh; synced is fast but stale
Decision Criteria | Choose based on regulation, freshness needs, latency tolerance, and offline access

Next Steps

  • Lab 054: A2A Protocol: Build Interoperable Multi-Agent Systems
  • Lab 055: A2A + MCP Full Stack: Agent Interoperability Capstone