Lab 056: Federated M365 Copilot Connectors with MCP
What You'll Learn
- The difference between synced (indexed) connectors and federated (real-time) connectors in Microsoft 365 Copilot
- How an MCP server can act as a federated connector, providing real-time data access without indexing
- How citations work in federated vs synced connectors
- OAuth and compliance considerations for regulated data (healthcare, legal, finance)
- When to choose each connector type based on latency, freshness, and compliance requirements
Introduction
Microsoft 365 Copilot uses connectors to bring external data into the Copilot experience. There are two fundamental architectures:
| Connector Type | How It Works | Data Location |
|---|---|---|
| Synced (Indexed) | Crawls and copies data into the Microsoft Search index | Data stored on Microsoft servers |
| Federated (Real-Time) | Queries the source system at runtime; no data is copied | Data stays in the source system |
Each approach has trade-offs:
| Dimension | Federated | Synced |
|---|---|---|
| Latency | Higher (real-time query) | Lower (pre-indexed) |
| Data Freshness | Always current (0 sec) | Depends on crawl schedule |
| Compliance | Data never leaves source | Data copied to Microsoft servers |
| Offline Access | Requires source availability | Works even if source is down |
The Scenario
OutdoorGear Inc. needs to connect multiple data sources to Microsoft 365 Copilot:
- Product catalog and order history: can be indexed (synced) for fast search
- Patient medical records, employee salary data, and legal contracts: regulated data that must never leave the source system (federated only)
- Real-time stock prices and shipping tracking: need the freshest data possible
Your job is to analyze a comparison dataset of 20 query runs (10 queries, each run once through a federated connector and once through a synced connector) and determine when each connector type is the right choice.
MCP as a Federated Connector
An MCP server can serve as a federated connector for M365 Copilot. The MCP server queries the source system in real time and returns results with citations; no data is ever indexed or stored on Microsoft servers. This makes MCP ideal for regulated data that must comply with HIPAA, GDPR, or SOX requirements.
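MCP messages use JSON-RPC 2.0, and a tools/call result carries a list of content items. The sketch below shows what a federated lookup might return. The tool name lookup_patient_record, the record data, and the example.com citation URL are hypothetical, and embedding the citation in the text is a simplification, not a Copilot-specific citation format. It builds the messages by hand instead of using an MCP SDK so that it runs with the standard library alone.

```python
import json

# Hypothetical stand-in for the live source system. A real server would query
# the system of record here (database, EHR API, etc.) at request time.
SOURCE_RECORDS = {
    "P-1001": {
        "summary": "Annual checkup, no issues noted",
        "url": "https://records.example.com/patients/P-1001",
    },
}


def handle_tools_call(request: dict) -> dict:
    """Answer a JSON-RPC 2.0 tools/call request by reading the source at call time.

    Nothing is cached or indexed: the record is looked up, formatted, and
    returned with its source link so the client can show a citation.
    """
    record_id = request["params"]["arguments"]["record_id"]
    record = SOURCE_RECORDS.get(record_id)
    if record is None:
        text = f"No record found for {record_id}"
    else:
        text = f"{record['summary']} (source: {record['url']})"
    result = {"content": [{"type": "text", "text": text}], "isError": False}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "tools/call",
    "params": {"name": "lookup_patient_record", "arguments": {"record_id": "P-1001"}},
}
print(json.dumps(handle_tools_call(request), indent=2))
```

The key property is that the handler reads the source on every call, so nothing outlives the request.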
Prerequisites
| Requirement | Why |
|---|---|
| Python 3.10+ | Analyze connector comparison data |
| pandas library | DataFrame operations |
📦 Supporting Files
Download these files before starting the lab
Save all files to a lab-056/ folder in your working directory.
| File | Description | Download |
|---|---|---|
| broken_connector.py | Bug-fix exercise (3 bugs + self-tests) | 📥 Download |
| connector_comparison.csv | Dataset | 📥 Download |
Step 1: Understanding Connector Types
Synced (Indexed) Connectors
Synced connectors crawl a data source on a schedule and copy the content into the Microsoft Search index:
Source System → crawl (on a schedule) → Microsoft Graph Connector → index (fast) → Copilot Search
- ✅ Fast queries: data is pre-indexed
- ✅ Works offline: the source system can be down
- ❌ Stale data: freshness depends on crawl frequency
- ❌ Compliance risk: data is copied to Microsoft servers
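The staleness trade-off can be seen in a few lines of plain Python. This is a toy model, not the Microsoft Graph connector API: the class name SyncedConnector and its crawl method are hypothetical, and the "index" is just a dictionary snapshot taken at crawl time.

```python
import copy


class SyncedConnector:
    """Toy model of a synced connector: copies the source into its own index."""

    def __init__(self, source: dict):
        self.source = source
        self.index = {}  # stands in for the copy held in the search index

    def crawl(self) -> None:
        # A scheduled crawl copies the whole source into the index.
        # deepcopy so later edits to the source do not leak into the index.
        self.index = copy.deepcopy(self.source)

    def query(self, key: str):
        # Queries never touch the source; they read the last crawled copy.
        return self.index.get(key)


source = {"tent-42": {"price": 299}}
synced = SyncedConnector(source)
synced.crawl()

source["tent-42"]["price"] = 249   # the source changes after the crawl
print(synced.query("tent-42"))     # {'price': 299}, stale until the next crawl

synced.crawl()
print(synced.query("tent-42"))     # {'price': 249}
```

The same snapshot is also why a synced connector keeps answering when the source is down: queries only ever read the index.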
Federated (Real-Time) Connectors
Federated connectors query the source system at runtime, and no data is ever copied:
Source System → real-time query → Federated Connector (MCP Server) → results + citation → Copilot Search
- ✅ Always fresh: queries live data
- ✅ Compliant: data never leaves the source
- ✅ Citations: responses include source links
- ❌ Higher latency: real-time query overhead
- ❌ Source dependency: requires the source system to be available
Step 2: Load the Comparison Dataset
The dataset contains 20 rows: 10 distinct queries, each run once through a federated connector and once through a synced connector:
import pandas as pd
df = pd.read_csv("lab-056/connector_comparison.csv")
print(f"Total queries: {len(df)}")
print(f"Connector types: {df['connector_type'].unique().tolist()}")
print(f"Columns: {list(df.columns)}")
print(f"\nFirst 6 rows:")
print(df.head(6).to_string(index=False))
Expected output:
Total queries: 20
Connector types: ['federated', 'synced']
Columns: ['query_id', 'query_text', 'connector_type', 'latency_ms', 'results_count',
'data_freshness_sec', 'data_size_kb', 'compliant']
First 6 rows:
query_id query_text connector_type latency_ms results_count data_freshness_sec data_size_kb compliant
Q01 Show all hiking boots federated 450 5 0 12 True
Q02 Show all hiking boots synced 120 5 3600 12 True
Q03 Find tents under $300 federated 520 3 0 8 True
Q04 Find tents under $300 synced 95 3 7200 8 True
Q05 Customer order history C001 federated 680 4 0 15 True
Q06 Customer order history C001 synced 150 4 1800 15 True
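Because every query text should appear exactly twice, once per connector type, it is worth checking that pairing before comparing averages. The sketch below uses only the standard library and an inline copy of the six rows shown above (the exact CSV formatting, such as lowercase true, is an assumption); with the real data you would open lab-056/connector_comparison.csv instead of the string.

```python
import csv
import io
from collections import defaultdict

# Inline copy of the first six rows shown above; swap in
# open("lab-056/connector_comparison.csv", newline="") for the full dataset.
SAMPLE = """query_id,query_text,connector_type,latency_ms,results_count,data_freshness_sec,data_size_kb,compliant
Q01,Show all hiking boots,federated,450,5,0,12,true
Q02,Show all hiking boots,synced,120,5,3600,12,true
Q03,Find tents under $300,federated,520,3,0,8,true
Q04,Find tents under $300,synced,95,3,7200,8,true
Q05,Customer order history C001,federated,680,4,0,15,true
Q06,Customer order history C001,synced,150,4,1800,15,true
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# Map each query text to the set of connector types it was run through.
types_by_query = defaultdict(set)
for row in rows:
    types_by_query[row["query_text"]].add(row["connector_type"])

# Any query missing one of the two connector types breaks the comparison.
unpaired = {q: t for q, t in types_by_query.items() if t != {"federated", "synced"}}
print(f"{len(rows)} rows, {len(types_by_query)} distinct queries, unpaired: {unpaired}")
```

On the full file the same check should report 20 rows and 10 distinct queries with nothing unpaired.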
Step 3: Compare Latency vs Freshness
Analyze the performance trade-offs between connector types:
3a: Average Latency by Type
fed = df[df["connector_type"] == "federated"]
syn = df[df["connector_type"] == "synced"]
avg_fed_latency = fed["latency_ms"].mean()
avg_syn_latency = syn["latency_ms"].mean()
ratio = avg_fed_latency / avg_syn_latency
print(f"Average federated latency: {avg_fed_latency:.0f} ms")
print(f"Average synced latency: {avg_syn_latency:.1f} ms")
print(f"Federated/Synced ratio: {ratio:.1f}×")
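Expected output: an average federated latency of 473 ms, an average synced latency of about 110 ms, and a Federated/Synced ratio of about 4.3×.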
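The 473 ms figure can be checked by hand from the ten federated latencies listed in the Knowledge Check (Q3). The synced average used below (110 ms) is the approximate value quoted in the decision matrix and Q5, not a recomputed one, so the ratio is approximate as well.

```python
from statistics import mean

# Federated latencies (ms) as listed in Knowledge Check Q3
federated_ms = [450, 520, 680, 380, 550, 420, 610, 290, 480, 350]
approx_synced_avg_ms = 110  # approximate synced average quoted in this lab

fed_avg = mean(federated_ms)            # 4730 / 10
ratio = fed_avg / approx_synced_avg_ms  # 473 / 110
print(f"Federated average: {fed_avg} ms")
print(f"Approximate Federated/Synced ratio: {ratio:.1f}×")
```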
3b: Freshness Comparison
print("Data freshness (seconds since last update):")
print(f" Federated average: {fed['data_freshness_sec'].mean():.0f} sec (always 0, real-time)")
print(f" Synced average: {syn['data_freshness_sec'].mean():.0f} sec")
print(f" Synced max: {syn['data_freshness_sec'].max():.0f} sec ({syn['data_freshness_sec'].max()/3600:.1f} hours)")
Expected output:
Data freshness (seconds since last update):
Federated average: 0 sec (always 0, real-time)
Synced average: 3660 sec
Synced max: 14400 sec (4.0 hours)
3c: Latency Distribution
print("Latency ranges:")
for ctype, group in df.groupby("connector_type"):
    # groupby yields one (name, sub-DataFrame) pair per connector type
    print(f" {ctype}: {group['latency_ms'].min()}–{group['latency_ms'].max()} ms "
          f"(median: {group['latency_ms'].median():.0f} ms)")
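Expected output: the federated line reads "federated: 290–680 ms (median: 465 ms)", which follows from the ten federated latencies listed in Q3; the synced line shows a much lower range, consistent with its average of about 110 ms.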
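For the federated rows, the range and median can be worked out from the ten latencies listed in Q3. With an even number of values, the median is the mean of the two middle values; statistics.median and pandas' Series.median() both follow that convention.

```python
from statistics import median

# Federated latencies (ms) as listed in Knowledge Check Q3
federated_ms = [450, 520, 680, 380, 550, 420, 610, 290, 480, 350]

ordered = sorted(federated_ms)  # [290, 350, 380, 420, 450, 480, 520, 550, 610, 680]
# Ten values: the median averages the 5th and 6th (450 and 480).
print(f"federated: {min(ordered)}–{max(ordered)} ms (median: {median(ordered):.0f} ms)")
```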
Step 4: Compliance Analysis
Determine which queries involve regulated data that cannot be indexed:
4a: Non-Compliant Queries
non_compliant = df[df["compliant"] == False]
print(f"Non-compliant queries: {len(non_compliant)}")
print(f"\nDetails:")
print(non_compliant[["query_id", "query_text", "connector_type"]].to_string(index=False))
Expected output:
Non-compliant queries: 3
Details:
query_id query_text connector_type
Q10 Patient medical records synced
Q12 Employee salary data synced
Q18 Legal contract clauses synced
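The comparison against False works above because pandas' read_csv converts true/false text in the compliant column to booleans. csv.DictReader returns every field as a string, so a comparison like row["compliant"] == False would silently match nothing. The stdlib sketch below makes that conversion explicit. The inline rows reuse the regulated-data values reported in Step 4b, with query_id omitted because the lab does not list the federated IDs; parse_bool is a hypothetical helper.

```python
import csv
import io

# Regulated-data rows as reported in Step 4b (query_id column omitted)
SAMPLE = """query_text,connector_type,latency_ms,data_freshness_sec,compliant
Patient medical records,federated,550,0,true
Patient medical records,synced,130,3600,false
Employee salary data,federated,420,0,true
Employee salary data,synced,105,1800,false
Legal contract clauses,federated,480,0,true
Legal contract clauses,synced,115,7200,false
"""


def parse_bool(value: str) -> bool:
    """Hypothetical helper: accept true/false in any letter case, reject anything else."""
    normalized = value.strip().lower()
    if normalized not in ("true", "false"):
        raise ValueError(f"unexpected boolean value: {value!r}")
    return normalized == "true"


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
non_compliant = [r for r in rows if not parse_bool(r["compliant"])]

for r in non_compliant:
    print(r["query_text"], r["connector_type"])
```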
4b: Why Synced Is Non-Compliant for Regulated Data
# Compare federated vs synced for the same regulated queries
regulated_queries = ["Patient medical records", "Employee salary data", "Legal contract clauses"]
for query_text in regulated_queries:
    # Each regulated query has exactly one federated row and one synced row
    rows = df[df["query_text"] == query_text]
    fed_row = rows[rows["connector_type"] == "federated"].iloc[0]
    syn_row = rows[rows["connector_type"] == "synced"].iloc[0]
    print(f"\n{query_text}:")
    print(f" Federated: compliant={fed_row['compliant']}, latency={fed_row['latency_ms']}ms, freshness={fed_row['data_freshness_sec']}s")
    print(f" Synced: compliant={syn_row['compliant']}, latency={syn_row['latency_ms']}ms, freshness={syn_row['data_freshness_sec']}s")
Expected output:
Patient medical records:
Federated: compliant=True, latency=550ms, freshness=0s
Synced: compliant=False, latency=130ms, freshness=3600s
Employee salary data:
Federated: compliant=True, latency=420ms, freshness=0s
Synced: compliant=False, latency=105ms, freshness=1800s
Legal contract clauses:
Federated: compliant=True, latency=480ms, freshness=0s
Synced: compliant=False, latency=115ms, freshness=7200s
Compliance Is Non-Negotiable
For regulated data (HIPAA, GDPR, SOX), the synced connector copies data to Microsoft servers during indexing. This violates data residency and sovereignty requirements. The federated connector (e.g., an MCP server) keeps data in the source system: only query results are returned at runtime, and they are never stored.
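One way to make this rule hard to bypass is to check a data source's classification before any sync job is allowed to run. The sketch below is hypothetical: REGULATED_TAGS, ComplianceError, and plan_connector are illustrative names, and in practice the classification would come from your organization's data governance tooling rather than a hard-coded set.

```python
REGULATED_TAGS = {"HIPAA", "GDPR", "SOX"}  # hypothetical classification tags


class ComplianceError(Exception):
    pass


def plan_connector(source_name: str, tags: set[str], requested: str) -> str:
    """Return the connector type to use, refusing to sync regulated sources."""
    regulated = tags & REGULATED_TAGS
    if regulated and requested == "synced":
        raise ComplianceError(
            f"{source_name} is tagged {sorted(regulated)}; indexing would copy it out of the source"
        )
    return "federated" if regulated else requested


print(plan_connector("Product catalog", set(), "synced"))
print(plan_connector("Patient medical records", {"HIPAA"}, "federated"))
try:
    plan_connector("Employee salary data", {"GDPR"}, "synced")
except ComplianceError as err:
    print(f"Blocked: {err}")
```

Failing loudly at planning time is a deliberate choice: a silent fallback could hide a misconfigured sync job.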
Step 5: When to Use Each Connector Type
Based on the analysis, here are the decision criteria:
Decision Matrix
| Criterion | Use Federated | Use Synced |
|---|---|---|
| Regulated data (HIPAA, GDPR, SOX) | ✅ Required | ❌ Non-compliant |
| Real-time freshness needed | ✅ Always current | ❌ Stale (crawl delay) |
| Low latency critical | ❌ ~473 ms avg | ✅ ~110 ms avg |
| Source may be offline | ❌ Requires source | ✅ Works from index |
| Large result sets | ❌ Runtime cost | ✅ Pre-indexed |
| Infrequently changing data | ⚠️ Overkill | ✅ Crawl catches updates |
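The matrix can be read as an ordered set of rules: compliance first, then freshness, with everything else defaulting to synced. The function below is a hypothetical encoding of that reading. The parameter names and the rule order are assumptions, not something the lab prescribes. It is applied here to three of the OutdoorGear sources.

```python
def choose_connector(regulated: bool, needs_realtime: bool) -> tuple[str, str]:
    """Return (connector type, reason) following the decision matrix above."""
    if regulated:
        # Indexing would copy regulated data to Microsoft servers.
        return "federated", "regulated data must stay in the source"
    if needs_realtime:
        # Synced results lag by up to one crawl interval.
        return "federated", "crawl delay would make results stale"
    # Not regulated and tolerant of crawl delay: prefer the fast, offline-capable index.
    return "synced", "pre-indexed results are faster and survive source outages"


sources = {
    "Patient medical records": (True, False),
    "Shipping tracking": (False, True),
    "Product catalog": (False, False),
}
for name, (regulated, realtime) in sources.items():
    connector, reason = choose_connector(regulated, realtime)
    print(f"{name}: {connector} ({reason})")
```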
OutdoorGear Recommendations
# Each source maps to (connector type, reason)
recommendations = {
    "Product catalog": ("Synced", "low latency, not regulated, changes infrequently"),
    "Order history": ("Synced", "historical data, benefits from indexing"),
    "Patient medical records": ("Federated", "HIPAA regulated, must not leave source"),
    "Employee salary data": ("Federated", "PII/compensation data, compliance required"),
    "Real-time stock prices": ("Federated", "must be current; stale data is worse than slow"),
    "Legal contracts": ("Federated", "SOX regulated, data sovereignty required"),
    "Product reviews": ("Synced", "public data, benefits from fast search"),
    "Shipping tracking": ("Federated", "real-time status updates needed"),
}
print("OutdoorGear Connector Recommendations:")
for source, (connector, reason) in recommendations.items():
    print(f"  [{connector}] {source}: {reason}")
Bug-Fix Exercise
The file lab-056/broken_connector.py has 3 bugs in the connector analysis functions. Can you find and fix them all?
Run the self-tests to see which ones fail:
python lab-056/broken_connector.py
You should see 3 failed tests. Each test corresponds to one bug:
| Test | What it checks | Hint |
|---|---|---|
| Test 1 | Average freshness by type | Should return data_freshness_sec, not latency_ms |
| Test 2 | Non-compliant count | Should count compliant == False, not compliant == True |
| Test 3 | Latency ratio | Should compute federated / synced, not synced / federated |
Fix all 3 bugs, then re-run. When you see "All 3 tests passed", you're done!
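If you want to sanity-check your fixes, the three corrected computations could look roughly like this in plain Python. The function names and the list-of-dicts row format are hypothetical, and broken_connector.py may structure its functions differently, so treat this as a reference for the logic rather than a drop-in patch. The sample rows reuse regulated-data values from Step 4b.

```python
from statistics import mean


def avg_freshness(rows: list[dict], connector_type: str) -> float:
    # Test 1: average data_freshness_sec (not latency_ms) for one connector type
    return mean(r["data_freshness_sec"] for r in rows if r["connector_type"] == connector_type)


def count_non_compliant(rows: list[dict]) -> int:
    # Test 2: count rows whose compliant flag is False
    return sum(1 for r in rows if not r["compliant"])


def latency_ratio(rows: list[dict]) -> float:
    # Test 3: federated average divided by synced average (not the reverse)
    fed = mean(r["latency_ms"] for r in rows if r["connector_type"] == "federated")
    syn = mean(r["latency_ms"] for r in rows if r["connector_type"] == "synced")
    return fed / syn


rows = [
    {"connector_type": "federated", "latency_ms": 550, "data_freshness_sec": 0, "compliant": True},
    {"connector_type": "synced", "latency_ms": 130, "data_freshness_sec": 3600, "compliant": False},
    {"connector_type": "federated", "latency_ms": 420, "data_freshness_sec": 0, "compliant": True},
    {"connector_type": "synced", "latency_ms": 105, "data_freshness_sec": 1800, "compliant": False},
]
print(avg_freshness(rows, "synced"), count_non_compliant(rows), round(latency_ratio(rows), 2))
```

A quick way to spot the Test 3 bug in any version: the federated/synced ratio should be greater than 1, since federated queries are slower.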
🧠 Knowledge Check
Q1 (Multiple Choice): What is the primary advantage of a federated connector over a synced connector?
- A) Lower latency for all query types
- B) Real-time data freshness with no indexing; data never leaves the source
- C) Better support for offline access
- D) Simpler authentication setup
Correct: B) Real-time data freshness with no indexing; data never leaves the source
Federated connectors query the source system at runtime, so results are always current (0-second freshness). Because no data is copied or indexed, the data stays in the source system, which is what keeps federated connectors compliant with data residency requirements (HIPAA, GDPR, SOX).
Q2 (Multiple Choice): Why are synced connectors non-compliant for regulated data like patient medical records?
- A) Synced connectors don't support encryption
- B) Data is copied to Microsoft servers during indexing, violating data residency requirements
- C) Synced connectors cannot handle large datasets
- D) Synced connectors don't support OAuth authentication
Correct: B) Data is copied to Microsoft servers during indexing, violating data residency requirements
When a synced connector crawls a data source, it copies the content to Microsoft's search index. For regulated data (HIPAA patient records, GDPR personal data, SOX financial data), this violates data sovereignty and residency requirements. The data must remain in the source system, and only federated connectors guarantee this.
Q3 (Run the Lab): What is the average latency for federated connector queries?
Filter connector_comparison.csv by connector_type == "federated" and compute latency_ms.mean().
Answer: 473 ms
The 10 federated queries have latencies: 450, 520, 680, 380, 550, 420, 610, 290, 480, 350. Sum = 4730, average = 4730 ÷ 10 = 473 ms.
Q4 (Run the Lab): How many synced queries are non-compliant?
Filter for connector_type == "synced" and compliant == False.
Answer: 3
Three synced queries are non-compliant: Q10 (Patient medical records), Q12 (Employee salary data), and Q18 (Legal contract clauses). These involve regulated data that must not be copied to external servers.
Q5 (Run the Lab): What is the approximate federated-to-synced latency ratio?
Divide the average federated latency by the average synced latency.
Answer: ≈ 4.3×
Average federated latency = 473 ms. Average synced latency ≈ 110 ms. Ratio = 473 ÷ 110 ≈ 4.3×. Federated queries are about 4.3 times slower than synced queries, which is the trade-off for real-time freshness and compliance.
Summary
| Topic | What You Learned |
|---|---|
| Connector Types | Synced (indexed, fast, stale) vs Federated (real-time, compliant, slower) |
| MCP as Connector | MCP servers can serve as federated connectors for M365 Copilot |
| Compliance | Regulated data requires federated connectors; synced connectors copy data to Microsoft servers |
| Latency Trade-off | Federated ≈ 4.3× slower but always fresh; synced is fast but stale |
| Decision Criteria | Choose based on regulation, freshness needs, latency tolerance, and offline access |