Skip to content

Lab 052: Fabric IQ β€” Conversational Data Agent (NL β†’ SQL)ΒΆ

L200 All paths ~75 min Free β€” Uses SQLite locally (Fabric capacity optional)

What You'll LearnΒΆ

  • How Microsoft Fabric Data Agents translate natural-language questions into SQL, DAX, or KQL queries
  • The end-to-end flow of NL β†’ SQL generation, execution, and result presentation
  • How least-privilege access and Entra ID identity binding keep data secure at every step
  • Why query transparency and audit logging are critical for trust in AI-generated queries
  • How to enable self-serve analytics for non-technical users without exposing raw database access

IntroductionΒΆ

NL to SQL Flow

A Fabric Data Agent lets business users ask data questions in plain English. Behind the scenes the agent inspects the database schema, generates a SQL (or DAX / KQL) query, executes it under the caller's own Entra identity, and returns a formatted answer β€” all without the user writing a single line of code.

In this lab you will build a local simulation of that pipeline using SQLite and Python. The scenario is OutdoorGear, a fictitious outdoor-equipment retailer. The database contains two tables:

Table Description
products Product catalog β€” 10 items across categories such as Tents, Backpacks, Sleeping Bags, and Accessories
orders Order history β€” 15 orders referencing products by product_id

Non-technical users β€” store managers, marketing analysts, supply-chain planners β€” need to ask questions like "How many tents are in stock?" or "What is the total revenue?" without learning SQL. By the end of this lab you will understand exactly how a Fabric Data Agent answers those questions and why the security model matters.

PrerequisitesΒΆ

Requirement Notes
Python 3.10+ python.org/downloads
pandas pip install pandas β€” used to load CSV files into SQLite
sqlite3 Part of the Python standard library β€” no installation required

No Fabric capacity needed

This lab runs entirely on your local machine using SQLite. A Fabric capacity is only needed if you want to deploy a real Data Agent afterward.


Quick Start with GitHub Codespaces

Open in GitHub Codespaces

All dependencies are pre-installed in the devcontainer.

πŸ“¦ Supporting FilesΒΆ

Download these files before starting the lab

Save all files to a lab-052/ folder in your working directory.

File Description Download
broken_query_gen.py Bug-fix exercise (3 bugs + self-tests) πŸ“₯ Download
orders.csv Dataset πŸ“₯ Download
products.csv Dataset πŸ“₯ Download

Step 1: Understanding Fabric Data AgentsΒΆ

A Fabric Data Agent sits between the user and the data. When a user types a question the agent:

  1. Parses the natural-language input and identifies intent, entities, and filters.
  2. Inspects the connected data-source schema (tables, columns, relationships).
  3. Generates a query in the appropriate language β€” SQL for warehouses and SQL endpoints, DAX for semantic models, KQL for KQL databases.
  4. Executes the query under the user's own Entra ID. The agent never uses a service account with elevated privileges; it delegates to the caller's identity so that Row-Level Security (RLS) and object-level permissions are enforced automatically.
  5. Returns the result along with the generated query so the user (or an auditor) can inspect exactly what ran.

This design delivers three guarantees:

Guarantee How
Least-privilege Queries run as the authenticated user β€” no shared super-user
Transparency The generated SQL/DAX/KQL is always shown to the caller
Auditability Every query is logged with the user's identity and timestamp

Why transparency matters

If the agent generates an incorrect query the user can see β€” and report β€” the mistake. This feedback loop is essential for building trust in AI-generated analytics.


Step 2: Set Up the DatabaseΒΆ

In this step you will create a local SQLite database from two CSV files that ship with the lab.

2.1 Load the CSV files into SQLiteΒΆ

import sqlite3
import pandas as pd

conn = sqlite3.connect("lab-052/outdoor_gear.db")

pd.read_csv("lab-052/products.csv").to_sql("products", conn, if_exists="replace", index=False)
pd.read_csv("lab-052/orders.csv").to_sql("orders", conn, if_exists="replace", index=False)

print("βœ… Database created: lab-052/outdoor_gear.db")

2.2 Explore the schemaΒΆ

cursor = conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
tables = [row[0] for row in cursor.fetchall()]
print("Tables:", tables)

for table in tables:
    print(f"\n--- {table} ---")
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    for col in info:
        print(f"  {col[1]:20s} {col[2]}")

Expected output:

Tables: ['products', 'orders']

--- products ---
  product_id           TEXT
  product_name         TEXT
  category             TEXT
  price                REAL
  stock                INTEGER

--- orders ---
  order_id             TEXT
  product_id           TEXT
  customer_name        TEXT
  quantity             INTEGER
  total                REAL
  order_date           TEXT

2.3 Quick row countsΒΆ

for table in tables:
    count = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    print(f"{table}: {count} rows")
products: 10 rows
orders: 15 rows

Step 3: Build NL β†’ SQL Query PatternsΒΆ

A Fabric Data Agent maps natural-language questions to SQL queries. Below are five representative patterns that cover the most common question types: counting, aggregation, filtering, joining, and averaging.

Pattern 1 β€” Counting with a filterΒΆ

User asks: "How many tents are in stock?"

SELECT COUNT(*)
FROM   products
WHERE  category = 'Tents'
  AND  stock > 0;

Expected result: 2

The stock > 0 filter matters

Without the stock > 0 clause the query would count products that exist in the catalog even if they are out of stock. A well-designed agent always applies the in-stock filter when the user says "in stock."


Pattern 2 β€” Sum aggregationΒΆ

User asks: "What is the total revenue?"

SELECT SUM(total)
FROM   orders;

Expected result: 3209.74

Revenue comes from the orders table β€” not from multiplying price Γ— stock in the products table. This is a common mistake in NL β†’ SQL systems.


Pattern 3 β€” Simple filter / SELECT *ΒΆ

User asks: "Show all backpacks"

SELECT *
FROM   products
WHERE  category = 'Backpacks';

This returns all columns for products in the Backpacks category.


Pattern 4 β€” JOIN + GROUP BY + ORDER BYΒΆ

User asks: "Which product has the most orders?"

SELECT   p.product_name,
         COUNT(*) AS order_count
FROM     orders o
JOIN     products p ON o.product_id = p.product_id
GROUP BY p.product_name
ORDER BY order_count DESC
LIMIT    1;

Expected result: Alpine Explorer Tent β€” 3 orders

COUNT(*) vs SUM(quantity)

"Most orders" means the highest number of order rows, not the highest total quantity. The correct aggregate is COUNT(*), not SUM(quantity).


Pattern 5 β€” Average aggregationΒΆ

User asks: "Average order value?"

SELECT AVG(total)
FROM   orders;

Expected result: 213.98

Verification: total revenue is 3,209.74 and there are 15 orders β†’ 3,209.74 Γ· 15 = 213.9827 β‰ˆ 213.98.


Step 4: Run Queries and VerifyΒΆ

Execute every pattern against the local SQLite database and confirm the results match the expected values.

queries = {
    "How many tents are in stock?": (
        "SELECT COUNT(*) FROM products WHERE category='Tents' AND stock > 0",
        "2",
    ),
    "What is the total revenue?": (
        "SELECT SUM(total) FROM orders",
        "3209.74",
    ),
    "Show all backpacks": (
        "SELECT * FROM products WHERE category='Backpacks'",
        None,  # tabular result β€” just display
    ),
    "Which product has the most orders?": (
        "SELECT p.product_name, COUNT(*) AS order_count "
        "FROM orders o JOIN products p ON o.product_id=p.product_id "
        "GROUP BY p.product_name ORDER BY order_count DESC LIMIT 1",
        "Alpine Explorer Tent|3",
    ),
    "Average order value?": (
        "SELECT AVG(total) FROM orders",
        "213.98",
    ),
}

print("=" * 60)
for question, (sql, expected) in queries.items():
    print(f"\n❓ {question}")
    print(f"   SQL ➜ {sql}")
    result = conn.execute(sql).fetchall()
    print(f"   Result: {result}")
    if expected:
        print(f"   Expected: {expected}")
print("\n" + "=" * 60)

Compare carefully

If any result does not match, re-check the CSV data and the query. Mismatches usually come from an incorrect filter or the wrong aggregation function.


Step 5: Security and AuditΒΆ

In a production Fabric deployment the same queries you ran locally would be executed through the Data Agent with full enterprise security. This section explains the key safeguards.

Entra ID identity bindingΒΆ

Every query is executed under the calling user's Entra ID token. The Data Agent does not have its own database credentials β€” it delegates authentication to the identity provider. This means:

  • A store manager sees only their store's data (if RLS is configured).
  • A marketing analyst can query aggregate revenue but cannot see individual customer records.
  • An external auditor can review query logs tied to specific user identities.

Row-Level Security (RLS)ΒΆ

Fabric supports RLS on SQL endpoints and semantic models. When the Data Agent generates a query, the database engine automatically applies RLS filters based on the authenticated user's identity. The agent itself never modifies or strips these filters.

Query logging and auditΒΆ

Every generated query β€” along with the user identity, timestamp, and result row count β€” is recorded in the Fabric activity log. This enables:

Capability Benefit
Compliance reporting Prove who accessed what data and when
Anomaly detection Flag unusual query patterns (e.g., bulk exports)
Agent improvement Identify frequently failed queries and improve the NL β†’ SQL model

Local simulation

In this lab you are running queries directly against SQLite, so there is no Entra binding or RLS. In a real Fabric deployment these controls are enforced automatically.


Bug-Fix ExerciseΒΆ

The file lab-052/broken_query_gen.py contains a simplified NL β†’ SQL generator with three bugs. Your task is to find and fix each one.

Run the broken scriptΒΆ

python lab-052/broken_query_gen.py

Bug 1 β€” Missing stock > 0 filterΒΆ

# ❌ BUG: counts all products in the category, including out-of-stock
def count_in_stock(category):
    return f"SELECT COUNT(*) FROM products WHERE category='{category}'"

Fix: Add AND stock > 0 to the WHERE clause.

# βœ… FIXED
def count_in_stock(category):
    return f"SELECT COUNT(*) FROM products WHERE category='{category}' AND stock > 0"

Bug 2 β€” Revenue uses price Γ— stock instead of order totalsΒΆ

# ❌ BUG: calculates potential inventory value, not actual revenue
def total_revenue():
    return "SELECT SUM(price * stock) FROM products"

Fix: Query the orders table instead.

# βœ… FIXED
def total_revenue():
    return "SELECT SUM(total) FROM orders"

Bug 3 β€” Most ordered uses quantity DESC instead of COUNT(*)ΒΆ

# ❌ BUG: returns the order with the highest single-order quantity,
#          not the product with the most orders
def most_ordered_product():
    return (
        "SELECT p.product_name, o.quantity "
        "FROM orders o JOIN products p ON o.product_id=p.product_id "
        "ORDER BY o.quantity DESC LIMIT 1"
    )

Fix: Group by product and count order rows.

# βœ… FIXED
def most_ordered_product():
    return (
        "SELECT p.product_name, COUNT(*) AS order_count "
        "FROM orders o JOIN products p ON o.product_id=p.product_id "
        "GROUP BY p.product_name ORDER BY order_count DESC LIMIT 1"
    )

Supporting FilesΒΆ

The following files are provided in the lab-052/ directory.

lab-052/products.csvΒΆ

product_id,product_name,category,price,stock
P001,Alpine Explorer Tent,Tents,349.99,5
P002,TrailMaster 2P Tent,Tents,199.99,8
P003,Summit Backpack 65L,Backpacks,189.99,12
P004,DayHiker 30L Pack,Backpacks,79.99,20
P005,Arctic Dream Sleeping Bag,Sleeping Bags,299.99,3
P006,Summer Lite Sleeping Bag,Sleeping Bags,89.99,15
P007,Trekking Poles Carbon,Accessories,59.99,25
P008,Headlamp ProBeam 400,Accessories,34.99,30
P009,Portable Water Filter,Accessories,34.92,18
P010,Camping Cookset Titanium,Accessories,124.99,7

lab-052/orders.csvΒΆ

order_id,product_id,customer_name,quantity,total,order_date
O001,P001,Alice Martin,1,349.99,2025-01-05
O002,P003,Bob Chen,1,189.99,2025-01-08
O003,P005,Carla Diaz,1,299.99,2025-01-10
O004,P002,David Kim,2,399.98,2025-01-12
O005,P007,Eva Novak,3,179.97,2025-01-15
O006,P001,Frank Osei,1,349.99,2025-01-17
O007,P004,Grace Liu,1,79.99,2025-01-20
O008,P008,Hiro Tanaka,2,69.98,2025-01-22
O009,P006,Isabelle Roy,1,89.99,2025-01-24
O010,P001,Jake Wilson,1,349.99,2025-01-27
O011,P009,Karen Patel,1,34.92,2025-01-29
O012,P003,Liam Murphy,1,189.99,2025-02-01
O013,P010,Mia Santos,1,124.99,2025-02-04
O014,P002,Noah Berg,1,199.99,2025-02-06
O015,P005,Olivia Park,1,299.99,2025-02-09

lab-052/broken_query_gen.pyΒΆ

"""Broken NL β†’ SQL generator β€” fix the three bugs!"""

import sqlite3

DB_PATH = "lab-052/outdoor_gear.db"

# ❌ BUG 1: Missing stock > 0 filter
def count_in_stock(category):
    return f"SELECT COUNT(*) FROM products WHERE category='{category}'"

# ❌ BUG 2: Uses price * stock instead of order totals
def total_revenue():
    return "SELECT SUM(price * stock) FROM products"

# ❌ BUG 3: Uses quantity DESC instead of COUNT(*)
def most_ordered_product():
    return (
        "SELECT p.product_name, o.quantity "
        "FROM orders o JOIN products p ON o.product_id=p.product_id "
        "ORDER BY o.quantity DESC LIMIT 1"
    )

def run(query_fn, *args):
    conn = sqlite3.connect(DB_PATH)
    sql = query_fn(*args)
    print(f"SQL: {sql}")
    result = conn.execute(sql).fetchall()
    print(f"Result: {result}\n")
    conn.close()

if __name__ == "__main__":
    print("--- Tents in stock ---")
    run(count_in_stock, "Tents")

    print("--- Total revenue ---")
    run(total_revenue)

    print("--- Most ordered product ---")
    run(most_ordered_product)

Knowledge CheckΒΆ

Q1 (Multiple Choice): What security model does a Fabric Data Agent use for query execution?
  • A) A shared service account with full database access
  • B) The user's own Entra ID with least-privilege permissions
  • C) An API key embedded in the agent configuration
  • D) Anonymous access with IP-based restrictions
βœ… Reveal Answer

Correct: B) The user's own Entra ID with least-privilege permissions

Fabric Data Agents execute every query under the calling user's Entra identity. This ensures that Row-Level Security, object permissions, and conditional-access policies are enforced automatically β€” the agent never elevates privileges beyond what the user already has.

Q2 (Multiple Choice): Why is it important that generated SQL is inspectable by the user?
  • A) So users can copy the SQL and run it faster next time
  • B) To enable transparency, auditability, and trust in AI-generated queries
  • C) Because SQL syntax highlighting looks better in the UI
  • D) To allow users to manually optimize query performance
βœ… Reveal Answer

Correct: B) To enable transparency, auditability, and trust in AI-generated queries

When users can see the exact SQL that was generated and executed, they can verify correctness, report mistakes, and auditors can review data-access patterns. This transparency is a foundational requirement for trustworthy AI-assisted analytics.

Q3 (Run the query): How many tents are currently in stock (stock > 0)?

Run this query against the lab database:

SELECT COUNT(*) FROM products WHERE category='Tents' AND stock > 0;
βœ… Reveal Answer

Answer: 2

Both the Alpine Explorer Tent (P001, stock=5) and the TrailMaster 2P Tent (P002, stock=8) have stock greater than zero. The query correctly filters on both category='Tents' and stock > 0.

Q4 (Run the query): What is the total revenue from all orders?

Run this query against the lab database:

SELECT SUM(total) FROM orders;
βœ… Reveal Answer

Answer: $3,209.74

The total column in the orders table contains the actual revenue for each order (price Γ— quantity). Summing all 15 order totals gives 3,209.74. A common mistake is to calculate SUM(price * stock) from the products table, which gives inventory value β€” not revenue.

Q5 (Run the query): Which product has the most orders?

Run this query against the lab database:

SELECT p.product_name, COUNT(*) AS order_count
FROM   orders o
JOIN   products p ON o.product_id = p.product_id
GROUP BY p.product_name
ORDER BY order_count DESC
LIMIT 1;
βœ… Reveal Answer

Answer: Alpine Explorer Tent (P001) β€” 3 orders

Product P001 appears in orders O001, O006, and O010. The query joins orders to products, groups by product name, counts the number of order rows per product, and returns the one with the highest count. Note that COUNT(*) counts order rows β€” not total quantity shipped.


SummaryΒΆ

Topic Key Takeaway
Fabric Data Agents Translate natural-language questions into SQL, DAX, or KQL and execute them on behalf of the user
NL β†’ SQL pipeline Parse intent β†’ inspect schema β†’ generate query β†’ execute β†’ return results
Identity & security Queries run under the user's Entra ID β€” least-privilege by default
Row-Level Security RLS filters are applied by the database engine, not the agent
Query transparency The generated query is always shown so users can verify and auditors can review
Audit logging Every query is recorded with user identity, timestamp, and result metadata
Common NL β†’ SQL bugs Missing filters, wrong table for aggregation, incorrect aggregate function

Next StepsΒΆ