Skip to content

Lab 053: Fabric IQ β€” Batch AI Enrichment with AI FunctionsΒΆ

Level: L300 Path: All paths Time: ~90 min πŸ’° Cost: Free β€” Uses mock AI functions locally (Fabric capacity optional)

What You'll LearnΒΆ

  • What Fabric AI Functions are and how they integrate AI into Spark/pandas workflows (ai.classify, ai.summarize, ai.extract, ai.embed)
  • Design AI ETL pipelines that enrich tabular data with LLM-powered transformations
  • Process data in batch β€” applying classification, summarization, and entity extraction to entire DataFrames
  • Build and test with mock AI functions locally, then swap to real Fabric ai.*() calls in production

IntroductionΒΆ

AI ETL Pipeline

Traditional ETL pipelines move and transform structured data β€” clean, filter, join, aggregate. AI Functions add a new dimension: they let you call an LLM on every row of a DataFrame, treating classification, summarization, and extraction as native column operations.

In Microsoft Fabric, the ai.*() functions run directly inside Spark notebooks. You write df["sentiment"] = ai.classify(df["text"], ["positive", "neutral", "negative"]) and Fabric handles batching, rate-limiting, and model routing behind the scenes.

The ScenarioΒΆ

You are a Data Engineer at OutdoorGear Inc. The product team has collected 20 customer reviews for outdoor gear and wants you to build an enrichment pipeline that:

  1. Classifies each review's sentiment (positive / neutral / negative)
  2. Summarizes each review into a short snippet
  3. Extracts key entities (pros and cons) from the review text
  4. Embeds the review text for downstream semantic search (discussed conceptually)

Because you're developing locally, you'll use mock AI functions that mimic the behavior of Fabric's ai.*() calls. Once the pipeline is validated, swapping to real models requires changing only the function implementations.

Mock vs. Real AI Functions

This lab uses mock functions (rule-based, no LLM needed) so anyone can follow along without a Fabric capacity. The mock functions produce deterministic results that match the expected outputs. In production Fabric, you would replace these mocks with ai.classify(), ai.summarize(), etc.

PrerequisitesΒΆ

Requirement Why
Python 3.10+ Run the enrichment pipeline
pandas library DataFrame operations
(Optional) Microsoft Fabric capacity For real ai.*() functions
pip install pandas

Quick Start with GitHub Codespaces

Open in GitHub Codespaces

All dependencies are pre-installed in the devcontainer.

πŸ“¦ Supporting FilesΒΆ

Download these files before starting the lab

Save all files to a lab-053/ folder in your working directory.

File Description Download
broken_pipeline.py Bug-fix exercise (3 bugs + self-tests) πŸ“₯ Download
product_reviews.csv Dataset πŸ“₯ Download

Step 1: Understanding Fabric AI FunctionsΒΆ

Fabric AI Functions are native operations that apply LLM capabilities to DataFrame columns. They abstract away prompt engineering, batching, and API management:

Function Signature Description
ai.classify() ai.classify(column, categories) Classifies text into one of the provided categories using an LLM
ai.summarize() ai.summarize(column, max_length=None) Generates a concise summary of each text value
ai.extract() ai.extract(column, fields) Extracts structured fields (entities, keywords) from text
ai.embed() ai.embed(column, model=None) Generates vector embeddings for downstream similarity search

How They Work in FabricΒΆ

In a real Fabric Spark notebook, you'd write:

from synapse.ml.fabric import ai

# Classify sentiment in one line
df["sentiment"] = ai.classify(df["review_text"], ["positive", "neutral", "negative"])

# Summarize reviews
df["summary"] = ai.summarize(df["review_text"], max_length=50)

Fabric handles:

  • Batching β€” groups rows into optimal batch sizes for the model endpoint
  • Rate limiting β€” respects token-per-minute limits automatically
  • Error handling β€” retries transient failures with exponential backoff
  • Model routing β€” uses the workspace's default model or a specified one

Why Mock First?

Building with mocks lets you validate pipeline logic, data types, and downstream consumers before spending compute on real LLM calls. This is a best practice for any AI ETL pipeline.


Step 2: Load the Reviews DatasetΒΆ

The dataset contains 20 product reviews for OutdoorGear products:

import pandas as pd

reviews = pd.read_csv("lab-053/product_reviews.csv")
print(f"Total reviews: {len(reviews)}")
print(f"Unique products: {reviews['product_name'].nunique()}")
print(f"Rating range: {reviews['rating'].min()} – {reviews['rating'].max()}")
print(f"Average rating: {reviews['rating'].mean():.2f}")
print(f"\nReviews per product:")
print(reviews.groupby("product_name").size().sort_values(ascending=False))

Expected output:

Total reviews: 20
Unique products: 7
Rating range: 1 – 5
Average rating: 3.70

Reviews per product:
product_name
Alpine Explorer Tent       5
Peak Performer Boots       3
Explorer Pro Backpack      3
TrailMaster X4 Tent        3
CozyNights Sleeping Bag    2
DayTripper Pack            2
Summit Water Bottle        2

Take a moment to explore the data:

print(reviews[["review_id", "product_name", "rating", "review_text"]].head(5).to_string(index=False))

Step 3: Implement Mock AI FunctionsΒΆ

Instead of calling a real LLM, we create deterministic mock functions that mimic Fabric's ai.*() behavior:

3a β€” mock_classify(rating)ΒΆ

Classifies sentiment based on the numeric rating:

def mock_classify(rating: int) -> str:
    """Mock ai.classify() β€” maps rating to sentiment."""
    if rating >= 4:
        return "positive"
    elif rating == 3:
        return "neutral"
    else:
        return "negative"
  • Rating β‰₯ 4 β†’ "positive"
  • Rating = 3 β†’ "neutral"
  • Rating ≀ 2 β†’ "negative"

3b β€” mock_summarize(text)ΒΆ

Returns a truncated version of the review text:

def mock_summarize(text: str) -> str:
    """Mock ai.summarize() β€” returns first 50 characters."""
    if len(text) <= 50:
        return text
    return text[:50] + "..."

3c β€” mock_extract(text)ΒΆ

Extracts simple keywords by scanning for positive/negative indicator words:

POSITIVE_WORDS = {"amazing", "great", "best", "perfect", "incredible", "love",
                  "good", "solid", "comfortable", "warm", "durable"}
NEGATIVE_WORDS = {"broke", "terrible", "disappointed", "cheap", "thin",
                  "cramped", "snags", "cracked"}

def mock_extract(text: str) -> dict:
    """Mock ai.extract() β€” finds pros and cons keywords."""
    words = set(text.lower().split())
    pros = sorted(words & POSITIVE_WORDS)
    cons = sorted(words & NEGATIVE_WORDS)
    return {"pros": pros, "cons": cons}

Real vs. Mock

In production Fabric, ai.classify() sends the review text to an LLM with the candidate labels β€” it understands context, sarcasm, and nuance. Our mock uses the rating as a proxy, which is a reasonable heuristic for this dataset but wouldn't generalize to unlabeled text.


Step 4: Run the Enrichment PipelineΒΆ

Apply the mock functions to every row in the DataFrame:

# Classify sentiment
reviews["sentiment"] = reviews["rating"].apply(mock_classify)

# Summarize reviews
reviews["summary"] = reviews["review_text"].apply(mock_summarize)

# Extract entities
reviews["entities"] = reviews["review_text"].apply(mock_extract)

print("Enriched DataFrame columns:", list(reviews.columns))
print(f"\nSentiment distribution:")
print(reviews["sentiment"].value_counts())

Expected output:

Enriched DataFrame columns: ['review_id', 'product_id', 'product_name', 'category',
                              'rating', 'review_text', 'sentiment', 'summary', 'entities']

Sentiment distribution:
positive    13
neutral      4
negative     3

Verify the ResultsΒΆ

# Show a sample of enriched data
sample_cols = ["review_id", "product_name", "rating", "sentiment", "summary"]
print(reviews[sample_cols].head(6).to_string(index=False))

Expected:

review_id product_name rating sentiment summary
R001 Alpine Explorer Tent 5 positive Amazing tent! Held up perfectly in heavy rain. Se...
R002 Alpine Explorer Tent 4 positive Solid tent but a bit heavy for long hikes. Great ...
R003 Alpine Explorer Tent 5 positive Best tent I've ever owned. Worth every penny.
R004 Alpine Explorer Tent 3 neutral Decent tent but nothing special at this price poi...
R005 Alpine Explorer Tent 4 positive Good quality materials. Survived a storm with no ...
R006 TrailMaster X4 Tent 4 positive Great ventilation and the zipper is smooth. Sligh...

Sentiment BreakdownΒΆ

Sentiment Count Ratings
Positive (rating β‰₯ 4) 13 Ratings 4 and 5
Neutral (rating = 3) 4 Rating 3
Negative (rating ≀ 2) 3 Ratings 1 and 2

Step 5: Analyze Enriched DataΒΆ

Now that the reviews are enriched, analyze them to extract business insights:

5a β€” Average Rating by SentimentΒΆ

print("Average rating by sentiment:")
print(reviews.groupby("sentiment")["rating"].mean().to_string())

Expected:

negative    1.666667
neutral     3.000000
positive    4.384615

5b β€” Product-Level AnalysisΒΆ

product_stats = reviews.groupby("product_name").agg(
    review_count=("review_id", "count"),
    avg_rating=("rating", "mean"),
).sort_values("review_count", ascending=False)

print(f"Overall average rating: {reviews['rating'].mean():.2f}")
print(f"\nMost reviewed product: {product_stats.index[0]} ({product_stats.iloc[0]['review_count']:.0f} reviews)")
print(f"\nProduct statistics:")
print(product_stats.to_string())

Expected:

Overall average rating: 3.70

Most reviewed product: Alpine Explorer Tent (5 reviews)

Product statistics:
                         review_count  avg_rating
product_name
Alpine Explorer Tent                5    4.200000
Explorer Pro Backpack               3    3.666667
Peak Performer Boots                3    4.000000
TrailMaster X4 Tent                 3    3.333333
CozyNights Sleeping Bag             2    4.000000
DayTripper Pack                     2    3.500000
Summit Water Bottle                 2    2.500000

5c β€” Best-Rated Product (2+ reviews)ΒΆ

multi_review = product_stats[product_stats["review_count"] >= 2]
best = multi_review.sort_values("avg_rating", ascending=False).iloc[0]
print(f"Highest-rated product (2+ reviews): {multi_review.sort_values('avg_rating', ascending=False).index[0]}")
print(f"  Average rating: {best['avg_rating']:.2f}")

Expected:

Highest-rated product (2+ reviews): Alpine Explorer Tent
  Average rating: 4.20

5d β€” Sentiment by CategoryΒΆ

print("Sentiment distribution by category:")
print(pd.crosstab(reviews["category"], reviews["sentiment"]))

Step 6: Production ConsiderationsΒΆ

When moving from mocks to real Fabric AI Functions, consider these factors:

Batch SizeΒΆ

Batch Size Trade-off
Small (1–10 rows) Higher latency per row; easier to debug
Medium (50–100 rows) Good balance of throughput and cost
Large (500+ rows) Maximum throughput; risk of timeouts and rate limits

Fabric's ai.*() functions handle batching automatically, but you can tune it:

# In Fabric, control batch behavior via configuration
spark.conf.set("spark.synapse.ml.ai.batchSize", 50)

Mock β†’ Real SwapΒΆ

The key advantage of our mock-first approach: swapping to real functions requires changing only the function implementations:

# ── Mock (local development) ────────────────────
reviews["sentiment"] = reviews["rating"].apply(mock_classify)

# ── Real Fabric (production) ────────────────────
# from synapse.ml.fabric import ai
# reviews["sentiment"] = ai.classify(reviews["review_text"],
#                                    ["positive", "neutral", "negative"])

Cost AwarenessΒΆ

Factor Impact
Token count Each review consumes input tokens; longer reviews cost more
Model choice GPT-4o vs. GPT-4o-mini β€” 10Γ— cost difference
Redundant calls Cache results to avoid re-processing unchanged rows
Column count Each ai.*() call is a separate LLM invocation per row

Cost Tip

For 20 reviews, cost is negligible. For 200,000 reviews, a single ai.classify() column could cost $50+ with GPT-4o. Always prototype with a sample, validate results, then scale.


πŸ› Bug-Fix ExerciseΒΆ

The file lab-053/broken_pipeline.py has 3 bugs in the AI enrichment functions. Can you find and fix them all?

Run the self-tests to see which ones fail:

python lab-053/broken_pipeline.py

You should see 3 failed tests. Each test corresponds to one bug:

Test What it checks Hint
Test 1 Sentiment classification thresholds Rating 3 should be neutral, not positive
Test 2 Reviews-per-product grouping Should group by product_name, not review_id
Test 3 Average rating filtered by sentiment Must filter the DataFrame before computing the mean

Fix all 3 bugs, then re-run. When you see πŸŽ‰ All 3 tests passed, you're done!


🧠 Knowledge Check¢

Q1 (Multiple Choice): What does ai.classify() do in Fabric AI Functions?
  • A) Splits text into sentences for NLP processing
  • B) Classifies text into predefined categories using an LLM
  • C) Trains a custom classification model on your data
  • D) Converts text to numeric feature vectors
βœ… Reveal Answer

Correct: B) Classifies text into predefined categories using an LLM

ai.classify() sends each text value to an LLM along with the candidate labels you provide (e.g., ["positive", "neutral", "negative"]). The LLM returns the best-matching label. It does not train a model β€” it uses the LLM's existing knowledge via in-context learning.

Q2 (Multiple Choice): Why is batch size important when using AI Functions at scale?
  • A) Larger batches always produce more accurate results
  • B) Batch size determines which LLM model is used
  • C) Balances throughput, cost, and rate-limit compliance
  • D) Smaller batches use fewer tokens per row
βœ… Reveal Answer

Correct: C) Balances throughput, cost, and rate-limit compliance

Batch size affects how many rows are sent to the LLM endpoint per request. Too small = high latency overhead; too large = risk of rate-limit errors and timeouts. The optimal batch size balances throughput (rows/second), cost (tokens/request), and API rate limits.

Q3 (Run the Lab): How many reviews have a positive sentiment (rating β‰₯ 4)?

Apply mock_classify() to the rating column and count the "positive" values.

βœ… Reveal Answer

13

Ratings of 4 or 5 map to "positive". There are 9 reviews with rating 4 and 4 reviews with rating 5, totaling 13 positive reviews out of 20.

Q4 (Run the Lab): Which product has the most reviews?

Group by product_name and count the rows.

βœ… Reveal Answer

Alpine Explorer Tent β€” 5 reviews

Alpine Explorer Tent (P001) has reviews R001–R005, making it the most-reviewed product. The next most-reviewed products (Peak Performer Boots, Explorer Pro Backpack, TrailMaster X4 Tent) each have 3 reviews.

Q5 (Run the Lab): What is the average rating across all 20 reviews?

Compute reviews["rating"].mean().

βœ… Reveal Answer

3.70

Sum of all ratings: 5+4+5+3+4+4+2+4+5+4+3+5+4+2+4+3+5+3+4+1 = 74. Average = 74 Γ· 20 = 3.70.


SummaryΒΆ

Topic What You Learned
AI Functions ai.classify, ai.summarize, ai.extract, ai.embed as DataFrame operations
Mock-First Development Build and validate pipeline logic before using real LLM calls
Batch Enrichment Apply AI transformations to every row of a dataset
Sentiment Analysis Rating-based classification: positive (β‰₯4), neutral (3), negative (≀2)
Product Analytics Group-by analysis on enriched data for business insights
Production Readiness Batch size, cost, caching, and mock-to-real swap patterns

Next StepsΒΆ

  • Lab 051 (coming soon) β€” Fabric IQ β€” Real-Time Event Stream Processing
  • Lab 052 (coming soon) β€” Fabric IQ β€” Natural Language to SQL with AI