# Lab 053: Fabric IQ – Batch AI Enrichment with AI Functions

## What You'll Learn

- What Fabric AI Functions are and how they integrate AI into Spark/pandas workflows (`ai.classify`, `ai.summarize`, `ai.extract`, `ai.embed`)
- Design AI ETL pipelines that enrich tabular data with LLM-powered transformations
- Process data in batch, applying classification, summarization, and entity extraction to entire DataFrames
- Build and test with mock AI functions locally, then swap to real Fabric `ai.*()` calls in production
## Introduction

Traditional ETL pipelines move and transform structured data: clean, filter, join, aggregate. AI Functions add a new dimension: they let you call an LLM on every row of a DataFrame, treating classification, summarization, and extraction as native column operations.

In Microsoft Fabric, the `ai.*()` functions run directly inside Spark notebooks. You write `df["sentiment"] = ai.classify(df["text"], ["positive", "neutral", "negative"])` and Fabric handles batching, rate limiting, and model routing behind the scenes.
## The Scenario
You are a Data Engineer at OutdoorGear Inc. The product team has collected 20 customer reviews for outdoor gear and wants you to build an enrichment pipeline that:
- Classifies each review's sentiment (positive / neutral / negative)
- Summarizes each review into a short snippet
- Extracts key entities (pros and cons) from the review text
- Embeds the review text for downstream semantic search (discussed conceptually)
Because you're developing locally, you'll use mock AI functions that mimic the behavior of Fabric's ai.*() calls. Once the pipeline is validated, swapping to real models requires changing only the function implementations.
**Mock vs. Real AI Functions**

This lab uses mock functions (rule-based, no LLM needed) so anyone can follow along without a Fabric capacity. The mock functions produce deterministic results that match the expected outputs. In production Fabric, you would replace these mocks with `ai.classify()`, `ai.summarize()`, etc.
## Prerequisites

| Requirement | Why |
|---|---|
| Python 3.10+ | Run the enrichment pipeline |
| `pandas` library | DataFrame operations |
| (Optional) Microsoft Fabric capacity | For real `ai.*()` functions |
## 📦 Supporting Files

Download these files before starting the lab. Save all files to a `lab-053/` folder in your working directory.

| File | Description | Download |
|---|---|---|
| `broken_pipeline.py` | Bug-fix exercise (3 bugs + self-tests) | 📥 Download |
| `product_reviews.csv` | Dataset | 📥 Download |
## Step 1: Understanding Fabric AI Functions

Fabric AI Functions are native operations that apply LLM capabilities to DataFrame columns. They abstract away prompt engineering, batching, and API management:

| Function | Signature | Description |
|---|---|---|
| `ai.classify()` | `ai.classify(column, categories)` | Classifies text into one of the provided categories using an LLM |
| `ai.summarize()` | `ai.summarize(column, max_length=None)` | Generates a concise summary of each text value |
| `ai.extract()` | `ai.extract(column, fields)` | Extracts structured fields (entities, keywords) from text |
| `ai.embed()` | `ai.embed(column, model=None)` | Generates vector embeddings for downstream similarity search |
### How They Work in Fabric

In a real Fabric Spark notebook, you'd write:

```python
from synapse.ml.fabric import ai

# Classify sentiment in one line
df["sentiment"] = ai.classify(df["review_text"], ["positive", "neutral", "negative"])

# Summarize reviews
df["summary"] = ai.summarize(df["review_text"], max_length=50)
```
Fabric handles:

- **Batching** – groups rows into optimal batch sizes for the model endpoint
- **Rate limiting** – respects token-per-minute limits automatically
- **Error handling** – retries transient failures with exponential backoff
- **Model routing** – uses the workspace's default model or a specified one
**Why Mock First?**

Building with mocks lets you validate pipeline logic, data types, and downstream consumers before spending compute on real LLM calls. This is a best practice for any AI ETL pipeline.
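One way to structure that swap is a thin wrapper that dispatches on a flag. The sketch below is illustrative: the `USE_REAL_AI` environment variable and `classify_sentiment` wrapper are invented names, not Fabric API, and `mock_classify` previews the rule-based mock built in Step 3.

```python
import os

import pandas as pd

# Hypothetical feature flag; not part of Fabric
USE_REAL_AI = os.getenv("USE_REAL_AI", "0") == "1"

def mock_classify(rating: int) -> str:
    """Rule-based stand-in for ai.classify() (full version in Step 3)."""
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"

def classify_sentiment(df: pd.DataFrame) -> pd.Series:
    """Dispatch: real Fabric AI in production, the mock locally."""
    if USE_REAL_AI:
        # In Fabric: from synapse.ml.fabric import ai
        # return ai.classify(df["review_text"], ["positive", "neutral", "negative"])
        raise NotImplementedError("Requires a Fabric capacity")
    return df["rating"].apply(mock_classify)

df = pd.DataFrame({"review_text": ["Great tent", "Broke on day one"],
                   "rating": [5, 1]})
df["sentiment"] = classify_sentiment(df)
```

Downstream code only ever calls `classify_sentiment`, so flipping the flag is the entire migration.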
## Step 2: Load the Reviews Dataset

The dataset contains 20 product reviews for OutdoorGear products:

```python
import pandas as pd

reviews = pd.read_csv("lab-053/product_reviews.csv")

print(f"Total reviews: {len(reviews)}")
print(f"Unique products: {reviews['product_name'].nunique()}")
print(f"Rating range: {reviews['rating'].min()} – {reviews['rating'].max()}")
print(f"Average rating: {reviews['rating'].mean():.2f}")
print("\nReviews per product:")
print(reviews.groupby("product_name").size().sort_values(ascending=False))
```
Expected output:

```
Total reviews: 20
Unique products: 7
Rating range: 1 – 5
Average rating: 3.70

Reviews per product:
product_name
Alpine Explorer Tent       5
Peak Performer Boots       3
Explorer Pro Backpack      3
TrailMaster X4 Tent        3
CozyNights Sleeping Bag    2
DayTripper Pack            2
Summit Water Bottle        2
```
Take a moment to explore the data:

```python
print(reviews[["review_id", "product_name", "rating", "review_text"]].head(5).to_string(index=False))
```
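If you're following along without the CSV, a few stand-in rows with the same schema are enough to run every later step. The rows below are illustrative, not the real dataset (only the R001 text matches the sample shown later; the other IDs and texts are invented):

```python
import pandas as pd

# Illustrative fallback rows matching the CSV schema
reviews = pd.DataFrame([
    {"review_id": "R001", "product_id": "P001",
     "product_name": "Alpine Explorer Tent", "category": "Tents",
     "rating": 5, "review_text": "Amazing tent! Held up perfectly in heavy rain."},
    {"review_id": "R098", "product_id": "P002",
     "product_name": "TrailMaster X4 Tent", "category": "Tents",
     "rating": 2, "review_text": "The zipper snags and the fabric feels thin."},
    {"review_id": "R099", "product_id": "P007",
     "product_name": "Summit Water Bottle", "category": "Hydration",
     "rating": 3, "review_text": "Does the job, nothing special."},
])

print(f"Total reviews: {len(reviews)}")
print(f"Unique products: {reviews['product_name'].nunique()}")
```

Your counts and averages will differ from the expected outputs shown in this lab, but the pipeline logic is identical.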
## Step 3: Implement Mock AI Functions

Instead of calling a real LLM, we create deterministic mock functions that mimic Fabric's `ai.*()` behavior:
### 3a – `mock_classify(rating)`

Classifies sentiment based on the numeric rating:

```python
def mock_classify(rating: int) -> str:
    """Mock ai.classify() – maps rating to sentiment."""
    if rating >= 4:
        return "positive"
    elif rating == 3:
        return "neutral"
    else:
        return "negative"
```

- Rating ≥ 4 → `"positive"`
- Rating = 3 → `"neutral"`
- Rating ≤ 2 → `"negative"`
### 3b – `mock_summarize(text)`

Returns a truncated version of the review text:

```python
def mock_summarize(text: str) -> str:
    """Mock ai.summarize() – returns the first 50 characters."""
    if len(text) <= 50:
        return text
    return text[:50] + "..."
```
### 3c – `mock_extract(text)`

Extracts simple keywords by scanning for positive/negative indicator words:

```python
POSITIVE_WORDS = {"amazing", "great", "best", "perfect", "incredible", "love",
                  "good", "solid", "comfortable", "warm", "durable"}
NEGATIVE_WORDS = {"broke", "terrible", "disappointed", "cheap", "thin",
                  "cramped", "snags", "cracked"}

def mock_extract(text: str) -> dict:
    """Mock ai.extract() – finds pros and cons keywords."""
    words = set(text.lower().split())
    pros = sorted(words & POSITIVE_WORDS)
    cons = sorted(words & NEGATIVE_WORDS)
    return {"pros": pros, "cons": cons}
```
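Before wiring the mocks into the pipeline, a quick sanity check on one made-up review is worthwhile. The snippet repeats the three definitions from above so it runs standalone; the review text is invented for illustration:

```python
POSITIVE_WORDS = {"amazing", "great", "best", "perfect", "incredible", "love",
                  "good", "solid", "comfortable", "warm", "durable"}
NEGATIVE_WORDS = {"broke", "terrible", "disappointed", "cheap", "thin",
                  "cramped", "snags", "cracked"}

def mock_classify(rating: int) -> str:
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"

def mock_summarize(text: str) -> str:
    return text if len(text) <= 50 else text[:50] + "..."

def mock_extract(text: str) -> dict:
    words = set(text.lower().split())
    return {"pros": sorted(words & POSITIVE_WORDS),
            "cons": sorted(words & NEGATIVE_WORDS)}

# Made-up review: mixed signals, rating 4
review = "Amazing tent! The zipper snags but overall great quality."
print(mock_classify(4))        # positive
print(mock_summarize(review))  # first 50 chars + "..."
print(mock_extract(review))    # {'pros': ['amazing', 'great'], 'cons': ['snags']}
```

Note that `mock_extract` picks up both pros and cons from the same review, which is exactly the kind of nuance the sentiment mock (rating-based) ignores.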
**Real vs. Mock**

In production Fabric, `ai.classify()` sends the review text to an LLM with the candidate labels; it understands context, sarcasm, and nuance. Our mock uses the rating as a proxy, which is a reasonable heuristic for this dataset but wouldn't generalize to unlabeled text.
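The lab doesn't implement `ai.embed()`, but a toy stand-in shows the shape of the output: each text becomes a fixed-length unit vector, and similarity is a dot product. `mock_embed` below is an invented helper; real embeddings capture meaning, while this one only recognizes identical strings.

```python
import math
import random

def mock_embed(text: str, dim: int = 8) -> list[float]:
    """Toy stand-in for ai.embed(): a deterministic pseudo-random unit vector."""
    rng = random.Random(text)                    # same text -> same vector
    vec = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    # Vectors are unit-length, so the dot product IS the cosine similarity
    return sum(x * y for x, y in zip(a, b))

v1 = mock_embed("great tent")
v2 = mock_embed("great tent")
print(cosine(v1, v2))  # ~1.0 for identical text
```

In a real semantic-search pipeline you would store these vectors alongside the reviews and rank rows by cosine similarity to a query embedding.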
## Step 4: Run the Enrichment Pipeline

Apply the mock functions to every row in the DataFrame:

```python
# Classify sentiment
reviews["sentiment"] = reviews["rating"].apply(mock_classify)

# Summarize reviews
reviews["summary"] = reviews["review_text"].apply(mock_summarize)

# Extract entities
reviews["entities"] = reviews["review_text"].apply(mock_extract)

print("Enriched DataFrame columns:", list(reviews.columns))
print("\nSentiment distribution:")
print(reviews["sentiment"].value_counts())
```
Expected output:

```
Enriched DataFrame columns: ['review_id', 'product_id', 'product_name', 'category',
'rating', 'review_text', 'sentiment', 'summary', 'entities']

Sentiment distribution:
positive    13
neutral      4
negative     3
```
### Verify the Results

```python
# Show a sample of enriched data
sample_cols = ["review_id", "product_name", "rating", "sentiment", "summary"]
print(reviews[sample_cols].head(6).to_string(index=False))
```
Expected:
| review_id | product_name | rating | sentiment | summary |
|---|---|---|---|---|
| R001 | Alpine Explorer Tent | 5 | positive | Amazing tent! Held up perfectly in heavy rain. Se... |
| R002 | Alpine Explorer Tent | 4 | positive | Solid tent but a bit heavy for long hikes. Great ... |
| R003 | Alpine Explorer Tent | 5 | positive | Best tent I've ever owned. Worth every penny. |
| R004 | Alpine Explorer Tent | 3 | neutral | Decent tent but nothing special at this price poi... |
| R005 | Alpine Explorer Tent | 4 | positive | Good quality materials. Survived a storm with no ... |
| R006 | TrailMaster X4 Tent | 4 | positive | Great ventilation and the zipper is smooth. Sligh... |
### Sentiment Breakdown

| Sentiment | Count | Ratings |
|---|---|---|
| Positive (rating ≥ 4) | 13 | Ratings 4 and 5 |
| Neutral (rating = 3) | 4 | Rating 3 |
| Negative (rating ≤ 2) | 3 | Ratings 1 and 2 |
## Step 5: Analyze Enriched Data

Now that the reviews are enriched, analyze them to extract business insights:

### 5a – Average Rating by Sentiment

```python
print("Average rating by sentiment:")
print(reviews.groupby("sentiment")["rating"].mean().to_string())
```
Expected:

```
Average rating by sentiment:
sentiment
negative    1.666667
neutral     3.000000
positive    4.384615
```
### 5b – Product-Level Analysis

```python
product_stats = reviews.groupby("product_name").agg(
    review_count=("review_id", "count"),
    avg_rating=("rating", "mean"),
).sort_values("review_count", ascending=False)

print(f"Overall average rating: {reviews['rating'].mean():.2f}")
print(f"\nMost reviewed product: {product_stats.index[0]} ({product_stats.iloc[0]['review_count']:.0f} reviews)")
print("\nProduct statistics:")
print(product_stats.to_string())
```
Expected:

```
Overall average rating: 3.70

Most reviewed product: Alpine Explorer Tent (5 reviews)

Product statistics:
                         review_count  avg_rating
product_name
Alpine Explorer Tent                5    4.200000
Explorer Pro Backpack               3    3.666667
Peak Performer Boots                3    4.000000
TrailMaster X4 Tent                 3    3.333333
CozyNights Sleeping Bag             2    4.000000
DayTripper Pack                     2    3.500000
Summit Water Bottle                 2    2.500000
```
### 5c – Best-Rated Product (2+ reviews)

```python
multi_review = product_stats[product_stats["review_count"] >= 2]
best = multi_review.sort_values("avg_rating", ascending=False).iloc[0]

print(f"Highest-rated product (2+ reviews): {best.name}")
print(f"  Average rating: {best['avg_rating']:.2f}")
```
Expected:

```
Highest-rated product (2+ reviews): Alpine Explorer Tent
  Average rating: 4.20
```
### 5d – Sentiment by Category

```python
print("Sentiment distribution by category:")
print(pd.crosstab(reviews["category"], reviews["sentiment"]))
```
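`pd.crosstab` simply counts label co-occurrences, filling absent combinations with zero. A tiny hand-made frame (illustrative values, not the lab dataset) makes the output shape concrete:

```python
import pandas as pd

# Three made-up rows to show how crosstab tabulates two label columns
df = pd.DataFrame({
    "category":  ["Tents", "Tents", "Hydration"],
    "sentiment": ["positive", "negative", "positive"],
})
ct = pd.crosstab(df["category"], df["sentiment"])
print(ct)
```

Each cell is the number of rows with that (category, sentiment) pair, so "Hydration"/"negative" is 0 even though no such row exists.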
## Step 6: Production Considerations
When moving from mocks to real Fabric AI Functions, consider these factors:
### Batch Size

| Batch Size | Trade-off |
|---|---|
| Small (1–10 rows) | Higher latency per row; easier to debug |
| Medium (50–100 rows) | Good balance of throughput and cost |
| Large (500+ rows) | Maximum throughput; risk of timeouts and rate limits |
Fabric's `ai.*()` functions handle batching automatically, but you can tune it:

```python
# In Fabric, control batch behavior via configuration
spark.conf.set("spark.synapse.ml.ai.batchSize", 50)
```
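The batching idea itself is easy to sketch by hand: slice the DataFrame into fixed-size chunks and treat each chunk as one request. The `process_in_batches` helper below is illustrative, not a Fabric API:

```python
import pandas as pd

def process_in_batches(df: pd.DataFrame, fn, batch_size: int = 50) -> list:
    """Apply fn row by row, one batch-sized slice at a time,
    mimicking how a batched endpoint receives groups of rows."""
    results = []
    for start in range(0, len(df), batch_size):
        batch = df.iloc[start:start + batch_size]   # one "request" worth of rows
        results.extend(fn(row) for _, row in batch.iterrows())
    return results

df = pd.DataFrame({"rating": [5, 3, 1, 4, 2]})
labels = process_in_batches(
    df, lambda row: "positive" if row["rating"] >= 4 else "other", batch_size=2
)
print(labels)  # ['positive', 'other', 'other', 'positive', 'other']
```

In a real pipeline each slice boundary is where you would add retries, rate-limit backoff, and per-batch error handling.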
### Mock → Real Swap

The key advantage of our mock-first approach: swapping to real functions requires changing only the function implementations:

```python
# ── Mock (local development) ────────────────────
reviews["sentiment"] = reviews["rating"].apply(mock_classify)

# ── Real Fabric (production) ────────────────────
# from synapse.ml.fabric import ai
# reviews["sentiment"] = ai.classify(reviews["review_text"],
#                                    ["positive", "neutral", "negative"])
```
### Cost Awareness

| Factor | Impact |
|---|---|
| Token count | Each review consumes input tokens; longer reviews cost more |
| Model choice | GPT-4o vs. GPT-4o-mini ≈ 10× cost difference |
| Redundant calls | Cache results to avoid re-processing unchanged rows |
| Column count | Each `ai.*()` call is a separate LLM invocation per row |
**Cost Tip**

For 20 reviews, the cost is negligible. For 200,000 reviews, a single `ai.classify()` column could cost $50+ with GPT-4o. Always prototype with a sample, validate results, then scale.
## 🐛 Bug-Fix Exercise

The file `lab-053/broken_pipeline.py` has 3 bugs in the AI enrichment functions. Can you find and fix them all?

Run the file's self-tests to see which ones fail.
You should see 3 failed tests. Each test corresponds to one bug:
| Test | What it checks | Hint |
|---|---|---|
| Test 1 | Sentiment classification thresholds | Rating 3 should be neutral, not positive |
| Test 2 | Reviews-per-product grouping | Should group by product_name, not review_id |
| Test 3 | Average rating filtered by sentiment | Must filter the DataFrame before computing the mean |
Fix all 3 bugs, then re-run. When you see 🎉 All 3 tests passed, you're done!
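Self-tests like these typically follow a simple assert-style pattern: fixed inputs, known expected outputs. A hypothetical mini-version of Test 1 (not the actual contents of `broken_pipeline.py`) looks like this:

```python
def mock_classify(rating: int) -> str:
    """Correct thresholds: >=4 positive, ==3 neutral, <=2 negative."""
    if rating >= 4:
        return "positive"
    if rating == 3:
        return "neutral"
    return "negative"

def test_sentiment_thresholds() -> bool:
    """Every rating must map to its expected sentiment label."""
    expected = {5: "positive", 4: "positive", 3: "neutral",
                2: "negative", 1: "negative"}
    return all(mock_classify(r) == label for r, label in expected.items())

print("PASS" if test_sentiment_thresholds() else "FAIL")  # PASS
```

The buggy version in the exercise misclassifies rating 3, so a table like `expected` catches it immediately; this is a handy pattern for validating any mock before trusting it in a pipeline.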
## 🧠 Knowledge Check

**Q1 (Multiple Choice):** What does `ai.classify()` do in Fabric AI Functions?
- A) Splits text into sentences for NLP processing
- B) Classifies text into predefined categories using an LLM
- C) Trains a custom classification model on your data
- D) Converts text to numeric feature vectors
▶ Reveal Answer

**Correct: B)** Classifies text into predefined categories using an LLM

`ai.classify()` sends each text value to an LLM along with the candidate labels you provide (e.g., `["positive", "neutral", "negative"]`). The LLM returns the best-matching label. It does not train a model; it uses the LLM's existing knowledge via in-context learning.
**Q2 (Multiple Choice):** Why is batch size important when using AI Functions at scale?
- A) Larger batches always produce more accurate results
- B) Batch size determines which LLM model is used
- C) Balances throughput, cost, and rate-limit compliance
- D) Smaller batches use fewer tokens per row
▶ Reveal Answer

**Correct: C)** Balances throughput, cost, and rate-limit compliance
Batch size affects how many rows are sent to the LLM endpoint per request. Too small = high latency overhead; too large = risk of rate-limit errors and timeouts. The optimal batch size balances throughput (rows/second), cost (tokens/request), and API rate limits.
**Q3 (Run the Lab):** How many reviews have a positive sentiment (rating ≥ 4)?

Apply `mock_classify()` to the rating column and count the `"positive"` values.
▶ Reveal Answer
13
Ratings of 4 or 5 map to "positive". There are 9 reviews with rating 4 and 4 reviews with rating 5, totaling 13 positive reviews out of 20.
**Q4 (Run the Lab):** Which product has the most reviews?

Group by `product_name` and count the rows.

▶ Reveal Answer

Alpine Explorer Tent – 5 reviews

Alpine Explorer Tent (P001) has reviews R001–R005, making it the most-reviewed product. The next most-reviewed products (Peak Performer Boots, Explorer Pro Backpack, TrailMaster X4 Tent) each have 3 reviews.
**Q5 (Run the Lab):** What is the average rating across all 20 reviews?

Compute `reviews["rating"].mean()`.

▶ Reveal Answer

3.70

Sum of all ratings: 5+4+5+3+4+4+2+4+5+4+3+5+4+2+4+3+5+3+4+1 = 74. Average = 74 ÷ 20 = 3.70.
## Summary

| Topic | What You Learned |
|---|---|
| AI Functions | `ai.classify`, `ai.summarize`, `ai.extract`, `ai.embed` as DataFrame operations |
| Mock-First Development | Build and validate pipeline logic before using real LLM calls |
| Batch Enrichment | Apply AI transformations to every row of a dataset |
| Sentiment Analysis | Rating-based classification: positive (≥4), neutral (3), negative (≤2) |
| Product Analytics | Group-by analysis on enriched data for business insights |
| Production Readiness | Batch size, cost, caching, and mock-to-real swap patterns |