Lab 005: Prompt EngineeringΒΆ
What You'll LearnΒΆ
- The anatomy of a prompt: system, user, assistant messages
- Core techniques: zero-shot, few-shot, chain-of-thought, role prompting
- How to write effective system prompts for AI agents
- Common failure patterns β and how to fix them
- Practical templates you can use immediately
IntroductionΒΆ
Prompt engineering is the practice of designing inputs to LLMs that reliably produce the outputs you want. It's part art, part science β and the single most impactful skill for building good AI agents.
A well-crafted prompt can turn a mediocre response into an excellent one without changing the model. A poorly designed system prompt will cause your agent to misbehave no matter how powerful the model is.
Try these examples live
Open the GitHub Models Playground in a browser tab and test each example as you read. It's free with a GitHub account.
π¦ Supporting FilesΒΆ
Download these files before starting the lab
Save all files to a lab-005/ folder in your working directory.
| File | Description | Download |
|---|---|---|
prompt_challenges.py |
Interactive exercise script | π₯ Download |
Part 1: Anatomy of a PromptΒΆ
Every LLM API call has up to three message types:
ββββββββββββββββββββββββββββββββββββββββββββββββ
β SYSTEM MESSAGE β
β "You are a helpful assistant for Zava, β
β a DIY retail company..." β
β (Persistent instructions β defines behavior)β
ββββββββββββββββββββββββββββββββββββββββββββββββ€
β USER MESSAGE β
β "What are your top-selling products β
β in the camping category?" β
β (The human's input) β
ββββββββββββββββββββββββββββββββββββββββββββββββ€
β ASSISTANT MESSAGE (optional) β
β "The top-selling camping products are..." β
β (Prior model responses β for few-shot or β
β continued conversations) β
ββββββββββββββββββββββββββββββββββββββββββββββββ
The System MessageΒΆ
The system message is the most important part of agent design. It:
- Defines the agent's persona and role
- Sets behavioral rules ("never invent data")
- Specifies output format (Markdown, JSON, tables)
- Provides domain context the model wouldn't otherwise have
- Handles edge cases ("if asked out-of-scope questions, say...")
π€ Check Your Understanding
What are the three message roles in an LLM API call, and which one is invisible to the end user?
Answer
The three roles are system, user, and assistant. The system message is invisible to end users β it's set by the developer and defines the agent's persona, rules, scope, and behavior. The user sees their own messages and the assistant's responses.
Part 2: Core TechniquesΒΆ
Zero-ShotΒΆ
Ask directly with no examples. Works for simple, well-defined tasks.
Classify this customer review as Positive, Neutral, or Negative.
Review: "The tent arrived on time but the zipper broke after one use."
When to use: Simple classification, extraction, summarization.
Few-ShotΒΆ
Provide examples before your actual question. Dramatically improves consistency.
Classify customer reviews as Positive, Neutral, or Negative.
Review: "Great quality, arrived fast!" β Positive
Review: "It's okay, nothing special." β Neutral
Review: "Completely broken on arrival." β Negative
Review: "The tent arrived on time but the zipper broke after one use." β
When to use: Any task where you want a specific format, tone, or classification scheme.
Rule of thumb
2β5 examples is usually enough. More than 10 rarely helps and costs more tokens.
Chain-of-Thought (CoT)ΒΆ
Ask the model to think step-by-step before giving the final answer. Improves accuracy on reasoning tasks.
Without CoT:
Q: A store sells 3 tents for $249 each and gives a 15% group discount.
What is the total?
A: $635.55
With CoT:
Q: A store sells 3 tents for $249 each and gives a 15% group discount.
What is the total? Think step by step.
A:
Step 1: 3 tents Γ $249 = $747
Step 2: 15% discount = $747 Γ 0.15 = $112.05
Step 3: Total = $747 - $112.05 = $634.95
Final answer: $634.95
How to trigger CoT: - "Think step by step" - "Let's work through this" - "Explain your reasoning before answering"
When to use: Math, logic, multi-step reasoning, debugging, complex decisions.
π€ Check Your Understanding
Why does adding "Think step by step" to a math prompt improve accuracy, even though the model has the same knowledge either way?
Answer
Chain-of-thought prompting forces the model to generate intermediate reasoning steps before producing a final answer. This reduces errors because the model can catch mistakes in earlier steps. Without CoT, the model may "rush" to a final answer and skip critical calculations.
π€ Check Your Understanding
When would you choose few-shot prompting over zero-shot prompting?
Answer
Use few-shot when you need a specific format, tone, or classification scheme that the model might not infer from instructions alone. Providing 2β5 examples dramatically improves consistency. Zero-shot works for simple, well-defined tasks where the model can infer the expected output format.
Role PromptingΒΆ
Give the model a persona to adopt. Changes tone, vocabulary, and depth.
You are a senior PostgreSQL database engineer with 15 years of experience.
Review this query for performance issues and suggest improvements:
SELECT * FROM sales WHERE store_id = 5 ORDER BY sale_date;
vs.
Review this query for performance issues:
SELECT * FROM sales WHERE store_id = 5 ORDER BY sale_date;
The role prompt produces more detailed, expert-level feedback.
Structured OutputΒΆ
Force the model to respond in a specific format β JSON, Markdown table, bullet list.
Extract the product details from this text and return as JSON.
Do not include any explanation β return only the JSON object.
Text: "The ProTrek X200 hiking boots are available in sizes 7-13,
priced at $189.99, and come in black and brown."
Expected format:
{
"name": string,
"sizes": [number],
"price": number,
"colors": [string]
}
Use JSON mode when available
Most APIs support response_format: { type: "json_object" } which forces valid JSON output and eliminates parsing errors.
Prompt ChainingΒΆ
Break complex tasks into a sequence of smaller prompts. Each output feeds the next.
Step 1: Extract key facts from the sales report β JSON
Step 2: Feed JSON to "write an executive summary" prompt β Text
Step 3: Feed summary to "translate to Spanish" prompt β Final output
This is more reliable than asking one prompt to do everything.
Part 3: Writing Agent System PromptsΒΆ
For AI agents (used in all labs from L100+), the system prompt is the agent's constitution. Here's a proven structure:
## Role
You are [name], a [role] for [company/context].
Your tone is [professional/friendly/technical].
## Capabilities
You can:
- [capability 1]
- [capability 2]
Use ONLY the tools provided to you. Never invent data.
## Rules
- [Rule 1: always do X]
- [Rule 2: never do Y]
- [Rule 3: when Z happens, respond with...]
## Output Format
- Default: Markdown tables
- Charts: only when explicitly requested
- Language: respond in the same language the user writes in
## Scope
Only answer questions about [domain].
For out-of-scope questions, say: "I can only help with [domain]."
Real example: Zava Sales Agent (from this repo's workshop)ΒΆ
You are Zava, a sales analysis agent for Zava DIY Retail (Washington State).
Your tone is professional and friendly. Use emojis sparingly.
## Data Rules
- Always fetch table schemas before querying (get_multiple_table_schemas())
- Apply LIMIT 20 to all SELECT queries
- Use exact table and column names from the schema
- Never invent, estimate, or assume data
## Financial Calendar
- Financial year (FY) starts July 1
- Q1=JulβSep, Q2=OctβDec, Q3=JanβMar, Q4=AprβJun
## Visualizations
- Generate charts ONLY when user uses words: "chart", "graph", "visualize", "show as"
- Always save as PNG and provide download link
## Scope
Only answer questions about Zava sales data.
If asked about anything else, say you're specialized for Zava sales analysis.
Part 4: Common Failure Patterns β and FixesΒΆ
β The Vague PromptΒΆ
# Bad
"Summarize this."
# Good
"Summarize this sales report in 3 bullet points.
Each bullet should be β€20 words.
Focus on: total revenue, top product, and key trend."
Rule: Be explicit about format, length, and focus.
β The Contradictory PromptΒΆ
# Bad (contradicts itself)
"Be concise but include all the details."
# Good
"Summarize in 100 words. Prioritize: revenue numbers and top-performing stores."
Rule: When space is limited, tell the model what to prioritize.
β No Negative ExamplesΒΆ
# Bad (doesn't stop hallucination)
"Answer questions about our product catalog."
# Good
"Answer questions about our product catalog.
If you don't have a product in your data, say 'I don't have that product in the catalog.'
Never guess or suggest alternatives you haven't verified."
Rule: Always define what the agent should do when it can't answer.
β Instruction OverloadΒΆ
# Bad (27 rules, contradictory, hard to follow)
"Be helpful. Be concise. Be detailed. Use tables. Use bullet points.
Always explain. Never explain. Answer in English. Answer in Portuguese..."
# Good
"Use Markdown tables for data. Use bullet points for lists.
Default to the user's language."
Rule: 5β10 clear rules outperform 30 vague ones.
β Forgetting the Edge CasesΒΆ
Always ask: "What happens if the user asks something out of scope? What if data is missing? What if the question is ambiguous?"
Build rules for those cases explicitly.
Part 5: Quick Reference TemplatesΒΆ
Extraction PromptΒΆ
Extract the following fields from the text below.
Return as JSON. If a field is not found, use null.
Fields: name, price, category, availability
Text:
"""
{text}
"""
Classification PromptΒΆ
Classify the following support ticket into one of these categories:
[Billing, Shipping, Returns, Technical, Other]
Return only the category name. No explanation.
Ticket: "{ticket_text}"
Summarization PromptΒΆ
Summarize the following in {n} bullet points.
Each bullet: one key insight, β€15 words.
Audience: {audience}
Text:
"""
{text}
"""
Agent System Prompt TemplateΒΆ
## Role
You are {agent_name}, a {role} for {company}.
Tone: {tone}.
## Capabilities
You have access to these tools: {tools}
Only use verified tool outputs. Never invent data.
## Rules
- {rule_1}
- {rule_2}
## Output Format
{format_instructions}
## Scope
{scope_definition}
For out-of-scope questions: "{out_of_scope_response}"
π€ Check Your Understanding
Why is it important for an agent's system prompt to define what the agent should do when it can't answer a question?
Answer
Without an explicit fallback instruction, the LLM will try to answer anyway β often hallucinating a plausible-sounding but incorrect response. Defining out-of-scope behavior (e.g., "say 'I can only help with X'") prevents the agent from inventing data and sets clear user expectations.
Part 6: π§ͺ Interactive Challenges β Fix the PromptsΒΆ
Reading about prompts is good. Writing and running them is better.
These 4 challenges give you broken or vague prompts that produce bad results. Your task: improve them until the output matches the target.
Setup (5 minutes, free)ΒΆ
pip install openai
export GITHUB_TOKEN=your_github_token # github.com β Settings β Developer Settings β Tokens
Run the challenge file you downloaded from the π¦ Supporting Files section above:
What each challenge testsΒΆ
| # | What's broken | Technique to apply |
|---|---|---|
| 1 | Vague user prompt, no format instruction | Specific output format |
| 2 | No structure, likely prose instead of JSON | Structured output |
| 3 | Direct question without reasoning steps | Chain-of-thought |
| 4 | No scope guardrails β hallucinated products | Scope control |
How to work through each challengeΒΆ
- Run
python prompt_challenges.pyand read the β BAD PROMPT result - Edit the
IMPROVED_SYSTEM_*orIMPROVED_USER_*variables at the bottom of each challenge - Re-run and compare with the Target description in the comments
- Keep iterating until your output matches
There's no single right answer
The goal is to get output that meets the target spec. How you phrase the prompt is up to you β compare approaches with a colleague!
π§ Knowledge CheckΒΆ
Q1 (Multiple Choice): You are building an agent that needs to solve a multi-step math problem. Which prompting technique will most improve accuracy?
- A) Zero-shot prompting
- B) Role prompting (e.g., "You are a mathematician")
- C) Chain-of-thought prompting (e.g., "Think step by step")
- D) Structured output prompting
β Reveal Answer
Correct: C β Chain-of-thought prompting
Chain-of-thought (CoT) forces the model to reason through intermediate steps before producing a final answer. This dramatically reduces errors on math, logic, and multi-step problems. "Think step by step" or showing few-shot examples with explicit reasoning both trigger CoT. Zero-shot works for simple tasks; role prompting helps with tone/expertise; structured output helps with formatting.
Q2 (Multiple Choice): Which of the three conversation roles does the USER never directly see when interacting with an agent?
- A) user
- B) assistant
- C) system
- D) function
β Reveal Answer
Correct: C β system
The system message is the agent's "constitution" β it sets the persona, rules, scope, and behavior. It's set by the developer and not visible to end users in the chat interface. The user role holds the human's inputs. The assistant role holds the model's previous responses (included in subsequent API calls to maintain context).
Q3 (Multiple Choice): Your OutdoorGear agent keeps saying things like 'The TrailBlazer Tent probably weighs around 1.5kg' even though the exact weight is in the database. Which system prompt rule is the best fix?
- A) "You are a helpful OutdoorGear assistant."
- B) "Never invent, estimate, or assume data. Only use outputs from the tools provided to you. If the product is not found, say: 'I don't have that information in our catalog.'"
- C) "Think step by step before answering."
- D) "Always respond in JSON format."
β Reveal Answer
Correct: B
The key is two instructions working together: (1) the prohibition on inventing/estimating data, and (2) an explicit fallback phrase for when data is unavailable. Without the fallback, the model will invent an answer rather than say nothing. Grounding rules + fallback behavior together prevent hallucination in tool-using agents.
SummaryΒΆ
| Technique | Best for |
|---|---|
| Zero-shot | Simple, clear tasks |
| Few-shot | Consistent format or classification |
| Chain-of-thought | Reasoning, math, multi-step problems |
| Role prompting | Expert-level responses |
| Structured output | JSON, tables, parseable data |
| Prompt chaining | Complex multi-step workflows |
The golden rule: Be specific about what you want, what format, and what to do when things go wrong.
Next StepsΒΆ
You're now ready to build your first hands-on lab:
β Lab 010 β GitHub Copilot First Steps β Apply prompt skills in VS Code
β Lab 013 β GitHub Models β Run your own prompts via API for free
β Lab 014 β SK Hello Agent β Write a system prompt for a Semantic Kernel agent