# Lab 013: GitHub Models – Free LLM Inference

## What You'll Learn
- What GitHub Models is and which models are available
- How to use the GitHub Models playground (browser, no code)
- How to call GitHub Models via the REST API and Python SDK
- How to generate text embeddings for free (needed for RAG labs)
## Introduction
GitHub Models gives you free API access to frontier LLMs (GPT-4o, Llama, Phi, Mistral, and more) using your GitHub personal access token. No Azure account, no credit card, no sign-up beyond what you already have.
This is the LLM backend used in all L200 labs in this hub.
## Prerequisites Setup

### 1. Create a GitHub personal access token
- Go to github.com/settings/tokens
- Click "Generate new token (classic)"
- Name: `github-models-labs`
- Expiration: 90 days
- Scopes: none needed (read-only access is sufficient for the Models API)
- Click "Generate token", then copy and save the token immediately (it is shown only once)
### 2. Store the token as an environment variable
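The code samples below read the token from a `GITHUB_TOKEN` environment variable. A minimal setup, assuming a bash or zsh shell:

```shell
# macOS / Linux (bash or zsh) – replace the placeholder with your real token
export GITHUB_TOKEN="ghp_your_token_here"

# Verify it is set (shows only the first 4 characters)
echo "${GITHUB_TOKEN:0:4}"
```

On Windows PowerShell, use `$Env:GITHUB_TOKEN = "..."` instead. Add the `export` line to your `~/.bashrc` or `~/.zshrc` to persist it across terminal sessions.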
## 📦 Supporting Files
Download these files before starting the lab
Save all files to a lab-013/ folder in your working directory.
| File | Description | Download |
|---|---|---|
| `requirements.txt` | Python dependencies | 📥 Download |
| `starter.py` | Starter script with TODOs | 📥 Download |
## Lab Exercise

### Step 1: Explore the Playground
- Go to github.com/marketplace/models
- Click on "gpt-4o"
- Click "Playground"
- Type a message and press Enter
You're now chatting with GPT-4o for free, directly in the browser.
Try different models:
- `gpt-4o-mini` – faster, smaller variant of GPT-4o
- `Phi-4` – Microsoft's small but powerful model
- `Llama-3.3-70B-Instruct` – Meta's open-weight model
### Step 2: Make your first API call
Install the OpenAI Python SDK (it's compatible with GitHub Models):
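```shell
pip install openai
```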
Create `hello_models.py`:
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the Model Context Protocol?"},
    ],
    max_tokens=500,
)

print(response.choices[0].message.content)
```
Run it:
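Assuming the `GITHUB_TOKEN` variable from the setup step is exported in the same shell:

```shell
python hello_models.py
```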
Prefer C#? The same endpoint works with the Azure AI Inference SDK. Add the NuGet package:
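At the time of writing the `Azure.AI.Inference` package is published as a prerelease, so the flag is needed:

```shell
dotnet add package Azure.AI.Inference --prerelease
```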
Create `Program.cs`:
```csharp
using Azure;
using Azure.AI.Inference;

var endpoint = new Uri("https://models.inference.ai.azure.com");
var credential = new AzureKeyCredential(Environment.GetEnvironmentVariable("GITHUB_TOKEN")!);
var client = new ChatCompletionsClient(endpoint, credential);

var response = await client.CompleteAsync(new ChatCompletionsOptions
{
    Model = "gpt-4o-mini",
    Messages =
    {
        new ChatRequestSystemMessage("You are a helpful assistant."),
        new ChatRequestUserMessage("What is the Model Context Protocol?"),
    },
    MaxTokens = 500,
});

Console.WriteLine(response.Value.Content);
```
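Because the API is OpenAI-compatible, you can also call it as plain REST from any HTTP client. A sketch with `curl`, assuming the endpoint exposes the standard `/chat/completions` route and the `GITHUB_TOKEN` variable is set:

```shell
curl -s https://models.inference.ai.azure.com/chat/completions \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "gpt-4o-mini",
        "messages": [
          {"role": "user", "content": "What is the Model Context Protocol?"}
        ]
      }'
```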
### Step 3: Generate text embeddings
Embeddings are the key ingredient for RAG. Let's generate one:
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
)

response = client.embeddings.create(
    model="text-embedding-3-small",
    input="A waterproof outdoor camping tent",
)

vector = response.data[0].embedding
print(f"Embedding dimensions: {len(vector)}")
print(f"First 5 values: {vector[:5]}")
```
**What is an embedding?**
An embedding is a list of numbers (a vector) that represents the meaning of a piece of text.
Similar texts produce vectors that are close together in vector space.
This is how semantic search works: compare the query vector to all document vectors and return the closest ones.
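The usual measure of "closeness" is cosine similarity. A minimal sketch in pure Python, using tiny made-up 3-dimensional vectors (real embeddings from `text-embedding-3-small` have 1536 dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1.0 = similar, near 0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" – the values are illustrative, not real model output
tent = [0.9, 0.1, 0.3]
shelter = [0.8, 0.2, 0.4]
invoice = [0.1, 0.9, 0.0]

print(cosine_similarity(tent, shelter))  # high – similar meaning
print(cosine_similarity(tent, invoice))  # low – unrelated
```

Semantic search is exactly this comparison repeated over every document vector, keeping the highest scores.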
### Step 4: Available Models
Check what models are available via the API:
```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://models.inference.ai.azure.com",
    api_key=os.environ["GITHUB_TOKEN"],
)

models = client.models.list()
for model in models.data:
    print(model.id)
```
## Starter Files
Download the starter file to follow along:
The 📥 `starter.py` contains 4 exercises with `TODO` comments. Complete each `TODO` to build a working GitHub Models client.
## Rate Limits
GitHub Models is free but rate-limited:
| Tier | Requests/min | Tokens/day |
|---|---|---|
| Free | ~15 | ~150,000 |
| Copilot Pro/Business | Higher | Higher |
For lab purposes, these limits are more than sufficient. If you hit a limit, wait 1 minute.
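If a script does trip the limit, the SDK raises an error rather than waiting, so a small retry-with-backoff wrapper is handy. A generic sketch; with the OpenAI SDK you would catch `openai.RateLimitError` instead of the bare `Exception` used here:

```python
import time

def with_backoff(fn, max_retries=4, base_delay=2.0):
    """Call fn(); on failure wait base_delay, then 2x, 4x, ... before retrying."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries – surface the error to the caller
            time.sleep(base_delay * (2 ** attempt))

# Usage (assumes a `client` like the one from Step 2):
# reply = with_backoff(lambda: client.chat.completions.create(...))
```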
## Summary
GitHub Models gives you free access to frontier LLMs using just your GitHub account. You can use the playground browser UI or call the API from Python/C#/REST. The API is OpenAI-compatible, so any code that works with OpenAI works here too.
## Next Steps

- Build an agent with Semantic Kernel: → Lab 014 – SK Hello Agent
- Build a RAG app: → Lab 022 – RAG with GitHub Models + pgvector