SQLModel with Instructor: Complete Integration Guide¶
SQLModel is a modern Python library that combines SQLAlchemy's database operations with Pydantic's data validation. Created by Sebastián Ramírez (the creator of FastAPI), SQLModel provides a unified approach to database modeling and API development.
When integrated with Instructor, SQLModel becomes a powerful tool for AI-driven database operations, allowing you to generate structured data directly from language models and seamlessly store it in your database.
Why SQLModel + Instructor?¶
The combination of SQLModel and Instructor offers several key advantages:
- Single Model Definition: Write one model that works for database tables, API schemas, and AI data generation
- Type Safety: Full type checking and editor support throughout your application
- AI-Powered Data Generation: Generate realistic database records using large language models
- FastAPI Integration: Seamless API development with automatic documentation
- Production Ready: Built on proven technologies (SQLAlchemy + Pydantic)
Quick Start Example¶
Here's a simple example to get you started:
import instructor
from openai import OpenAI
from typing import Optional
from uuid import UUID, uuid4
from pydantic.json_schema import SkipJsonSchema
from sqlmodel import Field, SQLModel, create_engine, Session
# Initialize the Instructor client
client = instructor.from_openai(OpenAI())
class Hero(SQLModel, instructor.OpenAISchema, table=True):
    id: SkipJsonSchema[UUID] = Field(default_factory=uuid4, primary_key=True)
name: str
secret_name: str
age: Optional[int] = None
power_level: Optional[int] = Field(default=None, ge=1, le=100)
# Generate AI-powered data
def create_hero() -> Hero:
return client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[
{"role": "user", "content": "Create a superhero with a power level between 1-100"},
],
)
# Database setup and insertion
engine = create_engine("sqlite:///heroes.db")
SQLModel.metadata.create_all(engine)
hero = create_hero()
with Session(engine) as session:
    session.add(hero)
    session.commit()
    session.refresh(hero)  # reload attributes expired by the commit

print(f"Created hero: {hero.name} with power level {hero.power_level}")
Core Concepts and Best Practices¶
Model Definition Strategies¶
Using SkipJsonSchema for Auto-Generated Fields¶
The SkipJsonSchema annotation is crucial for fields that should be generated by your application rather than by the AI:
from pydantic.json_schema import SkipJsonSchema
from sqlmodel import Field, SQLModel
import instructor
from uuid import UUID, uuid4
from datetime import datetime, timezone
class Product(SQLModel, instructor.OpenAISchema, table=True):
# Auto-generated fields excluded from AI generation
id: SkipJsonSchema[UUID] = Field(default_factory=uuid4, primary_key=True)
    created_at: SkipJsonSchema[datetime] = Field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: SkipJsonSchema[datetime] = Field(default_factory=lambda: datetime.now(timezone.utc))
# AI-generated fields
name: str = Field(description="Product name")
description: str = Field(description="Detailed product description")
price: float = Field(gt=0, description="Product price in USD")
category: str = Field(description="Product category")
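You can verify that these fields never reach the model by inspecting the JSON schema Pydantic generates, which is effectively what Instructor sends to the API. A quick sanity check:

schema = Product.model_json_schema()
assert "id" not in schema["properties"]          # hidden from the AI
assert "created_at" not in schema["properties"]  # hidden from the AI
assert "name" in schema["properties"]            # visible to the AI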
Field Validation and Constraints¶
SQLModel supports Pydantic's validation features, ensuring data quality:
import re
from typing import Optional
from pydantic import field_validator
from sqlmodel import Field, SQLModel
import instructor

class Customer(SQLModel, instructor.OpenAISchema, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str = Field(min_length=2, max_length=100)
    email: str
    age: Optional[int] = Field(default=None, ge=18, le=120)
    credit_score: Optional[int] = Field(default=None, ge=300, le=850)

    @field_validator('email')
    @classmethod
    def validate_email_domain(cls, v: str) -> str:
        if not re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', v):
            raise ValueError('Invalid email address')
        allowed_domains = ['gmail.com', 'yahoo.com', 'outlook.com']
        domain = v.split('@')[1]
        if domain not in allowed_domains:
            raise ValueError(f'Email domain must be one of {allowed_domains}')
        return v
Advanced Integration Patterns¶
Relationship Modeling with AI Generation¶
SQLModel supports relationships between tables, which can be populated using AI:
from typing import List, Optional
from sqlmodel import Field, SQLModel, Relationship
import instructor
from openai import OpenAI
client = instructor.from_openai(OpenAI())
class Team(SQLModel, table=True):
id: Optional[int] = Field(default=None, primary_key=True)
name: str
city: str
# Relationship to heroes
heroes: List["Hero"] = Relationship(back_populates="team")
class Hero(SQLModel, instructor.OpenAISchema, table=True):
id: Optional[int] = Field(default=None, primary_key=True)
name: str
secret_name: str
age: Optional[int] = None
# Foreign key to team
team_id: Optional[int] = Field(default=None, foreign_key="team.id")
team: Optional[Team] = Relationship(back_populates="heroes")
def create_hero_for_team(team_name: str) -> Hero:
return client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[
{"role": "user", "content": f"Create a superhero for the {team_name} team"},
],
)
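Because team_id is visible in the schema, the model may hallucinate a foreign-key value, so it is safest to attach the generated hero to a real team row yourself. A minimal sketch, assuming the engine from the quick-start example (the "Unknown" city is a placeholder):

from sqlmodel import Session, select

def add_hero_to_team(team_name: str) -> Hero:
    hero = create_hero_for_team(team_name)
    with Session(engine) as session:
        # Look up (or create) the team, then link it explicitly,
        # overriding whatever team_id the model produced
        team = session.exec(select(Team).where(Team.name == team_name)).first()
        if team is None:
            team = Team(name=team_name, city="Unknown")
            session.add(team)
        hero.team = team
        session.add(hero)
        session.commit()
        session.refresh(hero)
    return hero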
Bulk Data Generation¶
Generate multiple records efficiently:
from typing import List
import instructor
from openai import OpenAI
from sqlmodel import Session
client = instructor.from_openai(OpenAI())
def create_hero_team(team_size: int = 5) -> List[Hero]:
return client.chat.completions.create(
model="gpt-4",
response_model=List[Hero],
messages=[
{"role": "user", "content": f"Create a team of {team_size} diverse superheroes"},
],
)
# Bulk insert
heroes = create_hero_team(10)
with Session(engine) as session:
for hero in heroes:
session.add(hero)
session.commit()
print(f"Created {len(heroes)} heroes")
FastAPI Integration¶
Building Production APIs¶
SQLModel's tight integration with FastAPI makes it perfect for building production APIs:
from fastapi import FastAPI, HTTPException, Depends
from sqlmodel import Session, select
from typing import List, Optional
import instructor
from openai import AsyncOpenAI
app = FastAPI(title="Hero Management API")
client = instructor.from_openai(AsyncOpenAI())
def get_session():
with Session(engine) as session:
yield session
# Create hero endpoint
@app.post("/heroes/", response_model=Hero)
async def create_hero_endpoint(
prompt: str,
session: Session = Depends(get_session)
):
hero = await client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[
{"role": "user", "content": f"Create a superhero: {prompt}"},
],
)
session.add(hero)
session.commit()
session.refresh(hero)
return hero
# List heroes endpoint
@app.get("/heroes/", response_model=List[Hero])
def list_heroes(
limit: int = 10,
offset: int = 0,
session: Session = Depends(get_session)
):
statement = select(Hero).offset(offset).limit(limit)
heroes = session.exec(statement).all()
return heroes
# Get specific hero
@app.get("/heroes/{hero_id}", response_model=Hero)
def get_hero(hero_id: int, session: Session = Depends(get_session)):
hero = session.get(Hero, hero_id)
if not hero:
raise HTTPException(status_code=404, detail="Hero not found")
return hero
API Response Models¶
Create specialized models for different API operations:
from typing import Optional
from sqlmodel import Field, SQLModel
# Base model for database
class HeroBase(SQLModel):
name: str
secret_name: str
age: Optional[int] = None
# Database model
class Hero(HeroBase, table=True):
id: Optional[int] = Field(default=None, primary_key=True)
# API models
class HeroCreate(HeroBase):
pass
class HeroRead(HeroBase):
id: int
class HeroUpdate(SQLModel):
name: Optional[str] = None
secret_name: Optional[str] = None
age: Optional[int] = None
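These models slot directly into endpoints. A sketch of a PATCH handler built on HeroUpdate (assuming the app, get_session, and imports from the FastAPI section above), where exclude_unset ensures only the fields the client actually sent are applied:

@app.patch("/heroes/{hero_id}", response_model=HeroRead)
def update_hero(
    hero_id: int,
    hero_update: HeroUpdate,
    session: Session = Depends(get_session),
):
    hero = session.get(Hero, hero_id)
    if not hero:
        raise HTTPException(status_code=404, detail="Hero not found")
    # Apply only the fields the client provided
    for key, value in hero_update.model_dump(exclude_unset=True).items():
        setattr(hero, key, value)
    session.add(hero)
    session.commit()
    session.refresh(hero)
    return hero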
Performance Optimization¶
Database Connection Management¶
Optimize database connections for production:
from sqlmodel import create_engine
from sqlalchemy.pool import QueuePool
# Production database configuration
engine = create_engine(
"postgresql://user:password@localhost/dbname",
poolclass=QueuePool,
pool_size=20,
max_overflow=0,
pool_pre_ping=True,
echo=False # Set to True for debugging
)
Efficient AI Data Generation¶
Optimize AI calls for better performance:
import asyncio
from typing import List
import instructor
from openai import AsyncOpenAI
client = instructor.from_openai(AsyncOpenAI())
async def create_heroes_batch(prompts: List[str]) -> List[Hero]:
"""Generate multiple heroes concurrently"""
tasks = []
for prompt in prompts:
task = client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[{"role": "user", "content": prompt}],
)
tasks.append(task)
return await asyncio.gather(*tasks)
# Usage (run from inside an async function, or via asyncio.run)
prompts = [
"Create a fire-based superhero",
"Create a water-based superhero",
"Create an earth-based superhero"
]
heroes = await create_heroes_batch(prompts)
Testing Strategies¶
Unit Testing with SQLModel¶
Test your models and AI integration:
import pytest
from sqlmodel import Session, SQLModel, create_engine
from sqlalchemy.pool import StaticPool
@pytest.fixture
def session():
engine = create_engine(
"sqlite://",
connect_args={"check_same_thread": False},
poolclass=StaticPool,
)
SQLModel.metadata.create_all(engine)
with Session(engine) as session:
yield session
def test_hero_creation(session):
hero = Hero(name="Test Hero", secret_name="Test Identity", age=25)
session.add(hero)
session.commit()
assert hero.id is not None
assert hero.name == "Test Hero"
def test_hero_requirements():
    # Stand-in for an AI response; see the mocked test below for real calls
    mock_hero = Hero(name="AI Hero", secret_name="AI Identity", age=30)
    # The generated hero must meet basic requirements
    assert len(mock_hero.name) > 0
    assert len(mock_hero.secret_name) > 0
    assert mock_hero.age is None or mock_hero.age > 0
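To exercise code paths that call the LLM without hitting the API, you can patch the Instructor client and return a canned Hero. A sketch using unittest.mock, where "heroes_app" is a placeholder for whichever module holds your client and create_hero:

from unittest.mock import patch

def test_create_hero_mocked():
    fake = Hero(name="Mock Hero", secret_name="Mocked", age=42)
    # No API request is made; the patched call returns the canned hero
    with patch("heroes_app.client.chat.completions.create", return_value=fake):
        hero = create_hero()
    assert hero.name == "Mock Hero"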
Integration Testing¶
Test the full stack including AI generation:
from fastapi.testclient import TestClient
import pytest
client = TestClient(app)
def test_create_hero_endpoint():
response = client.post(
"/heroes/",
params={"prompt": "Create a test superhero"}
)
assert response.status_code == 200
hero_data = response.json()
assert "name" in hero_data
assert "secret_name" in hero_data
def test_list_heroes():
response = client.get("/heroes/")
assert response.status_code == 200
heroes = response.json()
assert isinstance(heroes, list)
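To keep integration tests fully offline, point the app at an in-memory database by overriding the get_session dependency (the AI call can be patched exactly as in the unit test above). A sketch using FastAPI's dependency_overrides:

from fastapi.testclient import TestClient
from sqlmodel import Session, SQLModel, create_engine
from sqlalchemy.pool import StaticPool

test_engine = create_engine(
    "sqlite://",
    connect_args={"check_same_thread": False},
    poolclass=StaticPool,
)
SQLModel.metadata.create_all(test_engine)

def get_test_session():
    with Session(test_engine) as session:
        yield session

# Route handlers now receive the in-memory session
app.dependency_overrides[get_session] = get_test_session
client = TestClient(app)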
Production Deployment¶
Environment Configuration¶
Set up proper configuration for different environments:
# In Pydantic v2, BaseSettings lives in the pydantic-settings package
from pydantic_settings import BaseSettings, SettingsConfigDict
from sqlmodel import create_engine

class Settings(BaseSettings):
    database_url: str = "sqlite:///./app.db"
    openai_api_key: str
    debug: bool = False

    model_config = SettingsConfigDict(env_file=".env")
settings = Settings()
engine = create_engine(settings.database_url)
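The settings object is also the natural place to wire credentials into the Instructor client, rather than relying implicitly on the OPENAI_API_KEY environment variable:

import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI(api_key=settings.openai_api_key))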
Error Handling and Logging¶
Implement robust error handling:
import logging
from fastapi import HTTPException
import instructor
from openai import AsyncOpenAI

logger = logging.getLogger(__name__)
# The helper below awaits the call, so it needs the async client
client = instructor.from_openai(AsyncOpenAI())
async def safe_create_hero(prompt: str) -> Hero:
try:
hero = await client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[{"role": "user", "content": prompt}],
max_retries=3,
)
logger.info(f"Successfully created hero: {hero.name}")
return hero
except Exception as e:
logger.error(f"Failed to create hero: {str(e)}")
raise HTTPException(
status_code=500,
detail="Failed to generate hero data"
)
Advanced Use Cases¶
Data Migration and Seeding¶
Use AI to generate realistic seed data:
from sqlmodel import Session, SQLModel, create_engine
import instructor
from openai import OpenAI
client = instructor.from_openai(OpenAI())
def seed_database():
"""Generate realistic seed data for development"""
engine = create_engine("sqlite:///seed.db")
SQLModel.metadata.create_all(engine)
# Generate diverse heroes
hero_types = [
"tech-based superhero",
"magic-based superhero",
"strength-based superhero",
"speed-based superhero",
"psychic superhero"
]
with Session(engine) as session:
for hero_type in hero_types:
for i in range(5): # 5 heroes of each type
hero = client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[
{"role": "user", "content": f"Create a unique {hero_type}"}
],
)
session.add(hero)
session.commit()
print("Database seeded successfully!")
if __name__ == "__main__":
seed_database()
Real-time Data Processing¶
Combine SQLModel with streaming for real-time applications:
from typing import List
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from sqlmodel import Session
import instructor
from openai import AsyncOpenAI
import json

app = FastAPI()
client = instructor.from_openai(AsyncOpenAI())
@app.post("/heroes/stream")
async def stream_hero_creation(prompts: List[str]):
async def generate_heroes():
for prompt in prompts:
try:
hero = await client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[{"role": "user", "content": prompt}],
)
# Save to database
with Session(engine) as session:
session.add(hero)
session.commit()
session.refresh(hero)
yield f"data: {hero.model_dump_json()}\n\n"
except Exception as e:
yield f"data: {json.dumps({'error': str(e)})}\n\n"
    return StreamingResponse(
        generate_heroes(),
        media_type="text/event-stream",  # matches the SSE "data:" framing above
    )
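A client can consume this stream incrementally. A minimal sketch with httpx, assuming the API is running on localhost:8000:

import httpx

def consume_hero_stream(prompts: list[str]) -> None:
    with httpx.stream(
        "POST", "http://localhost:8000/heroes/stream", json=prompts, timeout=None
    ) as response:
        for line in response.iter_lines():
            if line.startswith("data: "):
                # One JSON-encoded hero (or error) per event
                print(line[len("data: "):])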
Troubleshooting Common Issues¶
Model Inheritance Issues¶
When using both SQLModel and instructor.OpenAISchema:
# Correct way to inherit from both
class Hero(SQLModel, instructor.OpenAISchema, table=True):
__table_args__ = {'extend_existing': True} # Prevents table conflicts
# ... model fields
JSON Schema Conflicts¶
Handle conflicts between database and AI schema requirements:
from datetime import datetime, timezone
from typing import Optional
from pydantic.json_schema import SkipJsonSchema
from sqlmodel import Field, SQLModel
import instructor

class Hero(SQLModel, instructor.OpenAISchema, table=True):
    # Database-only fields, hidden from the AI schema
    id: SkipJsonSchema[Optional[int]] = Field(default=None, primary_key=True)
    created_at: SkipJsonSchema[datetime] = Field(default_factory=lambda: datetime.now(timezone.utc))

    # AI-generated fields with database constraints
    name: str = Field(description="Hero name", max_length=100)
    power_level: int = Field(description="Power level between 1 and 100", ge=1, le=100)
Performance Monitoring¶
Monitor AI generation performance:
import time
from functools import wraps
def monitor_ai_calls(func):
@wraps(func)
async def wrapper(*args, **kwargs):
start_time = time.time()
result = await func(*args, **kwargs)
duration = time.time() - start_time
logger.info(f"AI call took {duration:.2f} seconds")
return result
return wrapper
@monitor_ai_calls
async def create_hero(prompt: str) -> Hero:
return await client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[{"role": "user", "content": prompt}],
)
Conclusion¶
SQLModel with Instructor provides a powerful foundation for building AI-powered applications with robust database integration. The combination offers:
- Developer Productivity: Single model definition for multiple use cases
- Type Safety: Full type checking and validation
- AI Integration: Seamless integration with language models
- Production Ready: Built on proven, scalable technologies
- FastAPI Compatible: Perfect for modern API development
By following the patterns and best practices outlined in this guide, you can build sophisticated applications that leverage AI for data generation while maintaining data integrity and performance.
Next Steps¶
- Explore the FastAPI integration guide for advanced API patterns
- Check out validation techniques for robust data handling
- Learn about streaming responses for real-time applications
*Example of AI-generated hero data stored in a SQLite database.*