SQLModel with Instructor: Complete Integration Guide¶
SQLModel is a modern Python library that combines SQLAlchemy's database operations with Pydantic's data validation. Created by Sebastián Ramírez (the creator of FastAPI), SQLModel provides a unified approach to database modeling and API development.
When integrated with Instructor, SQLModel becomes a powerful tool for AI-driven database operations, allowing you to generate structured data directly from language models and seamlessly store it in your database.
Why SQLModel + Instructor?¶
The combination of SQLModel and Instructor offers several key advantages:
- Single Model Definition: Write one model that works for database tables, API schemas, and AI data generation
- Type Safety: Full type checking and editor support throughout your application
- AI-Powered Data Generation: Generate realistic database records using large language models
- FastAPI Integration: Seamless API development with automatic documentation
- Production Ready: Built on proven technologies (SQLAlchemy + Pydantic)
Quick Start Example¶
Here's a simple example to get you started:
import instructor
from openai import OpenAI
from typing import Optional
from uuid import UUID, uuid4
from pydantic.json_schema import SkipJsonSchema
from sqlmodel import Field, SQLModel, create_engine, Session
# Initialize the Instructor client
client = instructor.from_openai(OpenAI())
class Hero(SQLModel, instructor.OpenAISchema, table=True):
    id: SkipJsonSchema[UUID] = Field(default_factory=uuid4, primary_key=True)
name: str
secret_name: str
age: Optional[int] = None
power_level: Optional[int] = Field(default=None, ge=1, le=100)
# Generate AI-powered data
def create_hero() -> Hero:
return client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[
{"role": "user", "content": "Create a superhero with a power level between 1-100"},
],
)
# Database setup and insertion
engine = create_engine("sqlite:///heroes.db")
SQLModel.metadata.create_all(engine)
hero = create_hero()
with Session(engine) as session:
    session.add(hero)
    session.commit()
    session.refresh(hero)  # reload attributes expired by the commit

print(f"Created hero: {hero.name} with power level {hero.power_level}")
Core Concepts and Best Practices¶
Model Definition Strategies¶
Using SkipJsonSchema for Auto-Generated Fields¶
The SkipJsonSchema annotation is crucial for fields that should be generated by your application rather than by the AI:
from pydantic.json_schema import SkipJsonSchema
from sqlmodel import Field, SQLModel
import instructor
from uuid import UUID, uuid4
from datetime import datetime, timezone
class Product(SQLModel, instructor.OpenAISchema, table=True):
# Auto-generated fields excluded from AI generation
id: SkipJsonSchema[UUID] = Field(default_factory=uuid4, primary_key=True)
    created_at: SkipJsonSchema[datetime] = Field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: SkipJsonSchema[datetime] = Field(default_factory=lambda: datetime.now(timezone.utc))
# AI-generated fields
name: str = Field(description="Product name")
description: str = Field(description="Detailed product description")
price: float = Field(gt=0, description="Product price in USD")
category: str = Field(description="Product category")
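You can verify that these fields never reach the model by inspecting the JSON schema Pydantic generates, which is effectively what Instructor sends to the API. A quick sanity check:

schema = Product.model_json_schema()
assert "id" not in schema["properties"]          # hidden from the AI
assert "created_at" not in schema["properties"]  # hidden from the AI
assert "name" in schema["properties"]            # visible to the AI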
Field Validation and Constraints¶
SQLModel supports Pydantic's validation features, ensuring data quality:
import re
from typing import Optional
from pydantic import field_validator
from sqlmodel import Field, SQLModel
import instructor

class Customer(SQLModel, instructor.OpenAISchema, table=True):
    id: Optional[int] = Field(default=None, primary_key=True)
    name: str = Field(min_length=2, max_length=100)
    email: str
    age: Optional[int] = Field(default=None, ge=18, le=120)
    credit_score: Optional[int] = Field(default=None, ge=300, le=850)

    @field_validator('email')
    @classmethod
    def validate_email_domain(cls, v: str) -> str:
        if not re.match(r'^[\w\.-]+@[\w\.-]+\.\w+$', v):
            raise ValueError('Invalid email address')
        allowed_domains = ['gmail.com', 'yahoo.com', 'outlook.com']
        domain = v.split('@')[1]
        if domain not in allowed_domains:
            raise ValueError(f'Email domain must be one of {allowed_domains}')
        return v
Advanced Integration Patterns¶
Relationship Modeling with AI Generation¶
SQLModel supports relationships between tables, which can be populated using AI:
from typing import List, Optional
from sqlmodel import Field, SQLModel, Relationship
import instructor
from openai import OpenAI
client = instructor.from_openai(OpenAI())
class Team(SQLModel, table=True):
id: Optional[int] = Field(default=None, primary_key=True)
name: str
city: str
# Relationship to heroes
heroes: List["Hero"] = Relationship(back_populates="team")
class Hero(SQLModel, instructor.OpenAISchema, table=True):
id: Optional[int] = Field(default=None, primary_key=True)
name: str
secret_name: str
age: Optional[int] = None
# Foreign key to team
team_id: Optional[int] = Field(default=None, foreign_key="team.id")
team: Optional[Team] = Relationship(back_populates="heroes")
def create_hero_for_team(team_name: str) -> Hero:
return client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[
{"role": "user", "content": f"Create a superhero for the {team_name} team"},
],
)
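Because team_id is visible in the schema, the model may hallucinate a foreign-key value, so it is safest to attach the generated hero to a real team row yourself. A minimal sketch, assuming the engine from the quick-start example (the "Unknown" city is a placeholder):

from sqlmodel import Session, select

def add_hero_to_team(team_name: str) -> Hero:
    hero = create_hero_for_team(team_name)
    with Session(engine) as session:
        # Look up (or create) the team, then link it explicitly,
        # overriding whatever team_id the model produced
        team = session.exec(select(Team).where(Team.name == team_name)).first()
        if team is None:
            team = Team(name=team_name, city="Unknown")
            session.add(team)
        hero.team = team
        session.add(hero)
        session.commit()
        session.refresh(hero)
    return hero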
Bulk Data Generation¶
Generate multiple records efficiently:
from typing import List
import instructor
from openai import OpenAI
from sqlmodel import Session
client = instructor.from_openai(OpenAI())
def create_hero_team(team_size: int = 5) -> List[Hero]:
return client.chat.completions.create(
model="gpt-4",
response_model=List[Hero],
messages=[
{"role": "user", "content": f"Create a team of {team_size} diverse superheroes"},
],
)
# Bulk insert
heroes = create_hero_team(10)
with Session(engine) as session:
for hero in heroes:
session.add(hero)
session.commit()
print(f"Created {len(heroes)} heroes")
FastAPI Integration¶
Building Production APIs¶
SQLModel's tight integration with FastAPI makes it perfect for building production APIs:
from fastapi import FastAPI, HTTPException, Depends
from sqlmodel import Session, select
from typing import List, Optional
import instructor
from openai import AsyncOpenAI
app = FastAPI(title="Hero Management API")
client = instructor.from_openai(AsyncOpenAI())
def get_session():
with Session(engine) as session:
yield session
# Create hero endpoint
@app.post("/heroes/", response_model=Hero)
async def create_hero_endpoint(
prompt: str,
session: Session = Depends(get_session)
):
hero = await client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[
{"role": "user", "content": f"Create a superhero: {prompt}"},
],
)
session.add(hero)
session.commit()
session.refresh(hero)
return hero
# List heroes endpoint
@app.get("/heroes/", response_model=List[Hero])
def list_heroes(
limit: int = 10,
offset: int = 0,
session: Session = Depends(get_session)
):
statement = select(Hero).offset(offset).limit(limit)
heroes = session.exec(statement).all()
return heroes
# Get specific hero
@app.get("/heroes/{hero_id}", response_model=Hero)
def get_hero(hero_id: int, session: Session = Depends(get_session)):
hero = session.get(Hero, hero_id)
if not hero:
raise HTTPException(status_code=404, detail="Hero not found")
return hero
API Response Models¶
Create specialized models for different API operations:
from typing import Optional
from sqlmodel import Field, SQLModel
# Base model for database
class HeroBase(SQLModel):
name: str
secret_name: str
age: Optional[int] = None
# Database model
class Hero(HeroBase, table=True):
id: Optional[int] = Field(default=None, primary_key=True)
# API models
class HeroCreate(HeroBase):
pass
class HeroRead(HeroBase):
id: int
class HeroUpdate(SQLModel):
name: Optional[str] = None
secret_name: Optional[str] = None
age: Optional[int] = None
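These models slot directly into endpoints. A sketch of a PATCH handler built on HeroUpdate (assuming the app, get_session, and imports from the FastAPI section above), where exclude_unset ensures only the fields the client actually sent are applied:

@app.patch("/heroes/{hero_id}", response_model=HeroRead)
def update_hero(
    hero_id: int,
    hero_update: HeroUpdate,
    session: Session = Depends(get_session),
):
    hero = session.get(Hero, hero_id)
    if not hero:
        raise HTTPException(status_code=404, detail="Hero not found")
    # Apply only the fields the client provided
    for key, value in hero_update.model_dump(exclude_unset=True).items():
        setattr(hero, key, value)
    session.add(hero)
    session.commit()
    session.refresh(hero)
    return hero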
Performance Optimization¶
Database Connection Management¶
Optimize database connections for production:
from sqlmodel import create_engine
from sqlalchemy.pool import QueuePool
# Production database configuration
engine = create_engine(
"postgresql://user:password@localhost/dbname",
poolclass=QueuePool,
pool_size=20,
max_overflow=0,
pool_pre_ping=True,
echo=False # Set to True for debugging
)
Efficient AI Data Generation¶
Optimize AI calls for better performance:
import asyncio
from typing import List
import instructor
from openai import AsyncOpenAI
client = instructor.from_openai(AsyncOpenAI())
async def create_heroes_batch(prompts: List[str]) -> List[Hero]:
"""Generate multiple heroes concurrently"""
tasks = []
for prompt in prompts:
task = client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[{"role": "user", "content": prompt}],
)
tasks.append(task)
return await asyncio.gather(*tasks)
# Usage (run from inside an async function, or via asyncio.run)
prompts = [
"Create a fire-based superhero",
"Create a water-based superhero",
"Create an earth-based superhero"
]
heroes = await create_heroes_batch(prompts)
Testing Strategies¶
Unit Testing with SQLModel¶
Test your models and AI integration:
import pytest
from sqlmodel import Session, SQLModel, create_engine
from sqlalchemy.pool import StaticPool
@pytest.fixture
def session():
engine = create_engine(
"sqlite://",
connect_args={"check_same_thread": False},
poolclass=StaticPool,
)
SQLModel.metadata.create_all(engine)
with Session(engine) as session:
yield session
def test_hero_creation(session):
hero = Hero(name="Test Hero", secret_name="Test Identity", age=25)
session.add(hero)
session.commit()
assert hero.id is not None
assert hero.name == "Test Hero"
def test_hero_requirements():
    # Stand-in for an AI response; see the mocked test below for real calls
    mock_hero = Hero(name="AI Hero", secret_name="AI Identity", age=30)
    # The generated hero must meet basic requirements
    assert len(mock_hero.name) > 0
    assert len(mock_hero.secret_name) > 0
    assert mock_hero.age is None or mock_hero.age > 0
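To exercise code paths that call the LLM without hitting the API, you can patch the Instructor client and return a canned Hero. A sketch using unittest.mock, where "heroes_app" is a placeholder for whichever module holds your client and create_hero:

from unittest.mock import patch

def test_create_hero_mocked():
    fake = Hero(name="Mock Hero", secret_name="Mocked", age=42)
    # No API request is made; the patched call returns the canned hero
    with patch("heroes_app.client.chat.completions.create", return_value=fake):
        hero = create_hero()
    assert hero.name == "Mock Hero"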
Integration Testing¶
Test the full stack including AI generation:
from fastapi.testclient import TestClient
import pytest
client = TestClient(app)
def test_create_hero_endpoint():
response = client.post(
"/heroes/",
params={"prompt": "Create a test superhero"}
)
assert response.status_code == 200
hero_data = response.json()
assert "name" in hero_data
assert "secret_name" in hero_data
def test_list_heroes():
response = client.get("/heroes/")
assert response.status_code == 200
heroes = response.json()
assert isinstance(heroes, list)
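To keep integration tests fully offline, point the app at an in-memory database by overriding the get_session dependency (the AI call can be patched exactly as in the unit test above). A sketch using FastAPI's dependency_overrides:

from fastapi.testclient import TestClient
from sqlmodel import Session, SQLModel, create_engine
from sqlalchemy.pool import StaticPool

test_engine = create_engine(
    "sqlite://",
    connect_args={"check_same_thread": False},
    poolclass=StaticPool,
)
SQLModel.metadata.create_all(test_engine)

def get_test_session():
    with Session(test_engine) as session:
        yield session

# Route handlers now receive the in-memory session
app.dependency_overrides[get_session] = get_test_session
client = TestClient(app)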
Production Deployment¶
Environment Configuration¶
Set up proper configuration for different environments:
# In Pydantic v2, BaseSettings lives in the pydantic-settings package
from pydantic_settings import BaseSettings, SettingsConfigDict
from sqlmodel import create_engine

class Settings(BaseSettings):
    database_url: str = "sqlite:///./app.db"
    openai_api_key: str
    debug: bool = False

    model_config = SettingsConfigDict(env_file=".env")
settings = Settings()
engine = create_engine(settings.database_url)
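The settings object is also the natural place to wire credentials into the Instructor client, rather than relying implicitly on the OPENAI_API_KEY environment variable:

import instructor
from openai import OpenAI

client = instructor.from_openai(OpenAI(api_key=settings.openai_api_key))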
Error Handling and Logging¶
Implement robust error handling:
import logging
from fastapi import HTTPException
import instructor
from openai import AsyncOpenAI

logger = logging.getLogger(__name__)
# The helper below awaits the call, so it needs the async client
client = instructor.from_openai(AsyncOpenAI())
async def safe_create_hero(prompt: str) -> Hero:
try:
hero = await client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[{"role": "user", "content": prompt}],
max_retries=3,
)
logger.info(f"Successfully created hero: {hero.name}")
return hero
except Exception as e:
logger.error(f"Failed to create hero: {str(e)}")
raise HTTPException(
status_code=500,
detail="Failed to generate hero data"
)
Advanced Use Cases¶
Data Migration and Seeding¶
Use AI to generate realistic seed data:
from sqlmodel import Session, SQLModel, create_engine
import instructor
from openai import OpenAI
client = instructor.from_openai(OpenAI())
def seed_database():
"""Generate realistic seed data for development"""
engine = create_engine("sqlite:///seed.db")
SQLModel.metadata.create_all(engine)
# Generate diverse heroes
hero_types = [
"tech-based superhero",
"magic-based superhero",
"strength-based superhero",
"speed-based superhero",
"psychic superhero"
]
with Session(engine) as session:
for hero_type in hero_types:
for i in range(5): # 5 heroes of each type
hero = client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[
{"role": "user", "content": f"Create a unique {hero_type}"}
],
)
session.add(hero)
session.commit()
print("Database seeded successfully!")
if __name__ == "__main__":
seed_database()
Real-time Data Processing¶
Combine SQLModel with streaming for real-time applications:
from typing import List
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from sqlmodel import Session
import instructor
from openai import AsyncOpenAI
import json

app = FastAPI()
client = instructor.from_openai(AsyncOpenAI())
@app.post("/heroes/stream")
async def stream_hero_creation(prompts: List[str]):
async def generate_heroes():
for prompt in prompts:
try:
hero = await client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[{"role": "user", "content": prompt}],
)
# Save to database
with Session(engine) as session:
session.add(hero)
session.commit()
session.refresh(hero)
yield f"data: {hero.model_dump_json()}\n\n"
except Exception as e:
yield f"data: {json.dumps({'error': str(e)})}\n\n"
    return StreamingResponse(
        generate_heroes(),
        media_type="text/event-stream",  # matches the SSE "data:" framing above
    )
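A client can consume this stream incrementally. A minimal sketch with httpx, assuming the API is running on localhost:8000:

import httpx

def consume_hero_stream(prompts: list[str]) -> None:
    with httpx.stream(
        "POST", "http://localhost:8000/heroes/stream", json=prompts, timeout=None
    ) as response:
        for line in response.iter_lines():
            if line.startswith("data: "):
                # One JSON-encoded hero (or error) per event
                print(line[len("data: "):])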
Troubleshooting Common Issues¶
Model Inheritance Issues¶
When using both SQLModel and instructor.OpenAISchema:
# Correct way to inherit from both
class Hero(SQLModel, instructor.OpenAISchema, table=True):
__table_args__ = {'extend_existing': True} # Prevents table conflicts
# ... model fields
JSON Schema Conflicts¶
Handle conflicts between database and AI schema requirements:
from datetime import datetime, timezone
from typing import Optional
from pydantic.json_schema import SkipJsonSchema
from sqlmodel import Field, SQLModel
import instructor

class Hero(SQLModel, instructor.OpenAISchema, table=True):
    # Database-only fields, hidden from the AI schema
    id: SkipJsonSchema[Optional[int]] = Field(default=None, primary_key=True)
    created_at: SkipJsonSchema[datetime] = Field(default_factory=lambda: datetime.now(timezone.utc))

    # AI-generated fields with database constraints
    name: str = Field(description="Hero name", max_length=100)
    power_level: int = Field(description="Power level between 1 and 100", ge=1, le=100)
Performance Monitoring¶
Monitor AI generation performance:
import time
from functools import wraps
def monitor_ai_calls(func):
@wraps(func)
async def wrapper(*args, **kwargs):
start_time = time.time()
result = await func(*args, **kwargs)
duration = time.time() - start_time
logger.info(f"AI call took {duration:.2f} seconds")
return result
return wrapper
@monitor_ai_calls
async def create_hero(prompt: str) -> Hero:
return await client.chat.completions.create(
model="gpt-4",
response_model=Hero,
messages=[{"role": "user", "content": prompt}],
)
Conclusion¶
SQLModel with Instructor provides a powerful foundation for building AI-powered applications with robust database integration. The combination offers:
- Developer Productivity: Single model definition for multiple use cases
- Type Safety: Full type checking and validation
- AI Integration: Seamless integration with language models
- Production Ready: Built on proven, scalable technologies
- FastAPI Compatible: Perfect for modern API development
By following the patterns and best practices outlined in this guide, you can build sophisticated applications that leverage AI for data generation while maintaining data integrity and performance.
Next Steps¶
- Explore the FastAPI integration guide for advanced API patterns
- Check out validation techniques for robust data handling
- Learn about streaming responses for real-time applications
*Example of AI-generated hero data stored in a SQLite database.*