langchain-structured-output

Get structured, validated output from LangChain agents and models using Pydantic schemas, type-safe responses, and automatic validation

christian-bromann 3 1 Updated 5mo ago

Install

npx skillscat add christian-bromann/langchain-skills/langchain-structured-output

Install via the SkillsCat registry.

SKILL.md

langchain-structured-output (Python)

Overview

Structured output transforms unstructured model responses into validated, typed data. Instead of parsing free text, you get Python objects that conform to your schema, which suits data extraction, form filling, and integration with downstream systems.

Key Concepts:

response_format: create_agent parameter that defines the expected output schema
Pydantic Validation: Type-safe schemas with automatic validation
with_structured_output(): Model method for direct structured output
Tool Strategy: Uses tool calling under the hood for models without native support

Decision Tables

When to Use Structured Output

Use Case	Use Structured Output?	Why
Extract contact info, dates, etc.	✅ Yes	Reliable data extraction
Form filling	✅ Yes	Validate all required fields
API integration	✅ Yes	Type-safe responses
Classification tasks	✅ Yes	Enum validation
Open-ended Q&A	❌ No	Free-form text is fine
Creative writing	❌ No	Don't constrain creativity

Schema Options

Schema Type	When to Use	Example
Pydantic model	Python projects (recommended)	`class Model(BaseModel):`
TypedDict	Lightweight typing; the result is a plain dict	`class Data(TypedDict):`
JSON Schema	Interoperability	`{"type": "object", ...}`
Union types	Multiple possible formats	`Union[Schema1, Schema2]`
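
The practical difference between the first two rows is runtime validation. The following is a minimal sketch, assuming Pydantic v2 is installed; the `PointModel` and `PointDict` names are hypothetical and no model is called.

```python
from typing import TypedDict

from pydantic import BaseModel, ValidationError


class PointModel(BaseModel):
    x: int
    y: int


class PointDict(TypedDict):
    x: int
    y: int


# A TypedDict is only a hint for static type checkers: at runtime it is a
# plain dict, so a wrong value type is accepted silently.
loose: PointDict = {"x": "not a number", "y": 2}  # type: ignore[typeddict-item]
typed_dict_is_plain_dict = type(loose) is dict

# A Pydantic model validates on construction and raises on bad input.
try:
    PointModel(x="not a number", y=2)
    pydantic_rejected = False
except ValidationError:
    pydantic_rejected = True

print(typed_dict_is_plain_dict, pydantic_rejected)
```

If downstream code relies on field types being correct, the Pydantic row is the safer default.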

Code Examples

Basic Structured Output with Agent

from langchain.agents import create_agent
from pydantic import BaseModel, Field

class ContactInfo(BaseModel):
    name: str
    email: str = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")
    phone: str

agent = create_agent(
    model="gpt-4.1",
    response_format=ContactInfo,
)

result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Extract: John Doe, john@example.com, (555) 123-4567"
    }]
})

print(result["structured_response"])
# ContactInfo(name='John Doe', email='john@example.com', phone='(555) 123-4567')
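
Schema constraints can be checked locally, without calling a model, before the schema is wired into an agent. This is a sketch assuming Pydantic v2 and reusing the `ContactInfo` schema from the example above; the payloads are made up for illustration.

```python
from pydantic import BaseModel, Field, ValidationError


class ContactInfo(BaseModel):
    name: str
    email: str = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")
    phone: str


# A payload shaped like the one the model is expected to produce.
good = ContactInfo.model_validate(
    {"name": "John Doe", "email": "john@example.com", "phone": "(555) 123-4567"}
)
print(good.email)

# The email pattern rejects a value with no "@".
failed_fields = []
try:
    ContactInfo.model_validate(
        {"name": "John Doe", "email": "not-an-email", "phone": "(555) 123-4567"}
    )
    rejected = False
except ValidationError as exc:
    rejected = True
    # Each error dict carries the location of the failing field.
    failed_fields = [err["loc"][0] for err in exc.errors()]

print(rejected, failed_fields)
```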

Model Direct Structured Output

from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class Movie(BaseModel):
    """Movie information."""
    title: str = Field(description="Movie title")
    year: int = Field(description="Release year")
    director: str
    rating: float = Field(ge=0, le=10)

model = ChatOpenAI(model="gpt-4.1")
structured_model = model.with_structured_output(Movie)

response = structured_model.invoke("Tell me about Inception")
print(response)
# Movie(title='Inception', year=2010, director='Christopher Nolan', rating=8.8)

Complex Nested Schema

from langchain.agents import create_agent
from pydantic import BaseModel, Field
from typing import List

class Address(BaseModel):
    street: str
    city: str
    state: str
    zip: str

class Person(BaseModel):
    name: str
    age: int = Field(gt=0)
    email: str
    address: Address
    tags: List[str] = Field(default_factory=list)

agent = create_agent(
    model="gpt-4.1",
    response_format=Person,
)

Enum and Literal Types

from langchain.agents import create_agent
from pydantic import BaseModel, Field
from typing import Literal

class Classification(BaseModel):
    category: Literal["urgent", "normal", "low"]
    sentiment: Literal["positive", "neutral", "negative"]
    confidence: float = Field(ge=0, le=1)

agent = create_agent(
    model="gpt-4.1",
    response_format=Classification,
)

result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Classify: This is extremely important and I'm very happy!"
    }]
})
print(result["structured_response"])
# Classification(category='urgent', sentiment='positive', confidence=0.95)

Optional Fields and Defaults

from pydantic import BaseModel, Field
from typing import Optional, List

class Event(BaseModel):
    title: str
    date: str
    location: Optional[str] = None
    attendees: List[str] = Field(default_factory=list)
    confirmed: bool = False
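
The effect of the optional fields and defaults can be seen by validating a payload that supplies only the required fields. This sketch assumes Pydantic v2 and reuses the `Event` schema above; the sample values are invented.

```python
from typing import List, Optional

from pydantic import BaseModel, Field


class Event(BaseModel):
    title: str
    date: str
    location: Optional[str] = None
    attendees: List[str] = Field(default_factory=list)
    confirmed: bool = False


# Only the required fields are supplied; the rest fall back to defaults.
event = Event.model_validate({"title": "Team sync", "date": "2025-01-15"})
print(event.location, event.attendees, event.confirmed)

# default_factory builds a fresh list per instance, so instances share no state.
other = Event(title="Retro", date="2025-01-16")
other.attendees.append("Alice")
print(event.attendees)

# Required vs optional is visible in the JSON Schema generated from the model.
required_fields = Event.model_json_schema()["required"]
print(required_fields)
```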

Union Types (Multiple Schemas)

from langchain.agents import create_agent
from pydantic import BaseModel
from typing import Union, Literal

class EmailContact(BaseModel):
    type: Literal["email"]
    to: str
    subject: str

class PhoneContact(BaseModel):
    type: Literal["phone"]
    number: str
    message: str

ContactMethod = Union[EmailContact, PhoneContact]

agent = create_agent(
    model="gpt-4.1",
    response_format=ContactMethod,
)
# Model chooses which schema based on input
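
How a payload resolves to one member of the union can be tried offline with Pydantic's `TypeAdapter`. This is a sketch under the assumption of Pydantic v2; it illustrates the validation side only, not how the agent selects a schema.

```python
from typing import Literal, Union

from pydantic import BaseModel, TypeAdapter


class EmailContact(BaseModel):
    type: Literal["email"]
    to: str
    subject: str


class PhoneContact(BaseModel):
    type: Literal["phone"]
    number: str
    message: str


# TypeAdapter validates arbitrary types, including a Union of models.
adapter = TypeAdapter(Union[EmailContact, PhoneContact])

email = adapter.validate_python(
    {"type": "email", "to": "john@example.com", "subject": "Hello"}
)
phone = adapter.validate_python(
    {"type": "phone", "number": "(555) 123-4567", "message": "Call me"}
)

# The Literal "type" field means each payload matches exactly one schema.
print(type(email).__name__, type(phone).__name__)
```

Giving each schema a distinct `Literal` tag keeps the two shapes unambiguous.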

Array Extraction

from langchain.agents import create_agent
from pydantic import BaseModel
from typing import List, Optional, Literal

class Task(BaseModel):
    title: str
    priority: Literal["high", "medium", "low"]
    due_date: Optional[str] = None

class TaskList(BaseModel):
    tasks: List[Task]

agent = create_agent(
    model="gpt-4.1",
    response_format=TaskList,
)

result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Extract tasks: 1. Fix bug (high priority, due tomorrow) 2. Update docs"
    }]
})

Include Raw AIMessage

from langchain_openai import ChatOpenAI
from pydantic import BaseModel

class Person(BaseModel):
    name: str
    age: int

model = ChatOpenAI(model="gpt-4.1")
structured_model = model.with_structured_output(Person, include_raw=True)

response = structured_model.invoke("Person: Alice, 30 years old")
print(response)
# {
#   "raw": AIMessage(...),
#   "parsed": Person(name="Alice", age=30)
# }
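
LangChain's documentation for `with_structured_output` also describes a `parsing_error` key in the `include_raw=True` result, which is not shown in the output above. The sketch below assumes that key is present; `unwrap` is a hypothetical helper, and the dicts are stand-ins, so no model is called.

```python
from typing import Any, Dict


def unwrap(response: Dict[str, Any]) -> Any:
    """Return the parsed object, or raise if parsing failed (hypothetical helper)."""
    error = response.get("parsing_error")
    if error is not None:
        raise ValueError(f"Structured output could not be parsed: {error}")
    return response["parsed"]


# Stand-in dicts shaped like the include_raw=True result.
ok = {"raw": "<AIMessage>", "parsed": {"name": "Alice", "age": 30}, "parsing_error": None}
bad = {"raw": "<AIMessage>", "parsed": None, "parsing_error": ValueError("age missing")}

print(unwrap(ok))

try:
    unwrap(bad)
    failure_message = None
except ValueError as exc:
    failure_message = str(exc)

print(failure_message)
```

Keeping the raw message around is mainly useful for logging what the model returned when parsing fails.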

TypedDict Alternative

from typing_extensions import TypedDict, Annotated
from langchain.agents import create_agent

class ContactDict(TypedDict):
    """Contact information."""
    name: Annotated[str, ..., "Person's full name"]
    email: Annotated[str, ..., "Email address"]
    phone: Annotated[str, ..., "Phone number"]

agent = create_agent(
    model="gpt-4.1",
    response_format=ContactDict,
)

result = agent.invoke({"messages": [{"role": "user", "content": "..."}]})
# Returns dict, not Pydantic model
print(type(result["structured_response"]))  # <class 'dict'>

Error Handling

from langchain.agents import create_agent
from pydantic import BaseModel, Field, ValidationError

class StrictSchema(BaseModel):
    email: str = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")
    age: int = Field(ge=0, le=120)

agent = create_agent(
    model="gpt-4.1",
    response_format=StrictSchema,
)

try:
    result = agent.invoke({
        "messages": [{"role": "user", "content": "Email: invalid, Age: -5"}]
    })
except ValidationError as e:
    print(f"Validation failed: {e}")
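
When a `ValidationError` is caught, its `errors()` method lists which fields failed and why. The sketch below assumes Pydantic v2 and validates the same bad input directly against `StrictSchema`, without calling the agent.

```python
from pydantic import BaseModel, Field, ValidationError


class StrictSchema(BaseModel):
    email: str = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")
    age: int = Field(ge=0, le=120)


problems = {}
try:
    StrictSchema.model_validate({"email": "invalid", "age": -5})
except ValidationError as exc:
    # errors() returns one dict per failing field, with its location and error type.
    problems = {err["loc"][0]: err["type"] for err in exc.errors()}
    error_count = exc.error_count()

print(problems)
```

A mapping like `problems` can feed a retry prompt or a log entry that names the offending fields.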

Boundaries

What You CAN Configure

✅ Schema structure: Any valid Pydantic model
✅ Field validation: Types, ranges, regex, etc.
✅ Optional vs required: Control field presence
✅ Nested objects: Complex hierarchies
✅ Arrays: Lists of items
✅ Enums: Restricted values with Literal

What You CANNOT Configure

❌ Model reasoning: Can't control how the model generates data
❌ Guarantee 100% accuracy: The model may still make mistakes
❌ Force valid data if context lacks it: The model can't supply information that is missing from the input

Gotchas

1. Accessing Response Wrong

# ❌ Problem: Accessing wrong key
result = agent.invoke(input)
print(result["response"])  # KeyError!

# ✅ Solution: Use structured_response
print(result["structured_response"])

2. Missing Descriptions

# ❌ Problem: No field descriptions
class Data(BaseModel):
    date: str  # What format?
    amount: float  # What unit?

# ✅ Solution: Add descriptions via Field
class Data(BaseModel):
    date: str = Field(description="Date in YYYY-MM-DD format")
    amount: float = Field(description="Amount in USD")

3. Over-constraining

from pydantic import BaseModel, Field

# ❌ Problem: Too strict for model
class Data(BaseModel):
    code: str = Field(pattern=r"^[A-Z]{2}-\d{4}-[A-Z]{3}$")  # Very specific!

# ✅ Solution: Use looser validation or describe format
class Data(BaseModel):
    code: str = Field(description="Format: XX-0000-XXX (letters and numbers)")
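
A middle ground between the two versions is to keep the pattern but normalize the value before it is checked. This is one possible sketch, assuming Pydantic v2; the `normalize_code` validator is a hypothetical addition and not part of the original schema.

```python
from pydantic import BaseModel, Field, field_validator


class Data(BaseModel):
    code: str = Field(
        pattern=r"^[A-Z]{2}-\d{4}-[A-Z]{3}$",
        description="Format: XX-0000-XXX (letters and numbers)",
    )

    @field_validator("code", mode="before")
    @classmethod
    def normalize_code(cls, value: object) -> object:
        # Runs before the pattern check: trim whitespace and upper-case strings.
        if isinstance(value, str):
            return value.strip().upper()
        return value


# A near-miss such as lower-case letters is repaired instead of rejected.
normalized = Data(code=" ab-1234-xyz ").code
print(normalized)
```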

4. Pydantic v1 vs v2

# Pydantic v2 (current)
from pydantic import BaseModel, Field

class Data(BaseModel):
    value: int = Field(ge=0, le=100)

# Pydantic v1 (legacy)
from pydantic import BaseModel, Field

class Data(BaseModel):
    value: int = Field(..., ge=0, le=100)  # Explicit ... marks the field required; v2 also accepts it

    class Config:
        # v1-style config; v2 uses model_config = ConfigDict(...) instead
        pass
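
To tell which major version is installed and to see the v2 configuration style in use, something like the following can be run. It is a sketch that assumes Pydantic v2 is installed (the `ConfigDict` import fails on v1); the `extra="forbid"` setting is chosen only for illustration.

```python
import pydantic
from pydantic import BaseModel, ConfigDict, Field, ValidationError

# pydantic.VERSION is a version string; its first component is the major version.
major_version = int(pydantic.VERSION.split(".")[0])


class Data(BaseModel):
    # v2 replaces the nested `class Config` with a model_config attribute.
    model_config = ConfigDict(extra="forbid")

    value: int = Field(ge=0, le=100)


# With extra="forbid", unknown fields are rejected instead of ignored.
try:
    Data(value=5, unexpected=1)
    extra_rejected = False
except ValidationError:
    extra_rejected = True

print(major_version, extra_rejected)
```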

5. Not Using Correct Type Hints

# ❌ Problem: Missing type hints
class Data(BaseModel):
    items = []  # No type hint!

# ✅ Solution: Always use type hints
from typing import List

class Data(BaseModel):
    items: List[str] = Field(default_factory=list)
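
In Pydantic v2 the missing annotation is caught when the class is defined, not when the model is used. The sketch below assumes Pydantic v2; the `Untyped` and `Typed` class names are hypothetical.

```python
from typing import List

from pydantic import BaseModel, Field, PydanticUserError

# Defining a model with a non-annotated attribute fails at class creation.
try:
    class Untyped(BaseModel):
        items = []  # no annotation

    definition_failed = False
except PydanticUserError:
    definition_failed = True


class Typed(BaseModel):
    items: List[str] = Field(default_factory=list)


print(definition_failed, Typed().items)
```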
