Get structured, validated output from LangChain agents and models using Pydantic schemas, type-safe responses, and automatic validation
Install
npx skillscat add christian-bromann/langchain-skills/langchain-structured-output Install via the SkillsCat registry.
langchain-structured-output (Python)
Overview
Structured output transforms unstructured model responses into validated, typed data. Instead of parsing free text, you get Python objects conforming to your schema - perfect for extracting data, building forms, or integrating with downstream systems.
Key Concepts:
- response_format: Define expected output schema
- Pydantic Validation: Type-safe schemas with automatic validation
- with_structured_output(): Model method for direct structured output
- Tool Strategy: Uses tool calling under the hood for models without native support
Decision Tables
When to Use Structured Output
| Use Case | Use Structured Output? | Why |
|---|---|---|
| Extract contact info, dates, etc. | ✅ Yes | Reliable data extraction |
| Form filling | ✅ Yes | Validate all required fields |
| API integration | ✅ Yes | Type-safe responses |
| Classification tasks | ✅ Yes | Enum validation |
| Open-ended Q&A | ❌ No | Free-form text is fine |
| Creative writing | ❌ No | Don't constrain creativity |
Schema Options
| Schema Type | When to Use | Example |
|---|---|---|
| Pydantic model | Python projects (recommended) | class Model(BaseModel): |
| TypedDict | Simpler typing | class Data(TypedDict): |
| JSON Schema | Interoperability | {"type": "object", ...} |
| Union types | Multiple possible formats | Union[Schema1, Schema2] |
Code Examples
Basic Structured Output with Agent
from langchain.agents import create_agent
from pydantic import BaseModel, Field
class ContactInfo(BaseModel):
name: str
email: str = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")
phone: str
agent = create_agent(
model="gpt-4.1",
response_format=ContactInfo,
)
result = agent.invoke({
"messages": [{
"role": "user",
"content": "Extract: John Doe, john@example.com, (555) 123-4567"
}]
})
print(result["structured_response"])
# ContactInfo(name='John Doe', email='john@example.com', phone='(555) 123-4567')Model Direct Structured Output
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field
class Movie(BaseModel):
"""Movie information."""
title: str = Field(description="Movie title")
year: int = Field(description="Release year")
director: str
rating: float = Field(ge=0, le=10)
model = ChatOpenAI(model="gpt-4.1")
structured_model = model.with_structured_output(Movie)
response = structured_model.invoke("Tell me about Inception")
print(response)
# Movie(title="Inception", year=2010, director="Christopher Nolan", rating=8.8)Complex Nested Schema
from pydantic import BaseModel, Field
from typing import List
class Address(BaseModel):
street: str
city: str
state: str
zip: str
class Person(BaseModel):
name: str
age: int = Field(gt=0)
email: str
address: Address
tags: List[str] = Field(default_factory=list)
agent = create_agent(
model="gpt-4.1",
response_format=Person,
)Enum and Literal Types
from pydantic import BaseModel, Field
from typing import Literal
class Classification(BaseModel):
category: Literal["urgent", "normal", "low"]
sentiment: Literal["positive", "neutral", "negative"]
confidence: float = Field(ge=0, le=1)
agent = create_agent(
model="gpt-4.1",
response_format=Classification,
)
result = agent.invoke({
"messages": [{
"role": "user",
"content": "Classify: This is extremely important and I'm very happy!"
}]
})
# Classification(category="urgent", sentiment="positive", confidence=0.95)Optional Fields and Defaults
from pydantic import BaseModel, Field
from typing import Optional, List
class Event(BaseModel):
title: str
date: str
location: Optional[str] = None
attendees: List[str] = Field(default_factory=list)
confirmed: bool = FalseUnion Types (Multiple Schemas)
from pydantic import BaseModel
from typing import Union, Literal
class EmailContact(BaseModel):
type: Literal["email"]
to: str
subject: str
class PhoneContact(BaseModel):
type: Literal["phone"]
number: str
message: str
ContactMethod = Union[EmailContact, PhoneContact]
agent = create_agent(
model="gpt-4.1",
response_format=ContactMethod,
)
# Model chooses which schema based on inputArray Extraction
from pydantic import BaseModel
from typing import List, Optional, Literal
class Task(BaseModel):
title: str
priority: Literal["high", "medium", "low"]
due_date: Optional[str] = None
class TaskList(BaseModel):
tasks: List[Task]
agent = create_agent(
model="gpt-4.1",
response_format=TaskList,
)
result = agent.invoke({
"messages": [{
"role": "user",
"content": "Extract tasks: 1. Fix bug (high priority, due tomorrow) 2. Update docs"
}]
})Include Raw AIMessage
from langchain_openai import ChatOpenAI
from pydantic import BaseModel
class Person(BaseModel):
name: str
age: int
model = ChatOpenAI(model="gpt-4.1")
structured_model = model.with_structured_output(Person, include_raw=True)
response = structured_model.invoke("Person: Alice, 30 years old")
print(response)
# {
# "raw": AIMessage(...),
# "parsed": Person(name="Alice", age=30)
# }TypedDict Alternative
from typing_extensions import TypedDict, Annotated
from langchain.agents import create_agent
class ContactDict(TypedDict):
"""Contact information."""
name: Annotated[str, ..., "Person's full name"]
email: Annotated[str, ..., "Email address"]
phone: Annotated[str, ..., "Phone number"]
agent = create_agent(
model="gpt-4.1",
response_format=ContactDict,
)
result = agent.invoke({"messages": [{"role": "user", "content": "..."}]})
# Returns dict, not Pydantic model
print(type(result["structured_response"])) # <class 'dict'>Error Handling
from langchain.agents import create_agent
from pydantic import BaseModel, Field, ValidationError
class StrictSchema(BaseModel):
email: str = Field(pattern=r"^[^@]+@[^@]+\.[^@]+$")
age: int = Field(ge=0, le=120)
agent = create_agent(
model="gpt-4.1",
response_format=StrictSchema,
)
try:
result = agent.invoke({
"messages": [{"role": "user", "content": "Email: invalid, Age: -5"}]
})
except ValidationError as e:
print(f"Validation failed: {e}")Boundaries
What You CAN Configure
✅ Schema structure: Any valid Pydantic model
✅ Field validation: Types, ranges, regex, etc.
✅ Optional vs required: Control field presence
✅ Nested objects: Complex hierarchies
✅ Arrays: Lists of items
✅ Enums: Restricted values with Literal
What You CANNOT Configure
❌ Model reasoning: Can't control how model generates data
❌ Guarantee 100% accuracy: Model may still make mistakes
❌ Force valid data if context lacks it: Model can't invent missing info
Gotchas
1. Accessing Response Wrong
# ❌ Problem: Accessing wrong key
result = agent.invoke(input)
print(result["response"]) # KeyError!
# ✅ Solution: Use structured_response
print(result["structured_response"])2. Missing Descriptions
# ❌ Problem: No field descriptions
class Data(BaseModel):
date: str # What format?
amount: float # What unit?
# ✅ Solution: Add descriptions via Field
class Data(BaseModel):
date: str = Field(description="Date in YYYY-MM-DD format")
amount: float = Field(description="Amount in USD")3. Over-constraining
import re
# ❌ Problem: Too strict for model
class Data(BaseModel):
code: str = Field(pattern=r"^[A-Z]{2}-\d{4}-[A-Z]{3}$") # Very specific!
# ✅ Solution: Use looser validation or describe format
class Data(BaseModel):
code: str = Field(description="Format: XX-0000-XXX (letters and numbers)")4. Pydantic v1 vs v2
# Pydantic v2 (current)
from pydantic import BaseModel, Field
class Data(BaseModel):
value: int = Field(ge=0, le=100)
# Pydantic v1 (legacy)
from pydantic import BaseModel, Field
class Data(BaseModel):
value: int = Field(..., ge=0, le=100) # Note the ...
class Config:
# v1 config
pass5. Not Using Correct Type Hints
# ❌ Problem: Missing type hints
class Data(BaseModel):
items = [] # No type hint!
# ✅ Solution: Always use type hints
from typing import List
class Data(BaseModel):
items: List[str] = Field(default_factory=list)