Build Retrieval Augmented Generation (RAG) systems with LangChain - includes embeddings, vector stores, retrievers, document loaders, and text splitting
Install
npx skillscat add christian-bromann/langchain-skills/skills-langchain-rag-python
Install via the SkillsCat registry.
langchain-rag (Python)
Overview
Retrieval Augmented Generation (RAG) enhances LLM responses by fetching relevant context from external knowledge sources. Instead of relying solely on training data, RAG systems retrieve documents at query time and use them to ground responses.
Key Concepts:
- Document Loaders: Ingest data from files, web, databases
- Text Splitters: Break documents into chunks
- Embeddings: Convert text to vectors
- Vector Stores: Store and search embeddings
- Retrievers: Fetch relevant documents for queries
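To make the "Text Splitters" entry above concrete, here is a minimal sketch of chunking with overlap. It uses plain fixed-size character windows, which is a simplification and not how `RecursiveCharacterTextSplitter` chooses its boundaries; the helper name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into fixed-size character windows that share `chunk_overlap` characters.

    Illustrative only: real splitters also try to break on separators
    such as paragraphs and sentences.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = split_with_overlap(text, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each chunk starts with the last three characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.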
RAG Pipeline
- Index: Load → Split → Embed → Store
- Retrieve: Query → Embed → Search → Return docs
- Generate: Docs + Query → LLM → Response
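The three stages above can be sketched without any external services. This is a toy illustration under stated assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the "vector store" is a plain list, and the Generate step stops at building the prompt instead of calling an LLM. All function names here are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Index: Load -> Split -> Embed -> Store
documents = [
    "LangChain is a framework for building LLM applications.",
    "RAG stands for Retrieval Augmented Generation.",
    "Vector stores hold embeddings and support similarity search.",
]
# The texts are already chunk-sized, so the split step is skipped here.
index = [(doc, embed(doc)) for doc in documents]


# Retrieve: Query -> Embed -> Search -> Return docs
def retrieve(query: str, k: int = 2) -> list[str]:
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


# Generate: Docs + Query -> prompt (a real system would send this to an LLM)
def build_prompt(query: str, docs: list[str]) -> str:
    context = "\n\n".join(docs)
    return f"Use the following context to answer questions:\n\n{context}\n\nQuestion: {query}"


query = "What does RAG stand for?"
top_docs = retrieve(query, k=1)
prompt = build_prompt(query, top_docs)
print(prompt)
```

Replacing `embed` with a real embedding model and the list with a vector store gives the LangChain version shown in the code examples.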
Code Examples
Basic RAG Setup
from langchain_core.documents import Document
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter
# 1. Load documents (example: in-memory text)
docs = [
    Document(page_content="LangChain is a framework for building LLM applications.", metadata={}),
    Document(page_content="RAG stands for Retrieval Augmented Generation.", metadata={}),
]
# 2. Split documents
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
)
splits = splitter.split_documents(docs)
# 3. Create embeddings and store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = InMemoryVectorStore.from_documents(splits, embeddings)
# 4. Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})  # Top 4 results
# 5. Use in RAG
model = ChatOpenAI(model="gpt-4.1")
query = "What is RAG?"
relevant_docs = retriever.invoke(query)
context = "\n\n".join([doc.page_content for doc in relevant_docs])
response = model.invoke([
    {"role": "system", "content": f"Use the following context to answer questions:\n\n{context}"},
    {"role": "user", "content": query},
])
print(response.content)
Loading Web Pages
from langchain_community.document_loaders import WebBaseLoader
loader = WebBaseLoader("https://docs.langchain.com/oss/python/langchain/agents")
docs = loader.load()
print(f"Loaded {len(docs)} documents")
Loading PDF Files
from langchain_community.document_loaders import PyPDFLoader
loader = PyPDFLoader("./document.pdf")
docs = loader.load()
Advanced Text Splitting
from langchain_text_splitters import RecursiveCharacterTextSplitter
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,  # Characters per chunk
    chunk_overlap=200,  # Overlap for context continuity
    separators=["\n\n", "\n", " ", ""],  # Split hierarchy
)
splits = splitter.split_documents(docs)
Using Chroma (Persistent)
from langchain_chroma import Chroma  # requires the langchain-chroma package
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
# Create and populate
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    collection_name="my-docs",
    persist_directory="./chroma_db",
)
# Later: Load existing
vectorstore2 = Chroma(
    collection_name="my-docs",
    embedding_function=embeddings,
    persist_directory="./chroma_db",
)
Advanced Retrieval
# Similarity search with scores
results = vectorstore.similarity_search_with_score(query, k=5)
for doc, score in results:
    print(f"Score: {score}, Content: {doc.page_content}")
# MMR (Maximum Marginal Relevance) for diversity
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"fetch_k": 20, "lambda_mult": 0.5, "k": 5},
)
Metadata Filtering
# Add metadata when creating documents
from langchain_core.documents import Document
docs = [
    Document(
        page_content="Python programming guide",
        metadata={"language": "python", "topic": "programming"},
    ),
    Document(
        page_content="JavaScript tutorial",
        metadata={"language": "javascript", "topic": "programming"},
    ),
]
# Search with filter
results = vectorstore.similarity_search(
    "programming",
    k=5,
    filter={"language": "python"},  # Only Python docs; dict filters work with Chroma, other stores may expect a different filter format
)
RAG with Agent
from langchain.agents import create_agent
from langchain.tools import tool
@tool
def search_docs(query: str) -> str:
    """Search documentation for relevant information."""
    docs = retriever.invoke(query)
    return "\n\n".join([d.page_content for d in docs])
agent = create_agent(
    model="gpt-4.1",
    tools=[search_docs],
)
result = agent.invoke({
    "messages": [{"role": "user", "content": "How do I create an agent?"}]
})
Using Faiss for Performance
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
embeddings = OpenAIEmbeddings()
# Create vector store
vectorstore = FAISS.from_documents(splits, embeddings)
# Save to disk
vectorstore.save_local("faiss_index")
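# Load from disk (the index is pickled, so only enable this for files you created and trust)
vectorstore2 = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,
)
Customizing Embeddings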
from langchain_openai import OpenAIEmbeddings
# Different embedding models
small_embeddings = OpenAIEmbeddings(model="text-embedding-3-small") # 1536 dim
large_embeddings = OpenAIEmbeddings(model="text-embedding-3-large") # 3072 dim
# Custom dimensions (for 3rd gen models)
custom_embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    dimensions=1024,  # Reduce from 3072 to save space
)
Boundaries
What You CAN Configure
✅ Chunk size/overlap: Control document splitting
✅ Embedding model: Choose quality vs cost
✅ Number of results: Top-k retrieval
✅ Metadata filters: Filter by document properties
✅ Search algorithms: Similarity, MMR, hybrid (where the vector store supports it)
What You CANNOT Configure
❌ Native embedding dimensions: Fixed by each model (text-embedding-3 models can only be shortened via the dimensions parameter)
❌ Perfect retrieval: Semantic search has limits
❌ Real-time document updates: Re-indexing needed
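Of the configurable search algorithms listed above, MMR is the least self-explanatory, so here is a minimal sketch of the idea behind the `lambda_mult` and `k` parameters. It is a plain-Python illustration, not LangChain's implementation; the `candidates` list plays the role of the `fetch_k` documents that a similarity search would return first, and the 2D vectors are made up for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def mmr(query_vec, candidates, k, lambda_mult=0.5):
    """Return the indices of k candidates picked by Maximal Marginal Relevance.

    Each round picks the candidate that maximises
        lambda_mult * sim(query, doc) - (1 - lambda_mult) * max sim(doc, already picked)
    so lambda_mult=1.0 is pure relevance and lower values favour diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, candidates[i])
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


query_vec = (1.0, 0.0)
# Candidates 0 and 1 are near-duplicates; candidate 2 is relevant but different.
candidates = [(1.0, 0.3), (1.0, 0.32), (1.0, -0.4)]

relevance_only = mmr(query_vec, candidates, k=2, lambda_mult=1.0)
diverse = mmr(query_vec, candidates, k=2, lambda_mult=0.5)
print(relevance_only, diverse)
```

With `lambda_mult=1.0` the two near-duplicates are returned; with `lambda_mult=0.5` the second slot goes to the candidate that adds new information.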
Gotchas
1. Forgetting to Split Documents
# ❌ Problem: Entire documents are too large
vectorstore.add_documents(large_docs) # May hit token limits
# ✅ Solution: Always split first
splits = splitter.split_documents(large_docs)
vectorstore.add_documents(splits)
2. Chunk Size Too Small/Large
# ❌ Problem: Too small - loses context
splitter = RecursiveCharacterTextSplitter(chunk_size=50)
# ❌ Problem: Too large - hits limits
splitter = RecursiveCharacterTextSplitter(chunk_size=10000)
# ✅ Solution: Balance (500-1500 typically good)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)
3. No Overlap
# ❌ Problem: No overlap - context breaks at boundaries
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0,  # Bad!
)
# ✅ Solution: Use overlap (10-20% of chunk size)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,  # 20%
)
4. Not Persisting Vector Store
# ❌ Problem: Using InMemoryVectorStore in production
vectorstore = InMemoryVectorStore.from_documents(docs, embeddings)
# Lost on restart!
# ✅ Solution: Use persistent store
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name="prod-docs",
    persist_directory="./chroma_db",
)
5. Mixing Embedding Models
# ❌ Problem: Different embeddings for index and query
vectorstore = Chroma.from_documents(
    docs,
    OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./chroma_db",
)
# Later: reopening the same store with a different model
vectorstore = Chroma(
    persist_directory="./chroma_db",
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-large"),  # Incompatible!
)
# ✅ Solution: Use same embedding model
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever() # Uses same embeddings
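One way to catch this mistake early is to record which embedding model built an index and check it whenever the index is reopened. The sketch below is an assumption-laden illustration, not a LangChain feature: the sidecar file name `embedding_manifest.json` and the helpers `write_manifest` and `check_manifest` are hypothetical, and the dimensions are the ones quoted for the two OpenAI models earlier in this document.

```python
import json
import tempfile
from pathlib import Path

MANIFEST_NAME = "embedding_manifest.json"  # hypothetical sidecar file next to the index


def write_manifest(index_dir: Path, model_name: str, dimensions: int) -> None:
    """Record the embedding model used to build the index."""
    index_dir.mkdir(parents=True, exist_ok=True)
    payload = {"model": model_name, "dimensions": dimensions}
    (index_dir / MANIFEST_NAME).write_text(json.dumps(payload))


def check_manifest(index_dir: Path, model_name: str, dimensions: int) -> None:
    """Raise if the query-time model differs from the one used at index time."""
    manifest = json.loads((index_dir / MANIFEST_NAME).read_text())
    indexed = (manifest["model"], manifest["dimensions"])
    if indexed != (model_name, dimensions):
        raise ValueError(
            f"Index was built with {indexed}, but queries would use {(model_name, dimensions)}"
        )


with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp) / "chroma_db"
    write_manifest(index_dir, "text-embedding-3-small", 1536)

    # Same model as at index time: passes silently.
    check_manifest(index_dir, "text-embedding-3-small", 1536)

    # Different model: the mismatch is reported before any query runs.
    try:
        check_manifest(index_dir, "text-embedding-3-large", 3072)
        mismatch_detected = False
    except ValueError as exc:
        mismatch_detected = True
        print(exc)
```

Calling a check like this right before constructing the vector store turns a silent quality problem, or a dimension error deep inside the store, into an explicit message at startup.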