christian-bromann

langchain-rag

Build Retrieval Augmented Generation (RAG) systems with LangChain - includes embeddings, vector stores, retrievers, document loaders, and text splitting


Install

npx skillscat add christian-bromann/langchain-skills/skills-langchain-rag-python

Install via the SkillsCat registry.

SKILL.md

langchain-rag (Python)

Overview

Retrieval Augmented Generation (RAG) enhances LLM responses by fetching relevant context from external knowledge sources. Instead of relying solely on training data, RAG systems retrieve documents at query time and use them to ground responses.

Key Concepts:

  • Document Loaders: Ingest data from files, web, databases
  • Text Splitters: Break documents into chunks
  • Embeddings: Convert text to vectors
  • Vector Stores: Store and search embeddings
  • Retrievers: Fetch relevant documents for queries

RAG Pipeline

  1. Index: Load → Split → Embed → Store
  2. Retrieve: Query → Embed → Search → Return docs
  3. Generate: Docs + Query → LLM → Response
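
The three stages above can be sketched without any external services. This is a minimal illustration, not LangChain's implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `index`, `retrieve`, and `build_prompt` are hypothetical names chosen for this sketch. The final LLM call is left out, so the "Generate" stage stops at building the prompt.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy embedding (assumption): lowercase bag-of-words counts."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# 1. Index: Load -> Split -> Embed -> Store (each text is already one chunk)
corpus = [
    "LangChain is a framework for building LLM applications.",
    "RAG stands for Retrieval Augmented Generation.",
]
index = [(chunk, embed(chunk)) for chunk in corpus]

# 2. Retrieve: Query -> Embed -> Search -> Return docs
def retrieve(query: str, k: int = 1) -> list[str]:
    query_vec = embed(query)
    ranked = sorted(index, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# 3. Generate: Docs + Query -> prompt that would be sent to the LLM
def build_prompt(query: str, k: int = 1) -> str:
    context = "\n\n".join(retrieve(query, k))
    return f"Use the following context to answer questions:\n\n{context}\n\nQuestion: {query}"

top = retrieve("What is RAG?")
prompt = build_prompt("What is RAG?")
```

Swapping the toy `embed` for a real embedding model and the list-based `index` for a vector store gives the LangChain version shown in the examples that follow.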

Code Examples

Basic RAG Setup

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_core.vectorstores import InMemoryVectorStore
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_core.documents import Document

# 1. Load documents (example: in-memory text)
docs = [
    Document(page_content="LangChain is a framework for building LLM applications.", metadata={}),
    Document(page_content="RAG stands for Retrieval Augmented Generation.", metadata={}),
]

# 2. Split documents
splitter = RecursiveCharacterTextSplitter(
    chunk_size=500,
    chunk_overlap=50,
)
splits = splitter.split_documents(docs)

# 3. Create embeddings and store
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = InMemoryVectorStore.from_documents(splits, embeddings)

# 4. Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 4})  # Top 4 results

# 5. Use in RAG
model = ChatOpenAI(model="gpt-4.1")

query = "What is RAG?"
relevant_docs = retriever.invoke(query)

context = "\n\n".join([doc.page_content for doc in relevant_docs])
response = model.invoke([
    {"role": "system", "content": f"Use the following context to answer questions:\n\n{context}"},
    {"role": "user", "content": query},
])

print(response.content)

Loading Web Pages

from langchain_community.document_loaders import WebBaseLoader

loader = WebBaseLoader("https://docs.langchain.com/oss/python/langchain/agents")
docs = loader.load()
print(f"Loaded {len(docs)} documents")

Loading PDF Files

from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("./document.pdf")
docs = loader.load()

Advanced Text Splitting

from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,        # Characters per chunk
    chunk_overlap=200,      # Overlap for context continuity
    separators=["\n\n", "\n", " ", ""],  # Split hierarchy
)

splits = splitter.split_documents(docs)
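
To make the interaction between `chunk_size` and `chunk_overlap` concrete, here is a simplified sketch using a fixed sliding window. It is an assumption-laden illustration only: `sliding_chunks` is a hypothetical helper, and the real `RecursiveCharacterTextSplitter` additionally tries to break on the listed separators, so its chunk boundaries will generally differ.

```python
def sliding_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size windows that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reached the end of the text
    return chunks

text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_chunks(text, chunk_size=10, chunk_overlap=3)
for chunk in chunks:
    print(chunk)
```

Each window advances by `chunk_size - chunk_overlap` characters, so a larger overlap produces more chunks (and more embeddings to store) for the same text.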

Using Chroma (Persistent)

from langchain_chroma import Chroma  # from the langchain-chroma package; the langchain_community import is deprecated
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Create and populate
vectorstore = Chroma.from_documents(
    documents=splits,
    embedding=embeddings,
    collection_name="my-docs",
    persist_directory="./chroma_db",
)

# Later: Load existing
vectorstore2 = Chroma(
    collection_name="my-docs",
    embedding_function=embeddings,
    persist_directory="./chroma_db",
)

Advanced Retrieval

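# Similarity search with scores (score meaning is store-specific;
# Chroma and FAISS return a distance by default, so lower is closer)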
results = vectorstore.similarity_search_with_score(query, k=5)
for doc, score in results:
    print(f"Score: {score}, Content: {doc.page_content}")

# MMR (Maximum Marginal Relevance) for diversity
retriever = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"fetch_k": 20, "lambda_mult": 0.5, "k": 5},
)
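
The MMR parameters are easier to reason about with a small worked example. The sketch below is a plain-Python illustration of the usual MMR selection rule, not the vector store's own code: `mmr` and `cosine` are hypothetical helpers, and `candidates` stands in for the `fetch_k` documents already fetched by similarity. At each step it picks the candidate that maximizes `lambda_mult * relevance - (1 - lambda_mult) * redundancy`.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mmr(query_vec, candidates, k, lambda_mult=0.5):
    """Return indices of up to k candidates, balancing relevance and diversity."""
    selected: list[int] = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:
        def score(i: int) -> float:
            relevance = cosine(query_vec, candidates[i])
            # Redundancy: similarity to the closest already-selected candidate
            redundancy = max(
                (cosine(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

query_vec = [1.0, 0.0]
candidates = [
    [1.0, 0.10],   # very relevant
    [1.0, 0.12],   # near-duplicate of the first
    [0.7, -0.7],   # less relevant, but different
]
pure_relevance = mmr(query_vec, candidates, k=2, lambda_mult=1.0)
balanced = mmr(query_vec, candidates, k=2, lambda_mult=0.5)
```

With `lambda_mult=1.0` the near-duplicate is chosen second; with `lambda_mult=0.5` the redundancy penalty pushes the selection toward the third, more distinct candidate.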

Metadata Filtering

# Add metadata when creating documents
from langchain_core.documents import Document

docs = [
    Document(
        page_content="Python programming guide",
        metadata={"language": "python", "topic": "programming"}
    ),
    Document(
        page_content="JavaScript tutorial",
        metadata={"language": "javascript", "topic": "programming"}
    ),
]

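# Search with filter (assumes the docs above were added to the vector store;
# the filter syntax shown is Chroma-style and varies between stores)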
results = vectorstore.similarity_search(
    "programming",
    k=5,
    filter={"language": "python"}  # Only Python docs
)
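
Conceptually, a simple equality filter restricts the candidate set before ranking. The following sketch shows that idea in plain Python under stated assumptions: `Doc`, `matches`, and `filtered_search` are hypothetical names, ranking is a crude term count rather than vector similarity, and real vector stores each define their own filter syntax and operators.

```python
from dataclasses import dataclass, field

@dataclass
class Doc:
    page_content: str
    metadata: dict = field(default_factory=dict)

def matches(metadata: dict, flt: dict) -> bool:
    """True when every key in the filter equals the document's metadata value."""
    return all(metadata.get(key) == value for key, value in flt.items())

def filtered_search(docs: list[Doc], query: str, flt: dict, k: int = 5) -> list[Doc]:
    """Filter by metadata first, then rank the survivors by query-term hits."""
    terms = query.lower().split()
    candidates = [doc for doc in docs if matches(doc.metadata, flt)]
    ranked = sorted(
        candidates,
        key=lambda doc: sum(term in doc.page_content.lower() for term in terms),
        reverse=True,
    )
    return ranked[:k]

docs = [
    Doc("Python programming guide", {"language": "python", "topic": "programming"}),
    Doc("JavaScript tutorial", {"language": "javascript", "topic": "programming"}),
]
python_only = filtered_search(docs, "programming", {"language": "python"})
all_programming = filtered_search(docs, "programming", {"topic": "programming"})
```

Documents that fail the filter are never returned, no matter how well their content matches the query.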

RAG with Agent

from langchain.agents import create_agent
from langchain.tools import tool

@tool
def search_docs(query: str) -> str:
    """Search documentation for relevant information."""
    docs = retriever.invoke(query)
    return "\n\n".join([d.page_content for d in docs])

agent = create_agent(
    model="gpt-4.1",
    tools=[search_docs],
)

result = agent.invoke({
    "messages": [{"role": "user", "content": "How do I create an agent?"}]
})

Using FAISS for Performance

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()

# Create vector store
vectorstore = FAISS.from_documents(splits, embeddings)

# Save to disk
vectorstore.save_local("faiss_index")

# Load from disk
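# The index is pickle-based, so recent versions require an explicit opt-in.
# Only enable it for index files you created yourself.
vectorstore2 = FAISS.load_local(
    "faiss_index",
    embeddings,
    allow_dangerous_deserialization=True,
)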

Customizing Embeddings

from langchain_openai import OpenAIEmbeddings

# Different embedding models
small_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")  # 1536 dim
large_embeddings = OpenAIEmbeddings(model="text-embedding-3-large")  # 3072 dim

# Custom dimensions (for 3rd gen models)
custom_embeddings = OpenAIEmbeddings(
    model="text-embedding-3-large",
    dimensions=1024  # Reduce from 3072 to save space
)
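
As a rough mental model of what reducing dimensions buys, the sketch below truncates a vector and rescales it to unit length, then compares raw storage cost. It assumes that shortening behaves like truncation followed by re-normalization and that vectors are stored as 4-byte floats. Both are assumptions for illustration, and `shorten_embedding` and `storage_bytes` are hypothetical helpers, not part of any library.

```python
import math

def shorten_embedding(vector: list[float], dimensions: int) -> list[float]:
    """Keep the first `dimensions` values, then rescale to unit length."""
    if not 0 < dimensions <= len(vector):
        raise ValueError("dimensions must be between 1 and len(vector)")
    truncated = vector[:dimensions]
    norm = math.sqrt(sum(x * x for x in truncated))
    return [x / norm for x in truncated] if norm else truncated

def storage_bytes(num_vectors: int, dimensions: int, bytes_per_float: int = 4) -> int:
    """Raw vector storage, ignoring index overhead (assumes float32 values)."""
    return num_vectors * dimensions * bytes_per_float

full = [0.5, 0.5, 0.5, 0.5]          # tiny stand-in for a unit-length embedding
short = shorten_embedding(full, 2)   # half the dimensions, still unit length

large_cost = storage_bytes(1000, 3072)
reduced_cost = storage_bytes(1000, 1024)
```

Under these assumptions, going from 3072 to 1024 dimensions cuts raw vector storage to one third; whether retrieval quality holds up at the smaller size is something to measure on your own data.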

Boundaries

What You CAN Configure

Chunk size/overlap: Control how documents are split
Embedding model: Trade off quality against cost
Number of results: Top-k retrieval
Metadata filters: Filter by document properties
Search algorithms: Similarity, MMR, and (where the vector store supports it) hybrid search

What You CANNOT Configure

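Embedding dimensions (per model): Set by the model; only models that accept a dimensions parameter (such as the text-embedding-3 family shown above) can be reduced
Perfect retrieval: Semantic search has limits
Real-time document updates: New or changed documents must be re-embedded and re-indexed before they are searchable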

Gotchas

1. Forgetting to Split Documents

# ❌ Problem: Entire documents are too large
vectorstore.add_documents(large_docs)  # May hit token limits

# ✅ Solution: Always split first
splits = splitter.split_documents(large_docs)
vectorstore.add_documents(splits)

2. Chunk Size Too Small/Large

# ❌ Problem: Too small - loses context
splitter = RecursiveCharacterTextSplitter(chunk_size=50)

# ❌ Problem: Too large - dilutes relevance and may exceed embedding or context limits
splitter = RecursiveCharacterTextSplitter(chunk_size=10000)

# ✅ Solution: Balance (500-1500 typically good)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

3. No Overlap

# ❌ Problem: No overlap - context breaks at boundaries
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=0,  # Bad!
)

# ✅ Solution: Use overlap (10-20% of chunk size)
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,  # 20%
)

4. Not Persisting Vector Store

# ❌ Problem: Using InMemoryVectorStore in production
vectorstore = InMemoryVectorStore.from_documents(docs, embeddings)
# Lost on restart!

# ✅ Solution: Use persistent store
vectorstore = Chroma.from_documents(
    documents=docs,
    embedding=embeddings,
    collection_name="prod-docs",
    persist_directory="./chroma_db",
)

5. Mixing Embedding Models

# ❌ Problem: Different embeddings for index and query
vectorstore = Chroma.from_documents(
    docs,
    OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="./chroma_db",
)

# Later: reopening the same store with a different model
vectorstore = Chroma(
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-large"),  # Incompatible!
    persist_directory="./chroma_db",
)

# ✅ Solution: Use the same embedding model for indexing and querying
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(docs, embeddings)
retriever = vectorstore.as_retriever()  # Queries are embedded with the store's embeddings
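
One way to guard against this mistake is to record which embedding model built an index and check it before querying. The sketch below is a hypothetical pattern, not a LangChain feature: `check_embedding_compat` and the `index_info` dictionary are names invented for this example, and in practice the record might live in a small file next to the persisted store.

```python
def check_embedding_compat(index_info: dict, model: str, dimensions: int) -> None:
    """Raise if the query-side model differs from the one that built the index."""
    if index_info["model"] != model or index_info["dimensions"] != dimensions:
        raise ValueError(
            f"Index was built with {index_info['model']} "
            f"({index_info['dimensions']} dims), but queries would use "
            f"{model} ({dimensions} dims)"
        )

# Recorded once, at indexing time
index_info = {"model": "text-embedding-3-small", "dimensions": 1536}

# Same model: passes silently
check_embedding_compat(index_info, "text-embedding-3-small", 1536)

# Different model: rejected before any query is run
try:
    check_embedding_compat(index_info, "text-embedding-3-large", 3072)
    mismatch_detected = False
except ValueError as err:
    mismatch_detected = True
    print(err)
```

Failing fast here is cheaper than debugging retrieval results that are silently poor or a dimension error raised deep inside the store.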

Links to Documentation