christian-bromann

langchain-embeddings

Guide to using embedding model integrations in LangChain including OpenAI, Azure, and local embeddings

Install

npx skillscat add christian-bromann/langchain-skills/skills-langchain-integrations-embeddings-python

Install via the SkillsCat registry.

SKILL.md

langchain-embeddings (Python)

Overview

Embedding models convert text into numerical vector representations that capture semantic meaning. These vectors enable semantic search and similarity comparison, and they are essential for building RAG (Retrieval-Augmented Generation) systems with vector databases.

Key Concepts

  • Embeddings: Dense vector representations of text that encode semantic meaning
  • Vector Dimensions: Different models produce vectors of different sizes (e.g., 1536 for OpenAI's text-embedding-3-small, 768 for some open-source models)
  • Similarity Search: Finding similar texts by comparing vector distances (cosine similarity, euclidean distance)
  • Batch Processing: Efficiently embedding multiple texts at once
  • Use Cases: Semantic search, document retrieval, clustering, recommendation systems
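
The two distance measures named under Similarity Search are closely related once vectors are scaled to unit length: the squared Euclidean distance equals 2 - 2 * cosine similarity, so both produce the same ranking. A minimal stdlib-only sketch, using made-up 3-dimensional vectors in place of real embeddings:

```python
import math

def normalize(vec):
    """Scale a vector to unit length (L2 norm of 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy vectors, invented for illustration only.
a = normalize([1.0, 2.0, 3.0])
b = normalize([2.0, 1.0, 0.5])

cos = cosine_similarity(a, b)
dist = euclidean_distance(a, b)

# For unit vectors: dist**2 == 2 - 2 * cos (up to float rounding).
identity_holds = math.isclose(dist ** 2, 2 - 2 * cos, rel_tol=1e-9)
```

This is why many setups normalize embeddings once and then use whichever metric the vector store supports.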

Provider Selection Decision Table

| Provider | Best For | Model Examples | Dimensions | Package | Key Features |
|---|---|---|---|---|---|
| OpenAI | General purpose, high quality | text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002 | 1536, 3072 | langchain-openai | High quality, reliable, flexible dimensions |
| Azure OpenAI | Enterprise, compliance | text-embedding-ada-002 (Azure) | 1536 | langchain-openai | Enterprise SLAs, data residency |
| Cohere | Multilingual, search optimization | embed-english-v3.0, embed-multilingual-v3.0 | 1024 | langchain-cohere | Search/clustering modes, multilingual |
| HuggingFace | Open source, customizable | all-MiniLM-L6-v2, BGE models | Varies | langchain-huggingface | Free, local inference, many models |
| Google | GCP integration | textembedding-gecko | 768 | langchain-google-genai | GCP ecosystem, multimodal |
| Ollama | Local, privacy | llama2, mistral, nomic-embed-text | Varies | langchain-ollama | Fully local, no API costs, privacy |

When to Choose Each Provider

Choose OpenAI if:

  • You need high-quality embeddings for production
  • You want reliable, fast API-based embeddings
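  • Cost is reasonable for your use case (roughly $0.02 per 1M tokens for text-embedding-3-small and $0.13 per 1M tokens for text-embedding-3-large; check current pricing)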

Choose Azure OpenAI if:

  • You need enterprise support and SLAs
  • Data compliance/residency is critical
  • You're already using Azure infrastructure

Choose Cohere if:

  • You need multilingual embeddings
  • You want optimized embeddings for search vs. clustering
  • You need competitive pricing

Choose HuggingFace if:

  • You want to use open-source models
  • You need specific model characteristics
  • You want to run inference locally or on your own infrastructure

Choose Ollama if:

  • Privacy is paramount (fully local)
  • You want zero API costs after setup
  • You have sufficient local compute resources

Code Examples

OpenAI Embeddings

from langchain_openai import OpenAIEmbeddings
import os

# Basic initialization
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY"),  # Optional if set in env
)

# Embed a single query
query_embedding = embeddings.embed_query(
    "What is the capital of France?"
)
print(f"Vector dimensions: {len(query_embedding)}")
print(f"First few values: {query_embedding[:5]}")

# Embed multiple documents
documents = [
    "Paris is the capital of France.",
    "London is the capital of England.",
    "Berlin is the capital of Germany.",
]
doc_embeddings = embeddings.embed_documents(documents)
print(f"Embedded {len(doc_embeddings)} documents")

# Using newer models with custom dimensions
small_embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    dimensions=512,  # Reduce from default 1536 for efficiency
)

Azure OpenAI Embeddings

from langchain_openai import AzureOpenAIEmbeddings
import os

embeddings = AzureOpenAIEmbeddings(
    azure_endpoint=os.getenv("AZURE_OPENAI_ENDPOINT"),
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_deployment="text-embedding-ada-002",
    api_version="2024-02-01",
)

embedding = embeddings.embed_query("Hello world")
print(f"Embedding length: {len(embedding)}")

HuggingFace Embeddings (Local)

from langchain_huggingface import HuggingFaceEmbeddings

# Run embeddings locally with sentence-transformers
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2",  # Small, fast model (384 dimensions)
    model_kwargs={"device": "cpu"},  # or "cuda" for GPU
    encode_kwargs={"normalize_embeddings": True},
)

embedding = embeddings.embed_query("This runs locally!")
print(f"Embedding dimensions: {len(embedding)}")

# Use a different model
embeddings = HuggingFaceEmbeddings(
    model_name="BAAI/bge-small-en-v1.5",
)

Ollama Embeddings (Local)

from langchain_ollama import OllamaEmbeddings

# Requires Ollama running locally: ollama pull nomic-embed-text
embeddings = OllamaEmbeddings(
    model="nomic-embed-text",
    base_url="http://localhost:11434",  # Default Ollama URL
)

embedding = embeddings.embed_query("Fully local embeddings")

Cohere Embeddings

from langchain_cohere import CohereEmbeddings
import os

embeddings = CohereEmbeddings(
    cohere_api_key=os.getenv("COHERE_API_KEY"),
    model="embed-english-v3.0",
)

query_embedding = embeddings.embed_query("Search query")
doc_embeddings = embeddings.embed_documents(["doc1", "doc2"])

Computing Similarity

from langchain_openai import OpenAIEmbeddings
import numpy as np

embeddings = OpenAIEmbeddings()

# Embed query and documents
query = "What is machine learning?"
docs = [
    "Machine learning is a branch of AI",
    "Paris is the capital of France",
    "Neural networks are used in deep learning",
]

query_vec = embeddings.embed_query(query)
doc_vecs = embeddings.embed_documents(docs)

# Compute cosine similarity
def cosine_similarity(vec_a, vec_b):
    """Calculate cosine similarity between two vectors."""
    return np.dot(vec_a, vec_b) / (np.linalg.norm(vec_a) * np.linalg.norm(vec_b))

# Find most similar document
similarities = [cosine_similarity(query_vec, doc_vec) for doc_vec in doc_vecs]
print("Similarities:", similarities)
most_similar_idx = np.argmax(similarities)
print("Most similar doc:", docs[most_similar_idx])
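
The example above picks only the single best match. A minimal sketch of ranking all documents and keeping the top k follows; it uses hand-written vectors instead of API output so it runs offline, and the `top_k` helper is a hypothetical name, not part of LangChain:

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, docs, k=2):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [
        (cosine_similarity(query_vec, vec), text)
        for vec, text in zip(doc_vecs, docs)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Hand-written stand-ins for real embedding output.
docs = ["ml doc", "geography doc", "deep learning doc"]
doc_vecs = [[0.9, 0.1, 0.0], [0.0, 0.1, 0.9], [0.7, 0.3, 0.1]]
query_vec = [1.0, 0.0, 0.0]

ranked = top_k(query_vec, doc_vecs, docs, k=2)
for score, text in ranked:
    print(f"{score:.3f}  {text}")
```

For more than a few thousand documents, a vector store is usually a better fit than scoring every document in a Python loop.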

Batch Processing for Efficiency

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    chunk_size=512,  # Maximum number of texts sent per API request
)

# Efficiently embed large document sets
large_doc_set = [f"Document {i}: Some content here" for i in range(1000)]

doc_embeddings = embeddings.embed_documents(large_doc_set)
print(f"Embedded {len(doc_embeddings)} documents in batches")
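
To make the effect of the batch size concrete, here is a minimal sketch of how 1000 texts split into groups of at most 512. The `batched` helper is hypothetical and only illustrates the grouping idea; the integration's real request logic may also take token counts into account.

```python
from math import ceil

def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

large_doc_set = [f"Document {i}: Some content here" for i in range(1000)]

batches = list(batched(large_doc_set, 512))
batch_sizes = [len(batch) for batch in batches]  # [512, 488]
expected_requests = ceil(len(large_doc_set) / 512)  # 2
```

A larger batch size means fewer round trips, while a smaller one keeps each request further from payload and rate limits.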

Using with Vector Stores

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = OpenAIEmbeddings()

texts = [
    "LangChain is a framework for developing applications powered by LLMs",
    "Vector stores enable semantic search capabilities",
    "Embeddings convert text into numerical vectors",
]

# Create vector store with embeddings
vectorstore = FAISS.from_texts(texts, embeddings)

# Perform similarity search
query = "What is LangChain?"
docs = vectorstore.similarity_search(query, k=2)
for doc in docs:
    print(doc.page_content)

Boundaries

What Agents CAN Do

Initialize embedding models

  • Set up OpenAI, Azure, Cohere, HuggingFace, or Ollama embeddings
  • Configure API keys and model parameters

Embed text content

  • Embed single queries with embed_query()
  • Embed multiple documents with embed_documents()
  • Process large batches efficiently

Use embeddings with vector stores

  • Pass embeddings to vector store constructors
  • Enable semantic search capabilities

Choose appropriate models

  • Select based on quality, cost, latency requirements
  • Use local models for privacy concerns

Optimize for use case

  • Adjust batch sizes for efficiency
  • Use smaller dimensions to reduce costs/storage
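
The storage effect of smaller dimensions is simple arithmetic. A rough sketch, assuming vectors are stored as float32 (4 bytes per value) and ignoring index overhead and metadata:

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw size of the vectors alone, assuming float32 storage by default."""
    return num_vectors * dimensions * bytes_per_value

full = index_size_bytes(1_000_000, 1536)   # 6_144_000_000 bytes (about 6.1 GB)
small = index_size_bytes(1_000_000, 512)   # 2_048_000_000 bytes (about 2.0 GB)
ratio = small / full                       # one third of the original size
```

Smaller vectors usually cost some retrieval quality, so it is worth measuring on your own data before committing to a size.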

What Agents CANNOT Do

Modify embedding dimensions arbitrarily

  • Cannot change dimensions beyond what the model supports
  • text-embedding-3-* models accept a custom dimensions value; older models such as text-embedding-ada-002 do not

Mix embeddings from different models

  • Cannot compare embeddings from different models directly
  • Must use same model for all embeddings in a similarity search

Exceed API rate limits

  • Cannot bypass provider rate limits
  • Must implement rate limiting for large-scale operations

Generate embeddings without proper authentication

  • Cannot use cloud providers without valid API keys
  • Cannot access models without proper credentials
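
For the rate-limit point above, one common approach is to retry with exponential backoff. The sketch below is a generic pattern, not a LangChain API: `with_backoff`, `RateLimited`, and `flaky_embed` are hypothetical names, and the fake function stands in for a throttled embedding call.

```python
import random
import time

def with_backoff(func, max_attempts=5, base_delay=1.0, sleep=time.sleep,
                 retry_on=(Exception,)):
    """Call func(), retrying with exponential backoff plus jitter on failure."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            # Delay doubles each attempt; a little jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            sleep(delay)

class RateLimited(Exception):
    """Hypothetical stand-in for a provider's rate-limit error."""

calls = {"count": 0}

def flaky_embed():
    # Fails twice, then succeeds, to simulate a throttled API.
    calls["count"] += 1
    if calls["count"] < 3:
        raise RateLimited("slow down")
    return [0.1, 0.2, 0.3]

# Record delays instead of actually sleeping so the demo runs instantly.
delays = []
result = with_backoff(flaky_embed, sleep=delays.append, retry_on=(RateLimited,))
```

In real code, `func` would wrap a call such as `embeddings.embed_documents(batch)` and `retry_on` would name the provider's actual rate-limit exception. Some integrations also expose their own retry settings, so check the provider package before layering another retry loop on top.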

Gotchas

1. Model Consistency is Critical

# ❌ BAD: Using different models
embeddings1 = OpenAIEmbeddings(model="text-embedding-3-small")
embeddings2 = OpenAIEmbeddings(model="text-embedding-ada-002")

query_vec = embeddings1.embed_query("query")
doc_vec = embeddings2.embed_query("document")
# Similarity comparison will be meaningless!

# ✅ GOOD: Use same model for everything
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
query_vec = embeddings.embed_query("query")
doc_vec = embeddings.embed_query("document")
# Now similarity makes sense

Fix: Always use the same embedding model for all texts you want to compare.
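
One way to enforce this in application code is to store the model name next to every vector and refuse to compare across models. A minimal sketch; `TaggedVector` and `dot_similarity` are hypothetical helpers, not LangChain classes:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TaggedVector:
    """An embedding plus the name of the model that produced it."""
    model: str
    values: tuple

def dot_similarity(a: TaggedVector, b: TaggedVector) -> float:
    """Dot product that refuses to compare vectors from different models."""
    if a.model != b.model:
        raise ValueError(f"Cannot compare {a.model!r} with {b.model!r}")
    if len(a.values) != len(b.values):
        raise ValueError("Vectors have different dimensions")
    return sum(x * y for x, y in zip(a.values, b.values))

# Tiny made-up vectors; real embeddings are much longer.
query = TaggedVector("text-embedding-3-small", (0.6, 0.8))
doc = TaggedVector("text-embedding-3-small", (0.8, 0.6))
other = TaggedVector("text-embedding-ada-002", (0.8, 0.6))

score = dot_similarity(query, doc)  # approximately 0.96

try:
    dot_similarity(query, other)
    mixed_allowed = True
except ValueError:
    mixed_allowed = False
```

Note that matching dimensions alone is not enough: two models can produce vectors of the same length that live in unrelated spaces, which is why the check is on the model name.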

2. Import from Correct Packages

# ❌ OLD: Legacy import path from the core langchain package
from langchain.embeddings import OpenAIEmbeddings  # Deprecated!

# ✅ NEW: Use provider-specific packages
from langchain_openai import OpenAIEmbeddings
from langchain_cohere import CohereEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings

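Fix: Import each embedding class from its provider-specific package (langchain-openai, langchain-cohere, langchain-huggingface) rather than from the legacy langchain.embeddings or langchain-community paths.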

3. Text Length Limits

# ❌ Text too long
embeddings = OpenAIEmbeddings()
very_long_text = "..." * 100000
embeddings.embed_query(very_long_text)  # May fail or be silently split and averaged, depending on the integration

# ✅ Chunk long texts first
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=8000,  # Counted in characters by default, so well under OpenAI's ~8191-token limit
    chunk_overlap=200,
)
chunks = splitter.split_text(very_long_text)
chunk_embeddings = embeddings.embed_documents(chunks)

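Fix: Split long texts into chunks before embedding. OpenAI embedding models accept about 8k tokens per input; many open-source models accept far fewer, so check the limit for the model you use.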
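
To show what chunk_size and chunk_overlap mean, here is a simplified sliding-window splitter over characters. It is a sketch of the idea only and is not how RecursiveCharacterTextSplitter works internally (that class also tries to break on separators such as paragraphs and spaces); `split_with_overlap` is a hypothetical name.

```python
def split_with_overlap(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character windows that overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # The last window already reaches the end of the text.
    return chunks

chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=1)
print(chunks)  # ['abcd', 'defg', 'ghij']
```

Each chunk repeats the last character of the previous one, which is the overlap that helps preserve context across chunk boundaries.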

4. HuggingFace Model Download

# ❌ First run may be slow (downloading model)
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-mpnet-base-v2"
)
# Downloads ~420MB on first run!

# ✅ Be aware and cache models
# Models are cached in ~/.cache/huggingface/
# Subsequent runs will be fast

Fix: First run downloads the model. Plan for network and disk space.

5. Azure Configuration Complexity

import os
from langchain_openai import AzureOpenAIEmbeddings

# ❌ INCOMPLETE: Missing endpoint, deployment, and API version
embeddings = AzureOpenAIEmbeddings(
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
)

# ✅ COMPLETE: All required fields
embeddings = AzureOpenAIEmbeddings(
    azure_endpoint="https://my-instance.openai.azure.com/",
    api_key=os.getenv("AZURE_OPENAI_API_KEY"),
    azure_deployment="text-embedding-ada-002",
    api_version="2024-02-01",
)

Fix: Azure requires endpoint, deployment name, and API version.
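
A small pre-flight check can report which settings are missing before the client is constructed. This is a hypothetical helper operating on a plain dict; the setting names mirror the keyword arguments used above, and the real library may also read some values from environment variables.

```python
REQUIRED_AZURE_SETTINGS = (
    "azure_endpoint",
    "api_key",
    "azure_deployment",
    "api_version",
)

def missing_azure_settings(settings):
    """Return the names of required settings that are absent or empty."""
    return [name for name in REQUIRED_AZURE_SETTINGS if not settings.get(name)]

incomplete = {"api_key": "placeholder-key"}
complete = {
    "azure_endpoint": "https://my-instance.openai.azure.com/",
    "api_key": "placeholder-key",
    "azure_deployment": "text-embedding-ada-002",
    "api_version": "2024-02-01",
}

missing = missing_azure_settings(incomplete)
print("Missing:", missing)
```

Failing early with a list of missing names is usually easier to debug than an authentication or routing error from the first embedding call.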

6. Ollama Service Must Be Running

# ❌ Ollama not running
from langchain_ollama import OllamaEmbeddings

embeddings = OllamaEmbeddings(model="nomic-embed-text")
embeddings.embed_query("test")  # Connection error!

# ✅ Ensure Ollama is running and model is pulled
# Terminal:
# ollama pull nomic-embed-text
# ollama serve

embeddings = OllamaEmbeddings(model="nomic-embed-text")
embeddings.embed_query("test")  # Works!

Fix: Start Ollama service and pull the model first.
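
A quick reachability check can turn the connection error into a clearer message. The sketch below assumes the Ollama server answers plain HTTP GET requests on its base URL; it only tests that something responds there and does not verify that the model has been pulled. `ollama_is_reachable` is a hypothetical helper.

```python
import urllib.request

def ollama_is_reachable(base_url="http://localhost:11434", timeout=2.0):
    """Return True if an HTTP server answers at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # Covers refused connections, timeouts, and HTTP errors
        # (urllib's URLError is a subclass of OSError).
        return False
```

Calling `ollama_is_reachable()` before creating `OllamaEmbeddings` lets you print a hint such as "run ollama serve" instead of surfacing a raw connection error.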

7. Batch Size for Performance

# ❌ Inefficient: One API call per document
embeddings = OpenAIEmbeddings()
for doc in large_doc_list:
    emb = embeddings.embed_query(doc)  # Slow!

# ✅ Efficient: Batch processing
embeddings = OpenAIEmbeddings(chunk_size=100)
all_embeddings = embeddings.embed_documents(large_doc_list)  # Fast!

Fix: Use embed_documents() for batch processing instead of calling embed_query() in a loop.

8. Dimension Mismatch

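from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import FAISS

# ❌ Vector store expecting 1536 dimensions, model produces 512
embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    dimensions=512,
)

# Vector store initialized with different dimensions
vectorstore = FAISS.from_texts(
    ["text1"],
    OpenAIEmbeddings(),  # Uses default 1536 dimensions
)
# Passing 512-dim vectors from `embeddings` to this 1536-dim index
# (for example via similarity_search_by_vector) will fail!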

# ✅ Consistent dimensions
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = FAISS.from_texts(["text1"], embeddings)

Fix: Ensure all embeddings use consistent dimensions throughout your application.
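
A defensive check before writing to an index makes this failure explicit. A minimal sketch with a hypothetical `check_dimensions` helper and zero-filled placeholder vectors:

```python
def check_dimensions(vectors, expected_dim):
    """Raise ValueError if any vector's length differs from expected_dim."""
    for position, vec in enumerate(vectors):
        if len(vec) != expected_dim:
            raise ValueError(
                f"Vector {position} has {len(vec)} dimensions, expected {expected_dim}"
            )
    return vectors

index_dim = 1536
good_batch = [[0.0] * 1536, [0.0] * 1536]
bad_batch = [[0.0] * 1536, [0.0] * 512]

check_dimensions(good_batch, index_dim)  # Passes silently.

try:
    check_dimensions(bad_batch, index_dim)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)  # Vector 1 has 512 dimensions, expected 1536
```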

Links and Resources

Official Documentation

Provider Documentation

Package Installation

# OpenAI
pip install langchain-openai

# Cohere
pip install langchain-cohere

# HuggingFace
pip install langchain-huggingface sentence-transformers

# Ollama
pip install langchain-ollama

# Google
pip install langchain-google-genai

# FAISS vector store (used in the vector store example)
pip install langchain-community faiss-cpu