Performs unsupervised clustering on single-cell RNA-seq data using Seurat. This process finds nearest neighbors, computes UMAP for visualization, and applies Louvain/Leiden algorithms to identify cell clusters. Clusters can be explored at multiple resolutions to balance granularity and biological relevance.
Install
npx skillscat add pwwang/immunopipe/seuratclustering Install via the SkillsCat registry.
SeuratClustering Process Configuration
Purpose
Performs unsupervised clustering on single-cell RNA-seq data using Seurat. This process finds nearest neighbors, computes UMAP for visualization, and applies Louvain/Leiden algorithms to identify cell clusters. Clusters can be explored at multiple resolutions to balance granularity and biological relevance.
When to Use
- After SeuratPreparing: Standard workflow after QC and normalization
- T/B cell selection: After SeuratClusteringOfAllCells (if TOrBCellSelection enabled)
- Reference-based annotation: Alternative to SeuratMap2Ref or CellTypeAnnotation
- Standard clustering: When you need unsupervised cell type discovery
- Multi-resolution exploration: When unsure of optimal cluster granularity
Configuration Structure
Process Enablement
[SeuratClustering]
cache = true # Cache intermediate results for faster re-runsInput Specification
[SeuratClustering.in]
srtobj = ["SeuratPreparing"] # Path or reference to Seurat objectEnvironment Variables
Core Parameters
[SeuratClustering.envs]
# Number of cores for parallelization
ncores = 1 # int; Higher values speed up computation
# Metadata column name for cluster labels
ident = "seurat_clusters" # Default column name
# Cache location for intermediate results
cache = "/tmp" # Path; Set to false to disable cachingFindNeighbors Parameters
[SeuratClustering.envs.FindNeighbors]
# K-nearest neighbors: Defines neighborhood size (default: 20)
# Larger values capture more global structure
k.param = 20 # int; Range: 5-100 depending on dataset size
# Reduction to use for building neighbor graph
# If not specified, uses sobj@misc$integrated_new_reduction
reduction = "pca" # Options: "pca", "integrated.cca", "integrated.rpca", etc.
# Dimensions to use from reduction
dims = 30 # int or 1:N; Automatically expanded to 1:dims
# Pruning threshold for shared nearest neighbor (SNN) graph
# 0 = no pruning, 1 = prune everything
prune.SNN = 0.067 # Default: 1/15; Controls graph connectivity
# Nearest neighbor method
nn.method = "annoy" # Options: "annoy", "rann"
# Graph naming (for multiple integration methods)
graph.name = ["pca_nn", "pca_snn"] # [NN, SNN] graph namesFull FindNeighbors Parameter List:
k.param(int): Number of nearest neighbors (default: 20)reduction(str): Dimensional reduction to use (default: "pca")dims(int): Number of dimensions to use (default: 1:10)assay(str): Assay to use when dims is NULLfeatures(list): Features to use when dims is NULLcompute.SNN(bool): Compute shared nearest neighbor graph (default: TRUE)prune.SNN(float): SNN pruning threshold (default: 1/15)nn.method(str): NN algorithm - "annoy" or "rann" (default: "annoy")n.trees(int): Annoy tree count (default: 50)annoy.metric(str): Distance metric - "euclidean", "cosine", "manhattan", "hamming" (default: "euclidean")graph.name(list): Names for NN and SNN graphsverbose(bool): Print output (default: TRUE)
RunUMAP Parameters
[SeuratClustering.envs.RunUMAP]
# Reduction to use for UMAP embedding
reduction = "pca" # Options: "pca", "integrated.cca", "integrated.rpca"
# Dimensions to use
dims = 30 # int; Automatically expanded to 1:dims
# Use specific features instead of dimensions
# Can be a list with "order" and "n" fields
features = 30 # or list: {order = "desc(abs(avg_log2FC))", n = 30}
# Number of neighboring points (global vs local structure)
n.neighbors = 30 # int; Range: 5-50; Higher = more global
# Min distance controls cluster tightness
min.dist = 0.3 # float; Range: 0.001-0.5; Higher = more spread out
# Effective scale of embedded points
spread = 1 # float; Works with min.dist for cluster distribution
# Number of embedding dimensions
n.components = 2 # int; Usually 2 for visualization
# Distance metric
metric = "cosine" # Options: "euclidean", "cosine", "manhattan", etc.
# Learning rate for optimization
learning.rate = 1 # float; Initial learning rate
# Random seed for reproducibility
seed.use = 42 # intFull RunUMAP Parameter List:
dims(int): Dimensions to use (default: NULL, uses reduction)reduction(str): Reduction to use (default: "pca")features(list/int): Features to use instead of dimsn.neighbors(int): Neighborhood size (default: 30)n.components(int): Embedding dimensions (default: 2)metric(str): Distance metric (default: "cosine")min.dist(float): Cluster tightness (default: 0.3)spread(float): Scale of embedding (default: 1.0)learning.rate(float): Initial learning rate (default: 1.0)n.epochs(int): Training epochs (default: auto: 200 large, 500 small)set.op.mix.ratio(float): Fuzzy set operation ratio (default: 1.0)local.connectivity(int): Local connectivity (default: 1)seed.use(int): Random seed (default: 42)reduction.name(str): Name for UMAP reduction (default: "umap")verbose(bool): Print output (default: TRUE)
FindClusters Parameters
[SeuratClustering.envs.FindClusters]
# Resolution: Higher = more clusters, Lower = fewer clusters
resolution = 0.8 # float; Default: 0.8
# Multiple resolutions supported: [0.4, 0.6, 0.8, 1.0]
# Range syntax: "0.2:1.0:0.1" -> [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
# Clustering algorithm (1=Louvain, 2=Louvain multilevel, 3=SLM, 4=Leiden)
# Leiden (4) is preferred over Louvain (1-3)
algorithm = 4 # int; 1-4; 4 = Leiden (recommended)
# Leiden implementation
leiden_method = "leidenbase" # Options: "leidenbase", "igraph"
# Leiden objective function
leiden_objective_function = "modularity" # Options: "modularity", "CPM"
# Random seed for reproducibility
random.seed = 0 # int
# Number of random starts
n.start = 10 # int
# Max iterations per start
n.iter = 10 # int
# Graph to use for clustering
graph.name = "pca_snn" # Must match FindNeighbors graph.name[1]
# Cluster name in metadata
cluster.name = "seurat_clusters" # Can use envs.ident instead
# Group singletons into nearest cluster
group.singletons = TRUE # boolFull FindClusters Parameter List:
resolution(float): Cluster granularity (default: 0.8)algorithm(int): 1=Louvain, 2=Louvain multilevel, 3=SLM, 4=Leiden (default: 1)leiden_method(str): "leidenbase" or "igraph" (default: "leidenbase")leiden_objective_function(str): "modularity" or "CPM" (default: "modularity")random.seed(int): Random seed (default: 0)n.start(int): Random starts (default: 10)n.iter(int): Max iterations (default: 10)graph.name(str): SNN graph name to usecluster.name(str): Metadata column for clustersmodularity.fxn(int): Modularity function (default: 1)group.singletons(bool): Group singletons (default: TRUE)verbose(bool): Print output (default: TRUE)
External References
FindNeighbors (Seurat v5)
https://satijalab.org/seurat/reference/findneighbors
- Constructs shared nearest neighbor (SNN) graph
- Computes k-nearest neighbors and Jaccard index for neighborhood overlap
- Pruning controls graph connectivity (higher = stricter)
RunUMAP (Seurat v5)
https://satijalab.org/seurat/reference/runumap
- Uniform Manifold Approximation and Projection for visualization
n.neighbors: Global structure vs local detail trade-off (5-50)min.dist: Cluster tightness (0.001-0.5)spread: Scale of embedding (works with min.dist)
FindClusters (Seurat v5)
https://www.rdocumentation.org/packages/Seurat/versions/5.3.1/topics/FindClusters
- Louvain (1-3) vs Leiden (4) clustering algorithms
- Leiden preferred: Better community detection, improved over Louvain
- Resolution: >1.0 = more clusters, <1.0 = fewer clusters
Algorithm Comparison: Leiden vs Louvain
- Leiden (algorithm=4): Preferred, better community detection, refined clusters
- Louvain (algorithm=1-3): Faster but less accurate, can produce badly connected communities
- Source: Single-cell best practices recommend Leiden
https://www.sc-best-practices.org/cellular_structure/clustering.html
Integration Method Support
When using SeuratData integration workflows, use integrated reduction:
- CCA integration:
reduction = "integrated.cca" - RPCA integration:
reduction = "integrated.rpca" - Harmony integration:
reduction = "integrated.harmony"
Configuration Examples
Minimal Configuration
[SeuratClustering]
[SeuratClustering.in]
srtobj = ["SeuratPreparing"]Result: Uses defaults (PCA, 30 dims, resolution 0.8, Louvain algorithm)
Standard Resolution Sweep
[SeuratClustering]
[SeuratClustering.in]
srtobj = ["SeuratPreparing"]
[SeuratClustering.envs.FindClusters]
resolution = [0.4, 0.6, 0.8, 1.0] # Test 4 resolutionsResult: Creates seurat_clusters_0.4, seurat_clusters_0.6, etc. Final = 1.0
Range Syntax for Resolution Sweep
[SeuratClustering.envs.FindClusters]
# From 0.2 to 1.0 with step 0.1
resolution = "0.2:1.0:0.1" # Equivalent to [0.2, 0.3, ..., 1.0]Leiden Algorithm with Custom Parameters
[SeuratClustering]
[SeuratClustering.envs.FindNeighbors]
k.param = 30 # Larger neighborhood
prune.SNN = 0.05 # Less pruning
graph.name = ["pca_nn", "pca_snn"]
[SeuratClustering.envs.FindClusters]
algorithm = 4 # Leiden
resolution = 1.2 # Higher resolution for more clusters
random.seed = 42 # Reproducible clustering
graph.name = "pca_snn"Integrated Data (CCA/RPCA)
[SeuratClustering]
[SeuratClustering.envs.FindNeighbors]
reduction = "integrated.cca" # Use integrated reduction
dims = 30
[SeuratClustering.envs.RunUMAP]
reduction = "integrated.cca"
dims = 30
reduction.name = "umap.cca"
[SeuratClustering.envs.FindClusters]
resolution = 1.0Custom UMAP Parameters for Better Separation
[SeuratClustering]
[SeuratClustering.envs.RunUMAP]
n.neighbors = 15 # More local detail
min.dist = 0.1 # Tighter clusters
spread = 1.5 # More spread out
seed.use = 123Using Top Markers for UMAP
[SeuratClustering]
[SeuratClustering.envs.RunUMAP]
# Use top 30 markers for UMAP instead of PCs
features = {order = "desc(abs(avg_log2FC))", n = 30}Multi-Process with Custom Cluster Names
[SeuratClustering]
[SeuratClustering.envs]
ident = "my_clusters" # Custom column name
[CellTypeAnnotation]
[CellTypeAnnotation.envs]
newcol = "cell_types" # Different column to avoid overwriting
[SeuratMap2Ref]
[SeuratMap2Ref.envs]
name = "ref_clusters" # Another clustering methodCommon Patterns
Pattern 1: Single Resolution (Standard)
[SeuratClustering]
[SeuratClustering.envs.FindClusters]
resolution = 0.8 # Default balanced resolutionPattern 2: Resolution Sweep for Exploration
[SeuratClustering]
[SeuratClustering.envs.FindClusters]
resolution = "0.4:1.2:0.2" # [0.4, 0.6, 0.8, 1.0, 1.2]Pattern 3: Leiden with High Resolution (Fine-grained)
[SeuratClustering]
[SeuratClustering.envs.FindNeighbors]
k.param = 25
[SeuratClustering.envs.FindClusters]
algorithm = 4 # Leiden
resolution = 1.5 # More clustersPattern 4: Integrated Data (Post-Integration)
[SeuratClustering]
[SeuratClustering.envs.FindNeighbors]
reduction = "integrated.cca"
dims = 30
[SeuratClustering.envs.RunUMAP]
reduction = "integrated.cca"
dims = 30
[SeuratClustering.envs.FindClusters]
resolution = 1.0Pattern 5: Sparse UMAP for Large Datasets
[SeuratClustering]
[SeuratClustering.envs.FindNeighbors]
nn.method = "annoy"
n.trees = 50
[SeuratClustering.envs.RunUMAP]
n.neighbors = 50 # More global for large datasetsDependencies
Upstream Processes
- Required:
SeuratPreparing(orSeuratClusteringOfAllCellsif TOrBCellSelection used) - Optional:
LoadingRNAFromSeuratwithprepared = false(if loading unprepared Seurat object)
Downstream Processes
- SeuratClusterStats: Cluster statistics and quality metrics
- ClusterMarkers: Differential expression between clusters
- MarkersFinder: Flexible marker finding with enrichment analysis
- ScRepCombiningExpression: If TCR data present (combines RNA + TCR)
- TESSA: TCR-specific clustering analysis
Validation Rules
Resolution Constraints
- Must be positive (resolution > 0)
- Single value or list of values allowed
- Range syntax:
"start:end:step"(step defaults to 0.1 if omitted)
Dimension Requirements
dimsmust not exceed available dimensions in reduction- Automatically truncated to
min(dims, ncol(reduction) - 1)
Graph Name Consistency
FindClusters.graph.namemust matchFindNeighbors.graph.name[1](SNN graph name)- When using multiple integration methods, use unique graph names
Algorithm Selection
- Louvain: algorithm = 1 (original), 2 (multilevel), 3 (SLM)
- Leiden: algorithm = 4 (recommended)
- Leiden requires
leiden_methodandleiden_objective_functionparameters
Troubleshooting
Issue: Too Many Small Clusters
Symptoms: Hundreds of tiny clusters, many singletons
Solutions:
[SeuratClustering.envs.FindClusters]
resolution = 0.4 # Lower resolution
algorithm = 4 # Leiden handles singletons better
group.singletons = TRUE # Group singletons into nearest clusterIssue: Clusters Overlapping in UMAP
Symptoms: Poor separation in UMAP visualization
Solutions:
[SeuratClustering.envs.RunUMAP]
min.dist = 0.1 # Tighter clusters
n.neighbors = 15 # More local detail
spread = 1.2 # More separationIssue: Clustering Not Reproducible
Symptoms: Different clusters on each run
Solutions:
[SeuratClustering.envs.FindNeighbors]
# Set seed for reproducible nearest neighbors
seed.use = 42
[SeuratClustering.envs.FindClusters]
random.seed = 42
n.start = 10 # More starts for stable resultsIssue: Slow Performance
Symptoms: Clustering takes hours
Solutions:
[SeuratClustering.envs]
ncores = 8 # Use more cores
[SeuratClustering.envs.FindNeighbors]
nn.method = "annoy" # Faster approximate NN
dims = 20 # Fewer dimensions
[SeuratClustering.envs.RunUMAP]
n.epochs = 200 # Fewer epochs (default auto-selects based on size)Issue: Badly Connected Communities (Louvain)
Symptoms: Leiden warning about disconnected clusters
Solutions:
[SeuratClustering.envs.FindClusters]
algorithm = 4 # Switch to Leiden
leiden_method = "leidenbase" # Use leidenbase implementationIssue: Graph Name Conflicts with Multiple Integrations
Symptoms: Wrong graph used for clustering
Solutions:
# CCA integration
[SeuratClustering.envs.FindNeighbors]
reduction = "integrated.cca"
graph.name = ["cca_nn", "cca_snn"]
[SeuratClustering.envs.FindClusters]
graph.name = "cca_snn" # Must match SNN graph name
# RPCA integration
[SeuratClustering.envs.FindNeighbors]
reduction = "integrated.rpca"
graph.name = ["rpca_nn", "rpca_snn"]
[SeuratClustering.envs.FindClusters]
graph.name = "rpca_snn"Issue: Clustering Not Using Integration
Symptoms: Clustering on raw RNA instead of integrated data
Solutions:
[SeuratClustering.envs.FindNeighbors]
reduction = "integrated.cca" # Specify integrated reduction
dims = 30
# If using SCTransform + integration:
[SeuratPreparing.envs]
integration_method = "CCA" # Ensure integration is performedBest Practices
- Use Leiden algorithm (algorithm = 4) for better community detection
- Test multiple resolutions to find optimal granularity
- Set random seeds for reproducible results
- Match reduction to integration if using CCA/RPCA/Harmony
- Custom cluster names when running multiple annotation methods to avoid overwriting
- Cache intermediate results for faster re-runs with different parameters
- Parallelize with ncores for large datasets (>50k cells)
- Use resolution sweeps when unsure of optimal granularity
Related Processes
- SeuratClusteringOfAllCells: Clustering before T/B cell selection
- SeuratSubClustering: Re-clustering within specific clusters
- SeuratMap2Ref: Reference-based supervised clustering
- CellTypeAnnotation: Automated cell type annotation