SeuratSubClustering Process Configuration
Purpose
Performs fine-grained re-clustering on specific subsets of cells (e.g., individual clusters, cell types, or custom subsets). Unlike `Seurat::FindSubCluster`, which only finds subclusters within a single cluster, this process runs the complete clustering workflow (PCA, UMAP, FindNeighbors, FindClusters) on any subset of cells defined by metadata filters or cell barcode lists.
When to Use
- Cluster heterogeneity analysis: When initial clustering identifies mixed cell populations within a cluster
- Cell type sub-clustering: To resolve heterogeneity within annotated cell types (e.g., T cell subsets: CD4+, CD8+, naive, memory, effector)
- Lineage-specific analysis: To examine substructure within major cell lineages
- Differential sub-populations: When a cluster contains multiple biologically distinct populations (e.g., NK cells + CD4 T cells)
- Multi-resolution exploration: To test different clustering granularities on specific cell subsets
- Downstream marker discovery: When you need markers for sub-populations within larger clusters
Configuration Structure
Process Enablement
[SeuratSubClustering]
cache = true # Cache intermediate results for faster re-runs
Input Specification
[SeuratSubClustering.in]
srtobj = ["SeuratClustering"] # Path or reference to Seurat object
Environment Variables
Core Parameters
[SeuratSubClustering.envs]
# Number of cores for parallelization
ncores = 1 # int; Higher values speed up computation
# Metadata mutaters to define subset cells
# Applied BEFORE subsetting to create temporary columns
mutaters = {} # json; Dictionary of dplyr-like mutations
# Expression to subset cells (dplyr::filter syntax)
# Applied to metadata using tidyseurat::filter()
subset = "seurat_clusters == 'c3'" # str; Filter expression
# Cache location for intermediate results
cache = "/tmp" # Path; Set to false to disable caching
Sub-clustering Cases (Multiple Subsets)
[SeuratSubClustering.envs.cases]
# Keys are case names (prefixes for outputs)
# Values inherit envs parameters (except mutaters, cache)
# If empty, default case "subcluster" is created
Case Naming Rules:
- Case name becomes prefix for reductions: `<CASENAME>PC_`, `<CASENAME>UMAP_`
- Case name becomes prefix for cluster columns: `<CASENAME>.<resolution>`
- Case name becomes final cluster column: `<CASENAME>`
- Non-alphanumeric characters in case names are removed
Metadata Output:
- Each case adds new metadata columns to original Seurat object
- Reductions saved: `<CASENAME>.pc`, `<CASENAME>.umap`
- Clusters saved: `<CASENAME>.<resolution>` for each resolution
- Final clusters: `<CASENAME>` column
RunPCA Parameters
[SeuratSubClustering.envs.RunPCA]
# See https://satijalab.org/seurat/reference/runpca
# object specified internally as subset object
npcs = 30 # int; Number of PCs to compute
RunUMAP Parameters
[SeuratSubClustering.envs.RunUMAP]
# See https://satijalab.org/seurat/reference/runumap
# object specified internally as subset object
# dims = N is expanded to dims = 1:min(N, ncol(reduction) - 1)
dims = 30 # int; Number of PCs to use
# Use specific features instead of dimensions
# Can be list: {order = "desc(abs(avg_log2FC))", n = 30}
# Or numeric (treated as n with default order)
features = 30 # int or list; Top markers for UMAP
# Reduction to use for UMAP
reduction = "pca" # str; Uses sobj@misc$integrated_new_reduction if omitted
n.neighbors = 30 # int; Neighborhood size
min.dist = 0.3 # float; Cluster tightness (0.001-0.5)
spread = 1 # float; Embedding scale
seed.use = 42 # int; Random seed
FindNeighbors Parameters
[SeuratSubClustering.envs.FindNeighbors]
# See https://satijalab.org/seurat/reference/findneighbors
# object specified internally
reduction = "pca" # str; Uses sobj@misc$integrated_new_reduction if omitted
dims = 30 # int; Dimensions to use
k.param = 20 # int; K-nearest neighbors
prune.SNN = 0.067 # float; SNN pruning threshold (default: 1/15)
nn.method = "annoy" # str; "annoy" or "rann"
FindClusters Parameters
[SeuratSubClustering.envs.FindClusters]
# See https://satijalab.org/seurat/reference/findclusters
# object specified internally
# Resolution: Higher = more clusters, Lower = fewer clusters
# Multiple resolutions supported: [0.4, 0.6, 0.8, 1.0]
# Range syntax: "0.1:0.5:0.1" -> [0.1, 0.2, 0.3, 0.4, 0.5]
resolution = 0.8 # float or list; Default: 0.8
# Sub-cluster labels are prefixed with "s" and numbered from 1 (s1, s2, ...) rather than from 0 (s0, s1, ...)
algorithm = 1 # int; 1=Louvain, 4=Leiden (recommended)
graph.name = "pca_snn" # str; Must match FindNeighbors SNN graph
random.seed = 0 # int; Reproducibility
Multi-resolution Output:
- Multiple resolutions create columns: `<CASENAME>.0.4`, `<CASENAME>.0.6`, `<CASENAME>.0.8`, `<CASENAME>`
- The final `<CASENAME>` column uses the last value in the list
External References
Seurat Functions
- RunPCA(): https://satijalab.org/seurat/reference/runpca
- Principal component analysis on subset of cells
- RunUMAP(): https://satijalab.org/seurat/reference/runumap
- Non-linear dimensionality reduction for visualization
- FindNeighbors(): https://satijalab.org/seurat/reference/findneighbors
- K-nearest neighbor graph construction
- FindClusters(): https://satijalab.org/seurat/reference/findclusters
- Community detection (Louvain/Leiden algorithms)
tidyseurat::filter()
https://stemangiola.github.io/tidyseurat/reference/filter.html
- Subset Seurat objects using dplyr-like filter syntax
- Supports logical expressions: `seurat_clusters == 'c3'`, `celltype %in% c('CD4', 'CD8')`
- Can use any metadata column created by `mutaters`
Configuration Examples
Minimal Configuration (Default Case)
[SeuratSubClustering]
[SeuratSubClustering.in]
srtobj = ["SeuratClustering"]
Result: Creates default case "subcluster" with all cells
Single Cluster Sub-clustering
[SeuratSubClustering]
[SeuratSubClustering.envs]
subset = "seurat_clusters == 'c3'"
Result: Re-clusters only cells in cluster c3
Metadata-Based Sub-clustering (Cell Type)
[SeuratSubClustering]
[SeuratSubClustering.envs]
# First flag CD4 T cells in a temporary column via mutaters, then subset on it
mutaters = {is_cd4 = "if_else(celltype == 'CD4 T cell', TRUE, FALSE)"}
subset = "is_cd4"
[SeuratSubClustering.envs.RunPCA]
npcs = 50
[SeuratSubClustering.envs.FindClusters]
resolution = 1.2
algorithm = 4 # Leiden
Result: Creates subcluster case for CD4+ cells only
Multiple Sub-clustering Cases
[SeuratSubClustering]
[SeuratSubClustering.envs]
# Define multiple sub-clustering cases
[SeuratSubClustering.envs.cases.TEffector]
subset = "celltype == 'CD8 T cell' & state == 'Effector'"
[SeuratSubClustering.envs.cases.TEffector.FindClusters]
resolution = 1.0
[SeuratSubClustering.envs.cases.TNaive]
subset = "celltype == 'CD8 T cell' & state == 'Naive'"
[SeuratSubClustering.envs.cases.TNaive.FindClusters]
resolution = 0.8
[SeuratSubClustering.envs.cases.CD4Memory]
subset = "celltype == 'CD4 T cell' & state == 'Memory'"
[SeuratSubClustering.envs.cases.CD4Memory.FindClusters]
resolution = 1.5
Result: Three sub-clustering analyses with different resolutions
- Metadata columns: `TEffector`, `TNaive`, `CD4Memory`
- Reduction keys: `TEFFECTORPC_`, `TNAIVEPC_`, `CD4MEMORYPC_`, etc.
- Clusters: `TEffector`, `TNaive`, `CD4Memory`
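How a case combines with the shared `[SeuratSubClustering.envs]` settings can be pictured with a short Python sketch. It follows the stated rule that cases inherit `envs` parameters except `mutaters` and `cache`. The function name `resolve_case` and the one-level merge of nested tables such as `FindClusters` are assumptions made for illustration, not the pipeline's actual implementation.

```python
import copy

# Keys that a case does not inherit (per the rule stated for cases);
# "cases" itself is skipped so cases do not nest.
NOT_INHERITED = {"mutaters", "cache", "cases"}


def resolve_case(envs: dict, case: dict) -> dict:
    """Return the effective settings for one case (hypothetical helper)."""
    resolved = {
        key: copy.deepcopy(value)
        for key, value in envs.items()
        if key not in NOT_INHERITED
    }
    for key, value in case.items():
        if isinstance(value, dict) and isinstance(resolved.get(key), dict):
            # Assumed behavior: nested tables are merged one level deep.
            resolved[key] = {**resolved[key], **value}
        else:
            resolved[key] = copy.deepcopy(value)
    return resolved


envs = {
    "ncores": 4,
    "cache": "/tmp",
    "mutaters": {},
    "FindClusters": {"resolution": 0.8, "algorithm": 4},
}
case = {
    "subset": "celltype == 'CD8 T cell' & state == 'Effector'",
    "FindClusters": {"resolution": 1.0},
}
print(resolve_case(envs, case))
```

Under these assumptions the case keeps `ncores = 4` and `algorithm = 4` from the shared settings and only overrides `resolution`.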
Multi-resolution Sub-clustering
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.Cluster3]
subset = "seurat_clusters == 'c3'"
[SeuratSubClustering.envs.cases.Cluster3.FindClusters]
# Test multiple resolutions
resolution = "0.4:1.2:0.2" # [0.4, 0.6, 0.8, 1.0, 1.2]
algorithm = 4 # Leiden
Result: Cluster3 has columns Cluster3.0.4, Cluster3.0.6, Cluster3.0.8, Cluster3.1.0, and Cluster3 (the final column, from the last resolution, 1.2)
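The range syntax used above can be expanded by hand to check which resolutions a configuration will request. The following Python sketch is an illustration only: `expand_resolution` is a hypothetical helper, and the use of `Decimal` to avoid floating-point drift is a choice made here rather than something the pipeline is known to do. It assumes the documented `"start:end:step"` form with a default step of 0.1.

```python
from decimal import Decimal

DEFAULT_STEP = Decimal("0.1")  # documented default when the step is omitted


def expand_resolution(spec):
    """Expand a number, list, or 'start:end[:step]' string into a list of floats."""
    if isinstance(spec, str) and ":" in spec:
        parts = [Decimal(p.strip()) for p in spec.split(":")]
        if len(parts) not in (2, 3):
            raise ValueError(f"expected 'start:end[:step]', got {spec!r}")
        start, end = parts[0], parts[1]
        step = parts[2] if len(parts) == 3 else DEFAULT_STEP
        if step <= 0:
            raise ValueError("step must be positive")
        values = []
        current = start
        while current <= end:  # the end value is included
            values.append(current)
            current += step
    elif isinstance(spec, (list, tuple)):
        values = [Decimal(str(v)) for v in spec]
    else:
        values = [Decimal(str(spec))]
    if not values or any(v <= 0 for v in values):
        raise ValueError("resolutions must be positive")
    return [float(v) for v in values]


resolutions = expand_resolution("0.4:1.2:0.2")
print(resolutions)      # [0.4, 0.6, 0.8, 1.0, 1.2]
print(resolutions[-1])  # 1.2, the value that feeds the final cluster column
```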
Using Top Markers for UMAP
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.MixedCluster]
subset = "seurat_clusters == 'c5'"
[SeuratSubClustering.envs.cases.MixedCluster.RunUMAP]
# Use top 30 DEGs for UMAP instead of PCs
features = {order = "desc(abs(avg_log2FC))", n = 30}
Result: Sub-cluster based on top DEGs for better separation
Leiden Algorithm with Custom Parameters
[SeuratSubClustering]
[SeuratSubClustering.envs]
ncores = 4
[SeuratSubClustering.envs.FindNeighbors]
k.param = 30
prune.SNN = 0.05
[SeuratSubClustering.envs.FindClusters]
algorithm = 4 # Leiden
resolution = 1.0
random.seed = 42
Complex Subset with Multiple Conditions
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.ActivatedT]
subset = "celltype %in% c('CD4 T cell', 'CD8 T cell') & activation == 'Activated'"
[SeuratSubClustering.envs.cases.ActivatedT.RunPCA]
npcs = 40
[SeuratSubClustering.envs.cases.ActivatedT.RunUMAP]
dims = 40
n.neighbors = 20
min.dist = 0.2
Common Patterns
Pattern 1: Single Cluster Deep Dive
[SeuratSubClustering]
[SeuratSubClustering.envs]
# Re-cluster cluster 3 to resolve heterogeneity
subset = "seurat_clusters == 'c3'"
Pattern 2: Multiple Lineage Sub-clustering
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.TCD4]
subset = "celltype == 'CD4 T cell'"
[SeuratSubClustering.envs.cases.TCD8]
subset = "celltype == 'CD8 T cell'"
[SeuratSubClustering.envs.cases.TGD]
subset = "celltype == 'Gamma delta T cell'"
[SeuratSubClustering.envs.cases.NK]
subset = "celltype == 'NK cell'"
Pattern 3: Functional State Sub-clustering
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.Effector]
subset = "state == 'Effector'"
[SeuratSubClustering.envs.cases.Effector.FindClusters]
resolution = 1.5 # Higher resolution for more sub-states
[SeuratSubClustering.envs.cases.Memory]
subset = "state == 'Memory'"
[SeuratSubClustering.envs.cases.Naive]
subset = "state == 'Naive'"
Pattern 4: Re-clustering Based on Clonality (TCR+)
[SeuratSubClustering]
[SeuratSubClustering.envs]
# After ScRepCombiningExpression adds clonality metadata
[SeuratSubClustering.envs.cases.ExpandedClones]
subset = "clone_size >= 5" # Large clones
[SeuratSubClustering.envs.cases.ExpandedClones.FindClusters]
resolution = 0.6 # Lower resolution for broader groups
[SeuratSubClustering.envs.cases.RareClones]
subset = "clone_size == 1" # Unique clones
[SeuratSubClustering.envs.cases.RareClones.FindClusters]
resolution = 1.2 # Higher resolution to capture diversity
Pattern 5: Multi-resolution Exploration
[SeuratSubClustering]
[SeuratSubClustering.envs.cases.TumorCluster]
subset = "seurat_clusters == 'c8'"
[SeuratSubClustering.envs.cases.TumorCluster.FindClusters]
resolution = "0.2:2.0:0.2" # Sweep: [0.2, 0.4, ..., 2.0]
algorithm = 4 # Leiden
Dependencies
Upstream Processes
- Required: `SeuratClustering` (or `SeuratClusteringOfAllCells` if `TOrBCellSelection` is used)
- Optional: `ScRepCombiningExpression` (if TCR/BCR data present, adds clonality metadata for subsetting)
- Optional: `CellTypeAnnotation` (if using annotated cell types for subsetting)
Downstream Processes
- SeuratClusterStats: Statistics for sub-clusters
- ClusterMarkers: Differential expression between sub-clusters
- MarkersFinder: Flexible marker finding with enrichment analysis
- ScFGSEA: Pathway analysis on sub-cluster markers
- ModuleScoreCalculator: Module scoring within sub-clusters
Validation Rules
Subset Expression Validation
- Must be valid dplyr::filter() expression
- Can reference any metadata column in Seurat object
- Complex expressions supported: `&` (AND), `|` (OR), `%in%` (in operator)
- Example: `seurat_clusters == 'c3' & percent.mt < 5`
- Example: `celltype %in% c('CD4 T cell', 'CD8 T cell')`
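A rough pre-flight check for these rules can be written in Python before the pipeline is launched. The sketch below is a heuristic, not a parser for R: it strips quoted strings and `%op%` operators, ignores function calls, and reports bare names that are not among the known metadata columns. The helper names and the list of columns passed in are hypothetical, and the caller is assumed to already know the column names of the Seurat object (including any created by `mutaters`).

```python
import re

R_LITERALS = {"TRUE", "FALSE", "NA", "NULL", "T", "F"}
# An R-style name: starts with a letter, or with a dot not followed by a digit.
NAME = re.compile(r"(?<![A-Za-z0-9._])(?:[A-Za-z]|\.(?![0-9]))[A-Za-z0-9._]*")


def referenced_names(expr: str) -> set:
    """Collect bare names used in a filter expression (heuristic)."""
    text = re.sub(r"'[^']*'", " ", expr)    # drop single-quoted strings
    text = re.sub(r'"[^"]*"', " ", text)    # drop double-quoted strings
    text = re.sub(r"%[^%\s]*%", " ", text)  # drop infix operators such as %in%
    names = set()
    for match in NAME.finditer(text):
        name = match.group(0)
        is_call = text[match.end():].lstrip().startswith("(")
        if not is_call and name not in R_LITERALS:
            names.add(name)
    return names


def missing_columns(expr: str, columns) -> list:
    """Names in the expression that are not known metadata columns."""
    return sorted(referenced_names(expr) - set(columns))


columns = ["seurat_clusters", "celltype", "percent.mt"]  # hypothetical metadata
print(missing_columns("seurat_clusters == 'c3' & percent.mt < 5", columns))
print(missing_columns("seurat_clusters == c3", columns))
```

The second call reports `c3`, which is how an unquoted value shows up: it is read as a name rather than a string.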
Case Name Validation
- Should contain only alphanumeric characters
- Non-alphanumeric characters (spaces, hyphens, underscores, and so on) are removed automatically
- Used as the prefix for reduction and cluster column names
- Avoid spaces and special characters in case names
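The effect of these rules on a given case name can be previewed with a small Python sketch. Both helpers are hypothetical. The upper-casing of the key prefixes is an assumption suggested by the `TEFFECTORPC_` example for the `TEffector` case, not a confirmed detail of the implementation.

```python
import re


def sanitize_case_name(name: str) -> str:
    """Drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^A-Za-z0-9]", "", name)


def reduction_key_prefixes(name: str) -> tuple:
    """Key prefixes for the PCA and UMAP reductions (upper-casing assumed)."""
    clean = sanitize_case_name(name).upper()
    return f"{clean}PC_", f"{clean}UMAP_"


print(sanitize_case_name("Sub-Cluster"))    # SubCluster
print(reduction_key_prefixes("TEffector"))  # ('TEFFECTORPC_', 'TEFFECTORUMAP_')
```

Two case names that differ only in punctuation collapse to the same name after sanitizing, so it is worth checking for collisions when several cases are defined.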
Resolution Constraints
- Must be positive (resolution > 0)
- Single value, list, or range syntax allowed
- Range: `"start:end:step"` (step defaults to 0.1 if omitted)
- Multi-resolution creates multiple metadata columns
Dimension Requirements
- `RunPCA.npcs` must not exceed the number of cells in the subset
- `RunUMAP.dims` is automatically truncated to `min(dims, ncol(reduction) - 1)`
- Use fewer dimensions for small subsets (< 100 cells)
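The two dimension rules can be checked ahead of time with a few lines of Python. This is a sketch of the arithmetic only; `effective_umap_dims` and `check_npcs` are hypothetical names, and the cell and column counts are assumed to be known to the caller.

```python
def effective_umap_dims(requested: int, n_reduction_cols: int) -> list:
    """Expand dims = N to 1..min(N, ncol(reduction) - 1), like R's 1:N."""
    upper = min(requested, n_reduction_cols - 1)
    if upper < 1:
        raise ValueError("the reduction has too few columns for UMAP")
    return list(range(1, upper + 1))


def check_npcs(npcs: int, n_cells: int) -> None:
    """Raise if more PCs are requested than there are cells in the subset."""
    if npcs > n_cells:
        raise ValueError(f"npcs={npcs} exceeds the {n_cells} cells in the subset")


print(len(effective_umap_dims(30, 50)))  # 30
print(effective_umap_dims(30, 11)[-1])   # 10
check_npcs(30, 80)                       # passes silently
```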
Graph Name Consistency
- `FindClusters.graph.name` must match the `FindNeighbors` output
- Default: `pca_snn` when not specified
- When using integrated reductions, ensure consistency
Troubleshooting
Issue: Subset Returns Zero Cells
Symptoms: Sub-clustering produces empty subset
Solutions:
- Verify subset expression syntax
# Check if column exists and values are correct
[SeuratSubClustering.envs]
# Use single quotes for string comparison
subset = "seurat_clusters == 'c3'" # Correct
# subset = "seurat_clusters == c3" # Wrong (c3 is treated as a variable)
- Verify column names exist in metadata
# Use existing columns only
subset = "seurat_clusters == 'c3'" # seurat_clusters exists
# subset = "cluster_id == 'c3'" # cluster_id may not exist
- Check for exact string matching
# Case-sensitive
subset = "celltype == 'CD4 T cell'" # Exact match
# subset = "celltype == 'CD4 T Cell'" # Wrong case
Issue: Too Many Small Sub-clusters
Symptoms: Hundreds of tiny sub-clusters, many singletons
Solutions:
[SeuratSubClustering.envs.FindClusters]
resolution = 0.4 # Lower resolution
algorithm = 4 # Leiden handles singletons better
Issue: Sub-clusters Overlapping in UMAP
Symptoms: Poor separation in sub-cluster visualization
Solutions:
[SeuratSubClustering.envs.RunUMAP]
min.dist = 0.1 # Tighter clusters
n.neighbors = 15 # More local detail
spread = 1.2 # More separation
Issue: Sub-clustering Uses Wrong Reduction
Symptoms: Clustering on raw RNA instead of integrated data
Solutions:
[SeuratSubClustering.envs.FindNeighbors]
reduction = "integrated.cca" # Use integrated reduction
[SeuratSubClustering.envs.RunUMAP]
reduction = "integrated.cca"
Issue: Multi-resolution Columns Not Created
Symptoms: Only final resolution column appears
Solutions:
[SeuratSubClustering.envs.FindClusters]
# Provide a list or a range string; a single value yields only the final column
resolution = [0.4, 0.6, 0.8, 1.0] # Correct
# resolution = "0.4:1.0:0.2" # Also correct (equivalent range syntax)
Issue: Case Names Too Similar
Symptoms: Confusion between multiple cases
Solutions:
# Use descriptive, unique case names
[SeuratSubClustering.envs.cases]
# Underscores are non-alphanumeric and would be removed, so leave them out
TCD4Effector = {subset = "..."}
TCD4Naive = {subset = "..."}
BMemory = {subset = "..."}
Issue: Sub-clustering on All Cells (Not Subset)
Symptoms: Default case runs on entire object
Solutions:
# Always specify subset or use cases
[SeuratSubClustering.envs]
subset = "seurat_clusters == 'c3'" # Explicit subset
# Or define specific cases
[SeuratSubClustering.envs.cases.MyCase]
subset = "seurat_clusters == 'c3'"
Issue: Reductions Not Saved
Symptoms: Cannot find <CASENAME>PC_ or <CASENAME>UMAP_
Solutions:
# Ensure case name is alphanumeric only
[SeuratSubClustering.envs.cases]
MySubCluster1 = {subset = "..."} # Correct
Sub-Cluster = {subset = "..."} # Hyphen removed -> SubCluster
# Check the object for the actual reduction names
# Reductions are named <CASENAME>.pc and <CASENAME>.umap; <CASENAME>PC_ and <CASENAME>UMAP_ are only their key prefixes
Best Practices
- Define explicit subsets: Always specify `subset` or define `cases` to avoid running the default case on all cells
- Use descriptive case names: Make case names clear and unique (e.g., `TEffector`, not `case1`)
- Test multiple resolutions: Sweep a resolution range to find the optimal granularity for each subset
- Use Leiden algorithm: Prefer `algorithm = 4` for better community detection
- Leverage metadata columns: Use CellTypeAnnotation results, TCR clonality, or custom mutaters for subsetting
- Set random seeds: Ensure reproducible sub-clustering results with `random.seed`
- Parallelize large subsets: Use `ncores > 1` for subsets > 10k cells
- Adjust UMAP parameters: Smaller subsets may need different `n.neighbors` and `min.dist`
- Document sub-clustering strategy: Comment on the biological rationale for each case in the config
- Use multi-resolution: Test `[0.4, 0.6, 0.8, 1.0]` to capture different granularities
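Two of these practices (explicit subsets and clean case names) lend themselves to an automated check. The Python sketch below works on the `envs` table after the TOML has been parsed into a dictionary, for instance with `tomllib` on Python 3.11 or later. The function `lint_envs` and its messages are hypothetical, and it assumes that a case without its own `subset` falls back to the shared one.

```python
import re


def lint_envs(envs: dict) -> list:
    """Return human-readable warnings for a parsed SeuratSubClustering.envs table."""
    problems = []
    shared_subset = envs.get("subset")
    cases = envs.get("cases") or {}
    if not cases and not shared_subset:
        problems.append("default case: no subset, so all cells would be re-clustered")
    for name, case in cases.items():
        if not re.fullmatch(r"[A-Za-z0-9]+", name):
            problems.append(f"{name}: non-alphanumeric characters will be removed")
        # Assumption: a case inherits the shared subset when it defines none.
        if not (case.get("subset") or shared_subset):
            problems.append(f"{name}: no subset defined")
    return problems


envs = {
    "cases": {
        "T_Effector": {"subset": "state == 'Effector'"},
        "TNaive": {},
    }
}
for problem in lint_envs(envs):
    print(problem)
```

Running such a check before launching the pipeline is cheaper than discovering an unintended all-cells run afterwards.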
Related Processes
- SeuratClustering: Initial clustering before sub-clustering
- SeuratClusteringOfAllCells: Clustering before T/B cell selection
- CellTypeAnnotation: Annotate clusters before sub-clustering by cell type
- ClusterMarkers: Find markers for sub-clusters
- MarkersFinder: Flexible marker finding with multiple comparison groups