Identifies and visualizes the top expressing genes per cluster across ALL cells (before T/B cell selection), followed by pathway enrichment analysis. Provides initial overview of all cell populations by highlighting the most highly expressed genes and their biological functions.
Install
npx skillscat add pwwang/immunopipe/topexpressinggenesofallcells
Install via the SkillsCat registry.
TopExpressingGenesOfAllCells Process Configuration
Purpose
Identifies and visualizes the top expressing genes per cluster across ALL cells (before T/B cell selection), then runs pathway enrichment analysis on those genes. This gives an initial overview of all cell populations by highlighting each cluster's most highly expressed genes and their biological functions.
When to Use
- After: SeuratClusteringOfAllCells process
- Before: TOrBCellSelection (this is a pre-selection analysis)
- Use cases:
  - Quick overview of ALL cell populations before separation
  - Initial assessment of broad cell type signatures
  - Understanding overall cell composition before T/B selection
  - Pathway enrichment on cell type markers before detailed analysis
  - Quality check for unexpected cell types
  - Complementary to ClusterMarkersOfAllCells for complete pre-selection profiling
- Optional process: Enable only when pre-selection analysis is needed
Configuration Structure
Process Enablement
[TopExpressingGenesOfAllCells]
cache = true
Input Specification
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
Note: srtobj accepts the output from SeuratClusteringOfAllCells.
Environment Variables
Core Parameters
[TopExpressingGenesOfAllCells.envs]
# Number of top expressing genes to identify per cluster
n = 250
# Enrichment style
enrich_style = "enrichr" # Options: "enrichr", "clusterprofiler"
# Enrichment databases
dbs = ["KEGG_2021_Human", "MSigDB_Hallmark_2020"]
Enrichment Plot Settings
[TopExpressingGenesOfAllCells.envs.enrich_plots_defaults]
# Plot type for enrichment results
plot_type = "bar" # Options: "bar", "dot", "lollipop", "network", "enrichmap", "wordcloud"
# Device parameters
devpars = {res = 100, width = 800, height = 600}
# Additional output formats
more_formats = []
# Save R code to reproduce plots
save_code = false
# Top terms to display
top_term = 10 # Number of top enriched pathways to show
ncol = 1 # Number of columns in multi-panel plots
Cell Subsetting
[TopExpressingGenesOfAllCells.envs]
# Subset cells before analysis (optional)
subset = ""
Cache Control
[TopExpressingGenesOfAllCells.envs]
# Cache intermediate results
cache = "/tmp" # true, false, or directory path
Configuration Examples
Minimal Configuration
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
Top 10 Genes for Broad Cell Type ID
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 10
dbs = ["MSigDB_Hallmark_2020"]
Multiple Databases for Comprehensive Overview
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 100
dbs = [
"KEGG_2021_Human",
"MSigDB_Hallmark_2020",
"GO_Biological_Process_2025"
]
Common Patterns
Pattern 1: Quick All-Cell Overview (Pre-Selection)
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 10
dbs = ["MSigDB_Hallmark_2020"]
[TopExpressingGenesOfAllCells.envs.enrich_plots_defaults]
plot_type = "bar"
top_term = 10
What to expect: Top 10 genes per cluster showing broad cell type markers (CD3 for T cells, CD19 for B cells, CD14 for monocytes, etc.)
Pattern 2: Broad Cell Type Signature Identification
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 50
[TopExpressingGenesOfAllCells.envs.enrich_plots]
"T Cell Pathways" = {plot_type = "bar", dbs = ["KEGG_2021_Human"]}
"B Cell Pathways" = {plot_type = "bar", dbs = ["KEGG_2021_Human"]}
"Myeloid Pathways" = {plot_type = "bar", dbs = ["KEGG_2021_Human"]}
What to expect: Identification of T cell (CD3E, CD3D), B cell (CD19, MS4A1), and myeloid (CD14, LYZ) signatures across clusters
Pattern 3: Quality Check for Unexpected Cell Types
[TopExpressingGenesOfAllCells]
[TopExpressingGenesOfAllCells.in]
srtobj = ["SeuratClusteringOfAllCells"]
[TopExpressingGenesOfAllCells.envs]
n = 20
dbs = [
"GO_Biological_Process_2025",
"GO_Cellular_Component_2025"
]
[TopExpressingGenesOfAllCells.envs.enrich_plots_defaults]
plot_type = "dot"
top_term = 15
What to expect: Detection of contamination (e.g., EPCAM for epithelial, COL1A1 for fibroblasts, RBC markers)
Difference from TopExpressingGenes
TopExpressingGenesOfAllCells vs TopExpressingGenes:
| Aspect | TopExpressingGenesOfAllCells | TopExpressingGenes |
|---|---|---|
| When it runs | BEFORE TOrBCellSelection | AFTER TOrBCellSelection |
| Input data | All cells (unfiltered) | Only selected T or B cells |
| Upstream process | SeuratClusteringOfAllCells | SeuratClustering + TOrBCellSelection |
| Use case | Initial assessment, quality check | Detailed T/B cell analysis |
| Cell types | ALL cell types present | Only T OR B cells |
| Typical markers | CD3, CD19, CD14, etc. | Specific T/B cell subtypes |
| Position in workflow | Pre-selection overview | Post-selection deep dive |
Workflow context:
RNA Input → SeuratPreparing → SeuratClusteringOfAllCells
↓
TopExpressingGenesOfAllCells ← Runs here
↓
TOrBCellSelection (separates T/B)
↓
SeuratClustering (on selected cells)
↓
TopExpressingGenes ← Runs here
Recommendation:
- Use TopExpressingGenesOfAllCells to assess overall data quality and cell type composition
- Use TopExpressingGenes for detailed analysis of T or B cell subtypes
- Enable both for comprehensive analysis: pre-selection overview + post-selection deep dive
Dependencies
- Upstream: SeuratClusteringOfAllCells
- Downstream: TOrBCellSelection (optional - this process provides pre-selection context)
- Data: Seurat object with cluster assignments for ALL cells
Validation Rules
- n parameter: Must be positive integer (typically 10-500)
- dbs: Must be valid enrichit/Enrichr database names or local GMT file paths
- enrich_style: Must be "enrichr" or "clusterprofiler"
- plot_type: Must be valid scplotter plot type
- Workflow requirement: Only runs when SeuratClusteringOfAllCells is enabled
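The rules above can be sketched as a small pre-flight check. This is an illustration only: `validate_envs` is a hypothetical helper, not part of immunopipe, and the set of plot types is copied from the options listed in this document rather than queried from scplotter. Database names are only checked for shape, since the valid names depend on the enrichit installation.

```python
# Hypothetical helper (not an immunopipe API): checks an envs mapping
# against the validation rules listed in this document.
ENRICH_STYLES = {"enrichr", "clusterprofiler"}
PLOT_TYPES = {"bar", "dot", "lollipop", "network", "enrichmap", "wordcloud"}


def validate_envs(envs):
    """Return a list of problems found; an empty list means no rule was violated."""
    problems = []

    if "n" in envs:
        n = envs["n"]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(n, bool) or not isinstance(n, int) or n <= 0:
            problems.append("n must be a positive integer")

    if "dbs" in envs:
        dbs = envs["dbs"]
        if not isinstance(dbs, list) or not all(isinstance(d, str) and d for d in dbs):
            problems.append("dbs must be a list of non-empty strings")

    if "enrich_style" in envs and envs["enrich_style"] not in ENRICH_STYLES:
        problems.append("enrich_style must be 'enrichr' or 'clusterprofiler'")

    defaults = envs.get("enrich_plots_defaults", {})
    if "plot_type" in defaults and defaults["plot_type"] not in PLOT_TYPES:
        problems.append("plot_type is not one of the documented plot types")

    return problems


if __name__ == "__main__":
    example = {"n": 0, "enrich_style": "gsea", "dbs": ["KEGG_2021_Human"]}
    for problem in validate_envs(example):
        print(problem)
```

Only keys that are present are checked, so the sketch makes no claim about the pipeline's default values.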
Troubleshooting
Process Not Running
Issue: TopExpressingGenesOfAllCells not executed despite being in config
Causes:
- SeuratClusteringOfAllCells not enabled
- Missing dependency in workflow
- Process disabled via validation warning
Solutions:
- Ensure SeuratClusteringOfAllCells is enabled in config
- Check validation warnings: python -m immunopipe.validate_config config.toml
- Verify both processes in config:
[SeuratClusteringOfAllCells]
[TopExpressingGenesOfAllCells]
Mixed Cell Types in Results
Issue: Clusters show multiple cell type markers (CD3 + CD19)
Causes:
- Overlapping clusters (resolution too low)
- Doublets/multiplets not filtered
- Contamination in data
Solutions:
- Adjust clustering resolution in SeuratClusteringOfAllCells
- Filter doublets in SeuratPreparing step
- Use TOrBCellSelection after assessment to clean data
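Mixed signatures can be spotted quickly by checking a cluster's top genes against a few lineage markers. The sketch below uses only the markers named in this document (CD3E/CD3D, CD19/MS4A1, CD14/LYZ); the function names are hypothetical, and the input is assumed to be a plain list of gene symbols taken from a cluster's top-genes table.

```python
# Marker sets taken from the examples in this document; extend as needed.
LINEAGE_MARKERS = {
    "T cell": {"CD3E", "CD3D"},
    "B cell": {"CD19", "MS4A1"},
    "Myeloid": {"CD14", "LYZ"},
}


def lineages_present(top_genes):
    """Return the sorted names of lineages with at least one marker in top_genes."""
    genes = set(top_genes)
    return sorted(name for name, markers in LINEAGE_MARKERS.items() if genes & markers)


def is_mixed(top_genes):
    """A cluster is flagged as mixed when markers of more than one lineage appear."""
    return len(lineages_present(top_genes)) > 1


if __name__ == "__main__":
    cluster_top_genes = ["CD3E", "IL7R", "CD19", "ACTB"]
    print(lineages_present(cluster_top_genes), is_mixed(cluster_top_genes))
```

A flagged cluster is a prompt to inspect doublet filtering or clustering resolution, not proof of contamination.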
No Clear Cell Type Signatures
Issue: Top genes list lacks expected markers (CD3, CD19, CD14)
Causes:
- Data quality issues (low counts, high mitochondrial)
- Wrong organism (human vs mouse gene symbols)
- Incomplete clustering
Solutions:
- Check QC metrics in SeuratClusterStatsOfAllCells
- Verify organism (uppercase=human, titlecase=mouse)
- Review clustering results from SeuratClusteringOfAllCells
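The casing rule of thumb (uppercase for human, titlecase for mouse) can be checked on a gene list as follows. This is a rough heuristic sketch with a hypothetical function name and an arbitrary 80% threshold; some symbols do not follow the convention, so treat the result as a hint only.

```python
def guess_symbol_style(symbols, threshold=0.8):
    """Classify a list of gene symbols as 'human-like', 'mouse-like', 'mixed', or 'unknown'."""
    named = [s for s in symbols if any(c.isalpha() for c in s)]
    if not named:
        return "unknown"

    # All letters uppercase, e.g. CD3E, MS4A1.
    upper = sum(1 for s in named if s == s.upper())
    # First letter uppercase, remaining letters lowercase, e.g. Cd3e, Ms4a1.
    title = sum(
        1
        for s in named
        if s[0].isupper() and s[1:] == s[1:].lower() and s != s.upper()
    )

    if upper / len(named) >= threshold:
        return "human-like"
    if title / len(named) >= threshold:
        return "mouse-like"
    return "mixed"


if __name__ == "__main__":
    print(guess_symbol_style(["CD3E", "CD19", "MS4A1", "LYZ"]))
    print(guess_symbol_style(["Cd3e", "Cd19", "Ms4a1", "Lyz2"]))
```

If the detected style does not match the species of the chosen enrichment databases, that mismatch is a likely reason for missing markers or empty enrichment.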
Ribosomal/Mitochondrial Gene Dominance
Issue: Top genes list dominated by housekeeping genes (RPS, RPL, MT-)
Solutions:
- Increase n parameter to see beyond housekeeping genes
- Filter out ribosomal/mitochondrial genes in SeuratPreparing step
- Use ClusterMarkersOfAllCells for differential expression
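To judge how badly a top-genes list is dominated by housekeeping genes, the prefixes mentioned above (RPS, RPL, MT-) can be used as a simple filter. The sketch below is a post-hoc check on a list of human gene symbols, with hypothetical function names; it does not change what the pipeline computes.

```python
# Prefixes named in this document. Prefix matching is a heuristic and can
# over-match: for example, RPS6KA1 encodes a kinase, not a ribosomal protein.
HOUSEKEEPING_PREFIXES = ("RPS", "RPL", "MT-")


def split_housekeeping(genes):
    """Split genes into (housekeeping, informative) lists, preserving order."""
    housekeeping = [g for g in genes if g.upper().startswith(HOUSEKEEPING_PREFIXES)]
    informative = [g for g in genes if not g.upper().startswith(HOUSEKEEPING_PREFIXES)]
    return housekeeping, informative


def housekeeping_fraction(genes):
    """Fraction of the list matching a housekeeping prefix (0.0 for an empty list)."""
    if not genes:
        return 0.0
    return len(split_housekeeping(genes)[0]) / len(genes)


if __name__ == "__main__":
    top = ["RPS27", "RPL13", "MT-CO1", "CD3E"]
    print(split_housekeeping(top), housekeeping_fraction(top))
```

A high fraction suggests raising n or filtering these genes upstream, as listed in the solutions.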
Empty Enrichment Results
Issue: No pathways enriched despite top genes identified
Causes:
- Gene identifiers don't match database
- n too small for meaningful enrichment
- Database not appropriate for cell type
Solutions:
- Increase n to 100-500 genes
- Verify species match (check gene symbols)
- Try different databases (e.g., GO_Biological_Process_2025)
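When a local GMT file is used, an identifier mismatch can be diagnosed by measuring how many top genes appear anywhere in the file. The sketch below assumes the standard tab-separated GMT layout (set name, description, then gene symbols); the function names are hypothetical and matching is deliberately case-sensitive so that human/mouse casing mismatches show up.

```python
def parse_gmt(text):
    """Parse GMT text into {set_name: set_of_genes}; malformed lines are skipped."""
    gene_sets = {}
    for line in text.splitlines():
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # need a name, a description, and at least one gene
        gene_sets[fields[0]] = {g for g in fields[2:] if g}
    return gene_sets


def matched_fraction(top_genes, gene_sets):
    """Fraction of top_genes found in at least one gene set (case-sensitive)."""
    if not top_genes:
        return 0.0
    universe = set().union(*gene_sets.values())
    return sum(1 for g in top_genes if g in universe) / len(top_genes)


if __name__ == "__main__":
    gmt_text = "SET_A\tdemo\tCD3E\tCD3D\nSET_B\tdemo\tCD19\n"
    sets = parse_gmt(gmt_text)
    print(matched_fraction(["CD3E", "CD19", "Cd14", "FOO"], sets))
```

A fraction near zero points to an identifier or species mismatch rather than a lack of biological signal.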
Plot Rendering Errors
Issue: Enrichment plots fail to render
Causes:
- Network plots with too many terms
- Missing dependencies in R environment
Solutions:
- Reduce top_term parameter
- Use simpler plot types (bar, dot)
- Verify R packages installed: enrichit, scplotter
Output Structure
<srtobj_stem>.top_expressing_genes/
├── <cluster_name>/              # One subdirectory per cluster (ALL cells)
│   ├── top_genes.tsv            # Top N genes with expression metrics
│   └── enrich/                  # Enrichment results
│       ├── <db_name>/           # One subdirectory per database
│       │   ├── *.Bar-Plot.png   # Enrichment plots
│       │   ├── *.enrich.tsv     # Enrichment tables
│       │   └── ...
External References
Enrichment Databases (enrichit)
Built-in databases:
- KEGG_2021_Human - KEGG pathways (human)
- MSigDB_Hallmark_2020 - MSigDB Hallmark gene sets
- GO_Biological_Process_2025 - GO Biological Process terms
- GO_Cellular_Component_2025 - GO Cellular Component terms
- GO_Molecular_Function_2025 - GO Molecular Function terms
- Reactome_Pathways_2024 - Reactome pathways
- WikiPathways_2024_Human - WikiPathways (human)
Enrichr libraries: See https://maayanlab.cloud/Enrichr/#libraries
Enrichment Plot Types (scplotter)
- bar - Bar chart of enriched terms
- dot - Dot plot (bubble chart)
- lollipop - Lollipop plot
- network - Network visualization of term relationships
- enrichmap - Enrichment map (similar to Cytoscape)
- wordcloud - Word cloud visualization
Enrichment Styles
- enrichr - Fisher's exact test (Enrichr-style)
- clusterprofiler - Hypergeometric test (clusterProfiler-style)
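Both styles rest on the same over-representation idea: given n top genes drawn from a background of N genes, of which K belong to a pathway, how surprising is an overlap of at least k? The sketch below computes that hypergeometric upper tail, which equals the one-sided Fisher's exact test p-value for the corresponding 2x2 table. It is a worked illustration with a hypothetical function name, not a reproduction of how enrichit implements either style; choices such as the background size and multiple-testing correction are not covered here.

```python
from math import comb


def hypergeom_upper_tail(k, n, K, N):
    """P(overlap >= k) when n genes are drawn from N, K of which are in the gene set.

    Assumes 0 <= k, n <= N and K <= N.
    """
    total = comb(N, n)
    upper = min(n, K)
    # comb(a, b) returns 0 when b > a, so impossible overlaps contribute nothing.
    favorable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favorable / total


if __name__ == "__main__":
    # Toy numbers: background of 20 genes, pathway of 5, 4 top genes, 2 overlapping.
    print(hypergeom_upper_tail(k=2, n=4, K=5, N=20))
```

With these toy numbers the favorable count is 1050 + 150 + 5 = 1205 out of comb(20, 4) = 4845 possible draws, which is about 0.249.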
See Also
- TopExpressingGenes - Top genes for selected T/B cells after selection
- ClusterMarkersOfAllCells - Differential expression for all cells before selection
- SeuratClusteringOfAllCells - Clustering on all cells before T/B selection
- TOrBCellSelection - T/B cell separation process