screploading

Load scTCR-seq or scBCR-seq data from various formats into an scRepertoire-compatible object. This process reads VDJ (variable, diversity, joining) receptor contig data from multiple single-cell sequencing platforms and prepares it for integration with scRNA-seq data.

Install

npx skillscat add pwwang/immunopipe/screploading

Install via the SkillsCat registry.

ScRepLoading Process Configuration

Purpose

When to Use

You are analyzing scTCR-seq or scBCR-seq data alongside scRNA-seq
You need TCR/BCR clonotype analysis (CDR3 clustering, clone expansion, TESSA analysis)
You want to integrate immune receptor information with single-cell expression data
Your data comes from a supported platform: 10x Genomics, AIRR, BD, Dandelion, Immcantation, MiXCR, ParseBio, TRUST4, WAT3R, Omniscope

Important: This process is automatically enabled when your sample info file contains TCRData or BCRData columns.

Configuration Structure

Process Enablement

[ScRepLoading]
cache = true  # Enable caching (default: true)

Input Specification

[ScRepLoading.in]
# Type: file
# Required: yes
# Description: Sample metadata file (tab-delimited) with TCR/BCR data paths
metafile = "path/to/sample_info.txt"

Required input file columns:

Sample: Unique identifier for each sample (required)
TCRData (for TCR analysis): Directory path to scTCR-seq data
BCRData (for BCR analysis): Directory path to scBCR-seq data
Additional columns: Treated as sample metadata (optional)
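
The column requirements above can be checked before a run. The following is a minimal Python sketch, not part of the pipeline itself; the function name `check_sample_info` is hypothetical, and it assumes the file is tab-delimited with a header row as described.

```python
import csv
import io


def check_sample_info(text: str) -> list[str]:
    """Return a list of problems found in a tab-delimited sample info table."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    columns = reader.fieldnames or []
    rows = list(reader)
    problems = []
    if "Sample" not in columns:
        problems.append("missing required column: Sample")
    if "TCRData" not in columns and "BCRData" not in columns:
        problems.append("need a TCRData or BCRData column")
    if "Sample" in columns:
        samples = [row["Sample"] for row in rows]
        # Sample must uniquely identify each row
        if len(samples) != len(set(samples)):
            problems.append("Sample values are not unique")
    return problems


table = (
    "Sample\tRNAData\tTCRData\n"
    "C1\t/data/C1/rna\t/data/C1/tcr\n"
    "C2\t/data/C2/rna\t/data/C2/tcr\n"
)
print(check_sample_info(table))
```

An empty list means the table satisfies the three checks; it says nothing about whether the paths themselves exist.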

Data format requirements:

10x Genomics: Directory containing filtered_contig_annotations.csv or all_contig_annotations.csv
AIRR format: Directory containing airr_rearrangement.tsv
BD platform: Directory containing Contigs_AIRR.tsv
Dandelion: Directory containing all_contig_dandelion.tsv
Immcantation: Directory containing _data.tsv or similar
JSON: File with .json extension
MiXCR: Directory containing clones.tsv
ParseBio: Directory containing barcode_report.tsv
TRUST4: Directory containing barcode_report.tsv
WAT3R: Directory containing barcode_results.csv
Omniscope: Directory containing .csv files
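
To see why an explicit format is sometimes needed, the filename list above can be turned into a lookup. This Python sketch is only an illustration of the ambiguity, not how scRepertoire detects formats; `FORMAT_PATTERNS` and `candidate_formats` are hypothetical names, and the suffix-based formats (Immcantation, JSON, Omniscope) are left out because their patterns are too loose for exact matching.

```python
from pathlib import PurePath

# Exact filenames taken from the list above.
FORMAT_PATTERNS = {
    "10X": ("filtered_contig_annotations.csv", "all_contig_annotations.csv"),
    "AIRR": ("airr_rearrangement.tsv",),
    "BD": ("Contigs_AIRR.tsv",),
    "Dandelion": ("all_contig_dandelion.tsv",),
    "MiXCR": ("clones.tsv",),
    "ParseBio": ("barcode_report.tsv",),
    "TRUST4": ("barcode_report.tsv",),
    "WAT3R": ("barcode_results.csv",),
}


def candidate_formats(filename: str) -> list[str]:
    """Return every format whose expected filename matches exactly."""
    name = PurePath(filename).name
    return [fmt for fmt, names in FORMAT_PATTERNS.items() if name in names]


print(candidate_formats("/data/S1/vdj/filtered_contig_annotations.csv"))
print(candidate_formats("barcode_report.tsv"))
```

ParseBio and TRUST4 share the same filename, so a filename alone cannot distinguish them; that is a case where setting envs.format explicitly removes the guesswork.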

Path handling:

If TCRData/BCRData specifies a directory: The process passes it to scRepertoire::loadContigs() directly
If TCRData/BCRData specifies a file: The process creates a symbolic link to the file in a temporary directory (envs.tmpdir) and loads from there
If the filename is not recognized by scRepertoire: Set envs.format explicitly
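
The directory-versus-file handling can be pictured as follows. This is a Python sketch of the idea only (the process itself calls scRepertoire from R); `stage_contig_path` and the `screp_` prefix are hypothetical.

```python
import os
import tempfile
from pathlib import Path


def stage_contig_path(path: str, tmpdir: str) -> Path:
    """Return a directory to load from.

    A directory is returned unchanged; a single file is exposed through a
    symbolic link inside a fresh subdirectory of `tmpdir`.
    """
    source = Path(path).resolve()
    if source.is_dir():
        return source
    staging = Path(tempfile.mkdtemp(prefix="screp_", dir=tmpdir))
    os.symlink(source, staging / source.name)
    return staging
```

Linking rather than copying keeps the original filename, which matters because format guessing relies on it.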

Environment Variables

[ScRepLoading.envs]
# type: choice - Data type to load (default: "auto")
# Options:
#   "TCR" - T cell receptor data
#   "BCR" - B cell receptor data
#   "auto" - Auto-detect from column names in sample info
# Note: If both TCRData and BCRData present, TCR selected by default
type = "auto"

# format: choice - Format of TCR/BCR data files (optional)
# Options: auto, 10X, AIRR, BD, Dandelion, Immcantation,
#          JSON, MiXCR, Omniscope, ParseBio, TRUST4, WAT3R
# If not provided, scRepertoire guesses from filename
format = "auto"

# combineTCR: json - Extra arguments for scRepertoire::combineTCR()
# See: https://rdrr.io/github/ncborcherding/scRepertoire/man/combinetcr
# Written as a TOML inline table (key = value), not JSON
combineTCR = { samples = true }

# combineBCR: json - Extra arguments for scRepertoire::combineBCR()
# See: https://rdrr.io/github/ncborcherding/scRepertoire/man/combinebcr
combineBCR = { samples = true }

# exclude: auto or list - Columns to exclude from metadata (default: auto)
# auto = ["BCRData", "TCRData", "RNAData"]
# Can also be comma-separated string: "BCRData,TCRData,RNAData"
exclude = "auto"

# tmpdir: str - Temporary directory for symbolic links (default: "/tmp")
tmpdir = "/tmp"
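
The "auto" rule for type can be summarized in a few lines. This Python sketch only restates the documented behavior (TCR wins when both columns are present); `resolve_type` is a hypothetical helper, not a pipeline function.

```python
def resolve_type(columns, requested="auto"):
    """Pick "TCR" or "BCR" from the sample info columns."""
    if requested != "auto":
        return requested
    if "TCRData" in columns:
        return "TCR"  # TCR is selected when both columns are present
    if "BCRData" in columns:
        return "BCR"
    raise ValueError("sample info has neither a TCRData nor a BCRData column")


print(resolve_type(["Sample", "RNAData", "TCRData", "BCRData"]))
print(resolve_type(["Sample", "RNAData", "TCRData", "BCRData"], requested="BCR"))
```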

Detailed combineTCR Parameters

[ScRepLoading.envs.combineTCR]
# samples: bool or list - Sample labels (default: true)
# true = use Sample column from metadata
# false = no sample grouping
# list = explicit sample labels
samples = true

# ID: str - Additional sample labeling (optional, unset by default)
# Adds prefix to barcodes to prevent duplicate issues
# TOML has no null value; omit the key to leave it unset
# ID = "Timepoint"

# removeNA: bool - Remove cells with missing chain values (default: false)
# true = filter out cells with NA in any chain
# false = include cells with 1 NA value (default)
removeNA = false

# removeMulti: bool - Remove cells with >2 chains (default: false)
# true = filter out multi-chain cells (>2 chains)
# false = include multi-chain cells (default)
removeMulti = false

# filterMulti: bool - Select highest-expression chain for multi-chain (TCR default: false)
# true = keep highest UMI count chain if multiple chains present
# false = keep all chains (default)
filterMulti = false

# filterNonproductive: bool - Remove non-productive rearrangements (default: true)
# true = filter out non-functional receptors
# false = include all rearrangements
filterNonproductive = true
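
The effect of removeNA and removeMulti can be illustrated on toy data. This Python sketch is a simplification under stated assumptions: it only considers TRA/TRB pairing, represents each cell as a list of chain names, and uses the hypothetical name `filter_cells`; the real filtering happens inside scRepertoire::combineTCR().

```python
def filter_cells(cells, remove_na=False, remove_multi=False):
    """cells maps barcode -> list of chain names, e.g. ["TRA", "TRB"]."""
    kept = {}
    for barcode, chains in cells.items():
        if remove_multi and len(chains) > 2:
            continue  # removeMulti: drop barcodes with more than 2 chains
        if remove_na and not {"TRA", "TRB"} <= set(chains):
            continue  # removeNA: drop barcodes missing one of the paired chains
        kept[barcode] = chains
    return kept


cells = {
    "AAAC": ["TRA", "TRB"],         # complete pair
    "CCCG": ["TRB"],                # single chain
    "GGGT": ["TRA", "TRB", "TRB"],  # multi-chain
}
print(sorted(filter_cells(cells, remove_na=True)))
print(sorted(filter_cells(cells, remove_multi=True)))
```

With both flags on, only the complete pair survives, which is why strict settings can shrink a dataset quickly.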

Detailed combineBCR Parameters

[ScRepLoading.envs.combineBCR]
# samples: bool or list - Sample labels (default: true)
samples = true

# ID: str - Additional sample labeling (optional, unset by default)
# TOML has no null value; omit the key to leave it unset
# ID = "Timepoint"

# call.related.clones: bool - Cluster related BCR clones (default: true)
# Uses nucleotide sequence + V gene with Levenshtein distance
# false = uses V gene + amino acid sequence for CTstrict
call.related.clones = true

# threshold: num - Normalized edit distance for clustering (default: 0.85)
# Higher = more stringent clustering (sequences must be more similar to be grouped)
# Range: 0.0 - 1.0
threshold = 0.85

# removeNA: bool - Remove cells with missing chain values (default: false)
removeNA = false

# removeMulti: bool - Remove cells with >2 chains (default: false)
removeMulti = false

# filterMulti: bool - Select highest-expression chain (default: true)
# true = keep highest UMI count chain
# false = keep all chains
filterMulti = true

# filterNonproductive: bool - Remove non-productive rearrangements (default: true)
filterNonproductive = true
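
To build intuition for threshold, here is a Python sketch of a length-normalized Levenshtein similarity. It assumes similarity is computed as 1 minus the edit distance divided by the longer sequence length; the exact normalization used by scRepertoire may differ, and the names `levenshtein`, `similarity`, and `related` are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical sequences, approaching 0.0 as they diverge."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


def related(a: str, b: str, threshold: float = 0.85) -> bool:
    """Two sequences are grouped when their similarity reaches the threshold."""
    return similarity(a, b) >= threshold


seq_a = "TGTGCCAGCAGTTTAGGG"
seq_b = "TGTGCCAGCAGTTTAGGC"  # one substitution relative to seq_a
print(related(seq_a, seq_b, threshold=0.85))
print(related(seq_a, seq_b, threshold=0.95))
```

Under this assumption, one mismatch in 18 nucleotides passes at 0.85 but not at 0.95, so raising the threshold groups fewer sequences.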

Configuration Examples

Minimal Configuration (10x TCR Data)

[SampleInfo.in]
infile = "sample_info.txt"

# Sample info file contents:
# Sample  Age  Sex  Diagnosis  RNAData            TCRData
# C1      62   F    Colitis    /data/C1/rna      /data/C1/tcr
# C2      71   F    Colitis    /data/C2/rna      /data/C2/tcr

# ScRepLoading auto-enables when TCRData column present
# No explicit ScRepLoading section needed

Single Sample with Format Specification

[ScRepLoading]
cache = true

[ScRepLoading.in]
metafile = "metadata/single_sample.txt"

[ScRepLoading.envs]
type = "TCR"
format = "10X"

[ScRepLoading.envs.combineTCR]
removeNA = true
filterNonproductive = true

Multi-Sample BCR Analysis with Clustering

[ScRepLoading]
cache = true

[ScRepLoading.in]
metafile = "metadata/bcr_samples.txt"

[ScRepLoading.envs]
type = "BCR"

[ScRepLoading.envs.combineBCR]
call.related.clones = true
threshold = 0.85  # Default; raise for more stringent clustering
filterMulti = true
removeMulti = false

Non-10x Format (AIRR)

[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/airr_samples.txt"

[ScRepLoading.envs]
format = "AIRR"
type = "auto"

[ScRepLoading.envs.combineTCR]
removeNA = false
removeMulti = false

TRUST4 Format

[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/trust4_samples.txt"

[ScRepLoading.envs]
format = "TRUST4"

[ScRepLoading.envs.combineTCR]
removeNA = true
filterNonproductive = true

Common Patterns

Pattern 1: 10x Genomics TCR Data (Most Common)

# sample_info.txt
# Sample    RNAData              TCRData
# Sample1   /data/Sample1/rna   /data/Sample1/vdj
# Sample2   /data/Sample2/rna   /data/Sample2/vdj

[SampleInfo.in]
infile = "sample_info.txt"

# TCR directories must contain filtered_contig_annotations.csv
# No ScRepLoading configuration needed - auto-detected

Pattern 2: Both TCR and BCR Data (Auto-Detect TCR)

# sample_info.txt
# Sample    RNAData              TCRData              BCRData
# Sample1   /data/Sample1/rna   /data/Sample1/tcr   /data/Sample1/bcr

[SampleInfo.in]
infile = "sample_info.txt"

# TCR selected by default when both columns present
# To explicitly analyze BCR instead:
[ScRepLoading.envs]
type = "BCR"

Pattern 3: Filtered TCR Data (Remove NA and Multi-Chain)

[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/tcr_filtered.txt"

[ScRepLoading.envs.combineTCR]
removeNA = true      # Remove cells with missing chains
removeMulti = true   # Remove cells with >2 chains
filterNonproductive = true  # Remove non-functional receptors

Pattern 4: Relaxed Filtering for Exploratory Analysis

[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/tcr_exploratory.txt"

[ScRepLoading.envs.combineTCR]
removeNA = false     # Keep cells with single chain
removeMulti = false  # Include multi-chain cells for inspection
filterNonproductive = false  # Include non-productive rearrangements

Pattern 5: BCR Clone Clustering with Custom Threshold

[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/bcr_clustering.txt"

[ScRepLoading.envs.combineBCR]
call.related.clones = true
threshold = 0.90  # More stringent clustering (lower = more permissive)

Pattern 6: Sample-Specific Labeling

[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/longitudinal.txt"

[ScRepLoading.envs.combineTCR]
samples = true  # Use Sample column from metadata
ID = "Timepoint"  # Add Timepoint as additional label prefix

# Creates barcodes like: "Sample1_Timepoint1_AAACCC..."
# Prevents duplicate barcode issues across timepoints
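
The barcode labeling described in the comments can be sketched in Python. This assumes labels are joined with underscores in the order sample, ID, barcode, as in the example barcode above; `label_barcode` is a hypothetical helper.

```python
def label_barcode(barcode, sample, id_label=None):
    """Prefix a barcode with its sample label and an optional extra label."""
    parts = [sample] + ([id_label] if id_label else []) + [barcode]
    return "_".join(parts)


# The same raw barcode seen in two samples no longer collides once labeled.
raw = "AAACCC"
labeled = {label_barcode(raw, s, "Timepoint1") for s in ("Sample1", "Sample2")}
print(sorted(labeled))
```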

Pattern 7: Custom Metadata Exclusion

[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/custom_columns.txt"

[ScRepLoading.envs]
exclude = ["RNAData", "TCRData", "BCRData", "ExperimentID", "Batch"]

# These columns excluded from scRepertoire object metadata
# Helps reduce metadata clutter in downstream analysis

Pattern 8: Paired Chain Analysis (TRA+TRB for TCR)

# Default behavior - ScRepLoading automatically pairs chains
# at cell barcode level when both TRA and TRB present

[ScRepLoading]

[ScRepLoading.in]
metafile = "metadata/tcr_paired.txt"

[ScRepLoading.envs.combineTCR]
removeNA = false  # Keep single-chain cells for inspection
filterMulti = false  # Don't filter multi-chain cells

# Later analysis can filter for true paired chains
# Using downstream processes like CDR3Clustering

Dependencies

Upstream Processes

SampleInfo (required): Provides sample metadata with TCRData/BCRData columns
LoadingRNAFromSeurat (alternative): When loading RNA from Seurat instead of SampleInfo

Downstream Processes

ScRepCombiningExpression: Integrates TCR/BCR data with scRNA-seq expression
CDR3Clustering: Clusters cells by CDR3 sequence similarity
TESSA: TCR-specific analysis (integrates TCR sequences with T-cell transcriptomes)
CDR3AAPhyschem: Physicochemical properties of CDR3 sequences
ClonalStats: Clonality statistics and diversity metrics

Validation Rules

Common Configuration Errors

Missing TCRData/BCRData column:
- Error: Process not enabled, no TCR/BCR analysis
- Fix: Add TCRData or BCRData column to sample info file
Invalid format specified:
- Error: scRepertoire fails to recognize file format
- Fix: Set envs.format to one of: 10X, AIRR, BD, Dandelion, Immcantation, JSON, MiXCR, ParseBio, TRUST4, WAT3R, Omniscope
Directory path not found:
- Error: Cannot access TCR/BCR data directory
- Fix: Verify paths in TCRData/BCRData columns exist and are readable
Missing required files in directory:
- Error: Expected contig file not found (e.g., filtered_contig_annotations.csv)
- Fix: Ensure directory contains appropriate file for specified format
Both TCR and BCR specified without type selection:
- Warning: TCR selected by default
- Fix: Set envs.type = "BCR" if BCR analysis intended

File Format Requirements

10x Genomics: Must have filtered_contig_annotations.csv or all_contig_annotations.csv in directory
AIRR: Must have airr_rearrangement.tsv in directory
BD: Must have Contigs_AIRR.tsv in directory
Dandelion: Must have all_contig_dandelion.tsv in directory
MiXCR: Must have clones.tsv in directory
TRUST4: Must have barcode_report.tsv in directory
ParseBio: Must have barcode_report.tsv in directory
WAT3R: Must have barcode_results.csv in directory

Chain Compatibility

TCR chains: Supports TRA, TRB, TRG, TRD (auto-detected from data)
BCR chains: Supports IGH, IGL, IGK (auto-detected from data)
Paired analysis: Automatically pairs TRA+TRB or IGH+IGL/IGK when both present
Single-chain: Keeps single-chain cells when removeNA = false

Troubleshooting

Issue: ScRepLoading not running

Cause: No TCRData or BCRData column in sample info file
Solution:

Add TCRData or BCRData column to sample info
Verify column name exactly matches (case-sensitive)
Check that SampleInfo.in.infile is correctly specified

Issue: "File format not recognized"

Cause: Filename doesn't match expected pattern for auto-detection
Solution:

Set envs.format explicitly to your format type
Example: format = "TRUST4" for TRUST4 output
Verify directory contains expected file for that format

Issue: "No cells loaded" or empty output

Cause: Overly aggressive filtering or mismatched barcodes
Solution:

Set removeNA = false and removeMulti = false temporarily
Check that TCR/BCR barcodes match RNA barcodes
Verify filterMulti is appropriate for your data type

Issue: Duplicate barcode errors

Cause: Multiple samples have identical cell barcodes
Solution:

Set ID = "Sample" or use explicit sample labels
This adds sample prefix to barcodes: Sample1_AAACCC...
Required when merging samples from same run

Issue: BCR clustering too strict/too permissive

Cause: Default threshold (0.85) not optimal for data
Solution:

Adjust envs.combineBCR.threshold
Higher (0.90 or above): More stringent, fewer sequences grouped together
Lower (0.80 or below): More permissive, more sequences clustered together

Issue: Single-chain cells lost

Cause: filterNonproductive = true or removeNA = true
Solution:

For exploratory analysis, set removeNA = false
For developmental studies, consider filterNonproductive = false
Use filterMulti = true only when confident in data quality

Issue: Metadata columns missing from output

Cause: Excluded by default (exclude = "auto")
Solution:

Set exclude = [] to keep all metadata columns
Or specify custom list: exclude = ["RNAData"]
Default excludes: RNAData, TCRData, BCRData
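
How the exclude setting maps to kept columns can be sketched as follows. This Python example only mirrors the three documented forms ("auto", a comma-separated string, or a list); `resolve_exclude` and `kept_columns` are hypothetical names, not pipeline functions.

```python
DEFAULT_EXCLUDE = ["BCRData", "TCRData", "RNAData"]


def resolve_exclude(exclude):
    """Normalize the exclude setting to a list of column names."""
    if exclude == "auto":
        return list(DEFAULT_EXCLUDE)
    if isinstance(exclude, str):
        return [name.strip() for name in exclude.split(",") if name.strip()]
    return list(exclude)


def kept_columns(columns, exclude="auto"):
    """Return the metadata columns that survive exclusion, in original order."""
    dropped = set(resolve_exclude(exclude))
    return [c for c in columns if c not in dropped]


columns = ["Sample", "Age", "RNAData", "TCRData"]
print(kept_columns(columns))
print(kept_columns(columns, exclude=[]))
```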

Issue: Cannot load from specific directory path

Cause: Path not accessible or permission issues
Solution:

Verify directory exists and is readable
Check file permissions: ls -la path/to/tcr/
Use absolute paths if relative paths fail

Issue: Combining TCR and BCR data separately

Cause: Need to analyze both receptor types
Solution:

Run pipeline twice with different type settings
First run: [ScRepLoading.envs] type = "TCR"
Second run: [ScRepLoading.envs] type = "BCR"
Use different output directories to avoid conflicts

Issue: Integration with ScRepCombiningExpression fails

Cause: Barcodes don't match between RNA and VDJ data
Solution:

Ensure same samples used in both RNA and VDJ data
Check that SampleInfo has correct paths for both data types
Verify barcode prefixes match (if using ID parameter)
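
A quick way to diagnose mismatched barcodes is to measure how many VDJ barcodes also appear in the RNA data. This is a minimal Python sketch with a hypothetical helper `barcode_overlap`; it assumes you have already exported both barcode lists, and the barcodes shown are made up for illustration.

```python
def barcode_overlap(rna_barcodes, vdj_barcodes):
    """Fraction of VDJ barcodes that are also present among the RNA barcodes."""
    rna, vdj = set(rna_barcodes), set(vdj_barcodes)
    return len(rna & vdj) / len(vdj) if vdj else 0.0


rna = ["Sample1_AAAC-1", "Sample1_CCCG-1", "Sample1_GGGT-1"]
print(barcode_overlap(rna, ["Sample1_AAAC-1", "Sample1_CCCG-1"]))
print(barcode_overlap(rna, ["AAAC-1", "CCCG-1"]))  # prefixes missing on the VDJ side
```

An overlap near zero with otherwise plausible data usually points to a prefix or suffix difference rather than to different cells.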
