Install
Install via the SkillsCat registry:
npx skillscat add pwwang/immunopipe/cdr3aaphyschem
CDR3AAPhyschem Process Configuration
Purpose
Analyzes physicochemical properties of CDR3 amino acid sequences to characterize the biochemistry of T-cell receptor repertoires. For each physicochemical feature (hydrophobicity, volume, isoelectric point, etc.), it runs a regression between two cell groups at each CDR3 length.
When to Use
- To analyze differences in CDR3 biochemical properties between cell groups (e.g., Treg vs Tconv)
- For feature engineering in TCR machine learning models
- To identify sequence features that distinguish cell subsets
- After ScRepCombiningExpression (requires combined TCR + RNA data)
- When investigating T cell fate determination (regulatory vs conventional T cells)
Configuration Structure
Process Enablement
[CDR3AAPhyschem]
cache = true
Input Specification
[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]
- scrfile: Output from ScRepCombiningExpression (RDS or qs/qs2 format)
  - Must contain both TRA and TRB chains
  - Generated by scRepertoire::combineExpression()
Environment Variables
[CDR3AAPhyschem.envs]
# Group comparison specification
group = "CellType"
comparison = {Treg = ["CD4 CTL", "CD4 Naive", "CD4 TCM", "CD4 TEM"], Tconv = "Tconv"}
target = "Treg"
each = "Sample"
# Chain selection
chain = "TRB"
Key Parameters:
- group: Column name in the metadata that defines the groups to compare (e.g., CellType, seurat_clusters)
- comparison: Two-group specification for the regression analysis
  - Format 1 (dict): {Group1 = ["cell1", "cell2"], Group2 = "cell3"}
  - Format 2 (list): ["Group1", "Group2"] (when both names already exist as values in the group column)
- target: Which group to label as 1 in the regression (default: the first group in comparison)
- each: Column(s) used to split the data into separate analyses
  - Single column: "Sample"
  - Multiple columns: ["Sample", "Patient"]
  - Comma-separated: "Sample,Patient"
  - If not provided, all cells are analyzed together
- chain: TCR chain whose CDR3 sequences are analyzed ("TRA" or "TRB")
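The parameter forms listed above can be normalized with a few simple rules. The sketch below is a hypothetical Python helper (normalize_comparison, resolve_target and parse_each are illustrative names, not part of immunopipe) that applies the documented semantics: a dict maps each group name to one or more labels, a list uses the labels themselves as group names, target defaults to the first group, and each accepts a string, a comma-separated string, or a list.

```python
from typing import Dict, List, Optional, Union

# Hypothetical helpers mirroring the documented semantics of `comparison`,
# `target` and `each`; they are not taken from the immunopipe source.
Comparison = Union[Dict[str, Union[str, List[str]]], List[str]]


def normalize_comparison(comparison: Comparison) -> Dict[str, List[str]]:
    """Return {group_name: [labels in the `group` column]} for exactly two groups."""
    if isinstance(comparison, dict):
        # Format 1: a single label may be given as a plain string
        groups = {
            name: [labels] if isinstance(labels, str) else list(labels)
            for name, labels in comparison.items()
        }
    else:
        # Format 2: each name is itself a value in the `group` column
        groups = {name: [name] for name in comparison}
    if len(groups) != 2:
        raise ValueError(f"comparison must define exactly two groups, got {len(groups)}")
    return groups


def resolve_target(groups: Dict[str, List[str]], target: Optional[str] = None) -> str:
    """Use the first group (insertion order) when `target` is not given."""
    if target is None:
        return next(iter(groups))
    if target not in groups:
        raise ValueError(f"target {target!r} is not one of {list(groups)}")
    return target


def parse_each(each: Union[None, str, List[str]]) -> List[str]:
    """Accept None/"", "Sample", "Sample,Patient" or ["Sample", "Patient"]."""
    if not each:
        return []  # no split: all cells are analyzed together
    if isinstance(each, str):
        return [col.strip() for col in each.split(",") if col.strip()]
    return list(each)
```

Rejecting anything other than two groups up front matches the two-group description of comparison and surfaces configuration mistakes before any regression runs.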
Configuration Examples
Minimal Configuration
[CDR3AAPhyschem]
[CDR3AAPhyschem.in]
scrfile = ["ScRepCombiningExpression"]
Standard Treg vs Tconv Analysis
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Define cell type groups for comparison
group = "CellType"
comparison = {Treg = ["Treg"], Tconv = ["Tconv"]}
target = "Treg"
chain = "TRB"
Multi-Sample Analysis
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
# Run regression separately for each sample
each = "Sample"
chain = "TRB"
Custom Group Definition
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "Cluster"
# Define clusters to compare
comparison = {HighQuality = ["c1", "c2", "c5"], LowQuality = ["c3", "c4"]}
target = "HighQuality"
chain = "TRB"
Physicochemical Properties
Available Properties
The process calculates 8 key physicochemical properties from CDR3 amino acid sequences:
| Property | Description | Biological Significance |
|---|---|---|
| length | Total amino acid count in CDR3 | Influences binding loop size and flexibility |
| gravy | Grand Average of Hydrophobicity (Kyte-Doolittle scale) | Hydrophobic CDR3s associate with self-reactivity and Treg fate |
| bulkiness | Average bulkiness (Zimmerman scale) | Measures steric bulk of amino acids |
| polarity | Average polarity (Grantham scale) | Influences interactions with peptide-MHC |
| aliphatic | Normalized aliphatic index (Ikai scale) | Related to thermal stability |
| charge | Normalized net charge at physiological pH | Affects electrostatic interactions |
| acidic | Acidic side chain residue content (D, E proportion) | Contributes to negative charge |
| aromatic | Aromatic side chain content (F, W, Y proportion) | Important for π-π interactions |
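The charge row can be made concrete with the Henderson-Hasselbalch equation. This sketch makes several assumptions that the table does not confirm: it uses the EMBOSS pKa set, treats the CDR3 as a free peptide with both termini, takes "physiological pH" as 7.4, and reads "normalized" as dividing by sequence length. The process's own R implementation may differ on any of these points.

```python
# EMBOSS pKa values; N_term/C_term are the free peptide termini (assumption:
# the CDR3 is treated as an isolated peptide).
EMBOSS_PKA = {
    "N_term": 8.6, "C_term": 3.6,
    "K": 10.8, "R": 12.5, "H": 6.5,           # positive when protonated
    "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1,  # negative when deprotonated
}
POSITIVE = ("K", "R", "H")
NEGATIVE = ("D", "E", "C", "Y")


def net_charge(seq: str, ph: float = 7.4) -> float:
    """Net charge of `seq` at `ph` (Henderson-Hasselbalch)."""
    seq = seq.upper()
    # Fraction protonated for the N-terminus and basic side chains
    pos = 1.0 / (1.0 + 10 ** (ph - EMBOSS_PKA["N_term"]))
    pos += sum(1.0 / (1.0 + 10 ** (ph - EMBOSS_PKA[aa])) for aa in seq if aa in POSITIVE)
    # Fraction deprotonated for the C-terminus and acidic side chains
    neg = 1.0 / (1.0 + 10 ** (EMBOSS_PKA["C_term"] - ph))
    neg += sum(1.0 / (1.0 + 10 ** (EMBOSS_PKA[aa] - ph)) for aa in seq if aa in NEGATIVE)
    return pos - neg


def normalized_charge(seq: str, ph: float = 7.4) -> float:
    """Assumed normalization: net charge per residue."""
    return net_charge(seq, ph) / len(seq)
```

Dividing by length keeps the value comparable across CDR3 lengths, which matters when the same feature is regressed separately at each length.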
Property Calculation Methods
- Default scales: Standard biophysical scales from peer-reviewed literature
- GRAVY: Kyte & Doolittle (1982) hydropathy scale
- Bulkiness: Zimmerman et al. (1968) bulkiness parameters
- Polarity: Grantham (1974) amino acid difference index
- Aliphatic index: Ikai (1980) aliphatic index (relative volume occupied by A, V, I and L side chains)
- Charge: Normalized based on pKa values (EMBOSS database)
- Acidic/Basic/Aromatic: Direct residue counting proportions
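The scale- and count-based methods above can be sketched in a few lines. This uses the published Kyte-Doolittle hydropathy values and Ikai's original aliphatic-index formula (mole fractions, coefficients 2.9 for V and 3.9 for I and L). How the process normalizes the aliphatic index is not stated, so the function returns the classic, unnormalized value. Sequences are assumed to be uppercase and to contain only the 20 standard residues.

```python
from collections import Counter

# Kyte & Doolittle (1982) hydropathy values
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(seq: str) -> float:
    """Mean Kyte-Doolittle hydropathy over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aliphatic_index(seq: str) -> float:
    """Ikai (1980): 100 * (xA + 2.9*xV + 3.9*(xI + xL)), x = mole fraction."""
    counts = Counter(seq)
    x = {aa: counts[aa] / len(seq) for aa in "AVIL"}
    return 100 * (x["A"] + 2.9 * x["V"] + 3.9 * (x["I"] + x["L"]))


def proportion(seq: str, residues: str) -> float:
    """Fraction of positions occupied by any residue in `residues`."""
    return sum(aa in residues for aa in seq) / len(seq)


def acidic(seq: str) -> float:
    return proportion(seq, "DE")


def aromatic(seq: str) -> float:
    return proportion(seq, "FWY")
```

For example, gravy("CASS") averages 2.5, 1.8, -0.8 and -0.8, giving 0.675.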
Regression Analysis
- Performed for each physicochemical property independently
- Performed separately at each CDR3 length, so sequences are compared only with sequences of the same length
- Binary classification: target group (1) vs non-target (0)
- Output: Statistical significance of property differences
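The regression steps above are not specified in more detail, so the following is only a sketch under stated assumptions: a one-feature logistic regression (target group = 1, other group = 0) fitted separately for each CDR3 length by Newton-Raphson, skipping lengths where either group has fewer than a hypothetical min_cells cells. slopes_by_length and fit_logistic_1d are illustrative names, not the process's API, and the real implementation may use a different model or significance test.

```python
import math
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic_1d(xs: Sequence[float], ys: Sequence[int],
                    iters: int = 100, tol: float = 1e-10) -> Tuple[float, float]:
    """Maximum-likelihood fit of P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = _sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p          # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w             # entries of the negated Hessian
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det <= 0:
            break                # degenerate, e.g. every x is identical
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


def slopes_by_length(records: Sequence[Tuple[str, bool]],
                     feature: Callable[[str], float],
                     min_cells: int = 5) -> Dict[int, float]:
    """records: (cdr3, is_target) for cells of the two compared groups only."""
    by_length: Dict[int, Tuple[List[float], List[int]]] = defaultdict(lambda: ([], []))
    for cdr3, is_target in records:
        xs, ys = by_length[len(cdr3)]
        xs.append(feature(cdr3))
        ys.append(1 if is_target else 0)  # target group = 1, other group = 0
    slopes: Dict[int, float] = {}
    for length, (xs, ys) in sorted(by_length.items()):
        n_target = sum(ys)
        if n_target < min_cells or len(ys) - n_target < min_cells:
            continue                      # too few cells on one side at this length
        slopes[length] = fit_logistic_1d(xs, ys)[1]
    return slopes
```

A positive slope at a given length means higher feature values go with the target group; passing a GRAVY function as feature would test the hydrophobic-Treg pattern from the literature cited in this document.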
Common Patterns
Pattern 1: Treg vs Tconv (TRB Chain)
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Literature-based: hydrophobic CDR3β promotes Treg fate
group = "CellType"
comparison = {Treg = ["Treg", "CD4+Treg"], Tconv = ["Tconv", "CD4+Tconv"]}
target = "Treg"
chain = "TRB"
each = "" # Analyze all samples together
Pattern 2: Selected Properties Only
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# This config does not restrict which properties are computed;
# focus the interpretation on gravy (hydrophobicity), the key Treg feature
group = "CellType"
comparison = ["Treg", "Tconv"]
target = "Treg"
chain = "TRB"
Pattern 3: Multi-Chain Analysis
Run separate processes for different chains:
# TRB analysis
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
chain = "TRB"
group = "CellType"
comparison = ["Treg", "Tconv"]
# Note: Create a separate config for TRA analysis if needed
Pattern 4: Multi-Group Comparisons
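[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
# comparison accepts exactly two groups; pool several subtypes on each side
comparison = {Naive = ["CD4 Naive", "CD8 Naive"], Memory = ["CD4 TEM", "CD4 TCM", "CD8 TEM", "CD8 TCM"]}
target = "Naive"
chain = "TRB"
# For Naive vs Effector (CD4 CTL, CD8 CTL), set up a separate two-group comparison
Dependencies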
- Upstream: ScRepCombiningExpression (required)
- Downstream: Feature analysis, ML model training, publication figures
- Required data: Both TRA and TRB chains in combined object
Validation Rules
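- CDR3 sequence requirements: Must be valid amino acid sequences containing only the 20 standard residue letters (no ambiguous X or stop-codon * characters; N is asparagine and is valid)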
- Chain requirement: Data must contain specified chain (TRA or TRB)
- Group specification: Groups must exist in metadata
- Minimum cells: Sufficient cells per group for statistical regression
- Length distribution: CDR3 length range must be adequate for regression
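The rules above can be checked before launching the pipeline. This hypothetical validator assumes each cell is a dict holding its CDR3 amino acid string under the chain name (e.g. "TRB") plus metadata columns; the min_cells default of 10 is an arbitrary placeholder, since no threshold is given here. Only cells whose CDR3 passes the residue check count toward a group, and the messages reuse the issue names from Troubleshooting.

```python
from typing import Dict, List, Sequence

STANDARD_AA = set("ACDEFGHIKLMNPQRSTVWY")


def validate_inputs(cells: Sequence[dict], chain: str, group_col: str,
                    groups: Dict[str, List[str]], min_cells: int = 10) -> List[str]:
    """Return a list of problems; an empty list means the checks passed."""
    problems: List[str] = []
    if not any(cell.get(chain) for cell in cells):
        problems.append(f"Missing chain in data: {chain}")
    if not any(group_col in cell for cell in cells):
        problems.append(f"Group not found in metadata: {group_col}")
        return problems
    present = {cell.get(group_col) for cell in cells}
    for name, labels in groups.items():
        missing = [label for label in labels if label not in present]
        if missing:
            problems.append(f"Group not found in metadata: {name} -> {missing}")
    label_to_group = {label: name for name, labels in groups.items() for label in labels}
    counts = {name: 0 for name in groups}
    for cell in cells:
        cdr3 = cell.get(chain)
        name = label_to_group.get(cell.get(group_col))
        # count only cells with a CDR3 made of the 20 standard residues
        if name and cdr3 and set(cdr3) <= STANDARD_AA:
            counts[name] += 1
    for name, n in counts.items():
        if n < min_cells:
            problems.append(f"Insufficient cells for regression: {name} has {n} < {min_cells}")
    return problems
```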
Troubleshooting
Issue: "Missing chain in data"
Cause: Specified chain (TRA/TRB) not found in combined object
Solution:
# Change to available chain
[CDR3AAPhyschem.envs]
chain = "TRA" # or "TRB"
Issue: "Group not found in metadata"
Cause: group column or comparison values don't exist
Solution:
- Check the available metadata columns in the ScRepCombiningExpression output
- Verify that group names match exactly (case-sensitive)
[CDR3AAPhyschem.envs]
group = "seurat_clusters" # If CellType not available
comparison = ["0", "1"] # Use cluster IDs
Issue: "Insufficient cells for regression"
Cause: Too few cells in one or more groups
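To see which groups fall short, count cells per group within each split defined by each. The sketch below assumes a hypothetical dict-per-cell layout (metadata columns as keys) and an illustrative min_cells threshold; comparing the split counts with the pooled counts (each_cols empty) shows whether pooling samples is enough.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Key = Tuple[Tuple[object, ...], str]  # (values of the `each` columns, group name)


def group_sizes(cells: Sequence[dict], group_col: str,
                groups: Dict[str, List[str]], each_cols: Sequence[str]) -> Counter:
    """Count cells per (split, group); the split is () when `each` is unset."""
    label_to_group = {label: name for name, labels in groups.items() for label in labels}
    sizes: Counter = Counter()
    for cell in cells:
        name = label_to_group.get(cell.get(group_col))
        if name is None:
            continue  # cell belongs to neither compared group
        split = tuple(cell.get(col) for col in each_cols)
        sizes[(split, name)] += 1
    return sizes


def undersized(sizes: Counter, groups: Dict[str, List[str]], min_cells: int) -> List[Key]:
    """(split, group) pairs with fewer than `min_cells` cells, including zero."""
    splits = {split for split, _ in sizes}
    return sorted(
        (split, name)
        for split in splits
        for name in groups
        if sizes[(split, name)] < min_cells  # Counter returns 0 for missing keys
    )
```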
Solution:
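- Remove each (or use a coarser column) so that cells from several samples are pooled; splitting by sample lowers the number of cells per regression
- Combine similar cell types within each side of comparison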
[CDR3AAPhyschem.envs]
# Pool related subtypes so each side has enough cells
comparison = {Treg = ["Treg", "CD4+Treg"], Tconv = ["Tconv", "CD4 TCM", "CD4 TEM"]}
Issue: "No significant property differences"
Cause: Groups may not differ in physicochemical properties
Solution:
- Check whether the comparison groups are biologically distinct
- Consider a different group column (e.g., gene expression clusters)
- Verify that the CDR3 sequences are high-quality
Scientific Context
Key Publications
- Stadinski et al. (2016): "Hydrophobic CDR3 residues promote the development of self-reactive T cells" - Nature Immunology
- Lagattuta et al. (2022): "Repertoire analyses reveal T cell antigen receptor sequence features that influence T cell fate" - Nature Immunology
- Ostmeyer et al. (2019): "Biophysicochemical motifs in T-cell receptor sequences distinguish repertoires from tumor-infiltrating lymphocyte and adjacent healthy tissue" - Cancer Research
Interpretation Guidelines
- High GRAVY: More hydrophobic CDR3 (associated with self-reactivity, Treg)
- High charge: Electrostatic potential may affect binding affinity
- High aromaticity: Increased π-π interactions, structural stability
- Length distribution: Longer CDR3s may provide broader specificity
Feature Engineering Applications
Use properties as features for:
- TCR specificity prediction models
- T cell fate classification (Treg vs Tconv)
- Antigen binding affinity estimation
- Cross-reactivity assessment
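When these properties feed a model, they usually need a common scale. A minimal sketch, assuming per-CDR3 property values have already been computed into dicts (for example with keys such as length, gravy and aromatic): it z-scores each chosen column and returns the means and standard deviations so the same scaling can be reapplied to held-out data. feature_matrix is an illustrative name.

```python
from statistics import mean, pstdev
from typing import Dict, List, Sequence, Tuple


def feature_matrix(
    rows: Sequence[Dict[str, float]], columns: Sequence[str]
) -> Tuple[List[List[float]], Dict[str, Tuple[float, float]]]:
    """Z-score the chosen property columns; returns (matrix, {column: (mean, sd)})."""
    stats: Dict[str, Tuple[float, float]] = {}
    for col in columns:
        values = [row[col] for row in rows]
        sd = pstdev(values)
        # constant columns: use sd = 1 so values become 0 instead of dividing by 0
        stats[col] = (mean(values), sd if sd > 0 else 1.0)
    matrix = [
        [(row[col] - stats[col][0]) / stats[col][1] for col in columns]
        for row in rows
    ]
    return matrix, stats
```

Keeping the fitted statistics separate from the matrix avoids leaking test-set information into the scaling when the features are used for classification.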
Output Format
- Directory: {{in.scrfile | stem}}.cdr3aaphyschem/
- Files:
  - Regression plots per property (hydrophobicity, volume, pI)
  - Statistical tables comparing groups
  - CDR3 length distributions
  - Property correlation matrices
- Visualizations:
  - Property vs length scatter plots
  - Group-wise property boxplots
  - Regression curves with confidence intervals
Advanced Usage
Custom Property Scales
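Non-default scales are not among the parameters documented above; the keys below are illustrative and only take effect if the underlying R script is modified to read them: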
# Note: Advanced usage - may require script modification
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Specify alternative hydrophobicity scale
hydro_scale = "Wimley"
pK_source = "Murray"
Length-Based Stratification
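The each column used in this example (CDR3_Length_Bin) must already exist in the object metadata. A sketch of deriving such a label from the CDR3 length follows; the bin edges 12 and 15 are arbitrary assumptions, and attaching the label to the metadata is left to whichever upstream step you use.

```python
from bisect import bisect_right
from typing import Sequence

# Hypothetical bin edges: each edge is the first length of a new bin.
DEFAULT_EDGES = (12, 15)


def length_bin(cdr3: str, edges: Sequence[int] = DEFAULT_EDGES) -> str:
    """Return a label such as "<12", "12-14" or ">=15" for a CDR3 sequence."""
    n = len(cdr3)
    i = bisect_right(edges, n)  # number of edges <= n
    if i == 0:
        return f"<{edges[0]}"
    if i == len(edges):
        return f">={edges[-1]}"
    return f"{edges[i - 1]}-{edges[i] - 1}"
```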
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
# Analyze by CDR3 length bins
group = "CellType"
comparison = ["Treg", "Tconv"]
# Use metadata column with length information
each = "CDR3_Length_Bin"
chain = "TRB"
Publication-Ready Plots
[CDR3AAPhyschem]
[CDR3AAPhyschem.envs]
group = "CellType"
comparison = {Treg = "Treg", Tconv = "Tconv"}
target = "Treg"
chain = "TRB"
# Publication parameters
plot_theme = "nature"
fig_dpi = 300
fig_format = "pdf"