IOBR

IOBRskill

End-to-end bulk RNA-seq / microarray tumor microenvironment (TME) analysis using the IOBR R package. Covers data preprocessing, gene annotation, TME deconvolution (CIBERSORT/MCPcounter/EPIC/xCell/ESTIMATE/TIMER/quanTIseq/IPS), signature scoring (PCA/ssGSEA/Z-score), ligand-receptor analysis, pathway enrichment, survival modeling, and Nature-style visualization. Use this skill when the user mentions: tumor microenvironment analysis, TME deconvolution, immune infiltration analysis, 肿瘤微环境分析, 肿瘤微环境解析, 免疫浸润分析, ligand-receptor analysis, 受体配体分析, IOBR, immune cell deconvolution, CIBERSORT, MCPcounter, ESTIMATE, or any bulk RNA-seq immuno-oncology workflow. Also trigger on /IOBRskill.

IOBR 1 Updated 3w ago

Resources

5
GitHub

Install

npx skillscat add iobr/iobrskill

Install via the SkillsCat registry.

SKILL.md

IOBRskill — Bulk Transcriptome Tumor Microenvironment Analysis

IOBR Citation (MUST include in every R script and log)

Every R script generated by this skill MUST begin with this citation header:

###############################################################################
# IOBR Citation:
# Zeng DQ, Fang YR, … , Yu GC, Liao WJ.
# Enhancing Immuno-Oncology Investigations Through Multidimensional Decoding of
# Tumour Microenvironment with IOBR 2.0, Cell Reports Methods, 2024
# https://doi.org/10.1016/j.crmeth.2024.100910
###############################################################################

The same citation header MUST appear at the top of every log file in 06-log/.


Prerequisites

Environment Check (first step)

Verify IOBR is installed in the base conda environment:

source /home/bio/miniconda3/etc/profile.d/conda.sh
conda activate base
Rscript -e 'packageVersion("IOBR")'

If IOBR is not installed:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("IOBR/IOBR")

Required Dependencies

IOBR relies on several packages that are NOT auto-installed. Install these before starting:

# For CIBERSORT
BiocManager::install("preprocessCore")

# For quanTIseq deconvolution
install.packages("limSolve", repos = "https://cloud.r-project.org")

# For tme_cluster()
install.packages("NbClust", repos = "https://cloud.r-project.org")

# For iobr_pca()
install.packages("FactoMineR", repos = "https://cloud.r-project.org")

# For sig_heatmap() — only if using tidyHeatmap approach (pheatmap preferred)
# install.packages("tidyHeatmap", repos = "https://cloud.r-project.org")

IOBR Built-in Data

IOBR ships with rich built-in data — you rarely need external annotation files. For full details, read references/iobr_built_in_data.md. Key resources:

Resource Access What It Contains
Gene length / ID mapping count2tpm() auto-loads 111 genes with HUGO, ENTREZID, ENSEMBL
Microarray probe sets extdata/probesets.txt 176 probes mapped to cell types
TME signatures (186) IOBR:::signature_tme Immune, stromal, EMT, checkpoint signatures
Full collection (323) data("signature_collection") All signatures + Hallmark + Metabolism
Signature groupings data("sig_group") 35+ categories for filtering signatures
Pathway data (on-demand) IOBR::load_data() hallmark (50), go_bp (7658), kegg (186), reactome (1615)
Example datasets data("pdata_stad") etc. TCGA-STAD, IMvigor210 toy data

On-demand pathway data via load_data()

IOBR provides IOBR::load_data() to download and cache pathway gene sets from GitHub on first use. These are essential for comprehensive pathway scoring:

hallmark  <- IOBR::load_data("hallmark")   # 50 pathways
go_bp     <- IOBR::load_data("go_bp")      # 7,658 gene sets
go_cc     <- IOBR::load_data("go_cc")
go_mf     <- IOBR::load_data("go_mf")
kegg      <- IOBR::load_data("kegg")       # 186 pathways
reactome  <- IOBR::load_data("reactome")   # 1,615 pathways

Project Directory Structure

Create this structure before starting analysis:

project/
├── 01-script/         # R scripts (numbered, each with IOBR citation header)
├── 02-input/          # Normalized and annotated expression matrices
├── 03-tme/            # TME deconvolution and signature scoring results
├── 04-figs/
│   ├── *.png/pdf      # All figures (PNG 300dpi + PDF, numbered naming)
│   ├── kmplot/        # KM survival plots (auto-saved by sig_surv_plot)
│   └── data/          # Statistical result tables matching figures
├── 05-note/
│   ├── IOBR-pipeline.md           # Analysis plan (generated in Phase 0)
│   ├── pdata_summary.csv          # Phenotype variable summary
│   └── IOBR-analysis-README.md    # Final analysis summary
└── 06-log/            # R execution logs (one per script, with citation header)

Naming conventions:

  • Scripts: 01-data_preprocessing.R, 02-tme_deconvolution.R, ...
  • Figures: Fig01-barplot_cibersort.png + Fig01-barplot_cibersort.pdf (always dual format)
  • Statistical tables: 04-figs/data/01-tme_pdata_merged.csv, 04-figs/data/04-wilcoxon_results.csv, ...
  • Logs: 01-data_preprocessing.log, 02-tme_deconvolution.log, ...
  • Note: IOBR-analysis-README.md (tree diagram + brief interpretation)

When running a script, capture the log:

Rscript 01-script/01-data_preprocessing.R > 06-log/01-data_preprocessing.log 2>&1

Analysis Pipeline

6 sequential phases (Phase 0–5). Pause at decision points to let the user choose.

Phase 0: Plan Generation

Before writing any analysis code, generate an analysis plan document.

After confirming data type / species / deconvolution methods with the user (Phase 1 questions), create 05-note/IOBR-pipeline.md with:

  1. Project metadata: project name, data source, data type, species
  2. ASCII tree diagram of all scripts, their inputs/outputs, and expected results:
01-data_preprocessing.R
  ├─ Input:  raw expression matrix + phenotype data
  ├─ Output: 02-input/annotated_eset.csv, 02-input/pdata.csv
  └─ Notes:  log2 transformation, gene ID conversion

02-tme_deconvolution.R
  ├─ Input:  02-input/annotated_eset.csv
  ├─ Output: 03-tme/tme_cibersort.csv, 03-tme/tme_mcpcounter.csv, ...
  └─ Notes:  CIBERSORT + MCPcounter + ESTIMATE (linear-scale data)

03-signature_analysis.R
  ├─ Input:  02-input/annotated_eset.csv
  ├─ Output: 03-tme/sig_score_tme.csv, 03-tme/sig_score_pathway.RData
  └─ Notes:  ssGSEA for TME signatures, Hallmark + KEGG pathways

04-statistical_analysis.R
  ├─ Input:  03-tme/*.csv + 02-input/pdata.csv
  ├─ Output: 04-figs/data/01-tme_pdata_merged.csv
  │          04-figs/data/02-tme_scaled.csv
  │          04-figs/data/03-tme_subtype.csv
  │          04-figs/data/04-wilcoxon_results.csv
  │          04-figs/data/05-surv_results.csv
  │          04-figs/data/06-cor_matrix.csv
  └─ Notes:  merge → scale → cluster → wilcoxon → survival → correlation

05-visualization.R
  ├─ Input:  04-figs/data/*.csv + 03-tme/*.csv
  ├─ Output: 04-figs/Fig01-09 (PNG + PDF)
  └─ Notes:  barplots → heatmaps → correlation → boxplots → forest → KM
  1. Present to user: "开始分析" or "调整计划"
  2. Only after approval, begin Phase 1

Phase 1: Data Preprocessing and Annotation

Script: 01-data_preprocessing.R

Confirm with the user:

  • Data source: User matrix, GEO (GSE#####), or TCGA
  • Data type: "Count" (raw RNA-seq), "Tpm" (already normalized), or "Array" (microarray)
  • Species: "hsa" (human) or "mmu" (mouse)

Sample matching (if both eset and pdata exist):

Before any analysis, verify sample overlap between expression matrix and phenotype data:

# Report sample counts
eset_samples  <- colnames(eset)
pdata_samples <- pdata$ID   # or rownames(pdata)
intersection  <- intersect(eset_samples, pdata_samples)

cat("Expression matrix samples:", length(eset_samples), "\n")
cat("Phenotype data samples:",    length(pdata_samples), "\n")
cat("Matched samples:",           length(intersection), "\n")

# Ask user: use all samples, intersection only, or check data?
# Then filter both to agreed sample set
eset  <- eset[, intersection]
pdata <- pdata[pdata$ID %in% intersection, ]

pdata variable summary:

After loading pdata, generate a summary report:

# Generate pdata_summary.csv
pdata_summary <- data.frame(
  var       = character(),
  type      = character(),
  n_levels  = integer(),
  levels    = character(),
  stringsAsFactors = FALSE
)

for (col in colnames(pdata)) {
  vals <- pdata[[col]]
  if (is.numeric(vals)) {
    tp <- ifelse(length(unique(vals)) <= 2, "binary", "numeric")
    lvls <- paste(range(vals, na.rm = TRUE), collapse = " - ")
    n_lv <- length(unique(vals))
  } else {
    tp <- "factor"
    lvls <- paste(unique(vals)[1:min(10, length(unique(vals)))], collapse = ", ")
    n_lv <- length(unique(vals))
  }
  pdata_summary <- rbind(pdata_summary, data.frame(
    var = col, type = tp, n_levels = n_lv, levels = lvls
  ))
}

write.csv(pdata_summary, file = "05-note/pdata_summary.csv", row.names = FALSE)
cat(">>>>> pdata summary saved to 05-note/pdata_summary.csv\n")

This summary helps decide which statistical tests to apply in Phase 4.

Key functions:

Function Purpose When to Use
count2tpm() Count → TPM + gene ID mapping Raw RNA-seq counts
log2eset() Auto-detect and apply log2 transform Ensures log2 scale for downstream analysis
anno_eset() Probe → gene symbol annotation Microarray data or unannotated matrices
remove_batcheffect() ComBat / ComBat-seq batch correction Multi-batch datasets
find_outlier_samples() WGCNA-based outlier detection QC step
iobr_pca() PCA visualization for batch assessment QC step
mouse2human_eset() Mouse → human gene ortholog conversion Mouse data
# RNA-seq: counts → TPM
eset <- count2tpm(countMat = expr_matrix, idType = "Ensembl", org = "hsa")

# Microarray: probe annotation
eset <- anno_eset(eset = expr_matrix, annotation = anno_df,
                  symbol = "gene_symbol", probe = "probe_id", method = "mean")

# Ensure log2 scale (auto-detects — only transforms if not already log2)
eset <- log2eset(eset)

Save output:

write.csv(eset,  file = "02-input/annotated_eset.csv")
write.csv(pdata, file = "02-input/pdata.csv")

Phase 2: TME Deconvolution

Script: 02-tme_deconvolution.R

Ask the user which method(s) to use.

Method Cell Types Best For
"cibersort" 22 (LM22) Gold standard, comprehensive
"cibersort_abs" 22 (LM22) CIBERSORT absolute mode (fraction-independent)
"mcpcounter" 8 populations Quick, stromal + immune
"epic" 8 incl. cancer cells Tumor purity
"xcell" 64 types Broadest coverage (slow)
"estimate" 4 scores Stromal/Immune/Tumor purity
"timer" 6 types TCGA cancer-specific
"quantiseq" 10 incl. M1/M2 Macrophage polarization
"ips" 4 axes Immunotherapy response

Recommended: CIBERSORT + MCPcounter + ESTIMATE (most cited). For comprehensive analysis, run all 9 methods and merge.

Single-method usage:

tme_result <- deconvo_tme(eset = eset, method = "cibersort", arrays = FALSE, perm = 1000)
write.csv(tme_result, file = "03-tme/tme_cibersort.csv")

Full 9-method deconvolution + merge (production workflow):

# Each method may take 1-5 minutes depending on sample size
# Use tryCatch for each method — some may fail on certain datasets
cibersort <- tryCatch(deconvo_tme(eset = eset, method = "cibersort",
                                   arrays = FALSE, perm = 1000), error = function(e) NULL)
cibersort_abs <- tryCatch(deconvo_tme(eset = eset, method = "cibersort_abs",
                                       arrays = FALSE, perm = 1000), error = function(e) NULL)
epic      <- tryCatch(deconvo_tme(eset = eset, method = "epic",
                                   arrays = FALSE), error = function(e) NULL)
mcp       <- tryCatch(deconvo_tme(eset = eset, method = "mcpcounter"), error = function(e) NULL)
xcell     <- tryCatch(deconvo_tme(eset = eset, method = "xcell",
                                   arrays = FALSE), error = function(e) NULL)
estimate  <- tryCatch(deconvo_tme(eset = eset, method = "estimate"), error = function(e) NULL)
timer     <- tryCatch(deconvo_tme(eset = eset, method = "timer",
                                   group_list = rep(tumor_type, ncol(eset))), error = function(e) NULL)
quantiseq <- tryCatch(deconvo_tme(eset = eset, method = "quantiseq",
                                   tumor = TRUE, arrays = FALSE, scale_mrna = TRUE), error = function(e) NULL)
ips       <- tryCatch(deconvo_tme(eset = eset, method = "ips", plot = FALSE), error = function(e) NULL)

# Merge all results by sample ID using base R merge()
results_list <- list(cibersort, cibersort_abs, mcp, xcell, epic, estimate, quantiseq, timer, ips)
results_list <- results_list[!sapply(results_list, is.null)]

tme_combine <- results_list[[1]]
for (i in seq_along(results_list)[-1]) {
  tme_combine <- merge(tme_combine, results_list[[i]], by = "ID")
}

save(tme_combine, file = "03-tme/tme_combine.RData")
write.csv(tme_combine, file = "03-tme/tme_combine.csv", row.names = FALSE)

Critical caveats:

  1. All deconvolution methods (except IPS) need linear-scale data — Anti-log first: eset_linear <- 2^eset if log2-transformed. CIBERSORT, MCPcounter, ESTIMATE, EPIC, quanTIseq, xCell all warn "Data values appear small (<50)" when fed log2 data. Only IPS works with log2 data directly.
  2. Column names have method suffixT_cells_CD8_CIBERSORT, T_cells_MCPcounter, etc.
  3. Output CSV has ID column — Set rownames: rownames(df) <- df$ID; df$ID <- NULL.
  4. arrays = TRUE for microarray; perm = 1000 recommended for CIBERSORT.
  5. Required packagespreprocessCore (CIBERSORT), limSolve (quanTIseq). Install before running.
  6. TIMER requires group_list — Provide cancer type per sample: rep(tumor_type, ncol(eset)).
  7. quanTIseq parameters — Use tumor = TRUE, arrays = FALSE, scale_mrna = TRUE for tumor samples.
  8. Use tryCatch for each method — some methods may fail on certain datasets. Wrap each call to avoid losing partial results.

Phase 3: Signature Scoring and Pathway Analysis

Script: 03-signature_analysis.R

Scoring methods (ask user to choose):

Method Best For
"ssgsea" Default, robust
"pca" Continuous scores, large gene sets
"zscore" Simple, fast
"integration" Combined approach

Step 3a: TME / IO biomarker signature scoring:

# Option A: TME signatures (186 sets, internal — use :::)
signature_tme <- IOBR:::signature_tme
sig_tme <- calculate_sig_score(pdata = NULL, eset = eset,
                                signature = signature_tme,
                                method = "ssgsea",
                                mini_gene_count = 3,
                                adjust_eset = TRUE)
write.csv(sig_tme, file = "03-tme/sig_score_tme.csv")

# Option B: Full collection (323 sets) — for comprehensive analysis
data("signature_collection")
sig_res <- calculate_sig_score(pdata = NULL, eset = eset,
                                signature = signature_collection,
                                method = "pca",
                                mini_gene_count = 2,
                                adjust_eset = TRUE)
write.csv(sig_res, file = "03-tme/sig_score_collection.csv")

Step 3b: Pathway scoring (Hallmark + GO + KEGG + Reactome):

hallmark  <- IOBR::load_data("hallmark")    # 50 pathways
go_bp     <- IOBR::load_data("go_bp")       # 7,658 gene sets
go_cc     <- IOBR::load_data("go_cc")
go_mf     <- IOBR::load_data("go_mf")
kegg      <- IOBR::load_data("kegg")        # 186 pathways
reactome  <- IOBR::load_data("reactome")    # 1,615 pathways

sig_pathway <- calculate_sig_score(pdata = NULL, eset = eset,
                                    signature = c(hallmark, go_bp, go_cc, go_mf, kegg, reactome),
                                    method = "ssgsea",
                                    mini_gene_count = 3)
write.csv(sig_pathway, file = "03-tme/sig_score_pathway.csv")

Step 3c: Merge all results into master table:

tme_sig_combine <- merge(tme_combine, sig_res, by = "ID")
tme_sig_combine <- merge(tme_sig_combine, sig_pathway, by = "ID")
write.csv(tme_sig_combine, file = "03-tme/tme_sig_combine.csv", row.names = FALSE)

Key notes:

  1. pdata = NULL, adjust_eset = TRUE — This is the correct production pattern for calculate_sig_score(). Do NOT set adjust_eset = FALSE unless data is already properly scaled.
  2. Method names are lowercase — Use "pca", "ssgsea", "zscore", "integration" (NOT uppercase "PCA").
  3. signature_tme is internal — Access via IOBR:::signature_tme (triple colon).
  4. signature_collection needs data() — Call data("signature_collection") before use.
  5. IOBR::load_data() downloads on first use — Caches locally for subsequent calls.
  6. Pathway scoring is computationally heavy — Consider running Hallmark (50) and KEGG (186) first for quick results, then GO/Reactome for comprehensive analysis.
  7. Use base R merge() for joining — Not all environments have tidyverse. Prefer merge(df1, df2, by = "ID") over %>% inner_join().
  8. tme_cluster() requires NbClust — Install: install.packages("NbClust").

Use references/iobr_built_in_data.md for the full signature catalog and data("sig_group") for filtering by category.

TME subtype clustering:

tme_clusters <- tme_cluster(input = tme_result, features = cell_types,
                             id = "sample_id", method = "kmeans")

Phase 4: Statistical Analysis

Script: 04-statistical_analysis.R

This phase produces structured output: every statistical result is saved to 04-figs/data/ as a numbered CSV file for traceability and reuse in Phase 5 visualization.

4a. Data Preparation

Merge TME results with pdata, then scale all TME cell columns:

# Load data
pdata <- read.csv("02-input/pdata.csv")
tme   <- read.csv("03-tme/tme_combine.csv")

# Merge pdata with TME results
tme_pdata <- merge(pdata, tme, by = "ID")
write.csv(tme_pdata, file = "04-figs/data/01-tme_pdata_merged.csv", row.names = FALSE)

# Collect TME cell columns — MUST match actual column names in merged data
# After merge, cibersort + cibersort_abs shared names get .x/.y suffixes
# So match by checking actual colname OR colname-without-suffix against original list
qc_patterns <- c("^P[.]", "^Correlation", "^RMSE")
all_tme_cell_cols <- colnames(tme_pdata)[
  colnames(tme_pdata) %in% original_cell_cols |
  gsub("\\.[xy]$", "", colnames(tme_pdata)) %in% original_cell_cols
]
all_tme_cell_cols <- setdiff(all_tme_cell_cols, "ID")

# Z-score scale all TME cell columns (per column = per variable)
tme_scaled <- tme_pdata
tme_scaled[, all_tme_cell_cols] <- scale(tme_scaled[, all_tme_cell_cols])
write.csv(tme_scaled, file = "04-figs/data/02-tme_scaled.csv", row.names = FALSE)

4b. TME Subtyping

Cluster samples by TME cell composition using CIBERSORT fractions:

# Extract CIBERSORT cell columns for clustering
cb_cols <- grep("_CIBERSORT$", colnames(tme_pdata), value = TRUE)
cb_cols <- setdiff(cb_cols, grep("P[.]value|Correlation|RMSE", cb_cols, value = TRUE))

# K-means clustering on scaled CIBERSORT data
set.seed(123)
tme_subtype <- tme_cluster(input = tme_scaled, features = cb_cols,
                            id = "ID", method = "kmeans", max.nc = 6)

# tme_cluster returns data.frame with columns: ID, cluster, <cell_cols>
# Extract cluster assignments and label as TME subtypes
subtype_df <- data.frame(ID = tme_subtype$ID, TME_subtype = paste0("TME", tme_subtype$cluster))
write.csv(subtype_df, file = "04-figs/data/03-tme_subtype.csv", row.names = FALSE)

# Find representative TME variables per subtype
markers <- find_markers_in_bulk(pdata = tme_scaled, eset = tme_scaled,
                                 group = "TME_subtype", nfeatures = 50)
write.csv(markers, file = "04-figs/data/03-tme_subtype_markers.csv", row.names = FALSE)

4c. Statistical Testing (conditional)

Run statistical tests based on what's available in pdata:

# --- Wilcoxon test (IF categorical variables exist in pdata) ---
# Check pdata_summary.csv for factor/binary variables
pdata_summary <- read.csv("05-note/pdata_summary.csv")
cat_vars <- pdata_summary$var[pdata_summary$type %in% c("factor", "binary")]

if (length(cat_vars) > 0) {
  for (v in cat_vars) {
    lvls <- unique(tme_pdata[[v]])
    lvls <- lvls[!is.na(lvls)]
    if (length(lvls) == 2) {
      wilcox_res <- batch_wilcoxon(data = tme_pdata, target = v, feature = cell_cols)
      write.csv(wilcox_res,
                file = paste0("04-figs/data/04-wilcoxon_", v, ".csv"),
                row.names = FALSE)
    }
  }
}

# --- Survival analysis (IF survival data exists) ---
# Check for time + status columns
time_cols   <- grep("time|Time|OS_time|PFS_time", colnames(tme_pdata), value = TRUE)
status_cols <- grep("status|Status|OS_status|PFS_status", colnames(tme_pdata), value = TRUE)

if (length(time_cols) > 0 && length(status_cols) > 0) {
  time_col   <- time_cols[1]
  status_col <- status_cols[1]

  # Batch Cox regression for all TME cell columns
  surv_res <- batch_surv(pdata = tme_pdata, variable = cell_cols,
                          time = time_col, status = status_col)
  write.csv(surv_res, file = "04-figs/data/05-surv_results.csv", row.names = FALSE)

  # Per-method survival (if multiple deconvolution methods were run)
  methods_run <- unique(gsub(".*_", "", cell_cols))
  for (m in methods_run) {
    m_cols <- grep(paste0("_", m, "$"), cell_cols, value = TRUE)
    if (length(m_cols) > 0) {
      m_surv <- batch_surv(pdata = tme_pdata, variable = m_cols,
                           time = time_col, status = status_col)
      write.csv(m_surv,
                file = paste0("04-figs/data/05-surv_", m, ".csv"),
                row.names = FALSE)
    }
  }
}

Key notes for survival:

  • batch_surv() requires merged pdata — All variables must be columns in pdata. Use merge() first.
  • batch_surv() parameter is variable — A vector of column names, NOT a separate data frame.
  • batch_surv() output columns: P, HR, CI_low_0.95, CI_up_0.95, ID.

4d. Correlation

Correlation matrix of all TME variables:

# batch_cor(data, target, feature, method) — correlates one target vs multiple features
# NOTE: batch_cor() returns target-vs-each-feature, NOT a pairwise matrix.
# For a full pairwise correlation matrix, use base R cor() instead (see below).
# Key immune cells as target, all TME cells as features
key_cells <- c("T_cells_CD8_CIBERSORT", "Macrophages_M1_CIBERSORT")
cor_all <- data.frame()
for (tc in key_cells) {
  cor_res <- batch_cor(data = tme_pdata, target = tc, feature = cell_cols, method = "spearman")
  cor_res$target <- tc
  cor_all <- rbind(cor_all, cor_res)
}
write.csv(cor_all, file = "04-figs/data/06-cor_matrix.csv", row.names = FALSE)

# For full pairwise correlation matrix (for heatmap visualization), use base R:
cor_mat <- cor(tme_pdata[, cell_cols], method = "spearman", use = "pairwise.complete.obs")
save(cor_mat, file = "04-figs/data/06-cor_matrix_full.RData")

4e. Other Statistical Functions

Function Purpose Output
batch_pcc() Batch partial correlation controlling for a third variable Statistics table
iobr_deg() DEG using DESeq2 or limma, with built-in volcano plot and heatmap Statistics + figures
sig_gsea() Gene Set Enrichment Analysis via fgsea, with built-in plots Statistics + figures
subgroup_survival() Subgroup Cox analysis by covariates Statistics table
LR_cal() Ligand-receptor interaction analysis Statistics + figures
find_mutations() Mutation-TME interaction (TCGA only) Statistics table
feature_manipulation() Clean features: remove NA, outliers, zero-variance Cleaned data

Phase 5: Visualization

Script: 05-visualization.R

ALL TME data MUST be z-score scaled before visualization. Statistical result tables are in 04-figs/data/.

Every figure MUST:

  1. Use IOBR's built-in visualization functions whenever available
  2. Export as both PNG (300dpi) + PDF
  3. Follow the strict ordering below

IMPORTANT: Add pdf(NULL) at the top of the visualization script to suppress stray Rplots.pdf from IOBR internal plot calls. Also clean up at the end: if (file.exists("Rplots.pdf")) file.remove("Rplots.pdf").

Fig 01–02: TME Composition Barplots (CIBERSORT + EPIC)

IMPORTANT: Only methods that produce cell FRACTION data (CIBERSORT, EPIC) should use cell_bar_plot(). ESTIMATE scores (stromal_score, immune_score, etc.) are NOT fractions and should NOT be plotted as barplots.

tme_cb <- read.csv("03-tme/tme_cibersort.csv")
tme_ep <- read.csv("03-tme/tme_epic.csv")

# Filter QC columns
qc_cols <- grep("^[Pp][.]", colnames(tme_cb), value = TRUE)
qc_cols <- c(qc_cols, grep("^Correlation", colnames(tme_cb), value = TRUE),
                  grep("^RMSE", colnames(tme_cb), value = TRUE))
cell_cols_cb <- setdiff(grep("_CIBERSORT$", colnames(tme_cb), value = TRUE), qc_cols)

# CIBERSORT barplot
p1 <- cell_bar_plot(input = tme_cb, id = "ID", features = cell_cols_cb,
                     title = "CIBERSORT TME Composition",
                     coord_flip = TRUE, palette = 4, legend.position = "right")
p1 <- p1 + guides(fill = guide_legend(ncol = 1)) +
     theme(legend.text = element_text(size = 5),
           legend.key.size = unit(2.5, "mm"),
           legend.title = element_blank(),
           plot.title = element_text(size = 10, hjust = 0.5))
if (nrow(tme_cb) > 50) {
  p1 <- p1 + theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
}

ggsave("04-figs/Fig01-barplot_cibersort.png", p1, width = 180, height = 250, units = "mm", dpi = 300)
ggsave("04-figs/Fig01-barplot_cibersort.pdf", p1, width = 180, height = 250, units = "mm")

# Repeat for EPIC with EPIC-specific QC filtering
epic_cols <- grep("_EPIC$", colnames(tme_ep), value = TRUE)
p2 <- cell_bar_plot(input = tme_ep, id = "ID", features = epic_cols,
                     title = "EPIC TME Composition",
                     coord_flip = TRUE, palette = 4, legend.position = "right")
p2 <- p2 + guides(fill = guide_legend(ncol = 1)) +
     theme(legend.text = element_text(size = 5),
           legend.key.size = unit(2.5, "mm"),
           legend.title = element_blank(),
           plot.title = element_text(size = 10, hjust = 0.5))

ggsave("04-figs/Fig02-barplot_epic.png", p2, width = 180, height = 250, units = "mm", dpi = 300)
ggsave("04-figs/Fig02-barplot_epic.pdf", p2, width = 180, height = 250, units = "mm")

Fig 03–04: TME Heatmaps by Subtype (pheatmap)

Do NOT use sig_heatmap() — it depends on tidyHeatmap, only outputs PDF (no PNG), and has .x/.y column suffix issues. Use pheatmap instead for dual PNG+PDF output with full control.

Heatmap dimension rules (MUST follow):

Element Position Parameter
Variables (cell types) Rows (y-axis) rownames(mat) — shows readable names
Samples Columns (x-axis) colnames(mat) — hidden (show_colnames=FALSE)
Subtype color bar Top (column annotation) annotation_col
Subtype gaps Column gaps gaps_col
Clustering Rows only (cell types) cluster_rows=TRUE, cluster_cols=FALSE

The input data frame has samples as rows and variables as columns. Must transpose before pheatmap: mat <- t(as.matrix(data[, cols])).

plot_subtype_heatmap <- function(data, cols, subtype_col, title, prefix,
                                  width = 8, height = 6) {
  data <- data[order(data[[subtype_col]]), ]
  mat <- t(as.matrix(data[, cols]))  # TRANSPOSE: variables -> rows, samples -> columns
  mat[mat > 2] <- 2; mat[mat < -2] <- -2
  rownames(mat) <- sub("\\.[xy]$", "", rownames(mat))
  colnames(mat) <- data$ID

  ann_col <- data.frame(Subtype = data[[subtype_col]], row.names = data$ID)
  subtypes <- sort(unique(data[[subtype_col]]))
  sub_colors <- c("#E64B35", "#4DBBD5", "#00A087", "#3C5488", "#F39B7F", "#8491B6")
  ann_colors <- list(Subtype = setNames(sub_colors[seq_along(subtypes)], subtypes))
  gaps <- cumsum(table(data[[subtype_col]]))[-length(subtypes)]

  for (fmt in c("pdf", "png")) {
    pheatmap::pheatmap(mat, annotation_col = ann_col, annotation_colors = ann_colors,
      color = colorRampPalette(c("#2166AC", "white", "#B2182B"))(100),
      show_colnames = FALSE, cluster_rows = TRUE, cluster_cols = FALSE,
      treeheight_row = 15, gaps_col = gaps, fontsize_row = 8,
      main = title,
      filename = paste0("04-figs/", prefix, ".", fmt), width = width, height = height, dpi = 300)
  }
}

# Fig 03: CIBERSORT — use only cibersort (not cibersort_abs) columns
hm_data <- tme_scaled[, c("ID", cb_sc, "TME_subtype")]
plot_subtype_heatmap(hm_data, cb_sc, "TME_subtype",
  "CIBERSORT TME by Subtype", "Fig03-heatmap_cibersort_subtype")

# Fig 04: MCPcounter — MUST use scaled data, not raw CSV
mcp_sc <- grep("MCPcounter$", colnames(tme_scaled), value = TRUE)
mcp_sc <- setdiff(mcp_sc, "TME_subtype")
mcp_hm <- tme_scaled[, c("ID", mcp_sc, "TME_subtype")]
plot_subtype_heatmap(mcp_hm, mcp_sc, "TME_subtype",
  "MCPcounter TME by Subtype", "Fig04-heatmap_mcpcounter_subtype", width = 6, height = 5)

pheatmap notes:

  • Do NOT re-scale — data is already z-scored per variable in Phase 4. Just cap at ±2.
  • treeheight_row = 15 — shrink clustering tree (default 50 is too tall).
  • Must transpose — input is samples×variables, pheatmap needs variables as rows for readable y-axis labels.
  • cluster_cols = FALSE — columns (samples) are ordered by subtype, clustering would destroy ordering.
  • gaps_col (not gaps_row) — since samples are columns, gaps go between column groups.
  • annotation_col (not annotation_row) — subtype annotation applies to samples (columns).
  • CRITICAL: Do NOT use grep("CIBERSORT") in merged data — cibersort + cibersort_abs share cell names. Read from original cibersort CSV instead.

Fig 05: TME Correlation Matrix Heatmap

Use pheatmap on the correlation matrix saved in Phase 4. Must deduplicate — cibersort + cibersort_abs share cell names, so .y columns duplicate .x. Keep only .x, drop .y, clean suffix for display.

load("04-figs/data/06-cor_matrix_full.RData")

# Deduplicate: keep .x (cibersort), drop .y (cibersort_abs)
keep <- !grepl("\\.[xy]$", rownames(cor_mat)) | grepl("\\.x$", rownames(cor_mat))
cor_mat <- cor_mat[keep, keep]
rownames(cor_mat) <- sub("\\.x$", "", rownames(cor_mat))
colnames(cor_mat) <- sub("\\.x$", "", colnames(cor_mat))

pheatmap::pheatmap(cor_mat, fontsize = 6, fontsize_row = 5, fontsize_col = 5,
                    color = colorRampPalette(c("#2166AC", "white", "#B2182B"))(100),
                    border_color = NA, main = "TME Cell Correlation Matrix")

Fig 06: Top 10 Wilcoxon Boxplots — 2×5 patchwork Composite

library(patchwork)

# Load top 10 wilcoxon results
wilcox_res <- read.csv("04-figs/data/04-wilcoxon_results.csv")
wilcox_res <- wilcox_res[order(wilcox_res$pvalue), ]
top10 <- head(wilcox_res, 10)

# Generate individual boxplots
box_list <- lapply(seq_len(nrow(top10)), function(i) {
  sig_box(data = tme_pdata, signature = top10$feature[i],
          variable = cat_var, palette = "paired1",
          size_of_font = 6, size_of_pvalue = 4) +
    theme(axis.title = element_text(size = 5),
          axis.text.y = element_text(size = 5),
          plot.title = element_text(size = 5),
          plot.margin = margin(12, 5, 5, 5))
})

# Arrange as 2×5 grid — height 160mm to show statistical annotations
p_combined <- wrap_plots(box_list, ncol = 5, nrow = 2)

ggsave("04-figs/Fig06-top10_wilcoxon_boxplot.png", p_combined,
       width = 250, height = 220, units = "mm", dpi = 300)
ggsave("04-figs/Fig06-top10_wilcoxon_boxplot.pdf", p_combined,
       width = 250, height = 220, units = "mm")

sig_box native parameters for multi-panel layouts:

  • palette = "paired1" — supported palettes: nrc, jama, aaas, jco, paired1-4, accent, set2 (no lancet).
  • size_of_font = 6 — x-axis group label font size (default 10; too large for patchwork composites).
  • size_of_pvalue = 4 — p-value text size (default 6; use 3–4 for multi-panel to avoid crowding).
  • plot.margin = margin(12, 5, 5, 5) — increased top margin so p-value is not clipped by title.
  • sig_box() has no method parameter — statistical test is auto-selected.

Fig 07a-d: Forest Plots — Grouped by Method (Custom ggplot2)

CRITICAL: Do NOT use sig_forest() for survival forest plots. It uses coord_trans(x = "log2") with only 3 fixed breaks (0.5, 1, 1.5), which breaks when HR values span extreme ranges (0 to 1,000,000+). Use a custom ggplot2 forest plot instead.

Split into 4 method groups and build custom forest plots with log2 scale, text annotations (HR, 95% CI, P value), and proper handling of extreme values:

surv_tme <- read.csv("04-figs/data/05-surv_tme_cells.csv")
surv_sig <- read.csv("04-figs/data/05-surv_signatures.csv")

make_forest <- function(data, title, filename, max_n = 20) {
  data <- data[order(data$P), ]
  data <- head(data, max_n)
  if (nrow(data) == 0) { cat("  Skipping:", title, "\n"); return() }

  # Ensure numeric + complete cases
  for (col in c("HR", "CI_low_0.95", "CI_up_0.95", "P"))
    data[[col]] <- as.numeric(data[[col]])
  data <- data[complete.cases(data[, c("HR", "CI_low_0.95", "CI_up_0.95", "P")]), ]
  if (nrow(data) == 0) return()

  # Format display labels — clean .x/.y suffix and deduplicate
  data$label <- sub("\\.[xy]$", "", data$ID)
  data$label <- strtrim(data$label, 30)
  dup <- duplicated(data$label)
  if (any(dup)) data$label[dup] <- paste0(data$label[dup], "_", seq_len(sum(dup)))
  data <- data[order(data$HR, decreasing = FALSE), ]
  data$label <- factor(data$label, levels = data$label)

  # Format text columns
  data$HR_text <- ifelse(data$HR >= 100, sprintf("%.0f", data$HR),
                  ifelse(data$HR >= 0.01, sprintf("%.2f", data$HR), sprintf("%.4f", data$HR)))
  data$CI_text <- paste0(
    ifelse(data$CI_low_0.95 >= 100, sprintf("%.0f", data$CI_low_0.95),
    ifelse(data$CI_low_0.95 >= 0.01, sprintf("%.2f", data$CI_low_0.95), sprintf("%.4f", data$CI_low_0.95))),
    "-",
    ifelse(data$CI_up_0.95 >= 100, sprintf("%.0f", data$CI_up_0.95),
    ifelse(data$CI_up_0.95 >= 0.01, sprintf("%.2f", data$CI_up_0.95), sprintf("%.4f", data$CI_up_0.95))))
  data$P_text <- ifelse(data$P < 0.001, sprintf("%.2e", data$P), sprintf("%.3f", data$P))

  # Log2 transform with cap for extreme values
  cap <- 10
  data$logHR    <- log2(data$HR);       data$logHR[!is.finite(data$logHR)] <- 0
  data$logCI_low <- log2(data$CI_low_0.95); data$logCI_low[!is.finite(data$logCI_low)] <- -cap
  data$logCI_up  <- log2(data$CI_up_0.95);  data$logCI_up[!is.finite(data$logCI_up)]   <-  cap
  data$logCI_low <- pmax(data$logCI_low, -cap)
  data$logCI_up  <- pmin(data$logCI_up,   cap)

  # Build ggplot
  p <- ggplot(data, aes(x = logHR, y = label)) +
    geom_errorbarh(aes(xmin = logCI_low, xmax = logCI_up),
                   height = 0.2, linewidth = 0.8, color = "grey30") +
    geom_point(aes(color = P), size = 3.5) +
    geom_vline(xintercept = 0, linetype = "dashed", color = "grey50", linewidth = 0.5) +
    scale_color_gradientn(colors = c("#B2182B", "#EF8A62", "#FDDBC7", "#D1E5F0", "#2166AC"),
                          name = "P value", trans = "log10") +
    geom_text(aes(x = cap + 1.5, label = HR_text), hjust = 0, size = 2.5) +
    geom_text(aes(x = cap + 4.5, label = CI_text), hjust = 0, size = 2.5, color = "grey30") +
    geom_text(aes(x = cap + 9.5, label = P_text), hjust = 0, size = 2.5, color = "grey30") +
    annotate("text", x = cap + 1.5, y = nrow(data) + 1.5, label = "HR", hjust = 0, size = 2.5, fontface = "bold") +
    annotate("text", x = cap + 4.5, y = nrow(data) + 1.5, label = "95% CI", hjust = 0, size = 2.5, fontface = "bold") +
    annotate("text", x = cap + 9.5, y = nrow(data) + 1.5, label = "P value", hjust = 0, size = 2.5, fontface = "bold") +
    scale_x_continuous(breaks = c(-8,-4,-2,-1,0,1,2,4,8),
                       labels = c("1/256","1/16","1/4","0.5","1","2","4","16","256")) +
    coord_cartesian(xlim = c(-cap, cap + 13), clip = "off") +
    labs(x = "Hazard Ratio (log2 scale)", y = "", title = title) +
    theme_light() + theme(
      axis.text.y = element_text(size = 7), axis.text.x = element_text(size = 7),
      plot.title = element_text(size = 10, hjust = 0.5, face = "bold"),
      legend.position = "bottom", plot.margin = margin(5, 40, 5, 5, "mm"))

  plot_h <- max(4, nrow(data) * 0.4 + 1.5)
  ggsave(paste0("04-figs/", filename, ".png"), p, width = 12, height = plot_h, units = "in", dpi = 300)
  ggsave(paste0("04-figs/", filename, ".pdf"), p, width = 12, height = plot_h)
}

# 07a: CIBERSORT
cb_surv <- surv_tme[grepl("CIBERSORT", surv_tme$ID), ]
make_forest(cb_surv, "CIBERSORT Survival Forest", "Fig07a-forest_cibersort")

# 07b: Other TME methods
other_surv <- surv_tme[!grepl("CIBERSORT", surv_tme$ID), ]
make_forest(other_surv, "Other TME Methods Survival Forest", "Fig07b-forest_other_tme")

# 07c: TME signatures (non-GO/KEGG)
gkegg_pattern <- "^HALLMARK_|^KEGG_|^GO_|^REACTOME_"
tme_sig_surv <- surv_sig[!grepl(gkegg_pattern, surv_sig$ID), ]
make_forest(tme_sig_surv, "TME Signatures Survival Forest", "Fig07c-forest_tme_signature")

# 07d: GO/KEGG/Hallmark pathways
gkegg_surv <- surv_sig[grepl(gkegg_pattern, surv_sig$ID), ]
make_forest(gkegg_surv, "Pathway Survival Forest", "Fig07d-forest_go_kegg")

Key notes:

  • Do NOT use sig_forest() — it only has 3 fixed axis breaks (0.5, 1, 1.5) and breaks with extreme HR values. Custom ggplot2 forest plot handles all cases.
  • Custom forest plot features: log2 scale with 9 breaks, text annotations for HR/CI/P, capped error bars for extreme values, dynamic plot height.
  • Split into 4 groups (07a-d) for readability: CIBERSORT, Other TME, TME Signatures, GO/KEGG/Hallmark.

Fig 08: KM Plots — p<0.005 Significant Variables

# sig_surv_plot() saves its own output — do NOT wrap with ggsave()
# Create kmplot subdirectory first
dir.create("04-figs/kmplot", showWarnings = FALSE, recursive = TRUE)

# Merge TME + signature survival results
surv_all <- rbind(surv_tme, surv_sig)
surv_sorted <- surv_all[order(surv_all$P), ]
sig_vars <- surv_sorted$ID[surv_sorted$P < 0.005]
if (length(sig_vars) == 0) sig_vars <- surv_sorted$ID[surv_sorted$P < 0.01]

surv_data <- read.csv("04-figs/data/01-tme_pdata_merged.csv")

for (i in seq_along(sig_vars)) {
  sig_surv_plot(input_pdata = surv_data, signature = sig_vars[i],
                time = time_col, status = status_col, time_type = "month",
                palette = "jama", mini_sig = "score",
                fig.type = "pdf", save_path = "04-figs/kmplot",
                project = paste0("KM_", i), index = i)
}

Key notes:

  • sig_surv_plot() / roc_time() save their own output — Do NOT wrap with ggsave(). Use their save_path / path and fig.type / fig_type parameters.
  • KM plots saved to 04-figs/kmplot/ subfolder — each variable generates multiple files (best-cutoff, 3-group, 2-group KM plots + RData).
  • time_type: "day", "month", or "year"
  • mini_sig: "score" (continuous, auto-split by median) or "category" (already grouped)

Fig 09: TME Subtype Boxplots — Top Feature Per Subtype, Patchwork Composite

For each TME subtype, find the CIBERSORT cell type with highest mean (scaled), then create one sig_box per subtype and combine with patchwork.

# Read CIBERSORT columns from original file
cb_cols <- setdiff(colnames(tme_cb), "ID")
cb_cols <- cb_cols[!grepl(paste(qc_patterns, collapse = "|"), cb_cols)]
cb_sc <- intersect(cb_cols, colnames(tme_sc))
if (length(cb_sc) == 0) cb_sc <- intersect(paste0(cb_cols, ".x"), colnames(tme_sc))

# For each subtype, find top cell (highest scaled mean)
subtypes <- sort(unique(tme_sc$TME_subtype))
subtype_top_cells <- character(0)
for (st in subtypes) {
  st_data <- tme_sc[tme_sc$TME_subtype == st, cb_sc]
  cell_means <- colMeans(st_data, na.rm = TRUE)
  subtype_top_cells[st] <- names(sort(cell_means, decreasing = TRUE))[1]
}

# One sig_box per subtype, patchwork composite
box09_list <- lapply(subtypes, function(st) {
  col <- subtype_top_cells[st]
  clean_name <- sub("\\.[xy]$", "", col)
  sig_box(data = tme_sc, signature = col,
          variable = "TME_subtype", palette = "paired1",
          size_of_font = 6, size_of_pvalue = 4) +
    ggtitle(paste0(st, ": ", clean_name)) +
    theme(axis.title = element_text(size = 6),
          axis.text.y = element_text(size = 5),
          plot.title = element_text(size = 7),
          plot.margin = margin(12, 5, 5, 5))
})
p09 <- wrap_plots(box09_list, ncol = length(subtypes))

ggsave("04-figs/Fig09-subtype_boxplot.png", p09,
       width = 50 * length(subtypes), height = 100, units = "mm", dpi = 300)
ggsave("04-figs/Fig09-subtype_boxplot.pdf", p09,
       width = 50 * length(subtypes), height = 100, units = "mm")

Note: Uses CIBERSORT columns from scaled data (Phase 4 output). Each subtype's top cell is the one with highest mean z-score in that subtype — this is the most representative feature for that cluster.

Clean up stray Rplots.pdf (generated by some IOBR internal plot calls)

Suppress with pdf(NULL) at top of visualization script

if (file.exists("Rplots.pdf")) file.remove("Rplots.pdf")

Additional Visualization Functions

Use these as needed based on analysis requirements:

Function Purpose Output
roc_time() Time-dependent ROC with AUC at multiple time points Figure (self-saving)
sig_roc() Multiple ROC curves for binary response prediction Figure
batch_sig_surv_plot() Batch KM curves for multiple signatures across projects Figures (self-saving)
surv_group() KM curve with risk table (High vs Low) Figure
subgroup_survival() Subgroup Cox analysis by covariates Statistics table
get_cor() Single pairwise correlation with scatter plot + regression line Figure
iobr_cor_plot() Batch correlation visualization (signature vs signature/phenotype) Figure
iobr_pca() PCA dimensionality reduction with scatter plot Figure
iobr_deg() DEG with built-in volcano plot and heatmap output Statistics + figures
sig_gsea() Gene Set Enrichment Analysis via fgsea, with built-in plots Statistics + figures
LR_cal() Ligand-receptor interaction analysis Statistics + figures
# Time-dependent ROC — saves its own output
roc_time(input = tme_pdata, vars = c("T_cells_CD8_CIBERSORT", "TMEscoreA_CIR"),
         time = "OS_time", status = "OS_status",
         time_point = c(12, 36, 60), time_type = "month", palette = "jama",
         path = "04-figs", fig.type = "png", index = 1)

# PCA visualization
iobr_pca(data = eset, pdata = pdata, group = "TME_subtype", scale = TRUE)

# Single correlation scatter plot
get_cor(data = tme_pdata, variable1 = "T_cells_CD8_CIBERSORT",
        variable2 = "CD_8_T_effector", method = "spearman")

IOBR Palettes

Categorical (palette_group parameter):

Name Best For
"jama" Default, 2–5 groups
"jco" Clinical oncology
"nrc" Nature publications
"aaas" Science journal
"accent" Colorblind-friendly
"set2" Many groups

Heatmap (palette parameter, numeric 1–6):

# Style
2 Blue-white-red diverging (default)
1 Classic pheatmap
3 Purple-orange
4–6 Sequential gradients

Cell bar plot (cell_bar_plot() palette parameter, numeric 1–4):

# Style
4 Recommended — rich random palette with many distinct colors
1–3 Alternative random palettes

If unsure which palette, present 3 options to the user and ask.


User Decision Points

Pause and ask at:

  1. Phase 0: Review and approve analysis plan (IOBR-pipeline.md)
  2. Phase 1: Data type (Count/TPM/Array) and species (hsa/mmu)
  3. Phase 2: Which deconvolution method(s)
  4. Phase 3: Scoring method and which signature collection
  5. Phase 5: Color palette preference

05-note Template: IOBR-analysis-README.md

Generate a tree diagram with brief descriptions:

project/
├── 01-script/
│   ├── 01-data_preprocessing.R      # QC, normalization, annotation
│   ├── 02-tme_deconvolution.R       # TME cell type deconvolution
│   ├── 03-signature_analysis.R      # Signature scoring
│   ├── 04-statistical_analysis.R    # Statistics, clustering, survival
│   └── 05-visualization.R           # All figures
├── 02-input/
│   ├── annotated_eset.csv           # Normalized expression matrix
│   └── pdata.csv                    # Phenotype data
├── 03-tme/
│   ├── tme_cibersort.csv            # CIBERSORT results
│   ├── tme_mcpcounter.csv           # MCPcounter results
│   ├── sig_score_tme.csv            # ssGSEA scores
│   └── tme_sig_combine.csv          # All merged results
├── 04-figs/
│   ├── Fig01-barplot_cibersort.png/pdf
│   ├── Fig02-barplot_epic.png/pdf
│   ├── Fig03-heatmap_cibersort_subtype.png/pdf   (pheatmap)
│   ├── Fig04-heatmap_mcpcounter_subtype.png/pdf  (pheatmap)
│   ├── Fig05-cor_matrix.png/pdf
│   ├── Fig06-top10_wilcoxon_boxplot.png/pdf
│   ├── Fig07a-forest_cibersort.png/pdf
│   ├── Fig07b-forest_other_tme.png/pdf
│   ├── Fig07c-forest_tme_signature.png/pdf
│   ├── Fig07d-forest_go_kegg.png/pdf
│   ├── kmplot/                        # KM survival plots (auto-saved by sig_surv_plot)
│   └── Fig09-subtype_boxplot.png/pdf
│   └── data/                        # Statistical result tables
│       ├── 01-tme_pdata_merged.csv
│       ├── 02-tme_scaled.csv
│       ├── 03-tme_subtype.csv
│       ├── 04-wilcoxon_results.csv
│       ├── 05-surv_results.csv
│       └── 06-cor_matrix.csv
├── 05-note/
│   ├── IOBR-pipeline.md             # Analysis plan
│   ├── pdata_summary.csv            # Variable summary
│   └── IOBR-analysis-README.md      # This file
└── 06-log/
    ├── 01-data_preprocessing.log
    └── ...

Include a brief summary of key findings below the tree.


06-log Directory

  • One log file per script, named to match: 01-data_preprocessing.log, 02-tme_deconvolution.log, ...
  • Each log MUST begin with the IOBR citation header (same as scripts).
  • Capture logs by running scripts with redirection:
    { echo "# IOBR Citation:"; echo "# Zeng DQ et al., Cell Reports Methods, 2024"; echo "# https://doi.org/10.1016/j.crmeth.2024.100910"; echo ""; Rscript 01-script/01-data_preprocessing.R; } > 06-log/01-data_preprocessing.log 2>&1

Quick Start Checklist

  1. Check IOBR installation in base environment
  2. Create project directory structure (01-script through 06-log + 04-figs/data)
  3. Confirm data source, type, and species with user
  4. Generate 05-note/IOBR-pipeline.md — present to user for approval
  5. Write 01-data_preprocessing.R (with citation header), run, capture log
  6. Verify sample matching (eset vs pdata), generate 05-note/pdata_summary.csv
  7. Ask user to choose deconvolution method(s)
  8. Write 02-tme_deconvolution.R, run, capture log
  9. Ask user about scoring method and gene sets
  10. Write 03-signature_analysis.R, run, capture log
  11. Write 04-statistical_analysis.R — merge, scale, cluster, test, correlate, save to 04-figs/data/
  12. Ask user about palette preferences
  13. Write 05-visualization.R — follow strict Fig01–09 ordering
  14. Generate 05-note/IOBR-analysis-README.md
  15. Verify all 06-log files have citation headers

References (in this skill directory)

  • references/functions.md — IOBR function parameter reference
  • references/palettes.md — Color palette selection guide
  • references/iobr_built_in_data.md — Complete catalog of built-in signatures, annotations, and datasets
  • references/iobr_pipeline_template.R — Full production pipeline template (9-method deconvolution + signature + pathway scoring)

Read these when you need specific parameter details, palette options, want to know what signatures are available, or need a complete pipeline template to adapt.