IOBRskill

End-to-end bulk RNA-seq / microarray tumor microenvironment (TME) analysis using the IOBR R package. Covers data preprocessing, gene annotation, TME deconvolution (CIBERSORT/MCPcounter/EPIC/xCell/ESTIMATE/TIMER/quanTIseq/IPS), signature scoring (PCA/ssGSEA/Z-score), ligand-receptor analysis, pathway enrichment, survival modeling, and Nature-style visualization. Use this skill when the user mentions: tumor microenvironment analysis, TME deconvolution, immune infiltration analysis, 肿瘤微环境分析, 肿瘤微环境解析, 免疫浸润分析, ligand-receptor analysis, 受体配体分析, IOBR, immune cell deconvolution, CIBERSORT, MCPcounter, ESTIMATE, or any bulk RNA-seq immuno-oncology workflow. Also trigger on /IOBRskill.

IOBR 4 Updated 2mo ago

Resources

GitHub

Install

npx skillscat add iobr/iobrskill

Install via the SkillsCat registry.

SKILL.md

IOBRskill — Bulk Transcriptome Tumor Microenvironment Analysis

IOBR Citation (MUST include in every R script and log)

Every R script generated by this skill MUST begin with this citation header:

###############################################################################
# IOBR Citation:
# Zeng DQ, Fang YR, … , Yu GC, Liao WJ.
# Enhancing Immuno-Oncology Investigations Through Multidimensional Decoding of
# Tumour Microenvironment with IOBR 2.0, Cell Reports Methods, 2024
# https://doi.org/10.1016/j.crmeth.2024.100910
###############################################################################

The same citation header MUST appear at the top of every log file in 06-log/.

Prerequisites

Environment Check (first step)

Verify IOBR is installed in the base conda environment:

source /home/bio/miniconda3/etc/profile.d/conda.sh
conda activate base
Rscript -e 'packageVersion("IOBR")'

If IOBR is not installed:

if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("IOBR/IOBR")

Required Dependencies

IOBR relies on several packages that are NOT auto-installed. Install these before starting:

# For CIBERSORT
BiocManager::install("preprocessCore")

# For quanTIseq deconvolution
install.packages("limSolve", repos = "https://cloud.r-project.org")

# For tme_cluster()
install.packages("NbClust", repos = "https://cloud.r-project.org")

# For iobr_pca()
install.packages("FactoMineR", repos = "https://cloud.r-project.org")

# For sig_heatmap() — only if using tidyHeatmap approach (pheatmap preferred)
# install.packages("tidyHeatmap", repos = "https://cloud.r-project.org")

IOBR Built-in Data

IOBR ships with rich built-in data — you rarely need external annotation files. For full details, read references/iobr_built_in_data.md. Key resources:

Resource	Access	What It Contains
Gene length / ID mapping	`count2tpm()` auto-loads	111 genes with HUGO, ENTREZID, ENSEMBL
Microarray probe sets	`extdata/probesets.txt`	176 probes mapped to cell types
TME signatures (186)	`IOBR:::signature_tme`	Immune, stromal, EMT, checkpoint signatures
Full collection (323)	`data("signature_collection")`	All signatures + Hallmark + Metabolism
Signature groupings	`data("sig_group")`	35+ categories for filtering signatures
Pathway data (on-demand)	`IOBR::load_data()`	hallmark (50), go_bp (7658), kegg (186), reactome (1615)
Example datasets	`data("pdata_stad")` etc.	TCGA-STAD, IMvigor210 toy data

On-demand pathway data via `load_data()`

IOBR provides IOBR::load_data() to download and cache pathway gene sets from GitHub on first use. These are essential for comprehensive pathway scoring:

hallmark  <- IOBR::load_data("hallmark")   # 50 pathways
go_bp     <- IOBR::load_data("go_bp")      # 7,658 gene sets
go_cc     <- IOBR::load_data("go_cc")
go_mf     <- IOBR::load_data("go_mf")
kegg      <- IOBR::load_data("kegg")       # 186 pathways
reactome  <- IOBR::load_data("reactome")   # 1,615 pathways

Project Directory Structure

Create this structure before starting analysis:

project/
├── 01-script/         # R scripts (numbered, each with IOBR citation header)
├── 02-input/          # Normalized and annotated expression matrices
├── 03-tme/            # TME deconvolution and signature scoring results
├── 04-figs/
│   ├── *.png/pdf      # All figures (PNG 300dpi + PDF, numbered naming)
│   ├── kmplot/        # KM survival plots (auto-saved by sig_surv_plot)
│   └── data/          # Statistical result tables matching figures
├── 05-note/
│   ├── IOBR-pipeline.md           # Analysis plan (generated in Phase 0)
│   ├── pdata_summary.csv          # Phenotype variable summary
│   └── IOBR-analysis-README.md    # Final analysis summary
└── 06-log/            # R execution logs (one per script, with citation header)

Naming conventions:

Scripts: 01-data_preprocessing.R, 02-tme_deconvolution.R, ...
Figures: Fig01-barplot_cibersort.png + Fig01-barplot_cibersort.pdf (always dual format)
Statistical tables: 04-figs/data/01-tme_pdata_merged.csv, 04-figs/data/04-wilcoxon_results.csv, ...
Logs: 01-data_preprocessing.log, 02-tme_deconvolution.log, ...
Note: IOBR-analysis-README.md (tree diagram + brief interpretation)

When running a script, capture the log:

Rscript 01-script/01-data_preprocessing.R > 06-log/01-data_preprocessing.log 2>&1

Analysis Pipeline

6 sequential phases (Phase 0–5). Pause at decision points to let the user choose.

Phase 0: Plan Generation

Before writing any analysis code, generate an analysis plan document.

After confirming data type / species / deconvolution methods with the user (Phase 1 questions), create 05-note/IOBR-pipeline.md with:

Project metadata: project name, data source, data type, species
ASCII tree diagram of all scripts, their inputs/outputs, and expected results:

01-data_preprocessing.R
  ├─ Input:  raw expression matrix + phenotype data
  ├─ Output: 02-input/annotated_eset.csv, 02-input/pdata.csv
  └─ Notes:  log2 transformation, gene ID conversion

02-tme_deconvolution.R
  ├─ Input:  02-input/annotated_eset.csv
  ├─ Output: 03-tme/tme_cibersort.csv, 03-tme/tme_mcpcounter.csv, ...
  └─ Notes:  CIBERSORT + MCPcounter + ESTIMATE (linear-scale data)

03-signature_analysis.R
  ├─ Input:  02-input/annotated_eset.csv
  ├─ Output: 03-tme/sig_score_tme.csv, 03-tme/sig_score_pathway.RData
  └─ Notes:  ssGSEA for TME signatures, Hallmark + KEGG pathways

04-statistical_analysis.R
  ├─ Input:  03-tme/*.csv + 02-input/pdata.csv
  ├─ Output: 04-figs/data/01-tme_pdata_merged.csv
  │          04-figs/data/02-tme_scaled.csv
  │          04-figs/data/03-tme_subtype.csv
  │          04-figs/data/04-wilcoxon_results.csv
  │          04-figs/data/05-surv_results.csv
  │          04-figs/data/06-cor_matrix.csv
  └─ Notes:  merge → scale → cluster → wilcoxon → survival → correlation

05-visualization.R
  ├─ Input:  04-figs/data/*.csv + 03-tme/*.csv
  ├─ Output: 04-figs/Fig01-09 (PNG + PDF)
  └─ Notes:  barplots → heatmaps → correlation → boxplots → forest → KM

Present to user: "开始分析" or "调整计划"
Only after approval, begin Phase 1

Phase 1: Data Preprocessing and Annotation

Script: 01-data_preprocessing.R

Confirm with the user:

Data source: User matrix, GEO (GSE#####), or TCGA
Data type: "Count" (raw RNA-seq), "Tpm" (already normalized), or "Array" (microarray)
Species: "hsa" (human) or "mmu" (mouse)

Sample matching (if both eset and pdata exist):

Before any analysis, verify sample overlap between expression matrix and phenotype data:

# Report sample counts
eset_samples  <- colnames(eset)
pdata_samples <- pdata$ID   # or rownames(pdata)
intersection  <- intersect(eset_samples, pdata_samples)

cat("Expression matrix samples:", length(eset_samples), "\n")
cat("Phenotype data samples:",    length(pdata_samples), "\n")
cat("Matched samples:",           length(intersection), "\n")

# Ask user: use all samples, intersection only, or check data?
# Then filter both to agreed sample set
eset  <- eset[, intersection]
pdata <- pdata[pdata$ID %in% intersection, ]

pdata variable summary:

After loading pdata, generate a summary report:

# Generate pdata_summary.csv
pdata_summary <- data.frame(
  var       = character(),
  type      = character(),
  n_levels  = integer(),
  levels    = character(),
  stringsAsFactors = FALSE
)

for (col in colnames(pdata)) {
  vals <- pdata[[col]]
  if (is.numeric(vals)) {
    tp <- ifelse(length(unique(vals)) <= 2, "binary", "numeric")
    lvls <- paste(range(vals, na.rm = TRUE), collapse = " - ")
    n_lv <- length(unique(vals))
  } else {
    tp <- "factor"
    lvls <- paste(unique(vals)[1:min(10, length(unique(vals)))], collapse = ", ")
    n_lv <- length(unique(vals))
  }
  pdata_summary <- rbind(pdata_summary, data.frame(
    var = col, type = tp, n_levels = n_lv, levels = lvls
  ))
}

write.csv(pdata_summary, file = "05-note/pdata_summary.csv", row.names = FALSE)
cat(">>>>> pdata summary saved to 05-note/pdata_summary.csv\n")

This summary helps decide which statistical tests to apply in Phase 4.

Key functions:

Function	Purpose	When to Use
`count2tpm()`	Count → TPM + gene ID mapping	Raw RNA-seq counts
`log2eset()`	Auto-detect and apply log2 transform	Ensures log2 scale for downstream analysis
`anno_eset()`	Probe → gene symbol annotation	Microarray data or unannotated matrices
`remove_batcheffect()`	ComBat / ComBat-seq batch correction	Multi-batch datasets
`find_outlier_samples()`	WGCNA-based outlier detection	QC step
`iobr_pca()`	PCA visualization for batch assessment	QC step
`mouse2human_eset()`	Mouse → human gene ortholog conversion	Mouse data

# RNA-seq: counts → TPM
eset <- count2tpm(countMat = expr_matrix, idType = "Ensembl", org = "hsa")

# Microarray: probe annotation
eset <- anno_eset(eset = expr_matrix, annotation = anno_df,
                  symbol = "gene_symbol", probe = "probe_id", method = "mean")

# Ensure log2 scale (auto-detects — only transforms if not already log2)
eset <- log2eset(eset)

Save output:

write.csv(eset,  file = "02-input/annotated_eset.csv")
write.csv(pdata, file = "02-input/pdata.csv")

Phase 2: TME Deconvolution

Script: 02-tme_deconvolution.R

Ask the user which method(s) to use.

Method	Cell Types	Best For
`"cibersort"`	22 (LM22)	Gold standard, comprehensive
`"cibersort_abs"`	22 (LM22)	CIBERSORT absolute mode (fraction-independent)
`"mcpcounter"`	8 populations	Quick, stromal + immune
`"epic"`	8 incl. cancer cells	Tumor purity
`"xcell"`	64 types	Broadest coverage (slow)
`"estimate"`	4 scores	Stromal/Immune/Tumor purity
`"timer"`	6 types	TCGA cancer-specific
`"quantiseq"`	10 incl. M1/M2	Macrophage polarization
`"ips"`	4 axes	Immunotherapy response

Recommended: CIBERSORT + MCPcounter + ESTIMATE (most cited). For comprehensive analysis, run all 9 methods and merge.

Single-method usage:

tme_result <- deconvo_tme(eset = eset, method = "cibersort", arrays = FALSE, perm = 1000)
write.csv(tme_result, file = "03-tme/tme_cibersort.csv")

Full 9-method deconvolution + merge (production workflow):

# Each method may take 1-5 minutes depending on sample size
# Use tryCatch for each method — some may fail on certain datasets
cibersort <- tryCatch(deconvo_tme(eset = eset, method = "cibersort",
                                   arrays = FALSE, perm = 1000), error = function(e) NULL)
cibersort_abs <- tryCatch(deconvo_tme(eset = eset, method = "cibersort_abs",
                                       arrays = FALSE, perm = 1000), error = function(e) NULL)
epic      <- tryCatch(deconvo_tme(eset = eset, method = "epic",
                                   arrays = FALSE), error = function(e) NULL)
mcp       <- tryCatch(deconvo_tme(eset = eset, method = "mcpcounter"), error = function(e) NULL)
xcell     <- tryCatch(deconvo_tme(eset = eset, method = "xcell",
                                   arrays = FALSE), error = function(e) NULL)
estimate  <- tryCatch(deconvo_tme(eset = eset, method = "estimate"), error = function(e) NULL)
timer     <- tryCatch(deconvo_tme(eset = eset, method = "timer",
                                   group_list = rep(tumor_type, ncol(eset))), error = function(e) NULL)
quantiseq <- tryCatch(deconvo_tme(eset = eset, method = "quantiseq",
                                   tumor = TRUE, arrays = FALSE, scale_mrna = TRUE), error = function(e) NULL)
ips       <- tryCatch(deconvo_tme(eset = eset, method = "ips", plot = FALSE), error = function(e) NULL)

# Merge all results by sample ID using base R merge()
results_list <- list(cibersort, cibersort_abs, mcp, xcell, epic, estimate, quantiseq, timer, ips)
results_list <- results_list[!sapply(results_list, is.null)]

tme_combine <- results_list[[1]]
for (i in seq_along(results_list)[-1]) {
  tme_combine <- merge(tme_combine, results_list[[i]], by = "ID")
}

save(tme_combine, file = "03-tme/tme_combine.RData")
write.csv(tme_combine, file = "03-tme/tme_combine.csv", row.names = FALSE)

Critical caveats:

All deconvolution methods (except IPS) need linear-scale data — Anti-log first: eset_linear <- 2^eset if log2-transformed. CIBERSORT, MCPcounter, ESTIMATE, EPIC, quanTIseq, xCell all warn "Data values appear small (<50)" when fed log2 data. Only IPS works with log2 data directly.
Column names have method suffix — T_cells_CD8_CIBERSORT, T_cells_MCPcounter, etc.
Output CSV has ID column — Set rownames: rownames(df) <- df$ID; df$ID <- NULL.
arrays = TRUE for microarray; perm = 1000 recommended for CIBERSORT.
Required packages — preprocessCore (CIBERSORT), limSolve (quanTIseq). Install before running.
TIMER requires group_list — Provide cancer type per sample: rep(tumor_type, ncol(eset)).
quanTIseq parameters — Use tumor = TRUE, arrays = FALSE, scale_mrna = TRUE for tumor samples.
Use tryCatch for each method — some methods may fail on certain datasets. Wrap each call to avoid losing partial results.

Phase 3: Signature Scoring and Pathway Analysis

Script: 03-signature_analysis.R

Scoring methods (ask user to choose):

Method	Best For
`"ssgsea"`	Default, robust
`"pca"`	Continuous scores, large gene sets
`"zscore"`	Simple, fast
`"integration"`	Combined approach

Step 3a: TME / IO biomarker signature scoring:

# Option A: TME signatures (186 sets, internal — use :::)
signature_tme <- IOBR:::signature_tme
sig_tme <- calculate_sig_score(pdata = NULL, eset = eset,
                                signature = signature_tme,
                                method = "ssgsea",
                                mini_gene_count = 3,
                                adjust_eset = TRUE)
write.csv(sig_tme, file = "03-tme/sig_score_tme.csv")

# Option B: Full collection (323 sets) — for comprehensive analysis
data("signature_collection")
sig_res <- calculate_sig_score(pdata = NULL, eset = eset,
                                signature = signature_collection,
                                method = "pca",
                                mini_gene_count = 2,
                                adjust_eset = TRUE)
write.csv(sig_res, file = "03-tme/sig_score_collection.csv")

Step 3b: Pathway scoring (Hallmark + GO + KEGG + Reactome):

hallmark  <- IOBR::load_data("hallmark")    # 50 pathways
go_bp     <- IOBR::load_data("go_bp")       # 7,658 gene sets
go_cc     <- IOBR::load_data("go_cc")
go_mf     <- IOBR::load_data("go_mf")
kegg      <- IOBR::load_data("kegg")        # 186 pathways
reactome  <- IOBR::load_data("reactome")    # 1,615 pathways

sig_pathway <- calculate_sig_score(pdata = NULL, eset = eset,
                                    signature = c(hallmark, go_bp, go_cc, go_mf, kegg, reactome),
                                    method = "ssgsea",
                                    mini_gene_count = 3)
write.csv(sig_pathway, file = "03-tme/sig_score_pathway.csv")

Step 3c: Merge all results into master table:

tme_sig_combine <- merge(tme_combine, sig_res, by = "ID")
tme_sig_combine <- merge(tme_sig_combine, sig_pathway, by = "ID")
write.csv(tme_sig_combine, file = "03-tme/tme_sig_combine.csv", row.names = FALSE)

Key notes:

pdata = NULL, adjust_eset = TRUE — This is the correct production pattern for calculate_sig_score(). Do NOT set adjust_eset = FALSE unless data is already properly scaled.
Method names are lowercase — Use "pca", "ssgsea", "zscore", "integration" (NOT uppercase "PCA").
signature_tme is internal — Access via IOBR:::signature_tme (triple colon).
signature_collection needs data() — Call data("signature_collection") before use.
IOBR::load_data() downloads on first use — Caches locally for subsequent calls.
Pathway scoring is computationally heavy — Consider running Hallmark (50) and KEGG (186) first for quick results, then GO/Reactome for comprehensive analysis.
Use base R merge() for joining — Not all environments have tidyverse. Prefer merge(df1, df2, by = "ID") over %>% inner_join().
tme_cluster() requires NbClust — Install: install.packages("NbClust").

Use references/iobr_built_in_data.md for the full signature catalog and data("sig_group") for filtering by category.

TME subtype clustering:

tme_clusters <- tme_cluster(input = tme_result, features = cell_types,
                             id = "sample_id", method = "kmeans")

Phase 4: Statistical Analysis

Script: 04-statistical_analysis.R

This phase produces structured output: every statistical result is saved to 04-figs/data/ as a numbered CSV file for traceability and reuse in Phase 5 visualization.

4a. Data Preparation

Merge TME results with pdata, then scale all TME cell columns:

# Load data
pdata <- read.csv("02-input/pdata.csv")
tme   <- read.csv("03-tme/tme_combine.csv")

# Merge pdata with TME results
tme_pdata <- merge(pdata, tme, by = "ID")
write.csv(tme_pdata, file = "04-figs/data/01-tme_pdata_merged.csv", row.names = FALSE)

# Collect TME cell columns — MUST match actual column names in merged data
# After merge, cibersort + cibersort_abs shared names get .x/.y suffixes
# So match by checking actual colname OR colname-without-suffix against original list
qc_patterns <- c("^P[.]", "^Correlation", "^RMSE")
all_tme_cell_cols <- colnames(tme_pdata)[
  colnames(tme_pdata) %in% original_cell_cols |
  gsub("\\.[xy]$", "", colnames(tme_pdata)) %in% original_cell_cols
]
all_tme_cell_cols <- setdiff(all_tme_cell_cols, "ID")

# Z-score scale all TME cell columns (per column = per variable)
tme_scaled <- tme_pdata
tme_scaled[, all_tme_cell_cols] <- scale(tme_scaled[, all_tme_cell_cols])
write.csv(tme_scaled, file = "04-figs/data/02-tme_scaled.csv", row.names = FALSE)

4b. TME Subtyping

Cluster samples by TME cell composition using CIBERSORT fractions:

# Extract CIBERSORT cell columns for clustering
cb_cols <- grep("_CIBERSORT$", colnames(tme_pdata), value = TRUE)
cb_cols <- setdiff(cb_cols, grep("P[.]value|Correlation|RMSE", cb_cols, value = TRUE))

# K-means clustering on scaled CIBERSORT data
set.seed(123)
tme_subtype <- tme_cluster(input = tme_scaled, features = cb_cols,
                            id = "ID", method = "kmeans", max.nc = 6)

# tme_cluster returns data.frame with columns: ID, cluster, <cell_cols>
# Extract cluster assignments and label as TME subtypes
subtype_df <- data.frame(ID = tme_subtype$ID, TME_subtype = paste0("TME", tme_subtype$cluster))
write.csv(subtype_df, file = "04-figs/data/03-tme_subtype.csv", row.names = FALSE)

# Find representative TME variables per subtype
markers <- find_markers_in_bulk(pdata = tme_scaled, eset = tme_scaled,
                                 group = "TME_subtype", nfeatures = 50)
write.csv(markers, file = "04-figs/data/03-tme_subtype_markers.csv", row.names = FALSE)

4c. Statistical Testing (conditional)

Run statistical tests based on what's available in pdata:

# --- Wilcoxon test (IF categorical variables exist in pdata) ---
# Check pdata_summary.csv for factor/binary variables
pdata_summary <- read.csv("05-note/pdata_summary.csv")
cat_vars <- pdata_summary$var[pdata_summary$type %in% c("factor", "binary")]

if (length(cat_vars) > 0) {
  for (v in cat_vars) {
    lvls <- unique(tme_pdata[[v]])
    lvls <- lvls[!is.na(lvls)]
    if (length(lvls) == 2) {
      wilcox_res <- batch_wilcoxon(data = tme_pdata, target = v, feature = cell_cols)
      write.csv(wilcox_res,
                file = paste0("04-figs/data/04-wilcoxon_", v, ".csv"),
                row.names = FALSE)
    }
  }
}

# --- Survival analysis (IF survival data exists) ---
# Check for time + status columns
time_cols   <- grep("time|Time|OS_time|PFS_time", colnames(tme_pdata), value = TRUE)
status_cols <- grep("status|Status|OS_status|PFS_status", colnames(tme_pdata), value = TRUE)

if (length(time_cols) > 0 && length(status_cols) > 0) {
  time_col   <- time_cols[1]
  status_col <- status_cols[1]

  # Batch Cox regression for all TME cell columns
  surv_res <- batch_surv(pdata = tme_pdata, variable = cell_cols,
                          time = time_col, status = status_col)
  write.csv(surv_res, file = "04-figs/data/05-surv_results.csv", row.names = FALSE)

  # Per-method survival (if multiple deconvolution methods were run)
  methods_run <- unique(gsub(".*_", "", cell_cols))
  for (m in methods_run) {
    m_cols <- grep(paste0("_", m, "$"), cell_cols, value = TRUE)
    if (length(m_cols) > 0) {
      m_surv <- batch_surv(pdata = tme_pdata, variable = m_cols,
                           time = time_col, status = status_col)
      write.csv(m_surv,
                file = paste0("04-figs/data/05-surv_", m, ".csv"),
                row.names = FALSE)
    }
  }
}

Key notes for survival:

batch_surv() requires merged pdata — All variables must be columns in pdata. Use merge() first.
batch_surv() parameter is variable — A vector of column names, NOT a separate data frame.
batch_surv() output columns: P, HR, CI_low_0.95, CI_up_0.95, ID.

4d. Correlation

Correlation matrix of all TME variables:

# batch_cor(data, target, feature, method) — correlates one target vs multiple features
# NOTE: batch_cor() returns target-vs-each-feature, NOT a pairwise matrix.
# For a full pairwise correlation matrix, use base R cor() instead (see below).
# Key immune cells as target, all TME cells as features
key_cells <- c("T_cells_CD8_CIBERSORT", "Macrophages_M1_CIBERSORT")
cor_all <- data.frame()
for (tc in key_cells) {
  cor_res <- batch_cor(data = tme_pdata, target = tc, feature = cell_cols, method = "spearman")
  cor_res$target <- tc
  cor_all <- rbind(cor_all, cor_res)
}
write.csv(cor_all, file = "04-figs/data/06-cor_matrix.csv", row.names = FALSE)

# For full pairwise correlation matrix (for heatmap visualization), use base R:
cor_mat <- cor(tme_pdata[, cell_cols], method = "spearman", use = "pairwise.complete.obs")
save(cor_mat, file = "04-figs/data/06-cor_matrix_full.RData")

4e. Other Statistical Functions

Function	Purpose	Output
`batch_pcc()`	Batch partial correlation controlling for a third variable	Statistics table
`iobr_deg()`	DEG using DESeq2 or limma, with built-in volcano plot and heatmap	Statistics + figures
`sig_gsea()`	Gene Set Enrichment Analysis via fgsea, with built-in plots	Statistics + figures
`subgroup_survival()`	Subgroup Cox analysis by covariates	Statistics table
`LR_cal()`	Ligand-receptor interaction analysis	Statistics + figures
`find_mutations()`	Mutation-TME interaction (TCGA only)	Statistics table
`feature_manipulation()`	Clean features: remove NA, outliers, zero-variance	Cleaned data

Phase 5: Visualization

Script: 05-visualization.R

ALL TME data MUST be z-score scaled before visualization. Statistical result tables are in 04-figs/data/.

Every figure MUST:

Use IOBR's built-in visualization functions whenever available
Export as both PNG (300dpi) + PDF
Follow the strict ordering below

IMPORTANT: Add pdf(NULL) at the top of the visualization script to suppress stray Rplots.pdf from IOBR internal plot calls. Also clean up at the end: if (file.exists("Rplots.pdf")) file.remove("Rplots.pdf").

Fig 01–02: TME Composition Barplots (CIBERSORT + EPIC)

IMPORTANT: Only methods that produce cell FRACTION data (CIBERSORT, EPIC) should use cell_bar_plot(). ESTIMATE scores (stromal_score, immune_score, etc.) are NOT fractions and should NOT be plotted as barplots.

tme_cb <- read.csv("03-tme/tme_cibersort.csv")
tme_ep <- read.csv("03-tme/tme_epic.csv")

# Filter QC columns
qc_cols <- grep("^[Pp][.]", colnames(tme_cb), value = TRUE)
qc_cols <- c(qc_cols, grep("^Correlation", colnames(tme_cb), value = TRUE),
                  grep("^RMSE", colnames(tme_cb), value = TRUE))
cell_cols_cb <- setdiff(grep("_CIBERSORT$", colnames(tme_cb), value = TRUE), qc_cols)

# CIBERSORT barplot
p1 <- cell_bar_plot(input = tme_cb, id = "ID", features = cell_cols_cb,
                     title = "CIBERSORT TME Composition",
                     coord_flip = TRUE, palette = 4, legend.position = "right")
p1 <- p1 + guides(fill = guide_legend(ncol = 1)) +
     theme(legend.text = element_text(size = 5),
           legend.key.size = unit(2.5, "mm"),
           legend.title = element_blank(),
           plot.title = element_text(size = 10, hjust = 0.5))
if (nrow(tme_cb) > 50) {
  p1 <- p1 + theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
}

ggsave("04-figs/Fig01-barplot_cibersort.png", p1, width = 180, height = 250, units = "mm", dpi = 300)
ggsave("04-figs/Fig01-barplot_cibersort.pdf", p1, width = 180, height = 250, units = "mm")

# Repeat for EPIC with EPIC-specific QC filtering
epic_cols <- grep("_EPIC$", colnames(tme_ep), value = TRUE)
p2 <- cell_bar_plot(input = tme_ep, id = "ID", features = epic_cols,
                     title = "EPIC TME Composition",
                     coord_flip = TRUE, palette = 4, legend.position = "right")
p2 <- p2 + guides(fill = guide_legend(ncol = 1)) +
     theme(legend.text = element_text(size = 5),
           legend.key.size = unit(2.5, "mm"),
           legend.title = element_blank(),
           plot.title = element_text(size = 10, hjust = 0.5))

ggsave("04-figs/Fig02-barplot_epic.png", p2, width = 180, height = 250, units = "mm", dpi = 300)
ggsave("04-figs/Fig02-barplot_epic.pdf", p2, width = 180, height = 250, units = "mm")

Fig 03–04: TME Heatmaps by Subtype (pheatmap)

Do NOT use sig_heatmap() — it depends on tidyHeatmap, only outputs PDF (no PNG), and has .x/.y column suffix issues. Use pheatmap instead for dual PNG+PDF output with full control.

Heatmap dimension rules (MUST follow):

Element	Position	Parameter
Variables (cell types)	Rows (y-axis)	`rownames(mat)` — shows readable names
Samples	Columns (x-axis)	`colnames(mat)` — hidden (`show_colnames=FALSE`)
Subtype color bar	Top (column annotation)	`annotation_col`
Subtype gaps	Column gaps	`gaps_col`
Clustering	Rows only (cell types)	`cluster_rows=TRUE`, `cluster_cols=FALSE`

The input data frame has samples as rows and variables as columns. Must transpose before pheatmap: mat <- t(as.matrix(data[, cols])).

plot_subtype_heatmap <- function(data, cols, subtype_col, title, prefix,
                                  width = 8, height = 6) {
  data <- data[order(data[[subtype_col]]), ]
  mat <- t(as.matrix(data[, cols]))  # TRANSPOSE: variables -> rows, samples -> columns
  mat[mat > 2] <- 2; mat[mat < -2] <- -2
  rownames(mat) <- sub("\\.[xy]$", "", rownames(mat))
  colnames(mat) <- data$ID

  ann_col <- data.frame(Subtype = data[[subtype_col]], row.names = data$ID)
  subtypes <- sort(unique(data[[subtype_col]]))
  sub_colors <- c("#E64B35", "#4DBBD5", "#00A087", "#3C5488", "#F39B7F", "#8491B6")
  ann_colors <- list(Subtype = setNames(sub_colors[seq_along(subtypes)], subtypes))
  gaps <- cumsum(table(data[[subtype_col]]))[-length(subtypes)]

  for (fmt in c("pdf", "png")) {
    pheatmap::pheatmap(mat, annotation_col = ann_col, annotation_colors = ann_colors,
      color = colorRampPalette(c("#2166AC", "white", "#B2182B"))(100),
      show_colnames = FALSE, cluster_rows = TRUE, cluster_cols = FALSE,
      treeheight_row = 15, gaps_col = gaps, fontsize_row = 8,
      main = title,
      filename = paste0("04-figs/", prefix, ".", fmt), width = width, height = height, dpi = 300)
  }
}

# Fig 03: CIBERSORT — use only cibersort (not cibersort_abs) columns
hm_data <- tme_scaled[, c("ID", cb_sc, "TME_subtype")]
plot_subtype_heatmap(hm_data, cb_sc, "TME_subtype",
  "CIBERSORT TME by Subtype", "Fig03-heatmap_cibersort_subtype")

# Fig 04: MCPcounter — MUST use scaled data, not raw CSV
mcp_sc <- grep("MCPcounter$", colnames(tme_scaled), value = TRUE)
mcp_sc <- setdiff(mcp_sc, "TME_subtype")
mcp_hm <- tme_scaled[, c("ID", mcp_sc, "TME_subtype")]
plot_subtype_heatmap(mcp_hm, mcp_sc, "TME_subtype",
  "MCPcounter TME by Subtype", "Fig04-heatmap_mcpcounter_subtype", width = 6, height = 5)

pheatmap notes:

Do NOT re-scale — data is already z-scored per variable in Phase 4. Just cap at ±2.
treeheight_row = 15 — shrink clustering tree (default 50 is too tall).
Must transpose — input is samples×variables, pheatmap needs variables as rows for readable y-axis labels.
cluster_cols = FALSE — columns (samples) are ordered by subtype, clustering would destroy ordering.
gaps_col (not gaps_row) — since samples are columns, gaps go between column groups.
annotation_col (not annotation_row) — subtype annotation applies to samples (columns).
CRITICAL: Do NOT use grep("CIBERSORT") in merged data — cibersort + cibersort_abs share cell names. Read from original cibersort CSV instead.

Fig 05: TME Correlation Matrix Heatmap

Use pheatmap on the correlation matrix saved in Phase 4. Must deduplicate — cibersort + cibersort_abs share cell names, so .y columns duplicate .x. Keep only .x, drop .y, clean suffix for display.

load("04-figs/data/06-cor_matrix_full.RData")

# Deduplicate: keep .x (cibersort), drop .y (cibersort_abs)
keep <- !grepl("\\.[xy]$", rownames(cor_mat)) | grepl("\\.x$", rownames(cor_mat))
cor_mat <- cor_mat[keep, keep]
rownames(cor_mat) <- sub("\\.x$", "", rownames(cor_mat))
colnames(cor_mat) <- sub("\\.x$", "", colnames(cor_mat))

pheatmap::pheatmap(cor_mat, fontsize = 6, fontsize_row = 5, fontsize_col = 5,
                    color = colorRampPalette(c("#2166AC", "white", "#B2182B"))(100),
                    border_color = NA, main = "TME Cell Correlation Matrix")

Fig 06: Top 10 Wilcoxon Boxplots — 2×5 patchwork Composite

library(patchwork)

# Load top 10 wilcoxon results
wilcox_res <- read.csv("04-figs/data/04-wilcoxon_results.csv")
wilcox_res <- wilcox_res[order(wilcox_res$pvalue), ]
top10 <- head(wilcox_res, 10)

# Generate individual boxplots
box_list <- lapply(seq_len(nrow(top10)), function(i) {
  sig_box(data = tme_pdata, signature = top10$feature[i],
          variable = cat_var, palette = "paired1",
          size_of_font = 6, size_of_pvalue = 4) +
    theme(axis.title = element_text(size = 5),
          axis.text.y = element_text(size = 5),
          plot.title = element_text(size = 5),
          plot.margin = margin(12, 5, 5, 5))
})

# Arrange as 2×5 grid — height 160mm to show statistical annotations
p_combined <- wrap_plots(box_list, ncol = 5, nrow = 2)

ggsave("04-figs/Fig06-top10_wilcoxon_boxplot.png", p_combined,
       width = 250, height = 220, units = "mm", dpi = 300)
ggsave("04-figs/Fig06-top10_wilcoxon_boxplot.pdf", p_combined,
       width = 250, height = 220, units = "mm")

sig_box native parameters for multi-panel layouts:

palette = "paired1" — supported palettes: nrc, jama, aaas, jco, paired1-4, accent, set2 (no lancet).
size_of_font = 6 — x-axis group label font size (default 10; too large for patchwork composites).
size_of_pvalue = 4 — p-value text size (default 6; use 3–4 for multi-panel to avoid crowding).
plot.margin = margin(12, 5, 5, 5) — increased top margin so p-value is not clipped by title.
sig_box() has no method parameter — statistical test is auto-selected.

Fig 07a-d: Forest Plots — Grouped by Method (Custom ggplot2)

CRITICAL: Do NOT use sig_forest() for survival forest plots. It uses coord_trans(x = "log2") with only 3 fixed breaks (0.5, 1, 1.5), which breaks when HR values span extreme ranges (0 to 1,000,000+). Use a custom ggplot2 forest plot instead.

Split into 4 method groups and build custom forest plots with log2 scale, text annotations (HR, 95% CI, P value), and proper handling of extreme values:

surv_tme <- read.csv("04-figs/data/05-surv_tme_cells.csv")
surv_sig <- read.csv("04-figs/data/05-surv_signatures.csv")

make_forest <- function(data, title, filename, max_n = 20) {
  data <- data[order(data$P), ]
  data <- head(data, max_n)
  if (nrow(data) == 0) { cat("  Skipping:", title, "\n"); return() }

  # Ensure numeric + complete cases
  for (col in c("HR", "CI_low_0.95", "CI_up_0.95", "P"))
    data[[col]] <- as.numeric(data[[col]])
  data <- data[complete.cases(data[, c("HR", "CI_low_0.95", "CI_up_0.95", "P")]), ]
  if (nrow(data) == 0) return()

  # Format display labels — clean .x/.y suffix and deduplicate
  data$label <- sub("\\.[xy]$", "", data$ID)
  data$label <- strtrim(data$label, 30)
  dup <- duplicated(data$label)
  if (any(dup)) data$label[dup] <- paste0(data$label[dup], "_", seq_len(sum(dup)))
  data <- data[order(data$HR, decreasing = FALSE), ]
  data$label <- factor(data$label, levels = data$label)

  # Format text columns
  data$HR_text <- ifelse(data$HR >= 100, sprintf("%.0f", data$HR),
                  ifelse(data$HR >= 0.01, sprintf("%.2f", data$HR), sprintf("%.4f", data$HR)))
  data$CI_text <- paste0(
    ifelse(data$CI_low_0.95 >= 100, sprintf("%.0f", data$CI_low_0.95),
    ifelse(data$CI_low_0.95 >= 0.01, sprintf("%.2f", data$CI_low_0.95), sprintf("%.4f", data$CI_low_0.95))),
    "-",
    ifelse(data$CI_up_0.95 >= 100, sprintf("%.0f", data$CI_up_0.95),
    ifelse(data$CI_up_0.95 >= 0.01, sprintf("%.2f", data$CI_up_0.95), sprintf("%.4f", data$CI_up_0.95))))
  data$P_text <- ifelse(data$P < 0.001, sprintf("%.2e", data$P), sprintf("%.3f", data$P))

  # Log2 transform with cap for extreme values
  cap <- 10
  data$logHR    <- log2(data$HR);       data$logHR[!is.finite(data$logHR)] <- 0
  data$logCI_low <- log2(data$CI_low_0.95); data$logCI_low[!is.finite(data$logCI_low)] <- -cap
  data$logCI_up  <- log2(data$CI_up_0.95);  data$logCI_up[!is.finite(data$logCI_up)]   <-  cap
  data$logCI_low <- pmax(data$logCI_low, -cap)
  data$logCI_up  <- pmin(data$logCI_up,   cap)

  # Build ggplot
  p <- ggplot(data, aes(x = logHR, y = label)) +
    geom_errorbarh(aes(xmin = logCI_low, xmax = logCI_up),
                   height = 0.2, linewidth = 0.8, color = "grey30") +
    geom_point(aes(color = P), size = 3.5) +
    geom_vline(xintercept = 0, linetype = "dashed", color = "grey50", linewidth = 0.5) +
    scale_color_gradientn(colors = c("#B2182B", "#EF8A62", "#FDDBC7", "#D1E5F0", "#2166AC"),
                          name = "P value", trans = "log10") +
    geom_text(aes(x = cap + 1.5, label = HR_text), hjust = 0, size = 2.5) +
    geom_text(aes(x = cap + 4.5, label = CI_text), hjust = 0, size = 2.5, color = "grey30") +
    geom_text(aes(x = cap + 9.5, label = P_text), hjust = 0, size = 2.5, color = "grey30") +
    annotate("text", x = cap + 1.5, y = nrow(data) + 1.5, label = "HR", hjust = 0, size = 2.5, fontface = "bold") +
    annotate("text", x = cap + 4.5, y = nrow(data) + 1.5, label = "95% CI", hjust = 0, size = 2.5, fontface = "bold") +
    annotate("text", x = cap + 9.5, y = nrow(data) + 1.5, label = "P value", hjust = 0, size = 2.5, fontface = "bold") +
    scale_x_continuous(breaks = c(-8,-4,-2,-1,0,1,2,4,8),
                       labels = c("1/256","1/16","1/4","0.5","1","2","4","16","256")) +
    coord_cartesian(xlim = c(-cap, cap + 13), clip = "off") +
    labs(x = "Hazard Ratio (log2 scale)", y = "", title = title) +
    theme_light() + theme(
      axis.text.y = element_text(size = 7), axis.text.x = element_text(size = 7),
      plot.title = element_text(size = 10, hjust = 0.5, face = "bold"),
      legend.position = "bottom", plot.margin = margin(5, 40, 5, 5, "mm"))

  plot_h <- max(4, nrow(data) * 0.4 + 1.5)
  ggsave(paste0("04-figs/", filename, ".png"), p, width = 12, height = plot_h, units = "in", dpi = 300)
  ggsave(paste0("04-figs/", filename, ".pdf"), p, width = 12, height = plot_h)
}

# 07a: CIBERSORT
cb_surv <- surv_tme[grepl("CIBERSORT", surv_tme$ID), ]
make_forest(cb_surv, "CIBERSORT Survival Forest", "Fig07a-forest_cibersort")

# 07b: Other TME methods
other_surv <- surv_tme[!grepl("CIBERSORT", surv_tme$ID), ]
make_forest(other_surv, "Other TME Methods Survival Forest", "Fig07b-forest_other_tme")

# 07c: TME signatures (non-GO/KEGG)
gkegg_pattern <- "^HALLMARK_|^KEGG_|^GO_|^REACTOME_"
tme_sig_surv <- surv_sig[!grepl(gkegg_pattern, surv_sig$ID), ]
make_forest(tme_sig_surv, "TME Signatures Survival Forest", "Fig07c-forest_tme_signature")

# 07d: GO/KEGG/Hallmark pathways
gkegg_surv <- surv_sig[grepl(gkegg_pattern, surv_sig$ID), ]
make_forest(gkegg_surv, "Pathway Survival Forest", "Fig07d-forest_go_kegg")

Key notes:

Do NOT use sig_forest() — it only has 3 fixed axis breaks (0.5, 1, 1.5) and breaks with extreme HR values. Custom ggplot2 forest plot handles all cases.
Custom forest plot features: log2 scale with 9 breaks, text annotations for HR/CI/P, capped error bars for extreme values, dynamic plot height.
Split into 4 groups (07a-d) for readability: CIBERSORT, Other TME, TME Signatures, GO/KEGG/Hallmark.

Fig 08: KM Plots — p<0.005 Significant Variables

# sig_surv_plot() saves its own output — do NOT wrap with ggsave()
# Create kmplot subdirectory first
dir.create("04-figs/kmplot", showWarnings = FALSE, recursive = TRUE)

# Merge TME + signature survival results
surv_all <- rbind(surv_tme, surv_sig)
surv_sorted <- surv_all[order(surv_all$P), ]
sig_vars <- surv_sorted$ID[surv_sorted$P < 0.005]
if (length(sig_vars) == 0) sig_vars <- surv_sorted$ID[surv_sorted$P < 0.01]

surv_data <- read.csv("04-figs/data/01-tme_pdata_merged.csv")

for (i in seq_along(sig_vars)) {
  sig_surv_plot(input_pdata = surv_data, signature = sig_vars[i],
                time = time_col, status = status_col, time_type = "month",
                palette = "jama", mini_sig = "score",
                fig.type = "pdf", save_path = "04-figs/kmplot",
                project = paste0("KM_", i), index = i)
}

Key notes:

sig_surv_plot() / roc_time() save their own output — Do NOT wrap with ggsave(). Use their save_path / path and fig.type / fig_type parameters.
KM plots saved to 04-figs/kmplot/ subfolder — each variable generates multiple files (best-cutoff, 3-group, 2-group KM plots + RData).
time_type: "day", "month", or "year"
mini_sig: "score" (continuous, auto-split by median) or "category" (already grouped)

Fig 09: TME Subtype Boxplots — Top Feature Per Subtype, Patchwork Composite

For each TME subtype, find the CIBERSORT cell type with highest mean (scaled), then create one sig_box per subtype and combine with patchwork.

# Read CIBERSORT columns from original file
cb_cols <- setdiff(colnames(tme_cb), "ID")
cb_cols <- cb_cols[!grepl(paste(qc_patterns, collapse = "|"), cb_cols)]
cb_sc <- intersect(cb_cols, colnames(tme_sc))
if (length(cb_sc) == 0) cb_sc <- intersect(paste0(cb_cols, ".x"), colnames(tme_sc))

# For each subtype, find top cell (highest scaled mean)
subtypes <- sort(unique(tme_sc$TME_subtype))
subtype_top_cells <- character(0)
for (st in subtypes) {
  st_data <- tme_sc[tme_sc$TME_subtype == st, cb_sc]
  cell_means <- colMeans(st_data, na.rm = TRUE)
  subtype_top_cells[st] <- names(sort(cell_means, decreasing = TRUE))[1]
}

# One sig_box per subtype, patchwork composite
box09_list <- lapply(subtypes, function(st) {
  col <- subtype_top_cells[st]
  clean_name <- sub("\\.[xy]$", "", col)
  sig_box(data = tme_sc, signature = col,
          variable = "TME_subtype", palette = "paired1",
          size_of_font = 6, size_of_pvalue = 4) +
    ggtitle(paste0(st, ": ", clean_name)) +
    theme(axis.title = element_text(size = 6),
          axis.text.y = element_text(size = 5),
          plot.title = element_text(size = 7),
          plot.margin = margin(12, 5, 5, 5))
})
p09 <- wrap_plots(box09_list, ncol = length(subtypes))

ggsave("04-figs/Fig09-subtype_boxplot.png", p09,
       width = 50 * length(subtypes), height = 100, units = "mm", dpi = 300)
ggsave("04-figs/Fig09-subtype_boxplot.pdf", p09,
       width = 50 * length(subtypes), height = 100, units = "mm")

Note: Uses CIBERSORT columns from scaled data (Phase 4 output). Each subtype's top cell is the one with highest mean z-score in that subtype — this is the most representative feature for that cluster.

Clean up stray Rplots.pdf (generated by some IOBR internal plot calls)

Suppress with `pdf(NULL)` at top of visualization script

if (file.exists("Rplots.pdf")) file.remove("Rplots.pdf")

Additional Visualization Functions

Use these as needed based on analysis requirements:

Function	Purpose	Output
`roc_time()`	Time-dependent ROC with AUC at multiple time points	Figure (self-saving)
`sig_roc()`	Multiple ROC curves for binary response prediction	Figure
`batch_sig_surv_plot()`	Batch KM curves for multiple signatures across projects	Figures (self-saving)
`surv_group()`	KM curve with risk table (High vs Low)	Figure
`subgroup_survival()`	Subgroup Cox analysis by covariates	Statistics table
`get_cor()`	Single pairwise correlation with scatter plot + regression line	Figure
`iobr_cor_plot()`	Batch correlation visualization (signature vs signature/phenotype)	Figure
`iobr_pca()`	PCA dimensionality reduction with scatter plot	Figure
`iobr_deg()`	DEG with built-in volcano plot and heatmap output	Statistics + figures
`sig_gsea()`	Gene Set Enrichment Analysis via fgsea, with built-in plots	Statistics + figures
`LR_cal()`	Ligand-receptor interaction analysis	Statistics + figures

# Time-dependent ROC — saves its own output
roc_time(input = tme_pdata, vars = c("T_cells_CD8_CIBERSORT", "TMEscoreA_CIR"),
         time = "OS_time", status = "OS_status",
         time_point = c(12, 36, 60), time_type = "month", palette = "jama",
         path = "04-figs", fig.type = "png", index = 1)

# PCA visualization
iobr_pca(data = eset, pdata = pdata, group = "TME_subtype", scale = TRUE)

# Single correlation scatter plot
get_cor(data = tme_pdata, variable1 = "T_cells_CD8_CIBERSORT",
        variable2 = "CD_8_T_effector", method = "spearman")

IOBR Palettes

Categorical (palette_group parameter):

Name	Best For
`"jama"`	Default, 2–5 groups
`"jco"`	Clinical oncology
`"nrc"`	Nature publications
`"aaas"`	Science journal
`"accent"`	Colorblind-friendly
`"set2"`	Many groups

Heatmap (palette parameter, numeric 1–6):

#	Style
2	Blue-white-red diverging (default)
1	Classic pheatmap
3	Purple-orange
4–6	Sequential gradients

Cell bar plot (cell_bar_plot() palette parameter, numeric 1–4):

#	Style
4	Recommended — rich random palette with many distinct colors
1–3	Alternative random palettes

If unsure which palette, present 3 options to the user and ask.

User Decision Points

Pause and ask at:

Phase 0: Review and approve analysis plan (IOBR-pipeline.md)
Phase 1: Data type (Count/TPM/Array) and species (hsa/mmu)
Phase 2: Which deconvolution method(s)
Phase 3: Scoring method and which signature collection
Phase 5: Color palette preference

05-note Template: `IOBR-analysis-README.md`

Generate a tree diagram with brief descriptions:

project/
├── 01-script/
│   ├── 01-data_preprocessing.R      # QC, normalization, annotation
│   ├── 02-tme_deconvolution.R       # TME cell type deconvolution
│   ├── 03-signature_analysis.R      # Signature scoring
│   ├── 04-statistical_analysis.R    # Statistics, clustering, survival
│   └── 05-visualization.R           # All figures
├── 02-input/
│   ├── annotated_eset.csv           # Normalized expression matrix
│   └── pdata.csv                    # Phenotype data
├── 03-tme/
│   ├── tme_cibersort.csv            # CIBERSORT results
│   ├── tme_mcpcounter.csv           # MCPcounter results
│   ├── sig_score_tme.csv            # ssGSEA scores
│   └── tme_sig_combine.csv          # All merged results
├── 04-figs/
│   ├── Fig01-barplot_cibersort.png/pdf
│   ├── Fig02-barplot_epic.png/pdf
│   ├── Fig03-heatmap_cibersort_subtype.png/pdf   (pheatmap)
│   ├── Fig04-heatmap_mcpcounter_subtype.png/pdf  (pheatmap)
│   ├── Fig05-cor_matrix.png/pdf
│   ├── Fig06-top10_wilcoxon_boxplot.png/pdf
│   ├── Fig07a-forest_cibersort.png/pdf
│   ├── Fig07b-forest_other_tme.png/pdf
│   ├── Fig07c-forest_tme_signature.png/pdf
│   ├── Fig07d-forest_go_kegg.png/pdf
│   ├── kmplot/                        # KM survival plots (auto-saved by sig_surv_plot)
│   └── Fig09-subtype_boxplot.png/pdf
│   └── data/                        # Statistical result tables
│       ├── 01-tme_pdata_merged.csv
│       ├── 02-tme_scaled.csv
│       ├── 03-tme_subtype.csv
│       ├── 04-wilcoxon_results.csv
│       ├── 05-surv_results.csv
│       └── 06-cor_matrix.csv
├── 05-note/
│   ├── IOBR-pipeline.md             # Analysis plan
│   ├── pdata_summary.csv            # Variable summary
│   └── IOBR-analysis-README.md      # This file
└── 06-log/
    ├── 01-data_preprocessing.log
    └── ...

Include a brief summary of key findings below the tree.

06-log Directory

One log file per script, named to match: 01-data_preprocessing.log, 02-tme_deconvolution.log, ...
Each log MUST begin with the IOBR citation header (same as scripts).

Capture logs by running scripts with redirection:

{ echo "# IOBR Citation:"; echo "# Zeng DQ et al., Cell Reports Methods, 2024"; echo "# https://doi.org/10.1016/j.crmeth.2024.100910"; echo ""; Rscript 01-script/01-data_preprocessing.R; } > 06-log/01-data_preprocessing.log 2>&1

Quick Start Checklist

Check IOBR installation in base environment
Create project directory structure (01-script through 06-log + 04-figs/data)
Confirm data source, type, and species with user
Generate 05-note/IOBR-pipeline.md — present to user for approval
Write 01-data_preprocessing.R (with citation header), run, capture log
Verify sample matching (eset vs pdata), generate 05-note/pdata_summary.csv
Ask user to choose deconvolution method(s)
Write 02-tme_deconvolution.R, run, capture log
Ask user about scoring method and gene sets
Write 03-signature_analysis.R, run, capture log
Write 04-statistical_analysis.R — merge, scale, cluster, test, correlate, save to 04-figs/data/
Ask user about palette preferences
Write 05-visualization.R — follow strict Fig01–09 ordering
Generate 05-note/IOBR-analysis-README.md
Verify all 06-log files have citation headers

References (in this skill directory)

references/functions.md — IOBR function parameter reference
references/palettes.md — Color palette selection guide
references/iobr_built_in_data.md — Complete catalog of built-in signatures, annotations, and datasets
references/iobr_pipeline_template.R — Full production pipeline template (9-method deconvolution + signature + pathway scoring)

Read these when you need specific parameter details, palette options, want to know what signatures are available, or need a complete pipeline template to adapt.

IOBRskill

Resources

Install

IOBRskill — Bulk Transcriptome Tumor Microenvironment Analysis

IOBR Citation (MUST include in every R script and log)

Prerequisites

Environment Check (first step)

Required Dependencies

IOBR Built-in Data

On-demand pathway data via load_data()

Project Directory Structure

Analysis Pipeline

Phase 0: Plan Generation

Phase 1: Data Preprocessing and Annotation

Confirm with the user:

Sample matching (if both eset and pdata exist):

pdata variable summary:

Key functions:

Phase 2: TME Deconvolution

Single-method usage:

Full 9-method deconvolution + merge (production workflow):

Critical caveats:

Phase 3: Signature Scoring and Pathway Analysis

Scoring methods (ask user to choose):

Step 3a: TME / IO biomarker signature scoring:

Step 3b: Pathway scoring (Hallmark + GO + KEGG + Reactome):

Step 3c: Merge all results into master table:

Key notes:

TME subtype clustering:

Phase 4: Statistical Analysis

4a. Data Preparation

4b. TME Subtyping

4c. Statistical Testing (conditional)

4d. Correlation

4e. Other Statistical Functions

Phase 5: Visualization

Fig 01–02: TME Composition Barplots (CIBERSORT + EPIC)

Fig 03–04: TME Heatmaps by Subtype (pheatmap)

Fig 05: TME Correlation Matrix Heatmap

Fig 06: Top 10 Wilcoxon Boxplots — 2×5 patchwork Composite

Fig 07a-d: Forest Plots — Grouped by Method (Custom ggplot2)

Fig 08: KM Plots — p<0.005 Significant Variables

Fig 09: TME Subtype Boxplots — Top Feature Per Subtype, Patchwork Composite

Clean up stray Rplots.pdf (generated by some IOBR internal plot calls)

Suppress with pdf(NULL) at top of visualization script

Additional Visualization Functions

IOBR Palettes

User Decision Points

05-note Template: IOBR-analysis-README.md

06-log Directory

Quick Start Checklist

References (in this skill directory)

Categories

Install

Recommended Skills

On-demand pathway data via `load_data()`

Suppress with `pdf(NULL)` at top of visualization script

05-note Template: `IOBR-analysis-README.md`