End-to-end bulk RNA-seq / microarray tumor microenvironment (TME) analysis using the IOBR R package. Covers data preprocessing, gene annotation, TME deconvolution (CIBERSORT/MCPcounter/EPIC/xCell/ESTIMATE/TIMER/quanTIseq/IPS), signature scoring (PCA/ssGSEA/Z-score), ligand-receptor analysis, pathway enrichment, survival modeling, and Nature-style visualization. Use this skill when the user mentions: tumor microenvironment analysis, TME deconvolution, immune infiltration analysis, 肿瘤微环境分析, 肿瘤微环境解析, 免疫浸润分析, ligand-receptor analysis, 受体配体分析, IOBR, immune cell deconvolution, CIBERSORT, MCPcounter, ESTIMATE, or any bulk RNA-seq immuno-oncology workflow. Also trigger on /IOBRskill.
Resources
5Install
npx skillscat add iobr/iobrskill Install via the SkillsCat registry.
IOBRskill — Bulk Transcriptome Tumor Microenvironment Analysis
IOBR Citation (MUST include in every R script and log)
Every R script generated by this skill MUST begin with this citation header:
###############################################################################
# IOBR Citation:
# Zeng DQ, Fang YR, … , Yu GC, Liao WJ.
# Enhancing Immuno-Oncology Investigations Through Multidimensional Decoding of
# Tumour Microenvironment with IOBR 2.0, Cell Reports Methods, 2024
# https://doi.org/10.1016/j.crmeth.2024.100910
###############################################################################The same citation header MUST appear at the top of every log file in 06-log/.
Prerequisites
Environment Check (first step)
Verify IOBR is installed in the base conda environment:
source /home/bio/miniconda3/etc/profile.d/conda.sh
conda activate base
Rscript -e 'packageVersion("IOBR")'If IOBR is not installed:
if (!requireNamespace("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("IOBR/IOBR")Required Dependencies
IOBR relies on several packages that are NOT auto-installed. Install these before starting:
# For CIBERSORT
BiocManager::install("preprocessCore")
# For quanTIseq deconvolution
install.packages("limSolve", repos = "https://cloud.r-project.org")
# For tme_cluster()
install.packages("NbClust", repos = "https://cloud.r-project.org")
# For iobr_pca()
install.packages("FactoMineR", repos = "https://cloud.r-project.org")
# For sig_heatmap() — only if using tidyHeatmap approach (pheatmap preferred)
# install.packages("tidyHeatmap", repos = "https://cloud.r-project.org")IOBR Built-in Data
IOBR ships with rich built-in data — you rarely need external annotation files. For full details, read references/iobr_built_in_data.md. Key resources:
| Resource | Access | What It Contains |
|---|---|---|
| Gene length / ID mapping | count2tpm() auto-loads |
111 genes with HUGO, ENTREZID, ENSEMBL |
| Microarray probe sets | extdata/probesets.txt |
176 probes mapped to cell types |
| TME signatures (186) | IOBR:::signature_tme |
Immune, stromal, EMT, checkpoint signatures |
| Full collection (323) | data("signature_collection") |
All signatures + Hallmark + Metabolism |
| Signature groupings | data("sig_group") |
35+ categories for filtering signatures |
| Pathway data (on-demand) | IOBR::load_data() |
hallmark (50), go_bp (7658), kegg (186), reactome (1615) |
| Example datasets | data("pdata_stad") etc. |
TCGA-STAD, IMvigor210 toy data |
On-demand pathway data via load_data()
IOBR provides IOBR::load_data() to download and cache pathway gene sets from GitHub on first use. These are essential for comprehensive pathway scoring:
hallmark <- IOBR::load_data("hallmark") # 50 pathways
go_bp <- IOBR::load_data("go_bp") # 7,658 gene sets
go_cc <- IOBR::load_data("go_cc")
go_mf <- IOBR::load_data("go_mf")
kegg <- IOBR::load_data("kegg") # 186 pathways
reactome <- IOBR::load_data("reactome") # 1,615 pathwaysProject Directory Structure
Create this structure before starting analysis:
project/
├── 01-script/ # R scripts (numbered, each with IOBR citation header)
├── 02-input/ # Normalized and annotated expression matrices
├── 03-tme/ # TME deconvolution and signature scoring results
├── 04-figs/
│ ├── *.png/pdf # All figures (PNG 300dpi + PDF, numbered naming)
│ ├── kmplot/ # KM survival plots (auto-saved by sig_surv_plot)
│ └── data/ # Statistical result tables matching figures
├── 05-note/
│ ├── IOBR-pipeline.md # Analysis plan (generated in Phase 0)
│ ├── pdata_summary.csv # Phenotype variable summary
│ └── IOBR-analysis-README.md # Final analysis summary
└── 06-log/ # R execution logs (one per script, with citation header)Naming conventions:
- Scripts:
01-data_preprocessing.R,02-tme_deconvolution.R, ... - Figures:
Fig01-barplot_cibersort.png+Fig01-barplot_cibersort.pdf(always dual format) - Statistical tables:
04-figs/data/01-tme_pdata_merged.csv,04-figs/data/04-wilcoxon_results.csv, ... - Logs:
01-data_preprocessing.log,02-tme_deconvolution.log, ... - Note:
IOBR-analysis-README.md(tree diagram + brief interpretation)
When running a script, capture the log:
Rscript 01-script/01-data_preprocessing.R > 06-log/01-data_preprocessing.log 2>&1Analysis Pipeline
6 sequential phases (Phase 0–5). Pause at decision points to let the user choose.
Phase 0: Plan Generation
Before writing any analysis code, generate an analysis plan document.
After confirming data type / species / deconvolution methods with the user (Phase 1 questions), create 05-note/IOBR-pipeline.md with:
- Project metadata: project name, data source, data type, species
- ASCII tree diagram of all scripts, their inputs/outputs, and expected results:
01-data_preprocessing.R
├─ Input: raw expression matrix + phenotype data
├─ Output: 02-input/annotated_eset.csv, 02-input/pdata.csv
└─ Notes: log2 transformation, gene ID conversion
02-tme_deconvolution.R
├─ Input: 02-input/annotated_eset.csv
├─ Output: 03-tme/tme_cibersort.csv, 03-tme/tme_mcpcounter.csv, ...
└─ Notes: CIBERSORT + MCPcounter + ESTIMATE (linear-scale data)
03-signature_analysis.R
├─ Input: 02-input/annotated_eset.csv
├─ Output: 03-tme/sig_score_tme.csv, 03-tme/sig_score_pathway.RData
└─ Notes: ssGSEA for TME signatures, Hallmark + KEGG pathways
04-statistical_analysis.R
├─ Input: 03-tme/*.csv + 02-input/pdata.csv
├─ Output: 04-figs/data/01-tme_pdata_merged.csv
│ 04-figs/data/02-tme_scaled.csv
│ 04-figs/data/03-tme_subtype.csv
│ 04-figs/data/04-wilcoxon_results.csv
│ 04-figs/data/05-surv_results.csv
│ 04-figs/data/06-cor_matrix.csv
└─ Notes: merge → scale → cluster → wilcoxon → survival → correlation
05-visualization.R
├─ Input: 04-figs/data/*.csv + 03-tme/*.csv
├─ Output: 04-figs/Fig01-09 (PNG + PDF)
└─ Notes: barplots → heatmaps → correlation → boxplots → forest → KM- Present to user: "开始分析" or "调整计划"
- Only after approval, begin Phase 1
Phase 1: Data Preprocessing and Annotation
Script: 01-data_preprocessing.R
Confirm with the user:
- Data source: User matrix, GEO (GSE#####), or TCGA
- Data type:
"Count"(raw RNA-seq),"Tpm"(already normalized), or"Array"(microarray) - Species:
"hsa"(human) or"mmu"(mouse)
Sample matching (if both eset and pdata exist):
Before any analysis, verify sample overlap between expression matrix and phenotype data:
# Report sample counts
eset_samples <- colnames(eset)
pdata_samples <- pdata$ID # or rownames(pdata)
intersection <- intersect(eset_samples, pdata_samples)
cat("Expression matrix samples:", length(eset_samples), "\n")
cat("Phenotype data samples:", length(pdata_samples), "\n")
cat("Matched samples:", length(intersection), "\n")
# Ask user: use all samples, intersection only, or check data?
# Then filter both to agreed sample set
eset <- eset[, intersection]
pdata <- pdata[pdata$ID %in% intersection, ]pdata variable summary:
After loading pdata, generate a summary report:
# Generate pdata_summary.csv
pdata_summary <- data.frame(
var = character(),
type = character(),
n_levels = integer(),
levels = character(),
stringsAsFactors = FALSE
)
for (col in colnames(pdata)) {
vals <- pdata[[col]]
if (is.numeric(vals)) {
tp <- ifelse(length(unique(vals)) <= 2, "binary", "numeric")
lvls <- paste(range(vals, na.rm = TRUE), collapse = " - ")
n_lv <- length(unique(vals))
} else {
tp <- "factor"
lvls <- paste(unique(vals)[1:min(10, length(unique(vals)))], collapse = ", ")
n_lv <- length(unique(vals))
}
pdata_summary <- rbind(pdata_summary, data.frame(
var = col, type = tp, n_levels = n_lv, levels = lvls
))
}
write.csv(pdata_summary, file = "05-note/pdata_summary.csv", row.names = FALSE)
cat(">>>>> pdata summary saved to 05-note/pdata_summary.csv\n")This summary helps decide which statistical tests to apply in Phase 4.
Key functions:
| Function | Purpose | When to Use |
|---|---|---|
count2tpm() |
Count → TPM + gene ID mapping | Raw RNA-seq counts |
log2eset() |
Auto-detect and apply log2 transform | Ensures log2 scale for downstream analysis |
anno_eset() |
Probe → gene symbol annotation | Microarray data or unannotated matrices |
remove_batcheffect() |
ComBat / ComBat-seq batch correction | Multi-batch datasets |
find_outlier_samples() |
WGCNA-based outlier detection | QC step |
iobr_pca() |
PCA visualization for batch assessment | QC step |
mouse2human_eset() |
Mouse → human gene ortholog conversion | Mouse data |
# RNA-seq: counts → TPM
eset <- count2tpm(countMat = expr_matrix, idType = "Ensembl", org = "hsa")
# Microarray: probe annotation
eset <- anno_eset(eset = expr_matrix, annotation = anno_df,
symbol = "gene_symbol", probe = "probe_id", method = "mean")
# Ensure log2 scale (auto-detects — only transforms if not already log2)
eset <- log2eset(eset)Save output:
write.csv(eset, file = "02-input/annotated_eset.csv")
write.csv(pdata, file = "02-input/pdata.csv")Phase 2: TME Deconvolution
Script: 02-tme_deconvolution.R
Ask the user which method(s) to use.
| Method | Cell Types | Best For |
|---|---|---|
"cibersort" |
22 (LM22) | Gold standard, comprehensive |
"cibersort_abs" |
22 (LM22) | CIBERSORT absolute mode (fraction-independent) |
"mcpcounter" |
8 populations | Quick, stromal + immune |
"epic" |
8 incl. cancer cells | Tumor purity |
"xcell" |
64 types | Broadest coverage (slow) |
"estimate" |
4 scores | Stromal/Immune/Tumor purity |
"timer" |
6 types | TCGA cancer-specific |
"quantiseq" |
10 incl. M1/M2 | Macrophage polarization |
"ips" |
4 axes | Immunotherapy response |
Recommended: CIBERSORT + MCPcounter + ESTIMATE (most cited). For comprehensive analysis, run all 9 methods and merge.
Single-method usage:
tme_result <- deconvo_tme(eset = eset, method = "cibersort", arrays = FALSE, perm = 1000)
write.csv(tme_result, file = "03-tme/tme_cibersort.csv")Full 9-method deconvolution + merge (production workflow):
# Each method may take 1-5 minutes depending on sample size
# Use tryCatch for each method — some may fail on certain datasets
cibersort <- tryCatch(deconvo_tme(eset = eset, method = "cibersort",
arrays = FALSE, perm = 1000), error = function(e) NULL)
cibersort_abs <- tryCatch(deconvo_tme(eset = eset, method = "cibersort_abs",
arrays = FALSE, perm = 1000), error = function(e) NULL)
epic <- tryCatch(deconvo_tme(eset = eset, method = "epic",
arrays = FALSE), error = function(e) NULL)
mcp <- tryCatch(deconvo_tme(eset = eset, method = "mcpcounter"), error = function(e) NULL)
xcell <- tryCatch(deconvo_tme(eset = eset, method = "xcell",
arrays = FALSE), error = function(e) NULL)
estimate <- tryCatch(deconvo_tme(eset = eset, method = "estimate"), error = function(e) NULL)
timer <- tryCatch(deconvo_tme(eset = eset, method = "timer",
group_list = rep(tumor_type, ncol(eset))), error = function(e) NULL)
quantiseq <- tryCatch(deconvo_tme(eset = eset, method = "quantiseq",
tumor = TRUE, arrays = FALSE, scale_mrna = TRUE), error = function(e) NULL)
ips <- tryCatch(deconvo_tme(eset = eset, method = "ips", plot = FALSE), error = function(e) NULL)
# Merge all results by sample ID using base R merge()
results_list <- list(cibersort, cibersort_abs, mcp, xcell, epic, estimate, quantiseq, timer, ips)
results_list <- results_list[!sapply(results_list, is.null)]
tme_combine <- results_list[[1]]
for (i in seq_along(results_list)[-1]) {
tme_combine <- merge(tme_combine, results_list[[i]], by = "ID")
}
save(tme_combine, file = "03-tme/tme_combine.RData")
write.csv(tme_combine, file = "03-tme/tme_combine.csv", row.names = FALSE)Critical caveats:
- All deconvolution methods (except IPS) need linear-scale data — Anti-log first:
eset_linear <- 2^esetif log2-transformed. CIBERSORT, MCPcounter, ESTIMATE, EPIC, quanTIseq, xCell all warn "Data values appear small (<50)" when fed log2 data. Only IPS works with log2 data directly. - Column names have method suffix —
T_cells_CD8_CIBERSORT,T_cells_MCPcounter, etc. - Output CSV has
IDcolumn — Set rownames:rownames(df) <- df$ID; df$ID <- NULL. arrays = TRUEfor microarray;perm = 1000recommended for CIBERSORT.- Required packages —
preprocessCore(CIBERSORT),limSolve(quanTIseq). Install before running. - TIMER requires
group_list— Provide cancer type per sample:rep(tumor_type, ncol(eset)). - quanTIseq parameters — Use
tumor = TRUE, arrays = FALSE, scale_mrna = TRUEfor tumor samples. - Use
tryCatchfor each method — some methods may fail on certain datasets. Wrap each call to avoid losing partial results.
Phase 3: Signature Scoring and Pathway Analysis
Script: 03-signature_analysis.R
Scoring methods (ask user to choose):
| Method | Best For |
|---|---|
"ssgsea" |
Default, robust |
"pca" |
Continuous scores, large gene sets |
"zscore" |
Simple, fast |
"integration" |
Combined approach |
Step 3a: TME / IO biomarker signature scoring:
# Option A: TME signatures (186 sets, internal — use :::)
signature_tme <- IOBR:::signature_tme
sig_tme <- calculate_sig_score(pdata = NULL, eset = eset,
signature = signature_tme,
method = "ssgsea",
mini_gene_count = 3,
adjust_eset = TRUE)
write.csv(sig_tme, file = "03-tme/sig_score_tme.csv")
# Option B: Full collection (323 sets) — for comprehensive analysis
data("signature_collection")
sig_res <- calculate_sig_score(pdata = NULL, eset = eset,
signature = signature_collection,
method = "pca",
mini_gene_count = 2,
adjust_eset = TRUE)
write.csv(sig_res, file = "03-tme/sig_score_collection.csv")Step 3b: Pathway scoring (Hallmark + GO + KEGG + Reactome):
hallmark <- IOBR::load_data("hallmark") # 50 pathways
go_bp <- IOBR::load_data("go_bp") # 7,658 gene sets
go_cc <- IOBR::load_data("go_cc")
go_mf <- IOBR::load_data("go_mf")
kegg <- IOBR::load_data("kegg") # 186 pathways
reactome <- IOBR::load_data("reactome") # 1,615 pathways
sig_pathway <- calculate_sig_score(pdata = NULL, eset = eset,
signature = c(hallmark, go_bp, go_cc, go_mf, kegg, reactome),
method = "ssgsea",
mini_gene_count = 3)
write.csv(sig_pathway, file = "03-tme/sig_score_pathway.csv")Step 3c: Merge all results into master table:
tme_sig_combine <- merge(tme_combine, sig_res, by = "ID")
tme_sig_combine <- merge(tme_sig_combine, sig_pathway, by = "ID")
write.csv(tme_sig_combine, file = "03-tme/tme_sig_combine.csv", row.names = FALSE)Key notes:
pdata = NULL, adjust_eset = TRUE— This is the correct production pattern forcalculate_sig_score(). Do NOT setadjust_eset = FALSEunless data is already properly scaled.- Method names are lowercase — Use
"pca","ssgsea","zscore","integration"(NOT uppercase"PCA"). signature_tmeis internal — Access viaIOBR:::signature_tme(triple colon).signature_collectionneedsdata()— Calldata("signature_collection")before use.IOBR::load_data()downloads on first use — Caches locally for subsequent calls.- Pathway scoring is computationally heavy — Consider running Hallmark (50) and KEGG (186) first for quick results, then GO/Reactome for comprehensive analysis.
- Use base R
merge()for joining — Not all environments havetidyverse. Prefermerge(df1, df2, by = "ID")over%>% inner_join(). tme_cluster()requires NbClust — Install:install.packages("NbClust").
Use references/iobr_built_in_data.md for the full signature catalog and data("sig_group") for filtering by category.
TME subtype clustering:
tme_clusters <- tme_cluster(input = tme_result, features = cell_types,
id = "sample_id", method = "kmeans")Phase 4: Statistical Analysis
Script: 04-statistical_analysis.R
This phase produces structured output: every statistical result is saved to 04-figs/data/ as a numbered CSV file for traceability and reuse in Phase 5 visualization.
4a. Data Preparation
Merge TME results with pdata, then scale all TME cell columns:
# Load data
pdata <- read.csv("02-input/pdata.csv")
tme <- read.csv("03-tme/tme_combine.csv")
# Merge pdata with TME results
tme_pdata <- merge(pdata, tme, by = "ID")
write.csv(tme_pdata, file = "04-figs/data/01-tme_pdata_merged.csv", row.names = FALSE)
# Collect TME cell columns — MUST match actual column names in merged data
# After merge, cibersort + cibersort_abs shared names get .x/.y suffixes
# So match by checking actual colname OR colname-without-suffix against original list
qc_patterns <- c("^P[.]", "^Correlation", "^RMSE")
all_tme_cell_cols <- colnames(tme_pdata)[
colnames(tme_pdata) %in% original_cell_cols |
gsub("\\.[xy]$", "", colnames(tme_pdata)) %in% original_cell_cols
]
all_tme_cell_cols <- setdiff(all_tme_cell_cols, "ID")
# Z-score scale all TME cell columns (per column = per variable)
tme_scaled <- tme_pdata
tme_scaled[, all_tme_cell_cols] <- scale(tme_scaled[, all_tme_cell_cols])
write.csv(tme_scaled, file = "04-figs/data/02-tme_scaled.csv", row.names = FALSE)4b. TME Subtyping
Cluster samples by TME cell composition using CIBERSORT fractions:
# Extract CIBERSORT cell columns for clustering
cb_cols <- grep("_CIBERSORT$", colnames(tme_pdata), value = TRUE)
cb_cols <- setdiff(cb_cols, grep("P[.]value|Correlation|RMSE", cb_cols, value = TRUE))
# K-means clustering on scaled CIBERSORT data
set.seed(123)
tme_subtype <- tme_cluster(input = tme_scaled, features = cb_cols,
id = "ID", method = "kmeans", max.nc = 6)
# tme_cluster returns data.frame with columns: ID, cluster, <cell_cols>
# Extract cluster assignments and label as TME subtypes
subtype_df <- data.frame(ID = tme_subtype$ID, TME_subtype = paste0("TME", tme_subtype$cluster))
write.csv(subtype_df, file = "04-figs/data/03-tme_subtype.csv", row.names = FALSE)
# Find representative TME variables per subtype
markers <- find_markers_in_bulk(pdata = tme_scaled, eset = tme_scaled,
group = "TME_subtype", nfeatures = 50)
write.csv(markers, file = "04-figs/data/03-tme_subtype_markers.csv", row.names = FALSE)4c. Statistical Testing (conditional)
Run statistical tests based on what's available in pdata:
# --- Wilcoxon test (IF categorical variables exist in pdata) ---
# Check pdata_summary.csv for factor/binary variables
pdata_summary <- read.csv("05-note/pdata_summary.csv")
cat_vars <- pdata_summary$var[pdata_summary$type %in% c("factor", "binary")]
if (length(cat_vars) > 0) {
for (v in cat_vars) {
lvls <- unique(tme_pdata[[v]])
lvls <- lvls[!is.na(lvls)]
if (length(lvls) == 2) {
wilcox_res <- batch_wilcoxon(data = tme_pdata, target = v, feature = cell_cols)
write.csv(wilcox_res,
file = paste0("04-figs/data/04-wilcoxon_", v, ".csv"),
row.names = FALSE)
}
}
}
# --- Survival analysis (IF survival data exists) ---
# Check for time + status columns
time_cols <- grep("time|Time|OS_time|PFS_time", colnames(tme_pdata), value = TRUE)
status_cols <- grep("status|Status|OS_status|PFS_status", colnames(tme_pdata), value = TRUE)
if (length(time_cols) > 0 && length(status_cols) > 0) {
time_col <- time_cols[1]
status_col <- status_cols[1]
# Batch Cox regression for all TME cell columns
surv_res <- batch_surv(pdata = tme_pdata, variable = cell_cols,
time = time_col, status = status_col)
write.csv(surv_res, file = "04-figs/data/05-surv_results.csv", row.names = FALSE)
# Per-method survival (if multiple deconvolution methods were run)
methods_run <- unique(gsub(".*_", "", cell_cols))
for (m in methods_run) {
m_cols <- grep(paste0("_", m, "$"), cell_cols, value = TRUE)
if (length(m_cols) > 0) {
m_surv <- batch_surv(pdata = tme_pdata, variable = m_cols,
time = time_col, status = status_col)
write.csv(m_surv,
file = paste0("04-figs/data/05-surv_", m, ".csv"),
row.names = FALSE)
}
}
}Key notes for survival:
batch_surv()requires merged pdata — All variables must be columns inpdata. Usemerge()first.batch_surv()parameter isvariable— A vector of column names, NOT a separate data frame.batch_surv()output columns:P,HR,CI_low_0.95,CI_up_0.95,ID.
4d. Correlation
Correlation matrix of all TME variables:
# batch_cor(data, target, feature, method) — correlates one target vs multiple features
# NOTE: batch_cor() returns target-vs-each-feature, NOT a pairwise matrix.
# For a full pairwise correlation matrix, use base R cor() instead (see below).
# Key immune cells as target, all TME cells as features
key_cells <- c("T_cells_CD8_CIBERSORT", "Macrophages_M1_CIBERSORT")
cor_all <- data.frame()
for (tc in key_cells) {
cor_res <- batch_cor(data = tme_pdata, target = tc, feature = cell_cols, method = "spearman")
cor_res$target <- tc
cor_all <- rbind(cor_all, cor_res)
}
write.csv(cor_all, file = "04-figs/data/06-cor_matrix.csv", row.names = FALSE)
# For full pairwise correlation matrix (for heatmap visualization), use base R:
cor_mat <- cor(tme_pdata[, cell_cols], method = "spearman", use = "pairwise.complete.obs")
save(cor_mat, file = "04-figs/data/06-cor_matrix_full.RData")4e. Other Statistical Functions
| Function | Purpose | Output |
|---|---|---|
batch_pcc() |
Batch partial correlation controlling for a third variable | Statistics table |
iobr_deg() |
DEG using DESeq2 or limma, with built-in volcano plot and heatmap | Statistics + figures |
sig_gsea() |
Gene Set Enrichment Analysis via fgsea, with built-in plots | Statistics + figures |
subgroup_survival() |
Subgroup Cox analysis by covariates | Statistics table |
LR_cal() |
Ligand-receptor interaction analysis | Statistics + figures |
find_mutations() |
Mutation-TME interaction (TCGA only) | Statistics table |
feature_manipulation() |
Clean features: remove NA, outliers, zero-variance | Cleaned data |
Phase 5: Visualization
Script: 05-visualization.R
ALL TME data MUST be z-score scaled before visualization. Statistical result tables are in 04-figs/data/.
Every figure MUST:
- Use IOBR's built-in visualization functions whenever available
- Export as both PNG (300dpi) + PDF
- Follow the strict ordering below
IMPORTANT: Add pdf(NULL) at the top of the visualization script to suppress stray Rplots.pdf from IOBR internal plot calls. Also clean up at the end: if (file.exists("Rplots.pdf")) file.remove("Rplots.pdf").
Fig 01–02: TME Composition Barplots (CIBERSORT + EPIC)
IMPORTANT: Only methods that produce cell FRACTION data (CIBERSORT, EPIC) should use cell_bar_plot(). ESTIMATE scores (stromal_score, immune_score, etc.) are NOT fractions and should NOT be plotted as barplots.
tme_cb <- read.csv("03-tme/tme_cibersort.csv")
tme_ep <- read.csv("03-tme/tme_epic.csv")
# Filter QC columns
qc_cols <- grep("^[Pp][.]", colnames(tme_cb), value = TRUE)
qc_cols <- c(qc_cols, grep("^Correlation", colnames(tme_cb), value = TRUE),
grep("^RMSE", colnames(tme_cb), value = TRUE))
cell_cols_cb <- setdiff(grep("_CIBERSORT$", colnames(tme_cb), value = TRUE), qc_cols)
# CIBERSORT barplot
p1 <- cell_bar_plot(input = tme_cb, id = "ID", features = cell_cols_cb,
title = "CIBERSORT TME Composition",
coord_flip = TRUE, palette = 4, legend.position = "right")
p1 <- p1 + guides(fill = guide_legend(ncol = 1)) +
theme(legend.text = element_text(size = 5),
legend.key.size = unit(2.5, "mm"),
legend.title = element_blank(),
plot.title = element_text(size = 10, hjust = 0.5))
if (nrow(tme_cb) > 50) {
p1 <- p1 + theme(axis.text.y = element_blank(), axis.ticks.y = element_blank())
}
ggsave("04-figs/Fig01-barplot_cibersort.png", p1, width = 180, height = 250, units = "mm", dpi = 300)
ggsave("04-figs/Fig01-barplot_cibersort.pdf", p1, width = 180, height = 250, units = "mm")
# Repeat for EPIC with EPIC-specific QC filtering
epic_cols <- grep("_EPIC$", colnames(tme_ep), value = TRUE)
p2 <- cell_bar_plot(input = tme_ep, id = "ID", features = epic_cols,
title = "EPIC TME Composition",
coord_flip = TRUE, palette = 4, legend.position = "right")
p2 <- p2 + guides(fill = guide_legend(ncol = 1)) +
theme(legend.text = element_text(size = 5),
legend.key.size = unit(2.5, "mm"),
legend.title = element_blank(),
plot.title = element_text(size = 10, hjust = 0.5))
ggsave("04-figs/Fig02-barplot_epic.png", p2, width = 180, height = 250, units = "mm", dpi = 300)
ggsave("04-figs/Fig02-barplot_epic.pdf", p2, width = 180, height = 250, units = "mm")Fig 03–04: TME Heatmaps by Subtype (pheatmap)
Do NOT use sig_heatmap() — it depends on tidyHeatmap, only outputs PDF (no PNG), and has .x/.y column suffix issues. Use pheatmap instead for dual PNG+PDF output with full control.
Heatmap dimension rules (MUST follow):
| Element | Position | Parameter |
|---|---|---|
| Variables (cell types) | Rows (y-axis) | rownames(mat) — shows readable names |
| Samples | Columns (x-axis) | colnames(mat) — hidden (show_colnames=FALSE) |
| Subtype color bar | Top (column annotation) | annotation_col |
| Subtype gaps | Column gaps | gaps_col |
| Clustering | Rows only (cell types) | cluster_rows=TRUE, cluster_cols=FALSE |
The input data frame has samples as rows and variables as columns. Must transpose before pheatmap: mat <- t(as.matrix(data[, cols])).
plot_subtype_heatmap <- function(data, cols, subtype_col, title, prefix,
width = 8, height = 6) {
data <- data[order(data[[subtype_col]]), ]
mat <- t(as.matrix(data[, cols])) # TRANSPOSE: variables -> rows, samples -> columns
mat[mat > 2] <- 2; mat[mat < -2] <- -2
rownames(mat) <- sub("\\.[xy]$", "", rownames(mat))
colnames(mat) <- data$ID
ann_col <- data.frame(Subtype = data[[subtype_col]], row.names = data$ID)
subtypes <- sort(unique(data[[subtype_col]]))
sub_colors <- c("#E64B35", "#4DBBD5", "#00A087", "#3C5488", "#F39B7F", "#8491B6")
ann_colors <- list(Subtype = setNames(sub_colors[seq_along(subtypes)], subtypes))
gaps <- cumsum(table(data[[subtype_col]]))[-length(subtypes)]
for (fmt in c("pdf", "png")) {
pheatmap::pheatmap(mat, annotation_col = ann_col, annotation_colors = ann_colors,
color = colorRampPalette(c("#2166AC", "white", "#B2182B"))(100),
show_colnames = FALSE, cluster_rows = TRUE, cluster_cols = FALSE,
treeheight_row = 15, gaps_col = gaps, fontsize_row = 8,
main = title,
filename = paste0("04-figs/", prefix, ".", fmt), width = width, height = height, dpi = 300)
}
}
# Fig 03: CIBERSORT — use only cibersort (not cibersort_abs) columns
hm_data <- tme_scaled[, c("ID", cb_sc, "TME_subtype")]
plot_subtype_heatmap(hm_data, cb_sc, "TME_subtype",
"CIBERSORT TME by Subtype", "Fig03-heatmap_cibersort_subtype")
# Fig 04: MCPcounter — MUST use scaled data, not raw CSV
mcp_sc <- grep("MCPcounter$", colnames(tme_scaled), value = TRUE)
mcp_sc <- setdiff(mcp_sc, "TME_subtype")
mcp_hm <- tme_scaled[, c("ID", mcp_sc, "TME_subtype")]
plot_subtype_heatmap(mcp_hm, mcp_sc, "TME_subtype",
"MCPcounter TME by Subtype", "Fig04-heatmap_mcpcounter_subtype", width = 6, height = 5)pheatmap notes:
- Do NOT re-scale — data is already z-scored per variable in Phase 4. Just cap at ±2.
treeheight_row = 15— shrink clustering tree (default 50 is too tall).- Must transpose — input is samples×variables, pheatmap needs variables as rows for readable y-axis labels.
cluster_cols = FALSE— columns (samples) are ordered by subtype, clustering would destroy ordering.gaps_col(notgaps_row) — since samples are columns, gaps go between column groups.annotation_col(notannotation_row) — subtype annotation applies to samples (columns).- CRITICAL: Do NOT use
grep("CIBERSORT")in merged data — cibersort + cibersort_abs share cell names. Read from original cibersort CSV instead.
Fig 05: TME Correlation Matrix Heatmap
Use pheatmap on the correlation matrix saved in Phase 4. Must deduplicate — cibersort + cibersort_abs share cell names, so .y columns duplicate .x. Keep only .x, drop .y, clean suffix for display.
load("04-figs/data/06-cor_matrix_full.RData")
# Deduplicate: keep .x (cibersort), drop .y (cibersort_abs)
keep <- !grepl("\\.[xy]$", rownames(cor_mat)) | grepl("\\.x$", rownames(cor_mat))
cor_mat <- cor_mat[keep, keep]
rownames(cor_mat) <- sub("\\.x$", "", rownames(cor_mat))
colnames(cor_mat) <- sub("\\.x$", "", colnames(cor_mat))
pheatmap::pheatmap(cor_mat, fontsize = 6, fontsize_row = 5, fontsize_col = 5,
color = colorRampPalette(c("#2166AC", "white", "#B2182B"))(100),
border_color = NA, main = "TME Cell Correlation Matrix")Fig 06: Top 10 Wilcoxon Boxplots — 2×5 patchwork Composite
library(patchwork)
# Load top 10 wilcoxon results
wilcox_res <- read.csv("04-figs/data/04-wilcoxon_results.csv")
wilcox_res <- wilcox_res[order(wilcox_res$pvalue), ]
top10 <- head(wilcox_res, 10)
# Generate individual boxplots
box_list <- lapply(seq_len(nrow(top10)), function(i) {
sig_box(data = tme_pdata, signature = top10$feature[i],
variable = cat_var, palette = "paired1",
size_of_font = 6, size_of_pvalue = 4) +
theme(axis.title = element_text(size = 5),
axis.text.y = element_text(size = 5),
plot.title = element_text(size = 5),
plot.margin = margin(12, 5, 5, 5))
})
# Arrange as 2×5 grid — height 160mm to show statistical annotations
p_combined <- wrap_plots(box_list, ncol = 5, nrow = 2)
ggsave("04-figs/Fig06-top10_wilcoxon_boxplot.png", p_combined,
width = 250, height = 220, units = "mm", dpi = 300)
ggsave("04-figs/Fig06-top10_wilcoxon_boxplot.pdf", p_combined,
width = 250, height = 220, units = "mm")sig_box native parameters for multi-panel layouts:
palette = "paired1"— supported palettes:nrc, jama, aaas, jco, paired1-4, accent, set2(no lancet).size_of_font = 6— x-axis group label font size (default 10; too large for patchwork composites).size_of_pvalue = 4— p-value text size (default 6; use 3–4 for multi-panel to avoid crowding).plot.margin = margin(12, 5, 5, 5)— increased top margin so p-value is not clipped by title.sig_box()has nomethodparameter — statistical test is auto-selected.
Fig 07a-d: Forest Plots — Grouped by Method (Custom ggplot2)
CRITICAL: Do NOT use sig_forest() for survival forest plots. It uses coord_trans(x = "log2") with only 3 fixed breaks (0.5, 1, 1.5), which breaks when HR values span extreme ranges (0 to 1,000,000+). Use a custom ggplot2 forest plot instead.
Split into 4 method groups and build custom forest plots with log2 scale, text annotations (HR, 95% CI, P value), and proper handling of extreme values:
surv_tme <- read.csv("04-figs/data/05-surv_tme_cells.csv")
surv_sig <- read.csv("04-figs/data/05-surv_signatures.csv")
make_forest <- function(data, title, filename, max_n = 20) {
data <- data[order(data$P), ]
data <- head(data, max_n)
if (nrow(data) == 0) { cat(" Skipping:", title, "\n"); return() }
# Ensure numeric + complete cases
for (col in c("HR", "CI_low_0.95", "CI_up_0.95", "P"))
data[[col]] <- as.numeric(data[[col]])
data <- data[complete.cases(data[, c("HR", "CI_low_0.95", "CI_up_0.95", "P")]), ]
if (nrow(data) == 0) return()
# Format display labels — clean .x/.y suffix and deduplicate
data$label <- sub("\\.[xy]$", "", data$ID)
data$label <- strtrim(data$label, 30)
dup <- duplicated(data$label)
if (any(dup)) data$label[dup] <- paste0(data$label[dup], "_", seq_len(sum(dup)))
data <- data[order(data$HR, decreasing = FALSE), ]
data$label <- factor(data$label, levels = data$label)
# Format text columns
data$HR_text <- ifelse(data$HR >= 100, sprintf("%.0f", data$HR),
ifelse(data$HR >= 0.01, sprintf("%.2f", data$HR), sprintf("%.4f", data$HR)))
data$CI_text <- paste0(
ifelse(data$CI_low_0.95 >= 100, sprintf("%.0f", data$CI_low_0.95),
ifelse(data$CI_low_0.95 >= 0.01, sprintf("%.2f", data$CI_low_0.95), sprintf("%.4f", data$CI_low_0.95))),
"-",
ifelse(data$CI_up_0.95 >= 100, sprintf("%.0f", data$CI_up_0.95),
ifelse(data$CI_up_0.95 >= 0.01, sprintf("%.2f", data$CI_up_0.95), sprintf("%.4f", data$CI_up_0.95))))
data$P_text <- ifelse(data$P < 0.001, sprintf("%.2e", data$P), sprintf("%.3f", data$P))
# Log2 transform with cap for extreme values
cap <- 10
data$logHR <- log2(data$HR); data$logHR[!is.finite(data$logHR)] <- 0
data$logCI_low <- log2(data$CI_low_0.95); data$logCI_low[!is.finite(data$logCI_low)] <- -cap
data$logCI_up <- log2(data$CI_up_0.95); data$logCI_up[!is.finite(data$logCI_up)] <- cap
data$logCI_low <- pmax(data$logCI_low, -cap)
data$logCI_up <- pmin(data$logCI_up, cap)
# Build ggplot
p <- ggplot(data, aes(x = logHR, y = label)) +
geom_errorbarh(aes(xmin = logCI_low, xmax = logCI_up),
height = 0.2, linewidth = 0.8, color = "grey30") +
geom_point(aes(color = P), size = 3.5) +
geom_vline(xintercept = 0, linetype = "dashed", color = "grey50", linewidth = 0.5) +
scale_color_gradientn(colors = c("#B2182B", "#EF8A62", "#FDDBC7", "#D1E5F0", "#2166AC"),
name = "P value", trans = "log10") +
geom_text(aes(x = cap + 1.5, label = HR_text), hjust = 0, size = 2.5) +
geom_text(aes(x = cap + 4.5, label = CI_text), hjust = 0, size = 2.5, color = "grey30") +
geom_text(aes(x = cap + 9.5, label = P_text), hjust = 0, size = 2.5, color = "grey30") +
annotate("text", x = cap + 1.5, y = nrow(data) + 1.5, label = "HR", hjust = 0, size = 2.5, fontface = "bold") +
annotate("text", x = cap + 4.5, y = nrow(data) + 1.5, label = "95% CI", hjust = 0, size = 2.5, fontface = "bold") +
annotate("text", x = cap + 9.5, y = nrow(data) + 1.5, label = "P value", hjust = 0, size = 2.5, fontface = "bold") +
scale_x_continuous(breaks = c(-8,-4,-2,-1,0,1,2,4,8),
labels = c("1/256","1/16","1/4","0.5","1","2","4","16","256")) +
coord_cartesian(xlim = c(-cap, cap + 13), clip = "off") +
labs(x = "Hazard Ratio (log2 scale)", y = "", title = title) +
theme_light() + theme(
axis.text.y = element_text(size = 7), axis.text.x = element_text(size = 7),
plot.title = element_text(size = 10, hjust = 0.5, face = "bold"),
legend.position = "bottom", plot.margin = margin(5, 40, 5, 5, "mm"))
plot_h <- max(4, nrow(data) * 0.4 + 1.5)
ggsave(paste0("04-figs/", filename, ".png"), p, width = 12, height = plot_h, units = "in", dpi = 300)
ggsave(paste0("04-figs/", filename, ".pdf"), p, width = 12, height = plot_h)
}
# 07a: CIBERSORT
cb_surv <- surv_tme[grepl("CIBERSORT", surv_tme$ID), ]
make_forest(cb_surv, "CIBERSORT Survival Forest", "Fig07a-forest_cibersort")
# 07b: Other TME methods
other_surv <- surv_tme[!grepl("CIBERSORT", surv_tme$ID), ]
make_forest(other_surv, "Other TME Methods Survival Forest", "Fig07b-forest_other_tme")
# 07c: TME signatures (non-GO/KEGG)
gkegg_pattern <- "^HALLMARK_|^KEGG_|^GO_|^REACTOME_"
tme_sig_surv <- surv_sig[!grepl(gkegg_pattern, surv_sig$ID), ]
make_forest(tme_sig_surv, "TME Signatures Survival Forest", "Fig07c-forest_tme_signature")
# 07d: GO/KEGG/Hallmark pathways
gkegg_surv <- surv_sig[grepl(gkegg_pattern, surv_sig$ID), ]
make_forest(gkegg_surv, "Pathway Survival Forest", "Fig07d-forest_go_kegg")Key notes:
- Do NOT use
sig_forest()— it only has 3 fixed axis breaks (0.5, 1, 1.5) and breaks with extreme HR values. Custom ggplot2 forest plot handles all cases. - Custom forest plot features: log2 scale with 9 breaks, text annotations for HR/CI/P, capped error bars for extreme values, dynamic plot height.
- Split into 4 groups (07a-d) for readability: CIBERSORT, Other TME, TME Signatures, GO/KEGG/Hallmark.
Fig 08: KM Plots — p<0.005 Significant Variables
# sig_surv_plot() saves its own output — do NOT wrap with ggsave()
# Create kmplot subdirectory first
dir.create("04-figs/kmplot", showWarnings = FALSE, recursive = TRUE)
# Merge TME + signature survival results
surv_all <- rbind(surv_tme, surv_sig)
surv_sorted <- surv_all[order(surv_all$P), ]
sig_vars <- surv_sorted$ID[surv_sorted$P < 0.005]
if (length(sig_vars) == 0) sig_vars <- surv_sorted$ID[surv_sorted$P < 0.01]
surv_data <- read.csv("04-figs/data/01-tme_pdata_merged.csv")
for (i in seq_along(sig_vars)) {
sig_surv_plot(input_pdata = surv_data, signature = sig_vars[i],
time = time_col, status = status_col, time_type = "month",
palette = "jama", mini_sig = "score",
fig.type = "pdf", save_path = "04-figs/kmplot",
project = paste0("KM_", i), index = i)
}Key notes:
sig_surv_plot()/roc_time()save their own output — Do NOT wrap withggsave(). Use theirsave_path/pathandfig.type/fig_typeparameters.- KM plots saved to
04-figs/kmplot/subfolder — each variable generates multiple files (best-cutoff, 3-group, 2-group KM plots + RData). time_type:"day","month", or"year"mini_sig:"score"(continuous, auto-split by median) or"category"(already grouped)
Fig 09: TME Subtype Boxplots — Top Feature Per Subtype, Patchwork Composite
For each TME subtype, find the CIBERSORT cell type with highest mean (scaled), then create one sig_box per subtype and combine with patchwork.
# Read CIBERSORT columns from original file
cb_cols <- setdiff(colnames(tme_cb), "ID")
cb_cols <- cb_cols[!grepl(paste(qc_patterns, collapse = "|"), cb_cols)]
cb_sc <- intersect(cb_cols, colnames(tme_sc))
if (length(cb_sc) == 0) cb_sc <- intersect(paste0(cb_cols, ".x"), colnames(tme_sc))
# For each subtype, find top cell (highest scaled mean)
subtypes <- sort(unique(tme_sc$TME_subtype))
subtype_top_cells <- character(0)
for (st in subtypes) {
st_data <- tme_sc[tme_sc$TME_subtype == st, cb_sc]
cell_means <- colMeans(st_data, na.rm = TRUE)
subtype_top_cells[st] <- names(sort(cell_means, decreasing = TRUE))[1]
}
# One sig_box per subtype, patchwork composite
box09_list <- lapply(subtypes, function(st) {
col <- subtype_top_cells[st]
clean_name <- sub("\\.[xy]$", "", col)
sig_box(data = tme_sc, signature = col,
variable = "TME_subtype", palette = "paired1",
size_of_font = 6, size_of_pvalue = 4) +
ggtitle(paste0(st, ": ", clean_name)) +
theme(axis.title = element_text(size = 6),
axis.text.y = element_text(size = 5),
plot.title = element_text(size = 7),
plot.margin = margin(12, 5, 5, 5))
})
p09 <- wrap_plots(box09_list, ncol = length(subtypes))
ggsave("04-figs/Fig09-subtype_boxplot.png", p09,
width = 50 * length(subtypes), height = 100, units = "mm", dpi = 300)
ggsave("04-figs/Fig09-subtype_boxplot.pdf", p09,
width = 50 * length(subtypes), height = 100, units = "mm")Note: Uses CIBERSORT columns from scaled data (Phase 4 output). Each subtype's top cell is the one with highest mean z-score in that subtype — this is the most representative feature for that cluster.
Clean up stray Rplots.pdf (generated by some IOBR internal plot calls)
Suppress with pdf(NULL) at top of visualization script
if (file.exists("Rplots.pdf")) file.remove("Rplots.pdf")
Additional Visualization Functions
Use these as needed based on analysis requirements:
| Function | Purpose | Output |
|---|---|---|
roc_time() |
Time-dependent ROC with AUC at multiple time points | Figure (self-saving) |
sig_roc() |
Multiple ROC curves for binary response prediction | Figure |
batch_sig_surv_plot() |
Batch KM curves for multiple signatures across projects | Figures (self-saving) |
surv_group() |
KM curve with risk table (High vs Low) | Figure |
subgroup_survival() |
Subgroup Cox analysis by covariates | Statistics table |
get_cor() |
Single pairwise correlation with scatter plot + regression line | Figure |
iobr_cor_plot() |
Batch correlation visualization (signature vs signature/phenotype) | Figure |
iobr_pca() |
PCA dimensionality reduction with scatter plot | Figure |
iobr_deg() |
DEG with built-in volcano plot and heatmap output | Statistics + figures |
sig_gsea() |
Gene Set Enrichment Analysis via fgsea, with built-in plots | Statistics + figures |
LR_cal() |
Ligand-receptor interaction analysis | Statistics + figures |
# Time-dependent ROC — saves its own output
roc_time(input = tme_pdata, vars = c("T_cells_CD8_CIBERSORT", "TMEscoreA_CIR"),
time = "OS_time", status = "OS_status",
time_point = c(12, 36, 60), time_type = "month", palette = "jama",
path = "04-figs", fig.type = "png", index = 1)
# PCA visualization
iobr_pca(data = eset, pdata = pdata, group = "TME_subtype", scale = TRUE)
# Single correlation scatter plot
get_cor(data = tme_pdata, variable1 = "T_cells_CD8_CIBERSORT",
variable2 = "CD_8_T_effector", method = "spearman")IOBR Palettes
Categorical (palette_group parameter):
| Name | Best For |
|---|---|
"jama" |
Default, 2–5 groups |
"jco" |
Clinical oncology |
"nrc" |
Nature publications |
"aaas" |
Science journal |
"accent" |
Colorblind-friendly |
"set2" |
Many groups |
Heatmap (palette parameter, numeric 1–6):
| # | Style |
|---|---|
| 2 | Blue-white-red diverging (default) |
| 1 | Classic pheatmap |
| 3 | Purple-orange |
| 4–6 | Sequential gradients |
Cell bar plot (cell_bar_plot() palette parameter, numeric 1–4):
| # | Style |
|---|---|
| 4 | Recommended — rich random palette with many distinct colors |
| 1–3 | Alternative random palettes |
If unsure which palette, present 3 options to the user and ask.
User Decision Points
Pause and ask at:
- Phase 0: Review and approve analysis plan (
IOBR-pipeline.md) - Phase 1: Data type (Count/TPM/Array) and species (hsa/mmu)
- Phase 2: Which deconvolution method(s)
- Phase 3: Scoring method and which signature collection
- Phase 5: Color palette preference
05-note Template: IOBR-analysis-README.md
Generate a tree diagram with brief descriptions:
project/
├── 01-script/
│ ├── 01-data_preprocessing.R # QC, normalization, annotation
│ ├── 02-tme_deconvolution.R # TME cell type deconvolution
│ ├── 03-signature_analysis.R # Signature scoring
│ ├── 04-statistical_analysis.R # Statistics, clustering, survival
│ └── 05-visualization.R # All figures
├── 02-input/
│ ├── annotated_eset.csv # Normalized expression matrix
│ └── pdata.csv # Phenotype data
├── 03-tme/
│ ├── tme_cibersort.csv # CIBERSORT results
│ ├── tme_mcpcounter.csv # MCPcounter results
│ ├── sig_score_tme.csv # ssGSEA scores
│ └── tme_sig_combine.csv # All merged results
├── 04-figs/
│ ├── Fig01-barplot_cibersort.png/pdf
│ ├── Fig02-barplot_epic.png/pdf
│ ├── Fig03-heatmap_cibersort_subtype.png/pdf (pheatmap)
│ ├── Fig04-heatmap_mcpcounter_subtype.png/pdf (pheatmap)
│ ├── Fig05-cor_matrix.png/pdf
│ ├── Fig06-top10_wilcoxon_boxplot.png/pdf
│ ├── Fig07a-forest_cibersort.png/pdf
│ ├── Fig07b-forest_other_tme.png/pdf
│ ├── Fig07c-forest_tme_signature.png/pdf
│ ├── Fig07d-forest_go_kegg.png/pdf
│ ├── kmplot/ # KM survival plots (auto-saved by sig_surv_plot)
│ └── Fig09-subtype_boxplot.png/pdf
│ └── data/ # Statistical result tables
│ ├── 01-tme_pdata_merged.csv
│ ├── 02-tme_scaled.csv
│ ├── 03-tme_subtype.csv
│ ├── 04-wilcoxon_results.csv
│ ├── 05-surv_results.csv
│ └── 06-cor_matrix.csv
├── 05-note/
│ ├── IOBR-pipeline.md # Analysis plan
│ ├── pdata_summary.csv # Variable summary
│ └── IOBR-analysis-README.md # This file
└── 06-log/
├── 01-data_preprocessing.log
└── ...Include a brief summary of key findings below the tree.
06-log Directory
- One log file per script, named to match:
01-data_preprocessing.log,02-tme_deconvolution.log, ... - Each log MUST begin with the IOBR citation header (same as scripts).
- Capture logs by running scripts with redirection:
{ echo "# IOBR Citation:"; echo "# Zeng DQ et al., Cell Reports Methods, 2024"; echo "# https://doi.org/10.1016/j.crmeth.2024.100910"; echo ""; Rscript 01-script/01-data_preprocessing.R; } > 06-log/01-data_preprocessing.log 2>&1
Quick Start Checklist
- Check IOBR installation in base environment
- Create project directory structure (01-script through 06-log + 04-figs/data)
- Confirm data source, type, and species with user
- Generate
05-note/IOBR-pipeline.md— present to user for approval - Write
01-data_preprocessing.R(with citation header), run, capture log - Verify sample matching (eset vs pdata), generate
05-note/pdata_summary.csv - Ask user to choose deconvolution method(s)
- Write
02-tme_deconvolution.R, run, capture log - Ask user about scoring method and gene sets
- Write
03-signature_analysis.R, run, capture log - Write
04-statistical_analysis.R— merge, scale, cluster, test, correlate, save to 04-figs/data/ - Ask user about palette preferences
- Write
05-visualization.R— follow strict Fig01–09 ordering - Generate
05-note/IOBR-analysis-README.md - Verify all 06-log files have citation headers
References (in this skill directory)
references/functions.md— IOBR function parameter referencereferences/palettes.md— Color palette selection guidereferences/iobr_built_in_data.md— Complete catalog of built-in signatures, annotations, and datasetsreferences/iobr_pipeline_template.R— Full production pipeline template (9-method deconvolution + signature + pathway scoring)
Read these when you need specific parameter details, palette options, want to know what signatures are available, or need a complete pipeline template to adapt.