Advanced statistical analysis in R. Use when the user asks for regression, hypothesis testing, ANOVA, time-series forecasting, Bayesian modeling, survival analysis, descriptive statistics, EDA, correlation analysis, diagnostics, or reproducible analytical reports. Also use when the user mentions R packages like ggplot2, tidyverse, forecast, brms, broom, lme4, survival, or any statistical method. Chinese is supported: use this skill when the user mentions statistical methods such as regression analysis (回归分析), hypothesis testing (假设检验), time series (时间序列), Bayesian methods (贝叶斯), survival analysis (生存分析), descriptive statistics (描述统计), or correlation analysis (相关分析).
Install
npx skillscat add cuiweig/openclaw-r-stats
Install via the SkillsCat registry.
SKILL.md
OpenClaw R Stats
When to use
- User asks for statistical analysis, regression, hypothesis testing
- User asks to compare groups, test significance, find associations
- User mentions ANOVA, t-test, chi-square, correlation
- User asks for time series forecasting or trend analysis
- User uploads CSV and wants statistical insights
- User asks "is this significant?" or "what predicts X?"
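- User mentions, in Chinese, terms such as regression (回归), testing (检验), forecasting (预测), significance (显著性), or descriptive statistics (描述统计)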
What this skill does NOT do
- Do not claim causality from observational data. Use "associated with".
- Do not run large exploratory fishing expeditions without clear user intent.
- Do not silently ignore assumption violations.
- Do not execute arbitrary inline R code. Always use the wrapper script.
- Do not install packages during analysis. Installation is a separate step.
- Do not report only p-values. Always include effect sizes and CIs.
Pre-flight checks (mandatory before any analysis)
- Confirm the dataset file exists and is readable.
- Run schema inspection:
bash {baseDir}/scripts/run-rstats.sh schema --data
- Report to the user: row and column counts, column types, missing values, unique counts.
- If more than 5% of the data is missing, warn the user and ask how to handle it.
- If the sample size is below 30, warn about small-sample limitations.
- Only then proceed to build the analysis spec.
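The file, missing-data, and sample-size checks above can be sketched in plain shell, independent of the wrapper. This is a rough illustration only: the function name `preflight_csv` is hypothetical, and the simple comma splitting (no quoted fields) and the choice to treat empty or `NA` cells as missing are assumptions, not the documented behaviour of `run-rstats.sh schema`.

```shell
# Hypothetical pre-flight helper; not part of the skill's scripts.
# Assumes a simple CSV: one header row, comma-separated, no quoted commas.
preflight_csv() {
  local file="$1"
  if [ ! -r "$file" ]; then
    echo "ERROR: cannot read $file"
    return 1
  fi
  awk -F',' '
    NR == 1 { ncol = NF; next }   # header row: remember the column count
    {
      rows++
      for (i = 1; i <= ncol; i++) {
        cells++
        if ($i == "" || $i == "NA") missing++   # empty or NA counts as missing
      }
    }
    END {
      pct = (cells > 0) ? 100 * missing / cells : 0
      printf "rows=%d cols=%d missing_pct=%.1f\n", rows, ncol, pct
      if (pct > 5)   print "WARN: more than 5% of cells are missing; ask how to handle them"
      if (rows < 30) print "WARN: n < 30; small-sample limitations apply"
    }
  ' "$file"
}
```

Calling `preflight_csv data.csv` prints one summary line followed by any warnings. The thresholds (5% and 30) are the ones stated in the checklist; the per-cell definition of the missing percentage is an assumption.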
Environment check
On first use, or if errors occur:
bash {baseDir}/scripts/run-rstats.sh doctor
If packages are missing:
Rscript {baseDir}/scripts/install-core.R
Standard workflow
- Determine the correct analysis type.
- Inspect dataset schema and missingness.
- Build a JSON analysis spec:
{
  "dataset_path": "",
  "analysis_type": "",
  "outcome": "",
  "predictors": ["", ""],
  "formula": "",
  "group_var": "",
  "hypothesis": "",
  "missing_strategy": "complete_case",
  "alpha": 0.05,
  "seed": 42,
  "output_dir": ""
}
- Save the spec as a .json file.
- Run: bash {baseDir}/scripts/run-rstats.sh analyze --spec
- Read summary.json and report.md from the output directory.
- Present results: Summary → Statistics → Interpretation → Plots → Assumptions → Caveats.
- Offer follow-up: diagnostics, alternative methods, export.
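The spec-building and run steps above could look roughly like the following sketch. The helper `write_ttest_spec`, the dataset path, the column names, and the output directory are all hypothetical, and passing the spec path directly after `--spec` is an assumption about how the wrapper takes its argument.

```shell
# Hypothetical helper that writes an analysis spec for a two-group t-test.
# Field names follow the spec template; every value is illustrative.
# Note: values are not JSON-escaped, so avoid quotes or backslashes in them.
write_ttest_spec() {
  local data="$1" outcome="$2" group="$3" out_dir="$4" spec_file="$5"
  cat > "$spec_file" <<EOF
{
  "dataset_path": "$data",
  "analysis_type": "ttest",
  "outcome": "$outcome",
  "predictors": [],
  "formula": "",
  "group_var": "$group",
  "hypothesis": "",
  "missing_strategy": "complete_case",
  "alpha": 0.05,
  "seed": 42,
  "output_dir": "$out_dir"
}
EOF
}

# Example values (all hypothetical):
spec_file="${TMPDIR:-/tmp}/spec.json"
write_ttest_spec "data/trial.csv" "score" "arm" "out/ttest_run" "$spec_file"

# Then hand the spec to the wrapper (argument form is an assumption):
#   bash {baseDir}/scripts/run-rstats.sh analyze --spec "$spec_file"
```

Writing the spec to a file before running keeps the analysis reproducible, since the same file can be re-run or compared against `executed_spec.json` afterwards.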
Analysis selection
| User intent | analysis_type |
|---|---|
| Describe data / EDA | summary |
| Compare 2 groups (continuous) | ttest |
| Compare 2 groups (non-normal/small n) | wilcoxon |
| Compare 3+ groups (continuous, normal) | anova |
| Compare 3+ groups (non-normal/ordinal) | kruskal |
| Compare categorical variables | chisq |
| Categorical (small expected counts) | fisher |
| Paired categorical (before/after) | mcnemar |
| Repeated measures non-parametric | friedman |
| Association between 2 continuous vars | correlation |
| Predict continuous outcome | linear_regression |
| Predict binary outcome | logistic_regression |
| Predict count outcome | poisson_regression |
| Forecast time series | forecast_arima |
| Assess missing data patterns | missing_diagnostics |
| Impute missing values | multiple_imputation |
| Correct for multiple comparisons | p_adjust |
| Survival curves + median survival | kaplan_meier |
| Survival regression (HR) | cox_regression |
| Competing risks (Fine-Gray) | competing_risks |
| Time-dependent Cox model | cox_time_dependent |
| Restricted mean survival time | rmst |
| Odds ratio (case-control) | odds_ratio |
| Risk ratio + NNT (cohort/RCT) | risk_ratio |
| Incidence rate ratio (person-time) | incidence_rate |
| Stratified analysis (confounding) | mantel_haenszel |
| Number needed to treat/harm | nnt |
| Linear mixed model (random effects) | lmm |
| Generalized linear mixed model | glmm |
| GEE marginal model | gee |
| Intraclass correlation | icc |
| Propensity score matching | propensity_match |
| Propensity score weighting (IPW) | propensity_weight |
| Causal mediation analysis | mediation_analysis |
| Instrumental variable regression | iv_regression |
| Difference-in-differences | did |
| Regression discontinuity | rdd |
Automatic method switching guardrails
- Normality doubtful AND n < 30 → prefer wilcoxon over ttest
- Variance equality doubtful → use Welch t-test (equal_var: false)
- Expected cell counts < 5 → prefer fisher over chisq
- Overdispersion in Poisson → warn, suggest negative binomial
- Residuals heteroscedastic → warn and suggest robust standard errors (SE)
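Two of the guardrails above are simple enough to sketch as decision functions. The function names and the yes/no flag are hypothetical, and how the skill itself decides that normality is "doubtful" is not specified here, so that judgement is taken as an input. The expected-count helper uses the standard formula for a 2x2 table (row total times column total, divided by the grand total) and assumes a non-empty table.

```shell
# Hypothetical guardrail helpers; names and arguments are illustrative.

# Usage: pick_two_group_test <normality_doubtful: yes|no> <n>
pick_two_group_test() {
  local normality_doubtful="$1" n="$2"
  if [ "$normality_doubtful" = "yes" ] && [ "$n" -lt 30 ]; then
    echo "wilcoxon"
  else
    echo "ttest"
  fi
}

# Usage: min_expected_2x2 a b c d   (cell counts, row by row)
# Prints the smallest expected cell count under independence.
min_expected_2x2() {
  awk -v a="$1" -v b="$2" -v c="$3" -v d="$4" 'BEGIN {
    n = a + b + c + d
    r1 = a + b; r2 = c + d      # row totals
    c1 = a + c; c2 = b + d      # column totals
    m = r1 * c1 / n
    if (r1 * c2 / n < m) m = r1 * c2 / n
    if (r2 * c1 / n < m) m = r2 * c1 / n
    if (r2 * c2 / n < m) m = r2 * c2 / n
    printf "%.2f\n", m
  }'
}

# Usage: pick_categorical_test <smallest expected cell count>
pick_categorical_test() {
  # awk does the (possibly fractional) comparison; exit status 0 means "< 5"
  if awk -v m="$1" 'BEGIN { exit !(m < 5) }'; then
    echo "fisher"
  else
    echo "chisq"
  fi
}
```

For a table with rows (3, 7) and (2, 8), `min_expected_2x2 3 7 2 8` gives 2.50, so `pick_categorical_test 2.50` selects `fisher`.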
Reporting rules (non-negotiable)
Every analysis MUST include:
- Sample size (n) and missing data handling
- Method name and selection rationale
- Point estimates with confidence intervals
- Effect sizes (Cohen's d, η², R², OR, etc.)
- Assumption check results
- Warnings or limitations
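As one concrete instance of the effect sizes listed above, Cohen's d for two independent groups is commonly defined as the difference in group means divided by the pooled standard deviation. Which variant the wrapper actually reports (for example, with a small-sample correction) is not stated in this document, so the formula below is the textbook definition rather than a description of the script's output.

```latex
d = \frac{\bar{x}_1 - \bar{x}_2}{s_p},
\qquad
s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}}
```

Here $\bar{x}_i$, $s_i^2$, and $n_i$ are the mean, sample variance, and size of group $i$.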
Language rules:
- ✓ "associated with" / "evidence suggests" / "estimated effect"
- ✗ NEVER "causes" / "proves" / "definitively shows"
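A quick way to screen a generated report against the language rules above is a phrase search. This is a minimal sketch, assuming the report is a plain-text or markdown file; the function name `check_causal_language` is hypothetical, and the check matches only the three listed phrases as whole words, so variants such as "caused" would pass unnoticed.

```shell
# Hypothetical screen for banned wording in a report file.
check_causal_language() {
  local report="$1"
  # -n: show line numbers, -i: ignore case, -w: whole-word match, -E: extended regex
  if grep -n -i -w -E 'causes|proves|definitively shows' "$report"; then
    echo "FAIL: causal or absolute wording found; rephrase with hedged language"
    return 1
  fi
  echo "OK: no banned phrases found"
}
```

Running `check_causal_language report.md` prints the offending lines, if any, and returns a non-zero status so it can gate a later step.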
Output artifacts (every run produces all of these)
| File | Contents |
|---|---|
| summary.json | Status, method, findings, warnings, artifact paths |
| schema.json | Column types, missingness, unique counts |
| report.md | Human-readable analysis report |
| session_info.txt | R version, packages, platform, timestamp |
| executed_spec.json | Copy of input spec for reproducibility |
| tables/*.csv | Coefficients, group stats, forecast values |
| figures/*.png | Diagnostic and result plots |
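Since every run is expected to produce all of the artifacts in the table above, a completeness check on the output directory is easy to sketch. The function name `check_artifacts` is hypothetical, and requiring at least one file under `tables/` and `figures/` is an assumption drawn from the glob patterns in the table.

```shell
# Hypothetical completeness check for a run's output directory.
check_artifacts() {
  local out_dir="$1" missing=0 f
  for f in summary.json schema.json report.md session_info.txt executed_spec.json; do
    if [ ! -f "$out_dir/$f" ]; then
      echo "MISSING: $f"
      missing=1
    fi
  done
  # Expect at least one CSV table and one PNG figure.
  if ! ls "$out_dir"/tables/*.csv >/dev/null 2>&1; then
    echo "MISSING: tables/*.csv"
    missing=1
  fi
  if ! ls "$out_dir"/figures/*.png >/dev/null 2>&1; then
    echo "MISSING: figures/*.png"
    missing=1
  fi
  return "$missing"
}
```

`check_artifacts out/ttest_run` lists anything absent and returns a non-zero status when the directory is incomplete.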
Rules
- Never run ad-hoc inline R code when the wrapper script can be used.
- Never install packages during an analysis run.
- Never access the internet during analysis execution.
- Always set a random seed for reproducibility.
- Always save session info for every analysis.
- Support both English and Chinese (中文) queries.