CuiweiG

openclaw-r-stats

Advanced statistical analysis in R. Use when the user asks for regression, hypothesis testing, ANOVA, time-series forecasting, Bayesian modeling, survival analysis, descriptive statistics, EDA, correlation analysis, diagnostics, or reproducible analytical reports. Also use when the user mentions R packages like ggplot2, tidyverse, forecast, brms, broom, lme4, survival, or any statistical method. Chinese is supported: use this skill when the user mentions, in Chinese, statistical methods such as regression analysis, hypothesis testing, time series, Bayesian methods, survival analysis, descriptive statistics, or correlation analysis.

CuiweiG 0 Updated 2mo ago

Install

npx skillscat add cuiweig/openclaw-r-stats

Install via the SkillsCat registry.

SKILL.md

OpenClaw R Stats

When to use

  • User asks for statistical analysis, regression, hypothesis testing
  • User asks to compare groups, test significance, find associations
  • User mentions ANOVA, t-test, chi-square, correlation
  • User asks for time series forecasting or trend analysis
  • User uploads CSV and wants statistical insights
  • User asks "is this significant?" or "what predicts X?"
  • User writes in Chinese and mentions regression, testing, forecasting, significance, or descriptive statistics

What this skill does NOT do

  • Do not claim causality from observational data. Use "associated with".
  • Do not run broad exploratory "fishing expeditions" (many untargeted tests) without clear user intent.
  • Do not silently ignore assumption violations.
  • Do not execute arbitrary inline R code. Always use the wrapper script.
  • Do not install packages during analysis. Installation is a separate step.
  • Do not report only p-values. Always include effect sizes and CIs.

Pre-flight checks (mandatory before any analysis)

  1. Confirm the dataset file exists and is readable.
  2. Run schema inspection:
    bash {baseDir}/scripts/run-rstats.sh schema --data
  3. Report to the user: row/column count, types, missing values, unique counts.
  4. If more than 5% of the data is missing, warn the user and ask how to handle it.
  5. If the sample size is below 30, warn about small-sample limitations.
  6. Only then proceed to build the analysis spec.
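
Checks 4 and 5 reduce to two threshold comparisons. The sketch below is illustrative only: `preflight_warnings` and the `missing_counts` mapping (column name to number of missing values) are hypothetical, the actual layout of the schema output is not shown here, and the 5% threshold is read as an overall fraction of cells, which is an assumption.

```python
def preflight_warnings(n_rows, missing_counts, max_missing_frac=0.05, min_n=30):
    """Return warning strings for the missing-data and sample-size checks.

    missing_counts maps column name -> number of missing values.
    This shape is hypothetical; the real schema output may differ.
    """
    warnings = []
    n_cells = n_rows * len(missing_counts)
    if n_cells > 0:
        # Assumption: "missing data > 5%" means the overall share of missing cells.
        missing_frac = sum(missing_counts.values()) / n_cells
        if missing_frac > max_missing_frac:
            warnings.append(
                f"{missing_frac:.1%} of cells are missing "
                f"(threshold {max_missing_frac:.0%}); ask the user how to handle this."
            )
    if n_rows < min_n:
        warnings.append(
            f"Sample size n={n_rows} is below {min_n}; small-sample limitations apply."
        )
    return warnings
```

An empty list means neither warning applies and the analysis spec can be built.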

Environment check

On first use, or if errors occur:
bash {baseDir}/scripts/run-rstats.sh doctor

If packages are missing:
Rscript {baseDir}/scripts/install-core.R

Standard workflow

  1. Determine the correct analysis type.
  2. Inspect dataset schema and missingness.
  3. Build a JSON analysis spec:
    {
    "dataset_path": "",
    "analysis_type": "",
    "outcome": "",
    "predictors": ["",""],
    "formula": "",
    "group_var": "",
    "hypothesis": "",
    "missing_strategy": "complete_case",
    "alpha": 0.05,
    "seed": 42,
    "output_dir": ""
    }
  4. Save the spec as a .json file.
  5. Run: bash {baseDir}/scripts/run-rstats.sh analyze --spec
  6. Read summary.json and report.md from the output directory.
  7. Present results: Summary → Statistics → Interpretation → Plots → Assumptions → Caveats.
  8. Offer follow-up: diagnostics, alternative methods, export.
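
Steps 3 and 4 can be sketched in Python as follows. The field names and defaults are copied from the template above; `build_spec` and `save_spec` are hypothetical helper names, and treating `dataset_path`, `analysis_type`, and `output_dir` as required is an assumption rather than something the wrapper is known to enforce.

```python
import json
from pathlib import Path

# Field names and defaults copied from the spec template above
# (predictors starts empty here instead of holding two blank strings).
SPEC_DEFAULTS = {
    "dataset_path": "",
    "analysis_type": "",
    "outcome": "",
    "predictors": [],
    "formula": "",
    "group_var": "",
    "hypothesis": "",
    "missing_strategy": "complete_case",
    "alpha": 0.05,
    "seed": 42,
    "output_dir": "",
}

def build_spec(**fields):
    """Merge user-supplied fields over the defaults and run basic sanity checks."""
    unknown = set(fields) - set(SPEC_DEFAULTS)
    if unknown:
        raise ValueError(f"Unknown spec fields: {sorted(unknown)}")
    spec = {**SPEC_DEFAULTS, **fields}
    # Assumption: these three fields must always be filled in.
    for required in ("dataset_path", "analysis_type", "output_dir"):
        if not spec[required]:
            raise ValueError(f"Spec field '{required}' must be set")
    return spec

def save_spec(spec, path):
    """Write the spec as pretty-printed JSON and return the path."""
    path = Path(path)
    path.write_text(json.dumps(spec, indent=2), encoding="utf-8")
    return path
```

For example, `save_spec(build_spec(dataset_path="data.csv", analysis_type="ttest", outcome="score", group_var="arm", output_dir="out"), "spec.json")` writes a file that keeps the default `alpha` and `seed`; the file names here are placeholders.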

Analysis selection

| User intent | analysis_type |
|---|---|
| Describe data / EDA | summary |
| Compare 2 groups (continuous) | ttest |
| Compare 2 groups (non-normal/small n) | wilcoxon |
| Compare 3+ groups (continuous, normal) | anova |
| Compare 3+ groups (non-normal/ordinal) | kruskal |
| Compare categorical variables | chisq |
| Categorical (small expected counts) | fisher |
| Paired categorical (before/after) | mcnemar |
| Repeated measures non-parametric | friedman |
| Association between 2 continuous vars | correlation |
| Predict continuous outcome | linear_regression |
| Predict binary outcome | logistic_regression |
| Predict count outcome | poisson_regression |
| Forecast time series | forecast_arima |
| Assess missing data patterns | missing_diagnostics |
| Impute missing values | multiple_imputation |
| Correct for multiple comparisons | p_adjust |
| Survival curves + median survival | kaplan_meier |
| Survival regression (HR) | cox_regression |
| Competing risks (Fine-Gray) | competing_risks |
| Time-dependent Cox model | cox_time_dependent |
| Restricted mean survival time | rmst |
| Odds ratio (case-control) | odds_ratio |
| Risk ratio + NNT (cohort/RCT) | risk_ratio |
| Incidence rate ratio (person-time) | incidence_rate |
| Stratified analysis (confounding) | mantel_haenszel |
| Number needed to treat/harm | nnt |
| Linear mixed model (random effects) | lmm |
| Generalized linear mixed model | glmm |
| GEE marginal model | gee |
| Intraclass correlation | icc |
| Propensity score matching | propensity_match |
| Propensity score weighting (IPW) | propensity_weight |
| Causal mediation analysis | mediation_analysis |
| Instrumental variable regression | iv_regression |
| Difference-in-differences | did |
| Regression discontinuity | rdd |

Automatic method switching guardrails

  • Normality doubtful AND n < 30 → prefer wilcoxon over ttest
  • Variance equality doubtful → use Welch t-test (equal_var: false)
  • Expected cell counts < 5 → prefer fisher over chisq
  • Overdispersion in Poisson → warn, suggest negative binomial
  • Residuals heteroscedastic → warn, suggest robust standard errors
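
The first three guardrails can be written as simple decision rules. This is a minimal sketch, not the wrapper's actual logic: the function names are hypothetical, where `equal_var` sits in the spec is an assumption, and how "doubtful" normality or variance equality is judged is left to the caller. Expected cell counts use the usual independence formula, row total × column total / grand total.

```python
def choose_two_group_test(n, normality_doubtful, equal_variance_doubtful):
    """Pick the analysis_type (and options) for comparing two groups."""
    if normality_doubtful and n < 30:
        return {"analysis_type": "wilcoxon"}
    # equal_var=False requests Welch's t-test; placement in the spec is assumed.
    return {"analysis_type": "ttest", "equal_var": not equal_variance_doubtful}

def expected_counts(table):
    """Expected cell counts under independence for a non-empty contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

def choose_categorical_test(table):
    """Prefer fisher when any expected cell count is below 5, else chisq."""
    smallest = min(min(row) for row in expected_counts(table))
    return "fisher" if smallest < 5 else "chisq"
```

For the table `[[1, 9], [8, 2]]` the smallest expected count is 10 × 9 / 20 = 4.5, so the sketch selects `fisher`.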

Reporting rules (non-negotiable)

Every analysis MUST include:

  • Sample size (n) and missing data handling
  • Method name and selection rationale
  • Point estimates with confidence intervals
  • Effect sizes (Cohen's d, η², R², OR, etc.)
  • Assumption check results
  • Warnings or limitations
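
As an illustration of one of the listed effect sizes, Cohen's d for two independent groups is the mean difference divided by the pooled standard deviation. The sketch below only shows the formula in Python; in a real run the wrapper script computes effect sizes, and `cohens_d` is a hypothetical name.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d using the pooled sample standard deviation (each group needs n >= 2)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)
```

With groups `[2, 4, 6]` and `[1, 3, 5]` the means differ by 1 and the pooled standard deviation is 2, giving d = 0.5.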

Language rules:

  • ✓ "associated with" / "evidence suggests" / "estimated effect"
  • ✗ NEVER "causes" / "proves" / "definitively shows"
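
A draft report can be screened for the banned wording before it is shown to the user. This is a rough sketch with a hypothetical helper name; it matches only the three exact phrases listed above, so variants such as "caused" or "proof" would need extra patterns.

```python
import re

# Phrases taken from the language rules above.
BANNED_PHRASES = ("causes", "proves", "definitively shows")

def find_banned_phrases(text):
    """Return the banned phrases found in text as whole words, ignoring case."""
    found = []
    for phrase in BANNED_PHRASES:
        if re.search(rf"\b{re.escape(phrase)}\b", text, flags=re.IGNORECASE):
            found.append(phrase)
    return found
```

An empty result means none of the three phrases appear; it does not by itself show that the report avoids causal claims.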

Output artifacts (every run produces all of these)

| File | Contents |
|---|---|
| summary.json | Status, method, findings, warnings, artifact paths |
| schema.json | Column types, missingness, unique counts |
| report.md | Human-readable analysis report |
| session_info.txt | R version, packages, platform, timestamp |
| executed_spec.json | Copy of input spec for reproducibility |
| tables/*.csv | Coefficients, group stats, forecast values |
| figures/*.png | Diagnostic and result plots |
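
Before reading results, it can help to confirm that a run actually produced the artifacts listed above. The sketch below uses a hypothetical helper, `missing_artifacts`, and assumes every run writes at least one CSV table and one PNG figure into the output directory.

```python
from pathlib import Path

# File names and glob patterns taken from the artifact list above.
REQUIRED_FILES = (
    "summary.json",
    "schema.json",
    "report.md",
    "session_info.txt",
    "executed_spec.json",
)
REQUIRED_GLOBS = ("tables/*.csv", "figures/*.png")

def missing_artifacts(output_dir):
    """Return the required files and glob patterns with no match in output_dir."""
    out = Path(output_dir)
    missing = [name for name in REQUIRED_FILES if not (out / name).is_file()]
    missing += [pattern for pattern in REQUIRED_GLOBS if not any(out.glob(pattern))]
    return missing
```

An empty list means all expected artifacts are present; anything else is worth reporting to the user as a warning before interpreting the results.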

Rules

  • Never run ad-hoc inline R code when the wrapper script can be used.
  • Never install packages during an analysis run.
  • Never access the internet during analysis execution.
  • Always set a random seed for reproducibility.
  • Always save session info for every analysis.
  • Support both English and Chinese (中文) queries.