smestern

rigor-reviewer

Audits analysis outputs, code, and claims for scientific rigor violations — statistical validity, effect sizes, data integrity, p-hacking risks, reproducibility, visualization integrity, and reporting completeness.


Install

npx skillscat add smestern/sciagent/rigor-reviewer

Install via the SkillsCat registry.

SKILL.md

Scientific Rigor Review

Use this skill to audit analysis outputs for scientific rigor violations.
Apply the following 8-point checklist systematically.

Core Review Checklist

1. Statistical Validity

  • Are statistical tests appropriate for the data type and distribution?
  • Are assumptions (normality, independence, equal variance) checked?
  • Are multiple-comparison corrections applied when needed?
  • Is the sample size adequate for the claims being made?
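
When reviewing code that runs several tests, it can help to know what a multiple-comparison correction looks like. The following is a minimal sketch of Holm's step-down procedure, written with the standard library only. The function name `holm_bonferroni` and the p-values are illustrative assumptions, not part of this skill.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per test: True where Holm's procedure rejects the null."""
    m = len(p_values)
    # Visit tests from smallest to largest p-value, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank, starting at 0) is compared
        # against alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, every larger p-value fails too
    return reject


raw_p = [0.001, 0.04, 0.03, 0.20]  # hypothetical p-values from four tests
print(holm_bonferroni(raw_p))  # [True, False, False, False]
```

Three of these four values fall below 0.05 uncorrected, but only one survives the correction. An analysis that reports all three as significant without mentioning a correction is a candidate for a [WARNING] or [CRITICAL] tag.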

2. Effect Sizes & Uncertainty

  • Are effect sizes reported alongside p-values?
  • Are confidence intervals, SEM, or SD provided for all measurements?
  • Is N stated for every measurement?
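
As one example of an effect size a reviewer might expect next to a p-value, here is a sketch of Cohen's d for two independent groups using a pooled standard deviation. The helper name `cohens_d` and the sample data are assumptions made for illustration. Other effect sizes (Hedges' g, odds ratios, correlation coefficients) may suit a given analysis better.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Cohen's d (pooled SD) for two independent samples, reported with each N."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    d = (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd
    return {"d": d, "n_a": n_a, "n_b": n_b}


# Hypothetical measurements, for illustration only.
treated = [3, 4, 5, 6, 7]
control = [1, 2, 3, 4, 5]
print(cohens_d(treated, control))
```

Returning the group sizes with the effect size keeps N attached to the number it describes.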

3. Data Integrity

  • Is there any evidence of synthetic or fabricated data presented as real measurements?
  • Are outlier removal criteria documented and justified?
  • Are data transformations (log, z-score, normalization) appropriate?
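
A documented outlier criterion usually means the rule, its thresholds, and the number of removed points are all recorded. The sketch below uses Tukey's fences (1.5 × IQR) as one common rule. The function name `flag_outliers_iqr` and the data are hypothetical, and the quartiles use the default method of `statistics.quantiles`, which can differ slightly from other software.

```python
import statistics


def flag_outliers_iqr(values, k=1.5):
    """Split values into kept/removed using Tukey's fences and record the rule used."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    removed = [v for v in values if not (low <= v <= high)]
    record = {
        "rule": f"Tukey fences, k={k}",
        "low": low,
        "high": high,
        "n_total": len(values),
        "n_removed": len(removed),
    }
    return kept, removed, record


# Hypothetical data with one extreme value.
kept, removed, record = flag_outliers_iqr([10, 11, 12, 13, 14, 15, 100])
print(removed, record)
```

If an analysis drops points without leaving a record comparable to `record`, the removal criteria are effectively undocumented.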

4. P-Hacking & Data Dredging

  • Were hypotheses stated before analysis (pre-registration mindset)?
  • Are there signs of selective reporting (only "significant" results)?
  • Were analysis parameters tuned to achieve significance?
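
The arithmetic behind the selective-reporting concern is short enough to show. Assuming independent tests with no true effects, the chance of at least one false positive across m tests is 1 − (1 − α)^m. The function name below is illustrative.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent null tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


for m in (1, 5, 20):
    print(m, round(familywise_error_rate(m), 3))
# 1 0.05
# 5 0.226
# 20 0.642
```

Under these assumptions, an analysis that tried twenty parameter settings and reports the one that reached p < 0.05 has roughly a 64% chance of showing a "significant" result by chance alone.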

5. Reproducibility

  • Are random seeds set for stochastic methods?
  • Are exact software versions and parameters documented?
  • Can the analysis be rerun from raw data to final figures?
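
As a rough picture of what a reproducible script might contain, the sketch below seeds the standard-library random generator and records the interpreter and package versions. The helper names `package_version` and `start_run` are hypothetical. Libraries such as NumPy or PyTorch keep their own generators, which would need seeding separately.

```python
import platform
import random
from importlib import metadata


def package_version(name):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def start_run(seed, packages=()):
    """Seed the stdlib RNG and return a record worth saving next to the results."""
    random.seed(seed)
    return {
        "seed": seed,
        "python": platform.python_version(),
        "packages": {name: package_version(name) for name in packages},
    }


record = start_run(seed=12345, packages=("numpy",))  # package list is illustrative
first = [random.random() for _ in range(3)]

start_run(seed=12345)  # re-seeding reproduces the same draws
second = [random.random() for _ in range(3)]
print(record, first == second)
```

A reviewer can look for the equivalent of `record` in the analysis outputs. Its absence does not prove the work is irreproducible, but it makes reruns harder to verify.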

6. Visualization Integrity

  • Do plots have proper axis labels, units, and scales?
  • Are error bars clearly defined (SD vs SEM vs CI)?
  • Do bar charts hide important distributional information (e.g., skew, bimodality, outliers, or small N)?
  • Are color scales perceptually uniform and colorblind-safe?
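
The reason error bars need a definition is that SD, SEM, and a confidence interval give very different bar lengths for the same data. The sketch below computes all three. The function name is an assumption, and the interval uses a normal approximation, which is too narrow for small samples where a t-based interval would be the usual choice.

```python
import math
import statistics


def error_bar_half_widths(values, confidence=0.95):
    """Half-widths of SD, SEM, and a normal-approximation CI for the mean."""
    n = len(values)
    sd = statistics.stdev(values)
    sem = sd / math.sqrt(n)
    # Two-sided critical value from the standard normal (about 1.96 for 95%).
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return {"n": n, "sd": sd, "sem": sem, "ci": z * sem}


bars = error_bar_half_widths(list(range(1, 17)))  # hypothetical data, n = 16
print(bars)
```

For this sample the SD bar is about four times the SEM bar, so a figure that omits which one is plotted can make the same data look tight or noisy.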

7. Reporting Completeness

  • Are negative or null results included?
  • Are failed samples or excluded data documented?
  • Are limitations of the analysis methods acknowledged?

8. Domain Sanity Checks

  • Are reported values within physically / biologically plausible ranges?
  • Do units and scaling factors look correct?
  • Are results consistent across related measurements?
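
A plausibility check can be as simple as comparing reported values against agreed bounds. In the sketch below, the measurement names and the temperature bounds are placeholders chosen for illustration. Real bounds would come from the domain being reviewed.

```python
PLAUSIBLE_RANGES = {
    "fraction_responding": (0.0, 1.0),  # a fraction cannot leave [0, 1]
    "body_temp_celsius": (30.0, 45.0),  # placeholder bounds, not a reference value
}


def out_of_range(measurements, ranges=PLAUSIBLE_RANGES):
    """Return (name, value, low, high) for each value outside its plausible range."""
    problems = []
    for name, value in measurements.items():
        if name not in ranges:
            continue  # no bound defined for this measurement
        low, high = ranges[name]
        if not (low <= value <= high):  # NaN also fails this test and is flagged
            problems.append((name, value, low, high))
    return problems


# Hypothetical report where a percentage was stored as a fraction.
reported = {"fraction_responding": 37.0, "body_temp_celsius": 36.8}
print(out_of_range(reported))  # [('fraction_responding', 37.0, 0.0, 1.0)]
```

A value of 37 in a field meant to hold a fraction is the kind of unit or scaling slip this check is meant to surface.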

How to Respond

  • List each issue found with a severity tag: [CRITICAL], [WARNING],
    or [INFO].
  • Quote the specific claim, value, or code line that triggered the concern.
  • Suggest a concrete remediation for each issue.
  • If the analysis passes all checks, say so explicitly — do not invent
    problems.
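
One way to picture the response format is as a small record per issue: a severity tag, the quoted trigger, and a remediation. The sketch below is only an illustration of that shape. The `Finding` class, `format_report` function, and the sample finding are hypothetical and not part of this skill.

```python
from dataclasses import dataclass

SEVERITIES = ("CRITICAL", "WARNING", "INFO")  # most to least severe


@dataclass
class Finding:
    severity: str
    quote: str        # the claim, value, or code line that triggered the concern
    remediation: str  # a concrete suggested fix

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"unknown severity: {self.severity}")


def format_report(findings):
    """Render findings most-severe first, or state explicitly that all checks passed."""
    if not findings:
        return "All checks passed; no rigor issues found."
    ordered = sorted(findings, key=lambda f: SEVERITIES.index(f.severity))
    lines = []
    for f in ordered:
        lines.append(f'[{f.severity}] "{f.quote}"')
        lines.append(f"  Remediation: {f.remediation}")
    return "\n".join(lines)


# Hypothetical findings, for illustration only.
report = format_report([
    Finding("INFO", "random.seed not set", "Set and record a seed."),
    Finding("CRITICAL", "p = 0.03 (12 tests, uncorrected)", "Apply a multiple-comparison correction."),
])
print(report)
```

The empty-list branch mirrors the instruction to say so explicitly when the analysis passes.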

Important Guidelines

  • Do not fabricate concerns — be honest when the work is sound.
  • Do not soften critical issues to be polite.
  • Do not run code or modify files — review only.

Domain Customization