Audits analysis outputs, code, and claims for scientific rigor violations — statistical validity, effect sizes, data integrity, p-hacking risks, reproducibility, visualization integrity, and reporting completeness.
Install
Install via the SkillsCat registry:
npx skillscat add smestern/sciagent/rigor-reviewer
Scientific Rigor Review
Use this skill to audit analysis outputs for scientific rigor violations.
Apply the following 8-point checklist systematically.
Core Review Checklist
1. Statistical Validity
- Are statistical tests appropriate for the data type and distribution?
- Are assumptions (normality, independence, equal variance) checked?
- Are multiple-comparison corrections applied when needed?
- Is the sample size adequate for the claims being made?
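When the review flags a missing multiple-comparison correction, the suggested remediation can be concrete. The following is a minimal sketch of the Holm-Bonferroni step-down procedure in plain Python. It assumes all p-values belong to one family of tests. `holm_adjust` is a hypothetical helper and is not provided by this skill.

```python
def holm_adjust(pvalues):
    """Holm-Bonferroni step-down adjustment; returns adjusted p-values in input order."""
    m = len(pvalues)
    # Visit p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The p-value at 0-based rank r is multiplied by (m - r), capped at 1.
        candidate = min(1.0, (m - rank) * pvalues[idx])
        # Adjusted p-values must not decrease as rank increases.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Three tests from one analysis, compared against alpha = 0.05 after adjustment.
adjusted = holm_adjust([0.01, 0.04, 0.03])
significant = [p < 0.05 for p in adjusted]
```

Holm's procedure controls the family-wise error rate. If the claim concerns the proportion of false discoveries across many tests, a false-discovery-rate method such as Benjamini-Hochberg may be the more appropriate remediation to suggest.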
2. Effect Sizes & Uncertainty
- Are effect sizes reported alongside p-values?
- Are confidence intervals, SEM, or SD provided for all measurements?
- Is N stated for every measurement?
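A remediation for "p-value without effect size" might look like the sketch below. It computes Cohen's d for two independent groups using the pooled SD, and it formats the result so that d and both group sizes are reported with the p-value. The function names are hypothetical. The p-value is assumed to come from whichever test the analysis actually ran.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d for two independent groups, standardized by the pooled SD."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)  # sample variances (n - 1)
    pooled_sd = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def report_difference(a, b, p_value):
    """Format a result so the effect size and both Ns travel with the p-value."""
    d = cohens_d(a, b)
    return f"d = {d:.2f}, p = {p_value:.3g}, n1 = {len(a)}, n2 = {len(b)}"


print(report_difference([1, 2, 3, 4, 5], [2, 3, 4, 5, 6], p_value=0.34))
```

For small samples, a bias-corrected variant (Hedges' g) is often preferred. The appropriate effect size also depends on the test; paired designs, for example, need a different standardizer.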
3. Data Integrity
- Is there any evidence of synthetic or fabricated data?
- Are outlier removal criteria documented and justified?
- Are data transformations (log, z-score, normalization) appropriate?
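"Documented and justified" outlier removal means the rule is explicit and each exclusion is logged. The sketch below uses the modified z-score, which is based on the median and the median absolute deviation (MAD). The 3.5 cutoff is a common convention, not a universal rule. `flag_outliers_mad` is a hypothetical helper.

```python
import statistics


def flag_outliers_mad(values, threshold=3.5):
    """Split values into kept and excluded lists using the modified z-score.

    Returns (kept, excluded), where each excluded entry records why it was removed.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # The rule is undefined when more than half the values equal the median.
        return list(values), []
    kept, excluded = [], []
    for i, v in enumerate(values):
        score = 0.6745 * (v - med) / mad
        if abs(score) > threshold:
            excluded.append({"index": i, "value": v, "modified_z": round(score, 2)})
        else:
            kept.append(v)
    return kept, excluded


kept, excluded = flag_outliers_mad([10, 11, 9, 10, 12, 10, 50])
```

The `excluded` log is what the methods section should report. Fix the rule and threshold before looking at the outcome; otherwise the exclusion step becomes a p-hacking lever.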
4. P-Hacking & Data Dredging
- Were hypotheses stated before analysis (pre-registration mindset)?
- Are there signs of selective reporting (only "significant" results)?
- Were analysis parameters tuned to achieve significance?
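The risk behind selective reporting can be quantified. If k independent tests are run on null data at level alpha, the chance that at least one of them is "significant" is 1 - (1 - alpha)^k. The sketch below computes this probability and checks it by simulation. The simulation relies on the fact that, for a continuous test under the null, p-values are uniform on (0, 1). Both function names are hypothetical, and the tests are assumed to be independent.

```python
import random


def familywise_error(k, alpha=0.05):
    """Probability of at least one false positive across k independent null tests."""
    return 1 - (1 - alpha) ** k


def simulate_familywise_error(k, alpha=0.05, n_sims=20_000, seed=0):
    """Monte Carlo estimate; null p-values are drawn as Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.random() < alpha for _ in range(k))
        for _ in range(n_sims)
    )
    return hits / n_sims


for k in (1, 5, 20):
    print(k, round(familywise_error(k), 3), simulate_familywise_error(k))
```

With 20 outcomes tested, a spurious "finding" is more likely than not, which is why reporting only the significant subset is a red flag. Correlated tests inflate the error rate less than this, but they still inflate it.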
5. Reproducibility
- Are random seeds set for stochastic methods?
- Are exact software versions and parameters documented?
- Can the analysis be rerun from raw data to final figures?
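A reviewer checking reproducibility wants three things: a fixed seed, recorded parameters, and recorded versions. The sketch below shows one way an analysis could capture them with the Python standard library. The manifest layout, the helper names, and the default package list ("numpy", "scipy") are illustrative assumptions, not a format this skill requires.

```python
import json
import platform
import random
from importlib import metadata


def run_manifest(seed, params, packages=("numpy", "scipy")):
    """Collect the seed, parameters, and software versions needed to rerun an analysis."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "seed": seed,
        "params": params,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": versions,
    }


def bootstrap_means(sample, n_boot, seed):
    """Seeded bootstrap: the same seed always yields the same resampled means."""
    rng = random.Random(seed)  # local generator; leaves global random state untouched
    return [
        sum(rng.choices(sample, k=len(sample))) / len(sample)
        for _ in range(n_boot)
    ]


manifest = run_manifest(seed=42, params={"n_boot": 1000, "alpha": 0.05})
print(json.dumps(manifest, indent=2))
```

A dedicated generator object is used here instead of the global seed, so that other code that draws random numbers cannot silently change the results.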
6. Visualization Integrity
- Do plots have proper axis labels, units, and scales?
- Are error bars clearly defined (SD vs SEM vs CI)?
- Do bar charts hide important distributional information?
- Are color scales perceptually uniform and colorblind-safe?
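Undefined error bars matter because the same data give very different bar lengths depending on the definition. The sketch below computes SD, SEM, and 95% CI half-widths for one sample and builds a legend note that states the definition and N. The helper names are hypothetical. The CI uses a normal approximation; for small N, a t critical value gives a wider and more honest interval.

```python
import math
import statistics
from statistics import NormalDist


def error_bar(sample, kind):
    """Half-width of an error bar for the given definition: 'SD', 'SEM', or '95% CI'."""
    n = len(sample)
    sd = statistics.stdev(sample)  # sample SD (n - 1 denominator)
    if kind == "SD":
        return sd
    if kind == "SEM":
        return sd / math.sqrt(n)
    if kind == "95% CI":
        z = NormalDist().inv_cdf(0.975)  # normal approximation, about 1.96
        return z * sd / math.sqrt(n)
    raise ValueError(f"unknown error-bar kind: {kind!r}")


def legend_note(kind, n):
    """Text a figure legend should carry so readers know what the bars mean."""
    return f"Error bars: {kind} (n = {n})."


sample = [2, 4, 4, 4, 5, 5, 7, 9]
for kind in ("SD", "SEM", "95% CI"):
    print(legend_note(kind, len(sample)), round(error_bar(sample, kind), 2))
```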
7. Reporting Completeness
- Are negative or null results included?
- Are failed samples or excluded data documented?
- Are limitations of the analysis methods acknowledged?
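Documenting failed or excluded samples reduces to an accounting check: every sample is either analyzed or excluded with a stated reason, and nothing is counted twice. The sketch below is a minimal version of that check. The function names and the "sample id -> reason" mapping are illustrative assumptions about how exclusions might be recorded.

```python
from collections import Counter


def reconcile_samples(all_ids, analyzed_ids, exclusions):
    """Return a list of accounting problems; an empty list means every sample is accounted for.

    exclusions maps sample id -> documented reason for exclusion.
    """
    all_set, analyzed = set(all_ids), set(analyzed_ids)
    excluded = set(exclusions)
    problems = []
    unaccounted = all_set - analyzed - excluded
    if unaccounted:
        problems.append(f"unaccounted samples: {sorted(unaccounted)}")
    overlap = analyzed & excluded
    if overlap:
        problems.append(f"both analyzed and excluded: {sorted(overlap)}")
    unknown = (analyzed | excluded) - all_set
    if unknown:
        problems.append(f"ids not in the original sample list: {sorted(unknown)}")
    undocumented = sorted(k for k, reason in exclusions.items() if not str(reason).strip())
    if undocumented:
        problems.append(f"exclusions without a reason: {undocumented}")
    return problems


def exclusion_summary(exclusions):
    """Count exclusions per reason, e.g. for a methods section or sample-flow diagram."""
    return Counter(exclusions.values())


print(reconcile_samples(["s1", "s2", "s3", "s4"], ["s1", "s2"], {"s3": "failed QC"}))
```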
8. Domain Sanity Checks
- Are reported values within physically or biologically plausible ranges?
- Do units and scaling factors look correct?
- Are results consistent across related measurements?
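Range and unit checks can be made mechanical once plausible bounds are written down. In the sketch below, a value outside its bounds is tagged with this skill's severity labels. If rescaling by a factor of 1000 brings the value into range, the check reports a likely unit slip (for example, V reported where mV was expected). The function name, the severity mapping, and the bounds in the usage example are placeholders; real bounds must come from domain knowledge.

```python
def sanity_check(name, value, bounds):
    """Return None if value is plausible, else a severity-tagged message.

    bounds is a (low, high) pair in the units the analysis claims to use.
    """
    low, high = bounds
    if low <= value <= high:
        return None
    # A value that becomes plausible after a factor of 1000 often signals a unit slip.
    for factor, label in ((1e3, "multiplied by 1000"), (1e-3, "divided by 1000")):
        if low <= value * factor <= high:
            return (f"[WARNING] {name} = {value} is outside [{low}, {high}] "
                    f"but would be in range if {label}; check units and scaling.")
    return f"[CRITICAL] {name} = {value} is outside the plausible range [{low}, {high}]."


# Placeholder bounds for illustration only.
PLAUSIBLE = {"resting_potential_mV": (-100, -40)}
for v in (-70, -0.07, 25):
    print(sanity_check("resting_potential_mV", v, PLAUSIBLE["resting_potential_mV"]))
```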
How to Respond
- List each issue found with a severity tag: [CRITICAL], [WARNING], or [INFO].
- Quote the specific claim, value, or code line that triggered the concern.
- Suggest a concrete remediation for each issue.
- If the analysis passes all checks, say so explicitly; do not invent problems.
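The response format above maps naturally onto a small record type. The sketch below is one hypothetical way to structure findings. It rejects unknown severity tags, lists critical issues first, and returns an explicit "passed" message when there are no findings. The class and function names, and the exact output wording, are illustrative and not mandated by this skill.

```python
from dataclasses import dataclass

SEVERITIES = ("CRITICAL", "WARNING", "INFO")


@dataclass
class Finding:
    severity: str     # one of SEVERITIES
    quote: str        # the claim, value, or code line that triggered the concern
    remediation: str  # a concrete fix

    def __post_init__(self):
        if self.severity not in SEVERITIES:
            raise ValueError(f"severity must be one of {SEVERITIES}, got {self.severity!r}")

    def render(self):
        return f'- [{self.severity}] "{self.quote}"\n  Remediation: {self.remediation}'


def render_review(findings):
    """Render findings most-severe first, or state explicitly that all checks passed."""
    if not findings:
        return "All checks passed: no rigor issues found."
    ordered = sorted(findings, key=lambda f: SEVERITIES.index(f.severity))
    return "\n".join(f.render() for f in ordered)


print(render_review([
    Finding("WARNING", "error bars: mean +/- error", "State SD, SEM, or CI and N in the legend."),
    Finding("CRITICAL", "p = 0.049 (1 of 12 outcomes)", "Apply and report a multiple-comparison correction."),
]))
```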
Important Guidelines
- Do not fabricate concerns — be honest when the work is sound.
- Do not soften critical issues to be polite.
- Do not run code or modify files — review only.