causal-inference

Production-grade Bayesian causal inference with PyMC, CausalPy, and DoWhy. Enforces DAG-first thinking, mandatory user checkpoints for assumptions, design-specific refutation, and defensible reporting with causal language guardrails. Trigger on: causal inference, causal effect estimation, treatment effects, counterfactuals, difference-in-differences (DiD), synthetic control, regression discontinuity (RDD), interrupted time series (ITS), instrumental variables (IV), propensity scores, DAGs, causal graphs, confounders, backdoor criterion, do-calculus, interventional distributions, pm.do(), pm.observe(), CausalPy, DoWhy, mediation analysis, refutation, sensitivity analysis, parallel trends, placebo tests, or any question of the form "does X cause Y" or "what is the effect of X on Y."

brycewang-stanford 2,626 356 Updated 3mo ago

Install

npx skillscat add brycewang-stanford/auto-empirical-research-skills/causal-inference

Install via the SkillsCat registry.

SKILL.md

Causal Inference

Dependencies

This skill requires the bayesian-workflow skill for all PyMC modeling steps (priors, sampling,
diagnostics, calibration, reporting).

Detect it:

ls ~/.claude/skills/bayesian-workflow/SKILL.md 2>/dev/null || ls .claude/skills/bayesian-workflow/SKILL.md 2>/dev/null

If not found, install it:

git clone https://github.com/Learning-Bayesian-Statistics/baygent-skills.git /tmp/baygent-skills
cp -r /tmp/baygent-skills/bayesian-workflow ~/.claude/skills/

Once it is installed, follow the bayesian-workflow skill for every PyMC modeling step.

Workflow overview

Every causal analysis follows this sequence. Steps 1-4 are the thinking phase (no code). Steps 5-8
are the doing phase. Think before you do.

1. Formulate the causal question: propose a precise estimand (ATE, ATT, LATE, etc.). ⚠️ ASK USER TO CONFIRM.
2. Draw the DAG: propose a causal graph with nodes, edges, and explicit non-edges. ⚠️ ASK USER TO CONFIRM. See references/dags-and-identification.md
3. Identify: determine the identification strategy (backdoor, front-door, IV, RDD, DiD). ⚠️ ASK USER TO CONFIRM untestable assumptions. See references/dags-and-identification.md
4. Choose design: match the problem to a method using the table below. ⚠️ ASK USER TO CONFIRM. See references/quasi-experiments.md or references/structural-models.md
5. Estimate: build and fit the model. Delegate all PyMC mechanics to the bayesian-workflow skill.
6. Refute: MANDATORY. Run design-specific robustness checks. See references/refutation.md
7. Interpret: effect size + decision-relevant HDIs + probability of direction.
8. Report: generate the causal analysis report. See references/reporting.md
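Steps 2 and 3 can be rehearsed without any modeling library. The sketch below is a minimal illustration, not the skill's own tooling: the DAG (`season`, `price`, `ads`, `sales`) and the helper names are hypothetical. It lists the non-edges that the user would need to confirm and returns the parents of the treatment, which form a valid backdoor adjustment set as long as every parent is observed.

```python
from itertools import combinations

# Hypothetical DAG for illustration only: (cause, effect) pairs.
EDGES = {
    ("season", "ads"),
    ("season", "sales"),
    ("price", "sales"),
    ("ads", "sales"),
}
NODES = sorted({node for edge in EDGES for node in edge})


def non_edges(nodes, edges):
    """Node pairs with no arrow in either direction (each is an assumption)."""
    return [
        (a, b)
        for a, b in combinations(nodes, 2)
        if (a, b) not in edges and (b, a) not in edges
    ]


def parents(node, edges):
    """Direct causes of `node`."""
    return {cause for cause, effect in edges if effect == node}


def parent_adjustment_set(treatment, edges, unobserved=frozenset()):
    """Parents of the treatment block all backdoor paths if they are all observed."""
    pa = parents(treatment, edges)
    hidden = pa & set(unobserved)
    if hidden:
        raise ValueError(f"Unobserved parents of {treatment!r}: {sorted(hidden)}")
    return pa


missing = non_edges(NODES, EDGES)
adjustment = parent_adjustment_set("ads", EDGES)
print("Non-edges to confirm with the user:", missing)
print("Adjust for:", sorted(adjustment))
```

If a parent of the treatment is unobserved, the helper raises instead of returning a set, which is the point at which a different identification strategy (front-door, IV, or a quasi-experimental design) has to be considered.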

Design selection guide

Design	Use when	Key assumption	Tool
DiD	Treatment at known time, control group available	Parallel trends	CausalPy
Staggered DiD	Treatment rolls out at different times	Parallel trends per cohort	CausalPy
Synthetic Control	Single treated unit, donor pool available	Weighted donors approximate counterfactual	CausalPy
ITS	Time series, intervention at known time, no control	No confounding event at treatment time	CausalPy
RDD	Treatment by threshold on running variable	No manipulation at threshold	CausalPy
IV	Endogenous treatment, valid instrument	Exclusion restriction, relevance	CausalPy
IPSW	Observational data, treatment modeled	No unmeasured confounders, positivity	CausalPy
Structural (do/observe)	Full causal theory, model mechanisms	Correct DAG specification	PyMC
Counterfactual	"What would Y have been if X differed?"	Correct structural model	PyMC
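To make the DiD row concrete, here is the arithmetic behind the classic two-group, two-period estimate. This is a plain point-estimate sketch on made-up numbers, not the Bayesian model that CausalPy fits; the data and function names are hypothetical. The subtraction of the control group's change is only meaningful under the parallel trends assumption listed in the table.

```python
from statistics import mean

# Hypothetical outcomes, for illustration only: (group, period, outcome).
# group: 1 = treated, 0 = control; period: 1 = post, 0 = pre.
ROWS = [
    (1, 0, 10.0), (1, 0, 12.0), (1, 1, 17.0), (1, 1, 19.0),
    (0, 0, 8.0), (0, 0, 10.0), (0, 1, 11.0), (0, 1, 13.0),
]


def cell_mean(rows, group, period):
    """Average outcome in one group-by-period cell."""
    return mean(y for g, t, y in rows if g == group and t == period)


def did_estimate(rows):
    """(Treated post - treated pre) minus (control post - control pre)."""
    treated_change = cell_mean(rows, 1, 1) - cell_mean(rows, 1, 0)
    control_change = cell_mean(rows, 0, 1) - cell_mean(rows, 0, 0)
    return treated_change - control_change


effect = did_estimate(ROWS)
print(effect)  # treated change 7.0 minus control change 3.0 -> 4.0
```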

Critical rules

1. No estimation without a confirmed DAG. A causal graph is not optional decoration: it makes
   assumptions explicit and determines the adjustment set. If the user resists, explain why the DAG
   is non-negotiable before proceeding.
2. No causal claims without refutation. Every design has failure modes. Run at least one
   design-specific robustness check (placebo test, sensitivity analysis, falsification test) before
   reporting results. See references/refutation.md.
3. State assumptions before results. Lead with what must be true for the estimate to be causal,
   and present the estimate only after those assumptions. This is not optional politeness: it
   prevents misuse of results.
4. Adapt HDIs to the decision context. The bayesian-workflow skill's 94% HDI is a sensible
   default; adapt it with explicit explanation when the decision stakes warrant it (e.g., 89% for
   exploratory, 97% for high-stakes policy). Report multiple intervals when the decision threshold
   matters.
5. Downgrade causal language when warranted. If identification assumptions are unverifiable or
   refutation raises flags, soften claims: "consistent with a causal effect" not "causes", "estimated
   effect" not "true effect". Flag uncertainty loudly in the report.
6. Ask the user when domain knowledge is needed. You cannot know whether an instrument is valid,
   whether parallel trends holds, or whether a confounder exists without domain expertise. Ask
   before assuming.
7. Delegate PyMC mechanics to bayesian-workflow. This skill handles causal structure and design.
   The bayesian-workflow skill handles priors, sampling, diagnostics, calibration, and reporting
   format. Don't duplicate those rules here.
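The rule about adapting HDIs can be illustrated with a small standard-library sketch. It assumes posterior draws of the effect are available as a flat list; here they are simulated from a normal distribution purely for illustration, and the `hdi` helper is a hypothetical stand-in for whatever the bayesian-workflow skill uses (ArviZ offers an equivalent). It reports the 89%, 94%, and 97% intervals side by side along with the probability of direction.

```python
import math
import random


def hdi(samples, mass):
    """Narrowest interval spanning roughly `mass` of the sorted draws."""
    ordered = sorted(samples)
    n = len(ordered)
    k = math.floor(mass * n)  # index offset between the interval's endpoints
    if not 0 < k < n:
        raise ValueError("mass must leave at least one candidate interval")
    start = min(range(n - k), key=lambda i: ordered[i + k] - ordered[i])
    return ordered[start], ordered[start + k]


# Simulated posterior draws of a treatment effect (illustrative, not real output).
rng = random.Random(0)
draws = [rng.gauss(2.0, 1.0) for _ in range(4000)]

intervals = {mass: hdi(draws, mass) for mass in (0.89, 0.94, 0.97)}
p_positive = sum(d > 0 for d in draws) / len(draws)  # probability of direction

for mass, (low, high) in intervals.items():
    print(f"{mass:.0%} HDI: [{low:.2f}, {high:.2f}]")
print(f"P(effect > 0) = {p_positive:.3f}")
```

Reporting the three intervals together shows whether a decision threshold (for example, zero or a minimum worthwhile effect) falls inside some intervals but not others, which is exactly the case where the choice of level needs an explicit justification.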

Common gotchas

These are battle-tested lessons that save hours of debugging:

- CausalPy formula syntax uses C() for categoricals. String columns are treated as categorical
  automatically, but an integer-coded categorical passed without C() is read as a numeric
  variable and silently gets the wrong coding. Wrap categorical treatment and group variables
  explicitly: "y ~ C(treatment) + C(group)".
- DoWhy requires explicit U nodes for unobserved confounders. Omitting them from the graph
  will make DoWhy treat your model as fully identified when it isn't. Add latent nodes explicitly
  and mark them as unobserved.
- CausalPy's PyMC models don't auto-store log-likelihood. Same issue as bayesian-workflow:
  nutpie silently drops it. Call pm.compute_log_likelihood(idata, model=model) after sampling if
  you need it for model comparison.
- Parallel trends is untestable in the post-treatment period. Pre-treatment trend tests are
  necessary but not sufficient: passing them doesn't prove the assumption holds after treatment.
  State this explicitly in every DiD report.
- Synthetic control requires the treated unit to lie within the convex hull of donors. If the
  treated unit is an outlier (highest GDP, largest city), no weighted combination of donors can
  approximate its counterfactual. Check this before running; if violated, the design is invalid.
- DiD group variable must be dummy-coded (0/1). CausalPy rejects string labels like
  "treatment"/"control". Use integers: 1 = treatment, 0 = control. The data also requires a unit column.
- SyntheticControl expects wide-format data. Index = time, columns = unit names, values = outcome.
  If your data is in long format, pivot first: df.pivot(index="date", columns="unit", values="outcome").
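The data-shape gotchas above can be checked in a few lines of pandas before any CausalPy call. The sketch below uses a made-up panel and hypothetical column and unit names. It pivots long data to wide, runs a per-period range check (a necessary but not sufficient condition for the treated unit to sit inside the donors' convex hull), and recodes a string group label to 0/1.

```python
import pandas as pd

# Hypothetical long-format panel, for illustration only.
long_df = pd.DataFrame({
    "date": [2019, 2020, 2021] * 3,
    "unit": ["treated"] * 3 + ["donor_a"] * 3 + ["donor_b"] * 3,
    "outcome": [5.0, 6.0, 7.0, 4.0, 5.0, 6.5, 6.0, 7.0, 8.0],
})

# Wide format: index = time, columns = unit names, values = outcome.
wide = long_df.pivot(index="date", columns="unit", values="outcome")

# Range check: in every period the treated unit should fall between the
# smallest and largest donor value. Failing this rules out a convex
# combination of donors; passing it does not guarantee one exists.
donors = wide.drop(columns="treated")
inside = wide["treated"].between(donors.min(axis=1), donors.max(axis=1))
print("Treated unit within donor range in every period:", bool(inside.all()))

# DiD group coding: string labels recoded to integers, 1 = treatment, 0 = control.
did_df = pd.DataFrame({"group_label": ["treatment", "control", "treatment"]})
did_df["group"] = (did_df["group_label"] == "treatment").astype(int)
print(did_df)
```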

When things go wrong

Symptom	Likely cause	Fix
Refutation fails	Assumption violated	Diagnose which assumption, try alternative design or sensitivity bounds
DiD effect at placebo time	Parallel trends violated	Try synthetic control or add group-specific time trends
RDD: bunching at threshold	Manipulation of running variable	Design is invalid for this threshold — report and stop
SC: poor pre-treatment fit	Donors don't span treated unit	Add donors, expand donor pool, or reconsider design
DoWhy says "not identifiable"	Insufficient adjustment set	Revise DAG, add measured variables, or change design
CausalPy formula error	Wrong formula syntax	Use `C()` for categoricals, check variable names match dataframe columns
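The "DiD effect at placebo time" symptom comes from a placebo-in-time test, which can be sketched as follows. This is a simplified point-estimate illustration on synthetic panels, not the procedure in references/refutation.md; the function names and data are hypothetical. The idea is to keep only pre-treatment periods, pretend treatment started earlier, and check that the estimated "effect" is near zero.

```python
from statistics import mean


def did(rows, cutoff):
    """2x2 DiD treating periods >= cutoff as post. rows: (group, period, outcome)."""
    def cell(group, post):
        return mean(y for g, t, y in rows if g == group and (t >= cutoff) == post)

    return (cell(1, True) - cell(1, False)) - (cell(0, True) - cell(0, False))


def placebo_in_time(rows, true_cutoff, fake_cutoff):
    """Re-run DiD on pre-treatment rows only, pretending treatment began earlier."""
    pre_rows = [row for row in rows if row[1] < true_cutoff]
    return did(pre_rows, fake_cutoff)


# Synthetic pre-treatment panels (real treatment would start at period 4).
# `parallel`: both groups share the same slope, differing only by a constant.
parallel = [(g, t, 2.0 * t + 3.0 * g) for g in (0, 1) for t in range(4)]
# `diverging`: the treated group already trends upward faster before treatment.
diverging = [(g, t, 2.0 * t + 1.5 * g * t) for g in (0, 1) for t in range(4)]

print(placebo_in_time(parallel, true_cutoff=4, fake_cutoff=2))   # 0.0
print(placebo_in_time(diverging, true_cutoff=4, fake_cutoff=2))  # 3.0
```

A clearly nonzero placebo estimate, as in the diverging panel, points at a parallel trends violation and leads to the fixes listed in the table. A zero placebo estimate remains only necessary evidence, since the assumption cannot be tested after treatment.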
