causal-inference-mixtape

This skill should be used when the user asks to "implement a DiD regression", "write a causal inference pipeline", "set up an event study", "implement instrumental variables", "run a regression discontinuity design", "build a synthetic control model", "implement propensity score matching", "write parallel trends test", "implement Bacon decomposition", or needs code templates for causal inference methods in Python, R, or Stata. Based on Scott Cunningham's Causal Inference: The Mixtape.

brycewang-stanford 2,623 356 Updated 2mo ago

Install

npx skillscat add brycewang-stanford/auto-empirical-research-skills/causal-inference-mixtape

Install via the SkillsCat registry.

SKILL.md

Causal Inference: The Mixtape — Code Skill

Practitioner-oriented causal inference skill built from Scott Cunningham's Causal Inference: The Mixtape repository. Covers 10 identification strategies with ready-to-run code templates in Python, R, and Stata.

Methods Covered

Method	Python	R	Stata	Reference
OLS / Regression	statsmodels	estimatr	reg/reghdfe	`references/method-patterns.md` §1
Difference-in-Differences	statsmodels + C()	lfe/fixest	xtreg/reghdfe	`references/method-patterns.md` §2
Event Study (Dynamic DiD)	manual lead/lag	estimatr	reghdfe	`references/method-patterns.md` §3
Staggered DiD / TWFE	statsmodels	bacondecomp	bacondecomp	`references/method-patterns.md` §4
Regression Discontinuity	statsmodels polynomial	rdrobust	rdplot/rdrobust	`references/method-patterns.md` §5
Instrumental Variables	linearmodels IV2SLS	AER/ivreg	ivregress 2sls	`references/method-patterns.md` §6
Synthetic Control	rpy2 → R Synth	Synth + SCtools	synth	`references/method-patterns.md` §7
Matching / PSM / IPW	manual logit + weights	MatchIt + Zelig	teffects/cem	`references/method-patterns.md` §8
DAGs / Collider Bias	dagitty (conceptual)	dagitty/ggdag	—	`references/method-patterns.md` §9
Randomization Inference	permutation loop	ri2	ritest	`references/method-patterns.md` §10
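
The "permutation loop" entry for randomization inference in the table above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the template from references/method-patterns.md; the helper name `randomization_pvalue`, the toy data, and the choice of a difference-in-means test statistic are all assumptions made for the example.

```python
import random
from statistics import mean


def diff_in_means(y, d):
    """Mean outcome of treated units minus mean outcome of controls."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    return mean(treated) - mean(control)


def randomization_pvalue(y, d, n_perm=2000, seed=0):
    """Two-sided p-value under the sharp null of no effect for any unit.

    Hypothetical helper: reshuffles the treatment labels n_perm times and
    counts how often the placebo statistic is at least as extreme as the
    observed one.
    """
    rng = random.Random(seed)
    observed = diff_in_means(y, d)
    labels = list(d)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if abs(diff_in_means(y, labels)) >= abs(observed) - 1e-12:
            extreme += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


# Toy data (made up for illustration): 8 treated units, then 8 control units.
y = [5.1, 6.3, 5.8, 6.9, 6.0, 7.2, 5.5, 6.6, 4.0, 4.4, 3.9, 4.8, 4.2, 5.0, 3.7, 4.5]
d = [1] * 8 + [0] * 8
observed, p_value = randomization_pvalue(y, d)
print(f"observed difference: {observed:.4f}, randomization p-value: {p_value:.4f}")
```

Shuffling the labels keeps the number of treated units fixed, which matches a completely randomized design. A clustered or stratified assignment would need the shuffle to respect that structure.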

Core Workflow

Implement a Causal Method

1. Identify the method from the table above
2. Load the appropriate template from references/method-patterns.md
3. Adapt variable names, fixed effects, and clustering to the user's data
4. Add robustness checks (parallel trends for DiD, McCrary density test for RDD, first-stage F for IV)

Choose the Right Language

Scenario	Recommendation
ML pipeline integration	Python (statsmodels + linearmodels)
Synthetic Control	R (Synth package) or Stata (synth) — Python lacks mature implementation
Bacon decomposition	R (bacondecomp) or Stata — no Python equivalent
Publication-ready tables	Stata (outreg2/esttab) or R (stargazer/modelsummary)
Coarsened Exact Matching	Stata (cem) or R (MatchIt) — no Python equivalent
Quick prototyping	Python with statsmodels

Cross-Language Equivalents

Task	Python	R	Stata
OLS with robust SE	`smf.ols().fit(cov_type='HC1')`	`lm_robust()`	`reg y x, robust`
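Cluster SE	`fit(cov_type='cluster', cov_kwds={'groups': g})`	`felm(y ~ x | 0 | 0 | g)`	`reg y x, vce(cluster g)`
Two-way FE	`C(id) + C(time)` in formula	`felm(y ~ x | id + time)`	`reghdfe y x, absorb(id time)`
IV / 2SLS	`IV2SLS.from_formula('y ~ 1 + exog + [endog ~ inst]')`	`ivreg(y ~ exog + endog | exog + inst)`	`ivregress 2sls y exog (endog = inst)`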
DiD	`C(treat)*C(post)`	`treat:post` in formula	`did_multiplegt` or interaction

Key Python Patterns

DiD with Cluster-Robust SE

import statsmodels.formula.api as smf

model = smf.ols('y ~ C(treated)*C(post) + controls', data=df)
results = model.fit(cov_type='cluster', cov_kwds={'groups': df['firm_id']})
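
To make clear what the interaction term in the formula above estimates, here is a small sketch on simulated data. In a 2x2 design with no controls, the OLS coefficient on treated × post equals the difference-in-differences of the four cell means. The sketch uses plain NumPy rather than statsmodels so the arithmetic is visible; the sample size, effect size, and variable names are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 400

# Simulated 2x2 design (all values are made up for illustration).
treated = rng.integers(0, 2, size=n)
post = rng.integers(0, 2, size=n)
true_effect = 2.0
y = 1.0 + 0.5 * treated + 1.5 * post + true_effect * treated * post + rng.normal(size=n)

# OLS of y on an intercept, treated, post, and their interaction.
X = np.column_stack([np.ones(n), treated, post, treated * post])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
did_ols = beta[3]


def cell_mean(t, p):
    """Mean outcome in the (treated=t, post=p) cell."""
    return y[(treated == t) & (post == p)].mean()


# Difference-in-differences computed directly from the four cell means.
did_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print(f"OLS interaction: {did_ols:.4f}, cell means: {did_means:.4f}")
```

The two numbers agree up to floating-point error because the model is saturated. Once controls or fixed effects are added, the regression estimate no longer reduces to four cell means.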

Event Study (Lead/Lag)

# Create relative time dummies and keep track of their column names
rel_cols = []
for k in range(-4, 5):
    col = f'rel_{k}' if k >= 0 else f'rel_m{abs(k)}'
    df[col] = (df['relative_time'] == k).astype(int)
    rel_cols.append(col)

# Drop t=-1 as reference
formula = 'y ~ ' + ' + '.join([c for c in rel_cols if c != 'rel_m1']) + ' + C(id) + C(year)'

IV / 2SLS

from linearmodels.iv import IV2SLS

model = IV2SLS.from_formula('y ~ 1 + exog + [endog ~ instrument]', data=df)
results = model.fit(cov_type='clustered', clusters=df['cluster_var'])
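
As a rough illustration of what the 2SLS call above computes, the sketch below runs the two stages by hand with NumPy on simulated data and also reports a first-stage F-statistic. It is not the linearmodels implementation: the data-generating process, the helper `ols`, and all variable names are assumptions, and the F-statistic shown is the simple homoskedastic form for a single instrument.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Simulated data (made up): u confounds endog and y; z affects y only through endog.
z = rng.normal(size=n)
u = rng.normal(size=n)
endog = 0.8 * z + u + rng.normal(size=n)
y = 1.0 + 1.5 * endog + 2.0 * u + rng.normal(size=n)


def ols(X, target):
    """Least-squares coefficients of target on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coef


const = np.ones(n)

# First stage: regress the endogenous variable on the instrument.
Z = np.column_stack([const, z])
endog_hat = Z @ ols(Z, endog)

# First-stage F for one instrument: explained variation over residual variance.
rss_restricted = np.sum((endog - endog.mean()) ** 2)
rss_full = np.sum((endog - endog_hat) ** 2)
first_stage_f = (rss_restricted - rss_full) / (rss_full / (n - 2))

# Second stage: replace endog with its first-stage fitted values.
beta_2sls = ols(np.column_stack([const, endog_hat]), y)[1]

# With one instrument and no controls, 2SLS equals the Wald ratio.
beta_wald = np.cov(z, y)[0, 1] / np.cov(z, endog)[0, 1]

# Naive OLS for comparison; it is biased upward here because u is omitted.
beta_ols = ols(np.column_stack([const, endog]), y)[1]

print(f"first-stage F: {first_stage_f:.1f}")
print(f"2SLS: {beta_2sls:.3f}, Wald: {beta_wald:.3f}, OLS: {beta_ols:.3f}")
```

Standard errors from a hand-rolled second stage are not valid, because they ignore that `endog_hat` is itself estimated. Use `IV2SLS` for inference and treat this sketch only as a check on the point estimate.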

Robustness Check Patterns

Method	Required Checks
DiD	Parallel trends (event study plot), placebo treatment dates
RDD	McCrary density test, bandwidth robustness (half/double IK optimal), polynomial robustness
IV	First-stage F > 10, exclusion restriction argument, over-identification test (only when instruments outnumber endogenous regressors)
Synthetic Control	Pre-treatment RMSPE, placebo distribution, leave-one-out
Matching	Covariate balance table, caliper sensitivity
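
The bandwidth robustness check listed for RDD can be sketched along these lines: re-estimate the jump at half and double a baseline bandwidth and compare. The example below uses a local linear fit with a uniform kernel in NumPy on simulated data. The baseline `h0 = 0.2` is an arbitrary placeholder rather than an IK optimal bandwidth (a package such as rdrobust would supply a data-driven one), and the helper `local_linear_rd` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
cutoff = 0.0

# Simulated sharp RDD (made up): the outcome jumps by 1.0 at the cutoff.
x = rng.uniform(-1, 1, size=n)
d = (x >= cutoff).astype(float)
y = 0.5 + 0.8 * x + 1.0 * d + rng.normal(scale=0.5, size=n)


def local_linear_rd(x, y, cutoff, h):
    """Jump at the cutoff from a local linear fit with separate slopes per side.

    Uses a uniform kernel: observations within h of the cutoff get equal weight.
    """
    keep = np.abs(x - cutoff) <= h
    xc = x[keep] - cutoff
    above = (xc >= 0).astype(float)
    X = np.column_stack([np.ones(xc.size), above, xc, above * xc])
    coef, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    return coef[1]


h0 = 0.2  # placeholder baseline bandwidth, NOT a data-driven optimum
estimates = {h: local_linear_rd(x, y, cutoff, h) for h in (h0 / 2, h0, 2 * h0)}
for h, tau in estimates.items():
    print(f"bandwidth {h:.2f}: estimated jump {tau:.3f}")
```

Estimates that move substantially across the three bandwidths suggest the result depends on functional form away from the cutoff and deserves a closer look.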

Common Pitfalls

TWFE with staggered treatment: standard two-way FE can be biased when treatment timing varies and effects differ across cohorts or over time, because already-treated units end up serving as controls. Use Bacon decomposition to diagnose the problem, and Sun & Abraham or Callaway & Sant'Anna estimators to address it.
Synthetic Control with many treated units: the Synth package handles one treated unit. For multiple treated units, use augmented synthetic control or a stacked approach.
RDD without McCrary test: always test for manipulation of the running variable at the cutoff before estimating.
IV weak instruments: always report the first-stage F-statistic. A value below 10 is the conventional rule-of-thumb signal of weak-instrument bias.
Python Synth gap: no mature Python Synth package exists. Use rpy2 to call R's Synth from Python.

Additional Resources

Reference Files

references/method-patterns.md — Detailed code templates for all 10 methods with full examples
references/r-stata-comparison.md — Cross-language package comparison and method coverage gaps

Prompt Files

prompts/01-implement-method.md — Copy-paste prompt for implementing any causal method
prompts/02-robustness-checks.md — Copy-paste prompt for generating robustness check code
