statistical-analysis

Statistical methods and tests, hypothesis testing, A/B testing frameworks, time series analysis, and experimental design

Logos-Liber 16 3 Updated 5mo ago

GitHub

Install

npx skillscat add logos-liber/atlas-agent-teams/statistical-analysis

Install via the SkillsCat registry.

SKILL.md

Statistical Analysis

Statistical Methods and Tests

Descriptive Statistics

Measures of Central Tendency: Mean, median, mode
Measures of Dispersion: Variance, standard deviation, range, interquartile range
Distribution Shape: Skewness, kurtosis
Correlation: Pearson, Spearman, Kendall correlation coefficients
Covariance: Measure of joint variability

Probability Distributions

Normal Distribution: Bell curve, symmetric, defined by mean and standard deviation
Binomial Distribution: Number of successes in n independent trials
Poisson Distribution: Number of events in fixed interval
Exponential Distribution: Time between events in Poisson process
Chi-Square Distribution: Sum of squared normal variables

Inferential Statistics

Confidence Intervals: Range of values likely to contain population parameter
Hypothesis Testing: Formal procedure for testing claims about populations
p-values: Probability of observing results as extreme as current, assuming null hypothesis
Statistical Power: Probability of correctly rejecting false null hypothesis
Effect Size: Magnitude of difference or relationship

Hypothesis Testing

Hypothesis Structure

Null Hypothesis (H0): Default assumption, no effect or difference
Alternative Hypothesis (H1): Claim to be tested, effect or difference exists
Type I Error: Rejecting true null hypothesis (false positive)
Type II Error: Failing to reject false null hypothesis (false negative)
Significance Level (α): Threshold for rejecting null hypothesis (typically 0.05)

Common Statistical Tests

t-test: Compare means between two groups
- One-sample t-test: Compare sample mean to known value
- Independent t-test: Compare means of two independent groups
- Paired t-test: Compare means of paired samples
ANOVA: Compare means across multiple groups
- One-way ANOVA: Single factor
- Two-way ANOVA: Two factors with interaction
Chi-Square Test: Test independence between categorical variables
Mann-Whitney U Test: Non-parametric alternative to t-test
Kruskal-Wallis Test: Non-parametric alternative to ANOVA

Multiple Testing Correction

Bonferroni Correction: Divide α by number of tests
False Discovery Rate (FDR): Control proportion of false positives
Benjamini-Hochberg: Adaptive FDR control

A/B Testing Frameworks

Experimental Design

Control Group: Receives current version or no treatment
Treatment Group: Receives new version or treatment
Random Assignment: Randomly assign subjects to groups
Sample Size Calculation: Determine required sample size for desired power
Stratification: Balance groups on important covariates

Metrics Selection

Primary Metric: Main measure of success
Secondary Metrics: Additional measures of interest
Guardrail Metrics: Ensure no negative impact on important KPIs
Binary Metrics: Conversion, click-through rate
Continuous Metrics: Revenue, time on page

Statistical Significance

Two-tailed Test: Test for difference in either direction
One-tailed Test: Test for difference in specific direction
Confidence Intervals: Provide range of plausible values
Minimum Detectable Effect (MDE): Smallest effect detectable with given power

Common Pitfalls

Peeking: Checking results before experiment ends
Simpson's Paradox: Trend appears in groups but disappears when combined
Novelty Effect: Temporary effect due to newness
Selection Bias: Non-random assignment to groups

Time Series Analysis

Time Series Components

Trend: Long-term increase or decrease
Seasonality: Regular, predictable patterns
Cyclical: Irregular, long-term cycles
Irregular/Noise: Random fluctuations

Stationarity

Definition: Statistical properties constant over time
Tests: Augmented Dickey-Fuller (ADF), KPSS test
Transformations: Differencing, log transformation
Importance: Required for many time series models

Forecasting Methods

Naive Forecast: Use last observed value
Moving Average: Average of last n values
Exponential Smoothing: Weighted average with decreasing weights
ARIMA: AutoRegressive Integrated Moving Average
Prophet: Facebook's forecasting tool for business time series
Neural Networks: LSTM, GRU for complex patterns

Seasonal Decomposition

Additive Model: Y = Trend + Seasonal + Residual
Multiplicative Model: Y = Trend × Seasonal × Residual
STL Decomposition: Seasonal-Trend decomposition using LOESS

Experimental Design

Design Principles

Randomization: Random assignment to treatment groups
Replication: Repeat experiment multiple times
Blocking: Group similar experimental units together
Factorial Design: Test multiple factors simultaneously
Control Groups: Baseline for comparison

Sample Size Determination

Power Analysis: Calculate required sample size
Effect Size: Expected magnitude of effect
Significance Level: Acceptable Type I error rate
Power: Desired probability of detecting effect (typically 0.8)

Experimental Validity

Internal Validity: Causal relationship between treatment and outcome
External Validity: Generalizability to other populations/settings
Construct Validity: Measurement accurately reflects concept
Statistical Conclusion Validity: Appropriate statistical methods

Common Designs

Completely Randomized Design: Random assignment to groups
Randomized Block Design: Block on nuisance variables
Factorial Design: Multiple factors with all combinations
Crossover Design: Subjects receive multiple treatments
Split-Plot Design: Hierarchical randomization

statistical-analysis

Install

Statistical Analysis

Statistical Methods and Tests

Descriptive Statistics

Probability Distributions

Inferential Statistics

Hypothesis Testing

Hypothesis Structure

Common Statistical Tests

Multiple Testing Correction

A/B Testing Frameworks

Experimental Design

Metrics Selection

Statistical Significance

Common Pitfalls

Time Series Analysis

Time Series Components

Stationarity

Forecasting Methods

Seasonal Decomposition

Experimental Design

Design Principles

Sample Size Determination

Experimental Validity

Common Designs

Categories

Install

Recommended Skills