pipeline-design

Design ETL/ELT pipeline architectures with proper patterns for reliability and scalability.

timequity 6 Updated 7mo ago

GitHub

Install

npx skillscat add timequity/plugins/pipeline-design

Install via the SkillsCat registry.

SKILL.md

Pipeline Design

ETL vs ELT

Approach	When to Use
ETL	Transform before load, limited warehouse compute
ELT	Modern warehouses (Snowflake, BigQuery, Redshift)

Pipeline Patterns

Batch

Source → Extract → Stage → Transform → Load → Target
         │                    │
         └── Checkpoint ──────┘

Scheduled intervals (hourly, daily)
Full or incremental loads
Idempotent operations

Streaming

Source → Kafka/Kinesis → Process → Sink
              │
              └── State Store

Real-time requirements
Event-driven architecture
Exactly-once semantics

Design Principles

Idempotent - Safe to re-run
Incremental - Process only new/changed data
Observable - Metrics, logs, alerts
Testable - Unit tests for transformations
Recoverable - Checkpoints, retry logic

Staging Pattern

-- 1. Land raw data
COPY INTO raw.source_data FROM @stage;

-- 2. Deduplicate
CREATE TABLE staging.deduped AS
SELECT * FROM raw.source_data
QUALIFY ROW_NUMBER() OVER (PARTITION BY id ORDER BY _loaded_at DESC) = 1;

-- 3. Transform to target
MERGE INTO target.dim_customer
USING staging.deduped
ON target.id = staging.id
WHEN MATCHED THEN UPDATE ...
WHEN NOT MATCHED THEN INSERT ...;

Error Handling

Dead letter queues for failed records
Retry with exponential backoff
Alert on threshold breaches
Quarantine bad data for review

pipeline-design

Install

Pipeline Design

ETL vs ELT

Pipeline Patterns

Batch

Streaming

Design Principles

Staging Pattern

Error Handling

Categories

Install

Recommended Skills