"TODO: Complete and informative explanation of what the skill does and when to use it. Include WHEN to use this skill - specific scenarios, file types, or tasks that trigger it."
Install
npx skillscat add gwenwindflower/charmschool/dbt-projects Install via the SkillsCat registry.
dbt Project Conventions
YAML Conventions
Model Documentation (*.yml)
Each model file can contain:
- models: Documentation and column tests
- data_tests: Model-level tests
- unit_tests: Unit test definitions
- semantic_models: MetricFlow semantic layer
- metrics: Metric definitions
- saved_queries: Pre-defined metric queries
Example structure:
models:
- name: orders
description: One row per order.
data_tests:
- dbt_utils.expression_is_true:
expression: "order_total = subtotal + tax_paid"
columns:
- name: order_id
description: Primary key.
data_tests:
- not_null
- uniqueSource Definitions (__sources.yml)
Sources are defined in models/staging/__sources.yml:
sources:
- name: ecom
schema: raw
freshness:
warn_after: {count: 24, period: hour}
tables:
- name: raw_orders
loaded_at_field: ordered_atMacros
cents_to_dollars(column_name)
Converts cents to dollars with proper decimal handling. Uses adapter dispatch for cross-database compatibility.
{{ cents_to_dollars('amount') }} -- Returns: (amount / 100)::numeric(16, 2)generate_schema_name(custom_schema_name, node)
Custom schema routing:
- Seeds →
rawschema (always) - Prod target →
<default>_<custom>schema naming - Non-prod targets → default target schema
Testing Patterns
Data Tests
Column-level tests in YAML:
columns:
- name: order_id
data_tests:
- not_null
- unique
- relationships:
to: ref('stg_orders')
field: order_idExpression Tests
Model-level validation using dbt_utils.expression_is_true:
data_tests:
- dbt_utils.expression_is_true:
expression: "order_total - tax_paid = subtotal"Unit Tests
Test business logic with mocked inputs:
unit_tests:
- name: test_order_items_compute_to_bools_correctly
model: orders
given:
- input: ref('order_items')
rows:
- {order_id: 1, is_drink_item: true, is_food_item: false}
- input: ref('stg_orders')
rows:
- {order_id: 1}
expect:
rows:
- {order_id: 1, is_drink_order: true, is_food_order: false}dbt Packages Used
| Package | Version | Purpose |
|---|---|---|
dbt-labs/dbt_utils |
1.1.1 | Common macros and tests |
calogica/dbt_date |
0.10.0 | Date utilities |
dbt-labs/dbt-audit-helper |
main | Audit/comparison helpers |
CI/CD
GitHub Actions (.github/workflows/ci.yml)
PRs to main or staging trigger dbt Cloud CI jobs against:
- Snowflake
- BigQuery
- Postgres
Jobs create isolated schemas: dbt_jsdx__pr_<branch_name>
Branching Strategy (WAP Flow)
main: Production (read-only, deploys toprodschema)staging: Pre-production testing- Feature branches: Branch from
staging, merge back tostaging
Important Gotchas
Seed paths: The sample data lives in
jaffle-data/. To use it, add"jaffle-data"toseed-pathsindbt_project.yml, rundbt seed, then remove it.dbt Cloud required: This project is designed for dbt Cloud. Local dbt Core works but SQLFluff uses
dbt-cloudtemplater.Raw data schema: All source tables expect a
rawschema in your warehouse.Schema generation: The custom
generate_schema_namemacro routes schemas differently in prod vs non-prod environments.Packages first: Always run
dbt depsbeforedbt buildon a fresh clone.Semantic layer: The project includes MetricFlow semantic models and metrics, which require dbt Cloud to query interactively.
Adding New Models
dbt Models fit into 3 broad categories: staging, intermediate, and marts.
Staging models are the rawest, closest to the source data, and should be simple transformations like renaming, type casting, and basic cleaning. They maintain a 1-to-1 relationship with a single raw source table, except in extremely rare circumstances (e.g., if the physical data from your company's orders landed in 5 identical tables split by global region, and you never interacted with those tables separately, you might choose to unify them in the staging layer into a single stg_orders table).
Intermediate models usually have verb names, labelled by the transformational action they apply. For instance, int_aggregate_orders_by_customer would be a table that joins stg_orders to stg_customers and aggregates a variety of orders facts at the customer grain. All models from the intermediate layer select from staging models or other intermediate models only, never from raw source data. The only models that should contain a {{source()}} macro are staging models.
Mart models are the most refined, business-facing models, they present a combination of staging and/or intermediate models to create datasets ready for analysis, BI, AI engineering, or data science work. Marts are the source of truth for the business, the user-facing end of the dbt transformation pipeline.
There are different philosophies on marts, which have upstream effects on how you design and construct your DAG. Many people build towards a star schema, which allows flexible combinations of facts and dimensions, but the other end of the spectrum, the "OBT" (One Big Table) approach is also extremely popular. OBT modeling aims to provide a convenient, pre-computed table that can service the most common query patterns in a simple and performant way, at the cost of flexibility and storage efficiency. Modern data platforms - with cheap storage separated from compute, and highly optimized query engines running on powerful, scalable machines - have changed how people weigh these options significantly.
In reality, there are many shades in-between these approaches, purists are rare, and it is not uncommon to have a mixture of both. Building a rigorous star schema of marts, then constructing a secondary layer of OBT-style marts on top of that to provide quick and easy access to key metrics is probably the most common approach in real, everyday work. Ultimately, the right approach depends on the specifics of the business and stakeholder needs.
Staging Model Checklist
- Create
models/staging/stg_<entity>.sqlwith source → renamed CTE pattern - Add source table to
models/staging/__sources.ymlif new - Create
models/staging/stg_<entity>.ymlwith description and tests - Follow lowercase, explicit alias, 4-space indent conventions
Mart Model Checklist
- Create
models/marts/<entity>.sqljoining staging models - Create
models/marts/<entity>.ymlwith:- Model description
- Column descriptions and tests
- Semantic model definition (optional)
- Metrics (optional)
- Materialization defaults to
table(configured indbt_project.yml)