jwmossmoz

bigquery

Query Mozilla telemetry data directly from BigQuery using the bq CLI. Use when the user wants to run SQL against Firefox telemetry, analyze Windows version distribution, count DAU/MAU/WAU, query Glean metrics, or investigate user populations. Triggers on "bigquery", "bq", "telemetry query", "DAU", "MAU", "Windows distribution", "macOS distribution", "Darwin version", "Linux distribution", "kernel version", "client count", "user count", "Glean metrics query", "baseline_clients".

jwmossmoz 3 Updated 3mo ago

Resources

1
GitHub

Install

npx skillscat add jwmossmoz/agent-skills/bigquery

Install via the SkillsCat registry.

SKILL.md

BigQuery

Query Mozilla telemetry data directly using the bq CLI.

Prerequisites

  • gcloud and bq CLI installed (brew install google-cloud-sdk)
  • Authenticated: gcloud auth login with a Mozilla account
  • Billing project set: queries run against a project you have bigquery.jobs.create on
  • (Optional but highly recommended) mozdata-claude-plugin — provides Glean Dictionary MCP for metric/ping discovery, making it much easier to find the right tables and columns

Authentication

# Check current account
gcloud config get-value account

# Re-authenticate if needed
gcloud auth login

# List available projects
gcloud projects list --format="table(projectId,name)"

# Set billing project (mozdata is the standard choice)
gcloud config set project mozdata

If queries fail with "Access Denied", the billing project likely lacks permissions. Try --project_id=mozdata.

Running Queries

# Basic query
bq query --project_id=mozdata --use_legacy_sql=false --format=pretty "SELECT ..."

# Dry run (check cost before executing)
bq query --project_id=mozdata --use_legacy_sql=false --dry_run "SELECT ..."

Always use --project_id=mozdata and --use_legacy_sql=false.

Table Selection

Choose the right table — this is the most important optimization:

Query Type Table Why
Windows version distribution telemetry.windows_10_aggregate Pre-aggregated, instant
DAU/MAU by standard dimensions firefox_desktop_derived.active_users_aggregates_v3 Pre-computed, 100x faster
DAU with custom dimensions firefox_desktop.baseline_clients_daily One row per client per day
MAU/WAU/retention firefox_desktop.baseline_clients_last_seen Bit patterns, scan 1 day not 28
Event analysis firefox_desktop.events_stream Pre-unnested, clustered
Mobile search search.mobile_search_clients_daily_v2 Pre-aggregated
Specific Glean metric firefox_desktop.metrics Raw metrics ping

All tables are in the moz-fx-data-shared-prod project. Fully qualify as `moz-fx-data-shared-prod.{dataset}.{table}`.

Critical Rules

  • Always use aggregate tables first — raw tables are 10-100x more expensive
  • Always include partition filtersubmission_date or DATE(submission_timestamp)
  • Use sample_id = 0 for development (1% sample) — remove for production
  • Say "clients" not "users" — BigQuery tracks client_id, not actual humans
  • Never join across products by client_id — each product has its own namespace
  • Use events_stream for events — never raw events_v1 (requires UNNEST)
  • Use baseline_clients_last_seen for MAU — bit patterns, scan 1 day not 28

References

  • references/tables.md — Detailed table schemas and common query patterns
  • references/os-versions.md — Windows, macOS, and Linux version distribution queries with build number, Darwin, and kernel version mappings

Related Skills

  • redash — Web UI frontend to BigQuery with visualizations and sharing
  • mozdata:query-writing — Guided query writing with Glean Dictionary MCP
  • mozdata:probe-discovery — Find Glean metrics and telemetry probes