This skill should be used when the user wants to debug or trace where time is spent during query execution in Materialize. Trigger when the user mentions tracing queries, span analysis, query latency breakdown, Tempo traces, trace IDs, or wants to understand where time goes in CREATE TABLE, SELECT, INSERT, DROP, or any other SQL statement execution.
Resources
1Install
npx skillscat add materializeinc/materialize/query-tracing Install via the SkillsCat registry.
Query Tracing Skill
Use distributed tracing (OpenTelemetry / Tempo) to understand where time is
spent during SQL statement execution in a local Materialize instance.
For background on Materialize's tracing infrastructure (span levels, filtering,#[tracing::instrument], distributed context propagation), seedoc/developer/tracing.md.
Important: Use Optimized Builds
Always trace --optimized builds. Debug builds introduce significant
artifacts — unoptimized code, extra debug checks, and inflated self-times that
don't reflect production behavior. Tracing a debug build can be misleading
because time may appear in places that are negligible in release builds.
bin/environmentd --optimized --monitoringThe --optimized build takes longer to compile but produces traces that
accurately represent where time is actually spent.
Prerequisites
The local monitoring stack (Tempo + Grafana + Prometheus) and environmentd
with tracing enabled must be running. All network calls requiredangerouslyDisableSandbox.
Important: Start the monitoring stack before environmentd. If Tempo is
not running when environmentd starts sending spans, those spans are lost —
they will not be retroactively captured.
1. Start the monitoring stack
bin/mzcompose --find monitoring run defaultWait for Tempo to be ready:
# Poll until Tempo is accepting requests
for i in $(seq 1 30); do
curl -s -o /dev/null -w "%{http_code}" http://localhost:3200/ready | grep -q 200 && break
sleep 5
done2. Start environmentd with tracing
bin/environmentd --optimized --monitoringOr with the trace filter pre-set:
bin/environmentd --optimized --reset --monitoring -- --system-parameter-default='opentelemetry_filter=debug'3. Verify both services are running
Before tracing, confirm the stack is healthy:
# Tempo accepting traces?
curl -s -o /dev/null -w "%{http_code}" http://localhost:3200/ready
# environmentd listening?
lsof -tiTCP:6875 -sTCP:LISTEN4. Set the trace filter at runtime (if not set at startup)
Connect as mz_system and set the OpenTelemetry filter:
psql -U mz_system -h localhost -p 6877 materialize -c "ALTER SYSTEM SET opentelemetry_filter = 'debug';"debug is a good starting filter. For even more detail, use trace (but
expect much more data). You can also target specific modules:
ALTER SYSTEM SET opentelemetry_filter = 'mz_adapter::coord=trace,debug';Workflow: Trace a Query
Step 1: Run the query and capture the trace ID
Use psql with emit_trace_id_notice enabled. Important: Use separate-c flags for SET and the query, since DDL statements (CREATE, DROP, ALTER)
cannot run inside an implicit transaction block:
psql -U materialize -h localhost -p 6875 materialize \
-c "SET emit_trace_id_notice = true;" \
-c "<YOUR SQL STATEMENT HERE>;"The output will contain NOTICE lines with trace IDs — one per statement. The
SET statement gets its own trace ID; the second trace ID is the one you want:
SET
NOTICE: trace id: fd2f69eb059e4823d8d48a87ccab8f6f
NOTICE: trace id: 984bd8c69f99703243d350e14b02caea
CREATE TABLEThe trace ID for the query is the last one before the statement result.
To extract the trace ID programmatically:
OUTPUT=$(psql -U materialize -h localhost -p 6875 materialize \
-c "SET emit_trace_id_notice = true;" \
-c "CREATE TABLE foo (id INT);" 2>&1)
TRACE_ID=$(echo "$OUTPUT" | grep "trace id:" | tail -1 | sed 's/.*trace id: //')
echo "Trace ID: $TRACE_ID"Step 2: Wait for spans to be flushed
Spans are batched before export (default 5s delay, configured in--opentelemetry-sched-delay). Wait ~10 seconds after the query completes.
Step 3: Fetch the trace from Tempo
Query the Tempo HTTP API on port 3200. Retry if needed — spans may still be
in flight:
sleep 10
# Retry loop — first attempt often gets "trace not found"
for i in $(seq 1 5); do
RESP=$(curl -s http://localhost:3200/api/traces/$TRACE_ID)
if echo "$RESP" | grep -q "batches"; then
echo "$RESP" > /tmp/claude-1000/trace.json
echo "Trace fetched ($(echo "$RESP" | wc -c) bytes)"
break
fi
echo "Trace not ready, retrying... ($i/5)"
sleep 5
doneStep 4: Analyze the trace
The Tempo API returns traces in OTLP JSON format. Important: Span IDs and
parent span IDs are base64-encoded, not hex. The analysis script handles this.
Use the trace_tree.py script in this skill directory to analyze traces:
python3 .claude/skills/query-tracing/trace_tree.py /tmp/claude-1000/trace.json "My Query"Interpreting Results
- Self-time is where actual work happens. A span with 100ms duration but
0ms self-time is just a wrapper — its children do all the work. group_commit_apply::append_fut— Persist append operations, often
dominant for DDL. Time here is spent writing to the durable log.catalog::transact_inner— Catalog transaction processing. High
self-time means catalog state manipulation.oracle::write_ts/oracle::apply_write— Timestamp oracle calls to
CockroachDB. Each is a round-trip.consensus::compare_and_set/consensus::scan— Persist consensus
operations (also CockroachDB round-trips).coord::check_consistency— Post-DDL consistency check. Self-time here
is pure computation.- Sequential vs. concurrent children — children with overlapping time
ranges are concurrent; non-overlapping are sequential. - Code locations in
[src/...]brackets let you jump directly to the
source code. - For spans not listed above, read the source code at the location shown
in the[src/...]bracket to understand what the span does and why it's
taking time.
Tips
- Use
--optimizedbuilds. Debug builds inflate self-times with
unoptimized code paths and extra debug checks (e.g.,check_consistency
may appear much larger than in production). Always trace optimized builds
for meaningful results. - Start monitoring before environmentd. Spans emitted before Tempo is
ready are silently dropped. If you restart environmentd, make sure Tempo
is still running. - If traces are empty or incomplete, the
opentelemetry_filtermay be too
restrictive. Trydebugor eventrace. - The first fetch after a query often returns
trace not found. The default
batch delay is 5s, but ingestion adds latency. Always use a retry loop. - Tempo retains traces for only 15 minutes by default (configured in
misc/monitoring/tempo.yml). Fetch traces promptly. - To search for recent traces:
curl -s "http://localhost:3200/api/search?limit=10" - You can also view traces in Grafana at http://localhost:3000 by searching
for the trace ID in the Tempo datasource explore view. - For comparing before/after a code change, trace the same query on both
builds and compare the self-time rankings side by side.