Pueue universal CLI telemetry and job orchestration. TRIGGERS - run on bigblack, run on littleblack, queue job, long-running task, cache population, batch processing, GPU workstation, pueue callback, pueue delay, pueue priority.
Pueue Job Orchestration
Universal CLI telemetry layer and job management — every command routed through pueue gets precise timing, exit code capture, full stdout/stderr logs, environment snapshots, and callback-on-completion.
Overview
Pueue is a Rust CLI tool for managing shell command queues. It provides:
- Daemon persistence - Survives SSH disconnects, crashes, reboots
- Disk-backed queue - Auto-resumes after any failure
- Group-based parallelism - Control concurrent jobs per group
- Easy failure recovery - Restart failed jobs with one command
- Full telemetry - Timing, exit codes, stdout/stderr logs, env snapshots per task
When to Route Through Pueue
| Operation | Route Through Pueue? | Why |
|---|---|---|
| Any command >30 seconds | Always | Telemetry, persistence, log capture |
| Batch operations (>3 items) | Always | Parallelism control, failure isolation |
| Build/test pipelines | Recommended | --after DAGs, group monitoring |
| Data processing | Always | Checkpoint resume, state management |
| Quick one-off commands (<5s) | Optional | Overhead is ~100ms, but you get logs |
| Interactive commands (editors, REPLs) | Never | Pueue can't handle stdin interaction |
When to Use This Skill
Use this skill when the user mentions:
| Trigger | Example |
|---|---|
| Running tasks on BigBlack/LittleBlack | "Run this on bigblack" |
| Long-running data processing | "Populate the cache for all symbols" |
| Batch/parallel operations | "Process these 70 jobs" |
| SSH remote execution | "Execute this overnight on the GPU server" |
| Cache population | "Fill the ClickHouse cache" |
| Pueue features | "Set up a callback", "delay this job" |
Quick Reference
Check Status
# Local
pueue status
# Remote (BigBlack)
ssh bigblack "~/.local/bin/pueue status"
Queue a Job
# Local (with working directory)
pueue add -w ~/project -- python long_running_script.py
# Local (simple)
pueue add -- python long_running_script.py
# Remote (BigBlack)
ssh bigblack "~/.local/bin/pueue add -w ~/project -- uv run python script.py"
# With group (for parallelism control)
pueue add --group p1 --label "BTCUSDT@1000" -w ~/project -- python populate.py --symbol BTCUSDT
Monitor Jobs
pueue follow <id> # Watch job output in real-time
pueue log <id> # View completed job output
pueue log <id> --full # Full output (not truncated)
Manage Jobs
pueue restart <id> # Restart failed job
pueue restart --all-failed # Restart ALL failed jobs
pueue kill <id> # Kill running job
pueue clean # Remove completed jobs from list
pueue reset # Clear all jobs (use with caution)
Host Configuration
| Host | Location | Parallelism Groups |
|---|---|---|
| BigBlack | ~/.local/bin/pueue | p1 (16), p2 (2), p3 (3), p4 (1) |
| LittleBlack | ~/.local/bin/pueue | default (2) |
| Local (macOS) | /opt/homebrew/bin/pueue | default |
Workflows
1. Queue Single Remote Job
# Step 1: Verify daemon is running
ssh bigblack "~/.local/bin/pueue status"
# Step 2: Queue the job
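# Use -w for the working directory: with "-- cd ~/project && ...", the remote shell
# splits at && and runs the second command outside pueue
ssh bigblack "~/.local/bin/pueue add --label 'my-job' -w ~/project -- uv run python script.py"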
# Step 3: Monitor progress
ssh bigblack "~/.local/bin/pueue follow <id>"
2. Batch Job Submission (Multiple Symbols)
For rangebar cache population or similar batch operations:
# Use the pueue-populate.sh script
ssh bigblack "cd ~/rangebar-py && ./scripts/pueue-populate.sh setup" # One-time
ssh bigblack "cd ~/rangebar-py && ./scripts/pueue-populate.sh phase1" # Queue Phase 1
ssh bigblack "cd ~/rangebar-py && ./scripts/pueue-populate.sh status" # Check progress
3. Configure Parallelism Groups
# Create groups with different parallelism limits
pueue group add fast # Create 'fast' group
pueue parallel 4 --group fast # Allow 4 parallel jobs
pueue group add slow
pueue parallel 1 --group slow # Sequential execution
# Queue jobs to specific groups
pueue add --group fast -- echo "fast job"
pueue add --group slow -- echo "slow job"
4. Handle Failed Jobs
# Check what failed
pueue status | grep Failed
# View error output
pueue log <id>
# Restart specific job
pueue restart <id>
# Restart all failed jobs
pueue restart --all-failed
Installation
macOS (Local)
brew install pueue
pueued -d # Start daemon
Linux (BigBlack/LittleBlack)
# Download from GitHub releases (see https://github.com/Nukesor/pueue/releases for latest)
curl -sSL https://raw.githubusercontent.com/terrylica/rangebar-py/main/scripts/setup-pueue-linux.sh | bash
# Or manually:
# SSoT-OK: Version from GitHub releases page
PUEUE_VERSION="v4.0.2"
curl -sSL "https://github.com/Nukesor/pueue/releases/download/${PUEUE_VERSION}/pueue-x86_64-unknown-linux-musl" -o ~/.local/bin/pueue
curl -sSL "https://github.com/Nukesor/pueue/releases/download/${PUEUE_VERSION}/pueued-x86_64-unknown-linux-musl" -o ~/.local/bin/pueued
chmod +x ~/.local/bin/pueue ~/.local/bin/pueued
# Start daemon
~/.local/bin/pueued -d
Systemd Auto-Start (Linux)
mkdir -p ~/.config/systemd/user
cat > ~/.config/systemd/user/pueued.service << 'EOF'
[Unit]
Description=Pueue Daemon
After=network.target
[Service]
ExecStart=%h/.local/bin/pueued -v
Restart=on-failure
[Install]
WantedBy=default.target
EOF
systemctl --user daemon-reload
systemctl --user enable --now pueued
Integration with rangebar-py
The rangebar-py project has Pueue integration scripts:
| Script | Purpose |
|---|---|
| scripts/pueue-populate.sh | Queue cache population jobs with group-based parallelism |
| scripts/setup-pueue-linux.sh | Install Pueue on Linux servers |
| scripts/populate_full_cache.py | Python script for individual symbol/threshold jobs |
Phase-Based Execution
# Phase 1: 1000 dbps (fast, 4 parallel)
./scripts/pueue-populate.sh phase1
# Phase 2: 250 dbps (moderate, 2 parallel)
./scripts/pueue-populate.sh phase2
# Phase 3: 500, 750 dbps (3 parallel)
./scripts/pueue-populate.sh phase3
# Phase 4: 100 dbps (resource intensive, 1 at a time)
./scripts/pueue-populate.sh phase4
Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| pueue: command not found | Not in PATH | Use full path: ~/.local/bin/pueue |
| Connection refused | Daemon not running | Start with pueued -d |
| Jobs stuck in Queued | Group paused or at limit | Check pueue status, pueue start |
| SSH disconnect kills jobs | Not using Pueue | Queue via Pueue instead of direct SSH |
| Job fails immediately | Wrong working directory | Use pueue add -w /path or cd /path && pueue add |
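The first row of the table (binary not in PATH) can be handled once in a script rather than at every call site. The sketch below is one way to do that; the function name resolve_pueue is hypothetical, and the fallback locations are simply the two install paths listed under Host Configuration.

```shell
# Hypothetical helper: print the path of a usable pueue binary, or return 1.
# Checks PATH first, then the install locations named in Host Configuration.
resolve_pueue() {
    local candidate
    if candidate=$(command -v pueue 2>/dev/null); then
        printf '%s\n' "$candidate"
        return 0
    fi
    for candidate in "$HOME/.local/bin/pueue" /opt/homebrew/bin/pueue; do
        if [ -x "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Usage: resolve once, then call "$PUEUE" instead of a bare pueue.
PUEUE=$(resolve_pueue) || echo "pueue not found in PATH or known locations" >&2
```

Resolving the path up front keeps non-interactive SSH sessions, which often have a minimal PATH, from failing with command not found halfway through a script.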
Production Lessons (Issue #88)
Battle-tested patterns from real production deployments.
Dependency Chaining with --after
Pueue supports automatic job dependency resolution via --after. This is critical for post-processing pipelines where steps must run sequentially after batch jobs complete.
Key flags:
- --after <id>...: Start job only after ALL specified jobs succeed. If any dependency fails, this job fails too.
- --print-task-id (or -p): Return only the numeric job ID (for scripting).
Pattern: Capturing job IDs for dependency wiring
# Capture job IDs during batch submission
JOB_IDS=()
for symbol in BTCUSDT ETHUSDT; do
job_id=$(cd /path/to/project && pueue add --print-task-id --group mygroup \
--label "${symbol}@250" \
-- uv run python scripts/process.py --symbol "$symbol")
JOB_IDS+=("$job_id")
done
# Chain post-processing after ALL batch jobs
optimize_id=$(pueue add --print-task-id --group mygroup \
--label "optimize-table" \
--after "${JOB_IDS[@]}" \
-- clickhouse-client --query "OPTIMIZE TABLE mydb.mytable FINAL")
# Chain validation after optimize
pueue add --group mygroup \
--label "validate" \
--after "$optimize_id" \
-- uv run python scripts/validate.py
Result in pueue status:
Job 0 BTCUSDT@250 Running
Job 1 ETHUSDT@250 Running
Job 2 optimize-table Queued Deps: 0, 1
Job 3 validate Queued Deps: 2
When to use --after:
- Post-processing steps (OPTIMIZE TABLE, validation scripts, cleanup)
- Multi-stage pipelines where Stage N depends on Stage N-1
- Verification jobs that should only run after data is fully written
Anti-pattern: Manual waiting
# BAD: Manual polling or instructions to "run this after that finishes"
postprocess_all() {
queue_repopulation_jobs
echo "Run 'pueue wait --group postfix' then run optimize manually" # NO!
}
# GOOD: Automatic dependency chain
postprocess_all() {
queue_repopulation_jobs # captures JOB_IDS
pueue add --after "${JOB_IDS[@]}" -- optimize_command
pueue add --after "$optimize_id" -- validate_command
}
Mise Task to Pueue Pipeline Integration
Pattern for mise run commands that build pueue DAGs:
# .mise/tasks/cache.toml
["cache:postprocess-all"]
description = "Full post-fix pipeline via pueue: repopulate -> optimize -> detect (auto-chained)"
run = "./scripts/pueue-populate.sh postprocess-all"
The shell script captures pueue job IDs and chains them with --after. Mise provides the entry point; pueue provides the execution engine with dependency resolution.
Forensic Audit Before Deployment
ALWAYS audit the remote host before mutating anything:
# 1. Pueue job state
ssh host 'pueue status'
ssh host 'pueue status --json | python3 -c "import json,sys; d=json.load(sys.stdin); print(sum(1 for t in d[\"tasks\"].values() if \"Running\" in str(t[\"status\"])))"'
# 2. Database state (ClickHouse example)
ssh host 'clickhouse-client --query "SELECT symbol, threshold, count(), countIf(volume < 0) FROM mytable GROUP BY ALL"'
# 3. Checkpoint state
ssh host 'ls -la ~/.cache/myapp/checkpoints/'
ssh host 'cat ~/.cache/myapp/checkpoints/latest.json'
# 4. System resources
ssh host 'uptime && free -h && df -h /home'
# 5. Installed version
ssh host 'cd ~/project && git log --oneline -1'
Force-Refresh vs Checkpoint Resume
Decision matrix for restarting killed/failed jobs:
| Scenario | Action | Flag |
|---|---|---|
| Job killed mid-run, data is clean | Resume from checkpoint | (no --force-refresh) |
| Data is corrupt (overflow, schema bug) | Wipe and restart | --force-refresh |
| Code fix changes output format | Wipe and restart | --force-refresh |
| Code fix is internal-only (no output change) | Resume from checkpoint | (no --force-refresh) |
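The matrix reduces to two questions: is the existing data corrupt, and does the fix change the output format? A minimal sketch of that decision as a shell helper follows. The function name refresh_flag and its yes/no arguments are hypothetical; only the --force-refresh flag itself comes from the table.

```shell
# Hypothetical helper: prints --force-refresh when a wipe is needed, nothing otherwise.
# $1 = data is corrupt (yes|no), $2 = code fix changes output format (yes|no)
refresh_flag() {
    if [ "$1" = "yes" ] || [ "$2" = "yes" ]; then
        printf '%s\n' "--force-refresh"
    fi
}

# Job killed mid-run, data clean, no format change: resume from checkpoint.
refresh_flag no no     # prints nothing
# Overflow or schema bug corrupted the data: wipe and restart.
refresh_flag yes no    # prints --force-refresh
```

Leaving the result unquoted when building the job command (for example $(refresh_flag no no)) lets the empty case disappear from the argument list instead of passing an empty string.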
PATH Gotcha: Rust Not in PATH via uv run
On remote hosts, uv run maturin develop may fail because ~/.cargo/bin is not in uv run's PATH:
# FAILS: rustc not found
ssh host 'cd ~/project && uv run maturin develop --uv'
# WORKS: Prepend cargo bin to PATH
ssh host 'cd ~/project && PATH="$HOME/.cargo/bin:$PATH" uv run maturin develop --uv'
For pueue jobs that need Rust compilation:
pueue add -- env PATH="/home/user/.cargo/bin:$PATH" uv run maturin develop
Per-Year (Epoch) Parallelization: DEFAULT STRATEGY
This is the default approach for all multi-year cache population. Never queue a monolithic multi-year job when epoch boundaries exist. A single DOGEUSDT@500 job estimated 22 days; per-year splits brought it to ~3-4 days with 4 parallel cores.
When a processing pipeline has natural reset boundaries (yearly, monthly, etc.) where processor state resets, each epoch becomes an independent processing unit. This enables massive speedup by splitting a multi-year sequential job into concurrent per-year pueue jobs.
Why it's safe (three isolation layers):
| Layer | Why No Conflicts |
|---|---|
| Checkpoint files | Filename includes {start}_{end} — each year gets unique file |
| Database writes | INSERT is append-only; OPTIMIZE TABLE FINAL deduplicates after |
| Source data | Read-only files (Parquet, CSV, etc.) — no write contention |
Pattern: Per-symbol pueue groups
Give each symbol (or job family) its own pueue group for independent parallelism control:
# Create per-symbol groups
pueue group add btc-yearly --parallel 4
pueue group add eth-yearly --parallel 4
pueue group add shib-yearly --parallel 4
# Queue per-year jobs
for year in 2019 2020 2021 2022 2023 2024 2025 2026; do
pueue add --group btc-yearly \
--label "BTC@250:${year}" \
-- uv run python scripts/process.py \
--symbol BTCUSDT --threshold 250 \
--start-date "${year}-01-01" --end-date "${year}-12-31"
done
# Chain post-processing after ALL groups complete
ALL_JOB_IDS=($(pueue status --json | jq -r \
'.tasks | to_entries[] | select(.value.group | test("-yearly$")) | .value.id'))
pueue add --after "${ALL_JOB_IDS[@]}" \
--label "optimize-table:final" \
-- clickhouse-client --query "OPTIMIZE TABLE mydb.mytable FINAL"
When to use per-year vs sequential:
| Scenario | Approach |
|---|---|
| High-volume symbol (many output items) | Per-year (5+ cores idle) |
| Low-volume symbol (fast enough already) | Sequential (simpler) |
| Single parameter, long backfill | Per-year |
| Multiple parameters, same symbol | Sequential per parameter |
Critical rules:
- Working directory: Use pueue add -w ~/project (preferred) or cd ~/project && pueue add. SSH cwd defaults to $HOME, not the project directory. Jobs fail instantly with No such file or directory if this is missed. Note: on macOS, -w /tmp resolves to /private/tmp (symlink).
- First year uses domain-specific effective start date, not 01-01
- Last year uses actual latest available date as end
- Chain OPTIMIZE TABLE FINAL after ALL year-jobs via --after
- Memory budget: each job peaks independently; with 61 GB total, 4-5 concurrent jobs at 5 GB each are safe
- No --force-refresh on per-year jobs when other year-jobs for the same symbol are running: it deletes cached bars by date range and can conflict with concurrent writes.
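The two date rules above (first year starts at the effective start date, last year ends at the latest available date) are easy to get wrong in a hand-written loop. The sketch below generates the per-year ranges; the function name year_ranges is hypothetical, and the example dates are placeholders rather than real listing dates.

```shell
# Hypothetical helper: prints one "start end" pair per calendar year.
# $1 = effective start date (YYYY-MM-DD), $2 = latest available date (YYYY-MM-DD)
year_ranges() {
    local first_date=$1 last_date=$2
    local first_year=${first_date%%-*} last_year=${last_date%%-*}
    local year=$first_year start end
    while [ "$year" -le "$last_year" ]; do
        start="${year}-01-01"
        end="${year}-12-31"
        # Clamp the first and last epochs to the real data boundaries.
        if [ "$year" -eq "$first_year" ]; then start=$first_date; fi
        if [ "$year" -eq "$last_year" ]; then end=$last_date; fi
        printf '%s %s\n' "$start" "$end"
        year=$((year + 1))
    done
}

year_ranges 2019-09-08 2021-03-15
```

Each output line can be read with while read -r start end and passed to the job as --start-date "$start" --end-date "$end", so the clamping logic lives in one place.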
Pipeline Monitoring (Group-Based Phase Detection)
For multi-group pipelines, monitor job phases by group completion, not hardcoded job IDs. Job IDs change when jobs are removed, re-queued, or split into per-year jobs.
Anti-pattern: Hardcoded job IDs in monitors
# WRONG: Breaks when jobs are removed/re-queued
job14=$(echo "$JOBS" | grep "^14|")
if [ "$(echo "$job14" | cut -d'|' -f2)" = "Done" ]; then
echo "Phase 1 complete"
fi
Correct pattern: Dynamic group detection
get_job_status() {
ssh host "pueue status --json 2>/dev/null" | jq -r \
'.tasks | to_entries[] |
"\(.value.id)|\(.value.status | if type == "object" then keys[0] else . end)|\(.value.label // "-")|\(.value.group)"'
}
group_all_done() {
local group="$1"
local group_jobs
group_jobs=$(echo "$JOBS" | grep "|${group}$" || true)
[ -z "$group_jobs" ] && return 1
echo "$group_jobs" | grep -qE "\|(Running|Queued)\|" && return 1
return 0
}
# Detect phase transitions by group name
SEEN_GROUPS=""
for group in $(echo "$JOBS" | cut -d'|' -f4 | sort -u); do
if group_all_done "$group" && [[ "$SEEN_GROUPS" != *"|${group}|"* ]]; then
echo "GROUP COMPLETE: $group"
run_integrity_checks "$group"
SEEN_GROUPS="${SEEN_GROUPS}|${group}|"
fi
done
Integrity checks at phase boundaries:
Run automated validation when a group finishes, before starting the next phase:
run_integrity_checks() {
local phase="$1"
# Check 1: Data corruption (negative values, out-of-bounds)
ssh host 'clickhouse-client --query "SELECT ... countIf(value < 0) ... HAVING count > 0"'
# Check 2: Duplicate rows
ssh host 'clickhouse-client --query "SELECT ... count(*) - uniqExact(key) as dupes HAVING dupes > 0"'
# Check 3: Coverage gaps (NULL required fields)
ssh host 'clickhouse-client --query "SELECT ... countIf(field IS NULL) ... HAVING missing > 0"'
# Check 4: System resources (load, memory)
ssh host 'uptime && free -h'
}
Monitoring as a background loop:
POLL_INTERVAL=300 # 5 minutes
while true; do
JOBS=$(get_job_status)
# Count statuses, detect failures, detect group completions
# Run integrity checks at phase boundaries
# Exit when all jobs complete
sleep "$POLL_INTERVAL"
done
State File Management (CRITICAL)
Pueue stores ALL task metadata in a single state.json file. This file grows with every completed task and is read/written on EVERY pueue add call. Neglecting state hygiene is the #1 cause of slow job submission in large sweeps.
The State Bloat Anti-Pattern
Symptom: pueue add takes 1-2 seconds instead of <100ms.
Root cause: Pueue serializes/deserializes the entire state file on every operation. With 50K+ completed tasks, state.json grows to 80-100MB. Each pueue add becomes 80MB read + 80MB write = 160MB I/O.
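To see how quickly this adds up, the read-plus-write cost can be multiplied out for a whole batch. This is a rough sketch only: the function name state_io_mb is hypothetical, and it assumes the state file stays the same size for the whole batch, whereas in practice it grows with each added task.

```shell
# Hypothetical estimate of total state-file I/O in MB.
# Each add reads and rewrites the whole file, hence the factor of 2.
# $1 = state.json size in MB, $2 = number of pueue add calls
state_io_mb() {
    echo $(( $1 * 2 * $2 ))
}

state_io_mb 80 1       # 160: one add against an 80 MB state file
state_io_mb 80 5000    # 800000: a 5,000-job batch moves roughly 800 GB
```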
Benchmarks (pueue v4, NVMe SSD, 32-core Linux):
| Completed Tasks | state.json Size | pueue add Latency (sequential) | pueue add Latency (xargs -P16) |
|---|---|---|---|
| 53,000 | 94 MB | 1,300 ms/add | 455 ms/add (mutex contention) |
| 0 (after clean) | 245 KB | 106 ms/add | 8 ms/add (effective) |
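Key insight: Parallelism does NOT fix a bloated state file, because the pueue daemon serializes all operations through a mutex. With 16 parallel submitters the effective cost is still 455 ms per add, only about 2.9x better than 1,300 ms sequential, so each individual call spends far longer waiting on the lock than it would alone. With clean state the same 16 submitters reach 8 ms per add. Clean first, then parallelize.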
Pre-Submission Clean (Mandatory Pattern)
Before any bulk submission (>100 jobs), clean completed tasks:
# ALWAYS clean before bulk submission
pueue clean -g mygroup 2>/dev/null || true
# Verify state is manageable
STATE_FILE="$HOME/.local/share/pueue/state.json"
STATE_SIZE=$(stat -c%s "$STATE_FILE" 2>/dev/null || stat -f%z "$STATE_FILE" 2>/dev/null || echo 0)
if [ "$STATE_SIZE" -gt 52428800 ]; then # 50MB
echo "WARNING: state.json is $(( STATE_SIZE / 1048576 ))MB — running extra clean"
pueue clean 2>/dev/null || true
fi
Periodic Clean During Long Sweeps
For sweeps with 100K+ jobs, clean periodically between submission batches:
BATCH_SIZE=5000
POS=0
while [ "$POS" -lt "$TOTAL" ]; do
# Submit batch
tail -n +$((POS + 1)) "$CMDFILE" | head -n "$BATCH_SIZE" | \
xargs -P16 -I{} bash -c '{}' 2>/dev/null || true
POS=$((POS + BATCH_SIZE))
# Prevent state bloat between batches
pueue clean -g mygroup 2>/dev/null || true
done
Bulk Submission with xargs -P (High-Throughput Pattern)
For large job counts (1K+), submitting one pueue add at a time via SSH is prohibitively slow. Use a batch command file fed through xargs -P for parallel submission.
Why Not GNU Parallel?
CRITICAL: Many Linux hosts (including Ubuntu/Debian) ship with moreutils parallel, NOT GNU Parallel. They share the binary name /usr/bin/parallel but are completely different tools:
| Feature | GNU Parallel | moreutils parallel |
|---|---|---|
| Job file | --jobs 16 --bar < commands.txt | Not supported |
| Progress bar | --bar, --eta | None |
| Resume | --resume --joblog log.txt | Not supported |
| Syntax | parallel ::: arg1 arg2 | parallel -- cmd1 -- cmd2 |
| --version output | GNU parallel YYYY | parallel from moreutils |
Detection:
if parallel --version 2>&1 | grep -q 'GNU'; then
echo "GNU Parallel available"
else
echo "moreutils parallel (or none) — use xargs -P instead"
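fi
Safe default: Always use xargs -P. The -P flag is an extension rather than part of POSIX, but both GNU and BSD xargs support it, so it works on Linux and macOS hosts without extra packages.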
Batch Command File Pattern
Step 1: Generate commands file (one pueue add per line):
# gen_commands.sh — generates commands.txt
for SQL_FILE in /tmp/sweep_sql/*.sql; do
echo "pueue add -g p1 -- /tmp/run_job.sh '${SQL_FILE}' '${LOG_FILE}'"
done > /tmp/commands.txt
echo "Generated $(wc -l < /tmp/commands.txt) commands"
Step 2: Feed via xargs -P (parallel submission):
# Submit in batches with periodic state cleanup
BATCH=5000
P=16
TOTAL=$(wc -l < /tmp/commands.txt)
POS=0
while [ "$POS" -lt "$TOTAL" ]; do
tail -n +$((POS + 1)) /tmp/commands.txt | head -n "$BATCH" | \
xargs -P"$P" -I{} bash -c '{}' 2>/dev/null || true
POS=$((POS + BATCH))
# Clean between batches to prevent state bloat
pueue clean -g p1 2>/dev/null || true
QUEUED=$(pueue status -g p1 --json 2>/dev/null | python3 -c \
"import json,sys; d=json.load(sys.stdin); print(sum(1 for t in d.get('tasks',{}).values() if 'Queued' in str(t.get('status',''))))" 2>/dev/null || echo "?")
echo "Batch: ${POS}/${TOTAL} | Queued: ${QUEUED}"
done
Crash Recovery with Skip-Done
For idempotent resubmission after SSH drops or crashes:
# Build done-set from existing JSONL output
declare -A DONE_SET
for logfile in /tmp/sweep_*.jsonl; do
while IFS= read -r config_id; do
DONE_SET["${config_id}"]=1
done < <(jq -r '.feature_config // empty' "$logfile" 2>/dev/null | sort -u)
done
# Generate commands, skipping completed configs
for SQL_FILE in /tmp/sweep_sql/*.sql; do
CONFIG_ID=$(basename "$SQL_FILE" .sql)
if [ "${DONE_SET[${CONFIG_ID}]+_}" ]; then
continue # Already completed
fi
echo "pueue add -g p1 -- /tmp/run_job.sh '${SQL_FILE}' '${LOG_FILE}'"
done > /tmp/commands.txt
Requirements: bash 4+ for associative arrays (declare -A).
Two-Tier Architecture (300K+ Jobs)
For sweeps exceeding 10K queries, the single-tier "pueue add per query" pattern is unusable — pueue add has 148ms overhead per call even with clean state (= 8+ hours for 196K jobs). The fix is eliminating pueue add at the query level entirely.
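The 8+ hour figure follows directly from the per-call overhead. The sketch below reproduces the arithmetic; the function name submit_seconds is hypothetical.

```shell
# Hypothetical helper: total submission time in whole seconds.
# $1 = number of jobs, $2 = pueue add overhead per call in milliseconds
submit_seconds() {
    echo $(( $1 * $2 / 1000 ))
}

submit_seconds 196000 148   # 29008 seconds, just over 8 hours
```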
Architecture
macOS (local)
mise run gen:generate → N SQL files
mise run gen:submit-all → rsync + queue M pueue units
mise run gen:collect → scp + validate JSONL
BigBlack (remote)
pueue group p1 (parallel=1) ← sequential units (avoid log contention)
├── Unit 1: submit_unit.sh pattern1 BTCUSDT 750
│ └── xargs -P16 → K queries (direct clickhouse-client, no pueue add)
├── Unit 2: submit_unit.sh pattern1 BTCUSDT 1000
│ └── xargs -P16 → K queries
└── ... (M total units)
Key Principles
| Principle | Rationale |
|---|---|
| Pueue at unit level (100s of tasks) | Crash recovery per unit, pueue status readable |
| xargs -P16 at query level (1000s per unit) | Zero overhead, direct process execution |
| Sequential units (parallel=1) | Each unit appends to one JSONL file via flock, so parallel units would contend |
| Skip-done dedup inside each unit | comm -23 on sorted config lists (O(N+M)) |
When to Use Each Tier
| Job Count | Pattern |
|---|---|
| 1-10 | Direct pueue add per job |
| 10-1K | Batch pueue add via xargs -P (see "Bulk Submission" section above) |
| 1K-10K | Batch pueue add with periodic pueue clean between batches |
| 10K+ | Two-tier: pueue per unit + xargs -P per query (this section) |
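A submission script can pick the tier from the job count instead of hardcoding one pattern. This is a sketch under assumptions: the function name pick_tier and the tier labels are hypothetical, and because the table's ranges share their boundary values (10, 1K, 10K), the sketch assigns each boundary to the simpler tier.

```shell
# Hypothetical helper: map a job count to one of the four submission patterns.
# $1 = number of jobs to run
pick_tier() {
    if [ "$1" -le 10 ]; then
        echo "direct"              # one pueue add per job
    elif [ "$1" -le 1000 ]; then
        echo "xargs-batch"         # batch pueue add via xargs -P
    elif [ "$1" -le 10000 ]; then
        echo "xargs-batch+clean"   # as above, with pueue clean between batches
    else
        echo "two-tier"            # pueue per unit, xargs -P per query
    fi
}

pick_tier 196000   # two-tier
```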
Shell Script Safety (set -euo pipefail)
| Trap | Symptom | Fix |
|---|---|---|
| SIGPIPE (exit 141) | ls path/*.sql \| head -10 (head closes pipe early) | Write to temp file first, or use find -print0 \| head -z |
| Pipe subshell data loss | echo "$OUT" \| while read ...; done > file (writes lost in subshell) | Process substitution: while read ...; done < <(echo "$OUT") |
| eval injection | eval "val=\$$var" with untrusted input | Use case statement or parameter expansion instead |
Skipped Config NDJSON Pattern
Configs with 0 signals after feature filtering produce 1 JSONL line (skipped entry), not N barrier lines. This is correct behavior, not data loss.
When validating line counts:
expected_lines = (N_normal × barriers_per_query) + (N_skipped × 1) + (N_error × 1)
Example: 95 normal configs × 3 barriers + 5 skipped × 1 = 290 lines (not 300).
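The formula translates directly into a check that a collection script can run before declaring a sweep complete. The function name expected_lines mirrors the formula; its argument order is an assumption made for this sketch.

```shell
# Expected JSONL line count for one sweep.
# $1 = normal configs, $2 = barriers per query, $3 = skipped configs, $4 = error configs
expected_lines() {
    echo $(( $1 * $2 + $3 + $4 ))
}

expected_lines 95 3 5 0   # 290
```

Comparing this value against wc -l on the output file distinguishes genuinely missing rows from configs that were legitimately skipped.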
comm -23 for Large Skip-Done Sets (100K+)
For done-sets exceeding 10K entries, comm -23 (sorted set difference) is O(N+M) vs grep-per-file O(N×M):
# Build sorted done-set from JSONL
python3 -c "
import json
seen = set()
with open('${LOG_FILE}') as f:
    for line in f:
        try:
            d = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip partial or corrupt lines
        fc = d.get('feature_config', '')
        if fc:
            seen.add(fc)
for s in sorted(seen):
    print(s)
" > /tmp/done.txt
# Build sorted all-configs, compute set difference
# LC_ALL=C keeps sort and comm in byte order, matching Python's sorted() above
ls "${DIR}"/*.sql | xargs -n1 basename | sed 's/\.sql$//' | LC_ALL=C sort > /tmp/all.txt
LC_ALL=C comm -23 /tmp/all.txt /tmp/done.txt > /tmp/todo.txt
# Submit remaining via xargs
while read -r C; do echo "${DIR}/${C}.sql"; done < /tmp/todo.txt | \
xargs -P16 -I{} bash /tmp/wrapper.sh {} "${LOG}" "${SYM}" "${THR}" "${GIT}"
ClickHouse Parallelism Tuning (pueue + ClickHouse)
When using pueue to orchestrate ClickHouse queries, the interaction between pueue parallelism and ClickHouse's thread scheduler determines actual throughput.
The Thread Soft Limit
ClickHouse has a concurrent_threads_soft_limit_ratio_to_cores setting (default: 2). On a 32-core machine, this means ClickHouse allows 64 concurrent execution threads total, regardless of how many queries are running.
Each query requests max_threads threads (default: auto = nproc = 32 on a 32-core machine). With 8 parallel queries each requesting 32 threads (= 256 requested), ClickHouse throttles to 64 actual threads. The queries get ~8 effective threads each, not 32.
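The throttling arithmetic can be sketched as a small helper. It assumes the soft limit is shared evenly across concurrent queries, which is a simplification of how the scheduler actually behaves, and the function name effective_threads is hypothetical.

```shell
# Hypothetical estimate of threads each query really gets under the soft limit.
# $1 = cores (nproc), $2 = concurrent_threads_soft_limit_ratio_to_cores,
# $3 = number of queries running in parallel
effective_threads() {
    echo $(( $1 * $2 / $3 ))
}

effective_threads 32 2 8    # 8
effective_threads 32 2 16   # 4
```

The result uses integer division, so 24 parallel queries on the same machine yields 2 even though the true even share is between 2 and 3.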
Right-Size max_threads Per Query
Anti-pattern: Letting each query request 32 threads when it only gets 8 effective threads. This creates scheduling overhead for no benefit.
Fix: Set --max_threads to match the effective thread count:
# In the job wrapper script:
clickhouse-client --max_threads=8 --multiquery < "$SQL_FILE"
This reduces thread scheduling overhead and allows higher pueue parallelism without oversubscription.
Parallelism Sizing Formula
effective_threads_per_query = concurrent_threads_soft_limit / pueue_parallel_slots
concurrent_threads_soft_limit = nproc * concurrent_threads_soft_limit_ratio_to_cores
# Example: 32-core machine, ratio=2, soft_limit=64
# 8 pueue slots → 8 effective threads/query → ~55% CPU (baseline)
# 16 pueue slots → 4 effective threads/query → ~87% CPU (1.5-1.8x throughput)
# 24 pueue slots → 2-3 effective threads/query → ~95% CPU (diminishing returns)
Decision Matrix
| Dimension | Check | Safe Threshold |
|---|---|---|
| Memory | p99 per-query × N slots < server memory limit | < 50% of max_server_memory_usage |
| CPU | Load average < 90% of nproc | load < 0.9 × nproc |
| I/O | iostat disk utilization | < 70% |
| Swap | vmstat si/so columns | Must be 0 |
| CH errors | system.query_log ExceptionWhileProcessing | Must be 0 |
Live Tuning (No Restart Required)
Pueue parallelism can be changed live — running jobs finish with old settings, new jobs use the new limit:
# Check current
pueue group | grep mygroup
# Bump up
pueue parallel 16 -g mygroup
# Monitor for 2-3 minutes, then check
uptime # Load average
free -h # Memory
vmstat 1 3 # Swap (si/so = 0?)
clickhouse-client --query "SELECT count() FROM system.query_log
WHERE event_time > now() - INTERVAL 5 MINUTE
AND type = 'ExceptionWhileProcessing'" # Errors = 0?
Callback Hooks (Completion Notifications)
Pueue fires a callback command on every task completion. Configure in pueue.yml:
daemon:
  callback: 'curl -s -X POST https://hooks.example.com/pueue -d ''{"id":{{id}},"result":"{{result}}","exit_code":{{exit_code}},"command":"{{command}}"}'''
  callback_log_lines: 10 # Lines of stdout/stderr available in {{output}}
Template Variables (14 total, Handlebars syntax)
| Variable | Type | Description |
|---|---|---|
| {{id}} | int | Task ID |
| {{command}} | string | The command that was run |
| {{path}} | string | Working directory |
| {{group}} | string | Group name |
| {{result}} | string | Success, Failed, Killed, DependencyFailed |
| {{exit_code}} | string | 0 on success, error code on failure, None otherwise |
| {{start}} | string | Unix timestamp of start time |
| {{end}} | string | Unix timestamp of end time |
| {{output}} | string | Last N lines of stdout/stderr (see callback_log_lines) |
| {{output_path}} | string | Full path to log file on disk |
| {{queued_count}} | string | Remaining queued tasks in this group |
| {{stashed_count}} | string | Remaining stashed tasks in this group |
Production Examples
# File-based sentinel (for script polling)
callback: "echo '{{id}}:{{result}}:{{exit_code}}' >> /tmp/pueue-completions.log"
# Telegram notification
callback: "curl -s 'https://api.telegram.org/bot${BOT_TOKEN}/sendMessage?chat_id=${CHAT_ID}&text=Job%20{{id}}%20{{result}}%20(exit%20{{exit_code}})'"
# Conditional alert (only on failure)
callback: "/bin/bash -c 'if [ \"{{result}}\" != \"Success\" ]; then echo \"FAILED: {{command}}\" | mail -s \"Pueue Alert\" user@example.com; fi'"
Config File Location (Platform Difference)
| Platform | Config Path |
|---|---|
| macOS | ~/Library/Application Support/pueue/pueue.yml |
| Linux | ~/.config/pueue/pueue.yml |
See Pueue Config Reference for all settings.
Delayed Scheduling (--delay)
Queue a job that starts after a specified delay:
# Relative time
pueue add --delay 3h -- python heavy_computation.py
# Natural language
pueue add --delay "next wednesday 5pm" -- python weekly_report.py
# RFC 3339
pueue add --delay "2026-03-01T02:00:00" -- python overnight_batch.py
Stashed + Delay Combo
Create stashed jobs that auto-enqueue at a future time:
# Stash now, auto-enqueue in 2 hours
pueue add --stashed --delay 2h -- python populate_cache.py
Patterns
| Pattern | Command |
|---|---|
| Off-peak batch scheduling | pueue add --delay "2am" -- python heavy_etl.py |
| Staggered thundering-herd prevention | pueue add --delay "${i}s" -- curl api/endpoint |
| Weekend-only processing | pueue add --delay "next saturday" -- python batch.py |
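The staggered row relies on a loop variable that the table does not show. The sketch below is one way to generate the staggered commands as a dry run, printing each pueue add line instead of executing it so the schedule can be reviewed first. The function name stagger_cmds and its arguments are hypothetical.

```shell
# Hypothetical dry-run helper: prints staggered pueue add commands.
# $1 = number of jobs, $2 = seconds between starts, remaining args = command to queue
stagger_cmds() {
    local count=$1 step=$2 i=1
    shift 2
    while [ "$i" -le "$count" ]; do
        printf 'pueue add --delay %ss -- %s\n' "$(( i * step ))" "$*"
        i=$(( i + 1 ))
    done
}

stagger_cmds 3 5 curl api/endpoint
```

Piping the output to sh (or to the xargs -P pattern from the bulk submission section) submits the jobs once the schedule looks right.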
Priority Scheduling (--priority)
Higher priority number = runs first when a queue slot opens:
# Urgent validation (runs before queued lower-priority jobs)
pueue add --priority 10 -- python validate_critical.py
# Normal compute (default priority is 0)
pueue add -- python train_model.py
# Low-priority background task
pueue add --priority -5 -- python cleanup_logs.py
Priority only affects queued jobs waiting for an open slot. Running jobs are not preempted.
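To reason about what will start next, the ordering can be simulated offline. This sketch sorts by priority (highest first) and breaks ties by lowest task ID; the tie-break rule is an assumption for illustration and is not stated above, and the function name start_order is hypothetical.

```shell
# Hypothetical simulation of start order for queued tasks.
# stdin: one "task_id priority" pair per line
# stdout: task IDs, highest priority first, ties broken by lowest ID (assumed)
start_order() {
    sort -k2,2nr -k1,1n | awk '{print $1}'
}

# Tasks 1 and 3 share priority 10, task 0 has the default 0, task 2 has -5.
printf '%s\n' "0 0" "1 10" "2 -5" "3 10" | start_order
```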
Per-Task Environment Override (pueue env)
Inject or override environment variables on stashed or queued tasks:
# Create a stashed job
JOB_ID=$(pueue add --stashed --print-task-id -- python train.py)
# Set environment variables (NOTE: separate args, NOT KEY=VALUE)
pueue env set "$JOB_ID" BATCH_SIZE 64
pueue env set "$JOB_ID" LEARNING_RATE 0.001
# Enqueue when ready
pueue enqueue "$JOB_ID"
Syntax: pueue env set <id> KEY VALUE (the key and value are separate positional arguments).
Constraint: Only works on stashed/queued tasks. Cannot modify environment of running tasks.
Relationship to mise.toml [env]: mise [env] remains the SSoT for default environment. Use pueue env set only for one-off overrides (e.g., hyperparameter sweeps) without modifying config files.
Preferred Pattern: python-dotenv for Pueue Job Secrets
Pueue jobs run in clean shells without .bashrc, .zshrc, or mise activation. This means mise.toml [env] variables are invisible to pueue jobs. The most portable solution is python-dotenv:
Architecture
mise.toml → Task definitions only (no [env] for secrets)
.env → Secrets (gitignored, loaded by python-dotenv at runtime)
scripts/backfill.sh → Pueue orchestrator (just `cd $PROJECT_DIR` for dotenv)
Implementation
1. Project .env (gitignored):
# .env — loaded by python-dotenv at runtime
API_KEY=sk-abc123
DATABASE_URL=postgresql://localhost/mydb
2. Python entry point (call load_dotenv() early):
from dotenv import load_dotenv
load_dotenv() # Auto-loads .env from cwd
import os
API_KEY = os.getenv("API_KEY") # Works everywhere
3. Pueue job (just needs cd to project root):
# The only requirement: cwd must contain .env
pueue add -- bash -c 'cd ~/project && uv run python my_script.py'
Why This Beats Alternatives
| Approach | Interactive Shell | Pueue Job | Cron | SSH Remote | Cross-Platform |
|---|---|---|---|---|---|
| mise [env] | Yes | No | No | Fragile | macOS+Linux |
| pueue env set | N/A | Yes | No | No | N/A |
| Export in .bashrc | Yes | No | No | Depends | Varies |
| python-dotenv + .env | Yes | Yes | Yes | Yes | Yes |
Cross-reference: See distributed-job-safety skill — G-15, AP-16
Blocking Wait (pueue wait)
Block until tasks complete — simpler than polling loops for scripts:
# Wait for specific task
pueue wait 42
# Wait for all tasks in a group
pueue wait --group mygroup
# Wait for ALL tasks across all groups
pueue wait --all
# Wait quietly (no progress output)
pueue wait 42 --quiet
# Wait for tasks to reach a specific status
pueue wait --status queued
Script Integration Pattern
# Queue → wait → process results
TASK_ID=$(pueue add --print-task-id -- python etl_pipeline.py)
pueue wait "$TASK_ID" --quiet
# .status.Done.result holds the result name (e.g. "Success"), not a numeric exit code
RESULT=$(pueue status --json | jq -r ".tasks[\"$TASK_ID\"].status.Done.result" 2>/dev/null)
if [ "$RESULT" = "Success" ]; then
echo "Pipeline succeeded"
pueue log "$TASK_ID" --full
else
echo "Pipeline failed"
pueue log "$TASK_ID" --full >&2
fi
Compressed State File
Reduce I/O for state persistence with zstd compression:
# In pueue.yml
daemon:
  compress_state_file: true
Compression ratio: ~10:1 (from pueue source code).
When to enable:
- I/O-constrained hosts (spinning disks, NFS mounts)
- Large task histories (hundreds of completed tasks)
- Defense-in-depth alongside periodic pueue clean
Note: Compression helps I/O performance. pueue clean reduces data volume. They are complementary, not alternatives.
macOS Auto-Start (launchd)
Auto-start the pueue daemon on login. Create the plist at ~/Library/LaunchAgents/com.nukesor.pueued.plist:
# Generate the launchd plist (standard Apple plist format) # SSoT-OK
cat > ~/Library/LaunchAgents/com.nukesor.pueued.plist << 'PLIST'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>com.nukesor.pueued</string>
<key>ProgramArguments</key>
<array>
<string>/opt/homebrew/bin/pueued</string>
<string>-v</string>
</array>
<key>RunAtLoad</key>
<true/>
<key>KeepAlive</key>
<true/>
<key>StandardOutPath</key>
<string>/tmp/pueued.stdout.log</string>
<key>StandardErrorPath</key>
<string>/tmp/pueued.stderr.log</string>
</dict>
</plist>
PLIST
Then load the agent:
# Load (starts immediately + on login)
launchctl load ~/Library/LaunchAgents/com.nukesor.pueued.plist
# Unload
launchctl unload ~/Library/LaunchAgents/com.nukesor.pueued.plist
# Check status
launchctl list | grep pueued
Linux equivalent: Use systemd; see pueued --systemd or create a user service in ~/.config/systemd/user/.
Related
- Hook: itp-hooks/posttooluse-reminder.ts - Reminds to use Pueue for detected long-running commands
- Reference: Pueue GitHub
- Issue: rangebar-py#77 - Original implementation
- Issue: rangebar-py#88 - Production deployment lessons