amanning3390

deepswarm

Use when running parallel AI workers for any long-running or multi-turn batch API task. Auto-calculates optimal workers + stagger. Supports tiered delegation (V4 Pro orchestrator → V4 Flash workers). 99.95% API success rate at scale.

amanning3390 122 4 Updated 4w ago

Install

npx skillscat add amanning3390/deepswarm

Install via the SkillsCat registry.

SKILL.md

DeepSwarm — Task-Agnostic Parallel Worker Orchestration

Spawn N parallel API workers for any long-running or multi-turn batch task. Auto-calculates optimal worker count and stagger delay. Supports tiered model delegation: orchestrator plans with a frontier model (V4 Pro), workers execute with a cheaper model (V4 Flash).

Overview

DeepSwarm 2.0 generalizes the proven orchestration pattern from the 19,331-trace generation project to any batch API task. You define a task — translations, reasoning traces, code reviews, summarization — and DeepSwarm parallelizes it across optimal workers with the right stagger for your API.

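The core insight: the rate-limit errors that break batch jobs come from too many simultaneous connections, not from total request volume. Auto-calculating the worker count and the stagger delay keeps concurrency under that ceiling, which is what the reported 99.95% API success rate rests on.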
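As a rough illustration of what staggering means in practice, the sketch below starts workers as plain Python threads with a fixed pause between starts. `launch_staggered` and `worker_fn` are hypothetical names chosen for this example; DeepSwarm's own launcher in scripts/swarm.py may work differently.

```python
import threading
import time


def launch_staggered(worker_fn, n_workers, stagger_s):
    """Start n_workers threads, pausing stagger_s seconds between starts.

    Hypothetical helper for illustration; not part of DeepSwarm's scripts.
    """
    threads = []
    for worker_id in range(n_workers):
        thread = threading.Thread(target=worker_fn, args=(worker_id,))
        thread.start()
        threads.append(thread)
        if worker_id < n_workers - 1:
            time.sleep(stagger_s)  # offset the next worker's first request
    for thread in threads:
        thread.join()  # block until every worker has finished
```

Joining every thread before returning also keeps the launcher alive until the last worker is done.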

When to Use

  • Any batch API task: generation, translation, summarization, extraction, classification
  • Long-running individual calls (30s+) that benefit from parallelization
  • Multi-turn tasks where each worker loops through conversation turns
  • Cost optimization via tiered delegation (orchestrator ≠ worker model)
  • Crash-resilient batch processing (checkpointed, idempotent)

Don't use for:

  • Quick calls under 10s (overhead not worth it — just loop)
  • Tasks requiring inter-worker coordination (use delegate_task)
  • Real-time interactive sessions (use tmux-agent-orchestrator)

Quick Start

# Install
hermes skills tap add amanning3390/deepswarm

# Define your task in task.yaml (see Task Definition below), then generate seeds
python3 scripts/seed.py --task task.yaml

# Launch — auto-optimizes workers, stagger, model routing
python3 scripts/swarm.py --task task.yaml --total 1000

# Filter — repair JSON, validate structure, apply length thresholds
python3 scripts/filter.py --input-dir output/ --output clean.jsonl --errors errors.jsonl
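The filter step is described only by its comment (repair JSON, validate structure, apply length thresholds). A minimal sketch of that kind of pass might look like the following. The repair strategy (trimming to the outermost braces), the `response` field name, and the 20-character threshold are all assumptions for illustration, not the behavior of scripts/filter.py.

```python
import json


def try_parse(raw):
    """Parse one JSON object, with a single simple repair attempt.

    The repair (trim to the outermost braces) is an assumed strategy.
    """
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            return None
        try:
            return json.loads(raw[start:end + 1])
        except json.JSONDecodeError:
            return None


def filter_records(raw_lines, required_key="response", min_chars=20):
    """Split raw lines into (clean, errors) by structure and length."""
    clean, errors = [], []
    for raw in raw_lines:
        record = try_parse(raw)
        value = record.get(required_key) if isinstance(record, dict) else None
        if isinstance(value, str) and len(value) >= min_chars:
            clean.append(record)
        else:
            errors.append(raw)  # keep the raw text for later inspection
    return clean, errors
```

Keeping rejected lines verbatim mirrors the separate --errors output in the command above and makes it possible to re-run a stricter or looser filter later.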

Task Definition (task.yaml)

# What to do
task_type: generation              # generation | translation | summarization | classification | custom
prompt_template: |
  You are an AI assistant. {{seed}}

# Model routing (tiered delegation)
orchestrator_model: deepseek-v4-pro  # Plans, monitors, handles errors
worker_model: deepseek-v4-flash      # Executes batches (cheaper!)
worker_api_base: https://api.deepseek.com/v1/chat/completions
worker_max_tokens: 4096

# Execution control
multi_turn: true                    # Workers loop through conversation turns
max_turns: 20                       # Max turns per worker conversation
seeds_file: seeds.jsonl             # Pre-generated task seeds

# Worker optimization (auto-calculated if omitted)
workers: auto                       # auto | N
stagger: auto                       # auto | seconds
batch_size: auto                    # auto | tasks per worker

# Output
output_dir: output/
output_format: jsonl               # jsonl | json | parquet
checkpoint_every: 10                # Save progress every N tasks

# Optional: custom worker logic
worker_script: custom_worker.py     # Override default worker behavior
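To make the `{{seed}}` placeholder and seeds_file concrete, here is a small sketch of how a seed line could be turned into a prompt. It assumes each line of seeds.jsonl is a JSON object with a `seed` string field; that field name, and the helper names `parse_seeds` and `render_prompt`, are hypothetical.

```python
import json


def parse_seeds(lines):
    """Parse JSONL seed lines, skipping blank ones."""
    return [json.loads(line) for line in lines if line.strip()]


def render_prompt(template, seed_text):
    """Fill the {{seed}} placeholder from prompt_template."""
    return template.replace("{{seed}}", seed_text)


template = "You are an AI assistant. {{seed}}"
seeds = parse_seeds(['{"seed": "Summarize the attached report."}', ""])
print(render_prompt(template, seeds[0]["seed"]))
# prints: You are an AI assistant. Summarize the attached report.
```

`parse_seeds` accepts any iterable of strings, so an open file handle can be passed directly.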

Tiered Model Delegation

The orchestrator and the workers can run on different models, for example V4 Pro for planning and V4 Flash for execution:

User Task → V4 Pro (plans, monitors)
              ├─ V4 Flash Worker 0 → API → output/
              ├─ V4 Flash Worker 1 → API → output/
              ├─ V4 Flash Worker 2 → API → output/
              └─ ...

Why tiered delegation matters:

  • V4 Pro costs ~3× V4 Flash per token
  • Orchestrator only plans + monitors (few calls)
  • Workers make thousands of calls — use the cheapest model that works
  • Typical savings: 60-70% vs using V4 Pro for everything
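The 60-70% figure follows from the ~3× price ratio once the orchestrator's share of tokens is small. The sketch below works through that arithmetic; `orchestrator_share` (the fraction of all tokens handled by the orchestrator) is an assumed parameter, since the text only says the orchestrator makes few calls.

```python
def tiered_savings(price_ratio, orchestrator_share):
    """Fractional saving versus running every token on the orchestrator model.

    price_ratio: orchestrator price / worker price per token (about 3 per the text).
    orchestrator_share: fraction of all tokens the orchestrator handles (assumed).
    """
    if price_ratio <= 0 or not 0 <= orchestrator_share <= 1:
        raise ValueError("price_ratio must be > 0 and orchestrator_share in [0, 1]")
    # Cost relative to an all-orchestrator run, which is normalized to 1.0
    tiered_cost = orchestrator_share + (1 - orchestrator_share) / price_ratio
    return 1 - tiered_cost
```

With a price ratio of 3, an orchestrator share between 0% and 10% gives savings between about 67% and 60%, which brackets the range quoted above.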

When to use same model for both:

  • Task quality requires frontier reasoning at every step
  • Worker model doesn't support the required format
  • Budget allows it and quality is paramount

Auto-Optimization

When workers: auto and stagger: auto:

  1. DeepSwarm runs a single calibration call to measure call duration (call_duration, in seconds)
  2. Calculates optimal workers: min(8, floor(rate_limit / call_duration))
  3. Sets stagger: (call_duration / workers) × 2, in seconds
  4. Sets batch_size: total / workers (tasks per worker)
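The steps above can be sketched as a single function. This is an illustration under stated assumptions, not DeepSwarm's implementation: the source does not give a unit for rate_limit, so it is treated as whatever number makes rate_limit / call_duration a worker count, and the floor of one worker and the rounding-up of batch_size are additions for this example.

```python
import math


def plan_swarm(call_duration_s, rate_limit, total_tasks, max_workers=8):
    """Apply the three formulas above; return (workers, stagger_s, batch_size)."""
    if call_duration_s <= 0:
        raise ValueError("call_duration_s must be positive")
    workers = min(max_workers, math.floor(rate_limit / call_duration_s))
    workers = max(1, workers)  # assumption: never plan fewer than one worker
    stagger_s = call_duration_s / workers * 2
    batch_size = math.ceil(total_tasks / workers)  # assumption: round up
    return workers, stagger_s, batch_size


print(plan_swarm(call_duration_s=45, rate_limit=360, total_tasks=1000))
# prints (8, 11.25, 125)
```

For a 1,000-task run with 45-second calls this gives 8 workers of 125 tasks each, started 11.25 seconds apart.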

Calibration table (pre-computed):

Call Duration   Workers   Stagger   Success   Throughput
<10s            16        1s        99.9%     ~5,760/hr
10-30s          12        2s        99.9%     ~1,440/hr
30-60s          8         5s        99.95%    ~440/hr
60-90s          6         10s       99.9%     ~240/hr
>90s            4         15s       99.9%     ~96/hr
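A lookup over this table could be sketched as below. The rows are copied from the table as given; note that the first two rows use more workers than the cap of 8 in the formula above, and the sketch does not try to reconcile the two. How durations exactly on a shared edge (30s, 60s, 90s) are assigned is an assumption, as are the names `lookup_calibration` and `ideal_throughput_per_hour`.

```python
# Rows copied from the calibration table:
# (upper bound in seconds, workers, stagger in seconds)
CALIBRATION_ROWS = [
    (10, 16, 1),
    (30, 12, 2),
    (60, 8, 5),
    (90, 6, 10),
]
SLOWEST_ROW = (4, 15)  # the ">90s" row


def lookup_calibration(call_duration_s):
    """Return (workers, stagger_s) for a measured call duration."""
    first_upper, first_workers, first_stagger = CALIBRATION_ROWS[0]
    if call_duration_s < first_upper:  # "<10s" is strict
        return first_workers, first_stagger
    for upper, workers, stagger in CALIBRATION_ROWS[1:]:
        if call_duration_s <= upper:  # assumption: shared edges go to the faster row
            return workers, stagger
    return SLOWEST_ROW


def ideal_throughput_per_hour(workers, call_duration_s):
    """Upper bound on completed calls per hour with all workers always busy."""
    return workers * 3600 / call_duration_s
```

The throughput column for the <10s, 10-30s and 60-90s rows matches this bound evaluated at the slowest duration in each range (for example, 16 × 3600 / 10 = 5,760); the 30-60s and >90s rows sit below it.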

Multi-Turn Task Support

For tasks requiring conversation loops (generation, debugging, interactive work):

Worker loop:
  for each seed:
    messages = [system_prompt, user_task]
    for turn in range(max_turns):
      response = api_call(messages, model=worker_model)
      messages.append({"role": "assistant", "content": response})
      if task_complete(response):
        break
      if needs_tool_call(response):
        messages.append(simulate_tool_response(response))

Each turn is a separate API call that resends the conversation so far, so a single task can take many round trips. Multi-turn tasks benefit most from parallelization because per-task latency is high.

Task-Agnostic Worker Design

The worker (worker.py) accepts a YAML task definition and executes any pipeline:

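def run_task(seed, config):
    """Run one seed for up to config["max_turns"] API calls."""
    messages = build_messages(seed, config)
    for turn in range(config["max_turns"]):
        response = call_api(messages, config)
        if is_complete(response, config):
            return finish(response, messages)
        if needs_continuation(response, config):
            messages = append_turn(messages, response, config)
        # If neither check passes, the same messages are sent again next turn
    # Turn budget exhausted without completion: return the transcript so far
    return messages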

Built-in task types:

  • generation — Generate content from seed (the trace generation pattern)
  • translation — Translate each seed text
  • summarization — Summarize each seed document
  • classification — Classify each seed input
  • custom — Uses worker_script for completely custom logic

Common Pitfalls

  1. workers: auto choosing too many. If the calibration call was fast but real calls are slow, set workers manually.
  2. Forgetting to stagger between workers. Even two workers starting in the same millisecond can trigger rate limits on slow APIs.
  3. Mixing models without checking format compatibility. The worker model must support the same prompt format as the orchestrator.
  4. Not checkpointing. A worker that dies at task 120 of 125 loses all of its work. Set checkpoint_every: 10.
  5. Backgrounding workers with shell & and no wait. Without wait, the shell exits early and kills the child workers.
  6. Using V4 Pro for workers when V4 Flash would work. Check quality on a 10-sample test before committing to the expensive model.
  7. Not deleting error outputs before restarting. Error files consume output indices and inflate disk usage.
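Pitfall 4 is easiest to see in code. The sketch below shows one way to checkpoint every N tasks and resume without redoing finished work. The checkpoint layout (one JSON object per line with an `id` field) and the function names are assumptions for illustration; DeepSwarm's actual checkpoint format is not described in this document.

```python
import json
import os


def load_done_ids(checkpoint_path):
    """Return ids already recorded, skipping lines that fail to parse."""
    done = set()
    if not os.path.exists(checkpoint_path):
        return done
    with open(checkpoint_path, encoding="utf-8") as f:
        for line in f:
            try:
                done.add(json.loads(line)["id"])
            except (json.JSONDecodeError, KeyError, TypeError):
                continue  # partial write from a crash, or a blank line
    return done


def run_with_checkpoints(tasks, process, checkpoint_path, checkpoint_every=10):
    """Process (task_id, payload) pairs, skipping ids that are already saved."""
    done = load_done_ids(checkpoint_path)
    pending = []
    processed = 0

    def flush(f):
        for record in pending:
            f.write(json.dumps(record) + "\n")
        f.flush()
        os.fsync(f.fileno())  # make sure the batch reaches disk
        pending.clear()

    with open(checkpoint_path, "a", encoding="utf-8") as f:
        for task_id, payload in tasks:
            if task_id in done:
                continue  # idempotent resume: finished tasks are not redone
            pending.append({"id": task_id, "result": process(payload)})
            processed += 1
            if len(pending) >= checkpoint_every:
                flush(f)
        flush(f)  # save the final partial batch
    return processed
```

With checkpoint_every=10, a crash loses at most the nine results that were still pending, rather than the whole batch.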

Verification Checklist

  • Task YAML has valid model names and API base URL
  • API key exported for both orchestrator and worker models
  • workers: auto or manual count ≤ 8 per batch
  • stagger: auto or manual ≥ (call_duration / workers) × 2
  • Worker model tested on a 5-sample run before the full batch
  • Output directory exists and is writable
  • Seeds file exists with correct format
  • Checkpointing enabled for runs >100 tasks
  • Orchestrator model is V4 Pro (or equivalent frontier) for planning
  • Worker model is V4 Flash (or cheapest model that handles the task)
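Several of these checks can be automated before launch. The following is a minimal preflight sketch over a config dictionary that uses the task.yaml keys shown earlier. The environment variable name `DEEPSEEK_API_KEY` is a hypothetical placeholder (the document does not say which variable the scripts read), and `preflight` itself is not part of DeepSwarm.

```python
import os


def preflight(config, env=None):
    """Return a list of human-readable problems; empty means ready to launch."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("DEEPSEEK_API_KEY"):  # hypothetical variable name
        problems.append("API key is not exported")
    if not os.path.isfile(config.get("seeds_file", "")):
        problems.append("seeds file not found")
    output_dir = config.get("output_dir", "")
    if not (os.path.isdir(output_dir) and os.access(output_dir, os.W_OK)):
        problems.append("output directory missing or not writable")
    workers = config.get("workers", "auto")
    if workers != "auto" and not (isinstance(workers, int) and 1 <= workers <= 8):
        problems.append("workers must be 'auto' or an integer from 1 to 8")
    return problems
```

Returning a list instead of raising on the first failure lets a launcher report every problem in one pass.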