deepswarm

Use when running parallel AI workers for any long-running or multi-turn batch API task. Auto-calculates optimal workers + stagger. Supports tiered delegation (V4 Pro orchestrator → V4 Flash workers). 99.95% API success rate at scale.

amanning3390 121 5 Updated 2mo ago

Install

npx skillscat add amanning3390/deepswarm

Install via the SkillsCat registry.

SKILL.md

DeepSwarm — Task-Agnostic Parallel Worker Orchestration

Spawn N parallel API workers for any long-running or multi-turn batch task. Auto-calculates optimal worker count and stagger delay. Supports tiered model delegation: orchestrator plans with a frontier model (V4 Pro), workers execute with a cheaper model (V4 Flash).

Overview

DeepSwarm 2.0 generalizes the proven orchestration pattern from the 19,331-trace generation project to any batch API task. You define a task — translations, reasoning traces, code reviews, summarization — and DeepSwarm parallelizes it across optimal workers with the right stagger for your API.

The core insight: API rate limits respond to the number of simultaneous connections, not to total request volume. Auto-calculating the worker count and staggering worker start times is what produces the 99.95% success rate.
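The staggering idea can be illustrated with a minimal sketch, assuming a thread-per-worker model. `run_staggered` and `fake_call` are hypothetical names for illustration and are not part of DeepSwarm's scripts; the point is only that worker i delays its first call by i × stagger seconds, so connections do not all open at the same instant.

```python
import threading
import time


def run_staggered(worker_fn, n_workers, stagger_s):
    """Start n_workers threads, delaying worker i by i * stagger_s seconds."""
    start_times = [None] * n_workers
    t0 = time.monotonic()

    def runner(i):
        time.sleep(i * stagger_s)               # offset this worker's first call
        start_times[i] = time.monotonic() - t0  # record when it actually began
        worker_fn(i)

    threads = [threading.Thread(target=runner, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                                # wait for every worker to finish
    return start_times


def fake_call(i):
    """Placeholder for a real API request (assumption: not DeepSwarm code)."""
    time.sleep(0.01)


starts = run_staggered(fake_call, n_workers=4, stagger_s=0.05)
```

With real workloads the stagger would be seconds rather than milliseconds; the small values here only keep the demo fast.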

When to Use

Any batch API task: generation, translation, summarization, extraction, classification
Long-running individual calls (30s+) that benefit from parallelization
Multi-turn tasks where each worker loops through conversation turns
Cost optimization via tiered delegation (orchestrator ≠ worker model)
Crash-resilient batch processing (checkpointed, idempotent)

Don't use for:

Quick calls under 10s (overhead not worth it — just loop)
Tasks requiring inter-worker coordination (use delegate_task)
Real-time interactive sessions (use tmux-agent-orchestrator)

Quick Start

# Install
hermes skills tap add amanning3390/deepswarm

# Define your task (task.yaml)
# Generate seeds
python3 scripts/seed.py --task task.yaml

# Launch — auto-optimizes workers, stagger, model routing
python3 scripts/swarm.py --task task.yaml --total 1000

# Filter — repair JSON, validate structure, apply length thresholds
python3 scripts/filter.py --input-dir output/ --output clean.jsonl --errors errors.jsonl
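The internals of scripts/filter.py are not shown here, so the following is only a sketch of what a repair, validate, and length-threshold pass over JSONL output might look like. The function name `filter_lines`, the `content` field, the 20-character threshold, and the "append a closing brace" repair are all assumptions chosen for illustration.

```python
import json


def filter_lines(lines, required_key="content", min_chars=20):
    """Split raw JSONL lines into (clean, errors).

    A record is kept when it parses as a JSON object, contains
    required_key, and that field is at least min_chars characters long.
    """
    clean, errors = [], []
    for raw in lines:
        text = raw.strip()
        if not text:
            continue  # skip blank lines
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            # Naive repair (assumption): a truncated line may only lack its closing brace.
            try:
                record = json.loads(text + "}")
            except json.JSONDecodeError:
                errors.append({"raw": text, "reason": "invalid_json"})
                continue
        if not isinstance(record, dict) or required_key not in record:
            errors.append({"raw": text, "reason": "bad_structure"})
        elif len(str(record[required_key])) < min_chars:
            errors.append({"raw": text, "reason": "too_short"})
        else:
            clean.append(record)
    return clean, errors


sample = [
    '{"content": "a long enough piece of text here"}',
    '{"content": "a long enough piece of text here"',   # missing closing brace
    '{"content": "short"}',
    "not json",
    "[1, 2]",
]
clean, errors = filter_lines(sample)
```

Keeping rejected lines with a reason code, rather than dropping them, mirrors the separate --errors output in the command above.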

Task Definition (task.yaml)

# What to do
task_type: generation              # generation | translation | summarization | classification | custom
prompt_template: |
  You are an AI assistant. {{seed}}

# Model routing (tiered delegation)
orchestrator_model: deepseek-v4-pro  # Plans, monitors, handles errors
worker_model: deepseek-v4-flash      # Executes batches (cheaper!)
worker_api_base: https://api.deepseek.com/v1/chat/completions
worker_max_tokens: 4096

# Execution control
multi_turn: true                    # Workers loop through conversation turns
max_turns: 20                       # Max turns per worker conversation
seeds_file: seeds.jsonl             # Pre-generated task seeds

# Worker optimization (auto-calculated if omitted)
workers: auto                       # auto | N
stagger: auto                       # auto | seconds
batch_size: auto                    # auto | tasks per worker

# Output
output_dir: output/
output_format: jsonl               # jsonl | json | parquet
checkpoint_every: 10                # Save progress every N tasks

# Optional: custom worker logic
worker_script: custom_worker.py     # Override default worker behavior
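Two details of the task definition can be sketched in a few lines: how a `{{seed}}` placeholder might be filled in, and how an `auto | N` setting might be resolved. Both helpers below are hypothetical; the actual substitution and resolution logic inside DeepSwarm is not documented here.

```python
def render_prompt(template, seed):
    """Substitute the {{seed}} placeholder; fail loudly if it is missing."""
    if "{{seed}}" not in template:
        raise ValueError("prompt_template has no {{seed}} placeholder")
    return template.replace("{{seed}}", seed)


def resolve(value, computed):
    """Return the computed value for 'auto', otherwise the explicit setting."""
    return computed if value == "auto" else value


prompt = render_prompt("You are an AI assistant. {{seed}}\n", "Translate 'hello' to French.")
workers = resolve("auto", 8)   # falls back to the calculated value
stagger = resolve(5, 11.25)    # explicit setting wins
```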

Tiered Model Delegation

Orchestrator (V4 Pro) and workers (V4 Flash) can use different models:

User Task → V4 Pro (plans, monitors)
              ├─ V4 Flash Worker 0 → API → output/
              ├─ V4 Flash Worker 1 → API → output/
              ├─ V4 Flash Worker 2 → API → output/
              └─ ...

Why tiered delegation matters:

V4 Pro costs ~3× V4 Flash per token
Orchestrator only plans + monitors (few calls)
Workers make thousands of calls — use the cheapest model that works
Typical savings: 60-70% vs using V4 Pro for everything
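The savings figure follows from the price ratio. As a worked sketch: if the expensive model costs about 3× the cheap one per token and only a small share of tokens stays on the expensive model, the saving approaches 1 − 1/3 ≈ 67%. The function name and the 5% orchestrator share are assumptions for illustration.

```python
def tiered_savings(price_ratio, orchestrator_share):
    """Fraction saved versus running every token on the expensive model.

    price_ratio: expensive price / cheap price per token (about 3 here).
    orchestrator_share: fraction of all tokens still sent to the expensive model.
    """
    if price_ratio <= 0:
        raise ValueError("price_ratio must be positive")
    if not 0.0 <= orchestrator_share <= 1.0:
        raise ValueError("orchestrator_share must be between 0 and 1")
    # Cost in units of the cheap model's per-token price.
    tiered_cost = orchestrator_share * price_ratio + (1.0 - orchestrator_share)
    return 1.0 - tiered_cost / price_ratio


best_case = tiered_savings(3.0, 0.0)    # all tokens on the cheap model
typical = tiered_savings(3.0, 0.05)     # 5% of tokens on the orchestrator (assumed)
```

Under these assumptions the result lands between roughly 63% and 67%, inside the 60-70% range quoted above.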

When to use same model for both:

Task quality requires frontier reasoning at every step
Worker model doesn't support the required format
Budget allows it and quality is paramount

Auto-Optimization

When workers: auto and stagger: auto:

DeepSwarm runs a single calibration call to measure call duration.
Calculates the optimal worker count: min(8, floor(rate_limit / call_duration)).
Sets the stagger: (call_duration / workers) × 2.
Sets batch_size: total / workers.
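The steps above can be written out as a short sketch. The units of `rate_limit` are not stated in this document, so it is treated here as an opaque number supplied by the caller. The floor of one worker and the rounding up of `batch_size` are assumptions added so the function never returns zero workers or drops tasks.

```python
import math


def auto_optimize(call_duration_s, rate_limit, total_tasks, max_workers=8):
    """Return (workers, stagger_s, batch_size) from the formulas above."""
    if call_duration_s <= 0:
        raise ValueError("call_duration_s must be positive")
    # min(8, floor(rate_limit / call_duration)), never below one worker (assumption)
    workers = max(1, min(max_workers, math.floor(rate_limit / call_duration_s)))
    # (call_duration / workers) * 2
    stagger_s = call_duration_s / workers * 2
    # total / workers, rounded up so every task is assigned (assumption)
    batch_size = math.ceil(total_tasks / workers)
    return workers, stagger_s, batch_size


plan = auto_optimize(call_duration_s=45, rate_limit=600, total_tasks=1000)
```

For a 45-second call and 1,000 tasks this gives 8 workers, an 11.25-second stagger, and 125 tasks per worker.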

Calibration table (pre-computed):

Call Duration	Workers	Stagger	Success	Throughput
<10s	16	1s	99.9%	~5,760/hr
10-30s	12	2s	99.9%	~1,440/hr
30-60s	8	5s	99.95%	~440/hr
60-90s	6	10s	99.9%	~240/hr
>90s	4	15s	99.9%	~96/hr
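A lookup over this table might be sketched as follows. The function names are hypothetical, and because the table does not say which row owns the exact boundaries (30s, 60s, 90s), treating upper bounds as inclusive is an assumption. The second helper computes an idealized ceiling on throughput that ignores stagger and retries: 16 workers at 10s gives 5,760/hr, matching the first row, while 8 workers at 60s gives 480/hr, slightly above the ~440/hr listed.

```python
def lookup_calibration(call_duration_s):
    """Return (workers, stagger_s) from the pre-computed calibration table."""
    if call_duration_s < 10:
        return 16, 1
    # Upper bounds treated as inclusive (assumption; the table leaves this open).
    for upper_s, workers, stagger_s in ((30, 12, 2), (60, 8, 5), (90, 6, 10)):
        if call_duration_s <= upper_s:
            return workers, stagger_s
    return 4, 15


def ideal_throughput_per_hour(workers, call_duration_s):
    """Upper bound: every worker always busy, no stagger delay, no retries."""
    return workers * 3600 / call_duration_s


row = lookup_calibration(45)
```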

Multi-Turn Task Support

For tasks requiring conversation loops (generation, debugging, interactive work):

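Worker loop (api_call, task_complete, needs_tool_call, and simulate_tool_response are placeholder helpers):
  for seed in seeds:
    # user_task is the prompt built from this seed
    messages = [system_prompt, user_task]
    for turn in range(max_turns):
      response = api_call(messages, model=worker_model)
      messages.append({"role": "assistant", "content": response})
      if task_complete(response):
        break
      if needs_tool_call(response):
        # feed the simulated tool result back in before the next turn
        messages.append(simulate_tool_response(response))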

Each turn is an independent API call. Multi-turn tasks benefit most from parallelization because per-task latency is high.

Task-Agnostic Worker Design

The worker (worker.py) accepts a YAML task definition and executes any pipeline:

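def run_task(seed, config):
    """Run one seed through up to config["max_turns"] API calls."""
    # Build the initial message list from the seed and the task definition
    messages = build_messages(seed, config)
    for turn in range(config["max_turns"]):
        response = call_api(messages, config)
        # Task finished: hand the final response and transcript to finish()
        if is_complete(response, config):
            return finish(response, messages)
        # Otherwise extend the conversation when another turn is needed;
        # if it is not, the next iteration re-sends the same messages
        if needs_continuation(response, config):
            messages = append_turn(messages, response, config)
    # Turn limit reached without completion: return the transcript so far
    return messages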

Built-in task types:

generation — Generate content from seed (the trace generation pattern)
translation — Translate each seed text
summarization — Summarize each seed document
classification — Classify each seed input
custom — Uses worker_script for completely custom logic

Common Pitfalls

workers: auto choosing too many workers. If the calibration call was fast but real calls are slow, override the worker count manually.
Forgetting to stagger between workers. Even 2 workers starting in the same millisecond can trigger rate limits on slow APIs.
Mixing models without checking format compatibility. The worker model must support the same prompt format as the orchestrator.
Not checkpointing. A worker that dies at task 120 of 125 loses all of its work. Checkpoint every 10 tasks.
Backgrounding workers with shell & but omitting wait. Without wait, the shell can exit early and kill its child workers.
Using V4 Pro for workers when V4 Flash would work. Check quality on a 10-sample test before committing to the expensive model.
Not deleting error outputs before restarting. Error files consume output indices and inflate disk usage.
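The checkpointing pitfall can be made concrete with a small sketch. This is not DeepSwarm's checkpoint format: the JSON list of finished indices, the file name, and the helper names are all assumptions. The demo simulates a worker that crashes once at task 123 of 125 and is then restarted; only the tasks after the last checkpoint are repeated, which is why the task handler needs to be idempotent.

```python
import json
import os
import tempfile


def load_done(path):
    """Return the set of task indices already completed, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_done(path, done):
    """Write the checkpoint to a temp file, then rename it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)  # a reader never sees a half-written checkpoint


def process_batch(seeds, handle, path, checkpoint_every=10):
    """Run handle(idx, seed) for every index not yet in the checkpoint."""
    done = load_done(path)
    processed = 0
    for idx, seed in enumerate(seeds):
        if idx in done:
            continue  # finished in an earlier run
        handle(idx, seed)
        done.add(idx)
        processed += 1
        if processed % checkpoint_every == 0:
            save_done(path, done)
    save_done(path, done)
    return processed


# Demo: crash once at task 123 of 125, then restart from the checkpoint.
ckpt = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
state = {"crashed": False}
handled = []


def flaky(idx, seed):
    if idx == 123 and not state["crashed"]:
        state["crashed"] = True
        raise RuntimeError("simulated worker crash")
    handled.append(idx)


try:
    process_batch(range(125), flaky, ckpt)
except RuntimeError:
    pass
resumed = process_batch(range(125), flaky, ckpt)
```

After the restart, `resumed` counts the five tasks from index 120 onward; tasks 120 to 122 run twice because the crash happened after the last checkpoint.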

Verification Checklist

Task YAML has valid model names and API base URL
API key exported for both orchestrator and worker models
workers: auto or manual count ≤ 8 per batch
stagger: auto or manual ≥ (call_duration / workers) × 2
Worker model tested on 5-sample run before full batch
Output directory exists and is writable
Seeds file exists with correct format
Checkpointing enabled for runs >100 tasks
Orchestrator model is V4 Pro (or equivalent frontier) for planning
Worker model is V4 Flash (or cheapest model that handles the task)
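Several of these checks are mechanical and could be automated before a run. The sketch below is a hypothetical preflight helper, not a DeepSwarm script: it takes the task definition as an already-parsed dict, and the environment variable name `DEEPSEEK_API_KEY` is an assumption about how the key is exported.

```python
import os


def preflight(config, total_tasks, call_duration_s, env=None,
              api_key_var="DEEPSEEK_API_KEY"):
    """Return a list of problems found; an empty list means all checks passed."""
    env = os.environ if env is None else env
    problems = []
    for key in ("orchestrator_model", "worker_model", "worker_api_base"):
        if not config.get(key):
            problems.append(f"missing {key}")
    if not env.get(api_key_var):
        problems.append(f"{api_key_var} is not set")
    workers = config.get("workers", "auto")
    if workers != "auto" and not (isinstance(workers, int) and 1 <= workers <= 8):
        problems.append("workers must be 'auto' or an integer from 1 to 8")
    stagger = config.get("stagger", "auto")
    if stagger != "auto":
        if not isinstance(stagger, (int, float)):
            problems.append("stagger must be 'auto' or a number of seconds")
        elif isinstance(workers, int) and workers >= 1:
            if stagger < call_duration_s / workers * 2:
                problems.append("stagger is below (call_duration / workers) * 2")
    if not os.path.isfile(config.get("seeds_file", "")):
        problems.append("seeds_file not found")
    out_dir = config.get("output_dir", "")
    if not (os.path.isdir(out_dir) and os.access(out_dir, os.W_OK)):
        problems.append("output_dir missing or not writable")
    if total_tasks > 100 and not config.get("checkpoint_every"):
        problems.append("checkpointing should be enabled for runs over 100 tasks")
    return problems


bad_config = {
    "orchestrator_model": "deepseek-v4-pro",
    "worker_model": "deepseek-v4-flash",
    "worker_api_base": "https://api.deepseek.com/v1/chat/completions",
    "workers": 16,
    "stagger": 1,
    "seeds_file": "no_such_seeds.jsonl",
    "output_dir": "no_such_output_dir",
}
issues = preflight(bad_config, total_tasks=1000, call_duration_s=45, env={})
```

The quality checks in the list (sample runs, model choice) still need a human; this only covers the items that can be verified from the config and the filesystem.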
