Use when running parallel AI workers for any long-running or multi-turn batch API task. Auto-calculates optimal workers + stagger. Supports tiered delegation (V4 Pro orchestrator → V4 Flash workers). 99.95% API success rate at scale.
Install
npx skillscat add amanning3390/deepswarm
Install via the SkillsCat registry.
DeepSwarm — Task-Agnostic Parallel Worker Orchestration
Spawn N parallel API workers for any long-running or multi-turn batch task. Auto-calculates optimal worker count and stagger delay. Supports tiered model delegation: orchestrator plans with a frontier model (V4 Pro), workers execute with a cheaper model (V4 Flash).
Overview
DeepSwarm 2.0 generalizes the proven orchestration pattern from the 19,331-trace generation project to any batch API task. You define a task — translations, reasoning traces, code reviews, summarization — and DeepSwarm parallelizes it across optimal workers with the right stagger for your API.
The core insight: API rate limits are a function of simultaneous connections, not total volume. Auto-calculated stagger + worker count = 99.95% success.
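The staggered-start idea can be sketched with Python's standard library. This is a minimal illustration, not DeepSwarm's actual launcher: `run_staggered`, `fake_call`, and the sleep-based delay are hypothetical names and choices made for the example.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_staggered(task_fn, n_workers, stagger_s):
    """Start n_workers threads, delaying worker i by i * stagger_s seconds.

    Hypothetical helper: it only illustrates spacing out connection starts.
    Returns a list of (worker_index, start_offset_seconds, result).
    """
    t0 = time.monotonic()

    def worker(index):
        # Offset each worker so the calls do not all open at the same instant.
        time.sleep(index * stagger_s)
        started = time.monotonic() - t0
        return index, started, task_fn(index)

    # One thread per worker, so every worker begins its delay immediately.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order, so results line up with indices.
        return list(pool.map(worker, range(n_workers)))


def fake_call(index):
    # Stand-in for a real API request (assumption for the example).
    time.sleep(0.01)
    return f"worker-{index}-done"


if __name__ == "__main__":
    for index, started, result in run_staggered(fake_call, n_workers=4, stagger_s=0.05):
        print(f"worker {index} started at +{started:.2f}s -> {result}")
```

Each worker opens its first connection one stagger interval after the previous one, so the number of connections ramps up gradually instead of spiking at launch.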
When to Use
- Any batch API task: generation, translation, summarization, extraction, classification
- Long-running individual calls (30s+) that benefit from parallelization
- Multi-turn tasks where each worker loops through conversation turns
- Cost optimization via tiered delegation (orchestrator ≠ worker model)
- Crash-resilient batch processing (checkpointed, idempotent)
Don't use for:
- Quick calls under 10s (overhead not worth it — just loop)
- Tasks requiring inter-worker coordination (use delegate_task)
- Real-time interactive sessions (use tmux-agent-orchestrator)
Quick Start
# Install
hermes skills tap add amanning3390/deepswarm
# Define your task (task.yaml)
# Generate seeds
python3 scripts/seed.py --task task.yaml
# Launch — auto-optimizes workers, stagger, model routing
python3 scripts/swarm.py --task task.yaml --total 1000
# Filter — repair JSON, validate structure, apply length thresholds
python3 scripts/filter.py --input-dir output/ --output clean.jsonl --errors errors.jsonl
Task Definition (task.yaml)
# What to do
task_type: generation # generation | translation | summarization | classification | custom
prompt_template: |
  You are an AI assistant. {{seed}}
# Model routing (tiered delegation)
orchestrator_model: deepseek-v4-pro # Plans, monitors, handles errors
worker_model: deepseek-v4-flash # Executes batches (cheaper!)
worker_api_base: https://api.deepseek.com/v1/chat/completions
worker_max_tokens: 4096
# Execution control
multi_turn: true # Workers loop through conversation turns
max_turns: 20 # Max turns per worker conversation
seeds_file: seeds.jsonl # Pre-generated task seeds
# Worker optimization (auto-calculated if omitted)
workers: auto # auto | N
stagger: auto # auto | seconds
batch_size: auto # auto | tasks per worker
# Output
output_dir: output/
output_format: jsonl # jsonl | json | parquet
checkpoint_every: 10 # Save progress every N tasks
# Optional: custom worker logic
worker_script: custom_worker.py # Override default worker behavior
Tiered Model Delegation
Orchestrator (V4 Pro) and workers (V4 Flash) can use different models:
User Task → V4 Pro (plans, monitors)
  ├─ V4 Flash Worker 0 → API → output/
  ├─ V4 Flash Worker 1 → API → output/
  ├─ V4 Flash Worker 2 → API → output/
  └─ ...
Why tiered delegation matters:
- V4 Pro costs ~3× V4 Flash per token
- Orchestrator only plans + monitors (few calls)
- Workers make thousands of calls — use the cheapest model that works
- Typical savings: 60-70% vs using V4 Pro for everything
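A back-of-the-envelope check of the savings figure, assuming the ~3× price ratio above and that token volume splits into a small orchestrator share and a large worker share. The function name and the example orchestrator shares are illustrative assumptions, not measured values.

```python
def tiered_savings(price_ratio, orchestrator_share):
    """Fractional saving of tiered delegation vs. running everything on the expensive model.

    price_ratio: expensive-model price / cheap-model price per token (~3 in the text).
    orchestrator_share: fraction of all tokens handled by the orchestrator (assumed).
    """
    if price_ratio <= 0 or not 0 <= orchestrator_share <= 1:
        raise ValueError("price_ratio must be > 0 and orchestrator_share in [0, 1]")
    # Cost is normalised so that "everything on the expensive model" costs 1.0.
    tiered_cost = orchestrator_share + (1 - orchestrator_share) / price_ratio
    return 1 - tiered_cost


if __name__ == "__main__":
    for share in (0.0, 0.05, 0.10):
        print(f"orchestrator share {share:.0%}: saving {tiered_savings(3, share):.1%}")
```

With a 3× ratio, the saving runs from about 67% when the orchestrator uses a negligible share of tokens down to 60% when it handles 10% of them, which is consistent with the 60-70% range quoted above.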
When to use same model for both:
- Task quality requires frontier reasoning at every step
- Worker model doesn't support the required format
- Budget allows it and quality is paramount
Auto-Optimization
When workers: auto and stagger: auto:
- DeepSwarm runs a single calibration call to measure call duration
- Calculates optimal workers: min(8, floor(rate_limit / call_duration))
- Sets stagger: call_duration / workers × 2
- Adjusts batch_size: total / workers
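The three formulas can be applied literally as in the sketch below. Several details are assumptions because the text does not state them: the units of `rate_limit`, the left-to-right reading of the stagger formula as (call_duration / workers) × 2, a floor of one worker, and rounding `batch_size` up so every task is assigned. The name `plan_swarm` and the input values are hypothetical.

```python
import math


def plan_swarm(rate_limit, call_duration, total, max_workers=8):
    """Apply the worker, stagger and batch_size formulas literally.

    Assumptions (not stated in the source): the units of rate_limit, the
    left-to-right reading of the stagger formula, a floor of one worker,
    and rounding batch_size up so every task is assigned.
    """
    if call_duration <= 0 or total < 0:
        raise ValueError("call_duration must be > 0 and total >= 0")
    workers = max(1, min(max_workers, math.floor(rate_limit / call_duration)))
    stagger = call_duration / workers * 2
    batch_size = math.ceil(total / workers)
    return {"workers": workers, "stagger": stagger, "batch_size": batch_size}


if __name__ == "__main__":
    # rate_limit=400 is an arbitrary illustrative value.
    print(plan_swarm(rate_limit=400, call_duration=45, total=1000))
    # → {'workers': 8, 'stagger': 11.25, 'batch_size': 125}
```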
Calibration table (pre-computed):
| Call Duration | Workers | Stagger | Success | Throughput |
|---|---|---|---|---|
| <10s | 16 | 1s | 99.9% | ~5,760/hr |
| 10-30s | 12 | 2s | 99.9% | ~1,440/hr |
| 30-60s | 8 | 5s | 99.95% | ~440/hr |
| 60-90s | 6 | 10s | 99.9% | ~240/hr |
| >90s | 4 | 15s | 99.9% | ~96/hr |
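The table can be used as a simple lookup, and its throughput column compared against an idealized bound of workers × 3600 / call_duration. The sketch below is one way to do that. The function names are hypothetical, and the handling of values that fall exactly on a band edge is an assumption, since the table does not specify it.

```python
# Rows copied from the calibration table:
# (upper bound of call duration in seconds, workers, stagger in seconds).
CALIBRATION = [
    (10, 16, 1),
    (30, 12, 2),
    (60, 8, 5),
    (90, 6, 10),
    (float("inf"), 4, 15),
]


def lookup(call_duration):
    """Return (workers, stagger) for a measured call duration in seconds.

    Edge handling (which row exactly 10, 30, 60 or 90 s falls into) is an
    assumption; here each edge value belongs to the slower band.
    """
    if not call_duration > 0:
        raise ValueError("call_duration must be a positive number")
    for upper, workers, stagger in CALIBRATION:
        if call_duration < upper:
            return workers, stagger
    # Only reached for an infinite duration; fall back to the last row.
    return CALIBRATION[-1][1:]


def ideal_throughput_per_hour(workers, call_duration):
    """Upper bound on calls per hour if every worker is always busy."""
    return workers * 3600 / call_duration


if __name__ == "__main__":
    workers, stagger = lookup(45)
    print(workers, stagger, ideal_throughput_per_hour(workers, 60))
```

Evaluated at the upper edge of each band, the bound gives 5,760, 1,440, 480 and 240 calls/hr for the first four rows. That matches the table except for the 30-60s row (~440/hr), which presumably reflects overhead not modeled here.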
Multi-Turn Task Support
For tasks requiring conversation loops (generation, debugging, interactive work):
Worker loop:
for seed in seeds:
    messages = [system_prompt, user_task]
    for turn in range(max_turns):
        response = api_call(messages, model=worker_model)
        messages.append({"role": "assistant", "content": response})
        if task_complete(response):
            break
        if needs_tool_call(response):
            messages.append(simulate_tool_response(response))
Each turn is a separate API call that resends the accumulated message history. Multi-turn tasks benefit most from parallelization because per-task latency is high.
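A runnable version of the loop above, with the API call replaced by a stub so it can be executed offline. This is a sketch under stated assumptions: `run_conversation`, `make_fake_model`, the "DONE" completion marker, and the "Continue." follow-up message are all hypothetical and stand in for whatever completion check and tool handling a real worker uses.

```python
def run_conversation(seed, call_model, max_turns=20,
                     system_prompt="You are an AI assistant."):
    """Run one multi-turn conversation and return the message history.

    call_model is any callable that takes the message list and returns the
    assistant's reply as a string. The 'DONE' marker is a hypothetical
    completion convention used only in this example.
    """
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": seed},
    ]
    for _ in range(max_turns):
        reply = call_model(messages)
        messages.append({"role": "assistant", "content": reply})
        if "DONE" in reply:
            break
        # Ask the model to keep going; a real worker might append a tool result here.
        messages.append({"role": "user", "content": "Continue."})
    return messages


def make_fake_model(turns_needed):
    """Stub model that finishes after a fixed number of calls (for illustration)."""
    state = {"calls": 0}

    def fake_model(messages):
        state["calls"] += 1
        return "DONE" if state["calls"] >= turns_needed else f"step {state['calls']}"

    return fake_model


if __name__ == "__main__":
    history = run_conversation("Write a haiku about queues.", make_fake_model(3))
    print(len(history), "messages; last reply:", history[-1]["content"])
```

Passing the model as a callable keeps the loop independent of any particular API client, so the same function can be exercised in tests and in production.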
Task-Agnostic Worker Design
The worker (worker.py) accepts a YAML task definition and executes any pipeline:
def run_task(seed, config):
    # Build the initial conversation from the seed and the task definition.
    messages = build_messages(seed, config)
    for turn in range(config["max_turns"]):
        response = call_api(messages, config)
        if is_complete(response, config):
            return finish(response, messages)
        if needs_continuation(response, config):
            messages = append_turn(messages, response, config)
    # Turn budget exhausted without completion: return the history as-is.
    return messages
Built-in task types:
- generation: Generate content from seed (the trace generation pattern)
- translation: Translate each seed text
- summarization: Summarize each seed document
- classification: Classify each seed input
- custom: Uses worker_script for completely custom logic
Common Pitfalls
- workers: auto choosing too many. If the calibration call was fast but actual calls are slow, override manually.
- Forgetting to stagger between workers. Even 2 workers starting at the same millisecond can trigger rate limits on slow APIs.
- Mixing models without checking format compatibility. Worker model must support the same prompt format as orchestrator.
- Not checkpointing. A worker that dies at task 120 of 125 loses all of that work. Checkpoint every 10.
- Shell & without wait. Without wait, the shell exits early and kills child workers.
- Using V4 Pro for workers when V4 Flash would work. Check quality on a 10-sample test before committing to the expensive model.
- Not deleting error outputs before restarting. Error files consume indices and inflate disk usage.
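The checkpointing pitfall can be illustrated with a small resume-safe loop. This is a sketch, not DeepSwarm's own checkpoint format: the JSON layout, the function names, and the atomic-rename strategy are assumptions made for the example.

```python
import json
import os
import tempfile


def load_done(checkpoint_path):
    """Return the set of task ids recorded in the checkpoint file (empty if none)."""
    if not os.path.exists(checkpoint_path):
        return set()
    with open(checkpoint_path, encoding="utf-8") as fh:
        return set(json.load(fh)["done"])


def save_done(checkpoint_path, done):
    """Write the checkpoint atomically: temp file first, then rename over the old one."""
    directory = os.path.dirname(os.path.abspath(checkpoint_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump({"done": sorted(done)}, fh)
    os.replace(tmp_path, checkpoint_path)


def process_batch(tasks, handle, checkpoint_path, checkpoint_every=10):
    """Process (task_id, payload) pairs, skipping ids already checkpointed."""
    done = load_done(checkpoint_path)
    since_save = 0
    for task_id, payload in tasks:
        if task_id in done:
            continue  # finished in an earlier run
        handle(task_id, payload)
        done.add(task_id)
        since_save += 1
        if since_save >= checkpoint_every:
            save_done(checkpoint_path, done)
            since_save = 0
    save_done(checkpoint_path, done)
    return done
```

After a crash, at most `checkpoint_every` completed tasks are repeated on restart, so the handler should be idempotent (for example, writing each output to a file named after its task id).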
Verification Checklist
- Task YAML has valid model names and API base URL
- API key exported for both orchestrator and worker models
- workers: auto or manual count ≤ 8 per batch
- stagger: auto or manual ≥ call_duration / workers × 2
- Worker model tested on 5-sample run before full batch
- Output directory exists and is writable
- Seeds file exists with correct format
- Checkpointing enabled for runs >100 tasks
- Orchestrator model is V4 Pro (or equivalent frontier) for planning
- Worker model is V4 Flash (or cheapest model that handles the task)
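Several of these checks can be automated before launch. The sketch below takes the task definition as an already-parsed dictionary and returns a list of problems. It is an illustration only: `preflight` is a hypothetical helper, and `DEEPSWARM_API_KEY` is a placeholder for whatever environment variable your setup actually exports.

```python
import os


def preflight(config, env=os.environ, api_key_var="DEEPSWARM_API_KEY"):
    """Return a list of human-readable problems; an empty list means all checks passed.

    api_key_var is a hypothetical variable name; substitute your own.
    """
    problems = []
    for key in ("orchestrator_model", "worker_model", "worker_api_base"):
        if not config.get(key):
            problems.append(f"missing {key}")
    if not env.get(api_key_var):
        problems.append(f"environment variable {api_key_var} is not set")
    workers = config.get("workers", "auto")
    if workers != "auto" and not (isinstance(workers, int) and 1 <= workers <= 8):
        problems.append("workers must be 'auto' or an integer from 1 to 8")
    out_dir = config.get("output_dir", "output/")
    if not (os.path.isdir(out_dir) and os.access(out_dir, os.W_OK)):
        problems.append(f"output_dir {out_dir!r} is missing or not writable")
    seeds = config.get("seeds_file", "seeds.jsonl")
    if not os.path.isfile(seeds):
        problems.append(f"seeds_file {seeds!r} not found")
    return problems
```

Returning a list instead of raising on the first failure lets you report every problem in one pass before spending any API budget.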