Install via the SkillsCat registry:
npx skillscat add diazmelgarejo/perplexity-tools
SKILL.md — Perplexity-Tools Model Selection Skill
Version: v0.9.7.0 (standardized from v0.9.0.0 onward) · Updated: 2026-03-28
Repo: https://github.com/diazMelgarejo/Perplexity-Tools · Branch: main
Layering (all interoperable and independently configurable):
| Layer | Repo | Role |
|---|---|---|
| Orchestrator & instance manager | Perplexity-Tools (this repo) | Top-level agent lifecycle, ModelRegistry / config/*.yml, FastAPI /orchestrate, idempotency |
| Reasoning & routing methodology | ultrathink-system | single_agent/SKILL.md, AFRP (pre-router gate) / CIDF / process; multi-agent registry is separately installable and not required to run this orchestrator |
| Subagent auto-selection (ECC-style) | ECC Tools | Default subagent routing unless the top-level orchestrator overrides roles |
| Karpathy AutoResearch sync | karpathy/autoresearch | Idempotent sync of the automated ML research loop; integrated via /autoresearch/* endpoints and orchestrator/autoresearch_bridge.py |
Selection order: Top-level model routing follows this SKILL.md → orchestrator/model_registry.py + config/models.yml / routing.yml first. Subagents use ECC-tools defaults unless overridden. ultrathink-system remains the methodology layer for reasoning execution, not a hard dependency of the YAML registry.
State Ownership & Redis Strategy
Canonical MVP wording: For MVP/v1.0, ultrathink remains stateless and has no Redis requirement. PT is the sole orchestration layer and owns agent instantiation, tracking, queueing, budget enforcement, and file-based runtime state. Redis-backed coordination is a future PT-only enhancement planned for multi-instance distributed deployments in v1.1 and above.
Rules:
- Single PT instance or LAN MVP per machine: file-based state only (.state/agents.json, .state/budget.json)
- No Redis mentions in ultrathink install/runtime requirements
- Any future queue/cache/distributed lock support belongs to PT
- Redis only activates when PT supports multi-instance distributed operation (v1.1+), not before
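As an illustration of the file-based rule, the sketch below reads and writes a JSON state file such as .state/agents.json. The helper names `load_state` and `save_state` and the record layout are assumptions made for this example, not the repository's actual API. The write goes through a temporary file so that a crash mid-write cannot leave a half-written state file behind.

```python
import json
import os
import tempfile
from pathlib import Path


def load_state(path: Path) -> dict:
    """Return the parsed JSON state, or an empty dict if the file is missing."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_state(path: Path, state: dict) -> None:
    """Write state atomically: dump to a temp file, then rename over the target."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(state, handle, indent=2, sort_keys=True)
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        os.unlink(tmp_name)  # do not leave the temp file behind on failure
        raise
```

A single-process MVP can call `save_state(Path(".state/agents.json"), agents)` after every registry change. Several writers on different machines would need locking, which is the case the rules above defer to v1.1+.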
Multi-Computer Orchestration (Hardware-Aware)
This orchestrator is designed for full hardware profile awareness [web:40] across a distributed LAN environment. It adapts standard multi-agent orchestration strategies [web:23][web:25] (sequential, concurrent, routing) to physical hardware constraints.
LAN Resume & Distributed Discovery
- Automatic Resume (LAN Detect): On startup, the system scans the LAN for existing instances (Redis: agent:registry:* or local .state/agents.json).
- Session Continuity: Resume from the last known state by re-attaching to running agent processes or resuming from the Short Persistence Log (.state/session.log).
- Discovery Strategy: Attempt to connect to REDIS_HOST. If unreachable, fall back to the local state file for standalone operation.
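The discovery strategy could be sketched as a plain TCP probe followed by a fallback. The function names, the default port 6379, and the 0.5 s timeout are assumptions for this example. The probe only shows that something accepts connections on that port; a real check would also need to confirm that the peer speaks the Redis protocol.

```python
import os
import socket
from pathlib import Path


def redis_reachable(host: str, port: int = 6379, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def choose_state_backend(state_file: Path = Path(".state/agents.json")) -> str:
    """Prefer Redis when REDIS_HOST is set and reachable, else the local file."""
    host = os.environ.get("REDIS_HOST", "")
    if host and redis_reachable(host):
        return "redis"
    return f"file:{state_file.as_posix()}"
```

With REDIS_HOST unset, `choose_state_backend()` returns the local state file without touching the network, which matches standalone operation.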
Spawn Reconciliation (Pre-Model Spawning)
- Centralized Registry Check: Before spawning any agent, the orchestrator MUST check the AgentTracker (global registry) for an existing agent with the same role and task_hash.
- Proper Session Planning: Reconcile spawns before model assignment to prevent dual-task GPU contention or redundant model loading.
- Model Assignment: Once a spawn is reconciled, assign the model based on the hardware profile's VRAM/RAM ceiling (see hardware/SKILL.md).
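The ordering above (registry check first, model assignment second) can be sketched as follows. The `AgentTracker` class here is a hypothetical in-memory stand-in, not the repository's implementation. Deriving task_hash from a SHA-256 digest of the canonical JSON payload is also an assumption, since this document does not say how the hash is computed.

```python
import hashlib
import json


def task_hash(task: dict) -> str:
    """Stable digest of a task payload; key order does not affect the result."""
    canonical = json.dumps(task, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class AgentTracker:
    """Hypothetical in-memory stand-in for the global registry."""

    def __init__(self):
        self._agents = {}  # (role, task_hash) -> agent record

    def find_running(self, role, digest):
        record = self._agents.get((role, digest))
        if record is not None and record["status"] == "running":
            return record
        return None

    def register(self, role, digest, model):
        record = {"role": role, "task_hash": digest,
                  "model": model, "status": "running"}
        self._agents[(role, digest)] = record
        return record


def reconcile_spawn(tracker, role, task, assign_model):
    """Return (record, spawned). assign_model runs only when nothing matches."""
    digest = task_hash(task)
    existing = tracker.find_running(role, digest)
    if existing is not None:
        return existing, False  # duplicate spawn avoided, no model is loaded
    return tracker.register(role, digest, assign_model(role)), True
```

Because `assign_model` is called only on a registry miss, a duplicate request never reaches the step that would load a second copy of a model onto the GPU.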
Profile Deployment Logic
| Profile ID | Architecture | Core Specialization | Primary Use Case |
|---|---|---|---|
| mac-studio | Apple Silicon | Low-latency, Large RAM | Orchestration, Synthesis, Multi-step reasoning |
| win-rtx3080 | x86-64 / CUDA | Parallel GPU compute | Heavy coding, Critic passes, ML experiments |
Hardware Profiles (Summary)
Refer to hardware/SKILL.md for full specs.
Profile A — mac-studio (16GB+ Unified Memory)
- Primary: qwen3.5-9b-mlx-4bit (MLX)
- Roles: Orchestrator, Manager, General, Synthesis.
- VRAM: N/A (Unified).
Profile B — win-rtx3080 (10GB VRAM)
- Primary: qwen3.5-35b-a3b-q4 (Ollama)
- Roles: Coding, Autoresearch, Heavy Reasoning, Critic.
- Constraints: 10GB hard ceiling; num_ctx <= 8192.
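One way to encode the two profiles is a small lookup that maps a role to its profile's primary model and clamps the context size to the profile's ceiling. The `HardwareProfile` type, the lowercase role keys, and the `assign` helper are illustrative assumptions. The sketch returns each profile's primary model only and leaves out the dedicated critic model described later in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HardwareProfile:
    profile_id: str
    primary_model: str
    roles: frozenset
    max_num_ctx: Optional[int]  # None: no ceiling is stated for this profile


PROFILES = (
    HardwareProfile(
        "mac-studio", "qwen3.5-9b-mlx-4bit",
        frozenset({"orchestrator", "manager", "general", "synthesis"}), None),
    HardwareProfile(
        "win-rtx3080", "qwen3.5-35b-a3b-q4",
        frozenset({"coding", "autoresearch", "heavy_reasoning", "critic"}), 8192),
)


def assign(role: str, requested_num_ctx: int) -> dict:
    """Pick the profile serving a role and clamp num_ctx to its ceiling."""
    for profile in PROFILES:
        if role in profile.roles:
            num_ctx = requested_num_ctx
            if profile.max_num_ctx is not None:
                num_ctx = min(num_ctx, profile.max_num_ctx)
            return {"profile": profile.profile_id,
                    "model": profile.primary_model,
                    "num_ctx": num_ctx}
    raise ValueError(f"no hardware profile serves role {role!r}")
```

For example, `assign("coding", 32768)` selects win-rtx3080 and lowers num_ctx to 8192, while a synthesis request on mac-studio keeps the requested value.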
Cloud Routing Rules (< $5/month budget)
Priority 1 — Orchestration (Claude Sonnet 4.5 Thinking via Perplexity)
CLOUD_ORCHESTRATION = {
    "provider": "perplexity",
    "model": "anthropic/claude-sonnet-4.5-thinking",
    "trigger_conditions": [
        "strategic_decision == True",
        "reasoning_steps > 200",
        "multi_repo_coordination == True"
    ],
    "max_calls_per_day": 3,
    "max_tokens_per_call": 500,  # Keep prompts SHORT
    "estimated_cost_per_call": 0.05,
    "fallback": "qwen3-30b-critic"  # Dell local
}
Priority 2 — Finance & Real-Time Research (Grok 4.1 Thinking via Perplexity)
CLOUD_RESEARCH = {
    "provider": "perplexity",
    "model": "xai/grok-4.1-thinking",
    "trigger_conditions": [
        "requires_recent_info == True",
        "is_finance_realtime == True",
        "query_date_range < 7_days"
    ],
    "max_calls_per_day": 2,
    "max_tokens_per_call": 1500,
    "estimated_cost_per_call": 0.03,
    "fallback": "qwen3-30b-critic"  # Dell local, no real-time data
}
Budget Guard
BUDGET_GUARD = {
    "max_daily_spend_usd": 0.17,  # ~$5/month / 30 days
    "max_daily_calls": 5,
    "redis_tracking": True,
    "hard_cutoff": True,  # NEVER exceed, always fallback
    "fallback_on_exceed": "qwen3-30b-critic"
}
Local Model Routing Decision Tree
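Read top to bottom, the tree that follows is an ordered chain of checks in which the first match wins, and its cloud branches depend on the BUDGET_GUARD limits. A minimal sketch of that chain is given here. The `Task` fields, the helper names, and the choice to hold money in integer cents are assumptions for illustration. Sending a cloud-eligible task to the local critic when the next call would break the cap is one reading of hard_cutoff, not a documented behaviour.

```python
from dataclasses import dataclass

# Limits from BUDGET_GUARD, held in integer cents to avoid float rounding.
MAX_DAILY_SPEND_CENTS = 17
MAX_DAILY_CALLS = 5
GROK_COST_CENTS = 3    # estimated_cost_per_call in CLOUD_RESEARCH
CLAUDE_COST_CENTS = 5  # estimated_cost_per_call in CLOUD_ORCHESTRATION

CRITIC = "win-rtx3080: qwen3-30b-critic"  # also the offline/budget fallback
CODER = "win-rtx3080: qwen3.5-35b-a3b-q4"
DEFAULT = "mac-studio: qwen3.5-9b-mlx-4bit"


@dataclass
class Task:  # hypothetical task descriptor
    privacy_critical: bool = False
    is_code: bool = False
    needs_realtime: bool = False  # needs data newer than 7 days
    reasoning_steps: int = 0
    code_lines: int = 0
    interactive: bool = False


def can_afford(spent_cents: int, calls_today: int, cost_cents: int) -> bool:
    """True if one more call stays within both the call cap and the spend cap."""
    return (calls_today < MAX_DAILY_CALLS
            and spent_cents + cost_cents <= MAX_DAILY_SPEND_CENTS)


def route(task: Task, spent_cents: int = 0, calls_today: int = 0,
          online: bool = True) -> str:
    if task.privacy_critical:  # always local, cloud is skipped entirely
        return CODER if task.is_code else DEFAULT
    exhausted = (spent_cents >= MAX_DAILY_SPEND_CENTS
                 or calls_today >= MAX_DAILY_CALLS)
    if exhausted or not online:
        return CRITIC
    if task.needs_realtime:
        if can_afford(spent_cents, calls_today, GROK_COST_CENTS):
            return "perplexity: grok-4.1-thinking"
        return CRITIC  # assumption: hard cutoff falls back locally
    if task.reasoning_steps > 200:
        if can_afford(spent_cents, calls_today, CLAUDE_COST_CENTS):
            return "perplexity: claude-sonnet-4.5-thinking"
        return CRITIC  # assumption: hard cutoff falls back locally
    if task.code_lines > 2000:
        return CRITIC
    if task.code_lines > 500:
        return CODER
    if task.interactive:
        return "mac-studio: qwen3-8b-instruct"
    return DEFAULT
```

Using every per-model daily allowance would cost 3 × $0.05 + 2 × $0.03 = $0.21, which is above the $0.17 daily cap. On a busy day the budget guard is therefore the binding limit, not the per-model call counts.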
Task Received
│
├─ Privacy Critical?
│ └─ YES → ALWAYS local, skip cloud
│ ├─ Code task → win-rtx3080: qwen3.5-35b-a3b-q4
│ └─ Standard → mac-studio: qwen3.5-9b-mlx-4bit
│
├─ Budget exhausted OR Internet offline?
│ └─ YES → win-rtx3080: qwen3-30b-critic (FALLBACK)
│
├─ Real-time data needed (< 7 days)?
│ └─ YES + Budget OK → Perplexity: grok-4.1-thinking
│
├─ Strategic reasoning (> 200 steps)?
│ └─ YES + Budget OK → Perplexity: claude-sonnet-4.5-thinking
│
├─ Heavy code generation (> 500 lines)?
│ ├─ > 2000 lines → win-rtx3080: qwen3-30b-critic
│ └─ 500-2000 lines → win-rtx3080: qwen3.5-35b-a3b-q4
│
├─ Quick/interactive task?
│ └─ mac-studio: qwen3-8b-instruct (fastest)
│
└─ Default → mac-studio: qwen3.5-9b-mlx-4bit
Critic & Refinement Pass (Qwen3-30B)
The qwen3:30b-a3b-instruct-q4_K_M model on Dell serves as:
- Default Offline Fallback — Replaces any cloud model when unreachable
- Local Critic — Reviews batch agent outputs for quality (score 1-10)
- Refiner — Improves sub-agent outputs before synthesis
- Orchestration Fallback — Decomposes tasks when Claude is unavailable
CRITIC_CONFIG = {
    "model": "qwen3:30b-a3b-instruct-q4_K_M",
    "endpoint": "http://192.168.1.100:11434",
    "temperature": 0.6,
    "max_tokens": 8192,
    "critic_prompt_template": """
Review these results for quality, accuracy, completeness:
{results}
Provide:
1. Quality score (1-10)
2. Issues found
3. Recommended improvements
4. Verdict: APPROVE or NEEDS_REVISION
""",
    "trigger": "always_after_batch_if_subtasks > 1"
}
Runtime Modes
Mode 1: Mac Only (Standalone)
mode: mac_only
active_models:
- qwen3.5-9b-mlx-4bit # primary
- qwen3-8b-instruct # fallback
cloud_enabled: true # via Perplexity API
critic_pass: false # No Dell available
note: Reduced capability, no critic pass
Mode 2: Dell Only (Standalone)
mode: dell_only
active_models:
- qwen3.5-35b-a3b-q4 # primary coding
- qwen3-30b-critic # critic + fallback
cloud_enabled: true
critic_pass: true
Mode 3: Mac + Dell LAN (Full Orchestration — RECOMMENDED)
mode: lan_full
mac_endpoint: http://192.168.1.101:11434
dell_endpoint: http://192.168.1.100:11434
redis_broker: redis://192.168.1.100:6379
cloud_enabled: true
critic_pass: true
fallback_chain:
- cloud_perplexity
- dell_qwen3_30b
- dell_qwen3_35b
- mac_qwen3_9b
Idempotent Orchestrator Rules
This repo (Perplexity-Tools) is the top-level orchestrator and instance manager:
- Check before creating: Consult .state/agents.json via AgentTracker.
- Reuse existing: Return conflict if matching running agent exists (override with force=true).
- Conflict resolution: Ask user before overriding idempotency.
- Destroy on completion: Mark agents stopped when tasks complete.
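The four rules can be read as a small lifecycle over a registry keyed by role and task hash. The sketch below uses a plain dict in place of the contents of .state/agents.json. The function names, the key format, and the shape of the returned result are assumptions for this example, not the orchestrator's actual responses.

```python
def spawn(registry: dict, role: str, task_hash: str, force: bool = False) -> dict:
    """Create an agent unless a matching running one exists and force is not set."""
    key = f"{role}:{task_hash}"
    existing = registry.get(key)
    if existing is not None and existing["status"] == "running" and not force:
        # Report the conflict; the caller decides whether to retry with force.
        return {"result": "conflict", "agent": existing}
    registry[key] = {"role": role, "task_hash": task_hash, "status": "running"}
    return {"result": "created", "agent": registry[key]}


def complete(registry: dict, role: str, task_hash: str) -> None:
    """Mark the agent stopped so that a later identical request can spawn again."""
    registry[f"{role}:{task_hash}"]["status"] = "stopped"
```

A caller that receives "conflict" would ask the user before retrying with `force=True`, which keeps the override an explicit decision instead of a default.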
Changelog
v0.9.7.0 (2026-03-28)
- AFRP cross-reference: ultrathink-system layer now documents AFRP (pre-router gate) in 4-layer architecture table [SYNC]
- Fixes: orchestrator.py syntax errors, confidential folder references removed, FastAPI version aligned
- Sync: Both repos synchronized to v0.9.7.0 [SYNC]
v0.9.6.0 (2026-03-27)
- LAN Continuity: Implemented LAN Detect & Resume for seamless multi-computer operation.
- Orchestration: Full hardware profile awareness implemented [web:40].
- Pre-Flight Reconciliation: Added spawn detection and reconciliation before model spawning for efficiency.
- Logging: Added Short Persistence Log (.state/session.log) for low-overhead session tracking.
- Strategies: Adapted durable workflow and intelligent routing for multi-computer LAN.
- Models: Updated primaries to Qwen 3.5 series (9B MLX on Mac, 35B MoE on Dell).
- Hardening: Reinforced VRAM safety rules and hardware-bound routing.
v0.9.1.0 (2026-03-22)
- Added SKILL.md with complete model selection logic
- Added Qwen3-30B-A3B as critic, refiner, and offline fallback
- Added Perplexity API integration (Claude Sonnet 4.5 + Grok 4.1)
- Added 4 runtime modes (Mac-only, Dell-only, LAN-full, LM-Studio-MLX)
- Integrated with ultrathink-system and ECC-tools