SKILL.md — Perplexity-Tools Model Selection Skill

diazMelgarejo 2 Updated 4mo ago

Version: v0.9.7.0 (standardized from v0.9.0.0 onward) · Updated: 2026-03-28
Repo: https://github.com/diazMelgarejo/Perplexity-Tools · Branch: main

Layering (all interoperable and independently configurable):

| Layer | Repo | Role |
|---|---|---|
| Orchestrator & instance manager | Perplexity-Tools (this repo) | Top-level agent lifecycle, `ModelRegistry` / `config/*.yml`, FastAPI `/orchestrate`, idempotency |
| Reasoning & routing methodology | ultrathink-system | `single_agent/SKILL.md`, AFRP (pre-router gate) / CIDF / process; multi-agent registry is separately installable and not required to run this orchestrator |
| Subagent auto-selection (ECC-style) | ECC Tools | Default subagent routing unless the top-level orchestrator overrides roles |
| Karpathy AutoResearch sync | karpathy/autoresearch | Idempotent sync of the automated ML research loop; integrated via `/autoresearch/*` endpoints and `orchestrator/autoresearch_bridge.py` |

Selection order: Top-level model routing follows this SKILL.md and is resolved first through orchestrator/model_registry.py together with config/models.yml and config/routing.yml. Subagents use ECC-tools defaults unless the orchestrator overrides them. ultrathink-system remains the methodology layer for reasoning execution; it is not a hard dependency of the YAML registry.
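The selection order can be sketched as a precedence lookup. The helper name `resolve_model` and the flat role-to-model mappings are hypothetical stand-ins for whatever `orchestrator/model_registry.py` and ECC-tools actually expose; the real interfaces may differ.

```python
from typing import Mapping, Optional


def resolve_model(
    role: str,
    is_top_level: bool,
    registry: Mapping[str, str],      # hypothetical: role -> model, from config/models.yml
    ecc_defaults: Mapping[str, str],  # hypothetical: role -> model, ECC-tools defaults
    overrides: Optional[Mapping[str, str]] = None,  # subagent roles pinned by the orchestrator
) -> str:
    """Return the model for a role, following the documented selection order."""
    if is_top_level:
        # Top-level routing always comes from the YAML-backed registry.
        return registry[role]
    if overrides and role in overrides:
        # The top-level orchestrator may override a subagent role.
        return overrides[role]
    # Otherwise the subagent keeps its ECC-tools default.
    return ecc_defaults[role]
```

ultrathink-system does not appear in the lookup at all, which reflects that it shapes how reasoning runs rather than which model is picked.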

State Ownership & Redis Strategy

Canonical MVP wording: For MVP/v1.0, ultrathink remains stateless and has no Redis requirement. PT is the sole orchestration layer and owns agent instantiation, tracking, queueing, budget enforcement, and file-based runtime state. Redis-backed coordination is a future PT-only enhancement planned for multi-instance distributed deployments in v1.1 and above.

Rules:

Single PT instance or LAN MVP per machine: file-based state only (.state/agents.json, .state/budget.json)
No Redis mentions in ultrathink install/runtime requirements
Any future queue/cache/distributed lock support belongs to PT
Redis only activates when PT supports multi-instance distributed operation (v1.1+), not before
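These rules reduce to a single decision about where PT keeps its state. ultrathink never appears in that decision because it stays stateless. A minimal sketch, assuming a hypothetical `multi_instance` flag and a version comparison on parsed release tags (`parse_version` and `select_state_backend` are illustrative names):

```python
from __future__ import annotations

from pathlib import Path

# File-based MVP state locations named in the rules above.
STATE_FILES = {
    "agents": Path(".state/agents.json"),
    "budget": Path(".state/budget.json"),
}


def parse_version(tag: str) -> tuple[int, ...]:
    """Turn a tag such as "v0.9.7.0" into (0, 9, 7, 0) for ordered comparison."""
    return tuple(int(part) for part in tag.lstrip("v").split("."))


def select_state_backend(pt_version: tuple[int, ...], multi_instance: bool) -> str:
    """Redis only for multi-instance PT at v1.1 or later; file-based state otherwise."""
    if multi_instance and pt_version >= (1, 1):
        return "redis"
    return "file"
```

Both conditions must hold: a v1.1+ build running as a single instance still uses the files under `.state/`.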

Multi-Computer Orchestration (Hardware-Aware)

This orchestrator is designed to be aware of each machine's full hardware profile [web:40] across a distributed LAN. It adapts standard multi-agent orchestration strategies [web:23][web:25] (sequential, concurrent, and routing) to physical hardware constraints such as VRAM and RAM ceilings.

LAN Resume & Distributed Discovery

Automatic Resume (LAN Detect): On startup, the system scans the LAN for existing instances, using Redis keys (agent:registry:*) when Redis is enabled, or the local .state/agents.json otherwise. Per the Redis strategy above, Redis is only enabled in v1.1+ multi-instance deployments.
Session Continuity: Resume from the last known state, either by re-attaching to running agent processes or by resuming from the Short Persistence Log (.state/session.log).
Discovery Strategy: Attempt to connect to REDIS_HOST. If it is unreachable, fall back to the local state file for standalone operation.
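A sketch of the discovery and resume steps, with several assumptions. The probe only checks that the REDIS_HOST port accepts a TCP connection, whereas a real Redis client (for example redis-py's `Redis.ping()`) would confirm the service itself. The `.state/session.log` format is assumed to be JSON lines of the form `{"agent_id": ..., "status": ...}`. All function names are hypothetical.

```python
from __future__ import annotations

import json
import os
import socket
from pathlib import Path
from typing import Iterable


def redis_reachable(host: str, port: int = 6379, timeout: float = 0.5) -> bool:
    """Plain TCP probe of REDIS_HOST; it does not speak the Redis protocol."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, DNS failure, or no network at all
        return False


def discover_backend(distributed_enabled: bool) -> str:
    """Use Redis only when distributed mode is on and REDIS_HOST answers."""
    host = os.environ.get("REDIS_HOST")
    if distributed_enabled and host and redis_reachable(host):
        return "redis"
    return "file"  # standalone: .state/agents.json


def replay_session_log(lines: Iterable[str]) -> dict[str, str]:
    """Fold JSON-lines events into agent_id -> last known status (later events win)."""
    state: dict[str, str] = {}
    for line in lines:
        if line.strip():
            event = json.loads(line)
            state[event["agent_id"]] = event["status"]
    return state


def resume_state(log_path: Path = Path(".state/session.log")) -> dict[str, str]:
    """Rebuild the last known agent states; an absent log means a fresh start."""
    if not log_path.exists():
        return {}
    return replay_session_log(log_path.read_text(encoding="utf-8").splitlines())
```

Keeping the log append-only and replaying it on startup keeps writes cheap during a session, which suits a "short persistence" log.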

Spawn Reconciliation (Pre-Model Spawning)

Centralized Registry Check: Before spawning any agent, the orchestrator MUST check the AgentTracker (global registry) for an existing agent with the same role and task_hash.
Proper Session Planning: Reconcile spawns before model assignment to prevent dual-task GPU contention or redundant model loading.
Model Assignment: Once a spawn is reconciled, assign the model based on the hardware profile's VRAM/RAM ceiling (see hardware/SKILL.md).
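Below is a minimal sketch of the reconciliation step. It runs over a planned batch before any model is assigned. The document does not define how task_hash is computed, so the normalization in the hypothetical `task_hash` helper (collapse whitespace, lowercase, SHA-256) is an assumption. The set of running `(role, task_hash)` pairs stands in for an AgentTracker lookup.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SpawnRequest:
    role: str
    task: str


def task_hash(task: str) -> str:
    """Hash a whitespace- and case-normalized task description."""
    normalized = " ".join(task.split()).lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def reconcile(
    planned: list[SpawnRequest],
    running: set[tuple[str, str]],  # (role, task_hash) of agents already running
) -> tuple[list[SpawnRequest], list[SpawnRequest]]:
    """Split planned spawns into (to_spawn, reused) before model assignment."""
    to_spawn: list[SpawnRequest] = []
    reused: list[SpawnRequest] = []
    seen = set(running)
    for req in planned:
        key = (req.role, task_hash(req.task))
        if key in seen:
            # Already running, or already queued earlier in this batch.
            reused.append(req)
        else:
            seen.add(key)
            to_spawn.append(req)
    return to_spawn, reused
```

Only the `to_spawn` list proceeds to model assignment. Duplicates inside the same batch are caught too, which is where dual-task GPU contention would otherwise arise.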

Profile Deployment Logic

| Profile ID | Architecture | Core Specialization | Primary Use Case |
|---|---|---|---|
| `mac-studio` | Apple Silicon | Low-latency, Large RAM | Orchestration, Synthesis, Multi-step reasoning |
| `win-rtx3080` | x86-64 / CUDA | Parallel GPU compute | Heavy coding, Critic passes, ML experiments |

Hardware Profiles (Summary)

Refer to hardware/SKILL.md for full specs.

Profile A — mac-studio (16GB+ Unified Memory)

Primary: qwen3.5-9b-mlx-4bit (MLX)
Roles: Orchestrator, Manager, General, Synthesis.
VRAM: N/A (Unified).

Profile B — win-rtx3080 (10GB VRAM)

Primary: qwen3.5-35b-a3b-q4 (Ollama)
Roles: Coding, Autoresearch, Heavy Reasoning, Critic.
Constraints: 10GB hard ceiling; num_ctx <= 8192.
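The sketch below shows one way the two profiles, and the win-rtx3080 `num_ctx <= 8192` ceiling, could be enforced at placement time. It assumes hypothetical `Profile`/`place` names and lowercase role labels taken from the lists above. No context ceiling is documented for mac-studio, so none is applied there. Sending unknown roles to mac-studio mirrors the decision tree's default branch. `place` returns the profile's primary model; role-specific models such as the Qwen3-30B critic are chosen by the routing rules further down.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Profile:
    profile_id: str
    primary_model: str
    roles: frozenset[str]
    max_num_ctx: Optional[int] = None  # None = no documented ceiling


PROFILES = (
    Profile("mac-studio", "qwen3.5-9b-mlx-4bit",
            frozenset({"orchestrator", "manager", "general", "synthesis"})),
    Profile("win-rtx3080", "qwen3.5-35b-a3b-q4",
            frozenset({"coding", "autoresearch", "heavy_reasoning", "critic"}),
            max_num_ctx=8192),
)


def place(role: str, requested_ctx: int) -> tuple[str, str, int]:
    """Return (profile_id, primary_model, num_ctx), clamping num_ctx to the ceiling."""
    for p in PROFILES:
        if role in p.roles:
            ctx = requested_ctx if p.max_num_ctx is None else min(requested_ctx, p.max_num_ctx)
            return p.profile_id, p.primary_model, ctx
    # Unknown roles go to the general-purpose Mac profile.
    default = PROFILES[0]
    return default.profile_id, default.primary_model, requested_ctx
```

Clamping rather than rejecting keeps a request on its intended host while respecting the 10GB VRAM limit.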

Cloud Routing Rules (< $5/month budget)

Priority 1 — Orchestration (Claude Sonnet 4.5 Thinking via Perplexity)

```python
CLOUD_ORCHESTRATION = {
    "provider": "perplexity",
    "model": "anthropic/claude-sonnet-4.5-thinking",
    "trigger_conditions": [
        "strategic_decision == True",
        "reasoning_steps > 200",
        "multi_repo_coordination == True",
    ],
    "max_calls_per_day": 3,
    "max_tokens_per_call": 500,  # Keep prompts SHORT
    "estimated_cost_per_call": 0.05,
    "fallback": "qwen3-30b-critic",  # Dell local
}
```

Priority 2 — Finance & Real-Time Research (Grok 4.1 Thinking via Perplexity)

```python
CLOUD_RESEARCH = {
    "provider": "perplexity",
    "model": "xai/grok-4.1-thinking",
    "trigger_conditions": [
        "requires_recent_info == True",
        "is_finance_realtime == True",
        "query_date_range < 7_days",
    ],
    "max_calls_per_day": 2,
    "max_tokens_per_call": 1500,
    "estimated_cost_per_call": 0.03,
    "fallback": "qwen3-30b-critic",  # Dell local, no real-time data
}
```
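The trigger conditions above are stored as expression strings. One way to apply them without `eval` is to map each string to an explicit predicate, as sketched here. The sketch assumes a hypothetical `TaskFeatures` record (with `query_date_range < 7_days` expressed as `query_date_range_days < 7`). It also treats each rule's conditions as alternatives (any one suffices), which matches how the decision tree gives each condition its own branch. Research is checked first because the tree asks about real-time data before strategic reasoning. Call and spend limits are left to the Budget Guard.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TaskFeatures:
    # Hypothetical task attributes mirroring the trigger-condition names.
    strategic_decision: bool = False
    reasoning_steps: int = 0
    multi_repo_coordination: bool = False
    requires_recent_info: bool = False
    is_finance_realtime: bool = False
    query_date_range_days: Optional[int] = None


Predicate = Callable[[TaskFeatures], bool]

ORCHESTRATION_TRIGGERS: list[Predicate] = [
    lambda t: t.strategic_decision,
    lambda t: t.reasoning_steps > 200,
    lambda t: t.multi_repo_coordination,
]
RESEARCH_TRIGGERS: list[Predicate] = [
    lambda t: t.requires_recent_info,
    lambda t: t.is_finance_realtime,
    lambda t: t.query_date_range_days is not None and t.query_date_range_days < 7,
]


def cloud_candidate(t: TaskFeatures) -> Optional[str]:
    """Return the cloud model a task qualifies for, or None to stay local."""
    if any(p(t) for p in RESEARCH_TRIGGERS):
        return "xai/grok-4.1-thinking"
    if any(p(t) for p in ORCHESTRATION_TRIGGERS):
        return "anthropic/claude-sonnet-4.5-thinking"
    return None
```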

Budget Guard

```python
BUDGET_GUARD = {
    "max_daily_spend_usd": 0.17,  # ~$5/month / 30 days
    "max_daily_calls": 5,
    "redis_tracking": True,
    "hard_cutoff": True,  # NEVER exceed, always fall back
    "fallback_on_exceed": "qwen3-30b-critic",
}
```
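The daily cap works out as $5 / 30 ≈ $0.167, rounded up to 0.17, which permits up to $5.10 over 30 days. At the estimated per-call costs, the per-rule caps alone would allow 3 × $0.05 + 2 × $0.03 = $0.21 a day. The spend cap therefore limits a day to at most four cloud calls (for example 2 Grok + 2 Claude = $0.16), and the 5-call cap never binds. Below is a minimal in-memory sketch of the guard. Persistence (`.state/budget.json` in the MVP, Redis later) is deliberately left out, and `BudgetGuard`/`route` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BudgetGuard:
    """In-memory daily guard mirroring BUDGET_GUARD; persistence is omitted."""
    max_daily_spend_usd: float = 0.17
    max_daily_calls: int = 5
    fallback_on_exceed: str = "qwen3-30b-critic"
    # Per-rule caps from CLOUD_ORCHESTRATION / CLOUD_RESEARCH.
    per_model_limits: dict[str, int] = field(default_factory=lambda: {
        "anthropic/claude-sonnet-4.5-thinking": 3,
        "xai/grok-4.1-thinking": 2,
    })
    spent_usd: float = 0.0
    calls: int = 0
    per_model_calls: dict[str, int] = field(default_factory=dict)

    def route(self, model: str, estimated_cost: float) -> str:
        """Return `model` if the call fits every cap, else the local fallback (hard cutoff)."""
        used = self.per_model_calls.get(model, 0)
        fits = (
            used < self.per_model_limits.get(model, 0)
            and self.calls < self.max_daily_calls
            # 1e-9 absorbs float drift such as 0.05 + 0.05 + 0.05 != 0.15.
            and self.spent_usd + estimated_cost <= self.max_daily_spend_usd + 1e-9
        )
        if not fits:
            return self.fallback_on_exceed
        self.per_model_calls[model] = used + 1
        self.calls += 1
        self.spent_usd += estimated_cost
        return model

    def reset_day(self) -> None:
        self.spent_usd, self.calls = 0.0, 0
        self.per_model_calls.clear()
```

Spend is charged at the estimated cost before the call is made, so the guard stays conservative even if the real bill arrives later.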

Local Model Routing Decision Tree

```text
Task Received
│
├─ Privacy Critical?
│  └─ YES → ALWAYS local, skip cloud
│     ├─ Code task → win-rtx3080: qwen3.5-35b-a3b-q4
│     └─ Standard → mac-studio: qwen3.5-9b-mlx-4bit
│
├─ Budget exhausted OR Internet offline?
│  └─ YES → win-rtx3080: qwen3-30b-critic (FALLBACK)
│
├─ Real-time data needed (< 7 days)?
│  └─ YES + Budget OK → Perplexity: grok-4.1-thinking
│
├─ Strategic reasoning (> 200 steps)?
│  └─ YES + Budget OK → Perplexity: claude-sonnet-4.5-thinking
│
├─ Heavy code generation (> 500 lines)?
│  ├─ > 2000 lines → win-rtx3080: qwen3-30b-critic
│  └─ 500-2000 lines → win-rtx3080: qwen3.5-35b-a3b-q4
│
├─ Quick/interactive task?
│  └─ mac-studio: qwen3-8b-instruct (fastest)
│
└─ Default → mac-studio: qwen3.5-9b-mlx-4bit
```
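The tree reads as an ordered chain of checks in which the first match wins. The sketch below assumes a hypothetical `Task` record whose fields stand in for the question at each branch, and it returns `(host, model)` using the tree's own labels.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Task:
    privacy_critical: bool = False
    is_code: bool = False
    budget_ok: bool = True
    online: bool = True
    needs_realtime: bool = False  # data newer than 7 days
    reasoning_steps: int = 0
    code_lines: int = 0
    interactive: bool = False


def route(t: Task) -> tuple[str, str]:
    """Walk the decision tree top to bottom; return (host, model)."""
    if t.privacy_critical:
        # Never leaves the LAN, regardless of budget.
        if t.is_code:
            return "win-rtx3080", "qwen3.5-35b-a3b-q4"
        return "mac-studio", "qwen3.5-9b-mlx-4bit"
    if not (t.budget_ok and t.online):
        return "win-rtx3080", "qwen3-30b-critic"  # FALLBACK
    if t.needs_realtime:
        return "perplexity", "grok-4.1-thinking"
    if t.reasoning_steps > 200:
        return "perplexity", "claude-sonnet-4.5-thinking"
    if t.code_lines > 2000:
        return "win-rtx3080", "qwen3-30b-critic"
    if t.code_lines > 500:
        return "win-rtx3080", "qwen3.5-35b-a3b-q4"
    if t.interactive:
        return "mac-studio", "qwen3-8b-instruct"
    return "mac-studio", "qwen3.5-9b-mlx-4bit"
```

Read literally, the budget/offline branch sends every non-private task to the fallback, quick tasks included. If only cloud-bound tasks should fall back, that check would move into the two Perplexity branches instead.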

Critic & Refinement Pass (Qwen3-30B)

The qwen3:30b-a3b-instruct-q4_K_M model (referred to elsewhere as qwen3-30b-critic) on the Dell machine (the win-rtx3080 profile) serves as:

Default Offline Fallback — Replaces any cloud model when unreachable
Local Critic — Reviews batch agent outputs for quality (score 1-10)
Refiner — Improves sub-agent outputs before synthesis
Orchestration Fallback — Decomposes tasks when Claude is unavailable

```python
CRITIC_CONFIG = {
    "model": "qwen3:30b-a3b-instruct-q4_K_M",
    "endpoint": "http://192.168.1.100:11434",
    "temperature": 0.6,
    "max_tokens": 8192,
    "critic_prompt_template": """
    Review these results for quality, accuracy, completeness:
    {results}
    Provide:
    1. Quality score (1-10)
    2. Issues found
    3. Recommended improvements
    4. Verdict: APPROVE or NEEDS_REVISION
    """,
    "trigger": "always_after_batch_if_subtasks > 1",
}
```
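The following is a sketch of one critic round against the Dell endpoint. It assumes that endpoint is an Ollama server, since 11434 is Ollama's default port. The payload targets Ollama's `/api/generate` endpoint, where the output-token limit is the `num_predict` option rather than `max_tokens`. No HTTP call is made here. The score and verdict parsing is a heuristic assumption about the critic's free-text reply, not a documented format. All function names are hypothetical.

```python
from __future__ import annotations

import re

CRITIC_TEMPLATE = """
Review these results for quality, accuracy, completeness:
{results}
Provide:
1. Quality score (1-10)
2. Issues found
3. Recommended improvements
4. Verdict: APPROVE or NEEDS_REVISION
"""

# "Quality score (1-10): 8", "Score: 7/10", "score 10" -> the digit(s) after "score".
SCORE_RE = re.compile(r"score(?:\s*\(1-10\))?\s*[:=]?\s*(10|[1-9])\b", re.IGNORECASE)
# "Verdict: APPROVE", "Verdict: **NEEDS_REVISION**"
VERDICT_RE = re.compile(r"verdict\s*[:\-]?\s*\**\s*(APPROVE|NEEDS_REVISION)\b", re.IGNORECASE)


def should_run_critic(num_subtasks: int) -> bool:
    # Mirrors the "always_after_batch_if_subtasks > 1" trigger.
    return num_subtasks > 1


def build_ollama_request(results: str) -> dict:
    """JSON body for POST <endpoint>/api/generate, non-streaming."""
    return {
        "model": "qwen3:30b-a3b-instruct-q4_K_M",
        "prompt": CRITIC_TEMPLATE.format(results=results),
        "stream": False,
        "options": {"temperature": 0.6, "num_predict": 8192},
    }


def parse_verdict(reply: str) -> tuple[int | None, str | None]:
    """Extract (score, verdict); the last verdict wins if the model echoes the prompt."""
    score = SCORE_RE.search(reply)
    verdicts = VERDICT_RE.findall(reply)
    return (
        int(score.group(1)) if score else None,
        verdicts[-1].upper() if verdicts else None,
    )
```

When `parse_verdict` returns `None` for either field, treating the batch as NEEDS_REVISION is the conservative choice.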

Runtime Modes

Mode 1: Mac Only (Standalone)

```yaml
mode: mac_only
active_models:
  - qwen3.5-9b-mlx-4bit # primary
  - qwen3-8b-instruct # fallback
cloud_enabled: true # via Perplexity API
critic_pass: false # No Dell available
note: Reduced capability, no critic pass
```

Mode 2: Dell Only (Standalone)

```yaml
mode: dell_only
active_models:
  - qwen3.5-35b-a3b-q4 # primary coding
  - qwen3-30b-critic # critic + fallback
cloud_enabled: true
critic_pass: true
```

Mode 3: Mac + Dell LAN (Full Orchestration — RECOMMENDED)

```yaml
mode: lan_full
mac_endpoint: http://192.168.1.101:11434
dell_endpoint: http://192.168.1.100:11434
redis_broker: redis://192.168.1.100:6379
cloud_enabled: true
critic_pass: true
fallback_chain:
  - cloud_perplexity
  - dell_qwen3_30b
  - dell_qwen3_35b
  - mac_qwen3_9b
```
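The document does not say how a mode is chosen at startup. One plausible sketch assumes that availability flags for each host are already known, for example from a health probe against the endpoints above. It maps those flags onto the three modes and then walks Mode 3's `fallback_chain` in order. `select_mode` and `first_available` are hypothetical names.

```python
from __future__ import annotations

from typing import Callable, Iterable, Optional

LAN_FULL_FALLBACK_CHAIN = (
    "cloud_perplexity",
    "dell_qwen3_30b",
    "dell_qwen3_35b",
    "mac_qwen3_9b",
)


def select_mode(mac_up: bool, dell_up: bool) -> Optional[str]:
    """Map host availability onto the documented runtime modes."""
    if mac_up and dell_up:
        return "lan_full"
    if dell_up:
        return "dell_only"
    if mac_up:
        return "mac_only"
    return None  # no local host available


def first_available(chain: Iterable[str], is_available: Callable[[str], bool]) -> Optional[str]:
    """Return the first backend in fallback_chain order that is currently usable."""
    for backend in chain:
        if is_available(backend):
            return backend
    return None
```

The availability check is passed in as a callable, so the same walk works whether "available" means reachable, within budget, or both.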

Idempotent Orchestrator Rules

This repo (Perplexity-Tools) is the top-level orchestrator and instance manager:

Check before creating: Consult .state/agents.json via AgentTracker.
Reuse existing: Return conflict if matching running agent exists (override with force=true).
Conflict resolution: Ask user before overriding idempotency.
Destroy on completion: Mark agents stopped when tasks complete.
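The four rules can be sketched as a small file-backed tracker. This is not the repo's AgentTracker API. The class shape, the `role:task_hash` agent id, the `AgentConflict` exception, and the JSON layout of `.state/agents.json` are all assumptions made for illustration. Writes go through a temporary file and `os.replace`, so a crash cannot leave the state file half-written.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


class AgentConflict(Exception):
    """Raised when a matching agent is already running and force is not set."""


class AgentTracker:
    """Minimal file-backed registry (hypothetical API)."""

    def __init__(self, path: Path = Path(".state/agents.json")) -> None:
        self.path = path

    def _load(self) -> dict[str, dict]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _save(self, agents: dict[str, dict]) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        # Write to a temp file in the same directory, then atomically swap it in.
        fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(agents, fh, indent=2)
        os.replace(tmp, self.path)

    def spawn(self, role: str, task_hash: str, force: bool = False) -> str:
        """Check before creating; raise on a running match unless force=True."""
        agents = self._load()
        agent_id = f"{role}:{task_hash}"
        existing = agents.get(agent_id)
        if existing and existing["status"] == "running" and not force:
            raise AgentConflict(agent_id)
        agents[agent_id] = {"role": role, "task_hash": task_hash, "status": "running"}
        self._save(agents)
        return agent_id

    def complete(self, agent_id: str) -> None:
        """Mark the agent stopped when its task completes."""
        agents = self._load()
        if agent_id in agents:
            agents[agent_id]["status"] = "stopped"
            self._save(agents)
```

The conflict is raised rather than resolved inside the tracker. The caller can then ask the user, as the conflict-resolution rule requires, and retry with `force=True`.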

Changelog

v0.9.7.0 (2026-03-28)

AFRP cross-reference: ultrathink-system layer now documents AFRP (pre-router gate) in 4-layer architecture table [SYNC]
Fixes: orchestrator.py syntax errors, confidential folder references removed, FastAPI version aligned
Sync: Both repos synchronized to v0.9.7.0 [SYNC]

v0.9.6.0 (2026-03-27)

LAN Continuity: Implemented LAN Detect & Resume for seamless multi-computer operation.
Orchestration: Full hardware profile awareness implemented [web:40].
Pre-Flight Reconciliation: Added spawn detection and reconciliation before model spawning for efficiency.
Logging: Added Short Persistence Log (.state/session.log) for low-overhead session tracking.
Strategies: Adapted durable workflow and intelligent routing for multi-computer LAN.
Models: Updated primaries to Qwen 3.5 series (9B MLX on Mac, 35B MoE on Dell).
Hardening: Reinforced VRAM safety rules and hardware-bound routing.

v0.9.1.0 (2026-03-22)

Added SKILL.md with complete model selection logic
Added Qwen3-30B-A3B as critic, refiner, and offline fallback
Added Perplexity API integration (Claude Sonnet 4.5 + Grok 4.1)
Added 4 runtime modes (Mac-only, Dell-only, LAN-full, LM-Studio-MLX)
Integrated with ultrathink-system and ECC-tools
