"$0 test-time scaling with NVIDIA NIM free tier. Smart cascade routes questions to the best free model based on measured capability profiles, escalating only on uncertainty. 15 models, 7 capability categories, data-driven panels. Use for auditing, code review, fact-checking, compliance, or any judgment task."
Install
Install via the SkillsCat registry:

```bash
npx skillscat add isotrivial/free-scaling
```
NIM Ensemble
$0 multi-model reasoning using the NVIDIA NIM free tier. Two modes:
- `smart_vote()`: cascade. Routes each question to the best expert for its task type and escalates only on uncertainty. Averages 1.2 API calls per question.
- `vote()`: flat ensemble. Asks N models and takes a majority vote. Simple, but uses more calls.
Setup
- Go to build.nvidia.com and sign in (free NVIDIA account)
- Pick any model (e.g. Llama 3.3 70B) and click "Get API Key"
- One key works for all NIM models — no per-model setup needed
- Set it in your environment:

  ```bash
  export NVIDIA_API_KEY="nvapi-..."
  ```

- No pip dependencies: stdlib only (Python 3.10+)
Quick Start
```python
from nim_ensemble import scale

# k=3: ask 3 diverse models, majority vote
result = scale("Is eval(input()) safe?", k=3, answer_patterns=["SAFE", "VULNERABLE"])
print(result.answer)      # VULNERABLE
print(result.calls_made)  # 3
print(result.confidence)  # 1.0

# k="auto": smart cascade (starts with 1, escalates on uncertainty)
result = scale("Is this correct?", k="auto", answer_patterns=["YES", "NO"])
```

CLI
```bash
# Scale to k models
python3 -m nim_ensemble.cli scale "Is this code vulnerable?" -k 3 --answers "SAFE,VULNERABLE"
# → VULNERABLE (k=3, conf=100%, 3 calls, 1.2s)

# Single model (fastest)
python3 -m nim_ensemble.cli scale "Is 91 prime?" -k 1 --answers "YES,NO"

# Auto-scale (smart cascade)
python3 -m nim_ensemble.cli scale "Is this safe?" -k auto

# List models and panels
python3 -m nim_ensemble.cli models
python3 -m nim_ensemble.cli panels

# Benchmark all models on a question
python3 -m nim_ensemble.cli bench "Is 91 prime? YES or NO." --speed fast
```

How Smart Cascade Works
```text
Question → classify task type (code/compliance/reasoning/factual/nuance)
  → call best expert for that type (1 call)
  → confident? (weight ≥ 85%) → done
  → uncertain? → call arbiter (mistral-large, 100% accuracy in all profiled categories)
  → still split? → full panel, weighted vote by measured accuracy
```

Most questions resolve at stage 1. Models that hallucinate on a given task type are not routed those questions, because the capability map steers around their blind spots.
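The three-stage flow above can be sketched as follows. This is a minimal illustration, not the package's `cascade.py`: the keyword classifier, the `ACCURACY` table, and the stub models are hypothetical stand-ins for real task detection, a profiled capability map, and NIM calls. It also interprets the "weight ≥ 85%" test as the expert's measured accuracy for the task type, and "still split" as the expert and arbiter disagreeing; both are assumptions.

```python
from collections import defaultdict
from typing import Callable

Model = Callable[[str], str]

# Hypothetical measured accuracy per model and task type (a capability map would supply this).
ACCURACY: dict[str, dict[str, float]] = {
    "expert-code": {"code": 0.95, "factual": 0.70},
    "expert-fact": {"code": 0.60, "factual": 0.80},
    "arbiter": {"code": 1.00, "factual": 1.00},
}


def classify(question: str) -> str:
    # Toy keyword classifier standing in for real task-type detection.
    return "code" if "eval(" in question or "def " in question else "factual"


def best_expert(task: str) -> str:
    # Highest measured accuracy for this task type, excluding the arbiter.
    scores = {m: acc[task] for m, acc in ACCURACY.items() if m != "arbiter"}
    return max(scores, key=scores.get)


def smart_cascade(question: str, models: dict[str, Model], threshold: float = 0.85) -> tuple[str, int]:
    task = classify(question)
    expert = best_expert(task)
    answers = {expert: models[expert](question)}
    # Stage 1: accept the expert's answer if its measured accuracy clears the threshold.
    if ACCURACY[expert][task] >= threshold:
        return answers[expert], 1
    # Stage 2: ask the arbiter; if it agrees with the expert, stop.
    answers["arbiter"] = models["arbiter"](question)
    if answers["arbiter"] == answers[expert]:
        return answers[expert], 2
    # Stage 3: still split, so poll the remaining models and weight each vote by accuracy.
    for name, model in models.items():
        if name not in answers:
            answers[name] = model(question)
    totals: dict[str, float] = defaultdict(float)
    for name, answer in answers.items():
        totals[answer] += ACCURACY[name][task]
    # One call per collected answer.
    return max(totals, key=totals.get), len(answers)


def stub(replies: dict[str, str]) -> Model:
    # Fake model that answers by task type instead of calling an API.
    return lambda question: replies[classify(question)]


models = {
    "expert-code": stub({"code": "VULNERABLE", "factual": "NO"}),
    "expert-fact": stub({"code": "SAFE", "factual": "YES"}),
    "arbiter": stub({"code": "VULNERABLE", "factual": "NO"}),
}
print(smart_cascade("Is eval(input()) safe?", models))  # ('VULNERABLE', 1)
print(smart_cascade("Is 91 prime?", models))            # ('NO', 3)
```

The first question stops at stage 1 because the code expert clears the threshold; the second escalates to the full panel, where the arbiter and the code expert outweigh the factual expert.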
Capability Profiling
Capability scores are not hardcoded; profile the models on your own tasks:
```bash
# Profile specific models (3 trials each)
python3 -m nim_ensemble.capability_map --models llama-3.3 qwen-80b mistral-large --trials 3

# Profile all fast models
python3 -m nim_ensemble.capability_map --speed fast --trials 2
```

This generates `capability_map.json` with per-model accuracy, latency, strengths/weaknesses, and error correlations. The cascade loads it automatically for data-driven routing.
Without profiling, the cascade falls back to defaults (mistral-large as arbiter, diverse 3-model panels). Profiling lets it route around each model's specific blind spots on your tasks.
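The capability map records error correlations, but the formula `capability_map.py` uses is not stated here. One common measure, sketched below as an assumption, is the Pearson correlation of per-question error indicators (the phi coefficient). The model names and results are made up, and returning 0.0 for a model with no error variance is a convention chosen for this sketch.

```python
from itertools import combinations
from math import sqrt


def accuracy(correct: list[bool]) -> float:
    return sum(correct) / len(correct)


def error_correlation(a: list[bool], b: list[bool]) -> float:
    """Pearson correlation of two models' error indicators (phi coefficient)."""
    ea = [0.0 if ok else 1.0 for ok in a]
    eb = [0.0 if ok else 1.0 for ok in b]
    n = len(ea)
    ma, mb = sum(ea) / n, sum(eb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ea, eb)) / n
    va = sum((x - ma) ** 2 for x in ea) / n
    vb = sum((y - mb) ** 2 for y in eb) / n
    if va == 0 or vb == 0:
        return 0.0  # convention: a model that never (or always) errs has nothing to correlate
    return cov / sqrt(va * vb)


# Hypothetical per-question correctness from a profiling run (True = right answer).
results = {
    "model-a": [True, True, False, True, False, True],
    "model-b": [True, True, False, True, False, True],  # fails on the same questions as model-a
    "model-c": [True, False, True, True, True, False],  # fails on different questions
}
for name, correct in results.items():
    print(name, round(accuracy(correct), 2))
for m1, m2 in combinations(results, 2):
    print(m1, m2, round(error_correlation(results[m1], results[m2]), 2))
```

All three models score the same accuracy, but model-a and model-b correlate at 1.0 (they share every error) while each pairs with model-c at -0.5. For voting, the negatively correlated pair is the more useful one.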
Default Panels
Panels maximize architectural diversity, so that models from different families tend to make independent errors that cancel out in the vote:
| Panel | Models | Use Case |
|---|---|---|
| general | mistral-large, llama-3.3, qwen-80b | Default (3 families) |
| fast | llama-3.3, mistral-nemotron, gemma-27b | All < 1.5 s |
| max | 5 models across 5 families | High-stakes |
| arbiter | mistral-large | Single tiebreaker |
For task-specific panels, run capability_map to profile models on your data.
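Why independent errors cancel out can be made concrete with the classic majority-vote (Condorcet jury) calculation. It rests on idealized assumptions that real models only approximate: a two-option question, k voters with the same accuracy p, and fully independent errors. Under those assumptions a strict majority is correct with the binomial tail probability computed below; `majority_accuracy` is a hypothetical helper, not part of the package.

```python
from math import comb


def majority_accuracy(p: float, k: int) -> float:
    """P(strict majority correct) for k independent voters, each correct with probability p."""
    if k % 2 == 0:
        raise ValueError("use an odd k so a two-option vote cannot tie")
    need = k // 2 + 1
    return sum(comb(k, i) * p**i * (1 - p) ** (k - i) for i in range(need, k + 1))


for k in (1, 3, 5):
    print(k, round(majority_accuracy(0.8, k), 3))
```

With p = 0.8, three voters reach 0.896 and five reach about 0.942. Correlated errors (for example, two models from the same family failing on the same questions) shrink this gain, which is the reason panels mix families.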
Python API
```python
from nim_ensemble import scale, smart_vote, vote, call_model

# scale(): the core API; control k
result = scale("Is X safe?", k=3, answer_patterns=["SAFE", "VULNERABLE"])
result = scale("Is X safe?", k="auto")  # smart cascade
result = scale("Is X safe?", k=1)       # single model

# smart_vote(): cascade with task-type routing
result = smart_vote("Is X correct?", answer_patterns=["YES", "NO"])

# vote(): flat ensemble with named panels
result = vote("Is X true?", panel="general", answer_patterns=["YES", "NO"])

# call_model(): single model, raw access
answer, raw = call_model("Is X true?", "mistral-large")
```

Prompt Tips
For best results with ensemble voting:
- Ask for the answer on the first line: "Answer YES or NO on the first line, then explain."
- Give explicit answer options: "Answer SAFE, UNSAFE, or NEEDS_REVIEW."
- Include context/evidence in the question, not just the judgment call.
Architecture
```text
nim_ensemble/
├── __init__.py          # Public API: scale, smart_vote, vote, call_model
├── cascade.py           # Smart cascade with capability routing
├── voter.py             # Flat ensemble voting engine
├── models.py            # Model registry + data-driven panels
├── parser.py            # Answer extraction (thinking models, word boundaries)
├── cli.py               # CLI interface
├── benchmark.py         # Single-trial model profiling
└── capability_map.py    # Multi-trial profiling with error correlation
```

Requirements
- `NVIDIA_API_KEY` environment variable (free at build.nvidia.com)
- Python 3.10+
- No pip dependencies (stdlib only; uses `urllib` for API calls)