Optimize any HuggingFace model to minimize energy consumption, CO₂ emissions, and inference cost using real GPU measurements and Mistral Large reasoning. Use this skill when asked to quantize a model, reduce its carbon footprint, benchmark deployment configurations (datacenter GPU, Jetson Orin, edge devices), or generate an optimization report with W&B experiment tracking.
Install
Install via the SkillsCat registry:

```bash
npx skillscat add wilfred-dore/ecostral/ecostral-optimizer
```
Ecostral Optimizer Skill
Ecostral is an AI agent powered by Mistral Large that autonomously finds the most
energy-efficient deployment configuration for any HuggingFace model. It runs a
closed optimization loop: propose → measure → log to W&B → repeat.
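The loop can be pictured as follows. This is a minimal sketch, not Ecostral's implementation. `propose` and `measure` are hypothetical stand-ins for the Mistral Large call and the GPU benchmark, and a Python list replaces the W&B log. The objective is collapsed to a single lower-is-better score. `patience` is assumed to mean "stop after this many consecutive iterations without improvement"; the config fields `max_iterations` and `patience` appear in Step 1.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Config = Dict[str, str]


@dataclass
class LoopResult:
    best_config: Config
    best_score: float
    history: List[dict] = field(default_factory=list)


def optimize(
    propose: Callable[[List[dict]], Config],
    measure: Callable[[Config], float],
    max_iterations: int = 5,
    patience: int = 3,
) -> LoopResult:
    """propose -> measure -> log -> repeat; lower score is better."""
    history: List[dict] = []          # stands in for the W&B run log
    best_config: Config = {}
    best_score = float("inf")
    stale = 0                         # consecutive iterations without improvement
    for step in range(max_iterations):
        config = propose(history)     # hypothetical: agent suggests the next config
        score = measure(config)       # hypothetical: benchmark it (e.g. CO2 per inference)
        history.append({"step": step, "config": config, "score": score})
        if score < best_score:
            best_config, best_score, stale = config, score, 0
        else:
            stale += 1
            if stale >= patience:     # early stop
                break
    return LoopResult(best_config, best_score, history)


# Toy run with made-up scores, only to exercise the loop.
candidates = [{"precision": p} for p in ("fp32", "bf16", "int8", "int4")]
toy_scores = {"fp32": 1.0, "bf16": 0.64, "int8": 0.7, "int4": 0.9}
result = optimize(
    propose=lambda hist: candidates[len(hist) % len(candidates)],
    measure=lambda cfg: toy_scores[cfg["precision"]],
    max_iterations=5,
    patience=2,
)
print(result.best_config)  # {'precision': 'bf16'}
```

In the toy run, two non-improving proposals (`int8`, `int4`) after `bf16` trigger the early stop, so only four of the five allowed iterations run.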
When to activate this skill
- User asks to "optimize", "quantize", or "compress" an LLM
- User wants to reduce CO₂, energy, or VRAM usage of a model
- User wants to deploy a model on edge hardware (Jetson Orin, embedded ARM, etc.)
- User wants W&B experiment tracking for model optimization runs
- User asks "which precision is best for model X on hardware Y?"
Prerequisites
```bash
# 1. Install dependencies
pip install -e /path/to/Ecostral
# or
uv pip install -r /path/to/Ecostral/requirements.txt

# 2. Ensure secrets.json exists with API keys
cp secrets.json.example secrets.json
# Required keys: mistral_api_key, wandb_api_key, huggingface_token
```

Step 1 — Create an input config
Create a JSON config file in `examples/input/`, using an existing one as a template:

```bash
cat examples/input/mistral_7b_demo.json
```

Minimal config for a new model:
```json
{
  "model_id": "mistralai/Mistral-7B-Instruct-v0.3",
  "param_count": "7B",
  "architecture": "transformer",
  "context_length": 32768,
  "deployment_target": "datacenter_gpu",
  "max_iterations": 5,
  "patience": 3,
  "accuracy_threshold": 0.9,
  "n_mmlu_samples": 100,
  "n_latency_runs": 20,
  "n_energy_runs": 10
}
```

Deployment targets: `datacenter_gpu` | `laptop_gpu` | `jetson_orin` | `cpu_only` | `smartphone` | `embedded_arm`
Fast test config (3 min): set `max_iterations: 3`, `n_mmlu_samples: 25`, `n_latency_runs: 10`, `n_energy_runs: 5`.
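The fast-test settings can also be applied programmatically. This is a minimal sketch that assumes a config shaped like the example above. `make_fast_config` is a hypothetical helper, not part of Ecostral. It checks `deployment_target` against the documented values and overwrites the four fast-test fields.

```python
import json
from pathlib import Path

# Valid values as listed in this skill's docs.
DEPLOYMENT_TARGETS = {
    "datacenter_gpu", "laptop_gpu", "jetson_orin",
    "cpu_only", "smartphone", "embedded_arm",
}

# Overrides from the "Fast test config" note.
FAST_TEST_OVERRIDES = {
    "max_iterations": 3,
    "n_mmlu_samples": 25,
    "n_latency_runs": 10,
    "n_energy_runs": 5,
}


def make_fast_config(template_path: Path, out_path: Path) -> dict:
    """Copy an input config, validate its target, and apply fast-test overrides."""
    config = json.loads(template_path.read_text())
    target = config.get("deployment_target")
    if target not in DEPLOYMENT_TARGETS:
        raise ValueError(f"unknown deployment_target: {target!r}")
    config.update(FAST_TEST_OVERRIDES)
    out_path.write_text(json.dumps(config, indent=2))
    return config
```

For example, `make_fast_config(Path("examples/input/mistral_7b_demo.json"), Path("examples/input/my_fast_test.json"))` writes a copy that could then be passed as `--input input/my_fast_test.json`. The name `my_fast_test.json` is a placeholder.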
Step 2 — Run the optimization
```bash
cd /path/to/Ecostral/examples

# Full run (~30 min for 7B, ~5 min for 1.1B)
python run_demo.py --input input/your_config.json

# Override W&B project name
python run_demo.py --input input/your_config.json --wandb-project my-custom-project

# Regenerate report from existing results (no GPU needed)
python run_demo.py --from-results output/model_slug/results.json
```

Output is written to `examples/output/<model-slug>/`:
```text
results.json             ← machine-readable metrics (accuracy, latency, energy, CO₂)
report.md                ← hybrid Markdown report (Python tables + Mistral prose + charts)
paper.pdf                ← 2-column LaTeX research paper
chart_comparison.png     ← baseline vs optimized bar chart
chart_trajectory.png     ← CO₂ and latency across iterations
chart_annual_impact.png
```

Step 3 — Read the results
```python
import json

# Path is relative to the Ecostral repository root.
with open("examples/output/mistral_7b_instruct_v0_3/results.json") as f:
    r = json.load(f)

print(r["best_config"]["precision"])        # e.g. "bf16"
print(r["best_config"]["technique"])        # e.g. "tensorrt"
print(r["gains"]["energy_pct"])             # e.g. -36.0 (negative = reduction)
print(r["gains"]["co2_pct"])                # e.g. -36.0
print(r["gains"]["memory_pct"])             # e.g. -50.0
print(r["gains"]["accuracy_retained_pct"])  # e.g. 100.0
print(r["annual_co2_saved_kg"])             # e.g. 4872.0
```

Step 4 — Interpret key metrics
| Metric | What it means |
|---|---|
| `accuracy_retained_pct` | Share of the FP32 baseline's MMLU accuracy that the optimized config keeps. 100% = no degradation. |
| `energy_pct` | Signed % change in energy per inference vs the FP32 baseline. Negative = saving. |
| `memory_pct` | Signed % change in VRAM usage vs the FP32 baseline. Negative = reduction. Critical for edge deployment. |
| `annual_co2_saved_kg` | CO₂ saved (kg) over one year at 1M inferences/day. |
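The arithmetic behind these metrics can be sketched as below. The sketch assumes that percentages are signed changes relative to the FP32 baseline, which is consistent with the negative example values in Step 3. It also assumes that `annual_co2_saved_kg` is the per-inference CO₂ saving scaled to 1M inferences/day for 365 days. The function names and input numbers are illustrative, not Ecostral's code or measurements.

```python
def pct_change(baseline: float, optimized: float) -> float:
    """Signed % change vs baseline; negative means the optimized config uses less."""
    return (optimized - baseline) / baseline * 100.0


def accuracy_retained_pct(baseline_acc: float, optimized_acc: float) -> float:
    """Optimized accuracy as a % of baseline accuracy (100 = no degradation)."""
    return optimized_acc / baseline_acc * 100.0


def annual_co2_saved_kg(
    baseline_co2_kg: float,
    optimized_co2_kg: float,
    inferences_per_day: int = 1_000_000,
    days: int = 365,
) -> float:
    """CO2 saved over a year, given per-inference CO2 (kg) for each config."""
    return (baseline_co2_kg - optimized_co2_kg) * inferences_per_day * days


# Made-up inputs, chosen only to illustrate the formulas.
print(round(pct_change(50.0, 32.0), 2))                   # -36.0
print(accuracy_retained_pct(0.60, 0.60))                  # 100.0
print(round(annual_co2_saved_kg(2.0e-5, 1.28e-5), 1))     # 2628.0
```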
Edge deployment note
For `deployment_target: jetson_orin` (or other edge targets), all metrics are proxy measurements taken on an H100. Accuracy and VRAM savings transfer directly to the device, but H100 latency and energy figures do NOT reflect real device performance. Treat the VRAM reduction as the primary result.
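Weight memory is roughly parameter count × bytes per parameter, regardless of which GPU holds it, and this helps explain why VRAM savings carry over. The back-of-the-envelope sketch below counts weights only, so KV cache, activations, and runtime overhead are ignored and real usage is higher. The byte sizes are standard storage widths, and `weight_memory_gb` is a hypothetical helper. For a 7B model, bf16 vs FP32 gives -50%, which matches the `memory_pct` example in Step 3.

```python
# Storage per parameter for common precisions (weights only).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_memory_gb(n_params: float, precision: str) -> float:
    """Memory needed just to hold the weights, in GB (1 GB = 1e9 bytes)."""
    return n_params * BYTES_PER_PARAM[precision] / 1e9


def memory_pct(n_params: float, precision: str, baseline: str = "fp32") -> float:
    """Signed % change in weight memory vs the baseline precision."""
    base = weight_memory_gb(n_params, baseline)
    return (weight_memory_gb(n_params, precision) - base) / base * 100.0


print(weight_memory_gb(7e9, "fp32"))  # 28.0
print(weight_memory_gb(7e9, "bf16"))  # 14.0
print(memory_pct(7e9, "bf16"))        # -50.0
print(memory_pct(7e9, "int4"))        # -87.5
```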
W&B experiment tracking
Each run logs to a dedicated W&B project:
- Project: `optimization-<model-slug>` (e.g. `optimization-mistral-7b-instruct-v0-3`)
- Group: `run-<date>` (e.g. `run-2026-03-01`)
- Metrics: `accuracy`, `latency_ms`, `energy_kwh`, `co2_kg`, `total_cost_eur`, `memory_usage_gb`
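The naming pattern can be reproduced as follows. This is a sketch only. The slug rule is inferred from the example names and has not been confirmed against Ecostral's source. It drops the organization prefix, lowercases the name, and collapses runs of non-alphanumeric characters into a separator. `model_slug` and `wandb_names` are hypothetical helpers.

```python
import re
from datetime import date
from typing import Tuple


def model_slug(model_id: str, sep: str = "-") -> str:
    """Drop the org prefix, lowercase, and collapse non-alphanumerics into `sep`."""
    name = model_id.split("/")[-1].lower()
    return re.sub(r"[^a-z0-9]+", sep, name).strip(sep)


def wandb_names(model_id: str, run_date: date) -> Tuple[str, str]:
    """Return (project, group) following the optimization-/run- pattern."""
    return f"optimization-{model_slug(model_id)}", f"run-{run_date.isoformat()}"


project, group = wandb_names("mistralai/Mistral-7B-Instruct-v0.3", date(2026, 3, 1))
print(project)  # optimization-mistral-7b-instruct-v0-3
print(group)    # run-2026-03-01
```

With `sep="_"`, the same rule yields `mistral_7b_instruct_v0_3`, which is the output directory name used in Step 3.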
To query past runs programmatically, use the wandb-experiment-memory skill.
Combine with other skills
After running Ecostral, chain with:
- `hugging-face-evaluation` (HF official skill) → push accuracy results to the HF model card
- `wandb-experiment-memory` (this repo) → query past runs, suggest next iteration
- `hugging-face-model-trainer` (HF official skill) → fine-tune based on Ecostral's recommended config