wilfred-dore

ecostral-optimizer

Optimize any HuggingFace model to minimize energy consumption, CO₂ emissions, and inference cost using real GPU measurements and Mistral Large reasoning. Use this skill when asked to quantize a model, reduce its carbon footprint, benchmark deployment configurations (datacenter GPU, Jetson Orin, edge devices), or generate an optimization report with W&B experiment tracking.


Install

npx skillscat add wilfred-dore/ecostral/ecostral-optimizer

Install via the SkillsCat registry.

SKILL.md

Ecostral Optimizer Skill

Ecostral is an AI agent powered by Mistral Large that autonomously finds the most
energy-efficient deployment configuration for any HuggingFace model. It runs a
closed optimization loop: propose → measure → log to W&B → repeat.

When to activate this skill

  • User asks to "optimize", "quantize", or "compress" an LLM
  • User wants to reduce CO₂, energy, or VRAM usage of a model
  • User wants to deploy a model on edge hardware (Jetson Orin, embedded ARM, etc.)
  • User wants W&B experiment tracking for model optimization runs
  • User asks "which precision is best for model X on hardware Y?"

Prerequisites

# 1. Install dependencies
pip install -e /path/to/Ecostral
# or
uv pip install -r /path/to/Ecostral/requirements.txt

# 2. Ensure secrets.json exists with API keys
cp secrets.json.example secrets.json
# Required keys: mistral_api_key, wandb_api_key, huggingface_token
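Before launching a long run, it can help to confirm that secrets.json actually contains the three required keys. The following is a minimal sketch, not part of Ecostral itself: the helper names are hypothetical, and it assumes secrets.json is a flat JSON object keyed by the names listed above.

```python
import json
from pathlib import Path

# Key names come from the prerequisites above.
# ASSUMPTION: secrets.json is a flat JSON object; the real layout may differ.
REQUIRED_KEYS = ("mistral_api_key", "wandb_api_key", "huggingface_token")


def missing_secret_keys(secrets: dict) -> list[str]:
    """Return the required keys that are absent or empty in the parsed secrets."""
    return [k for k in REQUIRED_KEYS if not str(secrets.get(k) or "").strip()]


def check_secrets_file(path: str = "secrets.json") -> list[str]:
    """Load a secrets file and report which required keys still need a value."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    return missing_secret_keys(data)


if __name__ == "__main__":
    if Path("secrets.json").exists():
        print("missing keys:", check_secrets_file() or "none")
    else:
        print("secrets.json not found; copy secrets.json.example first")
```

An empty list means all three keys are present and non-empty; it does not verify that the keys are valid.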

Step 1 — Create an input config

Create a JSON config file in examples/input/. Use an existing one as a template:

cat examples/input/mistral_7b_demo.json

Minimal config for a new model:

{
  "model_id": "mistralai/Mistral-7B-Instruct-v0.3",
  "param_count": "7B",
  "architecture": "transformer",
  "context_length": 32768,
  "deployment_target": "datacenter_gpu",
  "max_iterations": 5,
  "patience": 3,
  "accuracy_threshold": 0.9,
  "n_mmlu_samples": 100,
  "n_latency_runs": 20,
  "n_energy_runs": 10
}

Deployment targets: datacenter_gpu | laptop_gpu | jetson_orin | cpu_only | smartphone | embedded_arm

Fast test config (about 3 min): set max_iterations: 3, n_mmlu_samples: 25, n_latency_runs: 10, n_energy_runs: 5, and leave the other fields unchanged.
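One way to derive the fast test config from an existing one is sketched below. The override values are the ones listed above; the helper names and the output filename in the usage note are hypothetical and not part of Ecostral.

```python
import json
from pathlib import Path

# Override values copied from the fast-test note above.
FAST_OVERRIDES = {
    "max_iterations": 3,
    "n_mmlu_samples": 25,
    "n_latency_runs": 10,
    "n_energy_runs": 5,
}


def make_fast_config(base: dict) -> dict:
    """Return a copy of `base` with the fast-test overrides applied."""
    return {**base, **FAST_OVERRIDES}


def write_fast_config(src: str, dst: str) -> dict:
    """Read the config at `src`, apply the overrides, and write it to `dst`."""
    base = json.loads(Path(src).read_text(encoding="utf-8"))
    fast = make_fast_config(base)
    Path(dst).write_text(json.dumps(fast, indent=2) + "\n", encoding="utf-8")
    return fast
```

For example, write_fast_config("examples/input/mistral_7b_demo.json", "examples/input/mistral_7b_fast.json") would produce a config that can be passed to run_demo.py with --input. The second filename is only an illustration.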

Step 2 — Run the optimization

cd /path/to/Ecostral/examples

# Full run (~30 min for 7B, ~5 min for 1.1B)
python run_demo.py --input input/your_config.json

# Override W&B project name
python run_demo.py --input input/your_config.json --wandb-project my-custom-project

# Regenerate report from existing results (no GPU needed)
python run_demo.py --from-results output/model_slug/results.json

Output is written to examples/output/<model-slug>/:

results.json          ← machine-readable metrics (accuracy, latency, energy, CO₂)
report.md             ← hybrid Markdown report (Python tables + Mistral prose + charts)
paper.pdf             ← 2-column LaTeX research paper
chart_comparison.png  ← baseline vs optimized bar chart
chart_trajectory.png  ← CO₂ and latency across iterations
chart_annual_impact.png
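A quick completeness check on the output directory can catch a run that stopped partway. This is a minimal sketch using the file names listed above; the function name is hypothetical.

```python
from pathlib import Path

# File names taken from the output listing above.
EXPECTED_OUTPUTS = (
    "results.json",
    "report.md",
    "paper.pdf",
    "chart_comparison.png",
    "chart_trajectory.png",
    "chart_annual_impact.png",
)


def missing_outputs(output_dir: str) -> list[str]:
    """Return the expected output files that are not present in `output_dir`."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]
```

An empty list means every expected file exists; the check says nothing about whether the contents are valid.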

Step 3 — Read the results

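import json

# Path is relative to the repository root; drop the "examples/" prefix
# if you are still inside examples/ after Step 2.
with open("examples/output/mistral_7b_instruct_v0_3/results.json") as f:
    r = json.load(f)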

print(r["best_config"]["precision"])          # e.g. "bf16"
print(r["best_config"]["technique"])          # e.g. "tensorrt"
print(r["gains"]["energy_pct"])               # e.g. -36.0  (% reduction)
print(r["gains"]["co2_pct"])                  # e.g. -36.0
print(r["gains"]["memory_pct"])               # e.g. -50.0
print(r["gains"]["accuracy_retained_pct"])    # e.g. 100.0
print(r["annual_co2_saved_kg"])               # e.g. 4872.0

Step 4 — Interpret key metrics

Metric                 What it means
accuracy_retained_pct  Share of the FP32 baseline's MMLU accuracy that is kept. 100% = no degradation.
energy_pct             Percent change in energy per inference vs the FP32 baseline. Negative = saving.
memory_pct             Percent change in VRAM usage vs the baseline. Negative = reduction. Critical for edge deployment.
annual_co2_saved_kg    CO₂ saved over 1 year, in kg, at 1M inferences/day.
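The arithmetic behind these metrics can be sketched as follows. This is an illustration of the definitions above, not Ecostral's actual implementation: the function names are hypothetical, the sample numbers are made up, and the exact formula Ecostral uses for the annual figure is an assumption based on the "1M inferences/day for 1 year" description.

```python
def pct_change(baseline: float, optimized: float) -> float:
    """Signed percent change vs the baseline; negative means the optimized value is lower."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (optimized - baseline) * 100.0 / baseline


def annual_co2_saved_kg(
    baseline_kg_per_inference: float,
    optimized_kg_per_inference: float,
    inferences_per_day: int = 1_000_000,
    days: int = 365,
) -> float:
    """CO2 saved per year in kg, ASSUMING a constant daily inference volume."""
    saved_per_inference = baseline_kg_per_inference - optimized_kg_per_inference
    return saved_per_inference * inferences_per_day * days


# Hypothetical measurements, for illustration only.
energy_pct = pct_change(baseline=100.0, optimized=64.0)   # a 36% saving, reported as a negative number
yearly = annual_co2_saved_kg(2e-5, 1e-5)                   # roughly 3650 kg under these made-up inputs
print(round(energy_pct, 1), round(yearly, 1))
```

Under this sign convention, a negative energy_pct, co2_pct, or memory_pct is an improvement, while annual_co2_saved_kg is positive when the optimized configuration emits less.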

Edge deployment note

For deployment_target: jetson_orin (or other edge targets), all measurements are
taken on an H100 as a proxy for the device. Accuracy and VRAM savings transfer
directly to the device. Latency and energy measured on the H100 do NOT reflect
real device performance, so read VRAM reduction as the primary result.
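When post-processing results.json, the note above can be applied by separating the gains that transfer to the device from those that are only proxy figures. The sketch below is hypothetical: the note names only jetson_orin explicitly, so the set of targets treated as "edge" is an assumption, as is treating co2_pct as proxy-only on the grounds that it is derived from energy.

```python
# ASSUMPTION: only jetson_orin is named in the note; the other two entries are a guess.
EDGE_TARGETS = {"jetson_orin", "smartphone", "embedded_arm"}

# Gains that the note says transfer directly from the H100 proxy to the device.
TRANSFERABLE = ("accuracy_retained_pct", "memory_pct")


def split_gains(gains: dict, deployment_target: str) -> tuple[dict, dict]:
    """Return (trusted, proxy_only) views of the gains for a deployment target."""
    if deployment_target not in EDGE_TARGETS:
        # Measured on the target class of hardware: everything is trusted.
        return dict(gains), {}
    trusted = {k: v for k, v in gains.items() if k in TRANSFERABLE}
    proxy_only = {k: v for k, v in gains.items() if k not in TRANSFERABLE}
    return trusted, proxy_only
```

With the r dictionary from Step 3, split_gains(r["gains"], "jetson_orin") would leave accuracy_retained_pct and memory_pct in the trusted part and move the energy and CO₂ figures to the proxy-only part.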

W&B experiment tracking

Each run logs to a dedicated W&B project:

  • Project: optimization-<model-slug> (e.g. optimization-mistral-7b-instruct-v0-3)
  • Group: run-<date> (e.g. run-2026-03-01)
  • Metrics: accuracy, latency_ms, energy_kwh, co2_kg, total_cost_eur, memory_usage_gb
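To find the right project and group in the W&B UI, the naming pattern above can be reproduced with a small helper. This is a sketch inferred from the two examples shown (optimization-mistral-7b-instruct-v0-3 and run-2026-03-01); the slug rule and the function names are assumptions, and Ecostral's own code may handle edge cases differently.

```python
import re
from datetime import date


def slugify(model_id: str, sep: str = "-") -> str:
    """Drop the org prefix, lowercase, and collapse non-alphanumeric runs into `sep`.

    ASSUMPTION: this rule is inferred from the examples in this document.
    """
    name = model_id.split("/")[-1].lower()
    return re.sub(r"[^a-z0-9]+", sep, name).strip(sep)


def wandb_project(model_id: str) -> str:
    """Project name following the optimization-<model-slug> pattern."""
    return f"optimization-{slugify(model_id)}"


def wandb_group(day: date) -> str:
    """Group name following the run-<date> pattern, with an ISO date."""
    return f"run-{day.isoformat()}"


print(wandb_project("mistralai/Mistral-7B-Instruct-v0.3"))
print(wandb_group(date(2026, 3, 1)))
```

Passing sep="_" to slugify yields the underscore form used for the output directory in Step 3 (mistral_7b_instruct_v0_3).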

To query past runs programmatically, use the wandb-experiment-memory skill.

Combine with other skills

After running Ecostral, chain with:

  • hugging-face-evaluation (HF official skill) → push accuracy results to HF model card
  • wandb-experiment-memory (this repo) → query past runs, suggest next iteration
  • hugging-face-model-trainer (HF official skill) → fine-tune based on Ecostral's recommended config