wandb-experiment-memory

Query Weights & Biases experiment history to retrieve past optimization runs, compare configurations, and inform the next optimization decision. Use this skill when asked to review past AI optimization experiments, find the best configuration tried so far, understand what has already been explored, or suggest the next experiment to run based on W&B history.

wilfred-dore 0 Updated 4mo ago


Install

npx skillscat add wilfred-dore/ecostral/wandb-experiment-memory

Install via the SkillsCat registry.

SKILL.md

W&B Experiment Memory Skill

This skill enables a coding agent to act as an informed optimizer by reading
past experiment results from Weights & Biases before proposing the next action.
It closes the self-improvement loop: measure → log → remember → improve.

When to activate this skill

User asks "what configurations have we already tried for model X?"
User asks "what was the best result so far for this model?"
User asks "what should we try next?"
Before running a new Ecostral optimization (to avoid repeating past experiments)
After an optimization run (to confirm W&B logging succeeded)

Prerequisites

pip install wandb
# W&B API key must be set in secrets.json or as WANDB_API_KEY env var
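The key lookup described in the comment above could be sketched as follows. This is a minimal sketch, not Ecostral's own loader: the helper name `load_wandb_api_key`, the `wandb_api_key` field name inside secrets.json, and the choice to check the environment variable first are all assumptions.

```python
import json
import os
from pathlib import Path


def load_wandb_api_key(secrets_path="secrets.json", env=None):
    """Return the W&B API key from the environment or a secrets file.

    Hypothetical helper: the "wandb_api_key" field name is an assumption.
    """
    env = os.environ if env is None else env

    # 1) Prefer the WANDB_API_KEY environment variable when it is set.
    key = env.get("WANDB_API_KEY")
    if key:
        return key

    # 2) Fall back to the secrets file, if it exists.
    path = Path(secrets_path)
    if path.is_file():
        secrets = json.loads(path.read_text(encoding="utf-8"))
        key = secrets.get("wandb_api_key")  # assumed field name
        if key:
            return key

    raise RuntimeError(
        f"No W&B API key found in WANDB_API_KEY or {secrets_path}"
    )
```

Failing with an explicit error when neither source provides a key keeps a missing credential from surfacing later as an opaque authentication failure.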

Step 1 — Query past runs for a model

from ecostral.memory.wandb_mcp import WandbMemory

memory = WandbMemory(
    api_key="<wandb_api_key>",   # from secrets.json or the WANDB_API_KEY env var
    entity="wdore-personal",     # example entity; replace with your own W&B entity
)

# Fetch recent runs for a specific model
# Project name pattern: optimization-<model-slug>
# e.g. optimization-mistral-7b-instruct-v0-3
runs = memory.get_recent_runs(
    project="optimization-mistral-7b-instruct-v0-3",
    n=20,
)

for run in runs:
    print(run["precision"], run["accuracy"], run["co2_kg"], run["latency_ms"])
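The source does not show how `get_recent_runs` builds its dictionaries. As a rough illustration of what such a wrapper might do with the public `wandb.Api`, the sketch below flattens run objects into dicts with the keys used in this skill. `flatten_run` and `fetch_recent_runs` are hypothetical names. The sketch also assumes that precision and technique live in the run config and that the metrics live in the run summary, which the source does not confirm.

```python
from itertools import islice
from types import SimpleNamespace

CONFIG_KEYS = ("precision", "technique")
METRIC_KEYS = ("accuracy", "co2_kg", "latency_ms", "memory_usage_gb")


def flatten_run(run):
    """Flatten a run-like object (with .name, .config, .summary) into a dict."""
    row = {"name": run.name}
    for key in CONFIG_KEYS:
        if key in run.config:
            row[key] = run.config[key]
    for key in METRIC_KEYS:
        value = run.summary.get(key)
        if value is not None:  # skip metrics that were never logged
            row[key] = value
    return row


def fetch_recent_runs(entity, project, n=20):
    """Fetch up to n of the most recent runs through the public W&B API."""
    import wandb  # imported lazily so flatten_run works without wandb installed

    api = wandb.Api()  # reads WANDB_API_KEY from the environment
    runs = api.runs(f"{entity}/{project}", order="-created_at")
    return [flatten_run(run) for run in islice(runs, n)]


# Offline demo with a stand-in object; the values are illustrative only.
demo_run = SimpleNamespace(
    name="run-1",
    config={"precision": "int8"},
    summary={"accuracy": 0.93, "co2_kg": 2e-6},
)
demo_row = flatten_run(demo_run)
print(demo_row)
```

Dropping metrics that were never logged, rather than storing `None`, is what lets the `.get(..., default)` calls in the later steps behave as intended.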

Step 2 — Find the best configuration

# Keep only runs that meet the accuracy threshold.
# Note: the logged accuracy is compared to 0.9 directly. If "accuracy" is an
# absolute score, scale the threshold by the FP32 baseline's accuracy instead.
threshold = 0.9   # intended as 90% of the FP32 baseline
qualified = [r for r in runs if r.get("accuracy", 0) >= threshold]

# Sort by CO₂ (ascending = most frugal first)
best = sorted(qualified, key=lambda r: r.get("co2_kg", float("inf")))
if best:
    top = best[0]
    print(f"Best config: {top['precision']} / {top.get('technique', '?')}")
    print(f"  CO₂/inf : {top['co2_kg']*1e6:.2f} mg")
    print(f"  Accuracy: {top['accuracy']:.4f}")
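The threshold comment in Step 2 refers to "90% of FP32 baseline", while the filter compares accuracy against 0.9 directly. If the logged accuracy is an absolute score, one way to make the threshold relative is sketched below. The function name `best_run`, its parameters, and the demo values are assumptions for illustration, not part of Ecostral.

```python
def best_run(runs, relative_threshold=0.9, baseline_precision="fp32"):
    """Return the lowest-CO2 run whose accuracy is at least
    relative_threshold times the best baseline accuracy, or None."""
    baselines = [
        r["accuracy"]
        for r in runs
        if r.get("precision") == baseline_precision
        and r.get("accuracy") is not None
    ]
    if not baselines:
        return None  # no baseline run, so a relative threshold is undefined

    min_accuracy = relative_threshold * max(baselines)
    qualified = [
        r
        for r in runs
        if r.get("accuracy") is not None
        and r.get("co2_kg") is not None
        and r["accuracy"] >= min_accuracy
    ]
    return min(qualified, key=lambda r: r["co2_kg"], default=None)


# Illustrative values only, not measured results.
demo_runs = [
    {"precision": "fp32", "accuracy": 0.80, "co2_kg": 5e-6},
    {"precision": "int8", "accuracy": 0.75, "co2_kg": 2e-6},
    {"precision": "int4", "accuracy": 0.60, "co2_kg": 1e-6},
]
winner = best_run(demo_runs)
print(winner["precision"] if winner else "no qualifying run")
```

With these demo values an absolute cutoff of 0.9 would reject every run, whereas the relative cutoff (0.9 × 0.80 = 0.72) keeps the fp32 and int8 runs and picks the one with lower CO₂.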

Step 3 — Identify unexplored configurations

tried_precisions = {r["precision"] for r in runs}
all_precisions   = {"fp32", "bf16", "fp16", "int8", "int4", "fp8"}
unexplored       = all_precisions - tried_precisions
print(f"Not yet tried: {sorted(unexplored)}")   # sorted for stable output
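Since each run also carries a `technique` field (see Step 2), coverage could be tracked per (precision, technique) pair rather than per precision alone. The sketch below shows one way to do that. `unexplored_pairs` is a hypothetical helper and the technique names are placeholders, not a list taken from Ecostral.

```python
from itertools import product


def unexplored_pairs(runs, precisions, techniques):
    """Return sorted (precision, technique) pairs that no run has covered."""
    tried = {(r.get("precision"), r.get("technique")) for r in runs}
    return sorted(set(product(precisions, techniques)) - tried)


# Placeholder technique names, for illustration only.
demo_history = [{"precision": "fp16", "technique": "technique-a"}]
remaining = unexplored_pairs(
    demo_history,
    precisions={"fp16", "int8"},
    techniques={"technique-a", "technique-b"},
)
print(remaining)
```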

Step 4 — Summarize for a Mistral reasoning prompt

# Format past runs as context for the next Mistral proposal
summary_lines = []
for r in runs[:10]:
    summary_lines.append(
        f"- {r['precision']}/{r.get('technique','?')}: "
        f"accuracy={r.get('accuracy',0):.3f}, "
        f"co2={r.get('co2_kg',0)*1e6:.1f}mg, "
        f"latency={r.get('latency_ms',0):.0f}ms, "
        f"vram={r.get('memory_usage_gb',0):.1f}GB"
    )
context = "\n".join(summary_lines)
print(context)
# → pass this string to Mistral Large as experiment history
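To show where the summary string ends up, here is a minimal sketch of wrapping it in a prompt. The helper `build_history_prompt` and the prompt wording are assumptions. The source does not show the prompt Ecostral actually sends, and no model call is made here.

```python
def build_history_prompt(model_id, context, unexplored):
    """Combine the run summary and untried precisions into one prompt string."""
    untried = ", ".join(sorted(unexplored)) or "none"
    return (
        f"You are optimizing {model_id}.\n"
        f"Past experiments:\n{context}\n"
        f"Precisions not yet tried: {untried}\n"
        "Propose the next configuration to run and explain why."
    )


# Illustrative inputs only.
demo_context = "- int8/?: accuracy=0.930, co2=2.0mg, latency=41ms, vram=7.5GB"
demo_prompt = build_history_prompt(
    "mistralai/Mistral-7B-Instruct-v0.3", demo_context, {"int4", "fp8"}
)
print(demo_prompt)
```

Sorting the untried precisions keeps the prompt text identical across runs with the same history, which makes prompts easier to diff and cache.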

W&B project naming convention

Model	W&B project
mistralai/Mistral-7B-Instruct-v0.3	`optimization-mistral-7b-instruct-v0-3`
TinyLlama/TinyLlama-1.1B-Chat-v1.0	`optimization-tinyllama-1-1b-chat-v1-0`
microsoft/Phi-3.5-mini-instruct	`optimization-phi-3-5-mini-instruct`
Any model, Jetson Orin target	`optimization-<slug>-jetson-orin`

Group naming: run-<date> (e.g. run-2026-03-01) — one group per optimization session.
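The naming pattern in the table can be reproduced with a small helper. The rule used here (drop the organization prefix, lowercase, replace each run of non-alphanumeric characters with a hyphen) is inferred from the rows above and is not confirmed to be how Ecostral computes the name. `project_name` and `group_name` are hypothetical names.

```python
import re
from datetime import date


def _slugify(text):
    """Lowercase and replace each run of non-alphanumerics with one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def project_name(model_id, target=None):
    """Build a W&B project name such as optimization-<slug>[-<target>]."""
    slug = _slugify(model_id.split("/")[-1])  # drop the organization prefix
    name = f"optimization-{slug}"
    if target:
        name = f"{name}-{_slugify(target)}"
    return name


def group_name(day):
    """Build the per-session group name, run-<YYYY-MM-DD>."""
    return f"run-{day.isoformat()}"


print(project_name("mistralai/Mistral-7B-Instruct-v0.3"))
print(project_name("microsoft/Phi-3.5-mini-instruct", target="Jetson Orin"))
print(group_name(date(2026, 3, 1)))
```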

W&B MCP Server (programmatic access)

For MCP-compatible agents, the W&B MCP server exposes experiment history directly. From Python, the WandbMemory wrapper used in Step 1 gives access to the same run history:

# The WandbMemory class in ecostral/memory/wandb_mcp.py wraps the W&B API
# and formats run history as structured context for LLM prompts.
# It is the same module used internally by the Ecostral optimization loop.
from ecostral.memory.wandb_mcp import WandbMemory

Combine with other skills

ecostral-optimizer (this repo) → run the next optimization based on memory insights
hugging-face-evaluation (HF official) → cross-reference W&B accuracy with HF model card evals
hugging-face-trackio (HF official) → if migrating experiment tracking from W&B to HF TrackIO
