wilfred-dore

wandb-experiment-memory

Query Weights & Biases experiment history to retrieve past optimization runs, compare configurations, and inform the next optimization decision. Use this skill when asked to review past AI optimization experiments, find the best configuration tried so far, understand what has already been explored, or suggest the next experiment to run based on W&B history.


Install

npx skillscat add wilfred-dore/ecostral/wandb-experiment-memory

Install via the SkillsCat registry.

SKILL.md

W&B Experiment Memory Skill

This skill enables a coding agent to act as an informed optimizer by reading
past experiment results from Weights & Biases before proposing the next action.
It closes the self-improvement loop: measure → log → remember → improve.

When to activate this skill

  • User asks "what configurations have we already tried for model X?"
  • User asks "what was the best result so far for this model?"
  • User asks "what should we try next?"
  • Before running a new Ecostral optimization (to avoid repeating past experiments)
  • After an optimization run (to confirm W&B logging succeeded)

Prerequisites

pip install wandb
# W&B API key must be set in secrets.json or as WANDB_API_KEY env var
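One way to resolve the key is sketched below. It assumes the environment variable takes precedence and that secrets.json is a flat JSON object storing the key under a `WANDB_API_KEY` field. That field name is an assumption, so check your own secrets.json. `load_wandb_api_key` is a hypothetical helper and is not part of Ecostral or wandb.

```python
import json
import os
from pathlib import Path
from typing import Optional


def load_wandb_api_key(secrets_path: str = "secrets.json") -> Optional[str]:
    """Hypothetical helper: return the W&B API key, or None if none is found.

    Lookup order assumed here: WANDB_API_KEY env var first, then secrets.json.
    """
    # 1. Environment variable (the same name the wandb client reads)
    key = os.environ.get("WANDB_API_KEY")
    if key:
        return key
    # 2. secrets.json, assumed to be a flat JSON object; the field name is an assumption
    path = Path(secrets_path)
    if path.is_file():
        with path.open(encoding="utf-8") as f:
            secrets = json.load(f)
        return secrets.get("WANDB_API_KEY")
    return None
```

Reading the environment first lets CI or a container override a local secrets file without editing it.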

Step 1 — Query past runs for a model

from ecostral.memory.wandb_mcp import WandbMemory

memory = WandbMemory(
    api_key="<wandb_api_key>",   # from secrets.json
    entity="wdore-personal",
)

# Fetch recent runs for a specific model
# Project name pattern: optimization-<model-slug>
# e.g. optimization-mistral-7b-instruct-v0-3
runs = memory.get_recent_runs(
    project="optimization-mistral-7b-instruct-v0-3",
    n=20,
)

for run in runs:
    # Use .get(): a run that crashed or never finished may lack some fields
    print(run.get("precision"), run.get("accuracy"), run.get("co2_kg"), run.get("latency_ms"))
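Each dict returned by `get_recent_runs` carries both configuration fields (`precision`, `technique`) and metrics (`accuracy`, `co2_kg`, `latency_ms`, `memory_usage_gb`). To rebuild that shape from raw W&B runs yourself, one option is to merge each run's `config` and `summary` mappings, as in the minimal sketch below. It assumes Ecostral logs the configuration choices to `config` and the measured metrics to `summary`, and that both behave like read-only mappings. `run_to_record` is a hypothetical name.

```python
from typing import Any, Dict, Mapping

# Assumption: configuration choices live in run.config, measured metrics in run.summary.
CONFIG_KEYS = ("precision", "technique")
METRIC_KEYS = ("accuracy", "co2_kg", "latency_ms", "memory_usage_gb")


def run_to_record(run: Any) -> Dict[str, Any]:
    """Hypothetical helper: flatten a W&B run-like object into a plain dict.

    Fields that were never logged are omitted, so callers should use r.get(...).
    """
    record: Dict[str, Any] = {"name": getattr(run, "name", None)}
    config: Mapping[str, Any] = getattr(run, "config", None) or {}
    summary: Mapping[str, Any] = getattr(run, "summary", None) or {}
    for key in CONFIG_KEYS:
        if key in config:
            record[key] = config[key]
    for key in METRIC_KEYS:
        if key in summary:
            record[key] = summary[key]
    return record
```

With the public W&B API, this might be fed as `[run_to_record(r) for r in wandb.Api().runs("wdore-personal/optimization-mistral-7b-instruct-v0-3")]`.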

Step 2 — Find the best configuration

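# Keep runs that reach 90% of the FP32 baseline accuracy
fp32_acc = max((r.get("accuracy") or 0 for r in runs if r.get("precision") == "fp32"), default=0)
threshold = 0.9 * fp32_acc if fp32_acc else 0.9   # no FP32 run logged: fall back to an absolute 0.9
qualified = [r for r in runs if (r.get("accuracy") or 0) >= threshold]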

# Sort by CO₂ (ascending = most frugal first)
best = sorted(qualified, key=lambda r: r.get("co2_kg", float("inf")))
if best:
    print(f"Best config: {best[0]['precision']} / {best[0].get('technique', '?')}")
    print(f"  CO₂/inference: {best[0]['co2_kg']*1e6:.2f} mg")   # kg → mg
    print(f"  Accuracy: {best[0]['accuracy']:.4f}")
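When two runs report the same CO₂ per inference, the sort in Step 2 keeps them in the order W&B returned them. The sketch below breaks ties on latency and pushes runs without a logged metric to the end. This tie-break rule is our own choice, not something the skill prescribes, and `rank_candidates` is a hypothetical name.

```python
import math
from typing import Any, Dict, List


def rank_candidates(runs: List[Dict[str, Any]], min_accuracy: float) -> List[Dict[str, Any]]:
    """Keep runs at or above min_accuracy, most frugal first.

    Sort key: CO2 per inference, then latency; a missing value sorts last.
    """
    def metric(r: Dict[str, Any], key: str) -> float:
        value = r.get(key)
        return math.inf if value is None else float(value)

    # A missing or None accuracy counts as 0, so such runs never qualify
    qualified = [r for r in runs if (r.get("accuracy") or 0.0) >= min_accuracy]
    return sorted(qualified, key=lambda r: (metric(r, "co2_kg"), metric(r, "latency_ms")))
```

`rank_candidates(runs, threshold)[0]` then plays the role of `best[0]` above.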

Step 3 — Identify unexplored configurations

tried_precisions = {r["precision"].lower() for r in runs if r.get("precision")}
all_precisions   = {"fp32", "bf16", "fp16", "int8", "int4", "fp8"}
unexplored       = all_precisions - tried_precisions
print(f"Not yet tried: {unexplored}")
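Step 3 only looks at precision. When runs also record a `technique`, the gaps can be computed over (precision, technique) pairs instead. Here is a minimal sketch. The technique names used in the example call are hypothetical placeholders, so substitute whatever techniques your optimizer actually supports. `unexplored_pairs` is also a hypothetical name.

```python
from itertools import product
from typing import Any, Dict, Iterable, List, Set, Tuple


def unexplored_pairs(
    runs: Iterable[Dict[str, Any]],
    precisions: Iterable[str],
    techniques: Iterable[str],
) -> List[Tuple[str, str]]:
    """Return (precision, technique) combinations that no logged run has tried."""
    # Runs missing either field are ignored rather than counted as a combination
    tried: Set[Tuple[str, str]] = {
        (str(r["precision"]).lower(), str(r["technique"]).lower())
        for r in runs
        if r.get("precision") and r.get("technique")
    }
    candidates = product(
        sorted(p.lower() for p in precisions),
        sorted(t.lower() for t in techniques),
    )
    return [pair for pair in candidates if pair not in tried]
```

Sorting both axes keeps the output deterministic, which makes the list stable to paste into a prompt.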

Step 4 — Summarize for a Mistral reasoning prompt

# Format past runs as context for the next Mistral proposal
summary_lines = []
for r in runs[:10]:
    summary_lines.append(
        f"- {r['precision']}/{r.get('technique','?')}: "
        f"accuracy={r.get('accuracy',0):.3f}, "
        f"co2={r.get('co2_kg',0)*1e6:.1f}mg, "
        f"latency={r.get('latency_ms',0):.0f}ms, "
        f"vram={r.get('memory_usage_gb',0):.1f}GB"
    )
context = "\n".join(summary_lines)
print(context)
# → pass this string to Mistral Large as experiment history
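With `r.get('co2_kg', 0)`, a run that never logged CO₂ appears as `co2=0.0mg`, which a model could read as the most frugal result. A `None` value stored under an existing key also breaks the `:.1f` format. The sketch below prints `n/a` instead. `fmt` and `format_history` are hypothetical helper names.

```python
from typing import Any, Dict, List, Optional


def fmt(value: Optional[float], spec: str, scale: float = 1.0, unit: str = "") -> str:
    """Format a metric, or return 'n/a' when it was never logged."""
    if value is None:
        return "n/a"
    return f"{value * scale:{spec}}{unit}"


def format_history(runs: List[Dict[str, Any]], limit: int = 10) -> str:
    """One line per run, in the same layout as Step 4."""
    lines = []
    for r in runs[:limit]:
        lines.append(
            f"- {r.get('precision', '?')}/{r.get('technique', '?')}: "
            f"accuracy={fmt(r.get('accuracy'), '.3f')}, "
            f"co2={fmt(r.get('co2_kg'), '.1f', 1e6, 'mg')}, "   # kg → mg
            f"latency={fmt(r.get('latency_ms'), '.0f', unit='ms')}, "
            f"vram={fmt(r.get('memory_usage_gb'), '.1f', unit='GB')}"
        )
    return "\n".join(lines)
```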

W&B project naming convention

| Model | W&B project |
|---|---|
| mistralai/Mistral-7B-Instruct-v0.3 | optimization-mistral-7b-instruct-v0-3 |
| TinyLlama/TinyLlama-1.1B-Chat-v1.0 | optimization-tinyllama-1-1b-chat-v1-0 |
| microsoft/Phi-3.5-mini-instruct | optimization-phi-3-5-mini-instruct |
| Any model, Jetson Orin target | optimization-<slug>-jetson-orin |

Group naming: run-<date> (e.g. run-2026-03-01) — one group per optimization session.
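All three table rows fit a simple slug rule: drop the organization prefix, lowercase the name, and replace every run of non-alphanumeric characters with `-`. This rule is inferred from the examples rather than taken from Ecostral's code, so treat the helpers below as a sketch. `model_slug`, `project_name`, and `group_name` are all hypothetical names.

```python
import re
from datetime import date
from typing import Optional


def model_slug(model_id: str) -> str:
    """Drop the org prefix, lowercase, and turn each non-alphanumeric run into '-'."""
    name = model_id.split("/")[-1].lower()
    return re.sub(r"[^a-z0-9]+", "-", name).strip("-")


def project_name(model_id: str, target: Optional[str] = None) -> str:
    """optimization-<slug>, plus an optional hardware suffix such as 'jetson-orin'."""
    project = f"optimization-{model_slug(model_id)}"
    return f"{project}-{target}" if target else project


def group_name(session_date: date) -> str:
    """One W&B group per optimization session: run-YYYY-MM-DD."""
    return f"run-{session_date.isoformat()}"
```

A run could then be opened with something like `wandb.init(project=project_name(model_id), group=group_name(date.today()))`.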

W&B MCP Server (programmatic access)

For MCP-compatible agents, the W&B MCP server exposes experiment history directly. From Python, the same history is available through WandbMemory:

# The WandbMemory class in ecostral/memory/wandb_mcp.py wraps the W&B API
# and formats run history as structured context for LLM prompts.
# It is the same module used internally by the Ecostral optimization loop.
from ecostral.memory.wandb_mcp import WandbMemory

Combine with other skills

  • ecostral-optimizer (this repo) → run the next optimization based on memory insights
  • hugging-face-evaluation (HF official) → cross-reference W&B accuracy with HF model card evals
  • hugging-face-trackio (HF official) → if migrating experiment tracking from W&B to Hugging Face Trackio