Safe experimentation framework for AI agents. Creates isolated sandbox environments for trying new features, testing approaches, and exploring solutions without polluting the main codebase.
USE WHEN: Agent needs to try something uncertain, explore multiple approaches, test a new library, prototype a feature, or run a technical spike before committing to implementation.
PRIMARY TRIGGERS:
- "experiment with" = Set up sandbox + run experiment
- "try this approach" = Quick experiment in sandbox
- "spike" / "POC" / "prototype" = Time-boxed technical investigation
- "tinker" / "tinkering mode" = Enter experimentation workflow
- "explore options" = Multi-approach comparison in sandbox
NOT FOR: Debugging (use debugger), testing (use test runner), or committed feature work (use git branches).
DIFFERENTIATOR: Unlike git branches (for committed direction), tinkering is for "I don't know if this will work" exploration. Try 5 things in the sandbox before committing to a branch. Faster feedback, zero codebase pollution.
Install
Install via the SkillsCat registry:
npx skillscat add rfxlamia/claude-skillkit/tinkering
Tinkering
Overview
Structured experimentation framework. When uncertain about an approach, don't
hack at production code - create an isolated sandbox, try freely, then graduate
successful experiments or discard failed ones cleanly.
Core principle: The output of tinkering is knowledge, not production code.
A successful experiment teaches you how to solve the problem. The actual
implementation happens after, informed by what you learned.
When to Use
| Situation | Tinkering? | Why |
|---|---|---|
| "Will this library work for our use case?" | Yes | Unknown outcome, need to explore |
| "Which of these 3 approaches is fastest?" | Yes | Comparing multiple options |
| "How do I integrate this API?" | Yes | Technical spike, learning-focused |
| "Add a login button to the header" | No | Clear requirement, use git branch |
| "Fix the null pointer on line 42" | No | Debugging, not experimenting |
| "Refactor auth module to use JWT" | Maybe | If approach uncertain, spike first |
Workflow
Phase 1: Setup Sandbox
Create isolated experiment environment:
# 1. Create experiment directory
mkdir -p _experiments/{experiment-name}
# 2. Add to .gitignore (if not already present)
grep -qxF '_experiments/' .gitignore 2>/dev/null || echo '_experiments/' >> .gitignore
# 3. Create manifest (first time only)
# See MANIFEST.md template below
MANIFEST.md template (create at _experiments/MANIFEST.md):
# Experiment Log
## Active
### {experiment-name}
- **Date**: YYYY-MM-DD
- **Hypothesis**: What we're trying to learn
- **Status**: active
- **Result**: (pending)
## Completed
<!-- Move finished experiments here -->
Rules:
- NEVER modify production files during tinkering
- ALL experiment code goes inside _experiments/{name}/
- Copy source files into sandbox if you need to modify them
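The three setup steps can be wrapped in one re-runnable helper. This is a minimal sketch, not part of the skill itself: the function name `setup_experiment` is hypothetical, and the manifest it writes is only a stub of the template above.

```shell
# Sketch: idempotent sandbox setup (POSIX sh).
# setup_experiment is an illustrative name, not something the skill ships.
setup_experiment() {
  name="$1"
  [ -n "$name" ] || { echo "usage: setup_experiment <name>" >&2; return 1; }

  # 1. Create the experiment directory (no error if it already exists)
  mkdir -p "_experiments/$name"

  # 2. Ignore the sandbox root; append only when the exact line is missing
  grep -qxF '_experiments/' .gitignore 2>/dev/null || echo '_experiments/' >> .gitignore

  # 3. Create a stub manifest on first use only, so existing entries survive
  if [ ! -f _experiments/MANIFEST.md ]; then
    printf '%s\n' '# Experiment Log' '## Active' '## Completed' > _experiments/MANIFEST.md
  fi
}
```

Calling `setup_experiment date-fns-migration` twice leaves a single `_experiments/` line in .gitignore and does not overwrite an existing manifest.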
Phase 2: Hypothesize
Before writing any code, state clearly:
Question : What specific question are we answering?
Success : How will we know it works?
Time box : Maximum time to spend (default: 30 min)
Scope : Which files/areas are involved?
Write this in _experiments/{name}/HYPOTHESIS.md or as a top comment.
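One way to make the hypothesis hard to skip is to write the file from the command line before any experiment code exists. The helper below is a sketch; `write_hypothesis` and its argument order are assumptions made for illustration.

```shell
# Sketch: record the four hypothesis fields in HYPOTHESIS.md (POSIX sh).
# write_hypothesis and its argument order are hypothetical.
write_hypothesis() {
  name="$1"; question="$2"; success="$3"; scope="$4"
  timebox="${5:-30 min}"   # falls back to the default time box of 30 min

  mkdir -p "_experiments/$name"
  cat > "_experiments/$name/HYPOTHESIS.md" <<EOF
Question : $question
Success : $success
Time box : $timebox
Scope : $scope
EOF
}
```

For instance, `write_hypothesis redis-eval "Is Redis fast enough for session storage?" "p95 read latency under 5 ms" "src/session/"` (all values made up) produces a file with the default 30 min time box.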
Example:
Question : Can we replace moment.js with date-fns and reduce bundle size?
Success : Bundle decreases >20%, all date formatting still works
Time box : 20 minutes
Scope : src/utils/date.ts, package.json
Phase 3: Experiment
Build freely in the sandbox.
Modifying existing code:
# Copy the file(s) you need to change
cp src/utils/date.ts _experiments/date-fns-migration/date.ts
# Edit the copy freely - zero risk to production
New feature exploration:
# Create new files directly in sandbox
touch _experiments/websocket-poc/server.ts
touch _experiments/websocket-poc/client.ts
Library evaluation:
# Minimal test script in sandbox
touch _experiments/redis-eval/test_redis.py
# Use isolated dependencies (venv, local node_modules)
Multi-approach comparison:
_experiments/caching-spike/
  approach-a-redis/
  approach-b-memory/
  approach-c-sqlite/
  COMPARISON.md   # Side-by-side evaluation
Rules during experimentation:
- Stay in sandbox - never touch production files
- Quick and dirty is fine - this is throwaway code
- Document learnings as you go
- Stop at time box, even if incomplete - partial answers are still answers
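The "stay in sandbox" rule can be checked mechanically. The sketch below assumes a git repository in which `_experiments/` is already ignored and the working tree was clean before tinkering started; under those assumptions, any output from `git status --porcelain` points at a file changed outside the sandbox. The function name `check_sandbox_clean` is hypothetical.

```shell
# Sketch: detect changes that leaked outside the sandbox (POSIX sh + git).
# Assumes _experiments/ is in .gitignore and the tree was clean beforehand.
check_sandbox_clean() {
  # --porcelain lists modified, staged, and untracked files; ignored paths are omitted
  changes="$(git status --porcelain)" || return 2
  if [ -n "$changes" ]; then
    echo "Changes found outside _experiments/:" >&2
    printf '%s\n' "$changes" >&2
    return 1
  fi
  echo "No changes outside _experiments/"
}
```

Running `check_sandbox_clean` before evaluating gives a quick signal; a non-zero exit status means either production files were touched or the tree was not clean to begin with.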
Phase 4: Evaluate
Assess results against the hypothesis.
Checklist:
- Did the experiment answer the original question?
- Does it meet the success criteria from Phase 2?
- Any unexpected side effects or constraints discovered?
- Is the approach feasible for production implementation?
- What's the estimated effort to implement properly?
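When a success criterion is numeric, such as a size reduction above some percentage, the check can be scripted so the result is not judged by eye. This is a sketch with hypothetical helper names; it uses integer arithmetic, so percentages are truncated toward zero.

```shell
# Sketch: compare a before/after measurement against a percentage threshold.
# percent_reduction and meets_threshold are hypothetical helper names.
percent_reduction() {
  before="$1"; after="$2"
  [ "$before" -gt 0 ] || { echo "before must be greater than 0" >&2; return 1; }
  # Integer math: 19.9% is reported as 19
  echo $(( (before - after) * 100 / before ))
}

# Succeeds only when the reduction is strictly greater than the threshold
meets_threshold() {
  [ "$(percent_reduction "$1" "$2")" -gt "$3" ]
}
```

With made-up sizes, `percent_reduction 1000 660` prints `34`, and `meets_threshold 1000 660 20` succeeds. The byte counts could come from something like `wc -c < file` on the two build outputs.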
Update MANIFEST.md:
- **Result**: SUCCESS - date-fns reduced bundle by 34%, all tests pass
- **Status**: graduated
- **Notes**: Need to handle timezone edge case in formatRelative()
Decision:
- Positive result -> Phase 5, Path A (Graduate)
- Negative result -> Phase 5, Path B (Discard)
- Inconclusive -> Extend time box OR try different approach
Phase 5: Graduate or Discard
Path A: Graduate (success)
Load reference: references/graduation-checklist.md
Quick summary:
- Do NOT copy-paste experiment code directly into production
- Re-implement properly using what you learned
- Write proper tests for the production implementation
- Apply code standards (experiment was quick & dirty, production shouldn't be)
- Reference experiment in commit message for context
Path B: Discard (failed)
Failed experiments are valuable - they tell you what NOT to do.
- Update MANIFEST.md with failure reason and learnings
- Delete experiment files: rm -rf _experiments/{name}/
- Or keep briefly if learnings are worth referencing
Phase 6: Cleanup
# Remove completed experiment
rm -rf _experiments/{experiment-name}/
# Update MANIFEST.md - move entry to "Completed" section
MANIFEST.md after cleanup:
## Completed
### date-fns-migration (2025-01-15)
- GRADUATED - Implemented in commit abc123
- Learnings: date-fns 3x smaller, timezone handling needs explicit config
### graphql-evaluation (2025-01-10)
- DISCARDED - Too much overhead for our simple REST API
- Learnings: REST + OpenAPI better fit for <20 endpoints
Quick Reference
Setup -> mkdir _experiments/{name}, add to .gitignore
Hypothesize -> Question + success criteria + time box
Experiment -> Build in sandbox (never touch production)
Evaluate -> Check against success criteria
Graduate -> Re-implement properly in production
Cleanup -> Remove files, update manifest
Edge Cases
Needs database changes: Use separate test DB or schema prefix. Document in hypothesis.
Needs running server: Run from sandbox, use different port to avoid conflicts.
Multiple concurrent experiments: Each gets own subdirectory. MANIFEST tracks all.
Experiment grows into real feature: Graduate it. Don't let experiments become shadow production code.
Team member needs to see experiment: Push to feature branch (temporarily track _experiments/) or share via patch.
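For the last case, temporarily tracking an ignored experiment could look like the sketch below. It relies on `git add -f`, which adds a path even when .gitignore excludes it. The function name and the `experiment/` branch prefix are assumptions, not conventions defined by this skill.

```shell
# Sketch: put one experiment on a throwaway branch for sharing (POSIX sh + git).
# share_experiment and the experiment/<name> branch naming are hypothetical.
share_experiment() {
  name="$1"
  [ -d "_experiments/$name" ] || { echo "no such experiment: $name" >&2; return 1; }

  # Work on a separate branch so the main line never sees sandbox files
  git checkout -q -b "experiment/$name" || return 1

  # -f overrides the .gitignore entry for this one path only
  git add -f "_experiments/$name" || return 1
  git commit -q -m "experiment: share $name (do not merge)"
}
```

After `share_experiment websocket-poc`, the branch can be pushed like any other. Note that switching back to the original branch removes the now-tracked experiment files from the working tree; they remain available on the experiment branch until it is deleted.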