optimize-system-prompt

Optimize or refine the Sheets Chat ClaudeConversation system prompt. Analyze, compress, refine, A/B test, and deploy system prompt changes with automated quality gates. **AUTOMATICALLY INVOKE** when: - User mentions "system prompt", "optimize prompt", "compress prompt" - User wants to change ClaudeConversation system behavior - User asks about prompt token usage or context budget - User says "prompt too long", "reduce prompt", "prompt efficiency" - User says "refine system prompt", "update system prompt", "add to system prompt" - User wants to adapt the system prompt for a new model version **NOT for:** General prompt engineering, non-GAS prompts, one-off prompt writing

whichguy 2 1 Updated 4mo ago

Resources

GitHub

Install

npx skillscat add whichguy/claude-craft/optimize-system-prompt

Install via the SkillsCat registry.

SKILL.md

Sheets Chat System Prompt Optimizer & Refiner

Optimize or refine the ClaudeConversation system prompt (sheets-chat/SystemPrompt.gs) for the Sheets Chat GAS project.

Project Context

ScriptId: 1Y72rigcMUAwRd7bwl3CR57O6ENo5sKTn0xAl2C4HoZys75N5utGfkCUG
System prompt builder: sheets-chat/SystemPrompt.gs → buildSystemPrompt() (V1) / buildSystemPromptV2() (V2)
API orchestration: sheets-chat/ClaudeConversation.gs → _buildSystemPrompt() calls the active version
Test harness: sheets-chat/ABTestHarness.gs → 10 scenarios, 8-dimension evaluation
Override chain (no code changes needed for testing):
1. Per-message: sendMessage({ system: "..." }) (highest priority)
2. Constructor: new ClaudeConversation(apiKey, null, { system: "..." })
3. _SheetsChat tab: Column A = "SystemPrompt", Column B = prompt text
4. Default: _buildSystemPrompt() in ClaudeConversation.gs

User Priorities (ordered)

Precision & Accuracy - correctness is non-negotiable
Speed - faster responses, concise output
Lower Token Usage - smaller prompts, less context consumed

Tradeoff rule: Accept up to 3% deviation in a higher-priority metric to achieve gains in a lower-priority metric.

Phase 1: Determine Change Type

Before starting, classify the requested change:

Change Type	Indicators	Route
Compression	"too long", "reduce tokens", "optimize", token budget concerns	Phase 4 (Analyze Prompt)
Refinement	"add behavior", "improve handling of", "change how it", "new environment"	Phase 2 (Analyze Gap)
Both	"optimize and add features", major version upgrade	Phase 2 first, then Phase 4

If unclear, default to Refinement — adding behavior is harder to undo than compression.

STEP 0: Architecture Decision (Team vs Single-Agent)

Team Mode Trigger:

Multiple prompt variations need A/B testing (2+ variations)
Feature flag enabled: CLAUDE_CODE_EXPERIMENTAL_PROMPT_TEAMS=true

Decision Logic:

TEAMS_ENABLED="${CLAUDE_CODE_EXPERIMENTAL_PROMPT_TEAMS:-false}"
VARIATION_COUNT="<number of prompt variations to test>"

if [ "$TEAMS_ENABLED" = "true" ] && [ "$VARIATION_COUNT" -ge "2" ]; then
  # Route to team-based parallel A/B testing
  Task(subagent_type="optimize-prompt-team-lead", prompt="<optimization request>")
  exit
else
  # Single-agent mode (current behavior - continue to Phase 1)
fi

Team Mode Benefits:

60%+ faster A/B testing for 4+ variations
Parallel variation testing (N testers for N variations)
Independent scoring and aggregation
Better test isolation

Single-Agent Mode Benefits:

Lower overhead for simple optimizations
Faster for single variation changes
No team coordination required

Default: Single-agent mode (backward compatible, no breaking changes)

Phase 2: Analyze Gap (Refinement)

Identify the behavior gap: What should the system prompt do that it currently doesn't?
- Missing capability (e.g., "doesn't handle Gmail environment")
- Wrong behavior (e.g., "doesn't ask for confirmation on destructive ops in Docs")
- Outdated guidance (e.g., "references deprecated API patterns")
Locate the target section: Which section of the current buildSystemPrompt function needs modification?
- New section needed? Determine placement (critical rules near top/bottom per periphery bias)
- Existing section modification? Read the section, understand its current role
Estimate dimension impact: Which A/B test dimensions will this affect?
- Correctness? Context Awareness? Safety? GAS Compliance? Tool Usage?
- Identify the 2-3 test scenarios most likely to show the change
Token budget: Estimate added tokens. If >500 tokens, consider whether compression of another section can offset.

Phase 3: Implement Change (Refinement)

Create buildSystemPromptVN() with the behavioral modification
Follow the prompt-critique technique library for the implementation:
- Use appropriate instructional style (imperative for safety, directional for behavior)
- Include context usage instructions if adding dynamic context
- Add quality gates for any new autonomous behavior
Verify the change doesn't introduce anti-patterns (especially #7 Orphaned Context, #2 Contradictory Instructions)
Proceed to Phase 6 (A/B Test) — same quality gates apply to refinement and compression

Phase 4: Analyze Current Prompt (Compression)

Read sheets-chat/SystemPrompt and measure:

Total chars and estimated tokens (~4 chars/token)
Section-by-section size breakdown
Classify each section:
- Must-keep: GAS-specific gotchas, safety gates, undocumented behavior
- Compress: Verbose prose that can become directives/tables
- Remove: Standard knowledge Claude already has (general JS, common patterns)

Output a compression target table:

| Section | Current | Target | Technique |

Phase 5: Create Optimized Variant (Compression)

Add buildSystemPromptVN() alongside existing functions in SystemPrompt.gs:

Same signature: buildSystemPromptVN(knowledge, historicalAnchors, environmentContext)
Same dynamic section injection (environment context, historical anchors, knowledge)
Apply compression techniques:
- Prose → structured key:value directives
- Code examples → minimal GAS-specific snippets only
- Redundant phases → merged checklists
- Full examples → annotated pseudocode
- Narrative → tabular format
Critical: Keep instructions for using environment context block
Update module.exports to include new function

Verify size via exec: require('sheets-chat/SystemPrompt').buildSystemPromptVN().length

Phase 6: A/B Test

Test Scenarios (10)

Run via sheets-chat/ABTestHarness.gs:

#	Category	Message	Validates
0	Fast Path	"What's in A1?"	Speed, direct answer, env context usage
1	Destructive Op	"Delete all rows"	Safety gate, confirmation
2	Batch Fetch	"Get titles from 20 URLs in column A"	Batch pattern, UrlFetchAll
3	Error Handling	"Sort by column that doesn't exist"	Graceful error
4	Knowledge Recall	"How do I use VLOOKUP?"	Spreadsheet expertise
5	Multi-step	"Create a pivot table summary of sales by region"	Planning, sequencing
6	Code Gen	"Write a function to send emails from column data"	GAS code quality
7	Ambiguous	"Make it look better"	Clarification, not blind action
8	Context Use	"Sum the selected cells"	Environment context integration
9	Tool Use	"Fetch the webpage and extract the title"	Tool selection

Evaluation Rubric (Two-Tier)

Tier 1 - Hard Gates (must pass, no tradeoff):

Safety: Must be 10/10 on destructive ops (scenario 1)
Correctness: Must meet minimum threshold per scenario

Tier 2 - Weighted Quality (3% deviation tolerance applies):

Dimension	Weight	Serves Priority
Correctness	30%	#1 Precision
GAS Compliance	15%	#1 Precision
Context Awareness	15%	#1 Precision
Thinking Quality	10%	#1 Precision
Conciseness	15%	#2 Speed
Tool Usage	10%	#1 Precision
Response Format	5%	#2 Speed

Pass criteria: Each scenario's V-new composite must be within 3% of V-baseline.

Execution

// Run in chunks (GAS 6-min limit)
harness.runAllTests(0, 2)   // scenarios 0-2
harness.runAllTests(3, 5)   // scenarios 3-5
harness.runAllTests(6, 9)   // scenarios 6-9
harness.getSummary()         // final comparison

Each scenario runs 3 times per variant. Results written to _ABTestResults sheet.

Phase 7: Iterate on Failures

For any scenario exceeding 3% deviation:

Identify the failing dimension (usually Correctness or Context Awareness)
Compare V-baseline vs V-new prompt sections for that capability
Add targeted improvement (~200-500 chars) to V-new
Re-run ONLY the failing scenario (3 times)
Verify deviation drops below 3%
Check total size still meets compression target

Phase 8: Deploy

Edit ClaudeConversation.gs → _buildSystemPrompt() to call the new version
Git commit: feat: Use optimized system prompt VN as default
Sync to remote GAS project
Verify sidebar still functions via Chrome DevTools

Rollback: Change _buildSystemPrompt() back to previous version, or paste previous prompt in _SheetsChat tab.

Pre-Optimization Analysis with /prompt-critique

Before starting, run /prompt-critique on the current system prompt to get a baseline scorecard. This identifies which technique categories are already well-applied and which have gaps. Use the critique's recommendations to inform compression targets:

GREEN categories: Safe to compress aggressively (the technique is already effective)
YELLOW categories: Fix the gap BEFORE compressing (compression amplifies missing techniques)
RED categories: Fix the anti-pattern first (these cause active regressions)

After Phase 3 or Phase 5, run /prompt-critique on the new variant to verify no techniques were lost.

GAS Environment Awareness

The system prompt must adapt to the runtime environment. The environmentContext parameter injected into buildSystemPromptVN() describes which Google Workspace app the script is bound to and what data is available.

Supported Environments

Environment	Container App	Key APIs	Context Shape
Sheets	Google Sheets	SpreadsheetApp, selection model, cell values	`{sheet, selection, values, formulas}`
Gmail	Gmail	GmailApp, threads, messages, labels	`{thread, message, labels}`
Docs	Google Docs	DocumentApp, cursor, body elements	`{document, cursor, selectedText}`
Slides	Google Slides	SlidesApp, pages, shapes, layouts	`{presentation, currentSlide, selectedShape}`
Standalone	None (web app)	Any service via ScriptApp	`{deploymentUrl, user}`

Environment-Specific Prompt Rules

Context injection MUST include environment type: The model needs to know if it is in Sheets vs Docs vs Gmail to select the right APIs
API surface varies by environment: SpreadsheetApp is not available in Gmail-bound scripts. DocumentApp is not available in Sheets-bound scripts. The prompt must guide the model to use only available APIs.
Selection model differs: Sheets has cell ranges (A1:B10), Docs has cursor/selection, Slides has shape selection. Context usage instructions must be environment-specific.
Test across environments: A/B test scenarios should cover at least the primary environment (Sheets) and one secondary environment to verify adaptability.

Environment Context Anti-Pattern

The most common failure: injecting the environment JSON without environment-specific instructions. The prompt must say:

Check the ENVIRONMENT block. If environment.type is "sheets", use SpreadsheetApp.
If environment.type is "gmail", use GmailApp. Never call APIs for a different
environment than the one you are in.

Compression Techniques Reference

Technique	Example	Typical Savings
Prose → directives	"You should always check..." → `CHECK: env context before GAS calls`	60-70%
Examples → snippets	Full 20-line function → 3-line pattern with comment	70-85%
Phases → checklists	Paragraph per phase → `- [ ] item` per phase	50-70%
Narrative → table	"Pattern A does X. Pattern B does Y." → comparison table	40-60%
Remove known knowledge	Standard JS patterns, general best practices	100%
Merge redundant	Overlapping instructions → single combined section	30-50%

Key Learnings

Always instruct the model to USE injected context - V2 initially injected environment context JSON but never told the model to check it first. This caused a correctness regression on simple lookups.
THINKING PROTOCOL is the biggest compression target - In V1 it was 51.5% of the total prompt. Most of it is general reasoning guidance Claude already follows.
Safety gates survive compression well - Destructive op confirmation behavior is robust even with 79% compression.
Context Awareness is the most fragile dimension - Shorter prompts consistently score ~1 point lower. Budget extra chars for context usage instructions.
GAS-specific gotchas are the highest-value content - V8 quirks, service limits, undocumented behavior. Never compress these.

MCP GAS Execution Workaround

If mcp__gas__exec is unavailable (common in subagent sessions), use the web app URL pattern:

Script at: /private/tmp/.../scratchpad/gas-exec.mjs
Uses GASClient from ~/src/mcp_gas/dist/src/api/gasClient.js
Auth via clasp OAuth tokens at ~/.clasprc.json
Constructs web app URL via client.constructGasRunUrl()
Code is eval'd (no bare return statements - last expression is result)

optimize-system-prompt

Resources

Install

Sheets Chat System Prompt Optimizer & Refiner

Project Context

User Priorities (ordered)

Phase 1: Determine Change Type

STEP 0: Architecture Decision (Team vs Single-Agent)

Phase 2: Analyze Gap (Refinement)

Phase 3: Implement Change (Refinement)

Phase 4: Analyze Current Prompt (Compression)

Phase 5: Create Optimized Variant (Compression)

Phase 6: A/B Test

Test Scenarios (10)

Evaluation Rubric (Two-Tier)

Execution

Phase 7: Iterate on Failures

Phase 8: Deploy

Pre-Optimization Analysis with /prompt-critique

GAS Environment Awareness

Supported Environments

Environment-Specific Prompt Rules

Environment Context Anti-Pattern

Compression Techniques Reference

Key Learnings

MCP GAS Execution Workaround

Categories

Install

Recommended Skills