Resources
1Install
npx skillscat add olino3/forge/database-optimizer Install via the SkillsCat registry.
Purpose
[TODO: Add purpose description]
MANDATORY WORKFLOW (MUST FOLLOW EXACTLY)
⚠️ STEP 1: Identify Optimization Target (REQUIRED)
YOU MUST:
- Ask the user about the optimization scope:
- Database system and version (PostgreSQL, MySQL, SQL Server, Oracle, SQLite)
- Specific slow queries or general workload tuning?
- Access to EXPLAIN/EXPLAIN ANALYZE output?
- Access to slow query logs or performance monitoring data?
- Production vs staging environment?
- Gather query context:
- The SQL queries to optimize (exact text)
- Table sizes and row counts for involved tables
- Current index definitions on involved tables
- Expected vs actual execution time
- Frequency of execution (queries/second)
- Identify constraints:
- Acceptable downtime for index creation?
- Read-heavy vs write-heavy workload?
- Disk space budget for new indexes?
- Locking tolerance during optimization?
DO NOT PROCEED WITHOUT IDENTIFYING TARGET
⚠️ STEP 2: Load Project Memory & Context (REQUIRED)
YOU MUST:
CHECK PROJECT MEMORY FIRST:
- Identify the project name from the repository root or ask the user
- Use
memoryStore.getSkillMemory("database-optimizer", "{project-name}")to load existing project memory. See MemoryStore Interface. - If memory exists, review previously optimized queries, index changes, and project-specific tuning history
- If no memory exists, you will create it later in this process
USE CONTEXT INDEXES FOR EFFICIENT LOADING:
- Use
contextProvider.getDomainIndex("database")to discover available database context files. See ContextProvider Interface. - Use
contextProvider.getAlwaysLoadFiles("database")to load foundational optimization concepts - Use
contextProvider.getConditionalContext("database", detection)to load engine-specific tuning guides - If analyzing application-level patterns, use
contextProvider.getCrossDomainContext("database", {"application": true})for ORM context
- Use
Ask clarifying questions in Socratic format:
- What is the primary performance goal (latency, throughput, resource reduction)?
- Is this a one-time slow query fix or ongoing workload tuning?
- Are there SLA targets or latency budgets for the queries?
- What monitoring tools are in place (pg_stat_statements, Performance Schema, DMVs)?
- Has anything changed recently (data volume growth, schema changes, traffic patterns)?
DO NOT PROCEED WITHOUT COMPLETING THIS STEP
⚠️ STEP 3: Extract Performance Data (REQUIRED)
YOU MUST:
Collect execution plans using the appropriate method:
For PostgreSQL:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) <query>; SELECT * FROM pg_stat_user_indexes WHERE idx_scan = 0; SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 20;For MySQL:
EXPLAIN FORMAT=JSON <query>; SHOW STATUS LIKE 'Handler_%'; SELECT * FROM sys.statements_with_full_table_scans;For SQL Server:
SET STATISTICS IO ON; SET STATISTICS TIME ON; -- Execute query SELECT * FROM sys.dm_exec_query_stats ORDER BY total_worker_time DESC; SELECT * FROM sys.dm_db_missing_index_details;Collect index metadata:
- All indexes on tables involved in the query
- Index usage statistics (scans, reads, writes)
- Index sizes and fragmentation levels
- Unused and duplicate index candidates
Collect table statistics:
- Row counts and table sizes
- Column cardinality and data distribution
- Statistics freshness (last ANALYZE/UPDATE STATISTICS)
- Partitioning scheme if applicable
DO NOT PROCEED WITHOUT EXTRACTING PERFORMANCE DATA
⚠️ STEP 4: Deep Optimization Analysis (REQUIRED)
YOU MUST perform deep analysis covering ALL these aspects:
4.1 Execution Plan Interpretation
- Identify expensive operators: Sequential scans, nested loops on large sets, hash joins with spills
- Check row estimate accuracy: Compare estimated vs actual rows at each operator
- Evaluate operator ordering: Is the planner choosing the optimal join order?
- Detect plan regressions: Has the plan changed from a known-good state?
- Assess parallelism: Is parallel execution used where beneficial?
4.2 Index Optimization
- Missing index identification: Columns in WHERE, JOIN, ORDER BY, GROUP BY without supporting indexes
- Covering index opportunities: Include columns to enable index-only scans
- Composite index column ordering: Selectivity-based ordering, equality before range
- Partial index candidates: Filtered indexes for skewed data distributions
- Redundant index detection: Indexes that are prefixes of other indexes or never used
- Index type selection: B-tree vs hash vs GIN vs GiST based on operator usage
4.3 Query Rewriting
- Subquery to JOIN conversion: Correlated subqueries that can be flattened
- EXISTS vs IN vs JOIN: Choose the optimal semi-join strategy
- CTE materialization: Force or prevent CTE materialization (PostgreSQL 12+)
- UNION vs UNION ALL: Eliminate unnecessary deduplication
- Predicate pushdown: Move filters closer to base tables
- Pagination optimization: Keyset pagination vs OFFSET for large result sets
- Selective column retrieval: Replace SELECT * with explicit column lists
4.4 Statistics and Planner Accuracy
- Stale statistics: Tables with outdated statistics causing plan misestimates
- Custom statistics: Multi-column statistics for correlated columns
- Histogram boundaries: Distribution skew not captured by default histograms
- Planner hints: When to guide the optimizer (pg_hint_plan, USE INDEX, query hints)
4.5 Resource and Configuration Tuning
- Memory allocation: work_mem, sort_buffer_size, memory grants
- I/O patterns: Sequential vs random read costs, effective_cache_size
- Temp space usage: Queries spilling to disk, temp tablespace sizing
- Connection pooling: Pool size relative to active query concurrency
- Lock contention: Queries blocked by long-running transactions
4.6 Workload-Level Analysis
- Hot query identification: Top queries by total time, frequency, and I/O
- Read/write contention: Write-heavy tables with too many indexes
- Peak-hour patterns: Queries that degrade during high traffic
- Batch vs OLTP optimization: Different tuning strategies per workload type
DO NOT PROCEED WITHOUT COMPREHENSIVE ANALYSIS
⚠️ STEP 5: Generate Optimization Report & Update Memory (REQUIRED)
YOU MUST:
Generate optimization report including:
- Executive summary with estimated performance improvement
- Original query and execution plan
- Optimized query (if rewritten) with new execution plan
- Index recommendations (CREATE/DROP statements)
- Configuration changes (with before/after values)
- Risk assessment for each recommendation
Prioritize recommendations using impact/effort matrix:
- Quick wins: Index additions, statistics refresh (minutes)
- Medium effort: Query rewrites, configuration changes (hours)
- Long-term: Schema changes, partitioning, materialized views (days/weeks)
Provide implementation scripts:
- Index creation DDL with CONCURRENTLY option where supported
- Query rewrites with performance comparison
- Configuration change commands
- Rollback scripts for each change
UPDATE PROJECT MEMORY:
- Use
memoryStore.update(layer="skill-specific", skill="database-optimizer", project="{project-name}", ...)to store: - Optimized queries and their before/after metrics
- Index changes applied and their measured impact
- Configuration tuning history
- Workload patterns and baselines
- Timestamps and staleness tracking are handled automatically by MemoryStore. See MemoryStore Interface.
- Use
Define validation criteria:
- Expected execution time improvement per query
- Metrics to monitor post-implementation
- Regression detection approach
- Follow-up analysis schedule
MEMORY UPDATE IS MANDATORY - DO NOT SKIP
Step 6: Generate Output
Create deliverables and save to /claudedocs/:
- Follow OUTPUT_CONVENTIONS.md naming:
database-optimizer_{project}_{YYYY-MM-DD}.md - Include all required sections
- Provide clear, actionable recommendations
Output Requirements
Optimization Report Must Include:
Performance Summary
- Current vs target execution time
- Queries analyzed and optimized
- Overall improvement estimate
Execution Plan Analysis
- Annotated EXPLAIN output
- Bottleneck identification
- Operator cost breakdown
Index Recommendations
- New indexes with DDL statements
- Indexes to drop (unused/redundant)
- Estimated space impact
- Creation strategy (online vs offline)
Query Rewrites
- Original and optimized SQL side by side
- Explanation of each transformation
- Expected plan change
Configuration Changes
- Parameter name, current value, recommended value
- Scope (session, database, global)
- Restart requirements
Implementation Plan
- Ordered steps with rollback procedures
- Timing recommendations (off-peak)
- Monitoring checkpoints
Risk Assessment
- Impact on write performance (new indexes)
- Regression potential
- Lock/blocking duration estimates
Socratic Prompting Guidelines
When interacting with users, ask clarifying questions such as:
Understanding the Problem:
- "What is the current execution time and what is your target?"
- "Is this query run interactively or as a batch process?"
- "How many times per second/minute is this query executed?"
Scope Definition:
- "Should I focus on a single query or analyze the full workload?"
- "Do you have EXPLAIN ANALYZE output, or should I suggest how to collect it?"
- "Are there write-performance concerns alongside read optimization?"
Context Understanding:
- "What is the table size and how fast is it growing?"
- "Have you recently changed schema, added data, or upgraded the database?"
- "Are there existing monitoring dashboards or query performance baselines?"
Technical Details:
- "Do you use an ORM, and can queries be modified at the SQL level?"
- "Is online (non-blocking) index creation acceptable, or is downtime available?"
- "Are there partitioning or sharding strategies already in place?"
Quality Standards
Your analysis MUST:
- ✅ Base recommendations on actual execution plan data, not guesswork
- ✅ Provide exact DDL statements for every index recommendation
- ✅ Show before/after comparisons for every query rewrite
- ✅ Consider write performance impact of new indexes
- ✅ Include rollback procedures for every change
- ✅ Validate statistics freshness before blaming the planner
- ✅ Use database-specific syntax matching the target engine
- ✅ Update project memory for future reference
Your analysis MUST NOT:
- ❌ Recommend indexes without analyzing the execution plan
- ❌ Ignore write-side impact when adding indexes to write-heavy tables
- ❌ Suggest configuration changes without understanding the workload
- ❌ Provide generic advice not grounded in the specific query and data
- ❌ Overlook locking implications of index creation or schema changes
Integration with Other Skills
Combine with:
database-schema-analysis: For full schema understanding before optimizationpython-code-review: When optimizing SQLAlchemy or Django ORM queriesdotnet-code-review: When optimizing Entity Framework or Dapper queriesfile-schema-analysis: For API query patterns that drive database load
Supported Database Systems
Fully Supported
- ✅ PostgreSQL (all versions, including EXPLAIN ANALYZE, pg_stat_statements)
- ✅ MySQL / MariaDB (EXPLAIN FORMAT=JSON, Performance Schema, sys schema)
- ✅ Microsoft SQL Server (execution plans, DMVs, Query Store)
- ✅ Oracle Database (EXPLAIN PLAN, AWR, ASH, SQL Tuning Advisor)
- ✅ SQLite (EXPLAIN QUERY PLAN, ANALYZE)
Partial Support
- ⚠️ Amazon Aurora (PostgreSQL/MySQL compatible, Aurora-specific metrics)
- ⚠️ Google Cloud SQL (engine-specific with cloud console metrics)
- ⚠️ Azure SQL Database (SQL Server compatible with Azure DMVs)
- ⚠️ CockroachDB (PostgreSQL-compatible with distributed plan analysis)
Version History
- v1.0.0 (2025-02-06): Initial release
- Execution plan analysis workflow
- Index optimization recommendations
- Query rewriting guidance
- Configuration tuning support
- Project memory integration
- Interface-based patterns (ContextProvider + MemoryStore)
Last Updated: 2025-02-06
Maintained by: The Forge