Pentest Validation
When validating security findings:
1. REQUIRE explicit authorization for target URL
2. SCAN with qe-security-scanner (SAST + dependency + secrets)
3. ANALYZE with qe-security-reviewer + qe-security-auditor (parallel)
4. VALIDATE with qe-pentest-validator (graduated exploitation, parallel per vuln type)
5. REPORT only confirmed findings with PoC evidence ("No Exploit, No Report")
6. UPDATE exploit playbook with new patterns
Quality Gates:
- Authorization confirmed before ANY exploitation
- Target URL is staging/dev (NOT production)
- Budget cap enforced ($15 default)
- Time cap enforced (30 min default)
- All exploitation attempts logged
</default_to_action>
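The authorization, budget, time, and logging gates can be enforced by a small guard that is consulted before every exploitation attempt. The following is a minimal sketch. `RunGuard` is a hypothetical class, not an AQE API. It uses the $15 and 30-minute defaults from this section, and the clock is injectable only so that the guard can be tested.

```javascript
// Hypothetical guard enforcing the authorization, budget, time, and logging gates.
// RunGuard is illustrative only; it is not part of any AQE API.
class RunGuard {
  constructor({ maxCostUsd = 15, timeoutMinutes = 30, now = () => Date.now() } = {}) {
    this.maxCostUsd = maxCostUsd;
    this.timeoutMs = timeoutMinutes * 60 * 1000;
    this.now = now; // injectable clock returning milliseconds
    this.startedAt = now();
    this.spentUsd = 0;
    this.authorized = false;
    this.log = []; // every exploitation attempt is appended here
  }

  // Called once the user has explicitly confirmed authorization for the target.
  authorize() {
    this.authorized = true;
  }

  // Returns null if an attempt with this estimated cost may proceed, else the blocking reason.
  blockReason(estimatedCostUsd) {
    if (!this.authorized) return "authorization not confirmed";
    if (this.spentUsd + estimatedCostUsd > this.maxCostUsd) return "budget cap reached";
    if (this.now() - this.startedAt >= this.timeoutMs) return "time cap reached";
    return null;
  }

  // Logs an attempt (successful or not) and adds its actual cost to the running total.
  recordAttempt(findingId, tier, costUsd, outcome) {
    this.spentUsd += costUsd;
    this.log.push({ findingId, tier, costUsd, outcome, at: new Date(this.now()).toISOString() });
  }
}
```

The guard checks the budget against each attempt's *estimated* cost before the call runs. This stops the next call from overshooting the cap, instead of detecting the overrun only after the fact.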
Quick Reference Card
The 4-Phase Pipeline
| Phase | Agent(s) | Purpose | Parallelism |
|---|---|---|---|
| 1. Recon | qe-security-scanner | SAST, DAST, dependency scan, secrets | Internal parallel |
| 2. Analysis | qe-security-reviewer + qe-security-auditor | Code review + compliance check | Both in parallel |
| 3. Validation | qe-pentest-validator | Graduated exploit validation | Per-vuln-type parallel |
| 4. Report | qe-quality-gate | "No Exploit, No Report" filter | Sequential |
Graduated Exploitation Tiers
| Tier | Handler | Cost | Latency | Use When |
|---|---|---|---|---|
| 1 | Agent Booster (WASM) | $0 | <1ms | Code pattern is conclusive (`eval`, `innerHTML`, hardcoded credentials) |
| 2 | Haiku | $0.0002 | ~500ms | Need payload test against live target |
| 3 | Sonnet/Opus | $0.003-$0.015 | 2-5s | Full exploit chain with data proof |
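The tier table implies an escalation order: use the cheapest handler that can settle a finding. The following is a minimal sketch of that order. It assumes each tier is wrapped in a hypothetical async handler that returns one of the statuses from Finding Classification, and that `maxTier` comes from `exploitation_tier`.

```javascript
// Graduated escalation sketch: try the cheapest tier first and escalate only while
// the verdict stays "inconclusive" and the configured maximum tier allows it.
// handlers[1..3] are hypothetical async wrappers for Agent Booster, Haiku, Sonnet/Opus.
async function validateFinding(finding, handlers, maxTier) {
  const attempts = [];
  for (let tier = 1; tier <= maxTier; tier++) {
    const handler = handlers[tier];
    if (!handler) break; // no handler configured for this tier
    // Expected to resolve to "confirmed-exploitable", "likely-exploitable",
    // "not-exploitable", or "inconclusive".
    const verdict = await handler(finding);
    attempts.push({ tier, verdict });
    if (verdict !== "inconclusive") return { verdict, attempts };
  }
  return { verdict: "inconclusive", attempts };
}
```

Capping the loop at `maxTier` means that a run configured with `exploitation_tier: 1` never touches a live target. Findings that Tier 1 cannot settle stay inconclusive.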
When to Use This Skill
| Scenario | Tier | Estimated Cost |
|---|---|---|
| PR security review (source only) | 1 | $0 |
| Pre-release validation (staging) | 1-2 | $1-5 |
| Full pentest validation | 1-3 | $5-15 |
| Compliance audit evidence | 1-3 | $5-15 |
Configuration
```yaml
pentest:
  target_url: https://staging.app.com   # REQUIRED for Tier 2-3
  source_repo: ./src                    # REQUIRED for Tier 1+
  exploitation_tier: 2                  # 1=pattern-only, 2=payload-test, 3=full-exploit
  vuln_types:                           # Which pipelines to run
    - injection                         # SQL, NoSQL, command injection
    - xss                               # Reflected, stored, DOM XSS
    - auth                              # Auth bypass, session, JWT
    - ssrf                              # URL scheme abuse, metadata
  max_cost_usd: 15                      # Budget cap per run
  timeout_minutes: 30                   # Time cap per run
  require_authorization: true           # MUST confirm target ownership
  no_production: true                   # Block production URLs
  production_patterns:                  # URL patterns to block
    - "*.prod.*"
    - "api.*"
    - "www.*"
```
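The following sketch checks a parsed `pentest` config against the rules stated in its comments and in the Safeguards section. It assumes the YAML has already been parsed into a plain object. `validatePentestConfig` is a hypothetical helper, and its rules are one reading of the comments, not a documented schema.

```javascript
// Hypothetical validator for an already-parsed `pentest` config object.
const KNOWN_VULN_TYPES = new Set(["injection", "xss", "auth", "ssrf"]);

function validatePentestConfig(cfg) {
  const errors = [];
  const tier = cfg.exploitation_tier;
  if (![1, 2, 3].includes(tier)) errors.push("exploitation_tier must be 1, 2, or 3");
  if (!cfg.source_repo) errors.push("source_repo is required");
  // Tier 2-3 send payloads to a live target, so they need a URL.
  if (tier >= 2 && !cfg.target_url) errors.push("target_url is required for Tier 2-3");
  for (const v of cfg.vuln_types ?? []) {
    if (!KNOWN_VULN_TYPES.has(v)) errors.push(`unknown vuln_type: ${v}`);
  }
  if (!(cfg.max_cost_usd > 0)) errors.push("max_cost_usd must be positive");
  if (!(cfg.timeout_minutes > 0)) errors.push("timeout_minutes must be positive");
  // The Safeguards section makes these two mandatory, so refuse to run without them.
  if (cfg.require_authorization !== true) errors.push("require_authorization must stay true");
  if (cfg.no_production !== true) errors.push("no_production must stay true");
  return errors;
}
```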
Safeguards (Mandatory)
Authorization Gate
Every pentest validation run MUST:
- Display target URL and exploitation tier to user
- Require explicit confirmation: "I own/authorized testing of this target"
- Log authorization with timestamp
- Block if target URL matches production patterns
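The production block can be sketched by treating each `production_patterns` entry as a glob over the target's hostname, where `*` matches any run of characters. Both this matching rule and the shape of the authorization record are assumptions. `isProductionTarget` and `recordAuthorization` are hypothetical helpers.

```javascript
// Convert a hostname glob such as "*.prod.*" into an anchored, case-insensitive RegExp.
function globToRegExp(glob) {
  const escaped = glob.replace(/[.+?^${}()|[\]\\]/g, "\\$&"); // escape regex metachars except *
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
}

function isProductionTarget(targetUrl, patterns) {
  const host = new URL(targetUrl).hostname;
  return patterns.some((p) => globToRegExp(p).test(host));
}

// Refuses production-looking targets, otherwise returns a timestamped authorization record.
function recordAuthorization(targetUrl, tier, confirmedBy, patterns) {
  if (isProductionTarget(targetUrl, patterns)) {
    throw new Error(`Blocked: ${targetUrl} matches a production pattern`);
  }
  return {
    targetUrl,
    tier,
    confirmedBy,
    statement: "I own/authorized testing of this target",
    at: new Date().toISOString(),
  };
}
```

Under this glob reading, `api.staging.app.com` is also blocked because it matches `api.*`. This section does not specify how strictly the real gate matches.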
What This Skill Does NOT Do
- Full autonomous reconnaissance (Nmap, Subfinder)
- Zero-day exploit development
- Attack targets without explicit authorization
- Test production systems
- Store actual exfiltrated data (only proof of access)
- Social engineering or phishing simulation
- Port scanning or service discovery
Validation Pipelines
Injection Pipeline
| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| SQL injection | String concat in query | `' OR '1'='1` response diff | `UNION SELECT` data extraction |
| NoSQL injection | `$where`, `$gt` in query | Operator injection test | Collection enumeration |
| Command injection | `exec()`, `system()` calls | Command delimiter test | Reverse shell proof |
| LDAP injection | String concat in filter | Wildcard injection | Directory enumeration |
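Tier 1 works on source text alone. The regexes below are illustrative heuristics for the four Tier 1 columns, not the Agent Booster rule set. A hit flags a candidate line. Deciding whether the pattern is conclusive or needs a Tier 2 payload test is a separate step.

```javascript
// Illustrative Tier 1 heuristics; expect both false positives and false negatives.
const INJECTION_PATTERNS = [
  // SQL keyword followed by a quote and string concatenation with a variable
  { type: "sql", re: /\b(SELECT|INSERT|UPDATE|DELETE)\b[^;]*["'`]\s*\+\s*\w+/i },
  // MongoDB-style operators that suggest operator injection
  { type: "nosql", re: /\$(where|gt|ne|regex)\b/ },
  // Process execution calls
  { type: "command", re: /\b(exec|execSync|system|spawn)\s*\(/ },
  // LDAP filter built by concatenation, e.g. "(uid=" + user
  { type: "ldap", re: /\(\w+=["'`]?\s*\+\s*\w+/ },
];

// Returns one hit per (line, pattern) match, with 1-based line numbers.
function scanForInjection(source) {
  const hits = [];
  source.split("\n").forEach((line, i) => {
    for (const { type, re } of INJECTION_PATTERNS) {
      if (re.test(line)) hits.push({ type, line: i + 1, text: line.trim() });
    }
  });
  return hits;
}
```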
XSS Pipeline
| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Reflected XSS | No output encoding | `<img onerror>` reflection | Browser JS execution via Playwright |
| Stored XSS | `innerHTML` assignment | Payload stored + retrieved | Cookie theft PoC |
| DOM XSS | `document.write(location)` | Fragment injection | DOM manipulation proof |
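The Tier 2 reflected-XSS check sends a payload that carries a unique marker, then inspects the response. The following sketch covers only the classification step; sending the request is left out. `makeXssProbe` and `classifyReflection` are hypothetical names.

```javascript
// Build an <img onerror> probe carrying a unique marker string.
function makeXssProbe(marker) {
  return `<img src=x onerror="console.log('${marker}')">`;
}

// Classify how the probe came back in the response body.
function classifyReflection(payload, marker, body) {
  if (body.includes(payload)) return "reflected-unencoded"; // candidate for Tier 3
  if (body.includes(marker)) return "reflected-altered"; // encoded or filtered
  return "not-reflected";
}
```

Unencoded reflection does not yet prove execution: for example, a Content Security Policy can still block the handler. That gap is why the Tier 3 column runs the payload in a real browser via Playwright.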
Auth Pipeline
| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| JWT `alg: none` | No algorithm validation | Modified JWT accepted | Admin access with forged token |
| Session fixation | No session rotation | Pre-set session reused | Cross-user session hijack |
| Credential stuffing | No rate limiting | 100 attempts not blocked | Valid credential discovery |
| IDOR | No authorization check | Other users' data accessible | Full CRUD on foreign resources |
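For the JWT row, the Tier 2 test replays an existing token's claims with the algorithm switched to `none` and the signature removed. A server that accepts the result is not validating the algorithm. The following is a minimal Node.js sketch; the helper names and the `role` claim are hypothetical. Use it only against authorized staging targets.

```javascript
// base64url-encode a JSON object (RFC 7515 style: URL-safe alphabet, no padding).
function b64urlEncode(obj) {
  return Buffer.from(JSON.stringify(obj), "utf8")
    .toString("base64")
    .replace(/=+$/, "")
    .replace(/\+/g, "-")
    .replace(/\//g, "_");
}

// Node's "base64" decoder also accepts the URL-safe alphabet and missing padding.
function b64urlDecode(segment) {
  return JSON.parse(Buffer.from(segment, "base64").toString("utf8"));
}

// Re-issue a token's claims (plus overrides) under alg "none" with an empty signature.
function tamperToNone(token, overrides = {}) {
  const [, payloadSeg] = token.split(".");
  const claims = { ...b64urlDecode(payloadSeg), ...overrides };
  return `${b64urlEncode({ alg: "none", typ: "JWT" })}.${b64urlEncode(claims)}.`;
}
```

The finding is confirmed at Tier 2 when a protected endpoint accepts the forged token. It reaches Tier 3 when an elevated claim such as `role: "admin"` actually grants admin access.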
SSRF Pipeline
| Attack | Tier 1 (Pattern) | Tier 2 (Payload) | Tier 3 (Full) |
|---|---|---|---|
| Internal URL | User-controlled URL fetch | `http://169.254.169.254` | Cloud metadata extraction |
| DNS rebinding | URL validation bypass | Rebind to internal IP | Internal service access |
| Protocol smuggling | URL scheme not restricted | `file:///etc/passwd` | File content in response |
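The SSRF rows depend on where a URL actually points. The following is a simplified classifier for candidate URLs that handles IPv4 literals only; `classifySsrfTarget` is hypothetical. A real check must also resolve hostnames and re-check the IP that is actually connected to. DNS rebinding exploits exactly this: it succeeds when validation and fetch resolve the name separately.

```javascript
// Classify the destination of a candidate SSRF URL (IPv4 literals only).
function classifySsrfTarget(rawUrl) {
  const url = new URL(rawUrl); // WHATWG parser also normalizes odd IPv4 forms
  if (!["http:", "https:"].includes(url.protocol)) return "non-http-scheme"; // e.g. file:
  const parts = url.hostname.split(".").map(Number);
  if (parts.length !== 4 || parts.some((n) => !Number.isInteger(n) || n < 0 || n > 255)) {
    return url.hostname === "localhost" ? "loopback" : "hostname"; // needs DNS resolution
  }
  const [a, b] = parts;
  if (a === 169 && b === 254) return "link-local"; // includes cloud metadata 169.254.169.254
  if (a === 127) return "loopback";
  if (a === 10 || (a === 172 && b >= 16 && b <= 31) || (a === 192 && b === 168)) return "private";
  return "public";
}
```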
Agent Coordination
Orchestration Pattern
```javascript
// Phase 1: Recon (the scanner runs SAST, DAST, dependency, and secrets layers in parallel internally)
const phase1Results = await Task("Security Scan", {
  target: "./src",
  layers: { sast: true, dast: true, dependencies: true, secrets: true }
}, "qe-security-scanner");

// Phase 2: Analysis (code review and compliance audit run in parallel)
const phase2Results = (await Promise.all([
  Task("Code Security Review", {
    findings: phase1Results,
    depth: "comprehensive"
  }, "qe-security-reviewer"),
  Task("Compliance Audit", {
    findings: phase1Results,
    frameworks: ["owasp-top-10"]
  }, "qe-security-auditor")
])).flat(); // assumes each Task resolves to an array of findings

// Phase 3: Validation (graduated exploitation, pipelines parallel per vuln type)
const phase3Results = await Task("Exploit Validation", {
  findings: [...phase1Results, ...phase2Results],
  target_url: "https://staging.app.com",
  exploitation_tier: 2,
  vuln_types: ["injection", "xss", "auth", "ssrf"],
  max_cost_usd: 15,
  timeout_minutes: 30
}, "qe-pentest-validator");

// Phase 4: Report ("No Exploit, No Report" gate)
await Task("Security Quality Gate", {
  findings: phase3Results.confirmedFindings,
  gate: "no-exploit-no-report",
  require_poc: true
}, "qe-quality-gate");
```
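The Phase 3 call passes all vulnerability types to a single validator task. The "per-vuln-type parallel" behavior from the pipeline table can be pictured as a fan-out. The following sketch assumes a hypothetical async `runPipeline(type, findings)` function and a `vulnType` field on each finding. Findings of unconfigured types are skipped.

```javascript
// Group findings by configured vuln type and run one pipeline per type concurrently.
async function validateByVulnType(findings, vulnTypes, runPipeline) {
  const groups = new Map(vulnTypes.map((t) => [t, []]));
  for (const f of findings) {
    if (groups.has(f.vulnType)) groups.get(f.vulnType).push(f); // unconfigured types are dropped
  }
  return Promise.all(
    [...groups].map(async ([type, group]) => ({
      type,
      results: group.length ? await runPipeline(type, group) : [], // skip empty pipelines
    }))
  );
}
```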
Finding Classification
| Status | Meaning | Action |
|---|---|---|
| confirmed-exploitable | Exploitation succeeded with PoC | Report with evidence |
| likely-exploitable | Partial exploitation, defenses detected | Report with caveats |
| not-exploitable | All exploitation attempts failed | Filter from report |
| inconclusive | WAF/defense blocked, unclear if vulnerable | Report for manual review |
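The four statuses and their actions can be sketched as a classifier plus a report filter. The attempt fields (`succeeded`, `poc`, `partial`, `blocked`) and the precedence order are assumptions, not the validator's actual output format.

```javascript
// Map a finding's exploitation attempts to one of the four statuses.
function classifyFinding(attempts) {
  if (attempts.length === 0) return "inconclusive"; // nothing was tried
  if (attempts.some((a) => a.succeeded && a.poc)) return "confirmed-exploitable";
  if (attempts.some((a) => a.partial)) return "likely-exploitable";
  if (attempts.some((a) => a.blocked)) return "inconclusive";
  return "not-exploitable"; // every attempt failed outright
}

const ACTIONS = {
  "confirmed-exploitable": "report-with-evidence",
  "likely-exploitable": "report-with-caveats",
  "inconclusive": "manual-review",
  "not-exploitable": "filter",
};

// "No Exploit, No Report": drop not-exploitable findings, annotate the rest.
function buildReport(findings) {
  return findings
    .map((f) => ({ ...f, status: classifyFinding(f.attempts) }))
    .filter((f) => ACTIONS[f.status] !== "filter")
    .map((f) => ({ ...f, action: ACTIONS[f.status] }));
}
```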
Exploit Playbook Memory
Namespace Structure
```
aqe/pentest/
  playbook/
    exploit/{vuln_type}/{tech_stack}/{technique}
    bypass/{defense_type}/{technique}
    payload/{vuln_type}/{variant}
  results/
    validation-{timestamp}
  poc/
    {finding_id}-poc
```
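The following hypothetical helpers build keys that follow the namespace layout. The `/` separator matches the tree. Normalizing segments to lowercase and hyphens is an assumption, made so that "Node Express" and "node-express" land on the same key.

```javascript
const ROOT = "aqe/pentest";
// Normalize a path segment: lowercase, runs of unsafe characters become "-".
const seg = (s) => String(s).trim().toLowerCase().replace(/[^a-z0-9._-]+/g, "-");

const keys = {
  exploit: (vulnType, techStack, technique) =>
    `${ROOT}/playbook/exploit/${seg(vulnType)}/${seg(techStack)}/${seg(technique)}`,
  bypass: (defenseType, technique) => `${ROOT}/playbook/bypass/${seg(defenseType)}/${seg(technique)}`,
  payload: (vulnType, variant) => `${ROOT}/playbook/payload/${seg(vulnType)}/${seg(variant)}`,
  result: (timestamp) => `${ROOT}/results/validation-${timestamp}`,
  poc: (findingId) => `${ROOT}/poc/${findingId}-poc`,
};
```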
Learning Loop
- Before validation: Query playbook for known patterns matching findings
- During validation: Try known payloads first (higher success rate)
- After validation: Store new successful patterns with confidence scores
- Over time: Agent converges on most effective payloads per tech stack
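"Store new successful patterns with confidence scores" leaves the scoring rule open. One simple choice, assumed here, is Laplace smoothing: confidence = (successes + 1) / (attempts + 2). Untried patterns start at 0.5, and the score moves toward the observed success rate as attempts accumulate. The helpers are hypothetical and are not ReasoningBank's API.

```javascript
// Record one outcome for a playbook pattern and recompute its smoothed confidence.
function updatePattern(pattern, succeeded) {
  const successes = pattern.successes + (succeeded ? 1 : 0);
  const attempts = pattern.attempts + 1;
  return { ...pattern, successes, attempts, confidence: (successes + 1) / (attempts + 2) };
}

// "Try known payloads first": order candidate patterns by confidence, highest first.
function orderByConfidence(patterns) {
  return [...patterns].sort((a, b) => b.confidence - a.confidence);
}
```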
Cost Optimization
Estimated Cost by Scenario
| Scenario | Tier Mix | Findings | Est. Cost | Est. Time |
|---|---|---|---|---|
| PR check (source only) | 100% Tier 1 | 5 | $0 | <5s |
| Sprint validation | 70% T1, 30% T2 | 15 | $2-5 | 5-10 min |
| Release validation | 40% T1, 40% T2, 20% T3 | 25 | $8-15 | 15-30 min |
| Full pentest | 20% T1, 30% T2, 50% T3 | 40 | $15-30 | 30-60 min |

The full-pentest range exceeds the default $15 budget cap, so that scenario requires raising `max_cost_usd`.
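The Tier Mix and Findings columns translate into per-tier counts. The following sketch shows that arithmetic. It assumes integer percentages that sum to 100 and uses largest-remainder rounding so that the counts always add up to the total; `allocateByTierMix` is hypothetical.

```javascript
// Split a finding count across tiers by integer percentages (must sum to 100).
function allocateByTierMix(totalFindings, percents) {
  const exact = percents.map((p) => (totalFindings * p) / 100);
  const counts = exact.map((x) => Math.floor(x));
  const leftover = totalFindings - counts.reduce((s, c) => s + c, 0);
  // Hand the remaining findings to the tiers with the largest fractional parts.
  const order = exact
    .map((x, i) => ({ i, frac: x - Math.floor(x) }))
    .sort((a, b) => b.frac - a.frac);
  for (let k = 0; k < leftover; k++) counts[order[k].i] += 1;
  return counts;
}
```

For the release-validation row, 25 findings at 40/40/20 split into 10, 10, and 5 findings for Tiers 1, 2, and 3.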
Cost vs Shannon Comparison
| Metric | Shannon | AQE Pentest Validation |
|---|---|---|
| Cost per run | ~$50 | $5-15 (graduated tiers) |
| Runtime | 60-90 min | 15-30 min (parallel pipelines) |
| False positive rate | Low (exploit-proven) | Low (same principle) |
| Learning | None (static prompts) | ReasoningBank playbook |
Success Metrics
| Metric | Target | Measurement |
|---|---|---|
| False positive reduction | >60% of findings eliminated | Pre/post validator comparison |
| Exploit confirmation rate | >80% of confirmed findings truly exploitable | Manual PoC verification |
| Cost per run | <$15 USD | Token tracking per pipeline |
| Time per run | <30 minutes | Execution time metrics |
| Playbook growth | 100+ patterns after 6 months | Memory namespace count |
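The first two metrics are simple ratios. For example, 25 findings before validation and 8 after gives a reduction of 17/25 = 68%, which is above the 60% target. The following sketch computes both ratios; the input names are hypothetical.

```javascript
// Compute the two ratio metrics from run counts.
function successMetrics({ findingsBeforeValidation, reportedFindings, confirmedReported, manuallyVerified }) {
  const eliminated = findingsBeforeValidation - reportedFindings;
  return {
    falsePositiveReduction: eliminated / findingsBeforeValidation, // target > 0.60
    exploitConfirmationRate: manuallyVerified / confirmedReported, // target > 0.80
  };
}
```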
Remember
"No Exploit, No Report." A vulnerability scanner that can't prove exploitation delivers uncertain value. This skill turns security findings from theoretical risks into proven vulnerabilities with evidence. Every confirmed finding comes with a reproducible proof-of-concept. Findings that cannot be exploited are filtered out before they reach the report.
Think proof, not prediction. Don't report what MIGHT be vulnerable. Prove what IS vulnerable.