Advanced Quant Trading Platform - Orchestration Guide

maminul007 1 Updated 5mo ago

Install

npx skillscat add maminul007/trading-platform

Install via the SkillsCat registry.

Quick Reference

| Workflow | Command | Success Criteria |
| --- | --- | --- |
| Deploy Strategy | `python scripts/operations/pre_deploy_checklist.py --env production` | All 50 checks pass |
| Kill Switch Test | `python scripts/operations/circuit_breaker_test.py --timing-only` | L1 < 1ms, L2-L4 < 10ms |
| Chaos Test | `python scripts/operations/chaos_engineering.py redis-fail --duration 10 --env staging` | Auto-recovery confirmed |
| Surveillance | `python scripts/compliance/trade_surveillance.py --dry-run` | No false positives |
| Incident Response | See 01-incident-response.md | P1 acknowledged < 2min |

Core Workflows

1. Alpha Research to Production Pipeline

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│   Research  │───▶│  Backtest   │───▶│   Paper     │───▶│   Shadow    │───▶│    Live     │
│   (Idea)    │    │ Validation  │    │  Trading    │    │  Trading    │    │  Trading    │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
     Gate 1            Gate 2            Gate 3            Gate 4            Gate 5

Stage 1: Research (Gate 1)

Entry: Strategy hypothesis documented
Validation:
- Theoretical Sharpe > 1.5
- Capacity estimation > $100K
- Data requirements identified
Exit: Research approved for backtest
Command: python scripts/run_backtest.py --strategy <name> --mode research

Stage 2: Backtest Validation (Gate 2)

Entry: Research gate passed
Validation:
- Backtest Sharpe > 1.2 (after costs)
- Max drawdown < 15%
- Win rate > 45%
- Profit factor > 1.3
Exit: Strategy approved for paper trading
Command: python scripts/run_backtest.py --strategy <name> --validate
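
The Gate 2 thresholds can be checked mechanically once a backtest has produced returns. The following is a minimal sketch, not the logic of `scripts/run_backtest.py`: the function names are hypothetical, and it assumes net-of-cost daily returns, per-trade P&L values, a zero risk-free rate, and 252 trading days per year.

```python
import math
from statistics import mean, stdev

def sharpe_ratio(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio (risk-free rate assumed to be zero)."""
    sd = stdev(daily_returns)
    if sd == 0:
        return 0.0
    return mean(daily_returns) / sd * math.sqrt(periods_per_year)

def max_drawdown(daily_returns):
    """Largest peak-to-trough decline of the compounded equity curve, as a fraction."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

def win_rate(trade_pnls):
    """Fraction of trades with positive P&L."""
    return sum(1 for p in trade_pnls if p > 0) / len(trade_pnls)

def profit_factor(trade_pnls):
    """Gross profit divided by gross loss."""
    gains = sum(p for p in trade_pnls if p > 0)
    losses = -sum(p for p in trade_pnls if p < 0)
    return math.inf if losses == 0 else gains / losses

def passes_gate_2(daily_returns, trade_pnls):
    """Apply the four Gate 2 thresholds; returns (overall, per-check results)."""
    checks = {
        "sharpe": sharpe_ratio(daily_returns) > 1.2,
        "max_drawdown": max_drawdown(daily_returns) < 0.15,
        "win_rate": win_rate(trade_pnls) > 0.45,
        "profit_factor": profit_factor(trade_pnls) > 1.3,
    }
    return all(checks.values()), checks
```

Returning the per-check dictionary alongside the overall result makes it easy to report which threshold blocked a strategy.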

Stage 3: Paper Trading (Gate 3)

Entry: Backtest gate passed
Duration: Minimum 2 weeks
Validation:
- Live Sharpe within 80% of backtest
- Execution slippage < 5bps
- Fill rate > 95%
Exit: Strategy approved for shadow trading
Runbook: docs/runbooks/04-deployment.md
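
As an illustration of the Gate 3 criteria, the sketch below computes signed slippage in basis points and applies the three thresholds. The helper names are hypothetical, and it takes the Sharpe criterion to mean that the paper-trading Sharpe must reach at least 80% of the backtest Sharpe, which is one reading of the rule above.

```python
def slippage_bps(side, reference_price, fill_price):
    """Signed slippage in basis points; positive means a worse fill than the reference."""
    sign = 1 if side == "buy" else -1
    return sign * (fill_price - reference_price) / reference_price * 1e4

def passes_gate_3(paper_sharpe, backtest_sharpe, avg_slippage_bps, fill_rate):
    """Check the Gate 3 thresholds (fill_rate is a fraction, e.g. 0.97)."""
    return (
        paper_sharpe >= 0.8 * backtest_sharpe  # assumed reading of "within 80%"
        and avg_slippage_bps < 5
        and fill_rate > 0.95
    )
```

For example, a buy filled at 100.03 against a reference price of 100.00 costs roughly 3 bps, which is inside the 5 bps limit.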

Stage 4: Shadow Trading (Gate 4)

Entry: Paper trading gate passed
Duration: Minimum 1 week
Validation:
- Tracking error < 2%
- No adverse selection detected
- Risk metrics within limits
Exit: Strategy approved for live trading
Risk Limits: See services/risk/risk_manager.py:58-83

Stage 5: Live Trading (Gate 5)

Entry: All previous gates passed + pre-deploy checklist
Pre-Deploy: python scripts/operations/pre_deploy_checklist.py --env production
Monitoring: Continuous via Grafana dashboards
Runbook: docs/runbooks/01-incident-response.md

2. ML Model Lifecycle

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│    Train    │───▶│  Validate   │───▶│   Deploy    │───▶│   Monitor   │
│             │    │  (Offline)  │    │  (Canary)   │    │  (Drift)    │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘
                         │                                      │
                         │                                      │
                         └──────────────── Retrain ◀────────────┘

Training

Command: ./scripts/auto_train.sh
Data: Minimum 6 months historical data
Validation Split: 70/15/15 (train/val/test)
Metrics: Track loss, accuracy, feature importance
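
For time-series data the 70/15/15 split is usually taken in chronological order so that validation and test data come strictly after the training data. A minimal sketch under that assumption (the function name is hypothetical, and `./scripts/auto_train.sh` may split differently):

```python
def chronological_split(rows, train=0.70, val=0.15):
    """Split time-ordered rows into train/val/test without shuffling."""
    n = len(rows)
    train_end = round(n * train)
    val_end = train_end + round(n * val)
    return rows[:train_end], rows[train_end:val_end], rows[val_end:]
```

Because nothing is shuffled, every validation row is later than every training row, which avoids look-ahead leakage.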

Validation (Offline)

Out-of-sample testing: Last 3 months
Walk-forward analysis: Monthly windows
Criteria:
- IC > 0.03
- IC decay < 20% over 5 days
- Feature stability > 0.8

Deployment (Canary)

Initial allocation: 10% of signals
Ramp schedule: 10% → 25% → 50% → 100%
Rollback trigger: Sharpe < 0.5 over 3 days
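
The ramp schedule and rollback trigger amount to a small state machine. The sketch below assumes that a rollback drops the allocation to zero and that the ramp advances one step per evaluation; neither detail is stated in the source, and the names are hypothetical.

```python
RAMP_SCHEDULE = [0.10, 0.25, 0.50, 1.00]  # fraction of signals routed to the new model
ROLLBACK_SHARPE = 0.5                     # 3-day rolling Sharpe floor

def next_allocation(current, rolling_sharpe_3d):
    """Return the next allocation, or 0.0 to roll the canary back."""
    if rolling_sharpe_3d < ROLLBACK_SHARPE:
        return 0.0
    idx = RAMP_SCHEDULE.index(current)
    return RAMP_SCHEDULE[min(idx + 1, len(RAMP_SCHEDULE) - 1)]
```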

Monitoring (Drift Detection)

Feature drift: PSI > 0.2 triggers alert
Prediction drift: KL divergence > 0.1
Performance decay: Rolling Sharpe < min threshold
Action: Auto-disable model if thresholds breached
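
Both drift statistics can be computed from binned counts. The sketch below is one possible implementation with hypothetical function names: it assumes the reference (training) and live distributions share the same bins, and it floors empty bins at a small `eps` to keep the logarithms finite. The KL direction (live relative to reference) is also an assumption.

```python
import math

def _proportions(counts, eps):
    """Convert bin counts to proportions, flooring empty bins at eps."""
    total = sum(counts)
    return [max(c / total, eps) for c in counts]

def psi(reference_counts, live_counts, eps=1e-6):
    """Population Stability Index: sum of (live - ref) * ln(live / ref) over bins."""
    ref = _proportions(reference_counts, eps)
    live = _proportions(live_counts, eps)
    return sum((l - r) * math.log(l / r) for r, l in zip(ref, live))

def kl_divergence(live_counts, reference_counts, eps=1e-6):
    """KL(live || reference) over shared bins."""
    live = _proportions(live_counts, eps)
    ref = _proportions(reference_counts, eps)
    return sum(l * math.log(l / r) for l, r in zip(live, ref))

def drift_alerts(feature_ref, feature_live, pred_ref, pred_live):
    """Apply the PSI > 0.2 and KL > 0.1 thresholds listed above."""
    return {
        "feature_drift": psi(feature_ref, feature_live) > 0.2,
        "prediction_drift": kl_divergence(pred_live, pred_ref) > 0.1,
    }
```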

3. RL Agent Development (FinRL)

┌─────────────┐    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
│  Env Setup  │───▶│   Train     │───▶│  Constrain  │───▶│   Deploy    │
│  (FinRL)    │    │   Agent     │    │  (Safety)   │    │  (Sandbox)  │
└─────────────┘    └─────────────┘    └─────────────┘    └─────────────┘

FinRL-Specific Controls (from `services/risk/risk_manager.py:70-83`)

| Control | Threshold | Description |
| --- | --- | --- |
| `max_trades_per_day` | 50 | Prevents overtrading |
| `cooldown_seconds` | 60 | Global cooldown between trades |
| `symbol_cooldown_seconds` | 300 | Per-symbol cooldown (5 min) |
| `min_sharpe_ratio` | -1.0 | Blocks trading if rolling Sharpe falls below this value |
| `min_total_return_pct` | -10.0 | Blocks trading if total return (%) falls below this value |
| `max_consecutive_wins` | 10 | Triggers greed cooldown |
| `greed_cooldown_seconds` | 600 | Cooldown after win streak (10 min) |
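
To show how several of these controls interact, here is a minimal sketch of a pre-trade gate. It is not the code in `services/risk/risk_manager.py`; the class, its state fields, and the reason strings are hypothetical, and only the trade-count, cooldown, and Sharpe controls are modeled.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

@dataclass
class TradeGate:
    # Thresholds copied from the controls table.
    max_trades_per_day: int = 50
    cooldown_seconds: float = 60.0
    symbol_cooldown_seconds: float = 300.0
    min_sharpe_ratio: float = -1.0
    # Bookkeeping state (hypothetical; real tracking may differ).
    trades_today: int = 0
    last_trade_ts: Optional[float] = None
    last_symbol_ts: Dict[str, float] = field(default_factory=dict)

    def allow(self, symbol: str, now: float, rolling_sharpe: float = 0.0) -> Tuple[bool, str]:
        """Return (allowed, reason) for a proposed trade at time `now` (seconds)."""
        if rolling_sharpe < self.min_sharpe_ratio:
            return False, "rolling Sharpe below minimum"
        if self.trades_today >= self.max_trades_per_day:
            return False, "daily trade limit reached"
        if self.last_trade_ts is not None and now - self.last_trade_ts < self.cooldown_seconds:
            return False, "global cooldown"
        last = self.last_symbol_ts.get(symbol)
        if last is not None and now - last < self.symbol_cooldown_seconds:
            return False, "symbol cooldown"
        return True, "ok"

    def record(self, symbol: str, now: float) -> None:
        """Update state after a trade has been sent."""
        self.trades_today += 1
        self.last_trade_ts = now
        self.last_symbol_ts[symbol] = now
```

Checking the global cooldown before the per-symbol one means the shorter, broader limit is reported first when both apply.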

Safety Constraints

Action space clipping: Limit position changes to 10% per step
Reward shaping: Include risk-adjusted rewards
Episode termination: End on drawdown > 5%
Ensemble: Use multiple agents with voting
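
Two of these constraints are simple enough to sketch directly. The code below assumes positions are expressed as portfolio weights, so "10% per step" means at most 0.10 of weight change, and it measures drawdown from the running peak of the episode's equity curve. Function names are hypothetical.

```python
def clip_position_change(current_weight, target_weight, max_step=0.10):
    """Move toward the target weight by at most max_step per step."""
    delta = target_weight - current_weight
    delta = max(-max_step, min(max_step, delta))
    return current_weight + delta

def should_terminate(equity_curve, max_drawdown=0.05):
    """True once the drawdown from the running peak exceeds max_drawdown."""
    peak = equity_curve[0]
    for value in equity_curve:
        peak = max(peak, value)
        if (peak - value) / peak > max_drawdown:
            return True
    return False
```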

4. Risk Event Response

┌──────────────────────────────────────────────────────────────────┐
│                      Risk Event Detected                          │
└───────────────────────────┬──────────────────────────────────────┘
                            │
              ┌─────────────┼─────────────┐
              ▼             ▼             ▼
        ┌─────────┐   ┌─────────┐   ┌─────────┐
        │   P1    │   │   P2    │   │  P3/P4  │
        │Critical │   │  High   │   │Med/Low  │
        └────┬────┘   └────┬────┘   └────┬────┘
             │             │             │
             ▼             ▼             ▼
      Kill Switch     Investigate    Log/Monitor
        + Page          + Alert        + Track

Severity Matrix (from docs/runbooks/01-incident-response.md)

| Severity | Response Time | Escalation | Examples |
| --- | --- | --- | --- |
| P1 | Immediate | Page on-call + lead | Kill switch triggered, system down |
| P2 | < 15 min | Page on-call | Single exchange down, high latency |
| P3 | < 1 hour | Slack alert | Elevated error rates |
| P4 | Next day | Email | Minor issues |

Automatic Kill Switch Triggers (from docs/runbooks/02-kill-switch-operations.md)

| Trigger | Threshold | Level |
| --- | --- | --- |
| Daily loss | > $50,000 | L1 Global |
| Drawdown | > 7.5% | L1 Global |
| Error rate | > 10% for 60s | L1 Global |
| Consecutive losses | > 5 trades | L3 Strategy |
| Position limit | > $100,000/symbol | L4 Symbol |
| Order rate spike | > 10x normal | L1 Global |
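
The trigger thresholds can be expressed as a single evaluation function. This is a hedged sketch rather than the platform's kill switch logic: the metric keys, the shape of the input dictionary, and the returned labels are all hypothetical.

```python
def evaluate_triggers(metrics, normal_order_rate):
    """Return a list of (trigger, level) pairs for every breached threshold."""
    fired = []
    if metrics["daily_loss_usd"] > 50_000:
        fired.append(("daily_loss", "L1"))
    if metrics["drawdown_pct"] > 7.5:
        fired.append(("drawdown", "L1"))
    # Error rate must stay above 10% for a sustained 60 seconds.
    if metrics["error_rate_pct"] > 10 and metrics["error_rate_breach_s"] >= 60:
        fired.append(("error_rate", "L1"))
    if metrics["order_rate"] > 10 * normal_order_rate:
        fired.append(("order_rate_spike", "L1"))
    for strategy, losses in metrics["consecutive_losses"].items():
        if losses > 5:
            fired.append((f"consecutive_losses:{strategy}", "L3"))
    for symbol, notional in metrics["position_usd"].items():
        if notional > 100_000:
            fired.append((f"position_limit:{symbol}", "L4"))
    return fired
```

Keeping the level next to each trigger lets the caller halt only the affected strategy or symbol when no L1 trigger has fired.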

5. Execution Optimization

┌─────────────────────────────────────────────────────────────────┐
│                    Execution Quality Loop                        │
│                                                                  │
│   ┌──────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐ │
│   │ Measure  │───▶│ Analyze  │───▶│ Optimize │───▶│ Validate │ │
│   │ Metrics  │    │ Causes   │    │ Params   │    │ Impact   │ │
│   └──────────┘    └──────────┘    └──────────┘    └──────────┘ │
│        ▲                                               │        │
│        └───────────────────────────────────────────────┘        │
└─────────────────────────────────────────────────────────────────┘

Key Metrics

| Metric | Target | Alert Threshold |
| --- | --- | --- |
| Order latency | < 10ms | > 50ms |
| Slippage | < 2bps | > 5bps |
| Fill rate | > 98% | < 95% |
| Cancel rate | < 5% | > 15% |
| Round-trip latency | < 100μs | > 500μs |
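
One way to use the table is to classify each measurement as on target, in a warning band, or alerting. The "warn" band between target and alert threshold is an assumption of this sketch, as are the metric keys and the function name.

```python
# (target, alert threshold, direction in which the metric is better)
METRICS = {
    "order_latency_ms": (10, 50, "lower"),
    "slippage_bps": (2, 5, "lower"),
    "fill_rate_pct": (98, 95, "higher"),
    "cancel_rate_pct": (5, 15, "lower"),
    "round_trip_latency_us": (100, 500, "lower"),
}

def classify(name, value):
    """Return 'ok', 'warn', or 'alert' for a measured value."""
    target, alert, better = METRICS[name]
    if better == "lower":
        if value < target:
            return "ok"
        return "alert" if value > alert else "warn"
    if value > target:
        return "ok"
    return "alert" if value < alert else "warn"
```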

Optimization Parameters

Order slicing: TWAP, VWAP, Implementation Shortfall
Timing: Market microstructure analysis
Venue selection: Smart order routing
Queue position: Limit order placement
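
As a concrete example of the simplest of these algorithms, TWAP splits a parent order into near-equal child orders spread evenly over time. A minimal sketch with a hypothetical function name, assuming integer quantities:

```python
def twap_slices(total_qty, n_slices):
    """Split an integer parent quantity into n near-equal child quantities."""
    base, remainder = divmod(total_qty, n_slices)
    # Hand the leftover units to the earliest slices so the total is preserved.
    return [base + (1 if i < remainder else 0) for i in range(n_slices)]
```

VWAP follows the same pattern but weights each slice by the expected volume in its interval rather than splitting evenly.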

Decision Trees

Decision Tree 1: Strategy Underperforming

Strategy Underperforming?
         │
         ▼
┌────────────────────┐
│ Check Regime Match │
└─────────┬──────────┘
          │
    ┌─────┴─────┐
    ▼           ▼
 Regime      Regime
 Changed?    Same
    │           │
    ▼           ▼
┌────────┐  ┌────────────┐
│Reduce  │  │Check Alpha │
│Position│  │   Decay    │
└────────┘  └─────┬──────┘
                  │
            ┌─────┴─────┐
            ▼           ▼
         Decayed    Stable
            │           │
            ▼           ▼
        ┌────────┐  ┌──────────┐
        │Retrain │  │ Check    │
        │ Model  │  │Execution │
        └────────┘  └────┬─────┘
                        │
                  ┌─────┴─────┐
                  ▼           ▼
              Slippage    Fill Rate
              High?       Low?
                  │           │
                  ▼           ▼
             ┌────────┐  ┌────────┐
             │Optimize│  │Adjust  │
             │ Timing │  │ Sizing │
             └────────┘  └────────┘
                  │           │
                  └─────┬─────┘
                        ▼
                 ┌────────────┐
                 │ Check      │
                 │Correlation │
                 │ Breakdown  │
                 └─────┬──────┘
                       │
                 ┌─────┴─────┐
                 ▼           ▼
             Correlated  Independent
                 │           │
                 ▼           ▼
            ┌─────────┐ ┌────────┐
            │Diversify│ │Continue│
            │ Signals │ │Monitor │
            └─────────┘ └────────┘

Decision Tree 2: Model Decay Detected

Model Decay Detected
         │
         ▼
┌────────────────────┐
│ Validate Detection │
│ (False positive?)  │
└─────────┬──────────┘
          │
    ┌─────┴─────┐
    ▼           ▼
 False       True
 Positive    Decay
    │           │
    ▼           ▼
 Adjust     ┌─────────────┐
 Threshold  │Check Feature│
            │Distribution │
            └─────┬───────┘
                  │
            ┌─────┴─────┐
            ▼           ▼
         Feature     Feature
         Drift       Stable
            │           │
            ▼           ▼
       ┌────────┐  ┌────────────┐
       │Update  │  │Check Target│
       │Features│  │Distribution│
       └────────┘  └─────┬──────┘
                        │
                  ┌─────┴─────┐
                  ▼           ▼
               Target      Target
               Drift       Stable
                  │           │
                  ▼           ▼
             ┌────────┐  ┌────────┐
             │Retrain │  │Retrain │
             │+ New   │  │Same    │
             │Data    │  │Features│
             └────────┘  └────────┘

Decision Tree 3: Execution Quality Issues

Execution Quality Issue
         │
         ▼
┌────────────────────┐
│ Identify Issue Type│
└─────────┬──────────┘
          │
    ┌─────┼─────────┐
    ▼     ▼         ▼
 Latency  Slippage  Fill Rate
 High     High      Low
    │       │         │
    ▼       ▼         ▼
┌──────┐ ┌──────┐ ┌──────┐
│Check │ │Check │ │Check │
│Infra │ │Order │ │Order │
│      │ │Size  │ │Type  │
└──┬───┘ └──┬───┘ └──┬───┘
   │        │        │
   ▼        ▼        ▼
┌──────┐ ┌──────┐ ┌──────┐
│Redis │ │Too   │ │Limit │
│Slow? │ │Large?│ │vs Mkt│
└──┬───┘ └──┬───┘ └──┬───┘
   │        │        │
   ▼        ▼        ▼
Optimize  Reduce   Adjust
Pipeline  Size     Aggression
   │        │        │
   ▼        ▼        ▼
┌──────┐ ┌──────┐ ┌──────┐
│Check │ │Check │ │Check │
│HFT   │ │Timing│ │Queue │
│Core  │ │      │ │      │
└──────┘ └──────┘ └──────┘

Runbook Links

| Category | Runbook | Description |
| --- | --- | --- |
| Incidents | 01-incident-response.md | P1-P4 response procedures |
| Kill Switch | 02-kill-switch-operations.md | L1-L4 kill switch operations |
| Troubleshooting | 03-troubleshooting.md | Common issue resolution |
| Deployment | 04-deployment.md | Production deployment guide |
| DR | 05-disaster-recovery.md | Disaster recovery procedures |

Operations Scripts

| Script | Purpose | Usage |
| --- | --- | --- |
| `scripts/operations/pre_deploy_checklist.py` | 50-point production readiness | `--env staging\|production` |
| `scripts/operations/circuit_breaker_test.py` | Kill switch timing validation | `--timing-only --env staging` |
| `scripts/operations/chaos_engineering.py` | Controlled failure injection | `redis-fail --duration 10` |
| `scripts/compliance/trade_surveillance.py` | Market manipulation detection | `--dry-run` |

Compliance

Trade Surveillance Patterns

| Pattern | Detection Criteria | Severity |
| --- | --- | --- |
| Wash Trading | Same symbol, opposite sides, < 5s window | HIGH |
| Spoofing | Large order cancelled < 100ms | CRITICAL |
| Layering | 3+ price levels cancelled sequentially | HIGH |

See: scripts/compliance/trade_surveillance.py
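
The wash-trading rule can be illustrated with a short scan over fills. This sketch is not the logic of `scripts/compliance/trade_surveillance.py`; the `Fill` record and function name are hypothetical, and it applies only the criteria listed above (same symbol, opposite sides, less than 5 seconds apart).

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fill:
    ts: float      # fill time in seconds
    symbol: str
    side: str      # "buy" or "sell"

def find_wash_trades(fills, window_s=5.0):
    """Return pairs of opposite-side fills in the same symbol within window_s."""
    flagged = []
    recent = {}  # symbol -> fills still inside the time window
    for f in sorted(fills, key=lambda x: x.ts):
        window = [g for g in recent.get(f.symbol, []) if f.ts - g.ts < window_s]
        flagged.extend((g, f) for g in window if g.side != f.side)
        window.append(f)
        recent[f.symbol] = window
    return flagged
```

A production detector would normally also match on account or beneficial owner; that field is left out here because the source does not mention it.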

Emergency Procedures

Immediate Actions

Kill Switch Activation

# L1 Global (fastest)
echo "1" > /dev/shm/hft_kill_switch

# Via Redis
redis-cli SET hft:kill_switch "ACTIVE"

# Via API
curl -X POST http://localhost:8000/api/v1/kill-switch/activate

Network Isolation

sudo iptables -A OUTPUT -d api.binance.com -j DROP

Emergency Contacts
- See docs/runbooks/01-incident-response.md#escalation-contacts

Health Checks

# All services (replace 800X with each service's port)
for svc in api executor risk market-ingest strategy-generator; do
  echo "== ${svc} =="
  curl -s http://localhost:800X/health
done

# Pre-deployment
python scripts/operations/pre_deploy_checklist.py --env production

# Circuit breaker
python scripts/operations/circuit_breaker_test.py --timing-only
