adriannutiu

nightshift

Overnight repo quality audit — dead code, doc drift, test gaps, security, debt, infra drift. Run after big work sessions. Produces a morning report.


Install

npx skillscat add adriannutiu/nightshift/claude-code-nightshift

Install via the SkillsCat registry.

SKILL.md

Nightshift — Overnight Repo Audit

You are running an unattended overnight quality sweep. The user has gone to sleep. Be thorough, be accurate, and produce a report they can act on in the morning.

Quality bar: Opus-level analysis. This is a deep audit, not a surface scan. Think critically about every finding. False positives waste the user's morning — every finding must be worth reading.

Parse Arguments

Extract from $ARGUMENTS:

| Argument | Effect |
|----------|--------|
| (none) | Core mode, standard budget, current repo |
| deep | Core + deep checks |
| --budget fast\|standard\|max | Runtime budget (default: standard) |
| --since <ref> | Scope to changes since git ref/date (default: all files) |
| --focus <glob> | Limit to matching paths |
| --infra | Force SSH infrastructure drift checks on |
| --no-infra | Force SSH infrastructure drift checks off |

Budget modes — per-check behavior:

| Check | fast | standard (default) | max |
|-------|------|--------------------|-----|
| 1. Dead code | Grep changed files only | Grep full repo, read flagged files | Grep full repo, read all source files |
| 2. Doc drift | Check paths in changed .md files | Check all .md files | All .md + verify commands still work |
| 3. Test gaps | Flag changed files without tests | Same + run test suite if quick | Same + run with coverage |
| 4. Dependencies | npm audit / pip-audit only | Same + npm outdated / secrets scan | Same + lockfile drift check |
| 5. Code quality | 5a toolchain only (lint/type) | 5a + grep patterns + read flagged | 5a + read ALL source files for 5b-5h |
| 6. Tech debt | Count TODOs (no blame) | Blame flagged TODOs (30+ days) | Blame all TODOs |
| 7. Refactoring | Skip entirely | Read core business files | Read every non-trivial file |
| 8-10. Deep | Skip even if deep | Normal depth | Exhaustive (300-commit churn) |
| 11-14. Infra | Normal (SSH is cheap) | Normal | Normal |
| Typical runtime | 5-10 min | 15-30 min | 30-60 min |
| When to use | Quick check after small PR | Default overnight | Pre-release, quarterly |

The key lever is how many files Claude reads (expensive context) vs greps (cheap). fast greps and counts. standard reads the important ones. max reads everything.

Always record chosen budget and which checks were skipped/reduced due to budget in run.log.

Infra auto-detection: If neither --infra nor --no-infra is passed, auto-detect: enable infra checks when .nightshift-infra.yaml exists at repo root AND at least one SSH target defined in it is reachable (ssh -o ConnectTimeout=3 -o BatchMode=yes -- <host> true 2>/dev/null). Validate <host> against ssh_alias rules before probing. This makes infra opt-in via config file — repos without it skip infra entirely.

Phase 0: Setup

0.1 Detect Environment

pwd
git rev-parse --show-toplevel 2>/dev/null
git branch --show-current 2>/dev/null
git log --oneline -5 2>/dev/null

Identify:

  • Repo root and repo name
  • Tech stacks present (scan for: package.json, requirements.txt, pyproject.toml, Cargo.toml, go.mod, composer.json, Gemfile, docker-compose.yml, Dockerfile)
  • Project conventions (read CLAUDE.md at root + any subdirectory CLAUDE.md files)
  • Test patterns (where tests live, how they're run)
  • Whether knowledge/learnings/ exists (for learning capture)

0.2 Preflight: Sleep Prevention

Prevent the machine from sleeping during the overnight run.

macOS:

caffeinate -dims &
CAFFEINATE_PID=$!

At the end of the run (Phase 5), kill it: kill $CAFFEINATE_PID 2>/dev/null

Linux: No universal sleep prevention command. If systemd-inhibit is available:

systemd-inhibit --what=idle --who=nightshift --why="Overnight audit" --mode=block sleep infinity &
INHIBIT_PID=$!

At the end of the run: kill $INHIBIT_PID 2>/dev/null. If systemd-inhibit is not available, log a warning in run.log and continue; the audit still works, but the machine may sleep.

Windows:

powershell -Command "Add-Type -MemberDefinition '[DllImport(\"kernel32.dll\")] public static extern uint SetThreadExecutionState(uint esFlags);' -Name Win32 -Namespace API; [void][API.Win32]::SetThreadExecutionState([uint32]2147483651); Start-Sleep -Seconds 86400" &
KEEPAWAKE_PID=$!

The value 2147483651 is 0x80000003 (ES_CONTINUOUS | ES_SYSTEM_REQUIRED | ES_DISPLAY_REQUIRED), written in decimal because PowerShell reads the hex literal 0x80000003 as a negative Int32, which does not convert to uint. The request only lasts while the calling process is alive, so the PowerShell process is kept running in the background.

At the end of the run (Phase 5), stop it: kill $KEEPAWAKE_PID 2>/dev/null. Windows releases the execution state when the process exits, so no separate reset call is needed.

Detect platform with uname -s 2>/dev/null — use caffeinate on Darwin, systemd-inhibit on Linux, PowerShell on Windows/MINGW/MSYS.

0.3 Create Run Directory

.nightshift/runs/YYYY-MM-DD_HH-mm-ss/
├── run.log              # Full execution trace (append throughout)
├── findings.md          # Technical detail per check
├── executive-report.md  # Prioritized morning brief
├── summary.json         # Machine-readable results
└── raw/                 # Tool outputs, grep results, etc.

Create the directory structure. Initialize run.log with:

Nightshift started: [ISO8601 timestamp]
Repo: [name] at [path]
Mode: [core|deep]
Args: [raw arguments]

Append to run.log at the start and end of each check:

[HH:MM:SS] CHECK 1 START: Dead Code & Unused Artifacts
[HH:MM:SS] CHECK 1 END: 3 findings (0 critical, 1 high, 2 medium) [12s]

If .nightshift/ isn't listed in .gitignore, note it in the report so the user can add it.

0.4 Load Previous Run (Trend Tracking)

If a previous run exists in .nightshift/runs/, load its summary.json. This enables classifying each finding as:

  • new: Not present in previous run
  • existing: Was present before, still present
  • resolved: Was present before, no longer found

Prioritize report ordering: new critical/high findings first, then severity regressions, then unresolved backlog, then resolved items.

0.5 Load .nightshift-ignore (Suppression Rules)

If .nightshift-ignore exists at repo root, parse suppression rules:

Rule format (one per line, semicolon-separated key/value pairs):

id=AUTH-REFRESH-DUP; reason=Known backlog item; expires=2026-03-31
check=refactoring; path=legacy/**; reason=Queued for rewrite; expires=never
check=doc-drift; path=docs/archive/**; reason=Archived docs; expires=never
id=SEC-OLD-123; reason=Accepted until migration; expires=2026-04-15; allow_critical=true

Keys: id (exact finding ID), check (check name), path (glob), reason (required), expires (YYYY-MM-DD or never), allow_critical (true|false, default false).

Safety rules:

  • Suppressed findings stay visible in a dedicated report section — never deleted
  • Critical findings cannot be suppressed unless rule has allow_critical=true
  • Expired rules are ignored and reported
  • Rules missing reason are invalid and reported
  • Record rule counts in run.log: total / valid / invalid / expired
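
The rule format and safety rules above can be sketched in Python as follows. This is a minimal illustration, not Nightshift's actual parser: the helper name `parse_rule`, the returned dictionary shape, the requirement that a rule carries an `id` or a `check`, and treating a missing `expires` as `never` are all assumptions of this sketch.

```python
from datetime import date

ALLOWED_KEYS = {"id", "check", "path", "reason", "expires", "allow_critical"}

def parse_rule(line, today):
    """Parse one .nightshift-ignore line (hypothetical helper).

    Returns a dict whose 'status' is 'valid', 'invalid', or 'expired'.
    """
    rule = {}
    for part in line.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        key = key.strip()
        if not sep or key not in ALLOWED_KEYS:
            return {"status": "invalid", "raw": line}
        rule[key] = value.strip()

    # reason is required; needing an id or check selector is an assumption.
    if not rule.get("reason") or not (rule.get("id") or rule.get("check")):
        return {"status": "invalid", "raw": line}

    rule["allow_critical"] = rule.get("allow_critical", "false").lower() == "true"

    expires = rule.get("expires", "never")  # assumed default
    if expires != "never":
        try:
            expiry = date.fromisoformat(expires)
        except ValueError:
            return {"status": "invalid", "raw": line}
        if expiry < today:
            rule["status"] = "expired"
            return rule
    rule["status"] = "valid"
    return rule
```

Counting the `status` values over all lines gives the total / valid / invalid / expired figures for run.log.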

0.6 Scope Detection

If --since was provided, get the changed file list:

git diff --name-only <ref>..HEAD

If --focus was provided, filter to matching paths.

Otherwise, scope is the entire repo (respecting .gitignore).


Phase 1: Deterministic Checks

Non-mutating rule (CRITICAL): Nightshift never modifies project source files, configs, or dependencies. Before running any project script (npm run lint, npm test, etc.), inspect the script command first. Skip anything containing mutating flags: --fix, --write, format, prettier --write, eslint --fix, ruff --fix, codegen, or migration commands. When in doubt, run the direct tool command instead of the project script.
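
As a rough illustration of the script inspection described above, the Python sketch below filters package.json scripts before anything is run. The function names and the exact regexes are assumptions. In particular, `format` is matched only as a standalone word so that read-only flags such as `--format json` are not rejected.

```python
import json
import re

# Tokens treated as mutating. The lookbehind keeps "--format" (an output
# option) from matching the bare "format" command; that choice is an
# assumption of this sketch.
MUTATING_PATTERNS = [
    r"(?<![\w-])--fix\b",
    r"(?<![\w-])--write\b",
    r"(?<![\w-])format\b",
    r"(?<![\w-])codegen\b",
    r"\bmigrat(e|ion)s?\b",
]

def is_mutating(command):
    """True if the script command contains any mutating token."""
    return any(re.search(p, command) for p in MUTATING_PATTERNS)

def safe_scripts(package_json_text):
    """Return only the scripts whose commands look non-mutating."""
    scripts = json.loads(package_json_text).get("scripts", {})
    return {name: cmd for name, cmd in scripts.items() if not is_mutating(cmd)}
```

A script that fails this filter is skipped, and the underlying tool is run directly with read-only flags instead.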

Note on toolchain side effects: Some read-only checks (build validation, test runners) may create cache artifacts in their own directories (e.g., node_modules/.cache/, .pytest_cache/, dist/). This is standard toolchain behavior. The "read-only" guarantee means Nightshift never edits, deletes, or overwrites your source code, configs, or committed files — not that zero bytes are written to transient caches.

Run these sequentially. For each check, write raw output to raw/ and structured findings to memory. Be conservative — only flag things you're confident about.

Finding classification: Tag every finding with:

  • severity: critical / high / medium / low / info
  • confidence: high / medium / low (low confidence → normally downgrade one severity level, except confirmed critical security/data-loss)
  • kind: defect (correctness, security, operational risk) or improvement (DRY, refactor, maintainability)
  • state: new / existing / resolved (if previous run data available)

Check 1: Dead Code & Unused Artifacts

Goal: Find files, exports, and imports that nothing references.

  1. Orphan files: Find files not imported/required by anything else:

    • For JS/TS: grep for import.*from and require( patterns, cross-reference with all source files
    • For Python: grep for import and from...import patterns
    • Exclude: test files, config files, entry points, scripts meant to be run directly
  2. Unused exports: Find exported symbols not imported anywhere else in the codebase.

  3. Dead config: Look for config entries referencing files/paths/modules that don't exist.

  4. Stale scripts: Check package.json scripts, Makefile targets, or shell scripts that reference missing files or commands.

Confidence labeling:

  • High: File has zero inbound references and isn't an entry point → flag
  • Medium: Export has zero external references but file is imported → note
  • Low: Might be used dynamically (string interpolation, reflection) → skip unless deep mode

Check 2: Documentation Drift

Goal: Find docs that reference stale state — wrong paths, removed features, outdated instructions.

  1. Path references: Extract all file paths mentioned in .md files. Check each exists:

    Grep for patterns like `path/to/file`, backtick-wrapped paths, and markdown links
    Verify each referenced path exists on disk
  2. Code references: Find function/class/variable names mentioned in docs. Verify they still exist in the codebase.

  3. Instruction drift: For CLAUDE.md and README files, check if:

    • Referenced commands still work (e.g., npm run X — does script X exist?)
    • Referenced URLs/endpoints are still present in code
    • Version numbers match actual installed versions
  4. Stale dates: Flag dates in docs older than 90 days that suggest the section hasn't been reviewed.

  5. Size check: Flag .md files over 500 lines (likely need splitting or pruning).
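
The path-reference step can be sketched as below. This is a simplified Python illustration under stated assumptions: the two regexes, the `looks_like_path` heuristic, and the function names are hypothetical, and real docs will need more exclusions.

```python
import re
from pathlib import Path

# Backtick-wrapped tokens and markdown link targets (simplified patterns).
BACKTICK = re.compile(r"`([^`\s]+)`")
MD_LINK = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

def looks_like_path(token):
    """Heuristic: has a slash, is not a URL, and holds no placeholder characters."""
    if token.startswith(("http://", "https://", "mailto:", "#")):
        return False
    return "/" in token and not any(c in token for c in "<>*{}$")

def missing_paths(markdown_text, repo_root):
    """Return referenced paths that do not exist under repo_root."""
    root = Path(repo_root)
    candidates = BACKTICK.findall(markdown_text) + MD_LINK.findall(markdown_text)
    missing = []
    for token in candidates:
        token = token.split("#", 1)[0]  # drop link anchors
        if looks_like_path(token) and not (root / token.lstrip("/")).exists():
            missing.append(token)
    return sorted(set(missing))
```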

Check 3: Test Gap Detection

Goal: Find production code that lacks test coverage signals.

  1. File-level gaps: For each source file, check if a corresponding test file exists:

    • src/foo.js → tests/foo.test.js or test/test_foo.py or similar patterns
    • Weight by file complexity (larger files without tests are higher priority)
  2. Recent changes without tests: If --since was used, flag changed production files where no test file was also changed.

  3. Test health:

    • Run test suite if safe to do (detect with: npm test, pytest, cargo test, etc.)
    • If tests exist but fail, flag as High (Critical only if the failure is in a critical path — auth, billing, data persistence)
    • If no test runner detected, note as info
  4. Coverage gaps (if coverage tool available):

    • Look for .coverage, coverage/, lcov.info, etc.
    • If found, parse for files below 50% coverage

Check 4: Dependency & Security Scan

Goal: Find known vulnerabilities and outdated dependencies.

  1. Ecosystem audit tools (run whichever apply):

    • Node.js: npm audit --json 2>/dev/null
    • Python: pip-audit --format json 2>/dev/null or safety check --json 2>/dev/null
    • Rust: cargo audit --json 2>/dev/null
    • Go: govulncheck ./... 2>/dev/null

    If the tool isn't installed, skip gracefully and note it.

  2. Outdated dependencies:

    • npm outdated --json 2>/dev/null
    • pip list --outdated --format json 2>/dev/null
  3. Hardcoded secrets scan:

    • Grep for patterns: API keys, tokens, passwords, connection strings
    • Patterns: (?i)(api[_-]?key|secret|password|token|credential|auth)\s*[:=]\s*['"][^'"]{8,}
    • Exclude: .env.example, test fixtures, documentation examples
    • Flag any matches in committed files as Critical
  4. Lockfile drift: Check if lockfiles (package-lock.json, poetry.lock, etc.) are committed and up to date.
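
A minimal Python sketch of the hardcoded secrets scan, using the pattern listed above. The exclusion hints, the function name `scan_text`, and the redaction of matched values are assumptions added for illustration.

```python
import re

# Pattern copied from the check above.
SECRET_RE = re.compile(
    r"""(?i)(api[_-]?key|secret|password|token|credential|auth)\s*[:=]\s*['"][^'"]{8,}"""
)

# Path fragments treated as excluded; the exact list is an assumption.
EXCLUDED_HINTS = (".env.example", "fixtures/", "docs/")

def scan_text(path, text):
    """Return (path, line_number, redacted_line) tuples for suspected secrets."""
    if any(hint in path for hint in EXCLUDED_HINTS):
        return []
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = SECRET_RE.search(line)
        if match:
            # Keep the key name but redact the value, so the report
            # never repeats the secret itself.
            redacted = line[: match.start()] + match.group(1) + "=<redacted>"
            hits.append((path, lineno, redacted))
    return hits
```

The pattern is deliberately broad (for example, `auth` also matches inside `author`), so each hit still needs a read of the surrounding code before it is reported as Critical.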

Check 5: Code Quality, Correctness & Static Analysis

Goal: Comprehensive code quality sweep — lint violations, type errors, build issues, static bug patterns, and logic defects. This is the deepest check and where Opus-level analysis matters most.

5a. Run Existing Toolchain (if available)

Before manual analysis, run whatever tooling the repo already has configured. Record all output in raw/:

Linters:

  • JS/TS: npx eslint . --format json 2>/dev/null or check for .eslintrc* and run
  • Python: ruff check . --output-format json 2>/dev/null or flake8 --format json 2>/dev/null (needs the flake8-json plugin) or pylint --output-format json 2>/dev/null
  • Shell: shellcheck -f json *.sh 2>/dev/null (check all .sh files)
  • YAML/JSON: yamllint . 2>/dev/null

Type checkers:

  • TS: npx tsc --noEmit --pretty 2>/dev/null (if tsconfig.json exists)
  • Python: mypy . --ignore-missing-imports 2>/dev/null or pyright 2>/dev/null (if py.typed or type hints detected)

Build validation:

  • Node: npm run build 2>/dev/null (if build script exists in package.json)
  • Python: python -c "import ast, sys; ast.parse(open(sys.argv[1]).read())" <file> for each .py file (catches syntax errors without writing bytecode)
  • Docker: docker compose config -q 2>/dev/null (if docker-compose.yml exists — validates YAML + variable refs)

Test runners (run but don't fail the audit if tests fail — record results):

  • npm test 2>/dev/null (if test script exists)
  • python -m pytest --tb=short -q 2>/dev/null (if pytest available and test files exist)

For each tool: if not installed or not applicable, skip with reason in run.log. If the tool runs and finds issues, capture them as findings. Don't double-count — if ESLint flags an issue, don't also flag it manually.

5b. Error Handling & Failure Modes

  1. Silent error swallowing:

    • try/catch with empty catch body or catch that only logs (no re-throw, no return)
    • catch (e) { /* ignore */ } or catch (e) { console.log(e) } without recovery logic
    • Python bare except: or except Exception: with pass
    • .catch(() => {}) on Promises
  2. Missing error handling:

    • await calls without try/catch in functions that don't propagate errors
    • File I/O without error handling (especially in scripts that run unattended)
    • HTTP/fetch calls without checking response status
    • JSON.parse without try/catch on external data
    • Missing error callbacks in Node.js streams/event emitters
  3. Shell script robustness:

    • Missing set -e or set -euo pipefail at the top
    • Unquoted variables ($VAR instead of "$VAR") — word splitting/globbing risk
    • Missing exit code checks after critical commands
    • cd without checking success (cd /dir && ... vs bare cd /dir)
    • Heredocs/temp files without cleanup on exit (missing trap)

5c. Security & Injection Patterns

  1. SQL injection:

    • String concatenation/interpolation in SQL queries (f-strings, template literals, +)
    • cursor.execute(f"SELECT ... WHERE x = '{var}'") patterns
    • Raw SQL in ORMs bypassing parameterization
  2. Command injection:

    • subprocess with shell=True, or child_process.exec, with variable input
    • os.system() with string formatting
    • Template literals in shell command strings
  3. Path traversal:

    • User-influenced paths without sanitization (../ not stripped)
    • os.path.join(base, user_input) without verifying result stays under base
  4. Code execution:

    • eval(), exec(), Function() with any dynamic input
    • pickle.load() / yaml.load() (without Loader=SafeLoader)
    • importlib.import_module() with dynamic strings from external input
  5. Hardcoded sensitive data:

    • API keys, tokens, passwords, connection strings in source code
    • Pattern: (?i)(api[_-]?key|secret|password|token|credential|auth|private[_-]?key)\s*[:=]\s*['"][^'"]{8,}
    • Also check for: AWS access keys (AKIA...), private keys (-----BEGIN), JWT tokens (eyJ...)
    • Exclude: .env.example, test fixtures, documentation examples, gitignored files
  6. Crypto misuse:

    • MD5/SHA1 for anything security-related (use SHA256+)
    • Hardcoded IV/nonce in encryption
    • ECB mode usage

5d. Logic Defects & Bug Patterns

  1. Dead/unreachable code:

    • Code after unconditional return, break, continue, raise, throw, sys.exit()
    • Conditions that are always true/false based on types
    • Catch clauses that can never trigger (wrong exception type)
  2. Copy-paste bugs:

    • Identical if and else branches
    • Switch/case fallthrough without break (JS) or duplicate case values
    • Repeated conditions in if/elif chains
  3. Null/undefined safety:

    • Property access on potentially null/undefined values without guards
    • Optional chaining (?.) mixed with non-optional access on same variable
    • Array.find() result used without null check
    • Dict/object key access without in check or .get() with default
  4. Off-by-one and bounds:

    • <= vs < in loop bounds near .length or len()
    • Array index at .length (one past end)
    • Fence-post errors in pagination/slicing
  5. Assignment vs comparison:

    • = instead of ==/=== in conditions (JS especially)
    • is vs == misuse in Python (comparing values vs identity)
  6. Floating point:

    • Direct equality comparison of floats (0.1 + 0.2 === 0.3)
    • Currency/financial values stored as floats instead of ints/Decimal
  7. Shadowing & scope:

    • Variable shadowing outer scope variables (same name, different binding)
    • var instead of let/const in modern JS
    • Python mutable default arguments (def foo(items=[]))

5e. Async & Concurrency

  1. Missing await:

    • Async function called without await (fire-and-forget that loses errors)
    • Promise returned but not awaited in async context
  2. Race conditions:

    • Shared mutable state modified across async operations without coordination
    • Check-then-act patterns without atomicity (TOCTOU)
    • Multiple concurrent writes to same file/resource
  3. Promise anti-patterns:

    • Mixing .then() and await in same function
    • new Promise() wrapping an already-async operation (unnecessary wrapping)
    • .then(x => x) identity chains (no-op)
    • Missing .catch() on promise chains that aren't awaited
  4. Event loop blocking:

    • Synchronous file I/O (fs.readFileSync) in async/server code paths
    • CPU-intensive loops in event handlers without yielding
    • JSON.parse on unbounded input without size limits

5f. Resource Management

  1. Leaks:

    • File handles opened without with (Python) or without .close() in finally
    • Database connections/cursors not released
    • Event listeners added without corresponding removal
    • setInterval without clearInterval path
    • Child processes spawned without cleanup on parent exit
  2. Memory concerns:

    • Unbounded arrays/lists that grow without limit (logs, caches, history)
    • Closures capturing large scope unnecessarily
    • Circular references preventing garbage collection

5g. Type Safety (language-specific)

TypeScript:

  • any type annotations (grep for : any, as any)
  • @ts-ignore / @ts-expect-error comments
  • Non-null assertions (!) on values that could genuinely be null
  • Type assertions (as Type) that narrow unsafely

Python:

  • Missing type hints on public functions (when other functions have them — inconsistency)
  • # type: ignore comments
  • cast() calls that may be incorrect
  • Union types handled without narrowing

JavaScript:

  • Loose equality (==, !=) in non-trivial comparisons (the == null idiom is fine)
  • typeof checks that miss edge cases (typeof null === "object")
  • Implicit type coercion in arithmetic/comparisons

5h. Configuration & Build Issues

  1. Config consistency:

    • .env.example vars that don't match actual .env usage in code
    • Docker compose environment variables referenced but not defined
    • Config file references to nonexistent files/paths
  2. Dependency issues:

    • Import of a module not in package.json/requirements.txt (missing dependency)
    • Circular imports between modules
    • Importing from devDependencies in production code
  3. Build & deploy:

    • Dockerfile COPY of files that don't exist or are gitignored
    • Missing build artifacts that the deploy process expects
    • Environment-specific code without proper guards (process.env.NODE_ENV)

Approach: For each source file, read the code and analyze it. Don't just grep for patterns — understand the logic flow. Use grep/glob for high-signal patterns first (e.g., eval(, shell=True, empty catch blocks), then deep-read the flagged files + all core business logic files. Prioritize:

  • Recently changed files (if --since is used)
  • Core business logic (not config/boilerplate)
  • Large and complex files (more surface area for bugs)
  • Files in hot paths (request handlers, data pipelines, backup scripts)

Confidence labeling:

  • High: Pattern is definitively a bug, vulnerability, or will cause runtime failure → flag
  • Medium: Pattern is suspicious but might be intentional → flag with note
  • Low: Style preference or minor concern → skip in core mode, include in deep mode

Check 6: Tech Debt Signals

Goal: Surface accumulating debt markers.

  1. TODO/FIXME/HACK aging:

    • Find all TODO, FIXME, HACK, XXX, TEMP, WORKAROUND comments
    • Use git blame to date them
    • Flag any older than 30 days as stale (medium)
    • Flag any older than 90 days as high (critical only if in auth, billing, data persistence, or backup paths)
  2. Suppression abuse:

    • // eslint-disable, # noqa, # type: ignore, @SuppressWarnings, # nosec
    • Count total and flag files with 3+ suppressions
  3. Large files: Flag source files over 500 lines (likely need splitting).

  4. Complex functions: Look for functions over 50 lines (heuristic: count lines between function definition and closing brace/dedent).

  5. Dependency count: Flag if package.json has 50+ dependencies or requirements.txt has 30+ entries.
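
The aging thresholds for debt markers can be expressed as a small Python function. This is a sketch: the keyword list used to recognize critical paths and the choice to always escalate in those paths are assumptions, since the check only says such markers may be critical.

```python
from datetime import date

# Assumed keywords for auth, billing, data persistence, and backup paths.
CRITICAL_PATH_HINTS = ("auth", "billing", "persist", "backup")

def todo_severity(path, committed_on, today):
    """Map a marker's age (taken from git blame) to a severity, or None if fresh."""
    age_days = (today - committed_on).days
    if age_days > 90:
        in_critical_path = any(hint in path.lower() for hint in CRITICAL_PATH_HINTS)
        return "critical" if in_critical_path else "high"
    if age_days > 30:
        return "medium"
    return None
```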


Check 7: Refactoring & DRY Opportunities

Goal: Find code that should be consolidated, abstracted, or restructured for maintainability. This goes beyond "duplicate logic" (deep mode) — it looks for structural improvements even without exact duplication.

  1. Repeated patterns across files:

    • Similar error handling blocks (e.g., try/catch with same structure repeated 3+ times)
    • Similar data transformation pipelines (fetch → parse → validate → store)
    • Similar config/setup boilerplate that could be extracted to a shared helper
    • Same validation logic implemented differently in multiple places
  2. Extract-worthy blocks:

    • Functions over 30 lines that do multiple distinct things (should be split)
    • Nested conditionals 3+ levels deep (should be flattened or extracted)
    • Long parameter lists (5+ params) suggesting a config object/class is needed
    • Repeated inline constants (magic numbers/strings used 3+ times without a named constant)
  3. Abstraction opportunities:

    • Multiple files that follow the same structural pattern but aren't using a shared base/template
    • Hard-coded values that should be config-driven (URLs, timeouts, limits, paths)
    • Switch/if-else chains with 5+ cases that could be a lookup table or strategy pattern
    • Repeated string building/formatting that could be a template
  4. Module organization:

    • Files mixing concerns (e.g., data fetching + rendering + validation in one file)
    • Utility functions buried inside domain-specific files (should be in a shared utils module)
    • Related functions split across unrelated files (should be co-located)
    • "Junk drawer" utility files with 10+ unrelated functions (should be split by domain)
  5. Dead weight:

    • Commented-out code blocks (either remove or convert to a proper TODO with context)
    • Feature flags/conditional code for features that shipped long ago
    • Compatibility shims for versions/platforms no longer supported
    • Wrapper functions that add no logic (just pass through to another function)

Approach: This check requires reading actual code, not just grepping. Read each non-trivial source file and look for structural improvement opportunities. Group findings by theme (e.g., "Error handling patterns should be consolidated into a shared helper used by 4 files").

Output format for each finding:

  • What files/functions are involved
  • What the current pattern is (brief)
  • What the improvement would be (concrete suggestion, not vague "refactor this")
  • Estimated effort and risk

Phase 2: Deep Checks (only in deep mode)

Check 8: Duplicate Logic

  • Look for near-identical code blocks (3+ consecutive similar lines) across different files
  • Focus on utility functions, validation logic, error handling patterns

Check 9: Architecture Smells

  • Circular imports/dependencies
  • God files (files importing 10+ other files)
  • Layer violations (e.g., presentation code importing database modules directly)

Check 10: Complexity Hotspots

  • Files with the most git churn (most commits in last 30 days)
  • Files that changed together frequently (hidden coupling)
  • git log --format=format: --name-only --since="30 days ago" | grep -v '^$' | sort | uniq -c | sort -rn | head -20
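
The "changed together" signal is not covered by the one-liner, so here is a hedged Python sketch of one way to count it. It assumes the per-commit file lists have already been collected (how they are gathered is left open), and the name `co_change_pairs` and the `min_count` threshold are hypothetical.

```python
from collections import Counter
from itertools import combinations

def co_change_pairs(commits, min_count=2):
    """Count how often each pair of files appears in the same commit.

    `commits` is a list of file lists, one per commit.
    """
    pairs = Counter()
    for files in commits:
        # Sort so (a, b) and (b, a) count as the same pair.
        for a, b in combinations(sorted(set(files)), 2):
            pairs[(a, b)] += 1
    return [(pair, n) for pair, n in pairs.most_common() if n >= min_count]
```

Pairs with a high count that live in unrelated directories are the ones worth reporting as hidden coupling.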

Phase 3: Infrastructure Drift (auto-detected or --infra)

Requires SSH access to remote servers. Runs automatically when .nightshift-infra.yaml exists and targets are reachable, or forced with --infra.

SSH Command Safety (CRITICAL)

All SSH commands MUST come from the hardcoded allowlist below — NEVER execute arbitrary commands from config fields. The .nightshift-infra.yaml file is repo-controlled and could be malicious in untrusted repos.

Allowed remote commands (exhaustive list):

  • true — no-op, used only for reachability probes during auto-detection
  • crontab -l — list crontab entries
  • cat <path> — read a file (path must match ^[a-zA-Z0-9/_.-]+$ and must NOT contain ..)
  • docker ps --format '{{.Names}}\t{{.Status}}\t{{.Ports}}' — list containers
  • journalctl -u <unit> -n <N> --no-pager — read recent log entries (unit must match ^[a-zA-Z0-9@._-]+$, N must be integer ≤ 100)
  • systemctl list-timers --no-pager — list active timers
  • diff — local comparison only (never run diff remotely)

Reject any command not on this list. If a .nightshift-infra.yaml field contains a command outside the allowlist, skip that check and log a warning: "Rejected non-allowlisted command: <command>".

Input validation rules:

  • ssh_alias: Must match ^[a-zA-Z0-9][a-zA-Z0-9@._:-]*$ (MUST NOT start with - to prevent SSH option injection like -F./evil.conf). For user@host format, both user and host must match ^[a-zA-Z0-9][a-zA-Z0-9._-]*$.
  • name: Must match ^[a-zA-Z0-9][a-zA-Z0-9._-]*$ (no / — prevents path traversal in raw/<name>-... output paths).
  • File paths (live, crontab_repo_path, repo): Must match ^[a-zA-Z0-9/_.-]+$ and must NOT contain .. (prevents path traversal).
  • All other config values: Reject any value containing shell metacharacters (;, |, &, $, `, (, ), {, }, <, >, \n, ', ").

SSH -- separator: Always use -- before the hostname to prevent option injection: ssh -o ConnectTimeout=3 -o BatchMode=yes -- <ssh_alias> '<command>'.
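
The validation rules above translate directly into code. The Python sketch below uses the regexes from this section; the function names are illustrative assumptions, and the probe is built as an argv list so no local shell parses the alias.

```python
import re

# Regexes copied from the input validation rules above.
ALIAS_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9@._:-]*$")
USER_HOST_PART_RE = re.compile(r"^[a-zA-Z0-9][a-zA-Z0-9._-]*$")
PATH_RE = re.compile(r"^[a-zA-Z0-9/_.-]+$")

def valid_ssh_alias(alias):
    """Check ssh_alias; fullmatch also rejects a trailing newline."""
    if not ALIAS_RE.fullmatch(alias):
        return False
    if "@" in alias:
        user, _, host = alias.partition("@")
        return bool(USER_HOST_PART_RE.fullmatch(user) and USER_HOST_PART_RE.fullmatch(host))
    return True

def valid_path(path):
    """Check a file path: allowed characters only, and no '..' sequence."""
    return bool(PATH_RE.fullmatch(path)) and ".." not in path

def probe_command(alias):
    """Build the reachability probe as an argv list (hypothetical helper)."""
    if not valid_ssh_alias(alias):
        raise ValueError(f"Rejected ssh_alias: {alias!r}")
    return ["ssh", "-o", "ConnectTimeout=3", "-o", "BatchMode=yes", "--", alias, "true"]
```

Passing the list to a process runner without a shell keeps the alias from being reinterpreted locally, while the `--` separator stops ssh from reading it as an option.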

.nightshift-infra.yaml Format

Place this file at repo root to enable infra checks:

# .nightshift-infra.yaml — define SSH targets and script mappings for infra drift checks
targets:
  - name: web-server
    ssh_alias: my-web-server       # SSH alias or user@host
    checks:
      - crontab                    # Compare live crontab with repo version
      - backup                     # Check backup recency and logs
      - scripts                    # Compare deployed scripts with repo versions
    crontab_repo_path: deploy/crontabs/web-server.crontab
    backup_journal_unit: backup.service   # Used with journalctl -u (allowlisted)
    backup_journal_lines: 5               # Max lines to fetch (integer, ≤ 100)
    script_mappings:
      - repo: deploy/scripts/backup.sh
        live: /opt/scripts/backup.sh
      - repo: deploy/scripts/health-check.sh
        live: /opt/scripts/health-check.sh

  - name: app-server
    ssh_alias: my-app-server
    checks:
      - crontab
      - containers               # Check Docker container health
      - scripts
    crontab_repo_path: deploy/crontabs/app-server.crontab
    script_mappings:
      - repo: deploy/scripts/db-backup.sh
        live: /opt/scripts/db-backup.sh

Check 11: Crontab Drift

For each target with crontab check enabled:

ssh -- <ssh_alias> 'crontab -l' > raw/<name>-crontab-live.txt
diff <crontab_repo_path> raw/<name>-crontab-live.txt

Flag any differences.

Check 12: Container Health

For each target with containers check enabled, use the hardcoded allowlisted command:

ssh -- <ssh_alias> "docker ps --format '{{.Names}}\t{{.Status}}\t{{.Ports}}'"

  • Flag any containers not running or in restart loops
  • Flag containers with uptime < 1 hour (recent restarts)
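
A possible way to apply these two flags to the tab-separated output, sketched in Python. It assumes Docker's usual human-readable status strings (for example "Up 12 minutes", "Up 3 days", "Restarting (1) 5 seconds ago"); the function name and reason labels are hypothetical.

```python
import re

def flag_containers(docker_ps_output):
    """Return (name, reason) pairs for containers that look unhealthy."""
    flagged = []
    for line in docker_ps_output.splitlines():
        if not line.strip():
            continue
        # Columns are Names, Status, Ports; pad in case a column is missing.
        name, status = (line.split("\t") + [""])[:2]
        if status.startswith("Restarting"):
            flagged.append((name, "restart loop"))
        elif not status.startswith("Up"):
            flagged.append((name, "not running"))
        elif re.search(r"\b(second|minute)s?\b", status):
            # Uptime expressed in seconds or minutes means under one hour.
            flagged.append((name, "uptime under 1 hour"))
    return flagged
```

Because docker ps without -a lists only running containers, a stopped container shows up as a missing row rather than a bad status, so detecting it would need an expected-container list that this sketch does not have.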

Check 13: Backup Verification

For each target with backup check enabled, construct the command from validated config fields:

ssh -- <ssh_alias> "journalctl -u <backup_journal_unit> -n <backup_journal_lines> --no-pager"

Where backup_journal_unit matches ^[a-zA-Z0-9@._-]+$ and backup_journal_lines is an integer ≤ 100. Reject otherwise.

  • Flag if last backup was >26 hours ago
  • Flag if backup logs show errors

Check 14: Script Drift

For each target with scripts check enabled, iterate over script_mappings:

ssh -- <ssh_alias> "cat <live>" > raw/<name>-<script-basename>-live.sh
diff <repo> raw/<name>-<script-basename>-live.sh

Where <live> matches ^[a-zA-Z0-9/_.-]+$ and contains no .. sequence. Reject paths with shell metacharacters.

Flag any differences. Focus on backup and security scripts — drift here is highest risk.


Phase 4: AI Synthesis

After all deterministic checks complete, synthesize findings.

4.1 Deduplicate and Cluster

  • Group related findings (e.g., multiple stale TODOs in the same file → one finding)
  • Remove low-confidence items that are likely false positives
  • Merge overlapping findings from different checks

4.2 Apply Suppression Rules

If .nightshift-ignore was loaded:

  • Match findings against rules (by id, check+path, or check alone)
  • Suppress matching findings with non-expired rules
  • Keep suppressed findings in a dedicated "Suppressed Findings" section — never hide them
  • Do NOT suppress critical findings unless the matching rule has allow_critical=true
  • Report expired rules and invalid rules

4.3 Trend Classification

If previous run data was loaded:

  • Compare findings by fingerprint (check + file + description hash)
  • Mark each as new, existing, or resolved
  • Prioritize: new critical/high → severity regressions → existing → resolved
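
Fingerprint comparison can be sketched in Python as follows. The finding fields (`check`, `file`, `description`) mirror the fingerprint definition above, while the function names and the 12-character hash truncation are assumptions.

```python
import hashlib

def fingerprint(finding):
    """Stable ID built from check + file + a hash of the description."""
    digest = hashlib.sha256(finding["description"].encode("utf-8")).hexdigest()[:12]
    return f'{finding["check"]}:{finding["file"]}:{digest}'

def classify(previous, current):
    """Split findings into new, existing, and resolved by fingerprint."""
    prev = {fingerprint(f): f for f in previous}
    curr = {fingerprint(f): f for f in current}
    return {
        "new": [curr[k] for k in curr if k not in prev],
        "existing": [curr[k] for k in curr if k in prev],
        "resolved": [prev[k] for k in prev if k not in curr],
    }
```

Since the description is part of the fingerprint, rewording a finding between runs makes it look new, so descriptions should be generated in a stable form.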

4.4 Severity Classification

| Severity | Criteria | Action Timeline |
|----------|----------|-----------------|
| Critical | Exploitable security issue, data loss/corruption risk, broken backups, release blocker that causes outage | Fix today |
| High | Correctness gate failures in critical paths, high-confidence bug patterns in production, major infra drift | Fix this week |
| Medium | Non-critical gate failures with regression risk, meaningful test gaps, substantial doc drift, significant refactor with reliability benefit | Fix this sprint |
| Low | Localized maintainability debt, minor test/doc gaps, low-risk refactor opportunities | Backlog |
| Info | Observational findings, useful for trend tracking only, no immediate action | Note |

Confidence adjustment: If confidence is low, downgrade severity by one level — except confirmed critical security/data-loss findings which stay critical regardless.

Kind separation: defect findings (bugs, security, operational risk) always rank above improvement findings (DRY, refactor) at the same severity level.
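
The confidence adjustment and kind separation can be combined into one ordering, sketched here in Python. The `confirmed_critical` flag is a hypothetical field marking confirmed critical security/data-loss findings, and the function names are likewise assumptions.

```python
SEVERITIES = ["critical", "high", "medium", "low", "info"]

def effective_severity(finding):
    """Downgrade low-confidence findings by one level, with one exception."""
    sev = finding["severity"]
    if finding["confidence"] != "low":
        return sev
    # Confirmed critical security/data-loss findings stay critical.
    if sev == "critical" and finding.get("confirmed_critical", False):
        return sev
    idx = SEVERITIES.index(sev)
    return SEVERITIES[min(idx + 1, len(SEVERITIES) - 1)]

def rank(findings):
    """Order by effective severity; defects precede improvements on ties."""
    return sorted(
        findings,
        key=lambda f: (SEVERITIES.index(effective_severity(f)), f["kind"] != "defect"),
    )
```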

4.5 Effort Estimation

For each finding, estimate:

  • S (< 30 min): Config fix, delete dead code, update doc reference
  • M (30 min - 2 hours): Write missing tests, refactor function, update deps
  • L (2+ hours): Major refactor, architecture change, security overhaul
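As a small illustration of the buckets above, an estimate in minutes could be mapped to an effort label like this. The function name is hypothetical, and treating exactly two hours as L is an assumption, since the ranges above overlap at that boundary.

```python
def effort_bucket(minutes):
    """Map an estimated fix time in minutes to S, M, or L."""
    if minutes < 30:
        return "S"
    if minutes < 120:  # assumption: exactly 2 hours falls into L ("2+ hours")
        return "M"
    return "L"
```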

Phase 5: Write Reports

5.1 findings.md (Technical Detail)

# Nightshift Findings — [REPO] — [DATE]

**Mode:** core|deep | **Scope:** [all files | since X | focus Y]
**Duration:** [X minutes] | **Checks run:** [N/N]

---

## Check 1: Dead Code & Unused Artifacts
### Findings
- [Finding with file:line references, evidence, confidence level]

### Skipped / Not Applicable
- [Reason if check was skipped]

## Check 2: Documentation Drift
...

## Check N: ...

5.2 executive-report.md (Morning Brief)

# Nightshift Report — [REPO] — [DATE]

## What Was Checked
[1-2 sentences: scope, mode, what checks ran]

## Needs Attention First

| # | Finding | Severity | Effort | File(s) |
|---|---------|----------|--------|---------|
| 1 | [Concise description] | Critical | S | `path/to/file` |
| 2 | ... | High | M | ... |

## Quick Wins (effort: S)
[Bullet list of effort=S findings that can be fixed in < 30 minutes each]

## Can Wait
[Bullet list of medium/low findings — backlog material]

## Risk Snapshot

| Severity | Count | New | Existing |
|----------|-------|-----|----------|
| Critical | N | N | N |
| High | N | N | N |
| Medium | N | N | N |
| Low | N | N | N |
| Info | N | N | N |

## Resolved Since Last Run
[Findings from previous run that are no longer present — omit section if no prior run]

## Suppressed Findings
[Findings matched by `.nightshift-ignore` rules — omit section if no rules exist or no matches]

| Finding | Suppression Reason | Expires |
|---------|-------------------|---------|
| ... | ... | ... |

## Checks Skipped
[Any checks that couldn't run and why — e.g., "pip-audit not installed", "skipped due to budget=fast"]

## Confidence Notes
[Anything the AI isn't sure about — "Check 1 may have false positives for dynamically imported modules"]

5.3 summary.json

Schema contract (v1) — shared across Claude Code and Codex nightshift skills. Downstream tooling can rely on these enums:

| Field | Valid Values |
|-------|--------------|
| engine | "claude", "codex" |
| checks[].status | "pass" (ran, no issues), "warn" (ran, non-blocking issues), "fail" (ran, blocking issues), "skipped" (did not run) |
| checks[].name | Core: "dead-code", "doc-drift", "test-gaps", "dependency-security", "code-quality", "tech-debt", "refactoring". Deep: "duplicate-logic", "architecture-smells", "complexity-hotspots". Infra: "crontab-drift", "container-health", "backup-verification", "script-drift" |
| severity | "critical", "high", "medium", "low", "info" |
| effort | "S" (< 30 min), "M" (30 min - 2 hours), "L" (2+ hours) |
| kind | "defect", "improvement" |
| confidence | "high", "medium", "low" |
| state (per finding) | "new", "existing", "resolved" |
| mode | "core", "deep" |
| budget | "fast", "standard", "max" |

Conditional fields (include when applicable, omit otherwise):

  • code_health: Include when any toolchain gate (lint/typecheck/build/tests) was run. Each gate has status (pass|warn|fail|skipped), scope (changed|full), first_failure, evidence.
  • infra[]: Include when infra checks ran. Each entry has target, check, status, details.
  • previous_run: Include when a prior run was found. Has run_id, available.

Always present: schema_version, engine, run_id, repo, repo_root, branch, mode, budget, scope, since, infra_enabled, timestamp_start, timestamp_end, duration_sec, checks[], counts, trend, ignore_rules, top_actions, suppressed_findings.
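Downstream tooling could check a parsed summary.json against this contract along the following lines. This is a hedged sketch rather than an official validator: the function name and error message wording are assumptions, and only the enums and always-present fields listed above are checked.

```python
REQUIRED_FIELDS = [
    "schema_version", "engine", "run_id", "repo", "repo_root", "branch",
    "mode", "budget", "scope", "since", "infra_enabled", "timestamp_start",
    "timestamp_end", "duration_sec", "checks", "counts", "trend",
    "ignore_rules", "top_actions", "suppressed_findings",
]
TOP_LEVEL_ENUMS = {
    "engine": {"claude", "codex"},
    "mode": {"core", "deep"},
    "budget": {"fast", "standard", "max"},
}
CHECK_STATUSES = {"pass", "warn", "fail", "skipped"}
CHECK_NAMES = {
    "dead-code", "doc-drift", "test-gaps", "dependency-security",
    "code-quality", "tech-debt", "refactoring",
    "duplicate-logic", "architecture-smells", "complexity-hotspots",
    "crontab-drift", "container-health", "backup-verification", "script-drift",
}
FINDING_ENUMS = {
    "severity": {"critical", "high", "medium", "low", "info"},
    "effort": {"S", "M", "L"},
    "kind": {"defect", "improvement"},
    "confidence": {"high", "medium", "low"},
    "state": {"new", "existing", "resolved"},
}


def validate_summary(summary):
    """Return a list of contract violations; an empty list means it conforms."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in summary:
            errors.append(f"missing required field: {field}")
    for field, allowed in TOP_LEVEL_ENUMS.items():
        if field in summary and summary[field] not in allowed:
            errors.append(f"{field}: unexpected value {summary[field]!r}")
    for i, check in enumerate(summary.get("checks") or []):
        if check.get("name") not in CHECK_NAMES:
            errors.append(f"checks[{i}].name: unexpected value {check.get('name')!r}")
        if check.get("status") not in CHECK_STATUSES:
            errors.append(f"checks[{i}].status: unexpected value {check.get('status')!r}")
    for i, action in enumerate(summary.get("top_actions") or []):
        for field, allowed in FINDING_ENUMS.items():
            if field in action and action[field] not in allowed:
                errors.append(f"top_actions[{i}].{field}: unexpected value {action[field]!r}")
    return errors
```

Typical use would be `validate_summary(json.load(open(path)))` after `import json`, with a non-empty result treated as a contract break.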

{
  "schema_version": "1",
  "engine": "claude",
  "run_id": "YYYY-MM-DD_HH-mm-ss",
  "repo": "repo-name",
  "repo_root": "/path/to/repo",
  "branch": "main",
  "mode": "core",
  "budget": "standard",
  "scope": ".",
  "since": null,
  "infra_enabled": true,
  "previous_run": { "run_id": "...", "available": true },
  "timestamp_start": "2026-02-17T02:00:00Z",
  "timestamp_end": "2026-02-17T02:18:00Z",
  "duration_sec": 1080,
  "checks": [
    {
      "name": "dead-code",
      "status": "pass",
      "findings_count": 0,
      "severity_counts": { "critical": 0, "high": 0, "medium": 0, "low": 0, "info": 0 },
      "duration_sec": 45,
      "notes": "",
      "skip_reason": null
    },
    {
      "name": "code-quality",
      "status": "warn",
      "findings_count": 3,
      "severity_counts": { "critical": 0, "high": 1, "medium": 2, "low": 0, "info": 0 },
      "duration_sec": 120,
      "notes": "Typecheck warnings in auth scope",
      "skip_reason": null
    },
    {
      "name": "dependency-security",
      "status": "skipped",
      "findings_count": 0,
      "severity_counts": { "critical": 0, "high": 0, "medium": 0, "low": 0, "info": 0 },
      "duration_sec": 0,
      "notes": "",
      "skip_reason": "pip-audit not installed; no package.json in scope"
    }
  ],
  "counts": { "critical": 0, "high": 1, "medium": 2, "low": 0, "info": 0, "skipped": 1, "suppressed": 1 },
  "trend": {
    "new_findings": 2,
    "existing_findings": 1,
    "resolved_findings": 0,
    "net_risk": "up"
  },
  "ignore_rules": {
    "file_present": true,
    "loaded": 2,
    "valid": 2,
    "expired": 0,
    "invalid": 0,
    "applied": 1,
    "critical_blocked": 0
  },
  "code_health": {
    "lint": { "status": "pass", "scope": "full", "first_failure": "", "evidence": "run.log#lint" },
    "typecheck": { "status": "warn", "scope": "changed", "first_failure": "src/config.js:42 missing type", "evidence": "run.log#typecheck" },
    "build_check": { "status": "pass", "scope": "full", "first_failure": "", "evidence": "run.log#build" },
    "tests": { "status": "pass", "scope": "full", "first_failure": "", "evidence": "run.log#tests" }
  },
  "top_actions": [
    {
      "title": "Resolve typecheck failure in src/config.js",
      "severity": "high",
      "effort": "S",
      "kind": "defect",
      "state": "new",
      "confidence": "high",
      "file": "src/config.js",
      "check": "code-quality",
      "why": "Typecheck gate reports a missing type at src/config.js:42.",
      "next_step": "Add the missing type and re-run the typecheck gate."
    }
  ],
  "suppressed_findings": [
    {
      "id": "DRY-004",
      "check": "refactoring",
      "severity": "low",
      "reason": "Legacy module queued for rewrite",
      "rule": "check=refactoring;path=src/legacy/**",
      "expires": "never"
    }
  ],
  "infra": [
    {
      "target": "web-server",
      "check": "crontab-drift",
      "status": "pass",
      "details": "Live crontab matches repo."
    }
  ]
}
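In the example above, the top-level `counts` object equals the sum of the per-check `severity_counts`, plus the number of skipped checks and the number of suppressed findings. The skill does not state this rule explicitly, so the following is a sketch of one consistent reading, with a hypothetical function name.

```python
def derive_counts(summary):
    """Aggregate per-check severity counts into the top-level counts object."""
    counts = {sev: 0 for sev in ("critical", "high", "medium", "low", "info")}
    for check in summary["checks"]:
        for severity, n in check["severity_counts"].items():
            counts[severity] += n
    # Assumption: "skipped" counts checks, "suppressed" counts findings.
    counts["skipped"] = sum(1 for c in summary["checks"] if c["status"] == "skipped")
    counts["suppressed"] = len(summary["suppressed_findings"])
    return counts
```

Comparing `derive_counts(summary)` with `summary["counts"]` gives downstream tooling a cheap consistency check on a generated report.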

5.4 Knowledge Note (conditional)

If knowledge/learnings/ exists in the repo, write a concise note:

knowledge/learnings/nightshift-YYYY-MM-DD.md

Content:

# Nightshift Audit — [DATE]

**[N] findings** ([critical], [high], [medium], [low]) across [N] checks.

Top actions:
1. [Most important finding]
2. [Second most important]
3. [Third most important]

Full report: `.nightshift/runs/YYYY-MM-DD_HH-mm-ss/executive-report.md`

Execution Rules

  1. No modifications. Nightshift is read-only. Never edit, fix, commit, or deploy anything. The only files you write are in .nightshift/ and optionally knowledge/learnings/.

  2. Infra follows auto-detection. SSH checks run when .nightshift-infra.yaml exists and targets are reachable, or when forced with --infra. The --no-infra flag turns them off. Never SSH unless one of the enabling conditions is met.

  3. Conservative findings. When uncertain, note your confidence level. A finding you're 60% sure about should say so. Never present a guess as a fact.

  4. Skip gracefully with reason. If a check tool isn't installed or a check can't run, record it in run.log AND in the "Checks Skipped" report section with the specific reason (e.g., "pip-audit: command not found", "no Python files in scope"). Never fail the whole run because one check fails. Each check is independent — an error in Check 3 must not prevent Check 4 from running.

  5. Time awareness. Log start/end times to run.log. If a single check takes more than 5 minutes, note it. The whole run should complete in under 30 minutes for a medium repo.

  6. Respect .gitignore. Never scan node_modules/, __pycache__/, .git/, or other ignored directories.

  7. No network requests except SSH. Don't fetch URLs, hit APIs, or check external services. The only exception is SSH to the VPS when infra checks are active.

  8. Subagent usage. For large repos, you MAY spawn Haiku subagents for parallel grep/glob work. But synthesis MUST be done by Opus. Quality of analysis is the entire point.

  9. Be direct. No filler, no encouragement, no "great codebase!" platitudes. The report is a tool, not a conversation.

  10. Reset sleep prevention — restore normal sleep behavior after writing reports. On macOS: kill $CAFFEINATE_PID 2>/dev/null. On Linux: kill $INHIBIT_PID 2>/dev/null. On Windows: run the PowerShell SetThreadExecutionState(0x80000000) reset from Phase 0.2.

  11. Print final summary to stdout after writing all files:

    Nightshift complete. [N] findings ([critical] critical, [high] high).
    Report: .nightshift/runs/YYYY-MM-DD_HH-mm-ss/executive-report.md
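Rules 4 and 5 above (independent checks, skip with reason, time awareness) can be sketched as a small runner loop. This is illustrative only: the function name, the result dict shape, and the choice to record an errored check as "skipped" with the exception text as its reason are assumptions, and "warn" is used here for any check that returns findings.

```python
import time

SLOW_CHECK_SEC = 5 * 60  # rule 5: note any check that takes more than 5 minutes


def run_checks(checks, log):
    """Run each (name, callable) pair independently; one failure never stops the rest."""
    results = []
    for name, check_fn in checks:
        started = time.monotonic()
        result = {"name": name, "status": "pass", "findings": [], "skip_reason": None}
        try:
            result["findings"] = check_fn()
            if result["findings"]:
                result["status"] = "warn"  # simplification: no pass/warn/fail grading
        except Exception as exc:
            # Record the specific reason instead of aborting the whole run.
            result["status"] = "skipped"
            result["skip_reason"] = f"{type(exc).__name__}: {exc}"
        result["duration_sec"] = time.monotonic() - started
        log(f"{name}: {result['status']} in {result['duration_sec']:.1f}s")
        if result["duration_sec"] > SLOW_CHECK_SEC:
            log(f"{name}: exceeded {SLOW_CHECK_SEC}s")
        results.append(result)
    return results
```

The `log` callable would append to run.log; each `skip_reason` would also feed the "Checks Skipped" section of the report.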