Overnight repo quality audit — dead code, doc drift, test gaps, security, debt, infra drift. Run after big work sessions. Produces a morning report.
Install
npx skillscat add adriannutiu/nightshift/claude-code-nightshift
Install via the SkillsCat registry.
Nightshift — Overnight Repo Audit
You are running an unattended overnight quality sweep. The user has gone to sleep. Be thorough, be accurate, and produce a report they can act on in the morning.
Quality bar: Opus-level analysis. This is a deep audit, not a surface scan. Think critically about every finding. False positives waste the user's morning — every finding must be worth reading.
Parse Arguments
Extract from $ARGUMENTS:
| Argument | Effect |
|---|---|
| (none) | Core mode, standard budget, current repo |
| `deep` | Core + deep checks |
| `--budget fast\|standard\|max` | Runtime budget (default: standard) |
| `--since <ref>` | Scope to changes since git ref/date (default: all files) |
| `--focus <glob>` | Limit to matching paths |
| `--infra` | Force SSH infrastructure drift checks on |
| `--no-infra` | Force SSH infrastructure drift checks off |
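The argument table above could be parsed along these lines. This is a minimal Python sketch, not the skill's actual parser: the function name `parse_arguments`, the shape of the returned dictionary, and the choice to let the last of `--infra` / `--no-infra` win are all assumptions made for illustration.

```python
import shlex

BUDGETS = {"fast", "standard", "max"}


def parse_arguments(raw):
    """Turn a raw $ARGUMENTS string into an options dict (hypothetical layout)."""
    opts = {
        "mode": "core",        # "deep" adds the deep checks
        "budget": "standard",  # default budget
        "since": None,         # None means all files
        "focus": None,         # None means no path filter
        "infra": None,         # None means auto-detect
    }
    tokens = shlex.split(raw)
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok == "deep":
            opts["mode"] = "deep"
        elif tok in ("--budget", "--since", "--focus"):
            if i + 1 >= len(tokens):
                raise ValueError(f"{tok} requires a value")
            i += 1
            opts[tok[2:]] = tokens[i]
        elif tok == "--infra":
            opts["infra"] = True
        elif tok == "--no-infra":
            opts["infra"] = False
        else:
            raise ValueError(f"unknown argument: {tok}")
        i += 1
    if opts["budget"] not in BUDGETS:
        raise ValueError(f"invalid budget: {opts['budget']}")
    return opts
```

Rejecting unknown tokens outright is a design choice of this sketch; an unattended run might prefer to log the problem in run.log and fall back to defaults.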
Budget modes — per-check behavior:
| Check | fast | standard (default) | max |
|---|---|---|---|
| 1. Dead code | Grep changed files only | Grep full repo, read flagged files | Grep full repo, read all source files |
| 2. Doc drift | Check paths in changed .md files | Check all .md files | All .md + verify commands still work |
| 3. Test gaps | Flag changed files without tests | Same + run test suite if quick | Same + run with coverage |
| 4. Dependencies | npm audit / pip-audit only | Same + npm outdated / secrets scan | Same + lockfile drift check |
| 5. Code quality | 5a toolchain only (lint/type) | 5a + grep patterns + read flagged | 5a + read ALL source files for 5b-5h |
| 6. Tech debt | Count TODOs (no blame) | Blame flagged TODOs (30+ days) | Blame all TODOs |
| 7. Refactoring | Skip entirely | Read core business files | Read every non-trivial file |
| 8-10. Deep | Skip even if deep | Normal depth | Exhaustive (300-commit churn) |
| 11-14. Infra | Normal (SSH is cheap) | Normal | Normal |
| Typical runtime | 5-10 min | 15-30 min | 30-60 min |
| When to use | Quick check after small PR | Default overnight | Pre-release, quarterly |
The key lever is how many files Claude reads (expensive context) vs greps (cheap). fast greps and counts. standard reads the important ones. max reads everything.
Always record chosen budget and which checks were skipped/reduced due to budget in run.log.
Infra auto-detection: If neither --infra nor --no-infra is passed, auto-detect: enable infra checks when .nightshift-infra.yaml exists at repo root AND at least one SSH target defined in it is reachable (ssh -o ConnectTimeout=3 -o BatchMode=yes -- <host> true 2>/dev/null). Validate <host> against ssh_alias rules before probing. This makes infra opt-in via config file — repos without it skip infra entirely.
Phase 0: Setup
0.1 Detect Environment
pwd
git rev-parse --show-toplevel 2>/dev/null
git branch --show-current 2>/dev/null
git log --oneline -5 2>/dev/null
Identify:
- Repo root and repo name
- Tech stacks present (scan for: package.json, requirements.txt, pyproject.toml, Cargo.toml, go.mod, composer.json, Gemfile, docker-compose.yml, Dockerfile)
- Project conventions (read CLAUDE.md at root + any subdirectory CLAUDE.md files)
- Test patterns (where tests live, how they're run)
- Whether knowledge/learnings/ exists (for learning capture)
0.2 Preflight: Sleep Prevention
Prevent the machine from sleeping during the overnight run.
macOS:
caffeinate -dims &
CAFFEINATE_PID=$!
At the end of the run (Phase 5), kill it: kill $CAFFEINATE_PID 2>/dev/null
Linux: No universal sleep prevention command. If systemd-inhibit is available:
systemd-inhibit --what=idle --who=nightshift --why="Overnight audit" --mode=block sleep infinity &
INHIBIT_PID=$!
At the end of the run: kill $INHIBIT_PID 2>/dev/null. If systemd-inhibit is not available, log a warning in run.log and continue; the audit still works, but the machine may sleep.
Windows:
powershell -Command "Add-Type -MemberDefinition '[DllImport(\"kernel32.dll\")] public static extern uint SetThreadExecutionState(uint esFlags);' -Name Win32 -Namespace API; [API.Win32]::SetThreadExecutionState(0x80000003)"
At the end of the run (Phase 5), reset with a self-contained invocation (the type from the first call doesn't persist across processes):
powershell -Command "Add-Type -MemberDefinition '[DllImport(\"kernel32.dll\")] public static extern uint SetThreadExecutionState(uint esFlags);' -Name Win32 -Namespace API; [API.Win32]::SetThreadExecutionState(0x80000000)"
Detect platform with uname -s 2>/dev/null. Use caffeinate on Darwin, systemd-inhibit on Linux, PowerShell on Windows/MINGW/MSYS.
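The platform dispatch described above can be sketched as a small pure function. This is an illustrative Python sketch under stated assumptions: `pick_sleep_inhibitor` is a hypothetical helper name, the input is assumed to be the string printed by `uname -s`, and the caller is assumed to have already checked whether systemd-inhibit is on the PATH.

```python
def pick_sleep_inhibitor(uname, has_systemd_inhibit=True):
    """Map a `uname -s` string to the sleep-prevention mechanism to use.

    Returns "caffeinate", "systemd-inhibit", "powershell", or None when no
    mechanism applies (the caller should then log a warning and continue).
    """
    if uname == "Darwin":
        return "caffeinate"
    if uname == "Linux":
        return "systemd-inhibit" if has_systemd_inhibit else None
    # Git Bash and MSYS2 report names such as MINGW64_NT-10.0-... or MSYS_NT-...
    if uname.startswith(("Windows", "MINGW", "MSYS")):
        return "powershell"
    return None
```

Returning None rather than raising keeps the audit running on unrecognized platforms, which matches the "log a warning and continue" behavior described for Linux.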
0.3 Create Run Directory
.nightshift/runs/YYYY-MM-DD_HH-mm-ss/
├── run.log # Full execution trace (append throughout)
├── findings.md # Technical detail per check
├── executive-report.md # Prioritized morning brief
├── summary.json # Machine-readable results
└── raw/ # Tool outputs, grep results, etc.
Create the directory structure. Initialize run.log with:
Nightshift started: [ISO8601 timestamp]
Repo: [name] at [path]
Mode: [core|deep]
Args: [raw arguments]
Append to run.log at the start and end of each check:
[HH:MM:SS] CHECK 1 START: Dead Code & Unused Artifacts
[HH:MM:SS] CHECK 1 END: 3 findings (0 critical, 1 high, 2 medium) [12s]
If .nightshift/ isn't listed in .gitignore, note it in the report so the user can add it.
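The run directory layout and run.log header above could be created roughly as follows. This is a hedged Python sketch: the helper names `create_run_dir` and `log_line` are hypothetical, and the use of local time for the directory name is an assumption, since the text only gives the YYYY-MM-DD_HH-mm-ss pattern.

```python
from datetime import datetime
from pathlib import Path


def create_run_dir(repo_root, mode, raw_args, now=None):
    """Create .nightshift/runs/<timestamp>/raw and write the run.log header."""
    now = now or datetime.now().astimezone()  # assumption: local time
    root = Path(repo_root).resolve()
    run_dir = root / ".nightshift" / "runs" / now.strftime("%Y-%m-%d_%H-%M-%S")
    (run_dir / "raw").mkdir(parents=True)  # fails if the run already exists
    header = (
        f"Nightshift started: {now.isoformat(timespec='seconds')}\n"
        f"Repo: {root.name} at {root}\n"
        f"Mode: {mode}\n"
        f"Args: {raw_args}\n"
    )
    (run_dir / "run.log").write_text(header, encoding="utf-8")
    return run_dir


def log_line(run_dir, message, now=None):
    """Append one [HH:MM:SS]-prefixed line to run.log."""
    now = now or datetime.now()
    with open(Path(run_dir) / "run.log", "a", encoding="utf-8") as fh:
        fh.write(f"[{now:%H:%M:%S}] {message}\n")
```

A call such as `log_line(run_dir, "CHECK 1 START: Dead Code & Unused Artifacts")` would produce the trace format shown above.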
0.4 Load Previous Run (Trend Tracking)
If a previous run exists in .nightshift/runs/, load its summary.json. This enables classifying each finding as:
- new: Not present in previous run
- existing: Was present before, still present
- resolved: Was present before, no longer found
Prioritize report ordering: new critical/high findings first, then severity regressions, then unresolved backlog, then resolved items.
0.5 Load .nightshift-ignore (Suppression Rules)
If .nightshift-ignore exists at repo root, parse suppression rules:
Rule format (one per line, semicolon-separated key/value pairs):
id=AUTH-REFRESH-DUP; reason=Known backlog item; expires=2026-03-31
check=refactoring; path=legacy/**; reason=Queued for rewrite; expires=never
check=doc-drift; path=docs/archive/**; reason=Archived docs; expires=never
id=SEC-OLD-123; reason=Accepted until migration; expires=2026-04-15; allow_critical=true
Keys: id (exact finding ID), check (check name), path (glob), reason (required), expires (YYYY-MM-DD or never), allow_critical (true|false, default false).
Safety rules:
- Suppressed findings stay visible in a dedicated report section; they are never deleted
- Critical findings cannot be suppressed unless the rule has allow_critical=true
- Expired rules are ignored and reported
- Rules missing reason are invalid and reported
- Record rule counts in run.log: total / valid / invalid / expired
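The rule format and safety rules above might be parsed as sketched below. This Python example is illustrative only: `parse_rule` and `count_rules` are hypothetical names, and several behaviors are assumptions not stated in the text (a missing expires is treated as never, a rule must carry an id or a check, a rule is still valid on its expiry date, and blank lines or lines starting with # are skipped).

```python
from datetime import date


def parse_rule(line, today):
    """Return (status, rule); status is 'valid', 'invalid', or 'expired'."""
    rule = {}
    for part in line.split(";"):
        part = part.strip()
        if not part:
            continue
        key, sep, value = part.partition("=")
        if not sep:
            return "invalid", rule
        rule[key.strip()] = value.strip()
    # reason is required; assumption: a rule also needs an id or a check.
    if not rule.get("reason") or not (rule.get("id") or rule.get("check")):
        return "invalid", rule
    expires = rule.get("expires", "never")  # assumption: missing means never
    if expires != "never":
        try:
            expiry = date.fromisoformat(expires)
        except ValueError:
            return "invalid", rule
        if expiry < today:
            return "expired", rule
    rule["allow_critical"] = rule.get("allow_critical", "false") == "true"
    return "valid", rule


def count_rules(lines, today):
    """Tally total / valid / invalid / expired for the run.log record."""
    counts = {"total": 0, "valid": 0, "invalid": 0, "expired": 0}
    for line in lines:
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # assumption: blank lines and comments are ignored
        status, _ = parse_rule(line, today)
        counts["total"] += 1
        counts[status] += 1
    return counts
```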
0.6 Scope Detection
If --since was provided, get the changed file list:
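git diff --name-only <ref>..HEAD
If <ref> is a date rather than a git ref, resolve it to a commit first (for example with git rev-list -1 --before="<date>" HEAD), since git diff expects a revision. If --focus was provided, filter to matching paths.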
Otherwise, scope is the entire repo (respecting .gitignore).
Phase 1: Deterministic Checks
Non-mutating rule (CRITICAL): Nightshift never modifies project source files, configs, or dependencies. Before running any project script (npm run lint, npm test, etc.), inspect the script command first. Skip anything containing mutating flags: --fix, --write, format, prettier --write, eslint --fix, ruff --fix, codegen, or migration commands. When in doubt, run the direct tool command instead of the project script.
Note on toolchain side effects: Some read-only checks (build validation, test runners) may create cache artifacts in their own directories (e.g., node_modules/.cache/, .pytest_cache/, dist/). This is standard toolchain behavior. The "read-only" guarantee means Nightshift never edits, deletes, or overwrites your source code, configs, or committed files — not that zero bytes are written to transient caches.
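The "inspect the script command first" step of the non-mutating rule could look something like this. It is a conservative Python sketch, not the skill's own logic: `is_mutating` and `classify_scripts` are hypothetical names, and the regular expression is one possible reading of the flag list above. It deliberately over-matches (for example, a read-only `--format json` is flagged because it contains the word format), in which case the rule's fallback applies: run the direct tool command instead of the project script.

```python
import json
import re

# One possible encoding of the mutating markers named in the rule.
MUTATING = re.compile(
    r"--fix\b|--write\b|\bformat\b|\bcodegen\b|\bmigrat(?:e|ion)s?\b"
)


def is_mutating(command):
    """True if a script command contains any marker from the rule."""
    return MUTATING.search(command) is not None


def classify_scripts(package_json_text):
    """Label each package.json script as 'run' or 'skip'."""
    scripts = json.loads(package_json_text).get("scripts", {})
    return {
        name: ("skip" if is_mutating(command) else "run")
        for name, command in scripts.items()
    }
```

Note that this only looks at the script's own command string; a script that calls another script (npm run other) would need to be followed separately.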
Run these sequentially. For each check, write raw output to raw/ and structured findings to memory. Be conservative — only flag things you're confident about.
Finding classification: Tag every finding with:
- severity: critical / high / medium / low / info
- confidence: high / medium / low (low confidence → normally downgrade one severity level, except confirmed critical security/data-loss)
- kind: defect (correctness, security, operational risk) or improvement (DRY, refactor, maintainability)
- state: new / existing / resolved (if previous run data available)
Check 1: Dead Code & Unused Artifacts
Goal: Find files, exports, and imports that nothing references.
Orphan files: Find files not imported/required by anything else:
- For JS/TS: grep for import.*from and require( patterns, cross-reference with all source files
- For Python: grep for import and from...import patterns
- Exclude: test files, config files, entry points, scripts meant to be run directly
Unused exports: Find exported symbols not imported anywhere else in the codebase.
Dead config: Look for config entries referencing files/paths/modules that don't exist.
Stale scripts: Check package.json scripts, Makefile targets, or shell scripts that reference missing files or commands.
Confidence labeling:
- High: File has zero inbound references and isn't an entry point → flag
- Medium: Export has zero external references but file is imported → note
- Low: Might be used dynamically (string interpolation, reflection) → skip unless deep mode
Check 2: Documentation Drift
Goal: Find docs that reference stale state — wrong paths, removed features, outdated instructions.
Path references: Extract all file paths mentioned in .md files. Check each exists:
- Grep for patterns like `path/to/file`, backtick-wrapped paths, and markdown links
- Verify each referenced path exists on disk
Code references: Find function/class/variable names mentioned in docs. Verify they still exist in the codebase.
Instruction drift: For CLAUDE.md and README files, check if:
- Referenced commands still work (e.g., npm run X: does script X exist?)
- Referenced URLs/endpoints are still present in code
- Version numbers match actual installed versions
Stale dates: Flag dates in docs older than 90 days that suggest the section hasn't been reviewed.
Size check: Flag .md files over 500 lines (likely need splitting or pruning).
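The path-reference part of this check can be sketched as below. This is a rough Python heuristic under stated assumptions: `referenced_paths` and `missing_paths` are hypothetical names, and the rule for what counts as a path (contains a slash or ends in a short extension) is an assumption that will need tuning, since it would also pick up backtick-wrapped version strings such as v1.2.3.

```python
import re
from pathlib import Path

BACKTICK = re.compile(r"`([^`\s]+)`")                 # `path/to/file`
LINK = re.compile(r"\[[^\]]*\]\(([^)\s#]+)[^)]*\)")   # [text](target#anchor)
EXTENSION = re.compile(r"\.[A-Za-z0-9]{1,5}$")


def referenced_paths(markdown):
    """Collect strings in a markdown document that look like repo paths."""
    candidates = set(BACKTICK.findall(markdown)) | set(LINK.findall(markdown))
    paths = set()
    for candidate in candidates:
        if "://" in candidate or candidate.startswith("mailto:"):
            continue  # external links are out of scope here
        if "/" in candidate or EXTENSION.search(candidate):
            paths.add(candidate)
    return paths


def missing_paths(markdown, repo_root):
    """Return referenced paths that do not exist under repo_root, sorted."""
    root = Path(repo_root)
    return sorted(
        p for p in referenced_paths(markdown)
        if not (root / p.lstrip("/")).exists()
    )
```

Because of the false-positive risk, findings from a heuristic like this fit the medium-confidence tier until the referenced string is confirmed to be a path.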
Check 3: Test Gap Detection
Goal: Find production code that lacks test coverage signals.
File-level gaps: For each source file, check if a corresponding test file exists:
- src/foo.js → tests/foo.test.js or test/test_foo.py or similar patterns
- Weight by file complexity (larger files without tests are higher priority)
Recent changes without tests: If --since was used, flag changed production files where no test file was also changed.
Test health:
- Run test suite if safe to do (detect with: npm test, pytest, cargo test, etc.)
- If tests exist but fail, flag as High (Critical only if the failure is in a critical path: auth, billing, data persistence)
- If no test runner detected, note as info
Coverage gaps (if coverage tool available):
- Look for .coverage, coverage/, lcov.info, etc.
- If found, parse for files below 50% coverage
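The file-level gap detection above might be approximated by name matching, as in this Python sketch. The helper names, the set of source extensions, and the four naming conventions tried (foo.test.js, foo.spec.js, test_foo.py, foo_test.py) are assumptions for illustration; real repos may use other layouts.

```python
from pathlib import PurePosixPath

SOURCE_EXTS = {".js", ".jsx", ".ts", ".tsx", ".py"}  # assumption: extend per repo
TEST_DIRS = {"test", "tests", "__tests__"}


def is_test_file(path):
    """Heuristic: test-style file name or located under a test directory."""
    p = PurePosixPath(path)
    return (
        p.name.startswith("test_")
        or p.stem.endswith(("_test", ".test", ".spec"))
        or bool(TEST_DIRS & set(p.parts[:-1]))
    )


def candidate_test_names(path):
    """File names that would count as a test for the given source file."""
    p = PurePosixPath(path)
    stem, ext = p.stem, p.suffix
    return {
        f"{stem}.test{ext}", f"{stem}.spec{ext}",
        f"test_{stem}{ext}", f"{stem}_test{ext}",
    }


def files_without_tests(all_files):
    """Source files with no matching test file name anywhere in the repo."""
    names = {PurePosixPath(f).name for f in all_files}
    return [
        f for f in all_files
        if PurePosixPath(f).suffix in SOURCE_EXTS
        and not is_test_file(f)
        and not (candidate_test_names(f) & names)
    ]
```

The result is only a coverage signal: a file may be exercised by tests that do not share its name, so these findings should carry medium confidence at most.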
Check 4: Dependency & Security Scan
Goal: Find known vulnerabilities and outdated dependencies.
Ecosystem audit tools (run whichever apply):
- Node.js: npm audit --json 2>/dev/null
- Python: pip-audit --format json 2>/dev/null or safety check --json 2>/dev/null
- Rust: cargo audit --json 2>/dev/null
- Go: govulncheck ./... 2>/dev/null
If the tool isn't installed, skip gracefully and note it.
Outdated dependencies:
- npm outdated --json 2>/dev/null
- pip list --outdated --format json 2>/dev/null
Hardcoded secrets scan:
- Grep for patterns: API keys, tokens, passwords, connection strings
- Patterns: (?i)(api[_-]?key|secret|password|token|credential|auth)\s*[:=]\s*['"][^'"]{8,}
- Exclude: .env.example, test fixtures, documentation examples
- Flag any matches in committed files as Critical
Lockfile drift: Check if lockfiles (package-lock.json, poetry.lock, etc.) are committed and up to date.
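The secrets pattern listed above can be applied line by line, as in this small Python sketch. The regular expression is copied from the check; the function name `scan_text` and the decision to report only the line number and matched key name (never the secret value itself) are assumptions of this example.

```python
import re

# Pattern taken verbatim from the check description.
SECRET_PATTERN = re.compile(
    r"""(?i)(api[_-]?key|secret|password|token|credential|auth)\s*[:=]\s*['"][^'"]{8,}"""
)


def scan_text(text):
    """Return (line_number, key_name) pairs for lines that look like secrets."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = SECRET_PATTERN.search(line)
        if match:
            # Report the key name only, so the report does not leak the value.
            hits.append((number, match.group(1)))
    return hits
```

The pattern requires a quoted value of at least 8 characters, so short placeholders are ignored, while a key such as oauth still matches through its auth substring. Exclusions (.env.example, fixtures, doc examples) would be applied by the caller before scanning.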
Check 5: Code Quality, Correctness & Static Analysis
Goal: Comprehensive code quality sweep — lint violations, type errors, build issues, static bug patterns, and logic defects. This is the deepest check and where Opus-level analysis matters most.
5a. Run Existing Toolchain (if available)
Before manual analysis, run whatever tooling the repo already has configured. Record all output in raw/:
Linters:
- JS/TS: npx eslint . --format json 2>/dev/null or check for .eslintrc* and run
- Python: ruff check . --output-format json 2>/dev/null or flake8 --format json 2>/dev/null or pylint --output-format json 2>/dev/null
- Shell: shellcheck -f json *.sh 2>/dev/null (check all .sh files)
- YAML/JSON: yamllint . 2>/dev/null
Type checkers:
- TS: npx tsc --noEmit --pretty 2>/dev/null (if tsconfig.json exists)
- Python: mypy . --ignore-missing-imports 2>/dev/null or pyright 2>/dev/null (if py.typed or type hints detected)
Build validation:
- Node: npm run build 2>/dev/null (if build script exists in package.json)
- Python: python -c "import ast, sys; ast.parse(open(sys.argv[1]).read())" <file> for each .py file (catches syntax errors without writing bytecode)
- Docker: docker compose config -q 2>/dev/null (if docker-compose.yml exists; validates YAML + variable refs)
Test runners (run but don't fail the audit if tests fail; record results):
- npm test 2>/dev/null (if test script exists)
- python -m pytest --tb=short -q 2>/dev/null (if pytest available and test files exist)
For each tool: if not installed or not applicable, skip with reason in run.log. If the tool runs and finds issues, capture them as findings. Don't double-count — if ESLint flags an issue, don't also flag it manually.
5b. Error Handling & Failure Modes
Silent error swallowing:
- try/catch with empty catch body or catch that only logs (no re-throw, no return)
- catch (e) { /* ignore */ } or catch (e) { console.log(e) } without recovery logic
- Python bare except: or except Exception: with pass
- .catch(() => {}) on Promises
Missing error handling:
- await calls without try/catch in functions that don't propagate errors
- File I/O without error handling (especially in scripts that run unattended)
- HTTP/fetch calls without checking response status
- JSON.parse without try/catch on external data
- Missing error callbacks in Node.js streams/event emitters
Shell script robustness:
- Missing set -e or set -euo pipefail at the top
- Unquoted variables ($VAR instead of "$VAR"): word splitting/globbing risk
- Missing exit code checks after critical commands
- cd without checking success (cd /dir && ... vs bare cd /dir)
- Heredocs/temp files without cleanup on exit (missing trap)
5c. Security & Injection Patterns
SQL injection:
- String concatenation/interpolation in SQL queries (f-strings, template literals, +)
- cursor.execute(f"SELECT ... WHERE x = '{var}'") patterns
- Raw SQL in ORMs bypassing parameterization
Command injection:
- subprocess/child_process.exec with shell=True and variable input
- os.system() with string formatting
- Template literals in shell command strings
Path traversal:
- User-influenced paths without sanitization (../ not stripped)
- os.path.join(base, user_input) without verifying result stays under base
Code execution:
- eval(), exec(), Function() with any dynamic input
- pickle.load() / yaml.load() (without Loader=SafeLoader)
- importlib.import_module() with dynamic strings from external input
Hardcoded sensitive data:
- API keys, tokens, passwords, connection strings in source code
- Pattern: (?i)(api[_-]?key|secret|password|token|credential|auth|private[_-]?key)\s*[:=]\s*['"][^'"]{8,}
- Also check for: AWS access keys (AKIA...), private keys (-----BEGIN), JWT tokens (eyJ...)
- Exclude: .env.example, test fixtures, documentation examples, gitignored files
Crypto misuse:
- MD5/SHA1 for anything security-related (use SHA256+)
- Hardcoded IV/nonce in encryption
- ECB mode usage
5d. Logic Defects & Bug Patterns
Dead/unreachable code:
- Code after unconditional return, break, continue, raise, throw, sys.exit()
- Conditions that are always true/false based on types
- Catch clauses that can never trigger (wrong exception type)
Copy-paste bugs:
- Identical if and else branches
- Switch/case fallthrough without break (JS) or duplicate case values
- Repeated conditions in if/elif chains
Null/undefined safety:
- Property access on potentially null/undefined values without guards
- Optional chaining (?.) mixed with non-optional access on same variable
- Array.find() result used without null check
- Dict/object key access without in check or .get() with default
Off-by-one and bounds:
- <= vs < in loop bounds near .length or len()
- Array index at .length (one past end)
- Fence-post errors in pagination/slicing
Assignment vs comparison:
- = instead of == / === in conditions (JS especially)
- is vs == misuse in Python (comparing values vs identity)
Floating point:
- Direct equality comparison of floats (0.1 + 0.2 === 0.3)
- Currency/financial values stored as floats instead of ints/Decimal
Shadowing & scope:
- Variable shadowing outer scope variables (same name, different binding)
- var instead of let/const in modern JS
- Python mutable default arguments (def foo(items=[]))
5e. Async & Concurrency
Missing await:
- Async function called without await (fire-and-forget that loses errors)
- Promise returned but not awaited in async context
Race conditions:
- Shared mutable state modified across async operations without coordination
- Check-then-act patterns without atomicity (TOCTOU)
- Multiple concurrent writes to same file/resource
Promise anti-patterns:
- Mixing .then() and await in same function
- new Promise() wrapping an already-async operation (unnecessary wrapping)
- .then(x => x) identity chains (no-op)
- Missing .catch() on promise chains that aren't awaited
Event loop blocking:
- Synchronous file I/O (fs.readFileSync) in async/server code paths
- CPU-intensive loops in event handlers without yielding
- JSON.parse on unbounded input without size limits
5f. Resource Management
Leaks:
- File handles opened without with (Python) or without .close() in finally
- Database connections/cursors not released
- Event listeners added without corresponding removal
- setInterval without clearInterval path
- Child processes spawned without cleanup on parent exit
Memory concerns:
- Unbounded arrays/lists that grow without limit (logs, caches, history)
- Closures capturing large scope unnecessarily
- Circular references preventing garbage collection
5g. Type Safety (language-specific)
TypeScript:
- any type annotations (grep for : any, as any)
- @ts-ignore / @ts-expect-error comments
- Non-null assertions (!) on values that could genuinely be null
- Type assertions (as Type) that narrow unsafely
Python:
- Missing type hints on public functions (when other functions have them: inconsistency)
- # type: ignore comments
- cast() calls that may be incorrect
- Union types handled without narrowing
JavaScript:
- Loose equality (==, !=) in non-trivial comparisons (not == null idiom)
- typeof checks that miss edge cases (typeof null === "object")
- Implicit type coercion in arithmetic/comparisons
5h. Configuration & Build Issues
Config consistency:
- .env.example vars that don't match actual .env usage in code
- Docker compose environment variables referenced but not defined
- Config file references to nonexistent files/paths
Dependency issues:
- Import of a module not in package.json / requirements.txt (missing dependency)
- Circular imports between modules
- Importing from devDependencies in production code
Build & deploy:
- Dockerfile COPY of files that don't exist or are gitignored
- Missing build artifacts that the deploy process expects
- Environment-specific code without proper guards (process.env.NODE_ENV)
Approach: For each source file, read the code and analyze it. Don't just grep for patterns; understand the logic flow. Use grep/glob for high-signal patterns first (e.g., eval(, shell=True, empty catch blocks), then deep-read the flagged files + all core business logic files. Prioritize:
- Recently changed files (if --since is used)
- Core business logic (not config/boilerplate)
- Large and complex files (more surface area for bugs)
- Files in hot paths (request handlers, data pipelines, backup scripts)
Confidence labeling:
- High: Pattern is definitively a bug, vulnerability, or will cause runtime failure → flag
- Medium: Pattern is suspicious but might be intentional → flag with note
- Low: Style preference or minor concern → skip in core mode, include in deep mode
Check 6: Tech Debt Signals
Goal: Surface accumulating debt markers.
TODO/FIXME/HACK aging:
- Find all TODO, FIXME, HACK, XXX, TEMP, WORKAROUND comments
- Use git blame to date them
- Flag any older than 30 days as stale (medium)
- Flag any older than 90 days as high (critical only if in auth, billing, data persistence, or backup paths)
Suppression abuse:
- // eslint-disable, # noqa, # type: ignore, @SuppressWarnings, # nosec
- Count total and flag files with 3+ suppressions
Large files: Flag source files over 500 lines (likely need splitting).
Complex functions: Look for functions over 50 lines (heuristic: count lines between function definition and closing brace/dedent).
Dependency count: Flag if package.json has 50+ dependencies or requirements.txt has 30+ entries.
Check 7: Refactoring & DRY Opportunities
Goal: Find code that should be consolidated, abstracted, or restructured for maintainability. This goes beyond "duplicate logic" (deep mode) — it looks for structural improvements even without exact duplication.
Repeated patterns across files:
- Similar error handling blocks (e.g., try/catch with same structure repeated 3+ times)
- Similar data transformation pipelines (fetch → parse → validate → store)
- Similar config/setup boilerplate that could be extracted to a shared helper
- Same validation logic implemented differently in multiple places
Extract-worthy blocks:
- Functions over 30 lines that do multiple distinct things (should be split)
- Nested conditionals 3+ levels deep (should be flattened or extracted)
- Long parameter lists (5+ params) suggesting a config object/class is needed
- Repeated inline constants (magic numbers/strings used 3+ times without a named constant)
Abstraction opportunities:
- Multiple files that follow the same structural pattern but aren't using a shared base/template
- Hard-coded values that should be config-driven (URLs, timeouts, limits, paths)
- Switch/if-else chains with 5+ cases that could be a lookup table or strategy pattern
- Repeated string building/formatting that could be a template
Module organization:
- Files mixing concerns (e.g., data fetching + rendering + validation in one file)
- Utility functions buried inside domain-specific files (should be in a shared utils module)
- Related functions split across unrelated files (should be co-located)
- "Junk drawer" utility files with 10+ unrelated functions (should be split by domain)
Dead weight:
- Commented-out code blocks (either remove or convert to a proper TODO with context)
- Feature flags/conditional code for features that shipped long ago
- Compatibility shims for versions/platforms no longer supported
- Wrapper functions that add no logic (just pass through to another function)
Approach: This check requires reading actual code, not just grepping. Read each non-trivial source file and look for structural improvement opportunities. Group findings by theme (e.g., "Error handling patterns should be consolidated into a shared helper used by 4 files").
Output format for each finding:
- What files/functions are involved
- What the current pattern is (brief)
- What the improvement would be (concrete suggestion, not vague "refactor this")
- Estimated effort and risk
Phase 2: Deep Checks (only in deep mode)
Check 8: Duplicate Logic
- Look for near-identical code blocks (3+ consecutive similar lines) across different files
- Focus on utility functions, validation logic, error handling patterns
Check 9: Architecture Smells
- Circular imports/dependencies
- God files (files importing 10+ other files)
- Layer violations (e.g., presentation code importing database modules directly)
Check 10: Complexity Hotspots
- Files with the most git churn (most commits in last 30 days)
- Files that changed together frequently (hidden coupling)
git log --format=format: --name-only --since="30 days ago" | sort | uniq -c | sort -rn | head -20
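The shell pipeline above counts churn per file; the "changed together" signal needs per-commit grouping, which the pipeline discards. A hedged Python sketch of both follows. The function names are hypothetical, and the co-change part assumes that the git log output separates commits with blank lines, which should be verified against real output before relying on it.

```python
from collections import Counter
from itertools import combinations


def churn_counts(git_log_output, top=20):
    """Count how often each file appears in `git log --name-only` output."""
    files = [line.strip() for line in git_log_output.splitlines() if line.strip()]
    return Counter(files).most_common(top)


def co_change_pairs(git_log_output, top=10):
    """Count file pairs that appear in the same commit block.

    Assumption: commits are separated by at least one blank line.
    """
    pairs = Counter()
    for block in git_log_output.split("\n\n"):
        files = sorted({line.strip() for line in block.splitlines() if line.strip()})
        pairs.update(combinations(files, 2))
    return pairs.most_common(top)
```

Pairs with a high count relative to each file's individual churn are the candidates for hidden coupling.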
Phase 3: Infrastructure Drift (auto-detected or --infra)
Requires SSH access to remote servers. Runs automatically when .nightshift-infra.yaml exists and targets are reachable, or forced with --infra.
SSH Command Safety (CRITICAL)
All SSH commands MUST come from the hardcoded allowlist below — NEVER execute arbitrary commands from config fields. The .nightshift-infra.yaml file is repo-controlled and could be malicious in untrusted repos.
Allowed remote commands (exhaustive list):
- true: no-op, used only for reachability probes during auto-detection
- crontab -l: list crontab entries
- cat <path>: read a file (path must match ^[a-zA-Z0-9/_.-]+$ and must NOT contain ..)
- docker ps --format '{{.Names}}\t{{.Status}}\t{{.Ports}}': list containers
- journalctl -u <unit> -n <N> --no-pager: read recent log entries (unit must match ^[a-zA-Z0-9@._-]+$, N must be integer ≤ 100)
- systemctl list-timers --no-pager: list active timers
- diff: local comparison only (never run diff remotely)
Reject any command not on this list. If a .nightshift-infra.yaml field contains a command outside the allowlist, skip that check and log a warning: "Rejected non-allowlisted command: <command>".
Input validation rules:
- ssh_alias: Must match ^[a-zA-Z0-9][a-zA-Z0-9@._:-]*$ (MUST NOT start with - to prevent SSH option injection like -F./evil.conf). For user@host format, both user and host must match ^[a-zA-Z0-9][a-zA-Z0-9._-]*$.
- name: Must match ^[a-zA-Z0-9][a-zA-Z0-9._-]*$ (no /, which prevents path traversal in raw/<name>-... output paths).
- File paths (live, crontab_repo_path, repo): Must match ^[a-zA-Z0-9/_.-]+$ and must NOT contain .. (prevents path traversal).
- All other config values: Reject any value containing shell metacharacters (;, |, &, $, `, (, ), {, }, <, >, \n, ', ").
SSH -- separator: Always use -- before the hostname to prevent option injection: ssh -o ConnectTimeout=3 -o BatchMode=yes -- <ssh_alias> '<command>'.
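The validation rules and the -- separator above can be combined into a command builder, sketched here in Python. The regular expressions come from the rules; the helper names are hypothetical, and two details are assumptions of this sketch: `re.fullmatch` is used instead of `$` because in Python `$` also matches just before a trailing newline, and the line count is additionally required to be at least 1.

```python
import re

ALIAS_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9@._:-]*")
PART_RE = re.compile(r"[a-zA-Z0-9][a-zA-Z0-9._-]*")
PATH_RE = re.compile(r"[a-zA-Z0-9/_.-]+")
UNIT_RE = re.compile(r"[a-zA-Z0-9@._-]+")


def valid_ssh_alias(alias):
    """Check ssh_alias, including the stricter user@host rule."""
    if not ALIAS_RE.fullmatch(alias):
        return False
    if "@" in alias:
        user, _, host = alias.partition("@")
        return bool(PART_RE.fullmatch(user) and PART_RE.fullmatch(host))
    return True


def valid_path(path):
    """Check a configured file path: allowed characters only, no '..'."""
    return bool(PATH_RE.fullmatch(path)) and ".." not in path


def journalctl_argv(alias, unit, lines):
    """Build the allowlisted journalctl call as an argv list (no local shell)."""
    if not valid_ssh_alias(alias):
        raise ValueError("rejected ssh_alias")
    if not UNIT_RE.fullmatch(unit):
        raise ValueError("rejected journal unit")
    if isinstance(lines, bool) or not isinstance(lines, int) or not 1 <= lines <= 100:
        raise ValueError("rejected line count")
    remote = f"journalctl -u {unit} -n {lines} --no-pager"
    return ["ssh", "-o", "ConnectTimeout=3", "-o", "BatchMode=yes", "--", alias, remote]
```

Passing an argv list to something like `subprocess.run` avoids a local shell entirely; the remote side still interprets the command string, which is why every interpolated field is validated first.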
.nightshift-infra.yaml Format
Place this file at repo root to enable infra checks:
# .nightshift-infra.yaml: define SSH targets and script mappings for infra drift checks
targets:
  - name: web-server
    ssh_alias: my-web-server # SSH alias or user@host
    checks:
      - crontab # Compare live crontab with repo version
      - backup # Check backup recency and logs
      - scripts # Compare deployed scripts with repo versions
    crontab_repo_path: deploy/crontabs/web-server.crontab
    backup_journal_unit: backup.service # Used with journalctl -u (allowlisted)
    backup_journal_lines: 5 # Max lines to fetch (integer, ≤ 100)
    script_mappings:
      - repo: deploy/scripts/backup.sh
        live: /opt/scripts/backup.sh
      - repo: deploy/scripts/health-check.sh
        live: /opt/scripts/health-check.sh
  - name: app-server
    ssh_alias: my-app-server
    checks:
      - crontab
      - containers # Check Docker container health
      - scripts
    crontab_repo_path: deploy/crontabs/app-server.crontab
    script_mappings:
      - repo: deploy/scripts/db-backup.sh
        live: /opt/scripts/db-backup.sh
Check 11: Crontab Drift
For each target with crontab check enabled:
ssh -- <ssh_alias> 'crontab -l' > raw/<name>-crontab-live.txt
diff <crontab_repo_path> raw/<name>-crontab-live.txt
Flag any differences.
Check 12: Container Health
For each target with containers check enabled, use the hardcoded allowlisted command:
ssh -- <ssh_alias> "docker ps --format '{{.Names}}\t{{.Status}}\t{{.Ports}}'"
- Flag any containers not running or in restart loops
- Flag containers with uptime < 1 hour (recent restarts)
Check 13: Backup Verification
For each target with backup check enabled, construct the command from validated config fields:
ssh -- <ssh_alias> "journalctl -u <backup_journal_unit> -n <backup_journal_lines> --no-pager"
Where backup_journal_unit matches ^[a-zA-Z0-9@._-]+$ and backup_journal_lines is an integer ≤ 100. Reject otherwise.
- Flag if last backup was >26 hours ago
- Flag if backup logs show errors
Check 14: Script Drift
For each target with scripts check enabled, iterate over script_mappings:
ssh -- <ssh_alias> "cat <live>" > raw/<name>-<script-basename>-live.sh
diff <repo> raw/<name>-<script-basename>-live.sh
Where <live> path matches ^[a-zA-Z0-9/_.-]+$. Reject paths with shell metacharacters.
Flag any differences. Focus on backup and security scripts — drift here is highest risk.
Phase 4: AI Synthesis
After all deterministic checks complete, synthesize findings.
4.1 Deduplicate and Cluster
- Group related findings (e.g., multiple stale TODOs in the same file → one finding)
- Remove low-confidence items that are likely false positives
- Merge overlapping findings from different checks
4.2 Apply Suppression Rules
If .nightshift-ignore was loaded:
- Match findings against rules (by id, check + path, or check alone)
- Suppress matching findings with non-expired rules
- Keep suppressed findings in a dedicated "Suppressed Findings" section; never hide them
- Do NOT suppress critical findings unless the matching rule has allow_critical=true
- Report expired rules and invalid rules
4.3 Trend Classification
If previous run data was loaded:
- Compare findings by fingerprint (check + file + description hash)
- Mark each as new, existing, or resolved
- Prioritize: new critical/high → severity regressions → existing → resolved
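The fingerprint comparison above reduces to set arithmetic once each finding has a stable key. The following Python sketch shows one way to do it; the function names are hypothetical, and both the use of SHA-256 and the normalization of the description (trimmed, lowercased) before hashing are assumptions, since the text only specifies "check + file + description hash".

```python
import hashlib


def fingerprint(check, file, description):
    """Build a stable key from check name, file path, and a description hash."""
    normalized = description.strip().lower()  # assumption: light normalization
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:12]
    return f"{check}:{file}:{digest}"


def classify_trend(previous, current):
    """Split fingerprints from two runs into new / existing / resolved sets."""
    previous, current = set(previous), set(current)
    return {
        "new": current - previous,
        "existing": current & previous,
        "resolved": previous - current,
    }
```

Because the description is part of the key, rewording a finding between runs would make it look both resolved and new; keeping descriptions templated per check reduces that churn.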
4.4 Severity Classification
| Severity | Criteria | Action Timeline |
|---|---|---|
| Critical | Exploitable security issue, data loss/corruption risk, broken backups, release blocker that causes outage | Fix today |
| High | Correctness gate failures in critical paths, high-confidence bug patterns in production, major infra drift | Fix this week |
| Medium | Non-critical gate failures with regression risk, meaningful test gaps, substantial doc drift, significant refactor with reliability benefit | Fix this sprint |
| Low | Localized maintainability debt, minor test/doc gaps, low-risk refactor opportunities | Backlog |
| Info | Observational findings, useful for trend tracking only, no immediate action | Note |
Confidence adjustment: If confidence is low, downgrade severity by one level — except confirmed critical security/data-loss findings which stay critical regardless.
Kind separation: defect findings (bugs, security, operational risk) always rank above improvement findings (DRY, refactor) at the same severity level.
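The confidence adjustment and kind separation rules above can be expressed as a downgrade function plus a sort key. This is a minimal Python sketch; the function names, the `confirmed_critical` flag, and the dictionary shape of a finding are assumptions made for illustration.

```python
SEVERITIES = ["critical", "high", "medium", "low", "info"]


def adjusted_severity(severity, confidence, confirmed_critical=False):
    """Downgrade one level on low confidence, except confirmed criticals."""
    if confidence != "low":
        return severity
    if severity == "critical" and confirmed_critical:
        return severity
    index = SEVERITIES.index(severity)
    return SEVERITIES[min(index + 1, len(SEVERITIES) - 1)]


def rank(findings):
    """Order by severity, with defects ahead of improvements at each level."""
    return sorted(
        findings,
        key=lambda f: (SEVERITIES.index(f["severity"]), f["kind"] != "defect"),
    )
```

Since Python's sort is stable, findings that tie on severity and kind keep their original relative order, so any earlier ordering (for example, new before existing) survives the ranking.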
4.5 Effort Estimation
For each finding, estimate:
- S (< 30 min): Config fix, delete dead code, update doc reference
- M (30 min - 2 hours): Write missing tests, refactor function, update deps
- L (2+ hours): Major refactor, architecture change, security overhaul
Phase 5: Write Reports
5.1 findings.md (Technical Detail)
# Nightshift Findings — [REPO] — [DATE]
**Mode:** core|deep | **Scope:** [all files | since X | focus Y]
**Duration:** [X minutes] | **Checks run:** [N/N]
---
## Check 1: Dead Code & Unused Artifacts
### Findings
- [Finding with file:line references, evidence, confidence level]
### Skipped / Not Applicable
- [Reason if check was skipped]
## Check 2: Documentation Drift
...
## Check N: ...
5.2 executive-report.md (Morning Brief)
# Nightshift Report — [REPO] — [DATE]
## What Was Checked
[1-2 sentences: scope, mode, what checks ran]
## Needs Attention First
| # | Finding | Severity | Effort | File(s) |
|---|---------|----------|--------|---------|
| 1 | [Concise description] | Critical | S | `path/to/file` |
| 2 | ... | High | M | ... |
## Quick Wins (effort: S)
[Bullet list of effort=S findings that can be fixed in < 30 minutes each]
## Can Wait
[Bullet list of medium/low findings — backlog material]
## Risk Snapshot
| Severity | Count | New | Existing |
|----------|-------|-----|----------|
| Critical | N | N | N |
| High | N | N | N |
| Medium | N | N | N |
| Low | N | N | N |
| Info | N | N | N |
## Resolved Since Last Run
[Findings from previous run that are no longer present — omit section if no prior run]
## Suppressed Findings
[Findings matched by `.nightshift-ignore` rules — omit section if no rules exist or no matches]
| Finding | Suppression Reason | Expires |
|---------|-------------------|---------|
| ... | ... | ... |
## Checks Skipped
[Any checks that couldn't run and why — e.g., "pip-audit not installed", "skipped due to budget=fast"]
## Confidence Notes
[Anything the AI isn't sure about — "Check 1 may have false positives for dynamically imported modules"]
5.3 summary.json
Schema contract (v1) — shared across Claude Code and Codex nightshift skills. Downstream tooling can rely on these enums:
| Field | Valid Values |
|---|---|
| `engine` | "claude", "codex" |
| `checks[].status` | "pass" (ran, no issues), "warn" (ran, non-blocking issues), "fail" (ran, blocking issues), "skipped" (did not run) |
| `checks[].name` | Core: "dead-code", "doc-drift", "test-gaps", "dependency-security", "code-quality", "tech-debt", "refactoring". Deep: "duplicate-logic", "architecture-smells", "complexity-hotspots". Infra: "crontab-drift", "container-health", "backup-verification", "script-drift" |
| `severity` | "critical", "high", "medium", "low", "info" |
| `effort` | "S" (< 30 min), "M" (30 min - 2 hours), "L" (2+ hours) |
| `kind` | "defect", "improvement" |
| `confidence` | "high", "medium", "low" |
| `state` (per finding) | "new", "existing", "resolved" |
| `mode` | "core", "deep" |
| `budget` | "fast", "standard", "max" |
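Downstream tooling that relies on these enums could check them with something like the sketch below. It is not part of the skill: `validate_summary` is a hypothetical helper, and applying the per-finding enums to `top_actions` entries is an assumption based on the fields those entries carry.

```python
TOP_LEVEL_ENUMS = {
    "engine": {"claude", "codex"},
    "mode": {"core", "deep"},
    "budget": {"fast", "standard", "max"},
}
CHECK_STATUS = {"pass", "warn", "fail", "skipped"}
CHECK_NAMES = {
    "dead-code", "doc-drift", "test-gaps", "dependency-security",
    "code-quality", "tech-debt", "refactoring",                       # core
    "duplicate-logic", "architecture-smells", "complexity-hotspots",  # deep
    "crontab-drift", "container-health", "backup-verification", "script-drift",  # infra
}
FINDING_ENUMS = {
    "severity": {"critical", "high", "medium", "low", "info"},
    "effort": {"S", "M", "L"},
    "kind": {"defect", "improvement"},
    "confidence": {"high", "medium", "low"},
    "state": {"new", "existing", "resolved"},
}


def validate_summary(summary: dict) -> list:
    """Return a list of human-readable enum violations (empty if none)."""
    errors = []
    for field, allowed in TOP_LEVEL_ENUMS.items():
        if summary.get(field) not in allowed:
            errors.append(f"{field}: {summary.get(field)!r} is not a valid value")
    for i, check in enumerate(summary.get("checks", [])):
        if check.get("name") not in CHECK_NAMES:
            errors.append(f"checks[{i}].name: {check.get('name')!r} is not a valid value")
        if check.get("status") not in CHECK_STATUS:
            errors.append(f"checks[{i}].status: {check.get('status')!r} is not a valid value")
    # Assumption: top_actions entries are findings and use the per-finding enums.
    for i, action in enumerate(summary.get("top_actions", [])):
        for field, allowed in FINDING_ENUMS.items():
            if field in action and action[field] not in allowed:
                errors.append(f"top_actions[{i}].{field}: {action[field]!r} is not a valid value")
    return errors
```

A caller would typically load the file with `json.load` and treat a non-empty return value as a contract violation.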
Conditional fields (include when applicable, omit otherwise):
- `code_health`: Include when any toolchain gate (lint/typecheck/build/tests) was run. Each gate has `status` (pass|warn|fail|skipped), `scope` (changed|full), `first_failure`, `evidence`.
- `infra[]`: Include when infra checks ran. Each entry has `target`, `check`, `status`, `details`.
- `previous_run`: Include when a prior run was found. Has `run_id`, `available`.
Always present: schema_version, engine, run_id, repo, repo_root, branch, mode, budget, scope, since, infra_enabled, timestamp_start, timestamp_end, duration_sec, checks[], counts, trend, ignore_rules, top_actions, suppressed_findings.
{
"schema_version": "1",
"engine": "claude",
"run_id": "YYYY-MM-DD_HH-mm-ss",
"repo": "repo-name",
"repo_root": "/path/to/repo",
"branch": "main",
"mode": "core",
"budget": "standard",
"scope": ".",
"since": null,
"infra_enabled": true,
"previous_run": { "run_id": "...", "available": true },
"timestamp_start": "2026-02-17T02:00:00Z",
"timestamp_end": "2026-02-17T02:18:00Z",
"duration_sec": 1080,
"checks": [
{
"name": "dead-code",
"status": "pass",
"findings_count": 0,
"severity_counts": { "critical": 0, "high": 0, "medium": 0, "low": 0, "info": 0 },
"duration_sec": 45,
"notes": "",
"skip_reason": null
},
{
"name": "code-quality",
"status": "warn",
"findings_count": 3,
"severity_counts": { "critical": 0, "high": 1, "medium": 2, "low": 0, "info": 0 },
"duration_sec": 120,
"notes": "Typecheck warnings in auth scope",
"skip_reason": null
},
{
"name": "dependency-security",
"status": "skipped",
"findings_count": 0,
"severity_counts": { "critical": 0, "high": 0, "medium": 0, "low": 0, "info": 0 },
"duration_sec": 0,
"notes": "",
"skip_reason": "pip-audit not installed; no package.json in scope"
}
],
"counts": { "critical": 0, "high": 1, "medium": 2, "low": 0, "info": 0, "skipped": 1, "suppressed": 1 },
"trend": {
"new_findings": 2,
"existing_findings": 1,
"resolved_findings": 0,
"net_risk": "up"
},
"ignore_rules": {
"file_present": true,
"loaded": 2,
"valid": 2,
"expired": 0,
"invalid": 0,
"applied": 1,
"critical_blocked": 0
},
"code_health": {
"lint": { "status": "pass", "scope": "full", "first_failure": "", "evidence": "run.log#lint" },
"typecheck": { "status": "warn", "scope": "changed", "first_failure": "src/config.js:42 missing type", "evidence": "run.log#typecheck" },
"build_check": { "status": "pass", "scope": "full", "first_failure": "", "evidence": "run.log#build" },
"tests": { "status": "pass", "scope": "full", "first_failure": "", "evidence": "run.log#tests" }
},
"top_actions": [
{
"title": "Fix hardcoded API key in config.js",
"severity": "critical",
"effort": "S",
"kind": "defect",
"state": "new",
"confidence": "high",
"file": "src/config.js",
"check": "dependency-security",
"why": "Hardcoded API key committed to source control.",
"next_step": "Move key to environment variable and rotate the exposed key."
}
],
"suppressed_findings": [
{
"id": "DRY-004",
"check": "refactoring",
"severity": "low",
"reason": "Legacy module queued for rewrite",
"rule": "check=refactoring;path=src/legacy/**",
"expires": "never"
}
],
"infra": [
{
"target": "web-server",
"check": "crontab-drift",
"status": "pass",
"details": "Live crontab matches repo."
}
]
}
5.4 Knowledge Note (conditional)
If knowledge/learnings/ exists in the repo, write a concise note:
knowledge/learnings/nightshift-YYYY-MM-DD.md
Content:
# Nightshift Audit — [DATE]
**[N] findings** ([critical], [high], [medium], [low]) across [N] checks.
Top actions:
1. [Most important finding]
2. [Second most important]
3. [Third most important]
Full report: `.nightshift/runs/YYYY-MM-DD_HH-mm-ss/executive-report.md`
Execution Rules
1. No modifications. Nightshift is read-only. Never edit, fix, commit, or deploy anything. The only files you write are in `.nightshift/` and optionally `knowledge/learnings/`.
2. Infra follows auto-detection. SSH checks run when `.nightshift-infra.yaml` exists and targets are reachable, or when forced with `--infra`. Suppressed with `--no-infra`. Never SSH without meeting one of these conditions.
3. Conservative findings. When uncertain, note your confidence level. A finding you're 60% sure about should say so. Never present a guess as a fact.
4. Skip gracefully with reason. If a check tool isn't installed or a check can't run, record it in `run.log` AND in the "Checks Skipped" report section with the specific reason (e.g., "pip-audit: command not found", "no Python files in scope"). Never fail the whole run because one check fails. Each check is independent: an error in Check 3 must not prevent Check 4 from running.
5. Time awareness. Log start/end times to `run.log`. If a single check takes more than 5 minutes, note it. The whole run should complete in under 30 minutes for a medium repo.
6. Respect .gitignore. Never scan `node_modules/`, `__pycache__/`, `.git/`, or other ignored directories.
7. No network requests except SSH. Don't fetch URLs, hit APIs, or check external services (except VPS SSH when infra checks are active).
8. Subagent usage. For large repos, you MAY spawn haiku subagents for parallel grep/glob work. But synthesis MUST be done by Opus. Quality of analysis is the entire point.
9. Be direct. No filler, no encouragement, no "great codebase!" platitudes. The report is a tool, not a conversation.
10. Reset sleep prevention. Restore normal sleep behavior after writing reports. On macOS: `kill $CAFFEINATE_PID 2>/dev/null`. On Linux: `kill $INHIBIT_PID 2>/dev/null`. On Windows: run the PowerShell `SetThreadExecutionState(0x80000000)` reset from Phase 0.2.
11. Print final summary to stdout after writing all files:
Nightshift complete. [N] findings ([critical] critical, [high] high). Report: .nightshift/runs/YYYY-MM-DD_HH-mm-ss/executive-report.md
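The skip-gracefully and time-awareness rules above amount to isolating each check so that one failure cannot abort the run. A minimal sketch of that pattern follows, assuming each check is a hypothetical zero-argument callable that returns a list of findings and that `log` appends a line to `run.log`. Mapping any non-empty result to "warn" is a simplification; the skill itself distinguishes "warn" from "fail".

```python
import time


def run_checks(checks, log):
    """Run each (name, fn) pair independently and collect per-check results."""
    results = []
    for name, fn in checks:
        start = time.monotonic()
        try:
            findings = fn()
            status = "warn" if findings else "pass"  # simplification, see lead-in
            skip_reason = None
        except Exception as exc:
            # Record the specific reason and move on to the next check.
            findings, status = [], "skipped"
            skip_reason = f"{type(exc).__name__}: {exc}"
            log(f"[{name}] skipped: {skip_reason}")
        duration = time.monotonic() - start
        if duration > 300:  # note any single check that exceeds 5 minutes
            log(f"[{name}] slow check: {duration:.0f}s")
        results.append({
            "name": name,
            "status": status,
            "findings_count": len(findings),
            "duration_sec": int(duration),
            "skip_reason": skip_reason,
        })
    return results
```

The entries with a non-null `skip_reason` are what would feed the "Checks Skipped" section of the report.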