truth-shield

"Fact-verification layer v3 — catches hallucinations using 10 verification tiers, self-consistency sampling, isolated-context verification (CoVe), package existence checking (DepScope), and multi-judge arbitration. Cross-checks claims against local files, stored knowledge, live docs, web search, multi-model cross-check, and external fact-checkers. Every contradiction is persisted so the same lie is never repeated. MANDATORY TRIGGERS: 'verify this', 'truth-check this', 'fact-check this', 'shield this', 'truth shield', 'truth-shield', 'is this true', 'check your work'. STRONG TRIGGERS: 'are you sure', 'really?', 'source?', 'prove it', 'how do you know', 'is that right', 'double-check that'. PASSIVE MODE: 'shield on' / 'shield off'. Do NOT trigger on opinions, creative writing, brainstorming, or subjective preferences."

BAS-More 0 Updated 2mo ago

Install

npx skillscat add bas-more/truth-shield

Install via the SkillsCat registry.

SKILL.md

Truth Shield v3 — Hallucination Defence Layer

Claude generates plausible text, not verified truth. Truth Shield checks every factual claim against real sources before presenting it.

v3 adds research-backed upgrades: self-consistency sampling detects uncertain claims before verification, isolated-context verification breaks confirmation bias, DepScope catches hallucinated packages, and multi-judge arbitration resolves conflicts without single-model bias.

Three modes

Mode 1: Verify After

User says "verify this" after Claude responds. Truth Shield retroactively checks every claim.

Mode 2: Shield On (passive)

User says "shield on". Every subsequent response is verified before the user sees it. Slower but catches errors before they matter.

Mode 3: Spot Check

User says "are you sure about X?" — checks that one claim only.

The verification tiers (0–9)

Claims are checked in this order. Each tier is independent — if unavailable, skip it. Early tiers are fast; later tiers catch more.

Tier 0:   fact-mcp cache         — instant (previously verified claims)
Tier 1:   Total Recall            — stored knowledge, past corrections
Tier 2:   Knowledge Graph         — code structure (symbols, call chains)
Tier 3:   Local files             — Grep/Read/Glob (ground truth for code)
Tier 3.5: DepScope                — package existence across 19 ecosystems
Tier 4:   Context7                — live library/API documentation
Tier 5:   Graphiti                — relationship memory (entity facts)
Tier 6:   WebSearch               — general knowledge, current events
Tier 7:   Multi-model cross-check — self-consistency + different model family
Tier 8:   MiniCheck               — external fact-checking model
Tier 9:   Multi-Judge Council     — FACTS-style conflict resolution

Step 1: Claim extraction

Parse the response and extract every factual claim — statements that are either true or false.

Hedged statements are not claims. "I think", "probably", "might" — these signal uncertainty. Only extract statements presented as definite facts.

Categorise each claim:

| Category | Example | Best tiers |
|---|---|---|
| Code symbol | "Function X exists at line Y" | 2 → 3 |
| Code structure | "Function X calls function Y" | 2 (cypher) |
| Package/library name | "Install lodash-utils" | 3.5 (package check) → 4 |
| API/library behaviour | "useEffect runs after render" | 4 → 6 |
| Past decision | "We chose JWT over sessions" | 1 → 5 |
| Entity relationship | "Service A depends on Service B" | 5 → 2 |
| General knowledge | "Python was created in 1991" | 6 → 7 |
| Current state | "Server runs on port 3001" | 3 (Read config) |

The "Best tiers" column shows where to START — skip tiers that are obviously irrelevant (e.g., don't check Knowledge Graph for a general knowledge claim). Always fall through to subsequent tiers if the best tier returns UNVERIFIED.
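The routing rule can be sketched as a lookup plus a fall-through list. This is a minimal illustration, not part of the skill: the category keys and `tier_order` are hypothetical names, and it assumes "subsequent tiers" means the evidence tiers numbered after the last best tier, with Tiers 8 and 9 excluded because they act as second opinion and arbiter.

```python
# Evidence tiers in pipeline order. Tier 3.5 is a float so it sorts naturally.
# Assumption: Tiers 8 and 9 are not fall-through evidence sources.
EVIDENCE_TIERS = [0, 1, 2, 3, 3.5, 4, 5, 6, 7]

# Starting tiers per category, copied from the table above.
# The dictionary keys are hypothetical identifiers for the table's categories.
BEST_TIERS = {
    "code_symbol": [2, 3],
    "code_structure": [2],
    "package_name": [3.5, 4],
    "api_behaviour": [4, 6],
    "past_decision": [1, 5],
    "entity_relationship": [5, 2],
    "general_knowledge": [6, 7],
    "current_state": [3],
}


def tier_order(category: str) -> list:
    """Return the best tiers first, then the later tiers to fall through to."""
    best = BEST_TIERS.get(category)
    if not best:
        # Unknown category: try every evidence tier in order.
        return list(EVIDENCE_TIERS)
    later = [tier for tier in EVIDENCE_TIERS if tier > max(best)]
    return best + later
```

With this reading, a general knowledge claim only ever touches Tiers 6 and 7, while a code symbol claim starts at Tier 2 and can fall through as far as Tier 7.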

Step 2: Self-consistency pre-screen (v3)

Before full verification, flag claims that Claude itself is uncertain about. This catches hallucinations that sound confident but aren't stable.

For each high-risk claim (package names, version numbers, API signatures, dates):

1. Mentally re-derive the claim from scratch — would you give the same answer
   if asked independently, with no memory of what you just said?

2. Rate your genuine internal confidence (not the confidence you displayed):
   - HIGH: would bet on it, have seen it many times
   - MEDIUM: fairly sure but could be wrong
   - LOW: guessing, filling in from pattern-matching

3. Any claim rated LOW → auto-flag as UNCERTAIN, prioritize for verification
   Any claim rated MEDIUM → verify with extra scrutiny (check 2+ tiers)
   HIGH claims → normal verification pipeline

This is NOT a verification source — it's a triage step. Claude's self-assessment is unreliable, which is precisely why uncertain claims get escalated to real sources. The value is catching the claims Claude knows (at some level) it's guessing about.
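The triage rules above map each self-rated confidence level to a handling policy. A minimal sketch follows; the `Triage` fields and the `triage` function are hypothetical, and giving LOW claims the same two-tier minimum as MEDIUM is an assumption (the text only says to prioritize them).

```python
from dataclasses import dataclass


@dataclass
class Triage:
    flag_uncertain: bool  # internal UNCERTAIN flag, never a final verdict
    min_tiers: int        # tiers to consult before accepting a verdict
    priority: int         # lower number means verify sooner


def triage(confidence: str) -> Triage:
    """Map a self-rated confidence level to a verification policy."""
    level = confidence.strip().upper()
    if level == "LOW":
        # Assumption: LOW claims get at least the MEDIUM level of scrutiny.
        return Triage(flag_uncertain=True, min_tiers=2, priority=0)
    if level == "MEDIUM":
        return Triage(flag_uncertain=False, min_tiers=2, priority=1)
    if level == "HIGH":
        return Triage(flag_uncertain=False, min_tiers=1, priority=2)
    raise ValueError(f"unknown confidence level: {confidence!r}")
```

Sorting claims by `priority` puts the likely guesses at the front of the verification queue.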

Step 3: Verification pipeline

For each claim, work through tiers in order until you get a verdict:

VERIFIED or CONTRADICTED → stop checking this claim (verdict found)
UNVERIFIED → continue to next tier (no evidence yet)
CONFLICTED → stop, escalate to Tier 9

Exception for Tier 8 (MiniCheck): if available, run MiniCheck as a second opinion on claims that tiers 4-6 (library docs, relationship memory, web search) marked VERIFIED. If MiniCheck disagrees, escalate to Tier 9. Skip MiniCheck for claims verified by local files (Tier 3) or stored knowledge (Tier 1); those are treated as ground truth.

If all tiers exhausted with no verdict → UNVERIFIED (not UNCERTAIN — self-consistency flags are triage hints, not final verdicts).

If zero factual claims extracted → report "No factual claims found — nothing to verify."
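The stop and continue rules above amount to a short loop. In this sketch each tier is a callable that returns a verdict string, or `None` when the tier is unavailable; `run_pipeline` and the tier names in the usage are hypothetical.

```python
from typing import Callable, Optional

# A tier check returns "VERIFIED", "CONTRADICTED", "UNVERIFIED", "CONFLICTED",
# or None when the tier is not installed or not reachable.
TierCheck = Callable[[str], Optional[str]]


def run_pipeline(claim: str, tiers: list) -> tuple:
    """Walk (name, check) pairs in order and return (verdict, deciding tier)."""
    for name, check in tiers:
        verdict = check(claim)
        if verdict is None or verdict == "UNVERIFIED":
            continue  # unavailable or no evidence yet: try the next tier
        if verdict == "CONFLICTED":
            return "CONFLICTED", name  # caller escalates to Tier 9
        return verdict, name  # VERIFIED or CONTRADICTED ends the search
    # All tiers exhausted: the final verdict is UNVERIFIED, never UNCERTAIN.
    return "UNVERIFIED", None
```

For example, with a missing cache, an empty memory store, and a local file hit, the loop stops at the file tier and never reaches web search:

```python
tiers = [
    ("cache", lambda claim: None),
    ("recall", lambda claim: "UNVERIFIED"),
    ("files", lambda claim: "VERIFIED"),
    ("web", lambda claim: "CONTRADICTED"),
]
print(run_pipeline("validateToken exists in auth.ts", tiers))
```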

v3 rule: Isolated context for verification. When checking a claim, do NOT let the original response influence your interpretation of evidence. Read the source material as if you'd never seen the claim. This breaks confirmation bias — the #1 reason v2 missed contradictions.

Tier 0 — fact-mcp cache (instant)

Load: ToolSearch query "select:mcp__fact-mcp__fact_query,mcp__fact-mcp__fact_set"

Call: mcp__fact-mcp__fact_query
  key: "truth-shield:<normalized-claim>"

Hit → use cached verdict + source, skip remaining tiers
Miss → continue to Tier 1
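The cache only hits if the same claim always produces the same key. The skill does not define how a claim is normalized, so the rules below (case-folding, dropping punctuation except dots, collapsing whitespace) are an assumption, and `cache_key` is a hypothetical helper.

```python
import re


def cache_key(claim: str) -> str:
    """Build a fact-mcp key from a claim using assumed normalization rules."""
    text = claim.casefold()
    # Keep word characters, whitespace, and dots (so "3.12" survives).
    text = re.sub(r"[^\w\s.]", " ", text)
    # Collapse runs of whitespace, then drop a trailing sentence period.
    text = re.sub(r"\s+", " ", text).strip().rstrip(".")
    return f"truth-shield:{text}"
```

Under these rules, differences in spacing, case, and trailing punctuation map to the same key.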

Tier 1 — Total Recall (stored knowledge)

Load: ToolSearch query "select:mcp__total-recall__verify_claim,mcp__total-recall__recall_semantic,mcp__total-recall__recall_by_category"

Step A — corrections first:
  Call: mcp__total-recall__recall_by_category  category: "correction"
  Match criteria: result content references the same entity/concept as the claim
    (e.g., claim about "useEffect timing" matches correction about "useEffect runs after render")
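  Match → compare the claim with the correction:
    claim repeats the corrected error → CONTRADICTED (this lie was caught before)
    claim agrees with the corrected fact → VERIFIED, cite the correction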

Step B — verify against stored facts:
  Call: mcp__total-recall__verify_claim
    claim_type: <"version"|"hosting"|"status"|"technology"|etc.>
    value: <the claim>

Step C — semantic search:
  Call: mcp__total-recall__recall_semantic  query: <claim as question>

Tier 2 — Knowledge Graph (code structure)

Load: ToolSearch query "select:mcp__knowledge-graph__query,mcp__knowledge-graph__context,mcp__knowledge-graph__cypher"

"function X exists":
  Call: mcp__knowledge-graph__context  name: <function>
  Found → VERIFIED with path+signature
  Not found → fall through to Tier 3

"X calls Y":
  Call: mcp__knowledge-graph__cypher
    query: 'MATCH (a)-[:CodeRelation {type:"CALLS"}]->(b) WHERE a.name="X" AND b.name="Y" RETURN a,b'

Tier 3 — Local files (Grep / Read / Glob)

Always available — no ToolSearch needed.

1. Glob for the file
2. Read at claimed location
3. Grep for symbol across project
4. Compare actual vs claimed
5. Found + matches → VERIFIED, quote the line
6. Found + differs → CONTRADICTED, quote actual vs claimed
7. Not found → CONTRADICTED with "not found in codebase"

Always quote evidence. "VERIFIED — src/auth.ts:42: export function validateToken(...)" — not just "checked the file."
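Outside an agent session, the same glob, read, and grep steps can be approximated with the standard library. This sketch only covers the "does the symbol exist" case, and `check_symbol` and its default `**/*.ts` pattern are hypothetical choices.

```python
import re
from pathlib import Path


def check_symbol(root: str, symbol: str, pattern: str = "**/*.ts") -> str:
    """Search files under root for a symbol and quote the first matching line."""
    regex = re.compile(rf"\b{re.escape(symbol)}\b")
    for path in sorted(Path(root).glob(pattern)):
        if not path.is_file():
            continue
        lines = path.read_text(errors="ignore").splitlines()
        for lineno, line in enumerate(lines, start=1):
            if regex.search(line):
                rel = path.relative_to(root).as_posix()
                # Quote the evidence instead of just reporting success.
                return f"VERIFIED: {rel}:{lineno}: {line.strip()}"
    return f"CONTRADICTED: '{symbol}' not found in codebase"
```

The returned string carries the file, line number, and quoted line, which is the evidence format the rule above asks for.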

Tier 3.5 — Package existence verification (v3)

For any claim that references a package, library, or module by name — especially in install commands, import statements, or dependency recommendations.

Hallucinated package names are a top-5 hallucination category. Users who run npm install on a hallucinated name risk installing malware that an attacker has registered under that name.

Method A — DepScope MCP (if installed):
  Load: ToolSearch query "+depscope"
  Call the package-check tool for the claimed package + ecosystem.
  EXISTS → continue (package is real)
  NOT FOUND → CONTRADICTED with "Package does not exist in <ecosystem>"

Method B — WebSearch fallback (if no DepScope):
  Load: ToolSearch query "select:WebSearch"
  Call: WebSearch
    query: "<package-name>" site:npmjs.com  (or pypi.org, crates.io, etc.)
  
  Registry page found → package exists, continue
  No results → CONTRADICTED — "No registry listing found for '<name>'.
    This may be a hallucinated package name."

Method C — Context7 fallback:
  If the package is a well-known library, Context7 can confirm it exists
  via resolve-library-id. No match → suspicious but not definitive.

Always check package names. Even if the rest of the claim is verified, a wrong package name can cause real harm (slopsquatting/typosquatting attacks).
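Where none of the three methods is available, a direct registry lookup is another option. Note that this is not one of the methods listed above: it is a swapped-in technique that asks the public npm and PyPI JSON endpoints whether a name resolves. `package_verdict` and `http_status` are hypothetical helpers, and the HTTP call is injectable so the logic can be exercised offline.

```python
import urllib.error
import urllib.parse
import urllib.request
from typing import Callable

# Public registry endpoints that answer 404 for names that do not exist.
REGISTRY_URLS = {
    "npm": "https://registry.npmjs.org/{name}",
    "pypi": "https://pypi.org/pypi/{name}/json",
}


def http_status(url: str) -> int:
    """Return the HTTP status for a GET request, or 0 on a network failure."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code
    except urllib.error.URLError:
        return 0


def package_verdict(name: str, ecosystem: str,
                    fetch: Callable[[str], int] = http_status) -> str:
    """Classify a package name as EXISTS, CONTRADICTED, or UNVERIFIED."""
    # Encode "/" so scoped npm names such as @scope/pkg form a valid path.
    quoted = urllib.parse.quote(name, safe="@")
    status = fetch(REGISTRY_URLS[ecosystem].format(name=quoted))
    if status == 200:
        return "EXISTS"
    if status == 404:
        return f"CONTRADICTED: Package does not exist in {ecosystem}"
    return "UNVERIFIED"  # registry error or network failure proves nothing
```

Any status other than 200 or 404 is treated as UNVERIFIED, since a registry outage says nothing about whether the package is real.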

Tier 4 — Context7 (live documentation)

Load: ToolSearch query "select:mcp__context7__resolve-library-id,mcp__context7__query-docs"

1. mcp__context7__resolve-library-id  libraryName: <library>
2. mcp__context7__query-docs  context7CompatibleLibraryID: <id>  query: <claim>
3. Match → VERIFIED, quote doc passage
   Contradiction → CONTRADICTED, quote what docs say
   No match → UNVERIFIED

Tier 5 — Graphiti (relationship memory)

Load: ToolSearch query "select:mcp__graphiti__search_memory_facts,mcp__graphiti__search_nodes"

Relationships: mcp__graphiti__search_memory_facts  query: <claim>
Entities: mcp__graphiti__search_nodes  query: <entity>

Interpreting results:
  Fact found + matches claim → VERIFIED with the stored fact as evidence
  Fact found + contradicts claim → CONTRADICTED with the stored fact
  Fact found but temporal metadata shows it's outdated → UNVERIFIED (stale data)
  No matching facts or entities → UNVERIFIED (not in relationship memory)

Tier 6 — WebSearch (general knowledge)

Load: ToolSearch query "select:WebSearch"

Call: WebSearch  query: <claim as verification question>

**Require 2+ agreeing sources before VERIFIED.**
Single source → UNVERIFIED (note "single source — verify independently")
Sources disagree → UNVERIFIED, present disagreement
WebSearch-only verdicts: persist at confidence 0.7 (not 0.9)

Never mark VERIFIED on a single web source or Claude's confidence alone.
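The source-count rule can be written as a small decision function. In this sketch each independent source is reduced to a boolean (True when it supports the claim), which is a simplification, and `web_verdict` is a hypothetical name. The text does not say what to do when every source disputes the claim, so that case conservatively stays UNVERIFIED here.

```python
from typing import Optional


def web_verdict(stances: list) -> tuple:
    """Return (verdict, persist confidence, note) for a list of source stances."""
    confidence: Optional[float] = None
    if not stances:
        return "UNVERIFIED", confidence, "no sources found"
    if len(stances) == 1:
        return "UNVERIFIED", confidence, "single source, verify independently"
    if all(stances):
        # WebSearch-only verdicts persist at 0.7 rather than 0.9.
        return "VERIFIED", 0.7, f"{len(stances)} agreeing sources"
    if any(stances):
        return "UNVERIFIED", confidence, "sources disagree"
    # Assumption: unanimous dispute is left UNVERIFIED for a later tier.
    return "UNVERIFIED", confidence, "sources dispute the claim"
```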

Tier 7 — Multi-model cross-check with self-consistency (v3)

v3 upgrades this tier with self-consistency sampling: query multiple models (or the same model multiple times at high temperature) and check agreement.

Load: ToolSearch query "select:WebFetch"

Step A — Cross-model check:
  Call: WebFetch
    url: "http://localhost:20128/v1/chat/completions"
    headers: {"Content-Type":"application/json","Authorization":"Bearer 9router"}
    body: {
      "model": "gpt-4o",
      "messages": [{"role":"user","content":"Is this true or false? <claim>. Cite your source."}],
      "max_tokens": 300,
      "temperature": 0.0
    }

  These are defaults — change URL/auth to match your proxy.
  See ENHANCE.md Tier 7 for configuration.

Step B — Self-consistency (if proxy supports multiple models):
  Repeat the query with 2 additional models (e.g., gemini-2.5-flash, llama-3.3-70b)
  
  All 3 agree → strong signal (increases confidence)
  2/3 agree → moderate signal
  No two agree (e.g., true / false / cannot say) → HIGH UNCERTAINTY, flag as CONFLICTED

If no proxy running → skip tier, note "multi-model unavailable"
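The agreement levels in Step B reduce to counting the most common answer. A minimal sketch, assuming each model's reply has already been reduced to a short label such as "true", "false", or "unknown"; `consistency_signal` is a hypothetical name.

```python
from collections import Counter


def consistency_signal(answers: list) -> str:
    """Classify agreement between model answers as strong, moderate, or CONFLICTED."""
    if not answers:
        return "unavailable"  # no proxy or no models answered
    normalized = [answer.strip().lower() for answer in answers]
    top_count = Counter(normalized).most_common(1)[0][1]
    if top_count == len(normalized):
        return "strong"
    if top_count * 3 >= len(normalized) * 2:
        return "moderate"  # at least two thirds agree
    return "CONFLICTED"
```

The signal only adjusts confidence. It does not replace evidence from the earlier tiers, because several models can share the same training blind spot.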

Tier 8 — MiniCheck external fact-checker (v3)

A purpose-built fact-verification model (EMNLP 2024). Unlike LLMs that generate plausible text, MiniCheck was trained specifically to judge whether a document supports a claim: (document, claim) → supported/unsupported. Its authors report GPT-4-level accuracy on grounding-check benchmarks for document-claim verification.

Load: ToolSearch query "select:WebFetch"

Call: WebFetch
  url: "http://localhost:11434/api/generate"
  method: POST
  body: {
    "model": "bespoke-minicheck",
    "prompt": "Document: <evidence from earlier tiers>\nClaim: <the claim>\nIs the claim supported by the document? Answer YES or NO.",
    "stream": false
  }

YES → reinforces VERIFIED verdict
NO → reinforces CONTRADICTED verdict (or downgrades UNVERIFIED to CONTRADICTED)
NO on a claim an earlier tier marked VERIFIED → disagreement, escalate to Tier 9

Default: Ollama at localhost:11434 with bespoke-minicheck model.
See ENHANCE.md Tier 8 for setup: `ollama pull bespoke-minicheck`

MiniCheck is most valuable as a second opinion on claims that passed earlier tiers. If Tier 4 (Context7) says VERIFIED but MiniCheck says NO, escalate to Tier 9.

If Ollama/MiniCheck unavailable → skip tier.
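For callers that talk to Ollama directly, the request body and the YES/NO parsing can be sketched as two pure functions. The prompt wording and model name are taken from the call above; `minicheck_payload` and `parse_minicheck` are hypothetical helpers, and reading the answer from the `response` field assumes Ollama's non-streaming generate reply.

```python
import json
from typing import Optional


def minicheck_payload(document: str, claim: str) -> bytes:
    """Build the JSON body for POST http://localhost:11434/api/generate."""
    prompt = (
        f"Document: {document}\nClaim: {claim}\n"
        "Is the claim supported by the document? Answer YES or NO."
    )
    body = {"model": "bespoke-minicheck", "prompt": prompt, "stream": False}
    return json.dumps(body).encode("utf-8")


def parse_minicheck(response_body: str) -> Optional[bool]:
    """Return True for YES, False for NO, None for anything else."""
    text = json.loads(response_body).get("response", "").strip().upper()
    first = text.split()[0].strip(".,!:") if text else ""
    if first == "YES":
        return True
    if first == "NO":
        return False
    return None  # unparseable answer: treat the tier as having no opinion
```

Returning `None` for an unexpected answer keeps a malformed reply from being read as a verdict.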

Tier 9 — Multi-Judge Council (v3, replaces old Tier 8)

Fires ONLY when tiers produce conflicting evidence. Uses FACTS-style multi-judge arbitration (DeepMind 2024) instead of single-model resolution.

Trigger: two or more tiers returned different verdicts for the same claim

When triggered:
  Option A — If /llm-council skill is available:
    Invoke with: "Sources disagree on: <claim>
      Source A: <verdict + evidence>
      Source B: <verdict + evidence>
      Arbitrate using FACTS framework: each judge scores independently,
      then average. Final verdict requires 2/3 majority."

  Option B — If multi-model proxy available but no /llm-council:
    Query 3 different models via WebFetch with the same arbitration prompt
    Each responds independently (isolated context)
    Tally: 2/3+ agree → that verdict wins
    No majority → CONFLICTED, present all positions to user

  Option C — No council, no proxy:
    Present both positions: "[CONFLICTED] Sources disagree: ..."
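Option B can be sketched as a prompt builder plus a two-thirds tally. The prompt text mirrors the template in Option A. `arbitration_prompt` and `tally` are hypothetical names, and the tally assumes each judge's reply has already been reduced to a single verdict label.

```python
from collections import Counter


def arbitration_prompt(claim: str, positions: list) -> str:
    """Format the disagreement for a judge; positions are (verdict, evidence) pairs."""
    lines = [f"Sources disagree on: {claim}"]
    for label, (verdict, evidence) in zip("ABCDEFGH", positions):
        lines.append(f"Source {label}: {verdict}, {evidence}")
    return "\n".join(lines)


def tally(judge_verdicts: list) -> str:
    """Return the verdict backed by at least two thirds of judges, else CONFLICTED."""
    if not judge_verdicts:
        return "CONFLICTED"
    verdict, votes = Counter(judge_verdicts).most_common(1)[0]
    needed = (2 * len(judge_verdicts) + 2) // 3  # ceiling of two thirds
    return verdict if votes >= needed else "CONFLICTED"
```

Each judge should receive the prompt in a fresh context so that no judge sees another's answer before voting.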

Step 4: Confidence scoring

| Score | Meaning | Display |
|---|---|---|
| VERIFIED | Confirmed by authoritative source. Evidence quoted. | Presented normally |
| UNVERIFIED | No source could confirm or deny. | `[unverified]` |
| CONTRADICTED | Source directly contradicts. Correction provided. | `[CONTRADICTED — <correction>]` |
| CONFLICTED | Sources disagree. Both positions presented. | `[CONFLICTED — see details]` |
| UNCERTAIN | Self-consistency pre-screen flagged low confidence. Internal triage only: becomes UNVERIFIED if no tier confirms or contradicts. Never appears in final reports. | (not displayed; resolved before output) |

Step 5: The learning loop

When a claim is CONTRADICTED:

Load: ToolSearch query "select:mcp__total-recall__correct,mcp__total-recall__remember,mcp__graphiti__add_memory"

1. PERSIST correction:
   mcp__total-recall__remember
     category: "correction"
     content: "Claude claimed '<wrong>'. Correct: '<right>'. Source: <source>"
     tags: ["truth-shield", "<topic>"]
     triggers: [<key phrases>]
     scope: "global"

2. CACHE correct answer:
   mcp__fact-mcp__fact_set
     key: "truth-shield:<normalized-claim>"
     value: '{"verdict":"CONTRADICTED","correction":"<answer>","source":"<source>"}'
     tier: "semi-dynamic"

3. ADD to relationship memory:
   mcp__graphiti__add_memory
     name: "Truth Shield correction"
     episode_body: "Corrected: '<wrong>' → '<right>'. Source: <source>."

4. If existing Total Recall entry was wrong:
   mcp__total-recall__correct
     original_id: <UUID>
     corrected_content: <correct info>
     reason: "Truth Shield: <source>"
     severity: "high"

When a claim is VERIFIED, store a lighter-weight record:

mcp__total-recall__remember  category:"fact"  content:"<claim>. Source:<source>"  confidence:0.9
mcp__fact-mcp__fact_set  key:"truth-shield:<normalized-claim>"  value:'{"verdict":"VERIFIED","source":"<source>"}'
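The cached `value` fields above are JSON strings, so corrections that contain quotes need proper escaping. The sketch below assembles the arguments for the first three persistence calls of a CONTRADICTED claim and serializes the cache value with `json.dumps`. `correction_payloads` is a hypothetical helper that only builds dictionaries, it does not call any MCP tool.

```python
import json


def correction_payloads(wrong: str, right: str, source: str,
                        topic: str, key: str, triggers: list) -> dict:
    """Build tool arguments for persisting one CONTRADICTED claim."""
    cached = {"verdict": "CONTRADICTED", "correction": right, "source": source}
    return {
        "mcp__total-recall__remember": {
            "category": "correction",
            "content": f"Claude claimed '{wrong}'. Correct: '{right}'. Source: {source}",
            "tags": ["truth-shield", topic],
            "triggers": triggers,
            "scope": "global",
        },
        "mcp__fact-mcp__fact_set": {
            "key": key,
            # json.dumps escapes quotes and backslashes inside the correction.
            "value": json.dumps(cached),
            "tier": "semi-dynamic",
        },
        "mcp__graphiti__add_memory": {
            "name": "Truth Shield correction",
            "episode_body": f"Corrected: '{wrong}' → '{right}'. Source: {source}.",
        },
    }
```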

Step 6: Output

Verify After report

## Truth Shield Report (v3)

Claims checked: 7 | Verified: 4 | Unverified: 1 | Contradicted: 2

| # | Claim | Verdict | Evidence | Tier |
|---|---|---|---|---|
| 1 | useEffect runs after render | VERIFIED | Context7: React docs | 4 |
| 2 | validateToken in auth.ts | VERIFIED | Grep: src/auth.ts:42 | 3 |
| 3 | Install lodash-helpers | CONTRADICTED | DepScope: package not found in npm | 3.5 |
| 4 | Express defaults to port 3000 | VERIFIED | WebSearch (3 sources) | 6 |
| 5 | parseDate calls moment() | CONTRADICTED | KG: calls date-fns/parse | 2 |
| 6 | Python 3.12 released Mar 2024 | UNVERIFIED | Conflicting dates | 6 |
| 7 | We chose JWT for auth | VERIFIED | Total Recall: decision-2026-04 | 1 |

Corrections persisted: 2 (caught instantly in future sessions)
Self-consistency flags: 1 claim pre-screened as LOW confidence (#3)
Tiers used: Total Recall, Knowledge Graph, Grep, DepScope, Context7, WebSearch
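Deriving the summary line from the verdict column keeps the header and the table from drifting apart. A minimal sketch: `summarize` is a hypothetical helper, and showing the Verified count even when it is zero while hiding other zero counts is inferred from the two sample reports in this document.

```python
from collections import Counter


def summarize(verdicts: list) -> str:
    """Build the 'Claims checked' line from one verdict per table row."""
    counts = Counter(verdicts)
    parts = [
        f"Claims checked: {len(verdicts)}",
        f"Verified: {counts['VERIFIED']}",  # always shown, even when zero
    ]
    for label in ("UNVERIFIED", "CONTRADICTED", "CONFLICTED"):
        if counts[label]:
            parts.append(f"{label.capitalize()}: {counts[label]}")
    return " | ".join(parts)
```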

Shield On footer

[shield: 5/7 verified, 2 contradicted | tiers: 1,2,3,3.5,4,6 | v3]

Zero-verified report

## Truth Shield Report (v3)

Claims checked: 4 | Verified: 0 | Unverified: 4

Available tiers found no evidence:
- Tier 3 (Grep/Read/Glob): checked — no relevant local files
- Tiers 0-2, 3.5, 4-9: unavailable

UNVERIFIED ≠ wrong — no source was available to confirm.

Graceful degradation

Every tier is independent. Missing tiers reduce coverage, never break the pipeline.

| Tier | If unavailable | Impact |
|---|---|---|
| 0 - fact-mcp | Check all sources fresh | Slower |
| 1 - Total Recall | No past corrections | Errors may repeat |
| 2 - Knowledge Graph | No structure verification | Code uses Grep only |
| 3 - Grep/Read/Glob | No local verification | Code claims UNVERIFIED |
| 3.5 - DepScope | Fall back to WebSearch or Context7 | Weaker package checks; fake packages undetected if both are also missing |
| 4 - Context7 | No live docs | Library claims use WebSearch |
| 5 - Graphiti | No relationship memory | Entity claims UNVERIFIED |
| 6 - WebSearch | No web verification | General knowledge UNVERIFIED |
| 7 - Multi-model | No cross-check | Training blind spots undetected |
| 8 - MiniCheck | No external fact-checker | Rely on tier agreement only |
| 9 - Multi-Judge | Conflicts shown to user | User resolves manually |

Minimum viable: Tier 3 alone covers code claims — the highest-value case.

What Truth Shield does NOT do

Verify opinions. "React is better than Vue" — not factual.
Verify predictions. "This will scale to 1M users" — speculation.
Verify recommendations. "You should use TypeScript" — advice.
Guarantee 100% accuracy. Sources can be wrong. The goal is evidence, not omniscience.

Important notes

Always quote evidence. Not "checked docs." Say what the docs say.
Confidence ≠ accuracy. Never skip verification because Claude feels sure.
Isolated context (v3). When reading evidence, pretend you never saw the original claim. This breaks confirmation bias.
Contradictions are permanent. Every CONTRADICTED claim enters the learning loop.
UNVERIFIED ≠ wrong. Many true statements will be UNVERIFIED.
Package names are high-risk. Always check Tier 3.5 for any package recommendation.
