jewzaam

cited-research

Produces citation-backed research documents with independent verification. Every claim traces to a web source visited in-session, and isolated sub-agents audit the output. Supports both new research and updating existing research topics (re-researching stale dimensions, adding new dimensions, refreshing citations). Use this skill whenever the user asks for research, analysis, comparisons, rubrics, updates to existing research, or any deliverable that must be grounded in cited sources. Trigger on phrases like "find out", "compare", "is X better than Y", "what are the tradeoffs", "give me the real numbers", "cite your sources", "verify this", "what does the research say", "update the research", "refresh this research", or any task where unsupported claims would undermine trust. Always use for non-code research requiring factual grounding, even if the user doesn't explicitly request citations.

Install

npx skillscat add jewzaam/claude-skill-cited-research

Install via the SkillsCat registry.

SKILL.md

Cited Research

A methodology for producing factual, citation-backed documents where every claim
traces to a verifiable web source. The core principle: nothing is true until a
source says it is, and a separate reviewer confirms the source actually says it.

This skill exists because LLMs hallucinate. The methodology makes hallucination
structurally difficult by separating research, writing, and verification into
distinct phases — and by making the writing phase aware that verification follows.

Output Location

Before starting any research, determine where the output will be written.

Detecting the cited-research repo

Run git rev-parse --show-toplevel, then basename on the result
(two separate commands — no $() or backticks; command substitution
triggers extra approval prompts). If the basename equals exactly
cited-research, you are inside the dedicated research monorepo —
use its repo root as the output location. Otherwise default to the
current repo root. Either way, each topic lives at
research/<topic-slug>/ using a kebab-case slug derived from the
subject. Do not use pwd, the remote URL, or file presence as
detection signals — only the basename check is correct.
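The detection rule reduces to a string comparison on the last path component of the repo root. A minimal Python sketch of that decision, assuming the toplevel path has already been obtained from git rev-parse --show-toplevel. The function names and the exact slug rule are illustrative assumptions, not part of the skill:

```python
import re
from pathlib import PurePosixPath


def is_cited_research_repo(toplevel: str) -> bool:
    """True only when the repo root's basename is exactly 'cited-research'."""
    return PurePosixPath(toplevel).name == "cited-research"


def topic_slug(subject: str) -> str:
    """Kebab-case slug (assumed rule): lowercase, non-alphanumeric runs become one hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", subject.lower()).strip("-")


def topic_dir(toplevel: str, subject: str) -> str:
    """Each topic lives at research/<topic-slug>/ under the repo root in both cases."""
    return str(PurePosixPath(toplevel) / "research" / topic_slug(subject))
```

A fork checked out as cited-research-fork does not match, because the comparison is exact rather than a prefix or substring test.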

Repo-level files (cited-research only)

  • README.md — explains the repository (a monorepo of
    citation-backed research documents). NOT a topic index. Create on
    first run if missing.
  • index.md — searchable topic index. See
    Phase 5: Index Maintenance.

Updating Existing Research

When the user wants to update an existing research topic rather than create a new
one, read references/update-workflow.md and follow its steps. The update
workflow replaces Phase 0 and then feeds into Phases 1-5.


Phase 0: Plan Mode (Required)

Enter plan mode before doing any research. The plan is the contract between you
and the user about what will be researched and how.

Step 1: Dimension Discovery

Decompose the user's request into research dimensions before any WebSearch calls.
Present these to the user and wait for approval.

Include both directly requested dimensions and recommended additions that would
strengthen the analysis. Explain why each recommended dimension adds value. The
user may add dimensions you didn't consider or remove ones they find out of scope.

Step 1b: Framing Challenge

When the user's request carries opinion or preference markers
("better", "best", "should we", "worth it", "recommend", "compare
and pick", "is X right for…"), read
references/framing-challenge.md and surface the embedded
assumptions to the user alongside the proposed dimensions. For
neutral how-does-X-work technical questions, skip this step — the
output would be empty anyway.

Step 2: Plan the File Structure

Once dimensions are approved, define the output file tree. The structure
depends on where the output is going.

The topic directory structure is the same regardless of output location:

research/<topic-slug>/
├── README.md                     # Short standalone summary (TL;DR + key tables)
├── <deliverable>.md              # Full analysis with methodology
├── citations.md                  # All sources, numbered
├── references/
│   ├── <dimension-1>.md          # One file per approved dimension
│   └── ...
└── audit/
    ├── citation-audit.md
    └── consistency-review.md

The README inside each topic directory is written last as a standalone
decision-making tool.

If a topic directory already exists (e.g., a prior research run on the same
subject), ask the user whether to revise the existing topic in place or
create a new directory with a disambiguating slug.

Step 3: Plan Data Points as Hypotheses

List expected data points and candidate sources, but treat them as hypotheses
to verify, not commitments. Research agents must report what they actually find,
even when it contradicts the plan. Dropped or corrected claims are a sign the
methodology is working, not failing.

For each key data point, identify 2-3 candidate sources rather than relying
on a single source. This is not redundancy for its own sake — 20-30% of web
sources are inaccessible at any given time due to link rot, paywalls, and AI
crawler blocking (see references/research-basis.md §Source Inaccessibility).
Planning for multiple candidates per data point shifts inaccessibility handling
from reactive fallback to proactive coverage.
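The value of planning several candidates can be estimated with a short calculation. The sketch below assumes each candidate source fails independently with the same probability, which is a simplification (sources behind the same paywall or crawler block tend to fail together); the function name is hypothetical:

```python
def all_candidates_fail(p_inaccessible: float, candidates: int) -> float:
    """Probability that every candidate is unreachable, assuming independent failures."""
    if not 0.0 <= p_inaccessible <= 1.0:
        raise ValueError("p_inaccessible must be in [0, 1]")
    if candidates < 1:
        raise ValueError("candidates must be >= 1")
    return p_inaccessible ** candidates


# Compare the 20% and 30% ends of the stated inaccessibility range.
for k in (1, 2, 3):
    low, high = all_candidates_fail(0.20, k), all_candidates_fail(0.30, k)
    print(f"{k} candidate(s): {low:.1%} to {high:.1%} chance of no reachable source")
```

Under the independence assumption, three candidates bring the chance of a fully unreachable data point down from 20-30% to roughly 1-3%.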

Step 4: Counter-perspective Handling

Before finalizing the plan, ask the user how to handle counter-perspectives
during research. Present three options:

  1. Find and include (default) — search for counter-perspectives alongside
    supporting evidence. Whatever is found merges into the citation pool.
  2. Find and gate — search for counter-perspectives; if few or none are
    found, pause and surface this to the user before proceeding.
  3. Skip — do not search for counter-perspectives. Appropriate for purely
    technical topics (e.g., "how does X work?") where counter-perspectives
    are unlikely to exist.

The user's choice governs whether Counter-Discovery agents are dispatched
in Phase 1 and how null results are handled.

Step 5: Get Plan Approval

Include in the plan: dimensions, file structure, the deliverable's
intended structure, counter-perspective handling choice, and a note
that two independent review agents will audit the output.
Exit plan mode only after the user approves.

Phase 1: Research

Principles

  1. Web sources only. Every number, measurement, date, or factual statement
    must come from a URL visited in-session via WebSearch or WebFetch.

  2. Parallel where possible. Launch one research sub-agent per dimension (or
    group of related dimensions). Four agents running in parallel take roughly
    the same wall-clock time as one.

  3. Primary sources preferred. Assign quality tiers to sources and prefer
    higher tiers when conflicts arise:

    • Tier 1: Peer-reviewed papers, government/institutional reports
    • Tier 2: Manufacturer specs, established reference sites, university
      publications
    • Tier 3: Industry blogs, conference talks, well-known practitioners
    • Tier 4: Forums, personal blogs, GitHub discussions, social media

    When a secondary source quotes a number, try to find the original study.
    See references/research-basis.md §Source Quality and Weighting for the
    evidence behind these tiers.

  4. Consider source recency relative to topic. For fast-moving domains
    (for example AI, cloud infrastructure, security), prefer sources published within the
    last 2 years — the landscape changes quickly enough that older findings may
    be superseded. For stable domains (for example physics, music theory, established
    engineering, mathematics), older sources are often authoritative.
    When mixing source ages, note the publication year alongside each claim so
    readers can assess currency.

  5. Record everything immediately. Research agents must include every URL,
    claim, and exact source wording in their structured response. The main
    thread cannot recover data that agents omit from their output.

  6. Acknowledge gaps. If a data point cannot be found after 3+ distinct
    queries, state explicitly that this data point is unavailable. Do not invent
    a plausible number.

  7. Welcome bonus sources. Research agents often discover relevant sources
    not in the plan. Encourage this — unanticipated sources frequently strengthen
    the deliverable.
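The tier preference in principle 3 can be expressed as an ordering over sources. This is a minimal sketch: the class and field names are hypothetical, and breaking ties within a tier by publication year is an assumption layered on principle 4, not a rule the skill states.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    PEER_REVIEWED_OR_INSTITUTIONAL = 1
    SPECS_REFERENCE_UNIVERSITY = 2
    INDUSTRY_PRACTITIONER = 3
    FORUM_PERSONAL_SOCIAL = 4


@dataclass(frozen=True)
class SourcedValue:
    url: str
    tier: Tier
    year: int
    value: str


def preferred(conflicting: list[SourcedValue]) -> SourcedValue:
    """Pick the highest tier (lowest number); within a tier, the most recent year (assumed)."""
    return min(conflicting, key=lambda s: (s.tier, -s.year))
```

Choosing a preferred value does not remove the obligation to report the disagreement; the losing source still belongs in the citation pool.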

Coordinator Protocol

Main thread owns every WebFetch call and every file write — that's
the security boundary (the user approves each outbound fetch).
Sub-agents return structured results; the main thread acts on them.
The loop is capped at 3 iterations:

for each iteration (max 3):
    1. Main thread dispatches agents with available context
    2. Agents return structured results (findings + fetch requests)
    3. Main thread WebFetches requested URLs (user approves)
    4. If all agents reported confidence > 0.8 with no new URLs: stop
    5. Otherwise: feed fetched content back to agents for next iteration
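
The loop above can be sketched in Python as follows. This is illustrative only: AgentResult, run_coordinator, and the dispatch and fetch callables are hypothetical stand-ins for sub-agent dispatch and user-approved WebFetch calls, not part of the skill.

```python
from dataclasses import dataclass, field
from typing import Callable

MAX_ITERATIONS = 3
CONFIDENCE_TARGET = 0.8


@dataclass
class AgentResult:
    findings: list[str]
    confidence: float
    fetch_requests: list[str] = field(default_factory=list)


def run_coordinator(
    dispatch: Callable[[dict[str, str]], list[AgentResult]],
    fetch: Callable[[str], str],
) -> tuple[list[AgentResult], int]:
    """Run the capped loop; return the last round of results and the iterations used."""
    context: dict[str, str] = {}  # url -> fetched text, owned by the main thread
    results: list[AgentResult] = []
    for iteration in range(1, MAX_ITERATIONS + 1):
        results = dispatch(context)  # steps 1-2: agents only return structured data
        requested = [u for r in results for u in r.fetch_requests if u not in context]
        for url in requested:  # step 3: only the main thread fetches
            context[url] = fetch(url)
        converged = all(r.confidence > CONFIDENCE_TARGET for r in results)
        if converged and not requested:  # step 4
            return results, iteration
        # step 5: context now carries the fetched content into the next dispatch
    return results, MAX_ITERATIONS
```

Keeping fetch as a parameter of the coordinator, rather than something an agent can call, mirrors the security boundary: agents can only ask for URLs.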

Model Assignment

Each agent's frontmatter (agents/*.md) declares its model. The
assignments balance task type against cost — mechanical search and
verification on sonnet, deep extraction and synthesis on opus.
See references/research-basis.md §Model Assignment by Agent Role
for the evidence-backed rationale per role.

Iteration 1 — Discovery:

  • Dispatch one research-discovery agent per dimension. The agent's
    frontmatter defines model, tools, and background mode. Provide
    DIMENSION, PROJECT_DESCRIPTION, and SEARCH_QUERIES in the invocation
    prompt
  • Agent returns: URL manifest, preliminary findings from search snippets,
    confidence score, open questions
  • Counter-Discovery (unless user chose "Skip" in Phase 0 Step 4):
    dispatch one research-counter-discovery agent per dimension alongside
    the Discovery agent. Provide DIMENSION, PROJECT_DESCRIPTION,
    RESEARCH_QUESTION, and COUNTER_SEARCH_QUERIES in the invocation prompt.
    Counter-Discovery agents seek contradicting evidence, failure cases, and
    minority viewpoints. Their URLs merge into the same manifest pool — no
    tagging distinguishes counter-sources from supporting sources. If the
    user chose "Find and gate" and a Counter-Discovery agent returns
    confidence < 0.3 with no URLs, surface this to the user before
    proceeding to iteration 2
  • Main thread collects all URL manifests across all agents (Discovery +
    Counter-Discovery)

Multi-engine augmentation (between iterations 1 and 2):

  • After collecting discovery agent URL manifests, the coordinator runs
    scripts/multi_search.py for each dimension's top search queries to
    pull results from DuckDuckGo (and any additional engines configured).
    The script lives with the installed skill; invoke it by absolute path
    so it works regardless of the current working directory:

    Linux/macOS:

    ~/.claude/skills/cited-research/.venv/bin/python \
        ~/.claude/skills/cited-research/scripts/multi_search.py \
        --query "..." --limit 10

    Windows (git-bash):

    ~/.claude/skills/cited-research/.venv/Scripts/python.exe \
        ~/.claude/skills/cited-research/scripts/multi_search.py \
        --query "..." --limit 10

    The skill's .venv is populated once via make install-dev inside
    ~/.claude/skills/cited-research/ — see the repo README for install
    steps. If the venv is missing, the skill still works with degraded
    coverage (sub-agents' WebSearch results only); report this to the user
    once per session and proceed.

  • Merge the script's URLs into the URL manifest pool alongside the
    agents' WebSearch results

  • Deduplicate the combined pool by exact URL before fetching

  • 84.9% of search results are unique to a single engine [§Multi-Engine
    Search Diversity in references/research-basis.md] — this step
    structurally reduces single-engine bias in the citation pool

  • The coordinator invokes the script, not the sub-agents. This preserves
    the security boundary where the user sees every outbound action
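
The merge and deduplication steps above amount to an order-preserving union of URL lists. A minimal sketch, with a hypothetical function name:

```python
def merge_manifests(*manifests: list[str]) -> list[str]:
    """Merge URL lists, dropping exact-string duplicates and keeping first-seen order."""
    return list(dict.fromkeys(url for manifest in manifests for url in manifest))


agent_urls = ["https://example.com/a", "https://example.com/b"]
script_urls = ["https://example.com/b", "https://example.com/b/", "https://example.org/c"]
print(merge_manifests(agent_urls, script_urls))
```

Because the comparison is on the exact URL string, variants such as a trailing slash or a tracking parameter stay in the pool as separate entries.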

Iteration 2 — Deep read:

  • Main thread batch-fetches all URLs from all manifests via WebFetch
  • When fetching fails, attempt WebSearch fallbacks before passing results
    to agents — handle the 20-30% inaccessibility expectation at this layer
  • Dispatch one research-analysis agent per dimension. Provide
    DIMENSION, PROJECT_DESCRIPTION, FETCHED_DIR, and DATA_TYPE in the
    invocation prompt
  • Agent returns: extracted data with citations, follow-up URL requests
    (if any), updated confidence score

Source triage (between iteration 2 and 3):

After iteration 2, review the fetch results for high-priority sources that
failed. If any Tier 1-2 sources (peer-reviewed, institutional, government,
manufacturer specs, established reference sites, university publications)
were inaccessible and the data they were expected to provide feeds into key
claims or calculations, present them to the user before proceeding. Users
often have institutional access, cached copies, or bookmarks that resolve
sources agents cannot reach. If the user provides content, write it to the
temp directory and include it in the next iteration's agent prompts.

See references/research-basis.md §Source Triage as Human Gate for evidence.

Iteration 3 — Gap-fill (conditional):

  • Only runs if any agent reported confidence < 0.8 or requested follow-up
    URLs after iteration 2
  • Main thread fetches follow-up URLs
  • Agents process remaining content, finalize findings
  • No further iterations regardless of confidence

Providing Fetched Content to Agents

When the main thread fetches URLs for iteration 2+ or for the
citation audit, persist each page's extracted text under
./.tmp-cited-research/<topic-slug>/ via the put_data.py
wrapper, then pass that directory to the agent prompt.

Bootstrap first. Before the first put_data.py call for a
topic, run
bash ~/.claude/skills/cited-research/scripts/bootstrap_tmp.sh <topic-slug>.
The bootstrap script provisions the parent
./.tmp-cited-research/ and its .gitignore of *, then wipes
and recreates the slug subdir. put_data.py refuses to create the
slug root itself — running it without the bootstrap returns a
fail-fast error pointing at this step. The hard failure exists so
the parent .gitignore protection cannot be silently bypassed,
which would risk accidentally committing fetched URLs.
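
For orientation, the behavior described above can be sketched in Python. This is not the contents of bootstrap_tmp.sh, which is a bash script not shown here; the function name and the base parameter are assumptions made for illustration.

```python
import shutil
from pathlib import Path


def bootstrap_tmp(base: Path, slug: str) -> Path:
    """Provision .tmp-cited-research/ with a catch-all .gitignore, then reset the slug dir."""
    parent = base / ".tmp-cited-research"
    parent.mkdir(parents=True, exist_ok=True)
    # A .gitignore containing "*" keeps everything under the parent out of commits.
    (parent / ".gitignore").write_text("*\n")
    slug_dir = parent / slug
    if slug_dir.exists():
        shutil.rmtree(slug_dir)  # wipe stale content from a previous run
    slug_dir.mkdir()
    return slug_dir
```

Writing the .gitignore before creating the slug directory means there is no moment at which fetched content exists without ignore protection.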

Read references/data-persistence.md for the heredoc pattern,
the fetched-file header format, and the rationale for routing
every write through put_data.py rather than the Write tool.

Convergence Criteria

Stop iterating when:

  1. All agents report confidence > 0.8 and no agent requested follow-up
    URLs, OR
  2. Iteration 3 completes (hard cap regardless of confidence)

If any agent reports confidence < 0.5 after iteration 2, flag the dimension
to the user as potentially under-sourced before proceeding to iteration 3.
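
These criteria can be written as a single decision function. The sketch below uses hypothetical names and treats a confidence of exactly 0.8 as not converged, since the criteria only say "> 0.8"; that boundary handling is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    stop: bool
    flag_to_user: list[str]  # dimensions to flag as potentially under-sourced


def decide(confidence_by_dimension: dict[str, float],
           followups_requested: bool,
           iteration: int) -> Decision:
    """Apply the stop criteria and the post-iteration-2 under-sourced flag."""
    hard_cap = iteration >= 3
    converged = (
        all(c > 0.8 for c in confidence_by_dimension.values())
        and not followups_requested
    )
    flags = (
        sorted(d for d, c in confidence_by_dimension.items() if c < 0.5)
        if iteration == 2
        else []
    )
    return Decision(stop=hard_cap or converged, flag_to_user=flags)
```

For example, decide({"cost": 0.9, "safety": 0.4}, False, 2) does not stop and flags "safety" before iteration 3 begins.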

Structuring Research Agents

Agent definitions live in agents/ — one .md file per role with YAML
frontmatter specifying model, tools, and background mode. The coordinator
invokes agents by name and provides dimension-specific values in the
invocation prompt.

All research agent definitions include an accountability line ("a citation
audit agent will independently verify every claim you report"). The behavioral
effect of this line on LLM output is plausible but unvalidated by published
research. The real enforcement mechanism is structural: when inline citations
are required, a fabricated claim also needs a fabricated citation to go with it
(the "dual-error" principle), which makes fabrication harder. The accountability
line reinforces the inline citation requirement. See
references/research-basis.md §Accountability Clause for evidence.

Capture Provenance, Not Just Data

Look for the chain of attribution — who originally claimed what, and through
whom. "Source X reports that Person A claimed Y, validated by Z" is far more
useful than "Source X says Y." Instruct research and verification agents to
report the full attribution chain.

Expect Source Failures

Expect 20-30% of sources to be inaccessible (403 errors, permission denials,
content mismatches, AI crawler blocking). The main thread handles WebSearch
fallbacks when URLs fail before passing results to agents. The 2-3 candidate
sources per data point planned in Phase 0 Step 3 provide the redundancy needed
to absorb this failure rate. An inaccessibility rate above 50% may indicate that
the topic lacks accessible web sources; in that case, adjust the scope.
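
A small helper can turn fetch counts into one of these outcomes. In this sketch the function name and the "elevated" label for the 30-50% band are assumptions; the text above only names the expected 20-30% range and the above-50% case.

```python
def inaccessibility_status(failed: int, total: int) -> str:
    """Classify the share of failed fetches against the thresholds in the text."""
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failed <= total:
        raise ValueError("failed must be between 0 and total")
    rate = failed / total
    if rate > 0.50:
        return "rescope"   # topic may lack accessible web sources
    if rate > 0.30:
        return "elevated"  # assumed label: above the expected range, below the rescope line
    return "expected"      # at or below the 20-30% planning assumption
```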

When PDFs fail to extract, search for the paper's title plus the specific data
point needed, check PubMed abstracts, or look for citing secondary sources.
Never silently drop a source — record the gap visibly in the citation entry.

Phase 2: Organization

Before writing the deliverable, organize all research into the reference files
and citations file. This forces you to confront what you actually found (vs.
what you think you found) and creates the audit trail reviewers will check.

Write citations.md and all reference files simultaneously — they're
independent. The deliverable depends on them, so it comes after.

Building citations.md

See references/citation-format.md for the entry format and rules. Key
principles: number sequentially, include the specific data extracted (not just
"useful article"), flag source quality concerns, and mark retracted sources
rather than deleting them (keeps citation numbers stable).

Building references/<topic>.md

Each reference file should:

  • State what dimension it covers
  • Link to citations.md for source details
  • Present data in tables where possible (easier to audit)
  • Quote sources directly when precision matters
  • Cite every fact with [N]
  • End with a "Gaps and Limitations" section
  • Flag interpolated or estimated values with "(est.)"
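
One mechanical check that follows from this list is that every [N] marker in a reference file resolves to an entry in citations.md. The sketch below assumes markers are a single integer in square brackets; the authoritative format lives in references/citation-format.md, which is not reproduced here, and the function name is hypothetical.

```python
import re

CITATION = re.compile(r"\[(\d+)\]")


def unresolved_citations(reference_text: str, known_numbers: set[int]) -> list[int]:
    """Return citation numbers used in the text that have no entry in citations.md."""
    cited = {int(n) for n in CITATION.findall(reference_text)}
    return sorted(cited - known_numbers)


text = "Battery mass is 2.7 kg [3]. Cycle life exceeds 2000 [7], disputed by [12]."
print(unresolved_citations(text, known_numbers={1, 2, 3, 7}))
```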

Calculation Discipline

  • Carry at least 2 significant digits through calculations
  • Show your math: "2.7 kg ÷ 0.73 kg = 3.70×"
  • Round consistently toward the conservative/safe side
  • Mark every interpolated or derived value with "(est.)"
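
The "show your math" rule can be sketched as a formatter that emits both the inputs and the result, alongside a rounding helper whose direction the caller chooses. The function names are hypothetical, and which direction counts as conservative depends on the claim being made.

```python
import math


def ratio_with_work(numerator: float, denominator: float, unit: str, places: int = 2) -> str:
    """Render a ratio with its inputs visible, e.g. for a reference-file table cell."""
    ratio = numerator / denominator
    return f"{numerator} {unit} ÷ {denominator} {unit} = {ratio:.{places}f}×"


def round_conservative(value: float, places: int, direction: str) -> float:
    """Round 'down' or 'up' explicitly instead of to the nearest value."""
    scale = 10 ** places
    if direction == "down":
        return math.floor(value * scale) / scale
    if direction == "up":
        return math.ceil(value * scale) / scale
    raise ValueError("direction must be 'down' or 'up'")


print(ratio_with_work(2.7, 0.73, "kg"))
```

The formatter uses ordinary nearest-value rounding, which reproduces the 3.70× in the example above; round_conservative is for cases where overstating the result would be the unsafe error.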

Phase 3: Writing the Deliverable

The Accountability Clause

Two independent review agents will audit this document — one checks every
cited URL against source content, the other checks numerical and logical
consistency. The real protection is structural: every claim requires an inline
citation, and fabricating both a false claim and a matching false citation is
harder than fabricating either alone (the "dual-error" principle). The review
step catches what slips through.

Writing Rules

  1. Every factual claim gets a citation number. If you cannot cite it, find
    a source or explicitly mark it as inference/estimate with reasoning shown.

  2. Distinguish data from inference. Show calculations and cite both inputs
    for derived values. Label clearly: "Calculated from [3] and [7]."

  3. Qualify uncertainty precisely:

    • "Source [3] reports X" — verified single-source fact
    • "Based on [3] and [7], approximately X" — derived estimate
    • "No published data found; [3] suggests Y as a proxy" — acknowledged gap
    • "~3.3 kPa (est.)" — interpolated, flagged

  4. Do not round aggressively. If the source says 2.7, write 2.7, not
    "about 3." Rounding obscures precision and makes audit harder.

  5. Surface contradictions, don't suppress them. When sources disagree,
    state the disagreement explicitly with citations to both sides. Do not
    silently select one interpretation — the reader needs to see the conflict
    to assess which source to trust. See references/research-basis.md
    §Contradiction Transparency.

  6. State limitations prominently. The user trusts you more when you show
    what you don't know.

  7. Cross-file consistency is your responsibility. Verify every number in
    the deliverable matches the corresponding reference file before finalizing.

  8. Use relative links between files. When referencing another file in the
    topic directory (e.g., citations.md, a reference file, the README), use a
    markdown link ([citations](citations.md)) rather than just naming the
    file. This makes the documents navigable in any markdown viewer or when
    rendered as HTML.

  9. Review cross-source synthesis carefully. Claims that draw on multiple
    sources for a single conclusion are where LLMs are weakest — individual
    document analysis is strong, but narrative integration across papers is a
    documented limitation (see references/research-basis.md §Cross-Source
    Synthesis Limitation). Flag cross-document claims for operator review when
    they underpin key conclusions.
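
Rule 1 lends itself to a rough pre-audit pass. The sketch below is a line-based heuristic with a hypothetical function name: it flags lines that contain a digit but neither a [N] marker nor "(est.)". It will miss non-numeric claims and flag harmless numbers, so it supplements the review agents rather than replacing them.

```python
import re

HAS_NUMBER = re.compile(r"\d")
HAS_CITATION = re.compile(r"\[\d+\]")


def lines_needing_citation(text: str) -> list[tuple[int, str]]:
    """Return (line number, line) pairs with a number but no citation or estimate flag."""
    flagged = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and markdown headings
        if (
            HAS_NUMBER.search(stripped)
            and not HAS_CITATION.search(stripped)
            and "(est.)" not in stripped
        ):
            flagged.append((lineno, stripped))
    return flagged
```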

Reflection Before Finalizing

After assembling the draft deliverable and before writing the README,
perform one reflection pass. Ask yourself:

Is there anything I overlooked, a claim I stated with more confidence
than the source supports, an alternative interpretation I dismissed, or
a contradiction I suppressed? Revise accordingly.

This is a single pass, not a chain-of-thought expansion. LLM self-correction
fails 64.5% of the time, but a single reflection prompt reduces that
blind-spot rate by 89.3% [§Self-reflection Intervention in
references/research-basis.md]. Keep it to one turn to preserve the
"fast thinking" design principle.

Writing the README

The README is written last. It distills the deliverable into:

  • One-paragraph summary of what the document answers
  • The key table or result
  • A quick decision framework (3-5 steps)
  • Links to the supporting files for full methodology

It should stand alone — a reader who never opens the supporting files still gets an
actionable answer.

Phase 4: Verification

After all files are written, launch the citation-audit and
consistency-review agents in parallel. These agents receive NO context
from the research conversation — they read only the produced files. This
isolation prevents confirmation bias. Agent frontmatter defines model and
tools (see references/research-basis.md §Model Assignment by Agent Role).

Pre-Fetch for Citation Audit

Before dispatching the Citation Audit agent, the main thread pre-fetches all
cited URLs so the audit agent does not need WebFetch:

  1. Read citations.md and extract every cited URL
  2. Batch-fetch all URLs via WebFetch (the user approves once per batch)
  3. For URLs that fail, attempt WebSearch fallbacks from the main thread
  4. Persist fetched content to ./.tmp-cited-research/<topic-slug>/ via
    put_data.py (see Phase 1 §Providing Fetched Content for invocation)
  5. Dispatch the citation-audit agent with DELIVERABLE_DIR, FETCHED_DIR,
    and SLUG in the invocation prompt — the agent reads files via Read
    tool, does not fetch URLs itself, and persists its report via
    put_data.py to ./.tmp-cited-research/<SLUG>/audit/citation-audit.md
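
Step 1 of this list can be approximated with a regular expression. The sketch below assumes URLs in citations.md appear either bare or inside markdown links; the function name is hypothetical, and URLs that themselves contain parentheses or brackets would be truncated by this pattern.

```python
import re

URL = re.compile(r"https?://[^\s<>()\[\]]+")


def cited_urls(citations_markdown: str) -> list[str]:
    """Extract unique URLs in first-seen order, trimming trailing sentence punctuation."""
    urls = (m.group(0).rstrip(".,;") for m in URL.finditer(citations_markdown))
    return list(dict.fromkeys(urls))


sample = (
    "1. [Title](https://example.com/a), data extracted\n"
    "2. https://example.org/b.\n"
    "3. https://example.com/a\n"
)
print(cited_urls(sample))
```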

All web access happens in the main thread before the agents run. This
ensures the audit runs against the same source content snapshot and
avoids the sub-agent permission boundary. Agent tool restrictions (Read,
Glob, and Bash — Bash scoped to the put_data.py allowlist rule) are
enforced by agent definition frontmatter.

Dispatch the consistency-review agent with DELIVERABLE_DIR and SLUG in
the invocation prompt.

Promoting Audit Reports to the Deliverable Directory

After both audit agents complete, read each report back from
./.tmp-cited-research/<topic-slug>/audit/ and write it via the Write
tool to its final home under <deliverable-dir>/audit/:

  • <deliverable-dir>/audit/citation-audit.md
  • <deliverable-dir>/audit/consistency-review.md

The tmp copy is the agent's write surface; the deliverable copy is the
canonical artifact that ships with the research directory. This two-step
keeps the audit agents on a single allowlisted Bash rule
(put_data.py) while preserving the deliverable layout.

Handling Verification Results

After both sub-agents complete:

  1. For INACCURATE or NOT FOUND citations: correct the claim in all files to
    match what the source actually says, or remove the claim and note the gap.
  2. For INACCESSIBLE sources: the main thread attempts alternative WebSearch
    queries and re-fetches. If unsuccessful, downgrade the claim to "unverified"
    with a note.
  3. For FAIL consistency checks: reconcile across all files — fix every file
    that references the incorrect value.
  4. If corrections are significant (>3 items), re-run the affected sub-agent.
  5. Update audit reports after fixes. Add **Status: RESOLVED** to each
    fixed issue. Audit reports describing fixed issues as still open confuse
    future readers.
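
The re-run rule in step 4 can be sketched as a count per audit agent. Two assumptions are built in and are not stated by the skill: corrections are counted separately for each agent, and an INACCESSIBLE downgrade counts as a correction. The names are hypothetical.

```python
from collections import Counter

# Which audit agent produced each verdict, per steps 1-3 above.
VERDICT_OWNER = {
    "INACCURATE": "citation-audit",
    "NOT FOUND": "citation-audit",
    "INACCESSIBLE": "citation-audit",
    "FAIL": "consistency-review",
}


def agents_to_rerun(corrected_verdicts: list[str], threshold: int = 3) -> set[str]:
    """Return the agents whose corrected-item count exceeds the threshold."""
    counts = Counter(
        VERDICT_OWNER[v] for v in corrected_verdicts if v in VERDICT_OWNER
    )
    return {agent for agent, n in counts.items() if n > threshold}
```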

Present the verification summary to the user along with the deliverable.

Resolving Inaccessible Sources — Ask the User

High-priority inaccessible sources (Tier 1-2) should already have been
triaged with the user between iterations 2 and 3 (see Phase 1). For any
remaining inaccessible sources discovered during verification, ask the user
for help before marking claims as permanently unverified. Prioritize sources
with quantitative claims that feed into calculations or conclusions.

Re-Verification on Revisit

Source accessibility changes over time. When revisiting a project, re-check all
INACCESSIBLE and UNVERIFIED sources. Update citation entries, audit reports,
and summary counts for any newly accessible sources. Propagate any new
discoveries to the relevant reference files.

Adapting to Scale

Scale the methodology proportionally to the task:

  • Medium research (3-5 dimensions): Full methodology. Sub-agents can
    spot-check rather than exhaustively audit every citation.
  • Major research (6+ dimensions, decision-critical): Full methodology with
    exhaustive audit. Consider having the user review reference files before
    writing the summary.

The non-negotiable elements at every scale:

  1. Claims come from web sources visited in-session
  2. URLs are recorded
  3. Writing and verification are separate steps
  4. The writer knows verification will happen

Phase 5: Index Maintenance (cited-research repo only)

Skip this phase when output is not in the cited-research repo.
Otherwise, after Phase 4 verification and corrections, update
index.md at the repo root. Read
references/index-maintenance.md for the entry format and update
rules.