ocas-sift

Use when searching the web, synthesizing research across multiple sources, verifying facts, summarizing documents, or extracting structured entities. Sift is the system's general research engine for topic research, web lookups, fact-checking, comparisons, and deep multi-source sessions. Trigger phrases: 'search for', 'look up', 'research this topic', 'fact check', 'compare', 'summarize this', 'what is', 'find information about', 'update sift'. Do not use for person-focused OSINT investigations (use Scout) or image processing (use Look).

indigokarasu 0 Updated 3mo ago

Install

npx skillscat add indigokarasu/sift

Install via the SkillsCat registry.

SKILL.md

Sift

Sift is the system's general research engine. It retrieves and synthesizes information across a tiered source hierarchy: internal knowledge first, then free web search, then rate-limited semantic research providers for deep work. It evaluates source reliability through cross-source agreement scoring, extracts structured entities from retrieved content, and emits enrichment candidates to Elephas for possible promotion into Chronicle, so researched knowledge accumulates over time.

When to use

Web search and research synthesis on any topic
Fact verification across multiple sources with consensus scoring
Document summarization and structured entity extraction
Comparison research across products, technologies, or options
Deep research sessions with multi-source threading

When not to use

OSINT investigations on individuals — use Scout
Image-to-action processing — use Look
Pattern analysis on the knowledge graph — use Corvus
Communications and message drafting — use Dispatch

Sift never performs OSINT investigations on individuals. If the primary entity of a query is a person, invoke Scout instead.

Responsibility boundary

Sift owns web research, fact verification, and structured entity extraction.

Sift does not own: person-focused OSINT (Scout), image processing (Look), knowledge graph writes (Elephas), pattern analysis (Corvus), social graph (Weave).

Commands

sift.search — execute a search query with automatic tier selection and query rewriting
sift.research — run a multi-source research session producing a structured research journal
sift.verify — fact-check a specific claim across multiple sources with consensus scoring
sift.summarize — summarize a document or URL with structured entity extraction
sift.extract — extract entities, claims, statistics, and relationships from content
sift.thread.list — list active research threads with entity overlap detection
sift.status — return current state: active threads, quota usage, source reputation summary
sift.journal — write journal for the current run; called at end of every run
sift.update — pull latest from GitHub source; preserves journals and data

Response modes

Sift classifies query depth automatically:

quick_answer — simple factual lookups, single-source sufficient
comparison — multi-source comparison with structured output
research — deep multi-session investigation with threading
document_analysis — URL or document-focused extraction

Users may override with phrases like "quick answer", "deep dive", "compare", or "summarize".
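The override behaviour can be sketched as a simple phrase lookup. This is a minimal illustration, not Sift's actual classifier: the mode names and phrases come from the text above, but the pairing of each phrase with a mode and the function name `detect_override` are assumptions.

```python
from typing import Optional

# Assumed pairing of override phrases with response modes.
# Only the phrases and the mode names themselves appear in the skill description.
OVERRIDE_PHRASES = {
    "quick answer": "quick_answer",
    "deep dive": "research",
    "compare": "comparison",
    "summarize": "document_analysis",
}


def detect_override(query: str) -> Optional[str]:
    """Return the response mode the user asked for explicitly, or None."""
    lowered = query.lower()
    for phrase, mode in OVERRIDE_PHRASES.items():
        if phrase in lowered:
            return mode
    # No override phrase found: fall back to automatic classification.
    return None
```

A caller would presumably run automatic depth classification only when `detect_override` returns `None`.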

Search tier selection

Tier 1 (Internal Knowledge): LLM knowledge, conversation context, and Chronicle if available.
Tier 2 (Free Web Search): Brave Search API, SearXNG, DuckDuckGo. Default for all queries.
Tier 3 (Semantic Research): Exa, Tavily. Use only for deep research where sources are sparse. Quota-limited.

Read references/search_tiers.md for provider details and escalation criteria.
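One way to picture the escalation rule is the sketch below. The real criteria live in references/search_tiers.md, which is not reproduced here, so the two boolean inputs and the function name `select_tier` are assumptions; the numeric values mirror the default config.json.

```python
# Values mirror the "search" block of the default config.json.
EXAMPLE_CONFIG = {"search": {"default_tier": 2, "tier3_daily_limit": 50}}


def select_tier(deep_research: bool, sparse_sources: bool,
                tier3_used_today: int, config: dict) -> int:
    """Pick a search tier; escalate to Tier 3 only when justified and within quota."""
    search = config["search"]
    # Assumed escalation criteria: deep research AND sparse sources.
    wants_tier3 = deep_research and sparse_sources
    has_quota = tier3_used_today < search["tier3_daily_limit"]
    if wants_tier3 and has_quota:
        return 3
    # Otherwise stay on the configured default (free web search).
    return search["default_tier"]
```

Checking the quota before escalating keeps Tier 3 usage within the daily limit even when a query would otherwise qualify.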

Source reputation model

Sift maintains per-domain trust scores based on: cross-source agreement, contradiction frequency, historical accuracy, structured data quality, citation frequency.
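The scoring formula itself is not given here. As a rough sketch of just one of the listed signals, cross-source agreement, the snippet below computes the fraction of a domain's claims that matched the consensus. The input shape and the function name `agreement_by_domain` are hypothetical, and the other four signals are left out.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def agreement_by_domain(observations: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each domain to the fraction of its claims that agreed with consensus.

    observations: (domain, agreed_with_consensus) pairs, one per checked claim.
    """
    totals = defaultdict(lambda: [0, 0])  # domain -> [agreements, observations]
    for domain, agreed in observations:
        totals[domain][1] += 1
        if agreed:
            totals[domain][0] += 1
    return {domain: hits / seen for domain, (hits, seen) in totals.items()}
```

A full trust score would presumably combine this ratio with contradiction frequency, historical accuracy, structured data quality, and citation frequency.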

Structured extraction rules

When pages are retrieved, extract entities (with type from the shared ontology), claims, statistics, relationships, and citations. Each extraction includes a confidence level.

Extracted entities are emitted as enrichment candidates for Elephas.

Run completion

After every Sift command that produces results:

Persist session, entities, sources, and decisions to local JSONL files
For each extracted entity or relationship with confidence >= med, write a Signal file to ~/openclaw/db/ocas-elephas/intake/{signal_id}.signal.json, using the Signal schema from spec-ocas-shared-schemas.md.
Write journal via sift.journal
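The Signal emission step might look roughly like the following. The real Signal schema is defined in spec-ocas-shared-schemas.md and is not shown here, so the payload fields, the confidence labels other than `med`, and the function name `emit_signals` are all placeholders.

```python
import json
import uuid
from pathlib import Path

# Assumed confidence labels and ordering; only "med" is named in the text.
CONFIDENCE_RANK = {"low": 0, "med": 1, "high": 2}


def emit_signals(extractions, intake_dir="~/openclaw/db/ocas-elephas/intake"):
    """Write one Signal file per extraction with confidence >= med."""
    intake = Path(intake_dir).expanduser()
    intake.mkdir(parents=True, exist_ok=True)
    written = []
    for item in extractions:
        if CONFIDENCE_RANK[item["confidence"]] < CONFIDENCE_RANK["med"]:
            continue  # below the threshold: keep locally, do not emit
        signal_id = str(uuid.uuid4())
        # Placeholder payload; the real fields come from the shared Signal schema.
        signal = {"signal_id": signal_id, "source_skill": "ocas-sift", "payload": item}
        path = intake / f"{signal_id}.signal.json"
        path.write_text(json.dumps(signal, indent=2), encoding="utf-8")
        written.append(path)
    return written
```

Writing files into the intake directory, rather than calling Elephas directly, keeps the promotion decision on the Elephas side.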

Chronicle interaction

Sift never writes directly to Chronicle. It emits enrichment candidates via Signal files to ~/openclaw/db/ocas-elephas/intake/{signal_id}.signal.json. Elephas decides promotion.

Inter-skill interfaces

Sift writes Signal files to Elephas intake: ~/openclaw/db/ocas-elephas/intake/{signal_id}.signal.json

Sift may read from Thread (when present) for recent browsing context to improve query rewriting. This is a cooperative read, not a dependency.

See spec-ocas-interfaces.md for signal format.

Storage layout

~/openclaw/data/ocas-sift/
  config.json
  sessions.jsonl
  threads.jsonl
  entities.jsonl
  sources.jsonl
  decisions.jsonl
  reports/

~/openclaw/journals/ocas-sift/
  YYYY-MM-DD/
    {run_id}.json
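As a small illustration of the journal layout, the helper below builds the path for a single run. The function name `journal_path` and its parameters are assumptions; the journal contents are specified in references/journal.md and are not sketched here.

```python
from datetime import date
from pathlib import Path


def journal_path(run_id: str, run_date: date, root: str = "~/openclaw") -> Path:
    """Return <root>/journals/ocas-sift/YYYY-MM-DD/{run_id}.json."""
    return (
        Path(root).expanduser()
        / "journals"
        / "ocas-sift"
        / run_date.isoformat()  # date.isoformat() yields YYYY-MM-DD
        / f"{run_id}.json"
    )
```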

Default config.json:

{
  "skill_id": "ocas-sift",
  "skill_version": "2.3.0",
  "config_version": "1",
  "created_at": "",
  "updated_at": "",
  "search": {
    "default_tier": 2,
    "tier3_daily_limit": 50
  },
  "retention": {
    "days": 30,
    "max_records": 10000
  }
}
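A loader for this file could fall back to the defaults for any missing key. The sketch below assumes that behaviour; the merge strategy and the names `deep_merge` and `load_config` are illustrative, not part of the skill.

```python
import copy
import json
from pathlib import Path

# Subset of the default config.json shown above.
DEFAULT_CONFIG = {
    "skill_id": "ocas-sift",
    "config_version": "1",
    "search": {"default_tier": 2, "tier3_daily_limit": 50},
    "retention": {"days": 30, "max_records": 10000},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied, merging nested dicts."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(path="~/openclaw/data/ocas-sift/config.json") -> dict:
    """Load config.json, filling in defaults for anything the file omits."""
    config_file = Path(path).expanduser()
    if not config_file.exists():
        return copy.deepcopy(DEFAULT_CONFIG)
    user_config = json.loads(config_file.read_text(encoding="utf-8"))
    return deep_merge(DEFAULT_CONFIG, user_config)
```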

OKRs

Universal OKRs from spec-ocas-journal.md apply to all runs.

skill_okrs:
  - name: source_accuracy
    metric: fraction of extracted facts confirmed by cross-source agreement
    direction: maximize
    target: 0.85
    evaluation_window: 30_runs
  - name: tier3_quota_compliance
    metric: fraction of days where Tier 3 usage stays within daily limit
    direction: maximize
    target: 1.0
    evaluation_window: 30_runs
  - name: entity_extraction_precision
    metric: fraction of extracted entities with valid source reference
    direction: maximize
    target: 0.90
    evaluation_window: 30_runs

Optional skill cooperation

Elephas — emit Signal files for Chronicle promotion after every extraction
Thread — may read recent browsing context for query rewriting (cooperative, not required)
Weave — may use Weave for entity disambiguation
Chronicle — may read Chronicle (read-only) for entity context

Journal outputs

Observation Journal — search and extraction runs
Research Journal — structured multi-source research sessions

Initialization

On first invocation of any Sift command, run sift.init:

Create ~/openclaw/data/ocas-sift/ and subdirectories (reports/)
Write default config.json with ConfigBase fields if absent
Create empty JSONL files: sessions.jsonl, threads.jsonl, entities.jsonl, sources.jsonl, decisions.jsonl
Create ~/openclaw/journals/ocas-sift/
Ensure ~/openclaw/db/ocas-elephas/intake/ exists (create if missing)
Register cron job sift:update if not already present (check openclaw cron list first)
Log initialization as a DecisionRecord in decisions.jsonl
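The filesystem part of these steps can be sketched as follows. It leaves out the config.json write and the cron registration, and the DecisionRecord fields shown are placeholders, since the real schema is in references/schemas.md. The function name `init_storage` is an assumption.

```python
import json
from datetime import datetime, timezone
from pathlib import Path

JSONL_FILES = [
    "sessions.jsonl",
    "threads.jsonl",
    "entities.jsonl",
    "sources.jsonl",
    "decisions.jsonl",
]


def init_storage(root="~/openclaw") -> Path:
    """Create Sift's directories and empty JSONL files; safe to call repeatedly."""
    base = Path(root).expanduser()
    data_dir = base / "data" / "ocas-sift"
    decisions = data_dir / "decisions.jsonl"
    first_run = not decisions.exists()

    (data_dir / "reports").mkdir(parents=True, exist_ok=True)
    (base / "journals" / "ocas-sift").mkdir(parents=True, exist_ok=True)
    (base / "db" / "ocas-elephas" / "intake").mkdir(parents=True, exist_ok=True)
    for name in JSONL_FILES:
        (data_dir / name).touch(exist_ok=True)  # never truncates existing data

    if first_run:
        # Placeholder DecisionRecord; real fields come from references/schemas.md.
        record = {
            "type": "initialization",
            "timestamp": datetime.now(timezone.utc).isoformat(),
        }
        with decisions.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")
    return data_dir
```

Because every step checks for existing state first, running the sketch on an already initialized install changes nothing.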

Background tasks

Job name	Mechanism	Schedule	Command
`sift:update`	cron	`0 0 * * *` (midnight daily)	`sift.update`

openclaw cron add --name sift:update --schedule "0 0 * * *" --command "sift.update" --sessionTarget isolated --lightContext true --timezone America/Los_Angeles

Self-update

sift.update pulls the latest package from the URL in the source: field of this file's frontmatter. It runs silently, with no output unless the version changed or an error occurred.

Read source: from frontmatter → extract {owner}/{repo} from URL
Read local version from skill.json
Fetch remote version: gh api "repos/{owner}/{repo}/contents/skill.json" --jq '.content' | base64 -d | python3 -c "import sys,json;print(json.load(sys.stdin)['version'])"
If remote version equals local version → stop silently

Download and install:

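# Work in a scratch directory. Avoid the name TMPDIR, which mktemp itself reads.
WORKDIR=$(mktemp -d)
# Download the tarball of the main branch.
gh api "repos/{owner}/{repo}/tarball/main" > "$WORKDIR/archive.tar.gz"
mkdir "$WORKDIR/extracted"
# Strip the top-level directory that GitHub adds to the archive.
tar xzf "$WORKDIR/archive.tar.gz" -C "$WORKDIR/extracted" --strip-components=1
# Copy the new package over the current skill directory, then clean up.
cp -R "$WORKDIR/extracted/"* ./
rm -rf "$WORKDIR"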

On failure → retry once. If the second attempt fails, report the error and stop.
On success, output exactly: I updated Sift from version {old} to {new}
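The first and fourth steps of this procedure (extracting {owner}/{repo} and deciding whether to update) can be sketched as two small helpers. The function names are hypothetical, and the URL used in the usage note is only an illustrative shape, since the actual source: value is not shown in this file.

```python
from typing import Tuple
from urllib.parse import urlparse


def owner_repo(source_url: str) -> Tuple[str, str]:
    """Extract (owner, repo) from a GitHub-style repository URL."""
    parts = [segment for segment in urlparse(source_url).path.split("/") if segment]
    if len(parts) < 2:
        raise ValueError(f"cannot find owner/repo in {source_url!r}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[: -len(".git")]
    return owner, repo


def needs_update(local_version: str, remote_version: str) -> bool:
    """Update only when the remote version string differs from the local one."""
    return local_version.strip() != remote_version.strip()
```

For a URL shaped like `https://github.com/<owner>/<repo>`, `owner_repo` returns the pair that fills the `{owner}/{repo}` placeholders in the `gh api` calls. The comparison is a plain inequality, matching the stop condition when remote and local versions are equal.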

Visibility

public

Support file map

File	When to read
`references/schemas.md`	Before creating sessions, threads, or extraction records
`references/search_tiers.md`	Before tier selection or escalation
`references/query_rewrite.md`	Before query rewriting
`references/journal.md`	Before sift.journal; at end of every run
