Install via the SkillsCat registry: `npx skillscat add georgesalkhouri/docctl`
docctl Skill
When to use
- Use for document-grounded Q&A over local corpora (`.pdf`, `.docx`, `.txt`, `.md`).
- Use when answers must include provenance (`source`, `title`, `chunk_id`).
- Use when the agent can execute shell commands.
Scope and non-scope
- In scope:
  - `docctl` `ingest`, `search`, `show`, `stats`, `catalog`, `doctor`, and `session` orchestration.
  - Full lifecycle behavior: bootstrap ingest plus retrieval loops.
  - Metadata-constrained retrieval using `doc_id`, `source`, and `title`.
  - Optional rerank controls and their interpretation in retrieval workflows (`--rerank`, `--rerank-candidates`, session `rerank`, `rerank_candidates`).
- Out of scope (agent-owned responsibilities):
  - Query rewriting and query decomposition.
  - Conversation context handling and prior-turn memory policy.
  - Project-specific instruction interpretation and policy reasoning.
  - Hybrid keyword/full-text retrieval design.
  - Reranker model training/tuning and low-level scoring implementation.
Inputs and assumptions
- Expected inputs:
  - user question,
  - user-preferred response language when observable from the conversation,
  - optional corpus path(s),
  - optional retrieval filters (`doc_id`, `source`, `title`),
  - optional index settings.
- Default CLI assumptions:
  - `--index-path ./.docctl`
  - `--collection default`
  - `--json` enabled for machine consumption.
- Safety assumption:
  - `ingest` is mutating and should only run under explicit lifecycle conditions (missing or empty index, stale corpus, or explicit user request).
- Language assumption:
  - Infer a working retrieval language from the strongest available signal, in this order:
    - explicit user instruction,
    - language of the latest user turn,
    - language used by retrieved evidence or cited answers in the active workflow.
  - If the user asks in one language but the evidence and grounded answer are clearly in another, switch query rewriting and follow-up searches to the evidence language while preserving the user's requested answer language unless they ask to change it.
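The language rules above can be sketched as a small shell function that only encodes the precedence. `pick_languages` is a hypothetical helper, not part of docctl, and it assumes the agent has already detected a language code for each signal; it does not perform language detection itself.

```bash
# Hypothetical helper, not part of docctl. Arguments are language codes
# already detected by the agent; pass "" for any unavailable signal.
#   $1  language the user explicitly requested
#   $2  language of the latest user turn
#   $3  language of the retrieved evidence (only when clearly identified)
# Prints "<answer_language> <search_language>".
pick_languages() {
  local explicit="$1" latest="$2" evidence="$3"
  # The answer follows an explicit instruction first, then the latest turn.
  local answer="${explicit:-$latest}"
  # Searches switch to the evidence language once it is known; before any
  # evidence exists they use the same signal order as the answer.
  local search="${evidence:-$answer}"
  printf '%s %s\n' "$answer" "$search"
}
```

For the English question over German evidence described in the workflow, `pick_languages "" en de` prints `en de`: answer in English, search in German. Both values map onto `answer_language` and `search_language` in the output contract.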
Operational workflow (ordered)
- Run readiness checks.
  - Default to `docctl catalog` for readiness and full index inventory.
  - Run `docctl stats` only when quick aggregate counts are specifically needed.
  - Run `docctl doctor` only when diagnostics are needed (for example, command failures, config issues, or unexpected index behavior).
- Apply bootstrap ingest rules (full lifecycle).
  - If the index is missing or empty, run `docctl ingest <path>`.
  - Reingest only on explicit user intent or stale corpus signals (updated or new files).
- Prepare the retrieval query in the agent layer.
  - Rewrite, expand, or paraphrase outside `docctl` if needed.
  - Choose the search language from the current working retrieval language, not only from the latest user wording.
  - If the user asks in English but the relevant evidence and grounded answers are in German, reformulate the next retrieval attempts in German.
- Execute retrieval (session-first).
  - Primary: `docctl session` with `op: "search"` for iterative loops.
  - For two or more read operations in one workflow, open one `docctl session` and send multiple NDJSON requests in that session.
  - Do not run multiple sequential one-shot read commands via repeated tool calls when `session` is available.
  - Secondary fallback: one-shot `docctl search`.
- Run a bounded evidence expansion loop.
  - If results are missing or weak, broaden the query and/or relax filters.
  - If results indicate the corpus language differs from the current query language, retry in the corpus language before exhausting attempts.
  - Increase `top_k` per policy and retry up to the maximum number of attempts (see Retrieval policy defaults).
- Inspect top evidence chunks.
  - Call `show` for selected chunk IDs before synthesis when precision matters.
  - Treat high-value returned sentences or snippets as leads, and inspect the full returned chunk before final synthesis to capture qualifiers and surrounding context.
- Synthesize the answer with explicit citations.
  - Include provenance, and state uncertainty when evidence is insufficient.
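The bootstrap ingest rules above need a concrete test for "missing or empty" and "stale". A minimal sketch, assuming file modification times are an acceptable stale-corpus signal and that the agent touches a stamp file of its own choosing after each successful ingest (a hypothetical convention; docctl does not create one). `needs_ingest` is hypothetical, never calls docctl, and does not read index contents, so "empty" here only means an empty index directory.

```bash
# Hypothetical helper, not part of docctl: decide whether an ingest is warranted.
#   $1  index path (for example ./.docctl)
#   $2  corpus path (for example ./docs)
#   $3  stamp file touched after each successful ingest (assumed convention)
# Returns 0 when ingest is warranted, 1 when the existing index should be reused.
needs_ingest() {
  local index="$1" corpus="$2" stamp="$3"
  [ -d "$index" ] || return 0              # index missing
  [ -n "$(ls -A "$index")" ] || return 0   # index directory exists but is empty
  [ -f "$stamp" ] || return 1              # no stamp: no stale-corpus signal available
  # Stale corpus: any file under the corpus modified after the last ingest.
  [ -n "$(find "$corpus" -type f -newer "$stamp" | head -n 1)" ]
}
```

An agent could gate the mutating step on it, for example `needs_ingest ./.docctl ./docs ./.docctl-ingested && uv run docctl --index-path ./.docctl --collection default --json ingest ./docs --recursive && touch ./.docctl-ingested`. Explicit user intent to reingest still bypasses this check.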
Tool guidance (docctl command contracts)
- `ingest`:
  - Mutating operation.
  - Use when the index is uninitialized or empty, or the corpus is stale.
  - Avoid repeated reingest unless needed.
- `search`:
  - Use for one-shot retrieval.
  - Do not chain multiple `search`/`show`/`stats`/`catalog` calls via separate tool invocations for the same workflow; switch to `session`.
  - Relevant options: `--doc-id`, `--source`, `--title`, `--top-k`, `--min-score`, `--rerank`, `--rerank-candidates`.
  - Rerank constraints: candidate depth (`--rerank-candidates`) must be in `[1, 100]` and greater than or equal to `top_k`.
- `session`:
  - Use for iterative retrieval workflows.
  - Preferred default for multi-step work: keep one session open and submit all read operations (`search`, `show`, `stats`, `catalog`, `doctor`) as NDJSON lines.
  - Supported operations: `search`, `show`, `stats`, `catalog`, `doctor`.
  - Search requests accept optional fields: `doc_id`, `source`, `title`, `top_k`, `min_score`, `rerank`, `rerank_candidates`.
- `show`:
  - Use to inspect and quote exact chunk evidence by `chunk_id`.
- `stats`:
  - Do not run by default in retrieval loops.
  - Use when quick aggregate counts are needed.
- `catalog`:
  - Use to inspect per-document inventory (`doc_id`, `source`, `title`, `units`, `chunks`) with summary stats.
- `doctor`:
  - Do not run by default in retrieval loops because it adds latency.
  - Use only to diagnose environment/config failures or unexpected runtime behavior.
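Because all read operations should share one `session`, it helps to generate request lines programmatically rather than by hand. A minimal sketch using the search fields listed above; `json_str` and `search_request` are hypothetical helpers, and the escaping covers only backslashes and double quotes, assuming single-line values.

```bash
# Hypothetical helpers, not part of docctl.
# Escape a value for a JSON string literal (backslashes and double quotes only;
# assumes the value is a single line without control characters).
json_str() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Print one NDJSON `search` request; empty optional filters are omitted.
#   $1 id   $2 query   $3 top_k   $4 doc_id   $5 source   $6 title
search_request() {
  local line
  line="{\"id\":\"$(json_str "$1")\",\"op\":\"search\",\"query\":\"$(json_str "$2")\",\"top_k\":$3"
  [ -n "$4" ] && line="$line,\"doc_id\":\"$(json_str "$4")\""
  [ -n "$5" ] && line="$line,\"source\":\"$(json_str "$5")\""
  [ -n "$6" ] && line="$line,\"title\":\"$(json_str "$6")\""
  printf '%s}\n' "$line"
}
```

Grouping several requests into one pipe keeps them in a single session, for example `{ search_request q1 "gateway diagnostics" 5 "" "" operations-manual; echo '{"id":"q2","op":"catalog"}'; } | uv run docctl --index-path ./.docctl --collection default session`, which mirrors the NDJSON loop under Minimal examples.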
Retrieval policy defaults
- Attempt 1 (baseline):
  - `top_k=5`, user query as-is, apply user-specified filters.
- Attempt 2 (broaden):
  - `top_k=10`, relax restrictive filters unless the user explicitly requires them.
- Attempt 3 (final):
  - Rewrite or broaden the query in the agent layer; keep only essential filters.
- Hard stop after 3 attempts.
- Evidence selection rule:
  - Prioritize chunks with clear provenance and direct semantic match.
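The bounded schedule can be expressed as a small loop. This is a sketch: `run_attempt` is a hypothetical callback the agent supplies (it would issue the session search for that attempt and judge the hits), and because Attempt 3 does not specify `top_k`, the sketch assumes it stays at 10.

```bash
# Hypothetical sketch of the retrieval policy defaults, not part of docctl.
# The caller defines run_attempt ATTEMPT TOP_K: it should run the search for that
# attempt (baseline, broadened, or rewritten query) and return 0 only when the
# hits are good enough to stop.
MAX_ATTEMPTS=3

top_k_for_attempt() {
  case $1 in
    1) echo 5 ;;
    *) echo 10 ;;  # attempt 2 uses 10; attempt 3 is unspecified, assumed to stay at 10
  esac
}

# Prints the attempt number that succeeded; returns 1 after the hard stop.
retrieve_with_retries() {
  local attempt=1
  while [ "$attempt" -le "$MAX_ATTEMPTS" ]; do
    if run_attempt "$attempt" "$(top_k_for_attempt "$attempt")"; then
      echo "$attempt"
      return 0
    fi
    attempt=$((attempt + 1))
  done
  return 1  # caller reports "cannot verify from indexed documents"
}
```

Keeping the cap in one variable makes the hard stop explicit; query rewriting and filter relaxation stay inside `run_attempt`, consistent with the agent owning query preparation.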
Failure handling and recovery
- Missing corpus path and empty index:
  - Ask for the corpus path, or for explicit permission to ingest a known path.
- Empty index (corpus path known):
  - Ingest if allowed by the lifecycle policy; otherwise return an actionable instruction.
- Tool or schema errors:
  - Surface the exact corrective action (for example, an invalid field type in a session request).
- Cross-language mismatch:
  - If the question language and evidence language diverge, state that the search language was switched to match the indexed evidence, and keep citations in the original evidence language.
- No verifiable evidence after bounded retries:
  - Return `cannot verify from indexed documents` and list the missing information.
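The rule that `doctor` runs only on failure can be wrapped around any docctl call. A minimal sketch; `run_or_diagnose` is hypothetical, and it assumes the default index path and collection from Inputs and assumptions.

```bash
# Hypothetical wrapper, not part of docctl: run a command once and, only if it
# fails, collect `docctl doctor` diagnostics on stderr.
# Returns the original command's exit status so callers can still react to it.
run_or_diagnose() {
  local status=0
  "$@" || status=$?
  if [ "$status" -ne 0 ]; then
    echo "command failed (exit $status); running docctl doctor" >&2
    uv run docctl --index-path ./.docctl --collection default --json doctor >&2 || true
  fi
  return "$status"
}
```

For example, `run_or_diagnose uv run docctl --index-path ./.docctl --collection default --json search "gateway diagnostics" --top-k 5` costs nothing extra on success. On failure it leaves diagnostics on stderr so the agent can surface the exact corrective action.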
Output contract for downstream agents
Return a structured payload (or equivalent human-readable response) with:
- `answer`: grounded response text.
- `answer_language`: language used for the final answer.
- `search_language`: language used for the final retrieval attempt.
- `citations`: list of objects with:
  - `source`
  - `chunk_id`
  - `title`
- `confidence`: one of `high`, `medium`, `low`.
- `limitations`: explicit gaps or uncertainty.
- `next_actions`: concrete follow-up steps.
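Two parts of the contract are easy to get wrong mechanically: citation objects missing a field, and `confidence` values outside the enum. A sketch of both checks in shell; `json_str`, `citation`, and `check_confidence` are hypothetical helpers, and the escaping covers only backslashes and double quotes, assuming single-line values.

```bash
# Hypothetical helpers for the output contract, not part of docctl.
# Escape a value for a JSON string literal (backslashes and double quotes only;
# assumes single-line values).
json_str() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# One entry of `citations`: source, chunk_id, title.
citation() {
  printf '{"source":"%s","chunk_id":"%s","title":"%s"}' \
    "$(json_str "$1")" "$(json_str "$2")" "$(json_str "$3")"
}

# `confidence` must be one of high, medium, low.
check_confidence() {
  case $1 in
    high|medium|low) return 0 ;;
    *) echo "invalid confidence: $1" >&2; return 1 ;;
  esac
}
```

An agent could join several `citation` outputs with commas inside `[...]` to fill the `citations` field, taking `source`, `chunk_id`, and `title` from the hits it actually inspected.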
Minimal examples
CLI ingest/catalog/search/show:

```bash
uv run docctl --index-path ./.docctl --collection default --json ingest ./docs --recursive --allow-model-download
uv run docctl --index-path ./.docctl --collection default --json catalog
uv run docctl --index-path ./.docctl --collection default --json search "gateway diagnostics" --top-k 5 --title "operations-manual"
uv run docctl --index-path ./.docctl --collection default --json show <chunk_id>
```

NDJSON session loop:

```bash
cat <<'EOF' | uv run docctl --index-path ./.docctl --collection default session
{"id":"q1","op":"search","query":"gateway diagnostics","top_k":5,"title":"operations-manual"}
{"id":"q2","op":"show","chunk_id":"<chunk_id-from-q1>"}
{"id":"q3","op":"catalog"}
EOF
```

Evaluation checklist
- Trigger correctness:
  - The skill is used only for `docctl` retrieval tasks, not unrelated workflows.
- Tool-use correctness:
  - The sequence follows readiness -> ingest-if-needed -> retrieval -> evidence inspection.
- Citation completeness:
  - Claims map to retrieved chunks with `source` and chunk metadata.
- Boundary adherence:
  - Query rewriting is handled by the agent, not attributed to `docctl`.
- Regression control:
  - Rerun workflow checks after any meaningful change to this skill text.
Accuracy note on ranking behavior
- Default `docctl` retrieval ranking is vector-distance based when reranking is not enabled.
- `docctl` supports opt-in two-stage reranking: vector retrieval for candidates, then local cross-encoder reranking.
- When reranking is enabled, hits may include `vector_rank` and `rerank_score` in addition to the base hit fields.
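The rerank depth constraint from Tool guidance can be checked before a request is sent, so a bad combination fails locally instead of inside the session. A minimal sketch, assuming the session `rerank` field takes a JSON boolean; `valid_rerank_depth` and `rerank_search_request` are hypothetical helpers, and the query is inserted verbatim, so it must not contain double quotes or backslashes.

```bash
# Hypothetical helpers, not part of docctl.
# Documented constraint: rerank candidate depth in [1, 100] and >= top_k.
valid_rerank_depth() {
  local top_k="$1" candidates="$2"
  [ "$candidates" -ge 1 ] && [ "$candidates" -le 100 ] && [ "$candidates" -ge "$top_k" ]
}

# Print a reranked session search request, or fail without printing anything.
# Assumes `rerank` is a JSON boolean and the query has no quotes or backslashes.
rerank_search_request() {
  local id="$1" query="$2" top_k="$3" candidates="$4"
  if ! valid_rerank_depth "$top_k" "$candidates"; then
    echo "rerank_candidates=$candidates is invalid for top_k=$top_k" >&2
    return 1
  fi
  printf '{"id":"%s","op":"search","query":"%s","top_k":%s,"rerank":true,"rerank_candidates":%s}\n' \
    "$id" "$query" "$top_k" "$candidates"
}
```

Piping `rerank_search_request q1 "gateway diagnostics" 5 20` into `uv run docctl --index-path ./.docctl --collection default session` should retrieve 20 vector candidates and rerank them locally. The returned hits may then carry `vector_rank` and `rerank_score`.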