knowledge-ingest

Ingest URLs, documents, and text into the memory system as structured knowledge

QuixiAI 593 79 Updated 5mo ago

Install

npx skillscat add quixiai/hexis/knowledge-ingest

Install via the SkillsCat registry.

SKILL.md

Transform external content (web pages, documents, raw text) into structured semantic memories that persist in the knowledge graph.

When to use

When the user shares a URL and says "learn this" or "remember this article"
When a research workflow finds valuable sources that should be retained long-term
When the user pastes raw text (notes, transcripts, outlines) to be ingested
During heartbeats when a goal involves building knowledge on a specific topic
When importing reference material for a project or domain

Workflow

1. Assess the source: Before ingesting, determine what kind of content it is (article, documentation, transcript, raw notes). This guides how aggressively to summarize.
2. Check for duplicates: Before fetching or storing anything, use recall with the URL (or, for raw text, a key phrase from the content) to see whether it has already been ingested. Avoid storing the same source twice.
3. Fetch and parse: For URLs, use ingest_url, which handles fetching, HTML-to-text conversion, and chunking. For raw text, use ingest_text directly.
4. Chunk intelligently: The ingestion pipeline chunks long content automatically, and each chunk becomes a separate semantic memory linked by source metadata. Trust the pipeline's chunking; do not split content manually unless it is clearly failing.
5. Add context: When storing via remember, include source metadata (URL, author, publication date) and the reason for ingesting it (which goal or topic it serves).
6. Verify ingestion: After ingestion completes, run a quick recall on a key concept from the content to confirm it is retrievable.
7. Connect to goals: If the ingested content relates to an active goal, record that connection so future heartbeats can use it.
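The workflow above can be sketched in Python. This is a minimal sketch, assuming a hypothetical in-memory `FakeMemoryStore` whose `recall` and `ingest_text` methods only borrow the skill's tool names; the real tools' signatures, matching behavior, and chunking strategy are not documented here and will differ. The paragraph-packing `split_into_chunks` helper is an illustrative stand-in, not the pipeline's algorithm, and the sketch starts from already-fetched text instead of calling `ingest_url`.

```python
from dataclasses import dataclass, field
from typing import Optional


def split_into_chunks(text: str, max_chars: int) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of at most max_chars.

    Illustrative only. A single paragraph longer than max_chars becomes its own
    oversized chunk rather than being cut mid-paragraph.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if not current or len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks


@dataclass
class Memory:
    text: str
    metadata: dict


@dataclass
class FakeMemoryStore:
    """Hypothetical in-memory stand-in for the memory system's tools."""
    memories: list[Memory] = field(default_factory=list)

    def recall(self, query: str) -> list[Memory]:
        # Naive lookup: exact match on source URL, or substring match on chunk text.
        q = query.lower()
        return [m for m in self.memories
                if m.metadata.get("source_url", "").lower() == q or q in m.text.lower()]

    def ingest_text(self, text: str, metadata: dict, max_chars: int = 1000) -> int:
        chunks = split_into_chunks(text, max_chars)
        for i, chunk in enumerate(chunks):
            # Every chunk carries the parent source's metadata plus its position.
            self.memories.append(Memory(chunk, {**metadata, "chunk": i, "of": len(chunks)}))
        return len(chunks)


def ingest_source(store: FakeMemoryStore, text: str, source_url: str, reason: str,
                  key_phrase: str, author: Optional[str] = None,
                  published: Optional[str] = None, goal_id: Optional[str] = None,
                  max_chars: int = 1000) -> str:
    # Check for duplicates by source URL before storing anything.
    if store.recall(source_url):
        return "skipped: already ingested"
    # Provenance, purpose, and the related goal (if any) travel with every chunk.
    metadata = {"source_url": source_url, "author": author, "published": published,
                "reason": reason, "goal_id": goal_id}
    count = store.ingest_text(text, metadata, max_chars)
    # Verify: a key concept from the content should now be retrievable.
    if not store.recall(key_phrase):
        return f"warning: {count} chunk(s) stored but '{key_phrase}' not recallable"
    return f"ingested {count} chunk(s)"


if __name__ == "__main__":
    store = FakeMemoryStore()
    doc = ("Alpha paragraph about vector search.\n\n"
           "Beta paragraph on chunking.\n\n"
           "Gamma closing notes.")
    url = "https://example.com/a"
    print(ingest_source(store, doc, url, "goal: retrieval", "chunking", max_chars=40))
    print(ingest_source(store, doc, url, "goal: retrieval", "chunking", max_chars=40))
```

Because the URL check runs before anything is stored, the second call in the usage block is skipped instead of creating a second copy. The substring-based `recall` here is far cruder than semantic retrieval; it only makes the order of the steps concrete.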

Guidelines

Prefer ingesting authoritative, primary sources over summaries or aggregators.
Do not ingest entire websites. Be selective: ingest only the specific pages that contain the needed information.
When ingesting long documents, let the chunking pipeline do its job. Each chunk retains a reference to the parent source.
Always record the source URL or origin. Memories without provenance are harder to evaluate and update later.
Respect rate limits and robots.txt when fetching URLs. If a fetch fails, note the failure and move on rather than retrying aggressively.
For sensitive or private content (internal docs, personal notes), ensure the user understands that ingested content persists in the local database.
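The robots.txt and rate-limit guideline can be illustrated with Python's standard `urllib.robotparser`. This is a minimal sketch of a hypothetical pre-fetch check, `PoliteFetchPolicy`, which is not part of the skill's tools. It assumes the caller downloads each host's robots.txt and passes its lines in, treats hosts with no loaded robots.txt as unrestricted, and enforces an assumed one-request-per-interval limit per host. Failures are recorded once and not retried.

```python
import time
from urllib import robotparser
from urllib.parse import urlsplit


class PoliteFetchPolicy:
    """Hypothetical pre-fetch check: robots.txt rules plus a per-host minimum interval."""

    def __init__(self, user_agent: str, min_interval_s: float = 1.0, clock=time.monotonic):
        self.user_agent = user_agent
        self.min_interval_s = min_interval_s
        self.clock = clock  # injectable so tests can control time
        self._robots: dict[str, robotparser.RobotFileParser] = {}
        self._last_fetch: dict[str, float] = {}  # host -> time of last approved fetch
        self.failures: list[tuple[str, str]] = []  # (url, reason), noted once

    def load_robots(self, host: str, robots_lines: list[str]) -> None:
        # Parse robots.txt lines the caller already downloaded for this host.
        parser = robotparser.RobotFileParser()
        parser.parse(robots_lines)
        self._robots[host] = parser

    def check(self, url: str) -> str:
        host = urlsplit(url).netloc
        parser = self._robots.get(host)
        # Assumption: a host without a loaded robots.txt is treated as unrestricted.
        if parser is not None and not parser.can_fetch(self.user_agent, url):
            return "disallowed by robots.txt"
        now = self.clock()
        last = self._last_fetch.get(host)
        if last is not None and now - last < self.min_interval_s:
            return "rate-limited: try later"
        self._last_fetch[host] = now
        return "ok"

    def note_failure(self, url: str, reason: str) -> None:
        # Record the failure and move on; the caller should not loop on retries.
        self.failures.append((url, reason))


if __name__ == "__main__":
    fake_now = [0.0]
    policy = PoliteFetchPolicy("ingest-bot", min_interval_s=2.0, clock=lambda: fake_now[0])
    policy.load_robots("example.com", ["User-agent: *", "Disallow: /private/"])
    print(policy.check("https://example.com/private/notes"))  # disallowed by robots.txt
    print(policy.check("https://example.com/docs/a"))         # ok
    print(policy.check("https://example.com/docs/b"))         # rate-limited: try later
```

Keeping the check separate from the fetch means the same policy can sit in front of whatever actually retrieves the page, and a "rate-limited" or failed result is a signal to move on rather than to retry in a loop.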