QuixiAI

knowledge-ingest

Ingest URLs, documents, and text into the memory system as structured knowledge


Install

npx skillscat add quixiai/hexis/knowledge-ingest

Install via the SkillsCat registry.

SKILL.md

Knowledge Base Ingestion

Transform external content -- web pages, documents, raw text -- into structured semantic memories that persist in the knowledge graph.

When to Use

  • When the user shares a URL and says "learn this" or "remember this article"
  • When a research workflow finds valuable sources that should be retained long-term
  • When the user pastes raw text (notes, transcripts, outlines) to be ingested
  • During heartbeats when a goal involves building knowledge on a specific topic
  • When importing reference material for a project or domain

Step-by-Step Methodology

  1. Assess the source: Before ingesting, determine what kind of content it is (article, documentation, transcript, raw notes). This guides how aggressively to summarize.
  2. Check for duplicates: Before ingesting, use recall with the URL or a key phrase from the content to see whether the source has already been stored. Avoid storing the same source twice.
  3. Fetch and parse: For URLs, use ingest_url, which handles fetching, HTML-to-text conversion, and chunking. For raw text, use ingest_text directly.
  4. Chunk intelligently: The ingestion pipeline chunks long content automatically. Each chunk becomes a separate semantic memory linked by source metadata. Trust the pipeline's chunking; do not split content manually unless the automatic chunking is clearly failing.
  5. Add context: When storing via remember, include metadata about the source: URL, author, publication date, and why it was ingested (which goal or topic it serves).
  6. Verify ingestion: After ingestion completes, run a quick recall on a key concept from the content to confirm it is retrievable.
  7. Connect to goals: If the ingested content relates to an active goal, note the connection so future heartbeats can leverage it.
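The duplicate check, provenance metadata, and verification steps can be sketched in a few lines. This is only an illustration under stated assumptions: `InMemoryStore`, `Memory`, `ingest_once`, the metadata keys, and the fixed-size chunking are all hypothetical stand-ins, not the real signatures of recall, ingest_text, or the pipeline's chunker.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    content: str
    metadata: dict = field(default_factory=dict)


class InMemoryStore:
    """Hypothetical stand-in for the memory system, not the real tool API."""

    def __init__(self):
        self.memories = []

    def recall(self, query):
        # Naive match: substring of the content, or exact source URL.
        q = query.lower()
        return [
            m for m in self.memories
            if q in m.content.lower() or q == m.metadata.get("source", "").lower()
        ]

    def ingest_text(self, text, metadata, chunk_size=200):
        # Toy fixed-size chunking (an assumption); every chunk keeps the
        # parent source metadata plus its position in the document.
        chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
        for index, chunk in enumerate(chunks):
            self.memories.append(Memory(chunk, {**metadata, "chunk_index": index}))
        return len(chunks)


def ingest_once(store, text, source, reason, key_phrase):
    """Skip known sources, ingest with provenance, then verify retrieval."""
    if store.recall(source):
        return "skipped: already ingested"
    store.ingest_text(text, {"source": source, "reason": reason})
    if not store.recall(key_phrase):
        return "warning: ingested but key phrase not retrievable"
    return "ingested"
```

Calling `ingest_once` twice with the same source stores the content the first time and skips it the second. The `reason` field is where the goal or topic connection would be recorded.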

Quality Guidelines

  • Prefer ingesting authoritative, primary sources over summaries or aggregators.
  • Do not ingest entire websites. Be selective -- ingest the specific pages that contain the needed information.
  • When ingesting long documents, let the chunking pipeline do its job. Each chunk retains a reference to the parent source.
  • Always record the source URL or origin. Memories without provenance are harder to evaluate and update later.
  • Respect rate limits and robots.txt when fetching URLs. If a fetch fails, note the failure and move on rather than retrying aggressively.
  • For sensitive or private content (internal docs, personal notes), ensure the user understands that ingested content persists in the local database.
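The robots.txt and no-aggressive-retry guidelines can be illustrated with Python's standard library. This is a sketch, not how ingest_url works internally (the source does not say). The helper names `allowed_by_robots` and `fetch_once`, and the user agent string in the usage note, are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def fetch_once(url, fetcher, failures):
    """Make a single attempt; on failure, record it and move on."""
    try:
        return fetcher(url)
    except OSError as exc:
        # urllib.error.URLError and socket timeouts are OSError subclasses.
        failures.append((url, str(exc)))
        return None
```

For example, with a robots.txt body of `User-agent: *` followed by `Disallow: /private/`, `allowed_by_robots(body, "knowledge-ingest-bot", "https://example.com/private/x")` returns `False`, while a URL under `/docs/` is allowed. The `fetcher` argument is any callable that takes a URL, which keeps the sketch independent of a particular HTTP client.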