This skill should be used when the user asks to "process this transcript", "convert lecture to notes", "run transcript pipeline", "generate class tutorial from Zoom captions", "validate transcript coverage", or "enrich class resources" (Notion/Canva/Drive links) for bootcamp notes.
Install
Install via the SkillsCat registry:
npx skillscat add prakharmnnit/skills-and-personas/transcript-pipeline
Transcript Pipeline Skill
Run a deterministic, auditable transcript-to-tutorial workflow with optional resource enrichment.
Purpose
Use this skill to convert raw class captions into high-quality study notes while preserving accountability through ledger + validation artifacts.
Use scripts for deterministic work. Use chat/stage prompts for language-heavy transformation.
Core Contract
- Keep stage order: ingest -> refine -> synthesize -> enhance -> validate -> publish.
- Run deterministic gates with scripts, never with LLM self-certification.
- Preserve traceability in .pipeline/* artifacts.
- Keep learner-facing notes readable and sanitized.
- Treat the validation status (PASS/FAIL) as the source of truth.
Scripts
Use these scripts from scripts/:
- ingest_zoom_captions.py - deterministic ingestion and segment ledger creation
- run_chat_pipeline.py - guided orchestration for stage handoffs and validation
- validate_coverage.py - hard-gate coverage validation
- publish_tutorial_notes.py - learner-facing file naming and sanitization
- merge_chunks.py - merge chunk outputs for large transcripts
- run_colab_notebook_pipeline.py - AI/ML Colab appendix and code explainer pipeline
- update_ai_notes_with_resources_and_colab.py - AI/ML notes enrichment utility
- resource_enrichment.py - authenticated enrichment for Notion/Canva/Drive resources
Stage Workflow
Stage 0: Ingest (Deterministic)
Run:
python scripts/ingest_zoom_captions.py "<transcript_or_session_path>"
Required outputs:
- .pipeline/segment_ledger.jsonl
- .pipeline/segment_manifest.jsonl
Stage 1: Refine (Chat Stage)
Load references/stage1-refine.md.
Produce:
- .pipeline/refined_transcript.md
- .pipeline/topic_inventory.json
- .pipeline/corrections_log.csv
- .pipeline/uncertainty_report.json
Stage 2: Synthesize (Chat Stage)
Load references/stage2-synthesize.md.
Produce:
- .pipeline/structured_notes.md
- .pipeline/coverage_matrix.json
Stage 3: Enhance (Chat Stage)
Load:
- references/stage3-enhance.md
- references/tutorial-tech-bar-raiser.md
Produce:
- .pipeline/enhanced_notes.md
- final_notes.md
- bootcamp_index.md
Stage 4: Validate (Deterministic)
Run:
python scripts/validate_coverage.py --pipeline-dir .pipeline
Validation guidance: references/stage4-validate.md.
Hard gates:
- Segment coverage accountability
- Uncertainty retention
- No orphan claims
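As an illustration of the first gate, a segment coverage check might look like the sketch below. The actual logic and artifact schema of validate_coverage.py are not described in this document, so the function names and the idea of comparing plain lists of segment IDs are assumptions made for illustration only.

```python
def uncovered_segments(ledger_ids, covered_ids):
    """Return ledger segment IDs that no note accounts for, in ledger order.

    Both arguments are plain iterables of IDs (hypothetical inputs; how the
    real pipeline stores segment IDs is not specified here).
    """
    covered = set(covered_ids)
    return [segment_id for segment_id in ledger_ids if segment_id not in covered]


def gate_status(ledger_ids, covered_ids):
    """Map the check onto the PASS/FAIL status used by the pipeline."""
    return "PASS" if not uncovered_segments(ledger_ids, covered_ids) else "FAIL"
```

The point of the sketch is that the gate is a set comparison a script can compute, which is why it does not need to rely on LLM self-certification.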
Stage 5: Publish
Run:
python scripts/publish_tutorial_notes.py --root "<sessions_root>" --session-dir "<session_dir>"
Result:
- Published tutorial filename in canonical format
- Learner-safe note without noisy source tags
- Updated course index links
One-Command Guided Mode
Use the guided runner for chat-window workflows:
python scripts/run_chat_pipeline.py run "<transcript_or_session_path>" --deep-pass
This enforces required handoffs and deep quality gates.
Optional Resource Enrichment Stage
Run when class notes include external links (Notion/Canva/Drive):
python scripts/resource_enrichment.py --all-sessions
Single session:
python scripts/resource_enrichment.py --session-dir "<session_dir>"
Auth options:
- Notion: NOTION_TOKEN_V2, NOTION_ACTIVE_USER
- Canva: RESOURCE_PLAYWRIGHT_STORAGE_STATE
Reference: references/resource-enrichment-authenticated-flow.md.
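Before running enrichment, it can help to confirm that the auth settings are present. The sketch below assumes the names listed under the auth options are environment variables, which this document does not state outright; `missing_auth` is a hypothetical helper and not part of resource_enrichment.py.

```python
import os

# Names taken from the auth options above; treating them as environment
# variables is an assumption.
AUTH_ENV = {
    "notion": ["NOTION_TOKEN_V2", "NOTION_ACTIVE_USER"],
    "canva": ["RESOURCE_PLAYWRIGHT_STORAGE_STATE"],
}


def missing_auth(env=None):
    """Return {provider: [unset variable names]} for providers lacking auth."""
    env = os.environ if env is None else env
    report = {}
    for provider, names in AUTH_ENV.items():
        unset = [name for name in names if not env.get(name)]
        if unset:
            report[provider] = unset
    return report
```

Calling `missing_auth()` with no arguments inspects the current process environment; passing a dict makes the check easy to exercise without touching real credentials.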
Optional AI/ML Colab Enrichment
Run for Colab-backed AI/ML classes:
python scripts/run_colab_notebook_pipeline.py
Reference: references/colab-notebook-explainer-pipeline.md.
Large Transcript Handling
If the input is too large to process comfortably in a single context window:
- Run Stage 1 by chunks.
- Merge chunk artifacts:
  python scripts/merge_chunks.py --chunk-dirs "<chunkA/.pipeline>" "<chunkB/.pipeline>" --output-dir "<session/.pipeline>"
- Continue Stage 2 onward on merged artifacts.
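To make the merge step concrete, here is a minimal sketch of combining one JSONL artifact from several chunk directories. It is not the implementation of merge_chunks.py, whose behavior is not documented here; `merge_jsonl` is a hypothetical helper, and simple in-order concatenation is an assumption.

```python
import json
from pathlib import Path


def merge_jsonl(chunk_dirs, filename, output_dir):
    """Concatenate `filename` from each chunk dir, in order, into output_dir.

    Returns the number of records written. Each line is parsed first so a
    malformed record raises instead of being copied silently.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with (out_dir / filename).open("w", encoding="utf-8") as out:
        for chunk_dir in chunk_dirs:
            with (Path(chunk_dir) / filename).open(encoding="utf-8") as src:
                for line in src:
                    line = line.strip()
                    if not line:
                        continue  # skip blank lines between records
                    json.loads(line)  # validate; raises on malformed JSON
                    out.write(line + "\n")
                    count += 1
    return count
```

A missing chunk file raises `FileNotFoundError`, which matches the fail-fast approach the skill takes toward required artifacts.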
Required Outputs Checklist
Learner-facing:
- final_notes.md
- <Domain> Class <NN> [DD-MM-YYYY] - <Topic>.md
- bootcamp_index.md
Pipeline/audit:
- .pipeline/segment_ledger.jsonl
- .pipeline/segment_manifest.jsonl
- .pipeline/refined_transcript.md
- .pipeline/topic_inventory.json
- .pipeline/corrections_log.csv
- .pipeline/uncertainty_report.json
- .pipeline/structured_notes.md
- .pipeline/coverage_matrix.json
- .pipeline/enhanced_notes.md
- .pipeline/validation_report.md
- .pipeline/exceptions.json (if fail)
Quality gates:
- .pipeline/deep_pass_report.md (when --deep-pass)
- .pipeline/deep_pass_exceptions.json (when --deep-pass)
Resource enrichment (optional):
- .resources/resource_enrichment_report.json
Execution Rules
- Fail fast on missing required artifacts.
- Report missing outputs explicitly by file path.
- Retry only from earliest failing stage.
- Keep resource extraction status explicit (success/fallback/blocked).
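The fail-fast and retry rules can be sketched as a small artifact check. The file paths come from the stage sections of this document; resolving them relative to the session directory, and assigning validation_report.md to the validate stage, are assumptions. `missing_artifacts` and `earliest_failing_stage` are hypothetical helpers, not part of the skill's scripts.

```python
from pathlib import Path

# Required artifacts per stage, in pipeline order.
REQUIRED_BY_STAGE = [
    ("ingest", [".pipeline/segment_ledger.jsonl",
                ".pipeline/segment_manifest.jsonl"]),
    ("refine", [".pipeline/refined_transcript.md",
                ".pipeline/topic_inventory.json",
                ".pipeline/corrections_log.csv",
                ".pipeline/uncertainty_report.json"]),
    ("synthesize", [".pipeline/structured_notes.md",
                    ".pipeline/coverage_matrix.json"]),
    ("enhance", [".pipeline/enhanced_notes.md",
                 "final_notes.md",
                 "bootcamp_index.md"]),
    ("validate", [".pipeline/validation_report.md"]),  # assumed stage mapping
]


def missing_artifacts(session_dir):
    """Return {stage: [missing relative paths]} so gaps are reported by path."""
    root = Path(session_dir)
    missing = {}
    for stage, rel_paths in REQUIRED_BY_STAGE:
        absent = [p for p in rel_paths if not (root / p).is_file()]
        if absent:
            missing[stage] = absent
    return missing


def earliest_failing_stage(session_dir):
    """Return the first stage with a missing artifact, or None if all exist."""
    missing = missing_artifacts(session_dir)
    for stage, _ in REQUIRED_BY_STAGE:
        if stage in missing:
            return stage
    return None
```

A runner could call `earliest_failing_stage` before each retry and resume from the stage it returns, printing the paths from `missing_artifacts` so the gap is explicit.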