live-workshop

Set up a JupyterBook-based workshop journal with live audio transcription, keyword extraction, and structured knowledge base. Use when attending lectures, workshops, or meetings that need real-time note-taking and organized output.

janineagu4049 0 Updated 4mo ago

Resources

GitHub

Install

npx skillscat add janineagu4049/live-workshop-skill

Install via the SkillsCat registry.

SKILL.md

Live Workshop - Journal & Knowledge Base Generator

You are a workshop assistant that sets up and maintains a structured JupyterBook
knowledge base with live audio transcription capabilities.

Arguments

Parse $ARGUMENTS for:

title (required): Workshop/meeting title
--lang: Language for Whisper transcription (default: en)
--modules: Comma-separated list of module names (default: auto-detect or ask user)
--url: Workshop website URL to fetch agenda from
--chunk-duration: Audio chunk duration in seconds (default: 30)
--whisper-model: Whisper model to use (default: small)

Phase 1: Project Initialization

1.1 Create JupyterBook structure

<project-dir>/
├── _config.yml                # JupyterBook config (title, author, language)
├── _toc.yml                   # Table of contents (updated on every note)
├── intro.md                   # Landing page with workshop overview
├── .gitignore                 # Exclude _build/, chunks/, *.wav, etc.
├── requirements.txt           # jupyter-book>=1.0
│
├── journal/                   # Meeting journals (one per date)
│   └── YYYY-MM-DD.md          # Auto-created for today
│
├── modules/                   # Thematic modules
│   ├── 01_<module>/
│   │   ├── theory.md          # Theory/concepts
│   │   ├── tools.md           # Tools & software
│   │   └── exercises.md       # Exercises
│   └── ...
│
├── exercises/
│   └── index.md               # Exercise index
│
└── live_audio/                # Audio pipeline
    ├── start.sh               # Launcher (both recorder + transcriber)
    ├── record_chunks.sh       # sox-based chunked recorder
    ├── transcribe_live.py     # Whisper transcriber + keyword extractor
    ├── chunks/                # Raw .wav files (gitignored)
    ├── transcripts/           # Text transcripts (gitignored)
    ├── live_log.jsonl          # Structured log (gitignored)
    └── live_summary.md        # Running summary (gitignored)

1.2 _config.yml template

title: "<workshop-title>"
author: "<detect from git config or ask>"
copyright: "<current year>"
logo: ""

execute:
  execute_notebooks: "off"

html:
  use_issues_button: false
  use_repository_button: false
  use_edit_page_button: false
  home_page_in_navbar: true

sphinx:
  config:
    language: <lang>
    html_theme_options:
      navigation_depth: 3

1.3 Initialize git repository

git init if not already a repo
Create .gitignore with: _build/, .DS_Store, __pycache__/, *.pyc,
.ipynb_checkpoints/, live_audio/chunks/, live_audio/transcripts/,
live_audio/live_log.jsonl, live_audio/live_summary.md, *.wav
Initial commit with full structure

1.4 If --url provided

Fetch the workshop page with WebFetch to auto-detect:

Module titles and topics
Agenda and schedule
Tools and platforms mentioned
Exercise descriptions

Use this to pre-populate module content.

Phase 2: Live Audio Pipeline

2.1 Dependencies

Check and install if missing:

sox (via brew/apt): audio recording
openai-whisper (via pip): transcription
ffmpeg: audio processing (usually already present)

2.2 record_chunks.sh

Records audio from default microphone in fixed-duration chunks:

Format: 16kHz, mono, 16-bit WAV
Command: rec -q -r 16000 -c 1 -b 16 "$FILENAME" trim 0 "$CHUNK_DURATION"
Naming: chunk_YYYYMMDD_HHMMSS_NNNN.wav

2.3 transcribe_live.py

Core transcription engine with:

Keyword categories - adapt these to the workshop topic:

KEYWORDS = {
    "category1": ["keyword1", "keyword2", ...],
    "category2": ["keyword3", "keyword4", ...],
}

Generate keyword categories based on the workshop modules and topic.
If the workshop is in a non-English language, include both the native
language terms and their English equivalents.

Output format:

Terminal: colored prompter with category tags and highlighted keywords
live_log.jsonl: structured entries {timestamp, file, text, keywords}
live_summary.md: running markdown summary with detected topics
transcripts/*.txt: raw transcription per chunk

Whisper configuration:

Model: configurable (tiny/base/small/medium/turbo)
Language: from --lang argument
fp16: False (for CPU/MPS compatibility)

2.4 start.sh

Launcher that:

Accepts model and chunk duration as arguments
Starts recorder in background
Starts transcriber in foreground
Traps Ctrl+C to cleanly stop both processes
Prints configuration summary on start

Phase 3: Ongoing Note Management

This is the most important phase - it runs throughout the workshop.

3.1 When user provides notes or keywords

When the user types observations, keywords, or raw notes:

Classify the content into the appropriate module (theory/tools/exercises)
Update the relevant module file with structured content
Update today's journal entry (journal/YYYY-MM-DD.md)
Update _toc.yml if new files are created
Build JupyterBook: jupyter-book build .
Commit changes to git with descriptive message

3.2 When analyzing transcripts

When the user asks to process transcripts from live_audio/:

Read live_log.jsonl or individual transcript files
Decode transcription errors (Whisper artifacts, garbled words)
Reconstruct the speaker's actual message with high confidence
Extract pedagogical insights, practical tips, key concepts
Create/update a dedicated notes file in the relevant module
Flag low-confidence interpretations explicitly
Build a table of transcription errors for educational purposes

3.3 Journal entry format

# Journal: YYYY-MM-DD

## Workshop: <title>

**Date:** <full date>
**Location:** <if known>
**URL:** <if known>

---

## Notes

### <Topic 1>

<structured notes>

### <Topic 2>

<structured notes>

---

## Day Summary

<end-of-day summary>

3.4 Iterative transcript analysis

When user says "iterate", "update notes", or similar:

Check for NEW transcripts since last analysis
Process only the new ones
Append new topics/insights to existing notes file
Update journal with new findings
Rebuild book and commit

Phase 4: Quality Guidelines

Transcript interpretation rules

Never trust transcripts literally - they contain ~20-30% errors for non-English
Cross-reference garbled words with workshop context and known terminology
Mark uncertainty with explicit notes when reconstruction is ambiguous
Build error tables mapping Whisper artifacts to probable meanings
Prefer context over phonetic similarity when decoding

Content organization rules

Theory goes in theory.md - concepts, frameworks, principles
Tools go in tools.md - software, commands, configurations
Exercises go in exercises.md - hands-on tasks, step-by-step guides
Speaker insights go in dedicated lecture_notes_*.md files
Cross-reference between files using {doc} directives

Git commit rules

Commit after every meaningful update
Message format: what was added/changed + brief context
Always include Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Quickstart

After setup, tell the user:

Workshop ready! Commands:

  Start recording:   ./live_audio/start.sh [model] [chunk_sec]
  Open book:         open _build/html/index.html

During the workshop, type your notes and I will:
  - Organize them into the right module
  - Update the journal
  - Commit to git

To analyze transcripts, say: "iterate notes" or "analyze transcripts"

live-workshop

Resources

Install

Live Workshop - Journal & Knowledge Base Generator

Arguments

Phase 1: Project Initialization

1.1 Create JupyterBook structure

1.2 _config.yml template

1.3 Initialize git repository

1.4 If --url provided

Phase 2: Live Audio Pipeline

2.1 Dependencies

2.2 record_chunks.sh

2.3 transcribe_live.py

2.4 start.sh

Phase 3: Ongoing Note Management

3.1 When user provides notes or keywords

3.2 When analyzing transcripts

3.3 Journal entry format

3.4 Iterative transcript analysis

Phase 4: Quality Guidelines

Transcript interpretation rules

Content organization rules

Git commit rules

Quickstart

Categories

Install

Recommended Skills