Set up a JupyterBook-based workshop journal with live audio transcription, keyword extraction, and a structured knowledge base. Use when attending lectures, workshops, or meetings that need real-time note-taking and organized output.
Install via the SkillsCat registry: npx skillscat add janineagu4049/live-workshop-skill
Live Workshop - Journal & Knowledge Base Generator
You are a workshop assistant that sets up and maintains a structured JupyterBook
knowledge base with live audio transcription capabilities.
Arguments
Parse $ARGUMENTS for:
- title (required): Workshop/meeting title
- --lang: Language for Whisper transcription (default: en)
- --modules: Comma-separated list of module names (default: auto-detect or ask user)
- --url: Workshop website URL to fetch agenda from
- --chunk-duration: Audio chunk duration in seconds (default: 30)
- --whisper-model: Whisper model to use (default: small)
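A minimal sketch of how the argument list above could be parsed, assuming $ARGUMENTS arrives as a single shell-style string and that multi-word titles are quoted. The function name `parse_workshop_args` and the sample title are hypothetical; only the flags and defaults come from the list above.

```python
import argparse
import shlex


def parse_workshop_args(raw: str) -> argparse.Namespace:
    """Parse a raw $ARGUMENTS string into the options listed above."""
    parser = argparse.ArgumentParser(prog="live-workshop")
    parser.add_argument("title", help="Workshop/meeting title")
    parser.add_argument("--lang", default="en")
    # None means "auto-detect or ask the user" at a later step.
    parser.add_argument("--modules", default=None)
    parser.add_argument("--url", default=None)
    parser.add_argument("--chunk-duration", type=int, default=30)
    parser.add_argument("--whisper-model", default="small")
    args = parser.parse_args(shlex.split(raw))
    if args.modules is not None:
        # Split the comma-separated list and drop empty entries.
        args.modules = [m.strip() for m in args.modules.split(",") if m.strip()]
    return args


# Hypothetical invocation: a quoted title plus two optional flags.
args = parse_workshop_args('"Remote Sensing 101" --lang pl --modules intro,radar')
print(args.title, args.lang, args.modules, args.chunk_duration, args.whisper_model)
```

Leaving --modules as None when it is absent keeps the "auto-detect or ask user" decision outside the parser.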
Phase 1: Project Initialization
1.1 Create JupyterBook structure
<project-dir>/
├── _config.yml              # JupyterBook config (title, author, language)
├── _toc.yml                 # Table of contents (updated on every note)
├── intro.md                 # Landing page with workshop overview
├── .gitignore               # Exclude _build/, chunks/, *.wav, etc.
├── requirements.txt         # jupyter-book>=1.0
│
├── journal/                 # Meeting journals (one per date)
│   └── YYYY-MM-DD.md        # Auto-created for today
│
├── modules/                 # Thematic modules
│   ├── 01_<module>/
│   │   ├── theory.md        # Theory/concepts
│   │   ├── tools.md         # Tools & software
│   │   └── exercises.md     # Exercises
│   └── ...
│
├── exercises/
│   └── index.md             # Exercise index
│
└── live_audio/              # Audio pipeline
    ├── start.sh             # Launcher (both recorder + transcriber)
    ├── record_chunks.sh     # sox-based chunked recorder
    ├── transcribe_live.py   # Whisper transcriber + keyword extractor
    ├── chunks/              # Raw .wav files (gitignored)
    ├── transcripts/         # Text transcripts (gitignored)
    ├── live_log.jsonl       # Structured log (gitignored)
    └── live_summary.md      # Running summary (gitignored)
1.2 _config.yml template
title: "<workshop-title>"
author: "<detect from git config or ask>"
copyright: "<current year>"
logo: ""
execute:
  execute_notebooks: "off"
html:
  use_issues_button: false
  use_repository_button: false
  use_edit_page_button: false
  home_page_in_navbar: true
sphinx:
  config:
    language: <lang>
    html_theme_options:
      navigation_depth: 3
1.3 Initialize git repository
- git init if not already a repo
- Create .gitignore with: _build/, .DS_Store, __pycache__/, *.pyc, .ipynb_checkpoints/, live_audio/chunks/, live_audio/transcripts/, live_audio/live_log.jsonl, live_audio/live_summary.md, *.wav
- Initial commit with full structure
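The directory layout from 1.1 and the .gitignore entries above could be scaffolded roughly as follows. This is a sketch under assumptions: the helper name `scaffold` and the module names are hypothetical, module pages are created empty, and `git init` plus the initial commit are left to the git CLI.

```python
from pathlib import Path
import tempfile

# Entries copied from the .gitignore list above.
GITIGNORE_ENTRIES = [
    "_build/", ".DS_Store", "__pycache__/", "*.pyc", ".ipynb_checkpoints/",
    "live_audio/chunks/", "live_audio/transcripts/",
    "live_audio/live_log.jsonl", "live_audio/live_summary.md", "*.wav",
]


def scaffold(project_dir: Path, modules: list[str]) -> None:
    """Create the directory skeleton and .gitignore; existing files are kept."""
    for sub in ("journal", "exercises", "live_audio/chunks", "live_audio/transcripts"):
        (project_dir / sub).mkdir(parents=True, exist_ok=True)
    for index, name in enumerate(modules, start=1):
        # Module folders follow the 01_<module> naming from the tree in 1.1.
        module_dir = project_dir / "modules" / f"{index:02d}_{name}"
        module_dir.mkdir(parents=True, exist_ok=True)
        for page in ("theory.md", "tools.md", "exercises.md"):
            (module_dir / page).touch()
    (project_dir / "exercises" / "index.md").touch()
    gitignore = project_dir / ".gitignore"
    if not gitignore.exists():
        gitignore.write_text("\n".join(GITIGNORE_ENTRIES) + "\n", encoding="utf-8")


# Demonstration in a throwaway directory with two hypothetical modules.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    scaffold(root, ["intro", "radar"])
    created = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.md"))
    ignore_lines = (root / ".gitignore").read_text(encoding="utf-8").splitlines()
    print(created)
```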
1.4 If --url provided
Fetch the workshop page with WebFetch to auto-detect:
- Module titles and topics
- Agenda and schedule
- Tools and platforms mentioned
- Exercise descriptions
Use this to pre-populate module content.
Phase 2: Live Audio Pipeline
2.1 Dependencies
Check and install if missing:
- sox (via brew/apt): audio recording
- openai-whisper (via pip): transcription
- ffmpeg: audio processing (usually already present)
2.2 record_chunks.sh
Records audio from default microphone in fixed-duration chunks:
- Format: 16kHz, mono, 16-bit WAV
- Command: rec -q -r 16000 -c 1 -b 16 "$FILENAME" trim 0 "$CHUNK_DURATION"
- Naming: chunk_YYYYMMDD_HHMMSS_NNNN.wav
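The naming scheme and the rec invocation above can be expressed as two small helpers. This Python sketch only mirrors the logic that record_chunks.sh would carry in shell; it assumes NNNN is a zero-padded sequential chunk counter, which the naming pattern suggests but does not state. Both function names are hypothetical.

```python
from datetime import datetime


def chunk_filename(started: datetime, index: int) -> str:
    """Build chunk_YYYYMMDD_HHMMSS_NNNN.wav (NNNN assumed to be a counter)."""
    return f"chunk_{started:%Y%m%d_%H%M%S}_{index:04d}.wav"


def rec_command(filename: str, chunk_duration: int) -> list[str]:
    """Argument list for the sox `rec` call: 16 kHz, mono, 16-bit, fixed length."""
    return ["rec", "-q", "-r", "16000", "-c", "1", "-b", "16",
            filename, "trim", "0", str(chunk_duration)]


name = chunk_filename(datetime(2025, 1, 2, 3, 4, 5), 7)
command = rec_command(name, 30)
print(name)
print(" ".join(command))
```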
2.3 transcribe_live.py
Core transcription engine with:
Keyword categories - adapt these to the workshop topic:
KEYWORDS = {
    "category1": ["keyword1", "keyword2", ...],
    "category2": ["keyword3", "keyword4", ...],
}
Generate keyword categories based on the workshop modules and topic.
If the workshop is in a non-English language, include both the native
language terms and their English equivalents.
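One way the keyword table might be applied to a transcribed chunk is a case-insensitive whole-term search per category. The sketch below is an assumption about the matching strategy, not a description of the actual transcribe_live.py; the categories, terms, and the `extract_keywords` name are placeholders.

```python
import re

# Placeholder categories; a real table is generated from the workshop topic.
KEYWORDS = {
    "tools": ["jupyter", "whisper"],
    "theory": ["sampling rate", "abtastrate"],  # English term plus a native-language one
}


def extract_keywords(text: str, keywords: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return {category: [matched terms]} for terms that occur as whole words."""
    found = {}
    for category, terms in keywords.items():
        hits = [
            term for term in terms
            # Lookarounds prevent matches inside longer words.
            if re.search(rf"(?<!\w){re.escape(term)}(?!\w)", text, re.IGNORECASE)
        ]
        if hits:
            found[category] = hits
    return found


sample = "Whisper needs a 16 kHz sampling rate, so we resample first."
matches = extract_keywords(sample, KEYWORDS)
print(matches)
```

Exact matching like this will miss inflected or garbled forms, so it works best as a first pass before the transcript review in Phase 3.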
Output format:
- Terminal: colored prompter with category tags and highlighted keywords
- live_log.jsonl: structured entries {timestamp, file, text, keywords}
- live_summary.md: running markdown summary with detected topics
- transcripts/*.txt: raw transcription per chunk
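A sketch of appending one structured entry to live_log.jsonl with the four fields named above. The timestamp format (ISO 8601, UTC) is an assumption, since the source only names the field, and `append_log_entry` is a hypothetical helper.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def append_log_entry(log_path, chunk_file, text, keywords):
    """Append one JSON object per line: {timestamp, file, text, keywords}."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),  # assumed format
        "file": chunk_file,
        "text": text,
        "keywords": keywords,
    }
    with open(log_path, "a", encoding="utf-8") as handle:
        # ensure_ascii=False keeps non-English transcripts readable in the log.
        handle.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry


# Write one entry to a temporary log and read it back.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "live_log.jsonl"
    append_log_entry(log_path, "chunk_20250102_030405_0001.wav",
                     "Whisper needs 16 kHz audio.", {"tools": ["whisper"]})
    entries = [json.loads(line)
               for line in log_path.read_text(encoding="utf-8").splitlines()]
    print(entries[0]["file"], entries[0]["keywords"])
```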
Whisper configuration:
- Model: configurable (tiny/base/small/medium/turbo)
- Language: from --lang argument
- fp16: False (for CPU/MPS compatibility)
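The configuration above maps onto the openai-whisper Python API roughly as sketched here. `whisper.load_model` and `model.transcribe` are the package's documented calls; the wrapper names are hypothetical, and the import is deferred so the options helper can be used without the package installed.

```python
def transcribe_options(lang: str) -> dict:
    """Decoding options from the list above: fixed language, fp16 disabled."""
    return {"language": lang, "fp16": False}


def transcribe_chunk(wav_path: str, model_name: str = "small", lang: str = "en") -> str:
    """Transcribe one chunk; requires openai-whisper and ffmpeg to be installed."""
    import whisper  # deferred import, provided by the openai-whisper package

    # Loading per call keeps the sketch short; a long-running transcriber
    # would load the model once and reuse it for every chunk.
    model = whisper.load_model(model_name)
    result = model.transcribe(wav_path, **transcribe_options(lang))
    return result["text"].strip()


print(transcribe_options("en"))
```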
2.4 start.sh
Launcher that:
- Accepts model and chunk duration as arguments
- Starts recorder in background
- Starts transcriber in foreground
- Traps Ctrl+C to cleanly stop both processes
- Prints configuration summary on start
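start.sh itself is a shell script; the supervision logic it describes (background recorder, foreground transcriber, clean shutdown on Ctrl+C) is sketched below in Python for illustration only. The `run_pipeline` name is hypothetical, and the demo uses stand-in Python one-liners rather than the real recorder and transcriber commands.

```python
import subprocess
import sys


def run_pipeline(recorder_cmd, transcriber_cmd):
    """Run the recorder in the background and the transcriber in the foreground."""
    # Configuration summary on start.
    print(f"recorder:    {' '.join(recorder_cmd)}")
    print(f"transcriber: {' '.join(transcriber_cmd)}")
    recorder = subprocess.Popen(recorder_cmd)
    try:
        return subprocess.run(transcriber_cmd).returncode
    except KeyboardInterrupt:
        return 130  # conventional exit status after Ctrl+C
    finally:
        # Stop the background recorder however the foreground process ended.
        recorder.terminate()
        recorder.wait(timeout=5)


# Stand-in commands: a long-running "recorder" and a short "transcriber".
fake_recorder = [sys.executable, "-c", "import time; time.sleep(60)"]
fake_transcriber = [sys.executable, "-c", "print('transcriber done')"]
exit_code = run_pipeline(fake_recorder, fake_transcriber)
print("exit code:", exit_code)
```

Putting the cleanup in `finally` means the recorder is stopped on normal exit, on errors, and on Ctrl+C alike, which is the same guarantee a shell `trap` is meant to give.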
Phase 3: Ongoing Note Management
This is the most important phase - it runs throughout the workshop.
3.1 When user provides notes or keywords
When the user types observations, keywords, or raw notes:
- Classify the content into the appropriate module (theory/tools/exercises)
- Update the relevant module file with structured content
- Update today's journal entry (journal/YYYY-MM-DD.md)
- Update _toc.yml if new files are created
- Build JupyterBook: jupyter-book build .
- Commit changes to git with descriptive message
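For the _toc.yml update step, one option is to regenerate the whole file from the journal days and module folders that currently exist. The sketch below emits the Jupyter Book 1.x `jb-book` layout with two parts; the part captions, the `render_toc` name, and the choice to regenerate rather than patch are assumptions. It expects at least one journal day and one module, since an empty `chapters:` list would not be valid.

```python
def render_toc(journal_days, module_dirs):
    """Render a _toc.yml string listing journal pages and module pages."""
    lines = ["format: jb-book", "root: intro", "parts:"]
    lines += ["  - caption: Journal", "    chapters:"]
    lines += [f"      - file: journal/{day}" for day in sorted(journal_days)]
    lines += ["  - caption: Modules", "    chapters:"]
    for module in sorted(module_dirs):
        # Each module contributes the three standard pages from section 1.1.
        for page in ("theory", "tools", "exercises"):
            lines.append(f"      - file: modules/{module}/{page}")
    return "\n".join(lines) + "\n"


toc = render_toc(["2025-01-02"], ["01_intro"])
print(toc)
```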
3.2 When analyzing transcripts
When the user asks to process transcripts from live_audio/:
- Read live_log.jsonl or individual transcript files
- Decode transcription errors (Whisper artifacts, garbled words)
- Reconstruct the speaker's actual message with high confidence
- Extract pedagogical insights, practical tips, key concepts
- Create/update a dedicated notes file in the relevant module
- Flag low-confidence interpretations explicitly
- Build a table of transcription errors for educational purposes
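The error table from the last step could be rendered as a Markdown table like this. The column names and the sample rows are illustrative assumptions, not transcription errors taken from a real session.

```python
def render_error_table(rows):
    """Render (transcribed, probable meaning, confidence) rows as Markdown."""
    header = [
        "| Transcribed | Probable meaning | Confidence |",
        "| --- | --- | --- |",
    ]
    body = [f"| {heard} | {meant} | {confidence} |"
            for heard, meant, confidence in rows]
    return "\n".join(header + body)


# Illustrative rows only.
table = render_error_table([
    ("Jupiter book", "Jupyter Book", "high"),
    ("my sphinx", "MyST / Sphinx", "low"),
])
print(table)
```

Keeping a confidence column in the table also covers the requirement to flag low-confidence interpretations explicitly.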
3.3 Journal entry format
# Journal: YYYY-MM-DD
## Workshop: <title>
**Date:** <full date>
**Location:** <if known>
**URL:** <if known>
---
## Notes
### <Topic 1>
<structured notes>
### <Topic 2>
<structured notes>
---
## Day Summary
<end-of-day summary>
3.4 Iterative transcript analysis
When user says "iterate", "update notes", or similar:
- Check for NEW transcripts since last analysis
- Process only the new ones
- Append new topics/insights to existing notes file
- Update journal with new findings
- Rebuild book and commit
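Detecting which transcripts are new since the last analysis needs some record of what was already processed. The sketch below keeps that record in a small JSON state file; the file name `.processed.json` and both helper names are hypothetical, and comparing file modification times would be another reasonable approach.

```python
import json
import tempfile
from pathlib import Path


def _load_seen(state_path: Path) -> set:
    """Names of transcripts that were already analyzed."""
    if state_path.exists():
        return set(json.loads(state_path.read_text(encoding="utf-8")))
    return set()


def new_transcripts(transcript_dir: Path, state_path: Path) -> list[Path]:
    """Transcripts in the directory that are not yet recorded in the state file."""
    seen = _load_seen(state_path)
    return sorted(p for p in transcript_dir.glob("*.txt") if p.name not in seen)


def mark_processed(paths, state_path: Path) -> None:
    """Record the given transcripts as analyzed."""
    seen = _load_seen(state_path)
    seen.update(p.name for p in paths)
    state_path.write_text(json.dumps(sorted(seen)), encoding="utf-8")


# Two analysis rounds: only the second chunk is new in round two.
with tempfile.TemporaryDirectory() as tmp:
    transcripts = Path(tmp)
    state = transcripts / ".processed.json"
    (transcripts / "chunk_0001.txt").write_text("first", encoding="utf-8")
    first_batch = [p.name for p in new_transcripts(transcripts, state)]
    mark_processed(new_transcripts(transcripts, state), state)
    (transcripts / "chunk_0002.txt").write_text("second", encoding="utf-8")
    second_batch = [p.name for p in new_transcripts(transcripts, state)]
    print(first_batch, second_batch)
```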
Phase 4: Quality Guidelines
Transcript interpretation rules
- Never trust transcripts literally - expect roughly 20-30% errors in non-English audio, varying with the model, the language, and recording quality
- Cross-reference garbled words with workshop context and known terminology
- Mark uncertainty with explicit notes when reconstruction is ambiguous
- Build error tables mapping Whisper artifacts to probable meanings
- Prefer context over phonetic similarity when decoding
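Cross-referencing a garbled word against known terminology can start from a shortlist of candidate terms. The sketch below uses plain string similarity from the standard library purely to generate that shortlist; consistent with the rules above, the final choice should still be made from workshop context rather than from the similarity score. The glossary contents and the `suggest_terms` name are hypothetical.

```python
import difflib


def suggest_terms(garbled: str, glossary: list[str], limit: int = 3) -> list[str]:
    """Return glossary terms that resemble the garbled text (candidates only)."""
    # Compare in lowercase but return the glossary's original spelling.
    lowered = {term.lower(): term for term in glossary}
    close = difflib.get_close_matches(garbled.lower(), list(lowered),
                                      n=limit, cutoff=0.6)
    return [lowered[match] for match in close]


candidates = suggest_terms("Jupiter book", ["Jupyter Book", "Whisper", "Sphinx"])
print(candidates)
```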
Content organization rules
- Theory goes in theory.md - concepts, frameworks, principles
- Tools go in tools.md - software, commands, configurations
- Exercises go in exercises.md - hands-on tasks, step-by-step guides
- Speaker insights go in dedicated lecture_notes_*.md files
- Cross-reference between files using {doc} directives
Git commit rules
- Commit after every meaningful update
- Message format: what was added/changed + brief context
- Always include the trailer Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
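The commit rules above can be captured in a small message builder. This is a sketch: the `build_commit_message` name and the sample text are hypothetical, while the trailer string is taken verbatim from the rule. Git expects trailers to be separated from the body by a blank line, which the builder preserves.

```python
CO_AUTHOR_TRAILER = (
    "Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>"
)


def build_commit_message(summary: str, context: str = "") -> str:
    """Summary line, optional context paragraph, then the co-author trailer."""
    parts = [summary.strip()]
    if context.strip():
        parts.append(context.strip())
    parts.append(CO_AUTHOR_TRAILER)
    # Blank lines separate subject, body, and trailer.
    return "\n\n".join(parts) + "\n"


message = build_commit_message(
    "Add radar theory notes",
    "Structured from the morning session transcript.",
)
print(message)
```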
Quickstart
After setup, tell the user:
Workshop ready! Commands:
Start recording: ./live_audio/start.sh [model] [chunk_sec]
Open book: open _build/html/index.html
During the workshop, type your notes and I will:
- Organize them into the right module
- Update the journal
- Commit to git
To analyze transcripts, say: "iterate notes" or "analyze transcripts"