"Transcribes audio files (mp3, wav, ogg, m4a, flac, webm) using the Gemini API via Portkey, saves transcripts as markdown, and supports follow-up analysis. Use when the user asks to transcribe audio, summarize a meeting recording, check a voice note, extract action items from a recording, asks what was discussed in an audio file, or mentions processing audio files in any way."
Install via the SkillsCat registry:
```shell
npx skillscat add markus1189/nixos-config/transcribe-audio
```
Audio Transcription
Transcribe audio files to markdown and support post-processing (Q&A, action items, summaries).
Workflow
1. Identify Audio Files
Find audio files matching the user's request:
- Single file: the user specifies the path directly
- Batch: list every supported audio file directly inside a directory (non-recursive):
```shell
find <dir> -maxdepth 1 -type f \( -name "*.mp3" -o -name "*.wav" -o -name "*.ogg" -o -name "*.m4a" -o -name "*.flac" -o -name "*.webm" \) | sort
```
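For batch runs, the find command can be wrapped in a small helper whose output is read line by line, so that file names containing spaces survive. This is a minimal POSIX-shell sketch; list_audio_files and process_dir are hypothetical names, not part of the skill's scripts.

```shell
# Hypothetical helper: print supported audio files directly inside a
# directory (no recursion), one path per line, sorted.
list_audio_files() {
  find "$1" -maxdepth 1 -type f \( \
    -name "*.mp3" -o -name "*.wav" -o -name "*.ogg" \
    -o -name "*.m4a" -o -name "*.flac" -o -name "*.webm" \) | sort
}

# Hypothetical driver: read the list line by line so paths containing
# spaces stay intact (paths containing newlines are not supported).
process_dir() {
  list_audio_files "$1" | while IFS= read -r file; do
    printf 'found: %s\n' "$file"   # replace with the per-file work
  done
}
```

Because the loop body runs in a pipeline, variables set inside it are not visible after the loop in most shells; collect results in a file if they need to outlive it.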
2. Check for Existing Transcripts
For each audio file, check if a sibling .md file exists (e.g. meeting.mp3 → meeting.md):
- Exists + user wants transcription: Ask whether to re-transcribe or use the existing transcript
- Exists + user wants analysis: Read the existing .md file directly; no need to transcribe
- Does not exist: Proceed with transcription
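The sibling-transcript check and the three cases above fit in a few lines of shell. A minimal sketch, assuming the user's intent is already known as either "transcribe" or "analyze"; transcript_path and next_action are hypothetical helper names.

```shell
# Hypothetical helper: sibling transcript path,
# e.g. recordings/meeting.mp3 -> recordings/meeting.md
transcript_path() {
  printf '%s.md\n' "${1%.*}"
}

# Hypothetical helper: decide what to do with one audio file.
# $1 = audio file, $2 = user intent ("transcribe" or "analyze").
# Prints one of: transcribe, read-existing, ask-user.
next_action() {
  local md
  md=$(transcript_path "$1")
  if [ ! -f "$md" ]; then
    echo transcribe          # no transcript yet
  elif [ "$2" = analyze ]; then
    echo read-existing       # analysis only: reuse the transcript
  else
    echo ask-user            # re-transcribe or reuse? let the user decide
  fi
}
```

The ${1%.*} expansion removes only the shortest trailing ".extension", so dots in directory names are left alone.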
3. Transcribe
Run the script for each file:
```shell
./scripts/transcribe.sh <audio-file> [custom-prompt] > <output.md>
```
- Output file: same base name as the audio file with a .md extension, in the same directory
- The default prompt handles speaker identification, timestamps, a summary, and action items
- Pass a custom prompt as the second argument when the user requests different output or a focused transcription (see below)
The script writes the transcript to stdout and progress messages to stderr, so redirect stdout to the .md file.
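A batch run over several files could look like the sketch below. It assumes it runs from the skill directory and that transcribe.sh exits non-zero on failure (not confirmed here); TRANSCRIBE_SCRIPT and transcribe_all are hypothetical names. Unlike the workflow above, which asks the user before re-transcribing, this sketch simply skips files that already have a transcript.

```shell
# Sketch: transcribe each audio file given as an argument, skipping files
# that already have a sibling .md transcript.
# TRANSCRIBE_SCRIPT is a hypothetical override; the default assumes the
# current directory is the skill directory.
transcribe_all() {
  local script audio md partial
  script="${TRANSCRIBE_SCRIPT:-./scripts/transcribe.sh}"
  for audio in "$@"; do
    md="${audio%.*}.md"
    if [ -f "$md" ]; then
      echo "skip (transcript exists): $md" >&2
      continue
    fi
    # The shell creates the redirect target before the script runs, so a
    # failed run with a plain "> $md" would leave an empty or partial .md.
    # Write to a temporary file and rename it only on success.
    partial="$md.partial"
    if "$script" "$audio" > "$partial"; then   # progress still goes to stderr
      mv "$partial" "$md"
    else
      echo "failed: $audio" >&2
      rm -f "$partial"
    fi
  done
}
```

The rename-on-success step matters because the existing-transcript check treats any .md file as a finished transcript, including an empty one left behind by a failed run.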
Focused Transcription
When the user asks about a specific topic (e.g. "tell me about the Miro discussion", "what was said about budgets?"), pass a focused prompt as the second argument instead of running a full transcription and then grepping or reading through it:
```shell
./scripts/transcribe.sh <audio-file> "Focus on the parts of this audio that discuss <TOPIC>. Provide:
1. A detailed transcript of just those sections (with speaker labels and timestamps)
2. A summary of what was said about <TOPIC>
3. Any decisions, action items, or open questions related to <TOPIC>
Skip unrelated parts of the audio." > <output-focus.md>
```
- Output file for focused transcripts: add a suffix so the full transcript is not overwritten, e.g. meeting.focus-miro.md
- When to use: The user asks about a specific topic AND no full transcript exists to search, OR the user explicitly asks to re-transcribe with a focus
- When NOT to use: A full transcript already exists; read it and answer the question directly
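The focused prompt and the suffixed output name can both be generated from the topic. A sketch; focus_prompt and focus_output_path are hypothetical helpers, and the slug rule (lowercase, runs of non-alphanumeric characters replaced by hyphens) is an assumption chosen to reproduce the meeting.focus-miro.md example.

```shell
# Hypothetical helper: the focused prompt from this section, with the topic filled in.
focus_prompt() {
  printf '%s\n' \
    "Focus on the parts of this audio that discuss $1. Provide:" \
    "1. A detailed transcript of just those sections (with speaker labels and timestamps)" \
    "2. A summary of what was said about $1" \
    "3. Any decisions, action items, or open questions related to $1" \
    "Skip unrelated parts of the audio."
}

# Hypothetical helper: suffixed output path for a focused transcript.
# Slug rule (an assumption): lowercase, runs of other characters -> "-".
focus_output_path() {
  local slug
  slug=$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]' \
    | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//')
  printf '%s.focus-%s.md\n' "${1%.*}" "$slug"
}

# Example (run from the skill directory):
#   ./scripts/transcribe.sh meeting.mp3 "$(focus_prompt Miro)" \
#     > "$(focus_output_path meeting.mp3 Miro)"     # writes meeting.focus-miro.md
```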
4. Post-Processing
After transcription (or when an existing transcript is available), support any follow-up:
- Read the .md file and answer questions about the content
- Extract action items or TODOs
- Provide additional summaries or analysis
- Compare across multiple transcripts
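Extracting action items from one or several transcripts can be scripted when the transcripts share a recognizable section. This sketch assumes each transcript has a Markdown heading whose text contains "Action Items"; the real heading depends on the prompt that produced the transcript, so check one first. extract_action_items is a hypothetical helper.

```shell
# Sketch: print the action-item section of one or more transcripts.
# ASSUMPTION: each transcript has a Markdown heading (e.g. "## Action Items")
# whose text contains "action items" in any case; the section ends at the
# next heading. Adjust the pattern to the headings your transcripts use.
extract_action_items() {
  local md
  for md in "$@"; do
    echo "== $md"
    # On every heading line, enter or leave the section depending on its text;
    # inside the section, print non-blank lines.
    awk '
      /^#+ / { in_section = (tolower($0) ~ /action items/); next }
      in_section && NF
    ' "$md"
  done
}
```

Passing several files (for example every .md in a directory) gives one labeled block per transcript, which is a convenient starting point for comparing them.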
Key Details
- Supported formats: .mp3, .wav, .ogg, .m4a, .flac, .webm
- API: Gemini via Portkey (API key read from the pass entry api/portkey-claude)
- Timeout: 600 s per file, since long recordings can take several minutes to process
- Max file size: 200 MB per file
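A pre-flight check against the supported formats and the size limit avoids starting an upload that is bound to fail. A sketch, assuming 200 MB means 200 * 1024 * 1024 bytes and that extensions are matched case-sensitively like the find command in step 1; check_audio_file and MAX_AUDIO_BYTES are hypothetical names.

```shell
# Sketch: pre-flight check against the limits listed above.
# ASSUMPTION: 200 MB is read as 200 * 1024 * 1024 bytes.
MAX_AUDIO_BYTES=$((200 * 1024 * 1024))

# Returns 0 if the file looks acceptable, 1 otherwise (reason on stderr).
check_audio_file() {
  local f ext size
  f="$1"
  if [ ! -f "$f" ]; then
    echo "not a file: $f" >&2
    return 1
  fi
  ext="${f##*.}"
  case "$ext" in   # case-sensitive, like the find command in step 1
    mp3|wav|ogg|m4a|flac|webm) ;;
    *) echo "unsupported format: $f" >&2; return 1 ;;
  esac
  size=$(( $(wc -c < "$f") ))   # $(( )) strips padding some wc versions add
  if [ "$size" -gt "$MAX_AUDIO_BYTES" ]; then
    echo "too large: $size bytes (limit $MAX_AUDIO_BYTES)" >&2
    return 1
  fi
  return 0
}

# Example:
#   check_audio_file meeting.mp3 && ./scripts/transcribe.sh meeting.mp3 > meeting.md
```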
Script Execution: Run the scripts from the skill directory. All scripts use Nix shebangs, so no manual dependency installation is required.