Install
npx skillscat add trevors/dot-claude/skills-wav-analysis
Install via the SkillsCat registry.
SKILL.md
WAV Audio Analysis Skill
Description
Analyze WAV audio files to debug audio generation pipelines. Provides statistical analysis, format validation, and quality metrics for diagnosing issues with generated speech.
Triggers: wav, audio, waveform, samples, amplitude, audio analysis, sound quality, audio debug
Analysis Capabilities
Basic Statistics
- Sample count and duration
- Min/max amplitude
- Standard deviation (expected ~3000-8000 for speech)
- Near-silent sample percentage
Quality Indicators
- Zero crossing rate (speech typically 50-200 per 1000 samples)
- Clipping detection (samples at ±32767, or at the int16 minimum of -32768)
- NaN/Inf detection (if processing raw floats)
- DC offset analysis
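The indicators above can be sketched with NumPy roughly as follows. This is a minimal illustration, not the skill's actual implementation: the function name `quality_indicators` and the dictionary keys are hypothetical, `samples` is assumed to be a 1-D int16 array, and `raw` is assumed to be an optional float array captured before int16 conversion. Unlike a fixed 1000-sample window, the zero crossing count here is taken over the whole array and scaled to a per-1000 rate.

```python
import numpy as np

def quality_indicators(samples, raw=None):
    """Hypothetical helper: compute the quality indicators listed above.

    samples: 1-D int16 array of PCM samples.
    raw: optional float array from before int16 conversion (NaN/Inf checks).
    """
    wide = np.asarray(samples).astype(np.int32)  # avoid int16 overflow
    n = len(wide)
    # A change of sign between neighbouring samples counts as one crossing.
    negative = wide < 0
    crossings = int(np.sum(negative[1:] != negative[:-1]))
    report = {
        "zero_crossings_per_1000": 1000 * crossings / max(n - 1, 1),
        # Treat +/-32767 and the int16 minimum -32768 as clipped.
        "clipped": int(np.sum((wide >= 32767) | (wide <= -32767))),
        # A mean far from zero indicates a DC offset.
        "dc_offset": float(wide.mean()) if n else 0.0,
    }
    if raw is not None:
        raw = np.asarray(raw, dtype=np.float64)
        report["nan_count"] = int(np.isnan(raw).sum())
        report["inf_count"] = int(np.isinf(raw).sum())
    return report
```

Widening to int32 before any arithmetic is a deliberate choice: taking the absolute value or negating -32768 in int16 wraps around and can hide exactly the samples being looked for.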
Format Validation
- Sample rate verification (24kHz for Qwen3-Omni TTS)
- Bit depth check
- Channel count
- RIFF header validation
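One way to run these format checks is with Python's standard `wave` module, as in the sketch below. The helper name `validate_wav` and its parameters are hypothetical. The 24 kHz default comes from the list above; the 16-bit and mono defaults are assumptions, so adjust them to whatever the pipeline is supposed to emit.

```python
import wave

def validate_wav(path, expected_rate=24000, expected_width=2, expected_channels=1):
    """Hypothetical helper: return a list of format problems (empty if none)."""
    # RIFF header: bytes 0-3 are "RIFF", bytes 8-11 are "WAVE".
    with open(path, "rb") as f:
        head = f.read(12)
    if len(head) < 12 or head[:4] != b"RIFF" or head[8:12] != b"WAVE":
        return ["not a RIFF/WAVE file"]
    try:
        with wave.open(path, "rb") as w:
            rate = w.getframerate()
            width = w.getsampwidth()      # bytes per sample
            channels = w.getnchannels()
    except (wave.Error, EOFError) as exc:
        return [f"unreadable WAV: {exc}"]
    problems = []
    if rate != expected_rate:
        problems.append(f"sample rate {rate} Hz, expected {expected_rate} Hz")
    if width != expected_width:
        problems.append(f"{8 * width}-bit samples, expected {8 * expected_width}-bit")
    if channels != expected_channels:
        problems.append(f"{channels} channels, expected {expected_channels}")
    return problems
```

An empty list means the file matched every expectation. Note that `wave` only reads uncompressed PCM, so files in other encodings are reported as unreadable rather than parsed.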
Usage
To analyze a WAV file, provide the path and I'll run comprehensive diagnostics:
import numpy as np

# Assumes a canonical 44-byte header and 16-bit mono PCM at 24 kHz.
with open("audio.wav", "rb") as f:
    header = f.read(44)
    data = f.read()

samples = np.frombuffer(data, dtype=np.int16)
print(f"Samples: {len(samples)}")
print(f"Duration: {len(samples)/24000:.2f} sec")
print(f"Min/Max: {samples.min()} / {samples.max()}")
print(f"Std dev: {np.std(samples):.1f}")

# Quality check (widen to int32 so abs(-32768) does not overflow)
near_silent = np.sum(np.abs(samples.astype(np.int32)) < 100)
print(f"Near-silent: {100*near_silent/len(samples):.1f}%")

# Zero crossings (voice activity indicator)
if len(samples) > 1000:
    zc = np.sum(np.diff(np.sign(samples[:1000])) != 0)
    print(f"Zero crossings (first 1000): {zc}")
Typical Values for Good Speech Audio
| Metric | Expected Range | Meaning |
|---|---|---|
| Std dev | 3000-8000 | Audio energy level |
| Near-silent | <5% | Minimal silent padding |
| Zero crossings | 50-200/1000 | Voice frequency activity |
| Min/Max | ±20000-32000 | Healthy amplitude range |
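The table can be turned into an automatic check along these lines. This is only a sketch: the metric names, the `EXPECTED` dictionary, and the `out_of_range` function are hypothetical, and the bounds are treated as inclusive for simplicity.

```python
# Expected ranges copied from the table above; the key names are hypothetical.
EXPECTED = {
    "std_dev": (3000, 8000),
    "near_silent_pct": (0, 5),
    "zero_crossings_per_1000": (50, 200),
    "peak_abs": (20000, 32000),  # largest absolute sample value
}

def out_of_range(metrics):
    """Return {name: value} for every supplied metric outside its range."""
    flagged = {}
    for name, (low, high) in EXPECTED.items():
        value = metrics.get(name)
        if value is not None and not (low <= value <= high):
            flagged[name] = value
    return flagged
```

For example, a file that is almost entirely silence but otherwise plausible would be flagged only on `near_silent_pct`, which points toward the first entry under Common Issues.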
Common Issues
99% Near-Silent
- Cause: NaN values converted to zeros
- Fix: Check for numerical overflow in pipeline
Low Std Dev (<1000)
- Cause: Values too quiet before output normalization
- Fix: Check gain stages, ensure proper scaling
Constant Value Runs
- Cause: Chunked processing with context overlap issues
- Fix: Verify chunk stitching logic
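To find constant value runs, one option is to locate the longest stretch of identical samples and compare its position with the pipeline's chunk boundaries. The helper below is a hypothetical sketch; what run length counts as suspicious is left to the caller, since the source does not give a threshold.

```python
import numpy as np

def longest_constant_run(samples):
    """Hypothetical helper: (start_index, length) of the longest identical run."""
    samples = np.asarray(samples)
    if samples.size == 0:
        return 0, 0
    # Every index where the value differs from its predecessor starts a new run.
    change = np.flatnonzero(samples[1:] != samples[:-1]) + 1
    starts = np.concatenate(([0], change))
    # Run lengths are the gaps between consecutive run starts.
    lengths = np.diff(np.concatenate((starts, [samples.size])))
    i = int(np.argmax(lengths))
    return int(starts[i]), int(lengths[i])
```

If the reported start index lines up with a multiple of the chunk size, the stitching or overlap handling is a likely suspect.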
Clipping (values at ±32767)
- Cause: Overflow or missing tanh/clamp
- Fix: Add output clamping before int16 conversion
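A clamped conversion might look like the sketch below. It assumes the pipeline produces float audio nominally in [-1.0, 1.0], which the source does not state, and the function name `float_to_int16` is hypothetical. It also counts NaN/Inf values instead of silently zeroing them, which relates to the near-silent issue described earlier.

```python
import numpy as np

def float_to_int16(audio):
    """Hypothetical helper: convert float audio in [-1.0, 1.0] to int16.

    Returns (samples, non_finite_count).
    """
    audio = np.asarray(audio, dtype=np.float64)
    finite = np.isfinite(audio)
    non_finite = int(np.sum(~finite))
    # Replace NaN/Inf with 0 so the cast is well defined; the count is
    # returned so the caller can treat it as an error.
    cleaned = np.where(finite, audio, 0.0)
    # Clamp before scaling so out-of-range values saturate instead of wrapping.
    clamped = np.clip(cleaned, -1.0, 1.0)
    return np.round(clamped * 32767.0).astype(np.int16), non_finite
```

A hard clip like this prevents integer wraparound but still distorts loud passages; a soft limiter such as tanh, mentioned above, is the gentler alternative when the overshoot is frequent.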