Install
Install via the SkillsCat registry:
npx skillscat add lxgicstudios/ai-multimodal
SKILL.md
Multimodal AI
Vision and audio AI integration: image analysis, OCR, transcription, and text-to-speech.
Quick Start
npx ai-multimodal vision ./image.png "Describe this"
What It Does
- Analyze images with GPT-4 Vision
- Extract text from images (OCR)
- Transcribe audio with Whisper
- Generate speech from text
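Each command processes a single file. To run one subcommand, such as ocr or transcribe, over a whole folder, you can wrap the CLI in a short shell loop. This is a minimal sketch and is not part of ai-multimodal itself. The function names ai_multimodal and batch_run are hypothetical. The sketch also assumes the CLI prints its result to stdout, which the README does not confirm.

```shell
# Hypothetical wrapper around the documented CLI, kept separate so it can be stubbed.
ai_multimodal() {
  npx ai-multimodal "$@"
}

# Hypothetical helper: batch_run SUBCOMMAND EXTENSION [DIR]
# Runs SUBCOMMAND (e.g. ocr, transcribe) on every DIR/*.EXTENSION file and saves
# whatever the CLI prints to stdout next to the input as NAME.txt.
# Assumption: results go to stdout; the README does not say where output goes.
batch_run() {
  local sub="$1" ext="$2" dir="${3:-.}"
  local f
  for f in "$dir"/*."$ext"; do
    [ -e "$f" ] || continue   # the glob matched nothing, so skip the literal pattern
    ai_multimodal "$sub" "$f" > "${f%.*}.txt"
  done
}
```

Usage: after sourcing the snippet, batch_run transcribe mp3 ./recordings would write one .txt file per .mp3, provided the CLI prints its transcript to stdout. Putting the npx call in its own function keeps the loop easy to test against a stub.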
Usage
# Vision
npx ai-multimodal vision ./photo.jpg "What's in this?"
# OCR
npx ai-multimodal ocr ./screenshot.png
# Transcribe
npx ai-multimodal transcribe ./audio.mp3
# Text to speech
npx ai-multimodal tts "Hello" ./output.mp3
Part of the LXGIC Dev Toolkit
One of 110+ free developer tools from LXGIC Studios.
- GitHub: https://github.com/lxgicstudios
- Twitter: https://x.com/lxgicstudios
- Website: https://lxgicstudios.com
License
MIT. Free forever.