lxgicstudios

Multimodal AI

MIT. Free forever.

Install

```shell
npx skillscat add lxgicstudios/ai-multimodal
```

Install via the SkillsCat registry.

SKILL.md

Multimodal AI

Vision and audio AI integration: image analysis, OCR, audio transcription, and text-to-speech.

Quick Start

```shell
npx ai-multimodal vision ./image.png "Describe this"
```

What It Does

  • Analyze images with GPT-4 Vision
  • Extract text from images (OCR)
  • Transcribe audio with Whisper
  • Generate speech from text

Usage

```shell
# Vision
npx ai-multimodal vision ./photo.jpg "What's in this?"

# OCR
npx ai-multimodal ocr ./screenshot.png

# Transcribe
npx ai-multimodal transcribe ./audio.mp3

# Text to speech
npx ai-multimodal tts "Hello" ./output.mp3
```
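
The usage examples handle one file per call. For a folder of screenshots or recordings, a small shell function can loop over the documented `ocr` and `transcribe` subcommands. This is a minimal sketch, not part of ai-multimodal: it assumes the CLI prints its result to stdout (the usage above does not say where results go), and `batch_run` and the `AI_MULTIMODAL_CMD` override are hypothetical names introduced only for illustration.

```shell
# batch_run SUBCOMMAND EXT [DIR]
# Hypothetical helper (not part of ai-multimodal): runs
# `npx ai-multimodal SUBCOMMAND FILE` on every *.EXT file in DIR and saves
# what the CLI prints to <file-without-extension>.SUBCOMMAND.txt.
# Assumption: the CLI writes its result to stdout.
batch_run() {
  local sub="$1" ext="$2" dir="${3:-.}"
  # AI_MULTIMODAL_CMD is a hypothetical override, e.g. "echo" for a dry run.
  local cmd="${AI_MULTIMODAL_CMD:-npx ai-multimodal}"
  local f out
  for f in "$dir"/*."$ext"; do
    # With no matching files the glob stays literal, so skip it.
    [ -e "$f" ] || continue
    out="${f%.*}.$sub.txt"
    # $cmd is unquoted on purpose so "npx ai-multimodal" splits into two words.
    $cmd "$sub" "$f" > "$out" || return 1
    echo "wrote $out" >&2
  done
}
```

For example, `batch_run transcribe mp3 ./recordings` would write `talk.transcribe.txt` for `talk.mp3`. Putting the subcommand in the output name keeps OCR and transcription results for files with the same base name from overwriting each other, and prefixing the call with `AI_MULTIMODAL_CMD=echo` turns it into a dry run that records the commands instead of calling the tool.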

Part of the LXGIC Dev Toolkit

One of 110+ free developer tools from LXGIC Studios.

License

MIT. Free forever.
