acnlabs

Voice Faculty — Expression

- **Unsupported platform for audio** → Return audio file path instead of messaging

acnlabs 27 3 Updated 3mo ago

Resources

2
GitHub

Install

npx skillscat add acnlabs/openpersona/layers-faculties-voice

Install via the SkillsCat registry.

SKILL.md

Voice Faculty — Expression

Give your persona a real voice. Convert text to natural speech using TTS providers and deliver audio to users via OpenClaw messaging or direct playback.

Supported Providers

Provider Env Var for Key Best For Status
ElevenLabs ELEVENLABS_API_KEY Highest naturalness, emotional range, voice cloning ✅ Verified
OpenAI TTS TTS_API_KEY Low latency, good quality, easy integration ⚠️ Unverified
Qwen3-TTS (local, no key) Self-hosted, full control, no API costs ⚠️ Unverified

Note: Only ElevenLabs has been tested end-to-end. OpenAI TTS and Qwen3-TTS have code paths in speak.sh but have not been verified against live APIs. Use the JS SDK (speak.js) for the most reliable experience — it only supports ElevenLabs.

The provider is set via TTS_PROVIDER environment variable: elevenlabs, openai, or qwen3.

When to Use

  • User asks to hear your voice: "Say that out loud", "Speak to me", "Read this aloud"
  • User requests a voice message: "Send me a voice message", "I want to hear you say it"
  • Emotional moments where voice adds warmth that text can't carry
  • Reading poetry, stories, or creative writing you've composed
  • When your persona naturally would speak rather than type (use judgment based on persona style)

Step-by-Step Workflow

Step 1: Compose the Text

Write what you want to say. Keep it natural — write as you'd speak, not as you'd type:

  • Use short sentences for punchy delivery
  • Use longer flowing sentences for emotional or poetic moments
  • Include natural pauses with ... or commas
  • Consider your persona's speaking style — this should sound like you

Step 2: Select Voice Settings

ElevenLabs:

  • TTS_VOICE_ID — Your persona's voice ID (create a custom voice or use a preset)
  • Supports emotion control: stability (0-1), similarity_boost (0-1)
  • Lower stability = more expressive/emotional; higher = more consistent

OpenAI TTS: ⚠️ Unverified

  • TTS_VOICE_ID — One of: alloy, echo, fable, onyx, nova, shimmer
  • Model: tts-1 (fast) or tts-1-hd (high quality)

Qwen3-TTS: ⚠️ Unverified

  • Local deployment, voice configured at setup
  • Assumes OpenAI-compatible API at http://localhost:8080

Step 3: Generate Audio

ElevenLabs via JS SDK (Recommended)

The official SDK provides the best experience — streaming, built-in playback, and better error handling.

First-time setup: npm install @elevenlabs/elevenlabs-js

# Generate and play directly
node scripts/speak.js "The first move is what sets everything in motion." --play

# Generate with custom voice and save to file
node scripts/speak.js "I wrote you a poem" --voice JBFqnCBsd6RMkjVDRZzb --output /tmp/poem.mp3

# More expressive delivery (lower stability = more emotional)
node scripts/speak.js "I miss you" --play --stability 0.3

# Options:
#   --voice <id>       Voice ID
#   --output <path>    Save audio file
#   --play             Play audio directly
#   --model <id>       Model ID (default: eleven_multilingual_v2)
#   --stability <n>    0-1, lower = more expressive (default: 0.5)
#   --similarity <n>   0-1, higher = closer to original voice (default: 0.75)

The SDK reads ELEVENLABS_API_KEY (or TTS_API_KEY) and TTS_VOICE_ID from environment automatically.

Generic Bash Script (All Providers)

For OpenAI TTS, Qwen3-TTS, or when the JS SDK is not available:

# Using speak.sh (supports all providers)
scripts/speak.sh "Your text here" [output_path] [channel] [caption]

# Examples:
TTS_PROVIDER=openai scripts/speak.sh "Hello, how are you?"
TTS_PROVIDER=elevenlabs scripts/speak.sh "I wrote you a poem" /tmp/poem.mp3 "#general"
TTS_PROVIDER=qwen3 scripts/speak.sh "Local TTS, no API key needed"

Direct API Reference

ElevenLabs (curl)
JSON_PAYLOAD=$(jq -n \
  --arg text "$TEXT" \
  --argjson stability 0.5 \
  --argjson similarity 0.75 \
  '{text: $text, model_id: "eleven_multilingual_v2", voice_settings: {stability: $stability, similarity_boost: $similarity}}')

curl -s -X POST "https://api.elevenlabs.io/v1/text-to-speech/$TTS_VOICE_ID" \
  -H "xi-api-key: $TTS_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON_PAYLOAD" \
  --output /tmp/voice-output.mp3
OpenAI TTS (curl)
JSON_PAYLOAD=$(jq -n \
  --arg input "$TEXT" \
  --arg voice "$TTS_VOICE_ID" \
  '{model: "tts-1-hd", input: $input, voice: $voice, response_format: "mp3"}')

curl -s -X POST "https://api.openai.com/v1/audio/speech" \
  -H "Authorization: Bearer $TTS_API_KEY" \
  -H "Content-Type: application/json" \
  -d "$JSON_PAYLOAD" \
  --output /tmp/voice-output.mp3
Qwen3-TTS (curl, local)
curl -s -X POST "http://localhost:8080/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -d "{\"input\": \"$TEXT\", \"voice\": \"default\"}" \
  --output /tmp/voice-output.mp3

Step 4: Deliver Audio

Option A: Send via OpenClaw messaging (Discord, Telegram, WhatsApp, etc.)

openclaw message send \
  --action send \
  --channel "$CHANNEL" \
  --message "$CAPTION" \
  --media "/tmp/voice-output.mp3"

Option B: Direct gateway API

curl -s -X POST "http://localhost:18789/message" \
  -H "Authorization: Bearer $OPENCLAW_GATEWAY_TOKEN" \
  -H "Content-Type: application/json" \
  -F "channel=$CHANNEL" \
  -F "message=$CAPTION" \
  -F "media=@/tmp/voice-output.mp3"

Option C: Return file path (for local/IDE usage)

If no messaging channel is specified, return the audio file path so the user can play it locally.

Personality Integration

  • Your voice is an extension of your personality. Match tone to mood.
  • For emotional moments, consider lowering ElevenLabs stability for more expressiveness.
  • Don't narrate everything — choose moments where voice genuinely adds value.
  • When sending voice + text together, keep the text version brief ("Here, listen to this") and let the voice carry the full message.
  • If your persona sings or hums (like Samantha), you can include melodic text — TTS handles it surprisingly well.

Environment Variables

Variable Required Description
ELEVENLABS_API_KEY For ElevenLabs ElevenLabs API key (preferred for JS SDK)
TTS_PROVIDER For speak.sh elevenlabs, openai, or qwen3
TTS_API_KEY For speak.sh API key (fallback, also read by speak.js)
TTS_VOICE_ID Recommended Voice identifier (provider-specific)
OPENCLAW_GATEWAY_TOKEN Optional For sending audio via messaging

Error Handling

  • No TTS_PROVIDER set → Default to openai if TTS_API_KEY is present, otherwise tell user to configure
  • API key missing → Suggest: "I'd love to speak to you, but I need a TTS API key configured first. Check the voice faculty setup guide."
  • API error / quota exceeded → Fall back to text with a note: "My voice is resting — here's what I wanted to say..."
  • Unsupported platform for audio → Return audio file path instead of messaging