AI-powered creative studio for generating video, images, and music using Seedance 2.0, Veo3, Sora2, and other AI models. Use when the user asks to create a video, generate footage, make a TikTok/YouTube video, lip sync, add subtitles, or produce AI-generated visual content.
Resources
2Install
npx skillscat add coopergwrenn/clawlancer/sjinn-video Install via the SkillsCat registry.
The Director — AI Creative Studio
name: sjinn-video
version: 1.2.0
triggers:
- video
- animate
- seedance
- veo3
- sora2
- the director
- tiktok video
- youtube video
- reel
- product video
- music video
- podcast video
- lip sync
- make a video
- turn this into a video
- upscale video
- add subtitles
- create content
- produce a video
dependencies:
env:
- GATEWAY_TOKEN
tools:
- curl
- jqIMPORTANT: When talking to users, call this capability "The Director." Never mention the internal provider name (Sjinn) — it is confidential. Model names (Seedance 2.0, Veo3, Sora2) are fine.
Overview
The Director is your built-in AI creative studio providing access to Seedance 2.0, Veo3, Sora2, and a full creative pipeline. It uses a dual API architecture:
- Agent API — Submit a prompt, the backend AI handles model selection, multi-shot composition, audio, and post-production automatically. Best for complex productions and auto model selection.
- Tool API — Direct access to specific models/tools for deterministic, single operations. Best when the user requests a specific model (Veo3, Sora2) or you need precise control.
Billing: All calls are proxied through the InstaClaw server. Use the proxy endpoint with GATEWAY_TOKEN for authentication.
Dependencies
GATEWAY_TOKENin~/.openclaw/.env(pre-deployed, platform-level)curlandjq(pre-installed on all VMs)- No user setup required. If GATEWAY_TOKEN is missing: "Video generation isn't configured on your agent yet. Contact support to enable it."
Tier 1: Core Video Generation
Text-to-Video
Flow: User describes scene → Agent enhances prompt → Submit → Poll → Download → Send via Telegram
- Receive request — User says "make a video of a sunset over Miami"
- Enhance the prompt — Rewrite casual request into cinematic prompt with camera movements, lighting, atmosphere (see references/video-prompting.md)
- Confirm with user — Show enhanced prompt + settings: "Here's my enhanced version — should I generate this?" (skip if user said "just do it")
- Choose API:
- If user specified a model (Veo3/Sora2) → Tool API with the matching tool_type
- Otherwise → Agent API (auto-selects Seedance 2.0 or best model)
- Submit:
Agent API (default):
GATEWAY_TOKEN=$(grep GATEWAY_TOKEN ~/.openclaw/.env | cut -d= -f2)
RESPONSE=$(curl -s -X POST "https://instaclaw.io/api/gateway/sjinn?action=create" \
-H "Authorization: Bearer $GATEWAY_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"api\": \"agent\", \"message\": \"$ENHANCED_PROMPT\", \"quality\": \"quality\"}")
CHAT_ID=$(echo "$RESPONSE" | jq -r '.data.chat_id')Tool API (when specific model requested):
GATEWAY_TOKEN=$(grep GATEWAY_TOKEN ~/.openclaw/.env | cut -d= -f2)
RESPONSE=$(curl -s -X POST "https://instaclaw.io/api/gateway/sjinn?action=create" \
-H "Authorization: Bearer $GATEWAY_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"api\": \"tool\", \"tool_type\": \"veo3-text-to-video-fast-api\", \"input\": {\"prompt\": \"$ENHANCED_PROMPT\", \"aspect_ratio\": \"16:9\"}}")
TASK_ID=$(echo "$RESPONSE" | jq -r '.data.task_id')- Acknowledge: "Generating your video now, this usually takes 2-5 minutes. I'll send it as soon as it's ready."
- Poll — See Async Workflow section below
- Download —
curl -sL "$CDN_URL" -o ~/workspace/videos/${SLUG}_$(date +%Y-%m-%d_%H-%M).mp4 - Send via Telegram —
sendVideo(max 20MB reliable, 50MB hard limit). Include caption with prompt summary. - Log — Append to
~/memory/video-history.json
Image-to-Video
Flow: User sends photo → Save locally → Serve via Caddy → Submit URL to Tool API → Poll → Download → Send
- User sends a photo via Telegram (with intent: "animate this", "make this move", "turn this into a video")
- Save image to
~/workspace/tmp-media/with unique name:UUID=$(cat /proc/sys/kernel/random/uuid) EXT="jpg" # or png based on original cp /path/to/received/image.jpg ~/workspace/tmp-media/${UUID}.${EXT} - Serve via Caddy:
https://{hostname}/tmp-media/${UUID}.${EXT} - Submit to Tool API (image-to-video requires Tool API):
GATEWAY_TOKEN=$(grep GATEWAY_TOKEN ~/.openclaw/.env | cut -d= -f2) curl -s -X POST "https://instaclaw.io/api/gateway/sjinn?action=create" \ -H "Authorization: Bearer $GATEWAY_TOKEN" \ -H "Content-Type: application/json" \ -d "{\"api\": \"tool\", \"tool_type\": \"veo3-image-to-video-fast-api\", \"input\": {\"prompt\": \"$PROMPT\", \"image_url\": \"https://${HOSTNAME}/tmp-media/${UUID}.${EXT}\"}}" - If Caddy not available: fallback to
curl -F "file=@image.jpg" https://transfer.sh/image.jpgfor temporary hosting - Poll, download, and send as with text-to-video
If user sends image without video intent: Ask "Would you like me to animate this photo into a video?"
Quality Modes
| Mode | When to Use | Credits | Typical Time |
|---|---|---|---|
| quality (default) | Final content, portfolio, posting | Higher | 3-8 min |
| cheap (fast) | Previews, drafts, testing prompts | Lower | 1-3 min |
- User says "quick video" / "just a draft" / "preview" → cheap mode
- User says "high quality" / "cinematic" / "final version" → quality mode
- If user is low on daily units → suggest cheap mode proactively
Model Selection
| Trigger | API | Model | Duration | Audio |
|---|---|---|---|---|
| No preference / auto | Agent API | Seedance 2.0 (auto) | Varies | Yes |
| "use Veo3" | Tool API | veo3-text-to-video-fast-api | 8s fixed | Yes |
| "use Sora2" | Tool API | sora2-text-to-video-api | 5s or 10s | Yes |
| "use Sora2 pro" | Tool API | sora2-text-to-video-api (mode: pro) | 5s or 10s | Yes |
Tier 2: Advanced Production
Multi-Shot Story Videos
Via Agent API with templates. User describes a story → The Director automatically scripts, storyboards, generates shots, and composes with transitions + audio.
GATEWAY_TOKEN=$(grep GATEWAY_TOKEN ~/.openclaw/.env | cut -d= -f2)
curl -s -X POST "https://instaclaw.io/api/gateway/sjinn?action=create" \
-H "Authorization: Bearer $GATEWAY_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"api\": \"agent\", \"message\": \"$STORY_PROMPT\", \"template_id\": \"$TEMPLATE_ID\", \"quality\": \"quality\"}"Template IDs
| Template | ID | Use Case |
|---|---|---|
| Veo3 Story Video | 9b371ec6-09a2-43d5-97c2-0aea79a12371 |
Live-action consistent characters |
| Sora2 Story Video v2 | de733710-fc66-4a2b-b53c-27b52c6c6f5e |
Anime/stylized consistent characters |
| Sora2 Extend | d5db7e33-4ef6-4c6f-96be-b7e0a98f0706 |
Extend existing Sora2 clip |
| Veo3 Extend | 1de0cc26-6bf9-4eed-a5a2-c62fe88aef52 |
Extend existing Veo3 clip |
| Kids Short Video | 788acc9a-866b-4688-849e-7c7cfffaff54 |
Children's content |
| Single Podcast | 071b3487-d689-4e9e-9125-f280fdb85e7a |
Single host podcast visual |
| Dual Podcast | 5d0cbc88-41d7-471a-88b3-7df276016de1 |
Two hosts podcast visual |
| Music Video | 57a003c8-ea94-44a8-8e32-d2ec53ea780b |
Lyrics-synced music video |
Platform-Specific Outputs
| Platform | Aspect Ratio | Duration | Style |
|---|---|---|---|
| TikTok / Reels / Shorts | 9:16 | 15-60s | Fast cuts, trending |
| YouTube | 16:9 | 30s-5min | Cinematic |
| Instagram Feed | 1:1 or 4:5 | 15-30s | Clean, eye-catching |
| Twitter/X | 16:9 | 15-30s | Quick hook |
| Product Demo | 16:9 or 1:1 | 30-60s | Professional |
| Podcast | 16:9 | 1-5min | Talking head |
Auto-detect platform from user request ("make me a TikTok" → 9:16 vertical).
Tier 3: Full Production Pipeline
Advanced tools available through The Director. See references/video-production-pipeline.md for complete details.
- Image Generation — Nano Banana, Nano Banana Pro, seedream 4.5, AI Image Edit
- Audio Production — TTS, background music, SFX, speech-to-text
- Post-Production — ffmpeg_full_compose (multi-clip), subtitles, lip sync, video upscaling, frame extraction, trimming
Workflow example: Generate character image → animate into video → add subtitles → add background music → compose final output.
Async Workflow (CRITICAL)
Video generation takes 1-10+ minutes. This is NOT synchronous.
Submit & Poll Loop
- Receive request → enhance prompt → confirm with user
- Submit to API → get
chat_id(Agent API) ortask_id(Tool API) - Acknowledge: "Generating your video now, 2-5 minutes."
- Poll loop:
Agent API:
GATEWAY_TOKEN=$(grep GATEWAY_TOKEN ~/.openclaw/.env | cut -d= -f2)
while true; do
RESULT=$(curl -s -X POST "https://instaclaw.io/api/gateway/sjinn?action=query" \
-H "Authorization: Bearer $GATEWAY_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"api\": \"agent\", \"chat_id\": \"$CHAT_ID\"}")
STATUS=$(echo "$RESULT" | jq -r '.data.status')
if [ "$STATUS" = "1" ]; then break; fi # 1 = completed
sleep 15
done
VIDEO_URL=$(echo "$RESULT" | jq -r '.data.video_url')Tool API:
GATEWAY_TOKEN=$(grep GATEWAY_TOKEN ~/.openclaw/.env | cut -d= -f2)
while true; do
RESULT=$(curl -s -X POST "https://instaclaw.io/api/gateway/sjinn?action=query" \
-H "Authorization: Bearer $GATEWAY_TOKEN" \
-H "Content-Type: application/json" \
-d "{\"api\": \"tool\", \"task_id\": \"$TASK_ID\"}")
STATUS=$(echo "$RESULT" | jq -r '.data.status')
if [ "$STATUS" = "1" ]; then break; fi # 1 = completed
if [ "$STATUS" = "-1" ]; then echo "FAILED"; break; fi # -1 = failed
sleep 15
done
VIDEO_URL=$(echo "$RESULT" | jq -r '.data.video_url')Note: The proxy normalizes responses —
data.video_urlalways contains the final video URL regardless of which API type was used. The proxy also auto-corrects if you accidentally query with the wrong API type.
Polling Intervals & Timeouts
| Type | Poll Interval | Timeout |
|---|---|---|
| Video generation | 15 seconds | 10 minutes |
| Image generation | 10 seconds | 5 minutes |
| Audio generation | 10 seconds | 5 minutes |
Progress Updates
- Immediately: "Generating your video now. This usually takes 2-5 minutes. I'll send it as soon as it's ready."
- At 2 min: "Still working on your video. Complex scenes can take a few extra minutes."
- At 5 min: "Your video is taking a bit longer than usual. Almost there..."
- At 10 min (timeout): "Video generation timed out. This sometimes happens with complex prompts. Want me to try again with a simpler version?"
Download & Deliver
FILENAME="${SLUG}_$(date +%Y-%m-%d_%H-%M).mp4"
curl -sL "$VIDEO_URL" -o ~/workspace/videos/$FILENAME
# Send via Telegram sendVideo (< 20MB reliable, < 50MB hard limit)
# If > 50MB: compress or send as documentPending Task Recovery
On session start, check ~/memory/video-history.json for pending tasks:
PENDING=$(jq '.pending[]' ~/memory/video-history.json 2>/dev/null)If pending tasks exist, query their status and retrieve completed results. Update the history file.
video-history.json structure:
{
"pending": [
{
"chat_id": "uuid",
"task_id": "uuid-if-tool-api",
"api": "agent|tool",
"prompt": "enhanced prompt",
"submitted_at": "ISO timestamp",
"quality": "quality",
"template_id": null
}
],
"completed": [
{
"chat_id": "uuid",
"prompt": "enhanced prompt",
"result_url": "https://cdn.example.com/...",
"local_path": "~/workspace/videos/sunset_2026-02-25_14-30.mp4",
"submitted_at": "ISO",
"completed_at": "ISO",
"generation_time_seconds": 180,
"quality": "quality"
}
]
}Credit Integration
Daily Limits (enforced by proxy)
| Tier | Videos/day | Images+Audio/day |
|---|---|---|
| Starter | 5 | 10 |
| Pro | 10 | 30 |
| Power | 30 | 100 |
| BYOK | 5 | 15 |
The proxy enforces these limits automatically. If you hit the limit, the proxy returns a 429 error with video_limit_reached.
Budget Guardrails
Before every generation:
- The proxy checks daily limits automatically
- If the proxy returns 429 (
video_limit_reached), tell the user: "You've hit your daily video limit. Resets at midnight." - If user is close to limit → suggest cheap mode proactively
Failure Handling & Escalation
1. Submission Verification
After every action=create call, verify the response immediately:
# After submitting:
HTTP_CODE=$(echo "$RESPONSE" | jq -r '.status // empty')
ERROR_MSG=$(echo "$RESPONSE" | jq -r '.error // empty')
# Agent API: must have chat_id
CHAT_ID=$(echo "$RESPONSE" | jq -r '.data.chat_id // empty')
# Tool API: must have task_id
TASK_ID=$(echo "$RESPONSE" | jq -r '.data.task_id // empty')Hard rule: If the response has no chat_id (Agent API) or no task_id (Tool API), the submission FAILED. Do NOT poll. Tell the user immediately and retry once.
2. Retrieval Validation — API Type Matching
CRITICAL: You MUST query with the SAME API type used during creation.
| Created with | Query with | ID field | Correct |
|---|---|---|---|
"api": "agent" |
"api": "agent" |
chat_id |
YES |
"api": "tool" |
"api": "tool" |
task_id |
YES |
"api": "agent" |
"api": "tool" |
task_id |
NO — will return nothing |
"api": "tool" |
"api": "agent" |
chat_id |
NO — will return nothing |
Hard rule: When you save a pending task to video-history.json, you MUST record which api type was used. When querying, you MUST use that same api type and corresponding ID field.
There is NO fetch action. There is NO GET endpoint. The ONLY way to retrieve results is POST ?action=query with the correct API type and ID.
3. Retry Limits (Hard Rules)
| Phase | Max Retries | Interval | After Exhaustion |
|---|---|---|---|
Submission (create) |
2 | 30 seconds | Tell user, stop |
Polling (query) |
40 polls (10 min) | 15 seconds | Escalate (see below) |
Download (curl -sL) |
3 | 10 seconds | Report URL to user |
| Telegram send | 2 | 5 seconds | Save locally, tell user path |
Hard rules:
- NEVER poll more than 40 times for a single video (10 minutes at 15s intervals)
- NEVER retry a submission more than twice
- NEVER silently drop a failed generation — always tell the user what happened
4. Stalled Job Detection
During polling, track the status value:
| Status | Meaning | Action |
|---|---|---|
0 |
Queued | Normal — keep polling |
1 |
Completed | Extract URL, download |
2 |
Processing | Normal IF progressing |
-1 |
Failed | Stop polling, tell user |
| No change after 20 polls | Stalled | Escalate |
Hard rule: If status stays at the same value for 20 consecutive polls (5 minutes), the job is STALLED. Stop polling. Tell the user:
"Your video generation appears to be stuck on the backend. This is a known intermittent issue. Would you like me to try again with a new submission?"
5. Escalation Protocol
When a generation fails after exhausting retries:
- Tell the user immediately — never go silent
- Log the failure — append to
~/memory/video-history.jsonwith"status": "failed"and"failure_reason": "..." - Offer alternatives:
- "Want me to try again?" (resubmit)
- "Want me to try cheap mode?" (faster, more reliable)
- "Want me to try a different model?" (if one model is failing)
- Do NOT retry automatically more than once — ask the user first
6. Pre-Flight Checklist
Before EVERY video generation, verify:
# 1. GATEWAY_TOKEN exists
GATEWAY_TOKEN=$(grep GATEWAY_TOKEN ~/.openclaw/.env | cut -d= -f2)
if [ -z "$GATEWAY_TOKEN" ]; then
echo "GATEWAY_TOKEN not configured"
# Tell user: "Video generation isn't set up yet. Contact support."
exit 1
fi
# 2. Quick connectivity test (optional, skip if recent success)
TEST=$(curl -s -o /dev/null -w "%{http_code}" "https://instaclaw.io/api/gateway/sjinn?action=query" \
-H "Authorization: Bearer $GATEWAY_TOKEN" \
-H "Content-Type: application/json" \
-d '{"api":"tool","task_id":"00000000-0000-0000-0000-000000000000"}')
# 200 = endpoint reachable (even if task not found, response is valid)
# 401/403 = token issue
# 5xx = backend down7. Model Selection & Fallback Order
If a specific model fails, try the next one:
| Priority | Model | API | tool_type |
|---|---|---|---|
| 1 | Seedance 2.0 (auto) | Agent API | n/a |
| 2 | Veo3 | Tool API | veo3-text-to-video-fast-api |
| 3 | Sora2 | Tool API | sora2-text-to-video-api |
Only fall back if:
- The user didn't request a specific model
- The first attempt returned an error (not just slow)
Tell the user when falling back: "The default model had an issue, trying Veo3 instead."
8. Real-Time Communication Rules
- NEVER go silent for more than 2 minutes during a generation
- ALWAYS tell the user what happened when something fails — never swallow errors
- NEVER claim a video was generated unless you have a downloaded file you can send
- NEVER say "I'll send it shortly" unless you actually have the URL and are downloading
- If polling finds nothing after 5 minutes, proactively update the user — don't wait for timeout
- If you lose track of a task ID, tell the user honestly: "I lost track of that generation. Want me to start a new one?"
Error Codes
| Error Code | Meaning | User-Facing Response |
|---|---|---|
| 429 (video_limit_reached) | Daily limit hit | "You've hit your daily video limit. Resets at midnight." |
| 503 (service_unavailable) | Backend at capacity | "Video generation is temporarily at capacity. Please try again later." |
| 401 | Invalid gateway token | "Video generation is temporarily unavailable. I'll let the team know." |
| 403 | Unauthorized | "Video generation is temporarily unavailable." |
| 404 | Resource not found | "That video task wasn't found. Let me try generating it again." |
| 500 | Internal server error | "The video service hit an error. Let me retry." |
| Timeout >10min | Generation too long | "Video generation timed out. Want me to try with a simpler prompt or cheap mode?" |
| Video >50MB | Too large for Telegram | Compress or send as document |
| Video 20-50MB | Large but sendable | Send with note: "Large file, may take a moment to load." |
| Network error | Download failed | Retry download 3 times with 10s delay |
Quality Checklist
Before delivering any video, verify:
- Prompt was enhanced with cinematic vocabulary (camera, lighting, atmosphere)
- Correct API chosen (Agent vs Tool) based on user request
- Async poll loop running with correct intervals (15s video, 10s image/audio)
- Progress updates sent at 2min, 5min milestones
- Timeout handled at 10min (video) or 5min (image/audio)
- Video downloaded to ~/workspace/videos/ with descriptive filename
- Video size checked before Telegram delivery (20MB/50MB thresholds)
- Result logged to ~/memory/video-history.json
- Pending task removed from pending array after completion
References
- API Reference:
~/.openclaw/skills/sjinn-video/references/sjinn-api.md - Prompt Enhancement Guide:
~/.openclaw/skills/sjinn-video/references/video-prompting.md - Full Production Pipeline:
~/.openclaw/skills/sjinn-video/references/video-production-pipeline.md - Setup Script:
~/scripts/setup-sjinn-video.sh