Video Generation Skill

Generate AI videos from text prompts or images. Supports Google Veo 3.1 and Pollinations.ai (free). Use when generating video, creating animations, text-to-video, AI video, video generation, make clip, animate.

anton-abyzov 149 18 Updated 5mo ago


Install

npx skillscat add anton-abyzov/specweave/plugins-specweave-media-skills-video

Install via the SkillsCat registry.


Video Generation Skill

Generate videos from text prompts (or images) using AI models. Video generation is asynchronous: Google Veo returns a long-running operation that must be polled until it completes.

Provider Fallback Chain (Follow This Order)

Tier 1: Google Veo 3.1 (PAID, billing required) ─── Best quality, audio ──┐
        ↓ on error                                                         │
Tier 2: Pollinations.ai (FREE, no key) ───────────────────────────────────┘

Note: Unlike image generation, there are no free Gemini native video models. Veo requires billing. Pollinations provides a free fallback.
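The fallback order can be expressed as a small helper that lists which providers to try. This is a minimal sketch: `provider_chain` is a hypothetical name, and it only checks whether `GEMINI_API_KEY` is set. Billing problems are only discovered when the Veo request fails, so the caller still has to fall through to Pollinations on error.

```shell
# Hypothetical helper: print the providers to try, in fallback order.
# Veo is only attempted when GEMINI_API_KEY is non-empty; a missing-billing
# error is reported by the API itself, so the caller must still fall through.
provider_chain() {
  if [ -n "${GEMINI_API_KEY:-}" ]; then
    echo "veo pollinations"
  else
    echo "pollinations"
  fi
}
```

A caller can iterate with `for provider in $(provider_chain); do ...; done` and stop at the first provider that produces a valid file.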

Workflow

Step 1: Parse User Request

Extract from the user's prompt:

Description: What the video should show
Duration: Desired length (Veo: 5-8 seconds, Pollinations: 4-10 seconds)
Style: Cinematic, animation, documentary, etc.
Source image: Optional image to use as starting frame (image-to-video, Veo only)
Output path: Where to save (default: ./generated-media/)
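When the requested duration falls outside what a provider supports, it has to be adjusted before the request is built. A minimal sketch, assuming whole-second durations and the ranges listed above (Veo: 5-8 s, Pollinations: 4-10 s); `clamp_duration` is a hypothetical helper, and how the value is then passed to each API is not shown here.

```shell
# Hypothetical helper: clamp a requested duration (whole seconds) to the
# range this skill documents for each provider.
clamp_duration() {
  local provider="$1" seconds="$2" min max
  case "$provider" in
    veo) min=5 max=8 ;;
    pollinations) min=4 max=10 ;;
    *) echo "unknown provider: $provider" >&2; return 1 ;;
  esac
  if [ "$seconds" -lt "$min" ]; then
    seconds=$min
  elif [ "$seconds" -gt "$max" ]; then
    seconds=$max
  fi
  echo "$seconds"
}
```

For example, a request for a 12-second Veo clip becomes 8 seconds; telling the user about the adjustment avoids surprises in the final report.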

Step 2: Prepare Output Directory

mkdir -p ./generated-media

Step 3: Load API Key from .env

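# Source GEMINI_API_KEY from .env only when the key is defined there
# (a bare `export` with no arguments would print every exported variable)
if [ -f .env ] && grep -qE '^GEMINI_API_KEY=' .env; then
  export "$(grep -E '^GEMINI_API_KEY=' .env | tail -n 1 | xargs)"
fi

# Also check the parent directory (monorepo support)
if [ -z "$GEMINI_API_KEY" ] && [ -f ../.env ] && grep -qE '^GEMINI_API_KEY=' ../.env; then
  export "$(grep -E '^GEMINI_API_KEY=' ../.env | tail -n 1 | xargs)"
fi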
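The snippet above only looks in `./.env` and `../.env`, and relies on `xargs` to strip quotes. In a deeper monorepo, a helper can instead walk up from the current directory to the filesystem root and read the value without executing anything. This is a sketch under those assumptions; `find_env_value` is a hypothetical name, and it handles only simple `KEY=value` lines (optionally wrapped in one pair of quotes).

```shell
# Hypothetical helper: print the value of KEY from the nearest .env file,
# searching the current directory and then each parent up to /.
find_env_value() {
  local key="$1" dir line value
  dir=$(pwd)
  while :; do
    if [ -f "$dir/.env" ]; then
      # Last definition wins, as when a shell sources the file
      line=$(grep -E "^${key}=" "$dir/.env" | tail -n 1)
      if [ -n "$line" ]; then
        value=${line#*=}
        # Strip one pair of matching surrounding quotes, if present
        case "$value" in
          \"*\") value=${value#\"}; value=${value%\"} ;;
          \'*\') value=${value#\'}; value=${value%\'} ;;
        esac
        printf '%s\n' "$value"
        return 0
      fi
    fi
    [ "$dir" = "/" ] && return 1
    dir=$(dirname "$dir")
  done
}
```

Usage: `GEMINI_API_KEY=$(find_env_value GEMINI_API_KEY) && export GEMINI_API_KEY`. Because only the text after the first `=` is taken, values that themselves contain `=` survive intact.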

Step 4: Generate Video (Fallback Chain)

Tier 1: Google Veo 3.1 (PAID, requires GEMINI_API_KEY + billing)

Available models:

veo-3.1-fast-generate-preview — Fast, ~$0.15/sec (720p/1080p)
veo-3.1-generate-preview — Standard with audio, ~$0.40/sec (default)

IMPORTANT: Veo is asynchronous. You must:

Submit the generation request
Poll the operation endpoint every 10 seconds
Download the video when done
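The submit/poll/download cycle hinges on a bounded polling loop. A minimal, provider-agnostic sketch: `poll_until` is a hypothetical helper that runs any check command until it succeeds or the attempt limit is reached. Unlike the script below, it checks once immediately before the first sleep.

```shell
# Hypothetical helper: run CHECK_CMD every INTERVAL seconds until it exits 0,
# giving up after MAX_TRIES attempts. Returns 0 on success, 1 on timeout.
poll_until() {
  local interval="$1" max_tries="$2" tries=0
  shift 2
  while [ "$tries" -lt "$max_tries" ]; do
    tries=$((tries + 1))
    if "$@"; then
      return 0
    fi
    echo "  Still waiting... (${tries}/${max_tries})" >&2
    # Skip the pointless final sleep once the last attempt has failed
    if [ "$tries" -lt "$max_tries" ]; then
      sleep "$interval"
    fi
  done
  return 1
}
```

For Veo, the check could be a hypothetical `check_operation` function that fetches the operation and tests `done`, called as `poll_until 10 30 check_operation`, which matches the script's 10-second interval and `MAX_POLLS=30` cap (about 5 minutes).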

Cost warning: Before generating, tell the user the estimated cost (~$0.75-3.20 per clip) and confirm they want to proceed.

TIMESTAMP=$(date +%s)
MODEL="veo-3.1-generate-preview"
PROMPT="YOUR_PROMPT_HERE"
OUTFILE="generated-media/video-${TIMESTAMP}.mp4"
TMPFILE="/tmp/gemini-vid-response-${TIMESTAMP}.json"
SUCCESS=false

if [ -n "$GEMINI_API_KEY" ]; then
  echo "Starting video generation with $MODEL..."
  echo "Estimated cost: ~\$2.00-3.20 for a 5-8 second clip"

  # Step 1: Start generation (returns operation ID)
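  # JSON-encode the prompt so quotes or backslashes in it cannot break the body
  REQUEST_BODY=$(PROMPT="$PROMPT" python3 -c 'import json, os; print(json.dumps({"instances": [{"prompt": os.environ["PROMPT"]}]}))')
  curl -s -X POST \
    "https://generativelanguage.googleapis.com/v1beta/models/${MODEL}:predictLongRunning" \
    -H "x-goog-api-key: $GEMINI_API_KEY" \
    -H "Content-Type: application/json" \
    -o "$TMPFILE" \
    -d "$REQUEST_BODY"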

  # Extract the operation name; an API error (e.g. billing not enabled)
  # is printed to stderr and leaves OPERATION empty
  OPERATION=$(python3 -c "
import json, sys
with open('$TMPFILE') as f:
    data = json.load(f)
if 'error' in data:
    print(f'Error: {data[\"error\"][\"message\"][:200]}', file=sys.stderr)
    sys.exit(1)
print(data.get('name', ''))
")

  if [ -n "$OPERATION" ]; then
    echo "Video generation started: $OPERATION"
    echo "Polling for completion (this may take 1-3 minutes)..."

    # Step 2: Poll until done
    MAX_POLLS=30  # 5 minutes max
    POLL_COUNT=0
    while [ $POLL_COUNT -lt $MAX_POLLS ]; do
      sleep 10
      POLL_COUNT=$((POLL_COUNT + 1))

      curl -s \
        "https://generativelanguage.googleapis.com/v1beta/${OPERATION}" \
        -H "x-goog-api-key: $GEMINI_API_KEY" \
        -o "$TMPFILE"

      IS_DONE=$(python3 -c "
import json, sys
with open('$TMPFILE') as f:
    data = json.load(f)
print(data.get('done', False))
" 2>/dev/null)

      if [ "$IS_DONE" = "True" ]; then
        echo "Video generation complete!"

        # Step 3: Extract video URI and download
        VIDEO_URI=$(python3 -c "
import json, sys
with open('$TMPFILE') as f:
    data = json.load(f)
try:
    uri = data['response']['generateVideoResponse']['generatedSamples'][0]['video']['uri']
    print(uri)
except (KeyError, IndexError):
    print('', file=sys.stderr)
    sys.exit(1)
" 2>/dev/null)

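        if [ -n "$VIDEO_URI" ]; then
          # -f makes curl fail on HTTP errors instead of saving the error body
          if curl -sfL -o "$OUTFILE" \
            "$VIDEO_URI" \
            -H "x-goog-api-key: $GEMINI_API_KEY"; then
            SUCCESS=true
            echo "Generated with Veo 3.1 (paid)"
          else
            echo "ERROR: Video download failed"
          fi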
        else
          echo "ERROR: Could not extract video URI from response"
        fi
        break
      fi

      echo "  Still generating... (${POLL_COUNT}/${MAX_POLLS})"
    done

    if [ "$IS_DONE" != "True" ]; then
      echo "WARNING: Video generation timed out after 5 minutes"
      echo "Operation: $OPERATION"
      echo "You can check status later with:"
      echo "  curl -s 'https://generativelanguage.googleapis.com/v1beta/${OPERATION}' -H 'x-goog-api-key: \$GEMINI_API_KEY'"
    fi
  else
    echo "Veo failed to start (likely billing not enabled)"
  fi

  rm -f "$TMPFILE"
fi
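The polling loop above only checks `done`, so an operation that finishes with an error (for example, a content-policy block) surfaces as "Could not extract video URI". A long-running operation can report `done: true` together with an `error` object, so checking for `error` first gives a clearer result. A minimal sketch; `veo_status` is a hypothetical helper that reads a saved response file.

```shell
# Hypothetical helper: classify a saved operation response as
# "pending", "error", or "done". Check "error" first, because a finished
# operation may carry done=true AND an error object.
veo_status() {
  python3 - "$1" <<'EOF'
import json, sys
with open(sys.argv[1]) as f:
    op = json.load(f)
if "error" in op:
    print("error")
elif op.get("done"):
    print("done")
else:
    print("pending")
EOF
}
```

Inside the loop, `case "$(veo_status "$TMPFILE")" in ...` can then break out early on `error` and report the message instead of waiting for the timeout.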

If Tier 1 fails (no key, billing not enabled, or generation error), continue to Tier 2.

Tier 2: Pollinations.ai

Free video models: seedance (best quality), wan (image-to-video with audio), grok-video
Paid video models: veo (Google Veo 3.1 Fast)

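Note: gen.pollinations.ai requires a free API key (register at https://pollinations.ai), read from POLLINATIONS_API_KEY. Without it, the script falls back to the anonymous image.pollinations.ai endpoint, which may be unreliable. Video models use the same /image/ endpoint, which returns video/mp4 instead of an image.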
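The endpoint choice can be factored into a helper that builds the request URL. A sketch under the assumption that the two URL shapes used in the script are correct; `pollinations_url` is a hypothetical name. It percent-encodes every reserved character, including `/`, so a slash in the prompt cannot add path segments.

```shell
# Hypothetical helper: print the Pollinations URL for PROMPT ($1) and
# MODEL ($2). With POLLINATIONS_API_KEY set it targets gen.pollinations.ai
# (send the key as a Bearer token); otherwise the anonymous endpoint.
pollinations_url() {
  local encoded
  encoded=$(PROMPT="$1" python3 -c 'import os, urllib.parse; print(urllib.parse.quote(os.environ["PROMPT"], safe=""))')
  if [ -n "${POLLINATIONS_API_KEY:-}" ]; then
    echo "https://gen.pollinations.ai/image/${encoded}?model=$2"
  else
    echo "https://image.pollinations.ai/prompt/${encoded}?model=$2"
  fi
}
```

The model name is inserted as-is, which assumes model identifiers are plain words such as `seedance` or `wan`.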

if [ "$SUCCESS" != "true" ]; then
  echo "Trying Pollinations.ai video..."
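  # Pass the prompt via the environment and encode "/" too, so quotes or
  # slashes in the prompt cannot break the Python string or the URL path
  ENCODED_PROMPT=$(PROMPT="$PROMPT" python3 -c 'import os, urllib.parse; print(urllib.parse.quote(os.environ["PROMPT"], safe=""))')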
  POLL_MODEL="seedance"  # Free, good quality

  # Try authenticated endpoint first
  if [ -n "${POLLINATIONS_API_KEY:-}" ]; then
    curl -s -L --max-time 180 \
      -H "Authorization: Bearer $POLLINATIONS_API_KEY" \
      -o "$OUTFILE" \
      "https://gen.pollinations.ai/image/${ENCODED_PROMPT}?model=${POLL_MODEL}"
  else
    # Anonymous endpoint (may be unreliable)
    curl -s -L --max-time 180 \
      -o "$OUTFILE" \
      "https://image.pollinations.ai/prompt/${ENCODED_PROMPT}?model=${POLL_MODEL}"
  fi

  # Verify it's actually a video file
  if [ -f "$OUTFILE" ] && [ -s "$OUTFILE" ]; then
    FILETYPE=$(file -b "$OUTFILE" | head -1)
    if echo "$FILETYPE" | grep -qiE "video|MP4|MPEG|ISO Media|QuickTime"; then
      SUCCESS=true
      echo "Generated with Pollinations.ai (free)"
    else
      echo "Pollinations returned non-video: $FILETYPE"
      rm -f "$OUTFILE"
    fi
  fi
fi
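The check above greps the human-readable output of `file`, whose wording varies between versions. An alternative is to match on the MIME type, which libmagic reports as `video/mp4` for typical MP4 files. A minimal sketch, assuming `file` supports `--mime-type` (GNU/Linux and recent macOS builds do); `is_video_file` is a hypothetical helper.

```shell
# Hypothetical helper: succeed only if FILE exists, is non-empty, and
# file(1) reports a video/* MIME type (e.g. video/mp4).
is_video_file() {
  [ -s "$1" ] || return 1
  case "$(file -b --mime-type "$1" 2>/dev/null)" in
    video/*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `if is_video_file "$OUTFILE"; then SUCCESS=true; else rm -f "$OUTFILE"; fi`.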

Step 5: Verify Output

if [ -f "$OUTFILE" ] && [ -s "$OUTFILE" ]; then
  file "$OUTFILE"
  SIZE=$(du -h "$OUTFILE" | cut -f1)
  echo "Video generated successfully: $OUTFILE ($SIZE)"
  echo "Play with: open '$OUTFILE' (macOS) or xdg-open '$OUTFILE' (Linux)"
else
  echo "ERROR: All providers failed. Possible causes:"
  echo "  - Veo: Billing not enabled or quota exceeded"
  echo "  - Pollinations: Service temporarily down"
  echo ""
  echo "Solutions:"
  echo "  1. Enable billing at https://aistudio.google.com/ for Veo"
  echo "  2. Try again in a few minutes (Pollinations may recover)"
  echo "  3. Consider /sw-media:remotion for programmatic video (no AI, no API key)"
fi

Step 6: Report Result

Tell the user:

File path to the generated video
Which provider/model was used
Video duration (if known)
Cost: estimate (Veo: ~$0.75-3.20) or "free" (Pollinations)
Playback: open file.mp4 (macOS), xdg-open file.mp4 (Linux)
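Two details of the report can be automated: the duration and the platform-specific player command. A sketch under stated assumptions: `video_duration` uses `ffprobe` (part of FFmpeg) only if it is installed and otherwise prints `unknown`; `player_command` picks `open` on macOS and `xdg-open` elsewhere. Both names are hypothetical.

```shell
# Hypothetical helper: print the duration in seconds via ffprobe when it is
# installed and can read the file, else "unknown".
video_duration() {
  local d
  if command -v ffprobe >/dev/null 2>&1; then
    d=$(ffprobe -v error -show_entries format=duration \
          -of default=noprint_wrappers=1:nokey=1 "$1" 2>/dev/null)
    if [ -n "$d" ]; then
      echo "$d"
      return 0
    fi
  fi
  echo "unknown"
}

# Hypothetical helper: print the opener for an OS name (defaults to
# `uname -s`): open on macOS, xdg-open on Linux and other systems.
player_command() {
  case "${1:-$(uname -s)}" in
    Darwin) echo "open" ;;
    *) echo "xdg-open" ;;
  esac
}
```

Usage: `echo "Duration: $(video_duration "$OUTFILE") s; play with: $(player_command) '$OUTFILE'"`.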

Image-to-Video (Google Veo Only)

If the user provides a source image, use image-to-video mode:

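# Encode the image as one line: GNU base64 wraps output at 76 columns by
# default, which would break the JSON string
IMAGE_B64=$(base64 < source-image.png | tr -d '\n')

# JSON-encode the prompt so quotes or backslashes cannot break the body
PROMPT_JSON=$(PROMPT="$PROMPT" python3 -c 'import json, os; print(json.dumps(os.environ["PROMPT"]))')

# Send the body on stdin (-d @-): a large base64 string passed as a curl
# argument can exceed the operating system's argument-size limit
printf '{"instances": [{"prompt": %s, "image": {"bytesBase64Encoded": "%s"}}]}' \
  "$PROMPT_JSON" "$IMAGE_B64" |
  curl -s -X POST \
    "https://generativelanguage.googleapis.com/v1beta/models/veo-3.1-generate-preview:predictLongRunning" \
    -H "x-goog-api-key: $GEMINI_API_KEY" \
    -H "Content-Type: application/json" \
    -o "$TMPFILE" \
    -d @-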
# Then poll as above

Error Handling

Error	Action
`GEMINI_API_KEY` not set	Skip Veo, use Pollinations only
Veo billing not enabled	Auto-fallback to Pollinations
Generation timed out	Report operation ID so user can check later
Pollinations 502/timeout	Report all providers failed, suggest Remotion
Content policy block	Report prompt was blocked, suggest rewording
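The table can be mirrored in a small dispatcher so every failure path ends in a documented action. This is only a sketch: `next_action` and its error-kind labels (`no_key`, `billing`, `timeout`, `pollinations_down`, `policy`) are invented here, and mapping real API responses onto those labels is left to the caller.

```shell
# Hypothetical helper mirroring the error-handling table. The error-kind
# labels are invented for this sketch.
next_action() {
  case "$1" in
    no_key) echo "skip Veo; use Pollinations only" ;;
    billing) echo "fall back to Pollinations" ;;
    timeout) echo "report the operation ID so the user can check later" ;;
    pollinations_down) echo "report that all providers failed; suggest Remotion" ;;
    policy) echo "report that the prompt was blocked; suggest rewording" ;;
    *) echo "unknown error kind: $1" >&2; return 1 ;;
  esac
}
```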

Cost Awareness

IMPORTANT: Video generation costs money with Google Veo. Always inform the user before generating:

Model	Cost	Duration
Veo 3.1 Fast (720p)	~$0.15/sec = ~$0.75-1.20 per video	5-8 sec
Veo 3.1 Standard	~$0.40/sec = ~$2.00-3.20 per video	5-8 sec
Pollinations	Free	4-10 sec

Before generating with Google Veo, confirm: "This will cost approximately $X. Proceed?"
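The estimate is simply rate × duration, and the confirmation step can be made explicit. A minimal sketch using the approximate per-second rates from the table above (check current Google pricing before relying on them); `estimate_cost` and `confirm_cost` are hypothetical helpers.

```shell
# Hypothetical helper: estimated cost in USD for MODEL ($1) and SECONDS ($2),
# using the approximate per-second rates quoted in the table.
estimate_cost() {
  local rate
  case "$1" in
    veo-3.1-fast-generate-preview) rate=0.15 ;;
    veo-3.1-generate-preview) rate=0.40 ;;
    *) echo "unknown model: $1" >&2; return 1 ;;
  esac
  awk -v r="$rate" -v s="$2" 'BEGIN { printf "%.2f\n", r * s }'
}

# Hypothetical helper: ask before spending money; succeeds only on an
# explicit "y" or "yes" (empty input or EOF counts as "no").
confirm_cost() {
  local answer
  printf 'This will cost approximately $%s. Proceed? [y/N] ' "$1" >&2
  read -r answer || return 1
  case "$answer" in
    [Yy]|[Yy][Ee][Ss]) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `COST=$(estimate_cost "$MODEL" 8) && confirm_cost "$COST" && echo "Proceeding..."`. For an 8-second clip this gives 1.20 with the Fast model and 3.20 with Standard, matching the table.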

Setup Instructions (Show When No API Key Found)

If no GEMINI_API_KEY is set, inform the user:

Using free Pollinations.ai provider (rate limited).

For higher quality video with audio, set up Google Veo 3.1:

Go to https://aistudio.google.com/

Create or select a project with billing enabled

Generate an API key

Add to your .env file: GEMINI_API_KEY=your-key-here

The same key works for both image AND video generation.
Video costs ~$0.75-3.20 per clip depending on model/resolution.

For programmatic video (no AI, no API key), try /sw-media:remotion.
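The "add to your .env file" step can be scripted so that re-running setup never duplicates or overwrites an existing key. A minimal sketch; `ensure_env_key` is a hypothetical helper and the key value shown in the usage line is a placeholder.

```shell
# Hypothetical helper: append KEY=VALUE to FILE unless KEY is already
# defined there.
ensure_env_key() {
  local file="$1" key="$2" value="$3"
  if [ -f "$file" ] && grep -qE "^${key}=" "$file"; then
    echo "$key is already set in $file; leaving it unchanged" >&2
    return 0
  fi
  # If the file's last line has no trailing newline, add one first so the
  # new entry is not glued onto it
  if [ -s "$file" ] && [ -n "$(tail -c 1 "$file")" ]; then
    echo >> "$file"
  fi
  printf '%s=%s\n' "$key" "$value" >> "$file"
}
```

Usage: `ensure_env_key .env GEMINI_API_KEY your-key-here`.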

Activation Keywords

generate video, create video, make video, AI video, text-to-video, video generation, create animation, make clip, generate clip, animate, create movie, video from text, video from image
