Use when the user wants to analyze images using vision-capable AI models through OATDA's unified API. Supports OpenAI GPT-4o, Anthropic Claude, and Google Gemini vision models.
Install
npx skillscat add devcsde/oatda-skills/oatda-vision-analysis Install via the SkillsCat registry.
OATDA Vision Analysis
Analyze images using vision-capable AI models through OATDA's unified API endpoint.
When to Use
Use this skill when the user wants to:
- Analyze or describe an image via OATDA
- Extract text (OCR) from an image using a vision model
- Understand diagrams, charts, or screenshots
- Get AI-powered image descriptions from OpenAI, Anthropic, or Google
Prerequisites
The user needs an OATDA API key. Check in this order:
$OATDA_API_KEYenvironment variable~/.oatda/credentials.jsonconfig file
If neither exists, tell the user:
You need an OATDA API key. Get one at https://oatda.com, then set it:
export OATDA_API_KEY=your_key_here
Step-by-Step Instructions
1. Resolve the API key
# Check env var first; if empty, auto-load from credentials file
if [[ -z "$OATDA_API_KEY" ]]; then
export OATDA_API_KEY=$(cat ~/.oatda/credentials.json 2>/dev/null | jq -r '.profiles[.defaultProfile].apiKey' 2>/dev/null)
fi
# Verify key exists (show first 8 chars only)
echo "${OATDA_API_KEY:0:8}"If the output is empty or null, stop and ask the user to configure their API key.
IMPORTANT:
- Never print the full API key. Only show the first 8 characters for verification.
- The key resolution script and subsequent
curlcommands must run in the same shell session. Each separate bash/terminal invocation starts with an isolated environment where previously exported variables are lost. Either run all commands in one session, or chain them (e.g.,export OATDA_API_KEY=... && curl ...).
2. Determine the model
Map common aliases to provider/model format:
| User says | Provider | Model |
|---|---|---|
| gpt-4o (default) | openai | gpt-4o |
| gpt-4o-mini | openai | gpt-4o-mini |
| claude, sonnet | anthropic | claude-3-5-sonnet |
| gemini | gemini-2.0-flash | |
| gemini-1.5 | gemini-1.5-pro |
Default: openai / gpt-4o if no model specified.
3. Validate the image URL
- Only accept
https://URLs ordata:image/base64 data URIs - Reject
http://URLs, local file paths, and internal IPs (localhost, 127.0.0.1, 169.254.x.x) - If the user provides a local file, suggest they convert it to base64 first
4. Make the API call
The vision endpoint uses a contents array (NOT a simple prompt string):
curl -s -X POST "https://oatda.com/api/v1/llm/image" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OATDA_API_KEY" \
-d '{
"provider": "<PROVIDER>",
"model": "<MODEL>",
"contents": [
{
"type": "text",
"text": "<ANALYSIS_PROMPT>"
},
{
"type": "image",
"image": {
"url": "<IMAGE_URL>",
"detail": "auto"
}
}
]
}'Replace <PROVIDER>, <MODEL>, <ANALYSIS_PROMPT>, and <IMAGE_URL> with actual values.
CRITICAL: This endpoint is /api/v1/llm/image (NOT /api/v1/llm/vision). The request body uses contents array, NOT a prompt + imageUrl structure.
Optional parameters (add to body):
temperature: 0-2, default 0.7maxTokens: Max response tokens
Image detail levels (in the image.detail field):
"auto"— Let the model decide (default)"low"— Faster, cheaper, less detail"high"— More detail, higher cost
5. Parse the response
{
"success": true,
"provider": "openai",
"model": "gpt-4o",
"response": "The image shows a sunset over...",
"usage": {
"promptTokens": 800,
"completionTokens": 200,
"totalTokens": 1000
},
"costs": {
"inputCost": 0.004,
"outputCost": 0.006,
"totalCost": 0.01,
"currency": "USD"
}
}Present the response field to the user. Optionally mention token usage and cost.
6. Handle errors
| HTTP Status | Meaning | Action |
|---|---|---|
| 401 | Invalid API key | Tell user to check their key |
| 400 | Bad request | Check image URL is valid HTTPS, model supports vision |
| 429 | Rate limited | Wait 5 seconds and retry once |
Full Example
User asks: "Describe this image: https://example.com/photo.jpg"
curl -s -X POST "https://oatda.com/api/v1/llm/image" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer $OATDA_API_KEY" \
-d '{
"provider": "openai",
"model": "gpt-4o",
"contents": [
{"type": "text", "text": "Describe this image in detail"},
{"type": "image", "image": {"url": "https://example.com/photo.jpg", "detail": "auto"}}
]
}'Tips
- The endpoint is
/api/v1/llm/image— do NOT confuse with/api/v1/llm/generate-image(that's for generation) - The body uses
contentsarray format, NOT a simple prompt string - Only HTTPS image URLs are accepted — no HTTP, no local file paths
- Image tokens are included in the prompt token count and affect cost
- For OCR tasks, use
"detail": "high"for better text recognition - NEVER expose the full API key in output
- Related skills:
/oatda:oatda-generate-imagefor creating images,/oatda:oatda-list-modelsfor available vision models