"Use when the user asks to generate, edit, replace, or batch-create images and visual assets — landing-page heroes, OG cards, feature illustrations, product shots, icons, mockups, concept art, infographics, dark/light variants, retina sets, or whole-site asset packs. This skill orchestrates generation by spawning codex in the background (which runs the OpenAI Image API under the hood); Claude does not call the API directly. Requires `OPENAI_API_KEY` and a working `codex` CLI in PATH."
Install
`npx skillscat add ritarodev10/codex-openimage`
Install via the SkillsCat registry.
Imagegen — Codex-Orchestrated Image Generation
This skill spawns codex as a background worker to generate or edit images. Claude's job is orchestration: detect intent, synthesize prompts from project context, build manifests for batches, fan out parallel codex spawns, postprocess outputs, and report.
The image API itself is called by codex's own imagegen skill — we do not duplicate that layer.
Modes
Pick the mode that matches the user's ask. Then follow that mode's recipe below.
| Mode | When it fires | Spawn count |
|---|---|---|
| `generate` | User gives an explicit prompt and wants a new image | 1 |
| `auto` | User says "add an image here" with no prompt; the prompt is synthesized from surrounding content | 1 |
| `replace` | User wants to regenerate an existing image asset | 1 |
| `edit` | User wants partial edit of an existing image (inpaint / mask / bg removal / dark-mode variant) | 1 |
| `auto-pack` | User wants images for a whole page, section, or site | N (parallel) |
Variants (n=4) of the same concept use a single spawn with codex's batch param; they don't require parallel spawns.
Asking before spawning
If the user's request lacks context that would block quality, ask up to 3 focused questions before spawning codex — not a 10-question gauntlet. See references/clarifying-questions.md for the full policy and use-case library.
Quick rules:
- Inventory the gaps. Fill what you can from defaults (size, format, quality), auto-synthesize what you can from project context (palette, subject from page copy), and only ask for what genuinely can't be inferred.
- Use `AskUserQuestion` with multi-choice when the answer space is bounded (style family, mood, intent slug)
- Always offer "match existing site" as an option when prior assets exist
- For batches >3 images, the cost-approval question is non-optional
- After 2 rounds of questions, stop asking — pick defaults, generate, let the user redirect via "more like X / less like Y"
Output location policy
Never save generated images to /tmp/, ~/Downloads/, ~/Desktop/, or any throwaway location. Images take real money and time to produce — they deserve a discoverable home from the start.
Resolution order for the output directory:
1. Project convention: if running inside a project, use the framework-appropriate path (see `references/project-detection.md`):
   - Next/Nuxt/Astro/Vite → `public/images/`
   - SvelteKit → `static/images/`
   - Rails → `app/assets/images/`
   - Generic HTML → `images/` or `assets/images/`
   - Existing image dir in the repo → use it
2. Explicit user path: if the user gave a path or filename, honor it exactly.
3. No project detected (standalone request): create a default dir in the current working directory:
   - `./generated-images/<YYYY-MM-DD>/` (date-stamped, lets repeated runs accumulate without collision)
   - Create the dir if it doesn't exist (`mkdir -p`)
   - Tell the user: "No project framework detected; saving to `./generated-images/<date>/`. Want a different location?"
   - Proceed unless they redirect
4. Ambiguous: when there are multiple plausible project-relative paths (e.g., monorepo with several `public/` dirs), ask with multi-choice via `AskUserQuestion`.
The `/tmp` path used in examples elsewhere in this doc is only for codex's own log files (stderr/stdout capture), never for image outputs. The image always lands somewhere the user can find later.
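The resolution order can be sketched as a small shell function. This is an illustration, not the skill's actual detection logic: the function name `resolve_output_dir`, the marker files it checks, and the choice to let an explicit path win over everything else are all assumptions; `references/project-detection.md` remains the source of truth.

```shell
# resolve_output_dir ROOT [EXPLICIT_PATH]
# Sketch only: the marker files below are illustrative assumptions.
resolve_output_dir() {
  root=${1:-.}
  explicit=${2:-}
  # An explicit user path is honored exactly (assumed to take precedence).
  if [ -n "$explicit" ]; then
    printf '%s\n' "$explicit"
    return 0
  fi
  # SvelteKit is checked first because it usually ships a vite.config too.
  if [ -e "$root/svelte.config.js" ]; then
    printf '%s\n' "$root/static/images"
    return 0
  fi
  for marker in next.config.js next.config.mjs nuxt.config.ts \
                astro.config.mjs vite.config.js vite.config.ts; do
    if [ -e "$root/$marker" ]; then
      printf '%s\n' "$root/public/images"
      return 0
    fi
  done
  if [ -e "$root/config/application.rb" ]; then
    printf '%s\n' "$root/app/assets/images"
    return 0
  fi
  # No framework detected: date-stamped default directory.
  printf '%s\n' "$root/generated-images/$(date +%Y-%m-%d)"
}
```

A caller would typically create the directory right away, for example `mkdir -p "$(resolve_output_dir .)"`.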
Reveal the output folder after batches
After generating more than one image that all land in the same folder, ask once if the user wants to open the folder in their OS file explorer.
Cross-platform open:
- macOS: `open <dir>`
- Linux: `xdg-open <dir>` (most distros; fall back to printing the path)
- Windows: `explorer <path>` or `start "" "<path>"`
Detect the OS via uname / $OSTYPE and pick the right command. If the open command fails or isn't available, print the absolute path so the user can navigate manually.
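A minimal sketch of that detection, assuming `uname -s` reports `Darwin` on macOS, `Linux` on Linux, and a `CYGWIN*`/`MINGW*`/`MSYS*` string in Windows POSIX shells. The helper names `open_cmd_for` and `reveal_dir` are hypothetical.

```shell
# open_cmd_for KERNEL_NAME: map `uname -s` output to an opener command.
open_cmd_for() {
  case "$1" in
    Darwin) echo "open" ;;
    Linux) echo "xdg-open" ;;
    CYGWIN*|MINGW*|MSYS*) echo "explorer" ;;
    *) echo "" ;;
  esac
}

# reveal_dir DIR: try to open DIR, otherwise print its absolute path.
reveal_dir() {
  dir=$1
  cmd=$(open_cmd_for "$(uname -s)")
  if [ -n "$cmd" ] && command -v "$cmd" >/dev/null 2>&1 \
     && "$cmd" "$dir" >/dev/null 2>&1; then
    return 0
  fi
  # Fallback: print the absolute path so the user can navigate manually.
  (cd "$dir" 2>/dev/null && pwd) || printf '%s\n' "$dir"
}
```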
Skip the prompt when:
- Only one image was generated (let the existing single-file open behavior handle it if anything)
- The user has answered "no" to the open-folder prompt earlier in this session
- The output dir is the user's current working dir (they're already there)
Use AskUserQuestion with header Open folder and options:
- "Yes, open in file explorer"
- "No, just print the path"
Required environment
- `OPENAI_API_KEY` exported in shell; codex needs it
- `codex` >=0.130 in PATH; older versions break with the superset wrapper
- macOS `sips` available (for retina @2x; defaults exist for Linux too)
If OPENAI_API_KEY is missing, do not spawn codex. Tell the user to set it and stop.
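A preflight check along these lines could run before any spawn. The function name `preflight` is hypothetical, and the sketch deliberately skips the `>=0.130` version comparison because the format of codex's version output is not assumed here.

```shell
# preflight: return non-zero (with a message on stderr) if a requirement is missing.
preflight() {
  if [ -z "${OPENAI_API_KEY:-}" ]; then
    echo "OPENAI_API_KEY is not set; export it before generating images." >&2
    return 1
  fi
  if ! command -v codex >/dev/null 2>&1; then
    echo "codex CLI not found in PATH." >&2
    return 1
  fi
  return 0
}
```

Usage: `preflight || exit 1` ahead of the first `codex exec` call.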
The codex spawn template
Every mode ends with one or more Bash calls in this shape:
`codex exec --dangerously-bypass-approvals-and-sandbox "<SYNTHESIZED_PROMPT>" 2>&1 | tee <LOG_PATH>`
- Always run via `Bash` with `run_in_background: true` so Claude isn't blocked
- The harness notifies on completion; never poll, never `sleep`
- Save the log to a per-spawn path so multiple parallel spawns don't collide
The <SYNTHESIZED_PROMPT> must always:
- Tell codex to use its own imagegen skill
- Specify exact output PNG path (absolute)
- Specify aspect ratio / size (use `references/intent-presets.md`)
- Embed the user's creative intent plus any inherited style anchor and negative prompts
- End with: `When done, print only the absolute path to the saved file as the last line.`
That last line is how the orchestrator finds the file deterministically — codex sometimes saves to ~/.codex/generated_images/<session>/ and copies to the requested path. The trailing path line is the contract.
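The prompt contract and the path recovery might look like the sketch below. Only the closing sentence of the prompt comes from this document; the other prompt wording and the function names `build_codex_prompt` and `result_path` are illustrative assumptions.

```shell
# build_codex_prompt OUT_PATH SIZE INTENT_TEXT
# Rejects relative output paths, then prints the prompt line by line.
build_codex_prompt() {
  out=$1; size=$2; intent=$3
  case "$out" in
    /*) ;;
    *) echo "output path must be absolute: $out" >&2; return 1 ;;
  esac
  printf '%s\n' \
    "Use your imagegen skill." \
    "Save the result as a PNG at: $out" \
    "Size: $size" \
    "$intent" \
    "When done, print only the absolute path to the saved file as the last line."
}

# result_path LOG_FILE: print the last non-empty line of a spawn log.
result_path() {
  awk 'NF { last = $0 } END { if (last != "") print last }' "$1"
}
```

Reading the last non-empty line, rather than the literal last line, tolerates a trailing blank line in the captured log.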
Mode: generate
- Read user's explicit prompt
- Detect intent (hero / og / feature-card / icon / avatar / ...) from filename or path; see `references/intent-presets.md`
- Pull project context: `references/project-detection.md` (palette, framework output dir, existing image style)
- Append negative prompts for the content type from `references/negative-prompts.md`
- Build the codex prompt, spawn, wait for notification
- Run `scripts/postprocess.sh <png>` to compress + emit WebP sibling + write `.meta.json`
- Report: path, dimensions, size on disk, cost estimate
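Intent detection from a filename could be sketched as a simple pattern match. The intent names come from the list above; the filename patterns and the function name `detect_intent` are assumptions, and the real presets live in `references/intent-presets.md`.

```shell
# detect_intent PATH: guess the intent from the file's basename.
detect_intent() {
  name=$(basename "$1" | tr '[:upper:]' '[:lower:]')
  case "$name" in
    *hero*) echo hero ;;
    og-*|og.*|*opengraph*) echo og ;;
    *feat*) echo feature-card ;;
    *icon*) echo icon ;;
    *avatar*) echo avatar ;;
    *) echo unknown ;;
  esac
}
```

An `unknown` result is a reasonable trigger for one of the clarifying questions instead of a silent default.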
Mode: auto
Same as generate but you synthesize the prompt yourself:
- Read the file the image will be referenced in (MDX/HTML/JSX/Markdown)
- Extract: nearest heading, first 1-2 paragraphs of surrounding section, alt text of neighboring images
- Pull brand context (palette, site name, tagline) per `references/project-detection.md`
- Detect intent from target filename/path
- Synthesize a prompt that includes: subject + scene + style + palette + negative guards
- Show the synthesized prompt to the user as a one-line preview before spawning, giving them a chance to redirect. If they say "go", proceed.
- Spawn → postprocess → report (same as `generate`)
Mode: replace
Regenerate an existing asset while preserving its role.
- Read original file with `file` / `sips` → capture exact dimensions, format, transparent-bg or not
- Look for sidecar `<path>.meta.json`; if present, use its `prompt` field as base
- If no sidecar: ask the user "describe the new version, or say 'same but X'"; do not silently re-prompt blind
- `grep -r "<basename>" .` to find all references; flag any if the path changes (it shouldn't; same path preserves the layout)
- Backup: `cp <path> <path>.bak`
- Spawn codex with the new prompt; force exact original dimensions
- Postprocess, update sidecar `.meta.json` with new prompt + parent prompt diff
- Report: old vs new thumbnail paths, references that consume the image
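The backup and reference-scan steps can be combined in one helper, sketched here under the assumption that a fixed-string match on the basename is enough to find consumers. The name `prepare_replace` is hypothetical.

```shell
# prepare_replace IMAGE_PATH [SEARCH_ROOT]
# Backs up the image, then lists files that mention its basename.
prepare_replace() {
  img=$1; root=${2:-.}
  [ -f "$img" ] || { echo "no such image: $img" >&2; return 1; }
  cp "$img" "$img.bak"
  # -r: recurse, -l: file names only, -F: fixed string (no regex surprises).
  grep -rlF -- "$(basename "$img")" "$root" 2>/dev/null || true
}
```

In a real repository the search root would usually exclude `node_modules` and build output; that filtering is left out of this sketch.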
Mode: edit
Partial edit of an existing image.
- Read original, determine if mask is needed (user said "the X" → mask region; user said "everything but X" → invert)
- If mask needed, either:
- Accept user-supplied mask path
- Or generate one with simple geometry (rectangle by user-described region)
- Spawn codex with edit-mode instructions, original as `input_image`, optional `mask`
- Honor original aspect ratio exactly
- Postprocess, update sidecar with edit history (append, don't overwrite)
- Report
Mode: auto-pack — whole-site or section asset pack
The most powerful mode. Generates a stylistically consistent set of images for a page or site.
- Scan: run `scripts/scan-image-slots.sh <root>` → JSON list of every image slot in the codebase
  - Includes: `<img>` / `<Image>` / `background-image:` / `next/image` / OG metadata / favicon / PWA manifest icons / empty referenced paths
- Style anchor: derive once for the whole pack
  - Brand palette from `tailwind.config` / CSS vars
  - Style descriptor (flat illustration / photoreal / 3D / editorial), inferred from existing brand asset, or ask once
  - One-sentence lighting + mood lock
  - See `references/style-anchors.md`
- Manifest: render `templates/pack-manifest.yaml` with one entry per slot
  - Each entry: path, intent, size, synthesized prompt, est. cost
  - Show the manifest table to user, get one approval for the whole batch
- Lock style via reference image:
  - Generate the hero / largest asset first (single codex spawn)
  - Pass that hero as `input_image` to every subsequent generation via codex's edit API
  - This is the trick that keeps a 10-image pack visually consistent
- Fan out:
  - Issue parallel `Bash` calls with `run_in_background: true` in a single message, max 3-4 concurrent
  - One spawn per remaining manifest entry
  - Track task IDs, wait for all notifications
- Postprocess each as it lands: compress, WebP sibling, retina @2x, sidecar `.meta.json`
- Report: thumbnail grid (paths), total cost, file tree, any failed slots
Cost guardrail: print "N images, ~$X estimated, proceed?" for any batch of more than 3 before spawning.
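The guardrail itself is a one-line threshold check. In this sketch the per-image price is passed in by the caller and is purely a placeholder, not a real rate; the function name `cost_gate` is hypothetical.

```shell
# cost_gate COUNT PER_IMAGE_USD
# Prints the approval question only for batches of more than 3 images.
cost_gate() {
  count=$1; per_image=$2
  if [ "$count" -gt 3 ]; then
    awk -v n="$count" -v p="$per_image" \
      'BEGIN { printf "%d images, ~$%.2f estimated, proceed?\n", n, n * p }'
  fi
}
```

Empty output means no approval question is needed for that batch size.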
Parallel spawn pattern (auto-pack & multi-aspect-ratio)
Issue all Bash calls in one message so they run concurrently:
[message with multiple tool calls]
Bash(run_in_background=true, command="codex exec ... > /tmp/.../1.log")
Bash(run_in_background=true, command="codex exec ... > /tmp/.../2.log")
Bash(run_in_background=true, command="codex exec ... > /tmp/.../3.log")
Each writes to a unique log path. The harness sends one notification per completion. Aggregate when all are done.
Cap concurrency at 3-4 to avoid OpenAI rate limits and codex resource pressure. For packs >4 slots, generate the hero first (for style anchor), then process remaining in waves of 3.
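Splitting the remaining slots into waves can be sketched as below. `plan_waves` is a hypothetical helper that only plans the grouping; the spawns themselves are still issued as background `Bash` calls. It assumes slot names contain no spaces.

```shell
# plan_waves SIZE ITEM...: print one wave per line, SIZE items per wave.
plan_waves() {
  size=$1; shift
  line=""; n=0
  for item in "$@"; do
    line="${line:+$line }$item"
    n=$((n + 1))
    if [ "$n" -eq "$size" ]; then
      printf '%s\n' "$line"
      line=""; n=0
    fi
  done
  # Flush a final, partially filled wave.
  if [ -n "$line" ]; then printf '%s\n' "$line"; fi
}
```

For a pack, the hero is generated before this is called, and `plan_waves 3` is applied to the remaining slots only.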
Variants of the same concept
If user wants 4 variants of one image (a/b/c/d hero options), don't fan out 4 spawns — that's wasteful. Tell codex n=4 in the prompt; one API call returns 4 images. Save as <name>-v1.png … <name>-v4.png.
Responsive variants (multi-viewport)
For hero, section background, and any image rendered at multiple viewport widths, generate a set per breakpoint, not a single asset scaled with CSS. The image API can't change aspect ratio in a single call, so each viewport needs its own spawn.
Default responsive set (apply automatically when intent ∈ {hero, section-bg, banner, og-twitter}):
- `mobile` → 768×1024 (3:4 portrait)
- `tablet` → 1280×960 (4:3)
- `desktop` → 1920×1080 (16:9)
Naming convention: <name>-mobile.png, <name>-tablet.png, <name>-desktop.png (plus @2x retina each via postprocess).
Spawn strategy:
- Generate the largest variant first (desktop) — this is the style anchor
- Pass desktop as `input_image` to mobile + tablet spawns so subject, color, and lighting stay consistent across breakpoints
- Total spawns for a single responsive asset: 3 (sequential 1 → parallel 2)
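As a rough sketch, the spawn plan for one responsive asset can be written out as three lines, desktop first. The output format and the function name `responsive_plan` are assumptions made for illustration; the sizes and file names follow the defaults above.

```shell
# responsive_plan NAME: print "<file> <size> <role>" for each spawn, in order.
responsive_plan() {
  name=$1
  # Desktop goes first because it is the style anchor for the other two.
  printf '%s\n' \
    "${name}-desktop.png 1920x1080 anchor" \
    "${name}-mobile.png 768x1024 uses:${name}-desktop.png" \
    "${name}-tablet.png 1280x960 uses:${name}-desktop.png"
}
```

The second and third lines have no dependency on each other, which is why those two spawns can run in parallel.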
Text-safe zone awareness (critical for section-bg with text overlay):
If the image will sit behind copy, the generated composition must keep the text area visually quiet — no busy detail in the headline zone, soft color falloff toward the text, or solid negative space in the corner where copy lives.
Detect text overlay by:
- File path hints: `hero-bg`, `section-bg`, `banner-bg`
- User explicitly says "with text on top" or "headline goes here"
- Reading the JSX/MDX that consumes the image: if there's a `<h1>` / `<h2>` rendered over it, treat as text overlay
When text overlay detected, append to the prompt:
- The safe zone location for each viewport (mobile center, tablet center-left, desktop left-third are common)
- "Generous negative space / soft falloff in the safe zone, no busy detail there, low contrast in that region"
- For dark text → "luminous / bright in safe zone"; for light text → "deep tonal / shadowed in safe zone"
Different viewports usually need different safe zones:
- Mobile (3:4): centered text → safe zone is horizontal band in middle 60%
- Tablet (4:3): off-center → safe zone left or right half
- Desktop (16:9): often left-third for headline, right-two-thirds for visual focal
So the three responsive spawns aren't crops of the same image — they're three separately composed images sharing palette/subject/lighting via the input-image style anchor.
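The per-viewport prompt suffix could be assembled as in the sketch below. The phrases are taken from the guidance above; picking the left half for tablet (the text allows left or right) and the function name `safe_zone_clause` are assumptions.

```shell
# safe_zone_clause VIEWPORT TEXT_TONE   (TEXT_TONE: dark | light)
safe_zone_clause() {
  case "$1" in
    mobile) zone="horizontal band across the middle 60%" ;;
    tablet) zone="left half" ;;   # assumption: could equally be the right half
    desktop) zone="left third" ;;
    *) echo "unknown viewport: $1" >&2; return 1 ;;
  esac
  case "$2" in
    dark) tone="luminous / bright in safe zone" ;;
    light) tone="deep tonal / shadowed in safe zone" ;;
    *) echo "unknown text tone: $2" >&2; return 1 ;;
  esac
  printf 'Safe zone: %s. Generous negative space / soft falloff in the safe zone, no busy detail there, low contrast in that region; %s.\n' \
    "$zone" "$tone"
}
```

The returned sentence is appended to the synthesized prompt for that viewport's spawn.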
Output: also emit a <picture> snippet for the user to copy:
<picture>
<source media="(min-width: 1024px)" srcset="hero-desktop.png 1x, hero-desktop@2x.png 2x" />
<source media="(min-width: 640px)" srcset="hero-tablet.png 1x, hero-tablet@2x.png 2x" />
<img src="hero-mobile.png" srcset="hero-mobile@2x.png 2x" alt="..." />
</picture>
Save the snippet next to the assets as `<name>.picture.html` for easy paste.
Postprocess every output
After each codex completion, run:
`scripts/postprocess.sh <output.png>`
Which:
- Strips EXIF
- Emits `<output>.webp` sibling
- For assets <2000px wide: emits `<output>@2x.png` retina variant
- Writes `<output>.meta.json` sidecar with prompt, model, dimensions, size, timestamp
- Optionally runs `pngquant` if installed
Cost defaults baked into codex prompts
- `quality: medium` for iteration, `high` only if the filename ends in `-final` or user explicitly says "final"
- `n: 1` unless prompt requests variants
- Refuse real-person likeness and trademarked characters by default; let codex's imagegen skill enforce this
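The quality default reduces to a filename check plus a flag. This sketch assumes "ends in `-final`" refers to the name without its extension; `quality_for` is a hypothetical helper.

```shell
# quality_for FILENAME [USER_SAID_FINAL]   (second arg: yes | no, default no)
quality_for() {
  stem=$(basename "$1")
  stem=${stem%.*}          # drop the extension before testing the suffix
  case "$stem" in
    *-final) echo high; return 0 ;;
  esac
  if [ "${2:-no}" = "yes" ]; then echo high; else echo medium; fi
}
```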
Safety
If the user prompt names a real person, a trademarked character, or copyrighted brand mark, refuse and propose a generic alternative. Codex's imagegen skill enforces this too — we're double-checking at the orchestration layer.
Sidecar metadata
Every generated image gets a <path>.meta.json sidecar. Schema in templates/sidecar-meta.json. Why: when the user (or future-Claude) does replace mode, the sidecar holds the original prompt so we don't blind-guess.
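A sidecar writer might look like the sketch below. The field names mirror the list given for `scripts/postprocess.sh` (prompt, model, dimensions, size, timestamp), but the exact schema in `templates/sidecar-meta.json` is not shown here, so treat the layout as an assumption. `write_sidecar` is a hypothetical name, and the escaping handles only backslashes and double quotes, not newlines.

```shell
# write_sidecar IMAGE PROMPT MODEL WIDTH HEIGHT
# Writes IMAGE.meta.json next to the image.
write_sidecar() {
  img=$1
  # Escape backslashes first, then double quotes, for a valid JSON string.
  esc=$(printf '%s' "$2" | sed 's/\\/\\\\/g; s/"/\\"/g')
  bytes=$(wc -c < "$img" | tr -d ' ')
  printf '{\n  "prompt": "%s",\n  "model": "%s",\n  "dimensions": "%sx%s",\n  "size": %s,\n  "timestamp": "%s"\n}\n' \
    "$esc" "$3" "$4" "$5" "$bytes" "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    > "$img.meta.json"
}
```

Storing the prompt verbatim is what lets `replace` mode start from the original wording instead of guessing.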
Reporting
After every run (single or pack), print a compact summary:
Generated: 5 images, 3.2 MB total, ~$0.40 est.
hero.png 1920×1080 480 KB public/images/
og.png 1200×630 210 KB public/images/
feat-budgets.png 1024×1024 380 KB public/images/features/
feat-goals.png 1024×1024 360 KB public/images/features/
feat-cash.png 1024×1024 390 KB public/images/features/
Style anchor: flat geometric illustration, sage/sand/clay palette
Sidecars: 5 .meta.json files written
If the user is in a browser-visible context, also open the first generated image.
Reference map
- `references/intent-presets.md`: intent → size/format/quality/naming presets table
- `references/project-detection.md`: framework → output path; palette extraction from CSS/Tailwind
- `references/negative-prompts.md`: content-type-specific quality guardrails
- `references/style-anchors.md`: style descriptor library for `auto-pack` mode
- `references/asset-pack-scan.md`: what slots to find and how, per framework
- `references/clarifying-questions.md`: when to ask vs default; 20 use cases mapping vague requests to right questions
- `references/prompting.md`: prompting principles (kept from legacy)
- `references/sample-prompts.md`: copy/paste prompt recipes by taxonomy (kept from legacy)
- `scripts/scan-image-slots.sh`: scans codebase for image slots, prints JSON
- `scripts/postprocess.sh`: compress + WebP + retina + sidecar
- `templates/pack-manifest.yaml`: `auto-pack` manifest skeleton
- `templates/sidecar-meta.json`: per-image metadata schema
- `legacy/`: pre-codex direct-CLI implementation (preserved for fallback)