wuyoscar

gpt-image

General-purpose image generation and editing via OpenAI GPT Image 2 (`gpt-image-2`). Wraps `/v1/images/generations` and `/v1/images/edits` with every documented parameter exposed as a CLI flag. Use whenever an agent or user needs to (a) generate an image from a text prompt, (b) edit or colorize an existing image, (c) combine multiple reference images, (d) inpaint with an alpha-channel mask, or (e) render dense typography / Chinese text. Reads `OPENAI_API_KEY` from env; writes PNG/JPEG/WebP to disk. Optional prompt-craft references under `references/` for photorealism, posters, infographics, and character sheets.

wuyoscar 2,613 241 Updated 1mo ago

Resources

6
GitHub

Install

npx skillscat add wuyoscar/gpt-image-2-skill

Install via the SkillsCat registry.

SKILL.md

gpt-image

General image generation/editing CLI for OpenAI's gpt-image-2. Designed for agents: all API parameters are first-class flags on scripts/generate.py, defaults are sane, output is a file on disk. The skill auto-loads when Claude detects an image-generation intent — no slash command needed.

One-line usage

uv run ~/.claude/skills/gpt-image/scripts/generate.py -p "PROMPT" [-f OUT] [-i REF...] [-m MASK] [options]

Reads OPENAI_API_KEY from env. Writes to OUT (or auto-named YYYY-MM-DD-HH-MM-SS-<slug>.png in ./fig/ or cwd). Prints output path(s) on stdout. Exit 0 on success, 1 on API error, 2 on bad args / missing key.

CLI flags (complete reference)

Flag Type / Values Default Applies to Description
-p, --prompt str — required both Text prompt for generation, or edit instruction.
-f, --file path auto both Output path. Auto-gen if omitted. Extension follows --format.
-i, --image path (repeatable) edits Reference image(s). Presence switches endpoint to /v1/images/edits.
-m, --mask path edits Alpha-channel PNG mask. Transparent = regenerated; opaque = preserved.
--model str gpt-image-2 both Model ID. Fallbacks: gpt-image-1.5, gpt-image-1, gpt-image-1-mini.
--size literal / shortcut 1024x1024 both Literals: 1024x1024, 1536x1024, 1024x1536, 2048x2048, 2048x1152, 3840x2160, 2160x3840, or any 16-px multiple up to 3840 max edge (3:1 ratio cap, 655k–8.3M total pixels). Shortcuts: 1k 2k 4k portrait landscape square wide tall.
--quality auto | low | medium | high high both Cost roughly 10× per step. low ≈ $0.005/img, medium ≈ $0.04, high ≈ $0.17. We default to high because gpt-image-2's typography and fine-detail rendering degrade noticeably below it. Override to low for quick iterations.
-n, --n int 1 both Number of images to return. >1 suffixes filenames _0, _1, …
--background auto | opaque API default generations only opaque disables transparent background.
--moderation auto | low API default generations only low relaxes content filter where policy allows.
--format png | jpeg | webp png both Response encoding.
--compression int 0–100 both JPEG/WebP compression. Ignored for PNG.
--user str both Optional end-user identifier for OpenAI abuse tracking.

Endpoint selection

Mode Trigger Endpoint
Generate from prompt no -i POST /v1/images/generations (JSON body)
Edit / reference-based -i one or more times POST /v1/images/edits (multipart form)
Inpaint with mask -i + -m POST /v1/images/edits with mask file

Canonical examples

# 1. Vanilla generate, 1K square, auto quality
uv run generate.py -p "a photorealistic convenience store at 10pm"

# 2. 2K portrait poster with exact Chinese text, high quality
uv run generate.py \
  -p 'Design a 3:4 tea poster. Exact copy: "山川茶事" / "冷泡系列" / "中杯 16 元"' \
  --size portrait --quality high -f poster.png

# 3. 4-image grid, transparent background disabled, webp
uv run generate.py -p "isometric furniture, minimalist" \
  -n 4 --background opaque --format webp --compression 85

# 4. Edit / colorize existing image
uv run generate.py -p "colorize this manga page and translate to Chinese" \
  -i page.jpg -f colored.png

# 5. Multi-reference brand collab
uv run generate.py -p "77 (the cat) × KFC employee poster" \
  -i cat.png -i kfc_logo.png -f collab.png --size portrait

# 6. Masked inpaint
uv run generate.py -p "replace sky with aurora, keep foreground intact" \
  -i photo.jpg -m sky_mask.png -f aurora.png --quality high

# 7. 4K widescreen render
uv run generate.py -p "cinematic Shanghai skyline at dusk" \
  --size 4k --quality high -f skyline.png

Response handling

  • API returns data: [{ b64_json: "…" }] by default; the script decodes base64 and writes bytes.
  • If the API returns url instead, the script GETs the URL and writes the downloaded bytes.
  • With -n > 1, files are suffixed: out.pngout_0.png, out_1.png, …

Error surface

Condition Exit stderr
OPENAI_API_KEY unset 2 error: OPENAI_API_KEY not set. ...
--mask without -i 2 error: --mask requires --image (edits endpoint only)
-i path missing 2 error: --image not found: PATH
OpenAI returns non-2xx 1 error: <status> from OpenAI: <body> (first 2000 chars of response)
Response has no image data 1 error: no image data in response: <json>

When an agent hits exit 1, it should surface the response body verbatim — it usually names the problem (rate limit, moderation block, invalid size).

Size picking guide

Intent Size
Default / social square 1024x1024 (1k)
Mobile screenshot, portrait poster, beauty/skincare 1024x1536 (portrait)
Landscape photo, gameplay screenshot 1536x1024 (landscape)
Hi-res print, paper figure 2048x2048 (2k)
Widescreen cinematic, dashboard hero 3840x2160 (4k)
Tall story banner, vertical video thumbnail 2160x3840 (tall)

Prompt-craft references (optional, load only when needed)

These are not required to use the script — they exist for prompt-quality uplift when the user's intent needs more structure than a one-liner.

  • references/craft.md — 12 cross-cutting principles: exact-text-in-quotes, aspect-ratio-first, camera/shot language, scene density, style anchoring, negation, reference-based unlocks, dense Chinese text, three-glances test.
  • references/gallery.md — 56 community-curated templates across 8 categories: photography, games, UI/UX, typography, infographics, character consistency, editing, collage. Each entry keeps its original Source: @handle attribution.

Load a reference file only when the user's request signals that category (e.g. asks for a poster → load typography section of gallery; asks about rendering Chinese → load craft.md sections 1, 7, 10).

Attribution

Prompt patterns curated from `ZeroLu/awesome-gpt-image` under CC BY 4.0. Individual @handle attributions preserved per-entry in references/gallery.md.

Categories