gpt-image

General-purpose image generation and editing via OpenAI GPT Image 2 (`gpt-image-2`). Wraps `/v1/images/generations` and `/v1/images/edits` with every documented parameter exposed as a CLI flag. Use whenever an agent or user needs to (a) generate an image from a text prompt, (b) edit or colorize an existing image, (c) combine multiple reference images, (d) inpaint with an alpha-channel mask, or (e) render dense typography / Chinese text. Reads `OPENAI_API_KEY` from env; writes PNG/JPEG/WebP to disk. Optional prompt-craft references under `references/` for photorealism, posters, infographics, and character sheets.

wuyoscar 3,862 325 Updated 3mo ago

Resources

GitHub

Install

npx skillscat add wuyoscar/gpt-image-2-skill

Install via the SkillsCat registry.

SKILL.md

gpt-image

General image generation/editing CLI for OpenAI's gpt-image-2. Designed for agents: all API parameters are first-class flags on scripts/generate.py, defaults are sane, output is a file on disk. The skill auto-loads when Claude detects an image-generation intent — no slash command needed.

One-line usage

uv run ~/.claude/skills/gpt-image/scripts/generate.py -p "PROMPT" [-f OUT] [-i REF...] [-m MASK] [options]

Reads OPENAI_API_KEY from env. Writes to OUT (or auto-named YYYY-MM-DD-HH-MM-SS-<slug>.png in ./fig/ or cwd). Prints output path(s) on stdout. Exit 0 on success, 1 on API error, 2 on bad args / missing key.

CLI flags (complete reference)

Flag	Type / Values	Default	Applies to	Description
`-p, --prompt`	str	— required	both	Text prompt for generation, or edit instruction.
`-f, --file`	path	auto	both	Output path. Auto-gen if omitted. Extension follows `--format`.
`-i, --image`	path (repeatable)	—	edits	Reference image(s). Presence switches endpoint to `/v1/images/edits`.
`-m, --mask`	path	—	edits	Alpha-channel PNG mask. Transparent = regenerated; opaque = preserved.
`--model`	str	`gpt-image-2`	both	Model ID. Fallbacks: `gpt-image-1.5`, `gpt-image-1`, `gpt-image-1-mini`.
`--size`	literal / shortcut	`1024x1024`	both	Literals: `1024x1024`, `1536x1024`, `1024x1536`, `2048x2048`, `2048x1152`, `3840x2160`, `2160x3840`, or any 16-px multiple up to 3840 max edge (3:1 ratio cap, 655k–8.3M total pixels). Shortcuts: `1k` `2k` `4k` `portrait` `landscape` `square` `wide` `tall`.
`--quality`	`auto` \| `low` \| `medium` \| `high`	`high`	both	Cost roughly 10× per step. `low` ≈ $0.005/img, `medium` ≈ $0.04, `high` ≈ $0.17. We default to `high` because gpt-image-2's typography and fine-detail rendering degrade noticeably below it. Override to `low` for quick iterations.
`-n, --n`	int	`1`	both	Number of images to return. `>1` suffixes filenames `_0`, `_1`, …
`--background`	`auto` \| `opaque`	API default	generations only	`opaque` disables transparent background.
`--moderation`	`auto` \| `low`	API default	generations only	`low` relaxes content filter where policy allows.
`--format`	`png` \| `jpeg` \| `webp`	`png`	both	Response encoding.
`--compression`	int 0–100	—	both	JPEG/WebP compression. Ignored for PNG.
`--user`	str	—	both	Optional end-user identifier for OpenAI abuse tracking.

Endpoint selection

Mode	Trigger	Endpoint
Generate from prompt	no `-i`	`POST /v1/images/generations` (JSON body)
Edit / reference-based	`-i` one or more times	`POST /v1/images/edits` (multipart form)
Inpaint with mask	`-i` + `-m`	`POST /v1/images/edits` with `mask` file

Canonical examples

# 1. Vanilla generate, 1K square, auto quality
uv run generate.py -p "a photorealistic convenience store at 10pm"

# 2. 2K portrait poster with exact Chinese text, high quality
uv run generate.py \
  -p 'Design a 3:4 tea poster. Exact copy: "山川茶事" / "冷泡系列" / "中杯 16 元"' \
  --size portrait --quality high -f poster.png

# 3. 4-image grid, transparent background disabled, webp
uv run generate.py -p "isometric furniture, minimalist" \
  -n 4 --background opaque --format webp --compression 85

# 4. Edit / colorize existing image
uv run generate.py -p "colorize this manga page and translate to Chinese" \
  -i page.jpg -f colored.png

# 5. Multi-reference brand collab
uv run generate.py -p "77 (the cat) × KFC employee poster" \
  -i cat.png -i kfc_logo.png -f collab.png --size portrait

# 6. Masked inpaint
uv run generate.py -p "replace sky with aurora, keep foreground intact" \
  -i photo.jpg -m sky_mask.png -f aurora.png --quality high

# 7. 4K widescreen render
uv run generate.py -p "cinematic Shanghai skyline at dusk" \
  --size 4k --quality high -f skyline.png

Response handling

API returns data: [{ b64_json: "…" }] by default; the script decodes base64 and writes bytes.
If the API returns url instead, the script GETs the URL and writes the downloaded bytes.
With -n > 1, files are suffixed: out.png → out_0.png, out_1.png, …

Error surface

Condition	Exit	stderr
`OPENAI_API_KEY` unset	2	`error: OPENAI_API_KEY not set. ...`
`--mask` without `-i`	2	`error: --mask requires --image (edits endpoint only)`
`-i` path missing	2	`error: --image not found: PATH`
OpenAI returns non-2xx	1	`error: <status> from OpenAI: <body>` (first 2000 chars of response)
Response has no image data	1	`error: no image data in response: <json>`

When an agent hits exit 1, it should surface the response body verbatim — it usually names the problem (rate limit, moderation block, invalid size).

Size picking guide

Intent	Size
Default / social square	`1024x1024` (1k)
Mobile screenshot, portrait poster, beauty/skincare	`1024x1536` (portrait)
Landscape photo, gameplay screenshot	`1536x1024` (landscape)
Hi-res print, paper figure	`2048x2048` (2k)
Widescreen cinematic, dashboard hero	`3840x2160` (4k)
Tall story banner, vertical video thumbnail	`2160x3840` (tall)

Prompt-craft references (optional, load only when needed)

These are not required to use the script — they exist for prompt-quality uplift when the user's intent needs more structure than a one-liner.

references/craft.md — 12 cross-cutting principles: exact-text-in-quotes, aspect-ratio-first, camera/shot language, scene density, style anchoring, negation, reference-based unlocks, dense Chinese text, three-glances test.
references/gallery.md — 56 community-curated templates across 8 categories: photography, games, UI/UX, typography, infographics, character consistency, editing, collage. Each entry keeps its original Source: @handle attribution.

Load a reference file only when the user's request signals that category (e.g. asks for a poster → load typography section of gallery; asks about rendering Chinese → load craft.md sections 1, 7, 10).

Attribution

Prompt patterns curated from `ZeroLu/awesome-gpt-image` under CC BY 4.0. Individual @handle attributions preserved per-entry in references/gallery.md.

gpt-image

Resources

Install

gpt-image

One-line usage

CLI flags (complete reference)

Endpoint selection

Canonical examples

Response handling

Error surface

Size picking guide

Prompt-craft references (optional, load only when needed)

Attribution

Categories

Install

Recommended Skills