Gemini image generation and editing skill for text-to-image, image-to-image edits, multi-reference composition, and Google Search grounding. Use when creating or modifying images via Gemini (default model gemini-3-pro-image-preview) with the Python SDK.
Install
npx skillscat add xiangyu-cas/vision-skills/image-generation
Install via the SkillsCat registry.
Image Generation with Gemini
Use this skill when the user asks to generate or edit images with Gemini using the Python SDK. Default to gemini-3-pro-image-preview, and mention gemini-2.5-flash-image only as an optional faster/cheaper alternative.
Workflow
- Identify task type (text-to-image, edit, or multi-reference).
- Ensure GEMINI_API_KEY is available (env or stored in .env), then use the Python SDK. This will make network requests to the Gemini API.
- Choose model + output (response_modalities=["IMAGE"] if image-only) and run. Generation can take ~30 seconds; allow 30–60 seconds before retrying.
- Save returned images with part.as_image(); if none, report a clear error.
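The workflow above can be sketched roughly as follows for the text-to-image case. This is a minimal sketch, not the skill's own script: it assumes a recent google-genai package (one where the response exposes .parts and parts expose as_image()), and the helper names load_api_key and generate_image, the simple .env parsing, and the output.png default are illustrative assumptions.

```python
import os
from pathlib import Path


def load_api_key(env_file: str = ".env") -> str:
    """Return GEMINI_API_KEY from the environment, falling back to a .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        return key
    path = Path(env_file)
    if path.is_file():
        for line in path.read_text().splitlines():
            # Expect simple NAME=value lines; anything else is ignored.
            name, sep, value = line.strip().partition("=")
            if sep and name.strip() == "GEMINI_API_KEY":
                return value.strip().strip("'\"")
    raise RuntimeError("GEMINI_API_KEY not found in the environment or .env")


def generate_image(prompt: str, out_path: str = "output.png",
                   model: str = "gemini-3-pro-image-preview") -> str:
    """Call the model with an image-only response and save the first image."""
    # Imported lazily so load_api_key works even without the SDK installed.
    from google import genai
    from google.genai import types

    client = genai.Client(api_key=load_api_key())
    response = client.models.generate_content(
        model=model,
        contents=[prompt],
        config=types.GenerateContentConfig(response_modalities=["IMAGE"]),
    )
    for part in response.parts or []:
        if part.inline_data is not None:
            part.as_image().save(out_path)
            return out_path
    raise RuntimeError("No image parts returned")
```

Calling generate_image("a watercolor fox") makes a network request and may take around 30 seconds; passing model="gemini-2.5-flash-image" selects the faster, cheaper alternative.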
Use these references
- references/python.md for Python SDK usage
Response handling (Python SDK)
Use part.as_image() to access image outputs and save them. If no image parts are returned, surface a clear error and suggest checking the API key, model name, and response modalities.
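One way to handle the response is sketched below. The helper name save_images, the outputs directory, and the .png extension are assumptions for illustration; the only SDK behavior relied on is that image parts carry inline_data and expose as_image(), whose result has a save() method.

```python
from pathlib import Path


def save_images(parts, out_dir="outputs", stem="image"):
    """Save every image part; raise a descriptive error if there are none."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    saved = []
    for part in parts or []:
        # Text parts have no inline_data; skip them.
        if getattr(part, "inline_data", None) is None:
            continue
        path = Path(out_dir) / f"{stem}_{len(saved)}.png"
        part.as_image().save(str(path))
        saved.append(str(path))
    if not saved:
        raise RuntimeError(
            "No image parts returned. Check GEMINI_API_KEY, the model name, "
            "and that response_modalities includes \"IMAGE\"."
        )
    return saved
```

With a real response this would be called as save_images(response.parts), and the returned list gives the paths that were written.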
Timing note
Image generation may take around 30 seconds. When running commands via the shell tool, set a longer timeout (e.g., 60–120 seconds) to avoid premature timeouts.
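When the generation script is launched from Python rather than from a shell tool, the same idea can be approximated with a subprocess timeout. This is only an illustrative sketch: run_with_timeout is a hypothetical helper, and the 120-second default simply mirrors the upper end of the range suggested above.

```python
import subprocess


def run_with_timeout(cmd, timeout_s=120):
    """Run cmd and return its stdout, or None if it exceeds timeout_s seconds."""
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s, check=True
        )
    except subprocess.TimeoutExpired:
        # The child process is killed by subprocess.run before this is raised.
        return None
    return result.stdout
```

For example, run_with_timeout(["python", "generate.py"]) would wait up to two minutes, where generate.py stands for a hypothetical script that calls the Gemini API; a None result signals a timeout rather than a failed generation.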