webhouse.app docs

Agents can generate images via Google Gemini 3 Pro Image. Output goes through the same media pipeline as user uploads — variants, EXIF, AI alt-text — and every image is tagged as AI-generated.

What is the generate_image tool?

When you enable Image generation on an agent, the agent gains a new tool called generate_image. The tool calls Google's Gemini 3 Pro Image model (commonly known as Nano Banana 2) and produces a real image from a text prompt. The image is saved to the site's media library, optimized, analyzed, and stamped with provenance metadata — all in one tool call.

The agent decides on its own when to call the tool. A typical run looks like:

  1. The agent reads its prompt ("Write a short post about mountain sunrises").
  2. The agent calls generate_image with a descriptive prompt of its own ("A serene sunrise over snow-capped mountains, soft golden hour light, photorealistic").
  3. Gemini returns the image bytes.
  4. The image is saved, processed, analyzed for alt-text, and the tool returns a Markdown image tag pointing at the saved file.
  5. The agent embeds that tag at the top of the article body.
  6. The final post lands in the curation queue with the image already in place.
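The run above can be sketched in TypeScript. Everything here is illustrative — callGemini, saveAndProcess, and the handler shape are hypothetical stand-ins for webhouse's actual internals, not its real API:

```typescript
// Stubs standing in for the real model call and media pipeline (hypothetical).
async function callGemini(prompt: string): Promise<Uint8Array> {
  return new Uint8Array([0x89, 0x50, 0x4e, 0x47]); // pretend image bytes
}

async function saveAndProcess(bytes: Uint8Array, prompt: string) {
  // Real pipeline: save to uploads, build WebP variants, run alt-text analysis.
  return { url: "/uploads/sunrise.webp", altText: "A serene sunrise over mountains" };
}

// Hypothetical handler mirroring steps 2-5.
async function generateImageTool(prompt: string): Promise<string> {
  try {
    const bytes = await callGemini(prompt);            // step 3: model returns bytes
    const saved = await saveAndProcess(bytes, prompt); // step 4: save, process, alt-text
    return `![${saved.altText}](${saved.url})`;        // Markdown tag the agent embeds
  } catch (err) {
    // On failure, return an error sentinel instead of a placeholder image.
    return `Image generation failed: ${String(err)}`;
  }
}
```

The agent never sees raw bytes — only the final Markdown tag or an error string.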

Pipeline parity with uploads

The key design decision: a generated image goes through the exact same processing pipeline as a user-uploaded image. Specifically:

| Step | Same as upload? |
| --- | --- |
| Save bytes to public/uploads/ via the media adapter | ✓ |
| Generate WebP variants (Sharp, default 400 / 800 / 1200 / 1600 widths) | ✓ |
| Extract EXIF metadata | ✓ (will be empty on synthetic images) |
| Run F44 AI vision analysis to produce caption + alt-text + tags | ✓ |
| Append to media-meta.json | ✓ |

The only thing that differs is provenance. Generated images get four extra fields on their MediaMeta entry:

```json
{
  "generatedByAi": true,
  "generatedByModel": "gemini-3-pro-image-preview",
  "generatedAt": "2026-04-08T...",
  "generatedPrompt": "A serene sunrise over snow-capped mountains..."
}
```
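In TypeScript these four fields might be modeled as follows. The field names come from the JSON above; the interface name, the guard, and the rest of the MediaMeta entry shape are assumptions:

```typescript
// Provenance fields exactly as in the JSON above; the surrounding
// MediaMeta entry shape is not shown here (assumed).
interface AiProvenance {
  generatedByAi: true;
  generatedByModel: string; // e.g. "gemini-3-pro-image-preview"
  generatedAt: string;      // ISO 8601 timestamp
  generatedPrompt: string;  // the prompt the agent wrote
}

// Illustrative guard for filtering a media list down to generated images.
function isAiGenerated(entry: { generatedByAi?: boolean }): boolean {
  return entry.generatedByAi === true;
}
```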

Marking and filtering in the media library

Every AI-generated image gets a distinct purple AI badge on the media card (separate from the gold sparkles badge that marks AI-analyzed-but-uploaded images). The badge tooltip shows the original prompt for quick context.

The media list sidebar gets a new AI generated filter under the AI Analysis section. Click it to see only the images your agents have produced.

Both grid view and list view render the badge.

Cost

Nano Banana 2 (Gemini 3 Pro Image Preview) costs $0.039 per image. The cost is charged to the Cockpit budget via addCost() immediately after a successful generation. If you have per-agent cost guards enabled, the image cost counts toward those caps too.
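A minimal sketch of how the per-image price interacts with a per-agent cap — addCost's real signature and the Budget shape are assumptions, only the $0.039 figure is from the docs:

```typescript
// $0.039 per generated image (from the docs); everything else is illustrative.
const IMAGE_COST_USD = 0.039;

interface Budget {
  spentUsd: number;
  capUsd: number;
}

// Hypothetical cost guard: refuse the charge if it would exceed the cap.
function addCost(budget: Budget, usd: number): boolean {
  if (budget.spentUsd + usd > budget.capUsd) return false; // cap exceeded
  budget.spentUsd += usd;
  return true;
}

const agentBudget: Budget = { spentUsd: 0, capUsd: 1.0 };
addCost(agentBudget, IMAGE_COST_USD); // one generated image charged
```

At $0.039 each, 25 images cost $0.975 — just under a $1 per-agent cap; the 26th would push past it and be refused.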

Failure mode: no hallucinated placeholders

The tool description has a strict rule baked in: if generation fails, the agent must omit the image entirely. No placeholder URLs, no "image coming soon" text, no stock images.

When generate_image fails, the handler returns a string that begins with Image generation failed:. The agent's tool description tells it that any such string means "do NOT include any image in your final output. Continue writing the article without one." The error message itself reminds the agent of this rule inline.
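The sentinel check a caller could apply is trivial — the prefix string is from the docs, the helper name is not:

```typescript
// Failure sentinel prefix from the docs; helper name is illustrative.
const FAILURE_PREFIX = "Image generation failed:";

// Any tool result starting with the prefix means: include no image at all.
function imageGenerationSucceeded(toolResult: string): boolean {
  return !toolResult.startsWith(FAILURE_PREFIX);
}
```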

The curation Preview modal also defends against this — if a Markdown image URL doesn't start with http(s)://, /, or data:, it renders a small "⚠ Invalid image URL" warning chip instead of a broken <img>.
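The modal's guard reduces to a prefix check on the allowed schemes listed above (function name assumed):

```typescript
// Only these prefixes are rendered as <img> in the Preview modal;
// anything else gets the "Invalid image URL" warning chip instead.
function isRenderableImageUrl(url: string): boolean {
  return (
    url.startsWith("http://") ||
    url.startsWith("https://") ||
    url.startsWith("/") ||      // site-relative upload path
    url.startsWith("data:")     // inline data URL
  );
}
```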

Configuration

  1. Make sure the Gemini API key is set on the org or site (org-level aiGeminiApiKey is inherited via F87).
  2. Open the agent you want to enable, scroll to Tools, tick Image generation (Gemini Nano Banana), save.
  3. Run the agent with a prompt that benefits from an image.

Keys are resolved in order: ai-config.json → GOOGLE_GENERATIVE_AI_API_KEY env → GEMINI_API_KEY env. If none are set, buildToolRegistry returns null for the tool and it is silently skipped — the agent can still run, just without the image option.
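The fallback chain can be sketched like this — the resolution order is from the docs, while the function name and config shape are assumptions:

```typescript
// Documented order: ai-config.json -> GOOGLE_GENERATIVE_AI_API_KEY -> GEMINI_API_KEY.
// resolveGeminiKey and the aiConfig shape are illustrative, not webhouse's API.
function resolveGeminiKey(
  aiConfig: { aiGeminiApiKey?: string },
  env: Record<string, string | undefined>,
): string | null {
  return (
    aiConfig.aiGeminiApiKey ??
    env["GOOGLE_GENERATIVE_AI_API_KEY"] ??
    env["GEMINI_API_KEY"] ??
    null // no key: buildToolRegistry skips the tool, agent runs without it
  );
}
```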

Webhook and curation embed

When the agent finishes, the agent.completed webhook embed renders the generated image inline (Discord embed.image), provided the image URL is publicly reachable. Locally generated images can't be reached by Discord, so the embed falls back to a clickable link in the description until the document is approved and deployed.
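A rough sketch of that branching: here "publicly reachable" is approximated by "absolute http(s) URL", which is a simplification (a localhost URL would pass the check), and the function name and return shape are assumptions:

```typescript
// Hypothetical embed builder: absolute http(s) URLs go into embed.image,
// local/relative paths fall back to a link in the description.
function buildAgentCompletedEmbed(imageUrl: string) {
  const publiclyReachable =
    imageUrl.startsWith("https://") || imageUrl.startsWith("http://");
  if (publiclyReachable) {
    return { image: { url: imageUrl }, description: "" };
  }
  // Discord can't fetch a local path: link it until the doc is deployed.
  return { image: undefined, description: `Generated image (pending deploy): ${imageUrl}` };
}
```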

Tags: AI Agents, Media