Upload a JPG or PNG, describe the caption you want in plain English, and Pixazo's GPT Image 2 renders the text directly onto your photo — at the spot, in the style, and in the language you described. No drag handles, no font picker, no sign-up to preview.
Pixazo's Add Text to Photo runs on GPT Image 2 — OpenAI's multimodal image model, served through Pixazo's gateway. Upload a JPG or PNG, describe what text you want and where it should go in plain English, and the model reads both the photo and your prompt together to render the caption. No drag handles, no font dropdown, no layer stack — the prompt is the interface.
Where it shines: fast caption generation from natural-language briefs, social tiles in batches (1–4 variations per run), multilingual captions (Arabic, Hebrew, Urdu, Persian, CJK all render natively with correct direction and ligatures), and editorial typography that respects the photo's focal subjects when you tell it to. Where it stops: pixel-precise drag-nudge UIs, exact .woff brand-font loading, knockout / cut-through effects that interact with photo geometry — those still need Photoshop or Figma.
Free tier: 1024×1024 output, 1 variation, Pixazo watermark. Plus / Pro unlock 2048×1152 high-quality output, up to 4 variations per request, PNG / WebP with a transparent-background option, and a commercial-use license.
You write it in your prompt — in plain English. GPT Image 2 reads both your photo and your instructions, then renders the text at the spot, in the size, and in the style you described. No grids, no drag handles, no coordinate inputs.
The right column is a cheat-sheet of prompt fragments that reliably land in the spot you want. Mix and match: position + size + color + style + (optional) alignment.
→ If a result isn't quite right, edit the prompt ("move slightly higher", "smaller", "thinner serif") and re-run. Each re-prompt costs one credit.
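The mix-and-match pattern above (position + size + color + style + optional alignment) can be sketched as a small prompt builder. This is a minimal illustration, not Pixazo code: `CaptionSpec` and `build_prompt` are hypothetical names, and the sentence template is just one phrasing that follows the fragments described here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptionSpec:
    """Hypothetical container for one caption; not part of any Pixazo API."""
    text: str
    position: str                    # e.g. "top center"
    size: str = ""                   # e.g. "large"
    color: str = ""                  # e.g. "white"
    style: str = ""                  # e.g. "bold serif"
    alignment: Optional[str] = None  # e.g. "center-aligned"


def build_prompt(spec: CaptionSpec) -> str:
    """Join the fragments into one plain-English instruction."""
    # Size, color, and style read naturally as a single phrase.
    look = " ".join(part for part in (spec.size, spec.color, spec.style) if part)
    parts = [f'Add the text "{spec.text}" at {spec.position}']
    if look:
        parts.append(f"in {look}")
    if spec.alignment:
        parts.append(spec.alignment)
    return ", ".join(parts) + "."


headline = CaptionSpec(
    text="CLIFFSIDE MORNINGS",
    position="top center",
    size="large",
    color="white",
    style="bold serif",
)
print(build_prompt(headline))
# Add the text "CLIFFSIDE MORNINGS" at top center, in large white bold serif.
```

Keeping each fragment in its own field makes a re-prompt a one-field change (for example, swapping `size="large"` for `size="small"`) rather than a rewrite of the whole sentence.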
Four sample directions — pick a card to see the prompt behind it. Each was generated in under 10 seconds at the source resolution.
Prompt: "Add CLIFFSIDE MORNINGS in bold white serif at top center, a small italic sub-line beneath."
Prompt: "Centered serif quote with attribution beneath, in cream over a low-contrast still-life background."
Prompt: "Top kicker 'NEW ARRIVAL', a huge 60% in lime green centered, designed for an Instagram square."
Prompt: "Full masthead at top — title VOGUE-style, issue line, cover-story label, edition number bottom-right."
Drag a JPG or PNG into the upload box, or browse to pick one. Files can be up to 5 MB on the free tier and 25 MB on paid tiers. Your upload becomes the image input for GPT Image 2.
Describe the text, the placement, and the style in one sentence. e.g., "Add the headline CLIFFSIDE MORNINGS at top center in bold white serif with a subtle drop shadow."
Pick output size (1024×1024 up to 2048×1152), quality (standard or high), format (PNG or WebP), and how many variations to generate (1 to 4).
Click Generate. GPT Image 2 renders your captioned photo in 6–12 seconds. Pick the variation you like and download as PNG or WebP. If a result is off, edit the prompt and re-run.
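The limits quoted in the steps above can be checked locally before uploading. The sketch below is a hypothetical pre-flight helper, not part of Pixazo: the name `check_request`, the `"free"` / `"paid"` tier labels, and the choice of 1 MB = 1024 × 1024 bytes are all assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# Limits taken from the steps above; the dictionary layout is an assumption.
MAX_UPLOAD_MB = {"free": 5, "paid": 25}
MAX_VARIATIONS = {"free": 1, "paid": 4}
INPUT_EXTENSIONS = {".jpg", ".jpeg", ".png"}
OUTPUT_FORMATS = {"png", "webp"}
QUALITIES = {"standard", "high"}


def check_request(
    tier: str,
    file_name: str,
    size_bytes: int,
    variations: int = 1,
    output_format: str = "png",
    quality: str = "standard",
) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    if tier not in MAX_UPLOAD_MB:
        return [f"unknown tier: {tier!r}"]
    problems = []
    if Path(file_name).suffix.lower() not in INPUT_EXTENSIONS:
        problems.append("upload must be a JPG or PNG")
    # Assumes 1 MB = 1024 * 1024 bytes; the page does not say which it uses.
    if size_bytes > MAX_UPLOAD_MB[tier] * 1024 * 1024:
        problems.append(f"file exceeds the {MAX_UPLOAD_MB[tier]} MB limit")
    if not 1 <= variations <= MAX_VARIATIONS[tier]:
        problems.append(f"variations must be 1 to {MAX_VARIATIONS[tier]}")
    if output_format.lower() not in OUTPUT_FORMATS:
        problems.append("output format must be PNG or WebP")
    if quality not in QUALITIES:
        problems.append("quality must be 'standard' or 'high'")
    return problems


print(check_request("free", "beach.jpg", 3 * 1024 * 1024))
# []
```

Returning every problem at once, instead of raising on the first, lets a caller fix the file, the variation count, and the format in a single pass.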
Four cases where Pixazo's text-on-photo placement falls short. We list them up-front so you can pick the right tool for the job before you upload.
Placement is prompt-driven — you describe where the text should go, you don't drag it to a pixel coordinate. If the first result lands 50px off, edit the prompt ("a bit higher", "closer to the edge") and re-run. Each re-prompt is one credit.
GPT Image 2 draws each glyph from its own learned typography; it cannot load your custom .woff file. Describe the style in the prompt ("Bodoni-like high-contrast serif", "Helvetica-like neo-grotesque sans"), or pass a reference image via the Pixazo API for closer 1:1 matching.
This is a static-image tool: the output is a single PNG or WebP, not a video / GIF / Lottie. For animated captions (lyric videos, kinetic typography, scrolling subtitles), use Pixazo's AI Video Editor or Lyric Video Generator.
GPT Image 2 places type on top of the photo. It doesn't wrap text behind a subject's outline, do knockout cut-throughs, or render 3D-extruded letterforms with consistent lighting. For those, finish in Photoshop, Figma, or Blender after generation.
Seven questions readers actually ask.
You decide — in plain English. Tell the prompt where you want the caption ("top center", "bottom-right corner", "across the sky", "below the subject") and GPT Image 2 places it there. If the first result isn't quite right, edit your prompt ("move slightly higher", "smaller text") and re-generate. There is no drag-handle UI — placement is prompt-driven.
There is no hard character cap, but shorter captions render sharpest. For a one-line headline (under 40 characters) the output is crisp at 2K resolution. For longer blocks (a paragraph, lyric, or quote) describe a smaller font in the prompt — e.g., "add the following block in a small caption font at bottom, max 8 lines" — and the AI will lay it out.
Describe the look in your prompt, e.g., "Bodoni-like high-contrast serif", "Helvetica-style neo-grotesque sans", "Playfair Display-style serif". GPT Image 2 doesn't load .woff files; it draws each glyph from its learned typography. For closer brand-font matching, use Pixazo's API with a reference image; the model imitates the reference letterforms rather than loading the font itself.
Only if you tell it to. The model reads your photo and your prompt together — if you say "above the subject" or "in the sky, avoiding faces", it honors that. If you don't specify, GPT Image 2 picks a reasonable default (usually upper or lower band, away from clear focal subjects). For sensitive shots, always specify placement in the prompt.
Yes — describe every block in a single prompt. Example: "Add 'CLIFFSIDE MORNINGS' centered at top in bold white serif, and add '— a quiet hour before the tide' in italic gray beneath it." The model handles up to ~3–4 distinct text blocks per generation reliably; for more, generate in passes.
Yes. GPT Image 2 handles Arabic, Hebrew, Urdu, and Persian natively — direction, kerning, and ligatures all render correctly. Mixed Arabic + English (bidi) lines also work. Just write the text in your prompt in the target language; no special flag needed.
On the free tier, your source image is held for 24 hours and then auto-deleted (it is used only to render the result and let you re-download). On Plus / Pro, you control retention via account settings (24h / 30d / never). Pixazo does not use your photos for model training. See the Privacy Policy for the full data-handling section.
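For the multi-block answer above (roughly 3–4 text blocks per generation, more in passes), one way to plan the passes is sketched below. `plan_passes` is a hypothetical helper, the cap of 3 is a conservative reading of "~3–4", and the closing sentence added to later passes assumes you feed each pass's output back in as the next upload; none of this is confirmed Pixazo behavior.

```python
from typing import List

MAX_BLOCKS_PER_PASS = 3  # conservative reading of "~3–4 blocks per generation"


def plan_passes(blocks: List[str], per_pass: int = MAX_BLOCKS_PER_PASS) -> List[str]:
    """Group per-block instructions into one prompt per generation pass."""
    if per_pass < 1:
        raise ValueError("per_pass must be at least 1")
    prompts = []
    for start in range(0, len(blocks), per_pass):
        chunk = blocks[start:start + per_pass]
        prompt = "Add " + ", and add ".join(chunk) + "."
        if start > 0:
            # Assumption: later passes run on the previous pass's output,
            # so the prompt asks the model to leave earlier text alone.
            prompt += " Keep all existing text on the image unchanged."
        prompts.append(prompt)
    return prompts


blocks = [
    "'CLIFFSIDE MORNINGS' centered at top in bold white serif",
    "'a quiet hour before the tide' in italic gray beneath it",
    "'ISSUE 12' in small caps at bottom-left",
    "'PHOTO ESSAY' in small caps at bottom-right",
]
for number, prompt in enumerate(plan_passes(blocks), start=1):
    print(number, prompt)
```

With four blocks and a cap of three, this yields two prompts. Since each re-prompt costs one credit, the number of prompts returned is also a rough credit estimate for the job.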
Lead AI Design Researcher, Pixazo · 8+ years in generative image systems
Deepak leads typography & placement quality for Pixazo's image-editing tools and owns the prompt-engineering playbook for the GPT Image 2 playground — the recommended prompt patterns, the fallback prompts when the model misreads a brief, and the QA harness that measures placement accuracy across 200+ photo / prompt pairs. Before Pixazo he led applied research for image-to-image diffusion at two AI startups.
Upload, write the prompt, generate. Typically 6–12 seconds from raw photo to a download-ready captioned image, powered by GPT Image 2.