Free · No watermark · 10s renders
Powered by GPT Image 2 · OpenAI multimodal

Add text to your photo: free, online, in 10 seconds.

Upload a JPG or PNG, describe the caption you want in plain English, and Pixazo's GPT Image 2 renders the text directly onto your photo — at the spot, in the style, and in the language you described. No drag handles, no font picker, no sign-up to preview.

4.6★ (2,104 ratings) · ~8 s avg. render · 2K max output · 4 variations / run
Sample image: AI-captioned travel photo with an editorial-style headline overlay · prompt: "editorial caption, top + bottom-right"
Wanderlust
Cliffside Mornings
— a quiet hour, before the tide

Pixazo's Add Text to Photo runs on GPT Image 2 — OpenAI's multimodal image model, served through Pixazo's gateway. Upload a JPG or PNG, describe what text you want and where it should go in plain English, and the model reads both the photo and your prompt together to render the caption. No drag handles, no font dropdown, no layer stack — the prompt is the interface.

Where it shines: fast caption generation from natural-language briefs, social tiles in batches (1–4 variations per run), multilingual captions (Arabic, Hebrew, Urdu, Persian, CJK all render natively with correct direction and ligatures), and editorial typography that respects the photo's focal subjects when you tell it to. Where it stops: pixel-precise drag-nudge UIs, exact .woff brand-font loading, knockout / cut-through effects that interact with photo geometry — those still need Photoshop or Figma.

Free tier: 1024×1024 output, 1 variation, Pixazo watermark. Plus / Pro unlock 2048×1152 high-quality output, up to 4 variations per request, PNG / WebP with a transparent-background option, and a commercial-use license.

Prompt-driven placement

How do you tell it where to put the text?

You write it in your prompt — in plain English. GPT Image 2 reads both your photo and your instructions, then renders the text at the spot, in the size, and in the style you described. No grids, no drag handles, no coordinate inputs.

Prompts that land reliably are built from a few short fragments. Mix and match: position + size + color + style + (optional) alignment.
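The mix-and-match idea (position + size + color + style + optional alignment) can be sketched as a small prompt builder. This is only an illustration: `CaptionSpec` and `build_prompt` are hypothetical names, not part of any Pixazo API, and the sentence template is one of many phrasings that could work.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CaptionSpec:
    """One caption, described with the fragment types named above (hypothetical helper)."""
    text: str
    position: str                    # e.g. "top center", "bottom-right corner"
    size: str = ""                   # e.g. "large", "small"
    color: str = ""                  # e.g. "white", "lime green"
    style: str = ""                  # e.g. "bold serif", "italic"
    alignment: Optional[str] = None  # optional, e.g. "left-aligned"


def build_prompt(spec: CaptionSpec) -> str:
    """Join the non-empty fragments into one plain-English instruction."""
    look = " ".join(part for part in (spec.size, spec.color, spec.style) if part)
    sentence = f'Add the text "{spec.text}" at {spec.position}'
    if look:
        sentence += f" in {look}"
    if spec.alignment:
        sentence += f", {spec.alignment}"
    return sentence + "."


headline = CaptionSpec(
    text="CLIFFSIDE MORNINGS",
    position="top center",
    size="large",
    color="white",
    style="bold serif",
)
print(build_prompt(headline))
```

Keeping each fragment in its own field makes a re-prompt a one-field change (for example, swapping `position` to "slightly higher, top center") rather than a rewrite of the whole sentence.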

→ If a result isn't quite right, edit the prompt ("move slightly higher", "smaller", "thinner serif") and re-run. Each re-prompt costs one credit.

The Squad · 4 sample directions

What kinds of captions can you make?

Four sample directions, each shown with the prompt behind it. Each was generated in under 10 seconds at the source resolution.

Editorial-style overlay on travel photo, sample 01
SAMPLE / 01

Travel headline

Prompt: "Add CLIFFSIDE MORNINGS in bold white serif at top center, a small italic sub-line beneath."

Quote-card overlay on still-life photo, sample 02
SAMPLE / 02

Quote card

Prompt: "Centered serif quote with attribution beneath, in cream over a low-contrast still-life background."

Product-promo overlay on lifestyle photo, sample 03
SAMPLE / 03

Product promo

Prompt: "Top kicker 'NEW ARRIVAL', a huge 60% in lime green centered, designed for an Instagram square."

Magazine-cover overlay with masthead text, sample 04
SAMPLE / 04

Magazine cover

Prompt: "Full masthead at top — title VOGUE-style, issue line, cover-story label, edition number bottom-right."

4 steps · ~45 sec total

How do you add text to a photo?

01

Upload

Drag a JPG or PNG into the upload box, or browse to pick. Up to 5 MB on the free tier, 25 MB on paid tiers. The image becomes the image input for GPT Image 2.

02

Write the prompt

Describe the text, the placement, and the style in one sentence, for example: "Add the headline CLIFFSIDE MORNINGS at top center in bold white serif with a subtle drop shadow."

03

Set outputs

Pick output size (1024×1024 up to 2048×1152), quality (standard or high), format (PNG or WebP), and how many variations to generate (1 to 4).

04

Generate

Click Generate. GPT Image 2 renders your captioned photo in 6–12 seconds. Pick the variation you like and download as PNG or WebP. If a result is off, edit the prompt and re-run.
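The limits quoted in the steps above (JPG or PNG input, 5 MB or 25 MB uploads, standard or high quality, PNG or WebP output, 1 to 4 variations) can be checked locally before spending a credit. The sketch below is a hypothetical pre-flight check, not Pixazo code: the tier keys, field names, and the reading of "MB" as 2^20 bytes are all assumptions.

```python
import os
from dataclasses import dataclass

MB = 1024 * 1024  # assumption: the page does not say whether "MB" means 10^6 or 2^20 bytes

# Numbers come from the page; the tier keys and field names are hypothetical.
TIER_LIMITS = {
    "free": {"max_upload_mb": 5, "max_variations": 1},
    "paid": {"max_upload_mb": 25, "max_variations": 4},
}
ALLOWED_UPLOAD_EXTENSIONS = {".jpg", ".jpeg", ".png"}
ALLOWED_QUALITIES = {"standard", "high"}
ALLOWED_FORMATS = {"png", "webp"}


@dataclass
class RunSettings:
    filename: str
    file_size_bytes: int
    quality: str = "standard"
    output_format: str = "png"
    variations: int = 1


def preflight(settings: RunSettings, tier: str = "free") -> list:
    """Return a list of problems; an empty list means the run looks valid."""
    limits = TIER_LIMITS[tier]
    problems = []
    extension = os.path.splitext(settings.filename)[1].lower()
    if extension not in ALLOWED_UPLOAD_EXTENSIONS:
        problems.append("upload must be a JPG or PNG")
    if settings.file_size_bytes > limits["max_upload_mb"] * MB:
        problems.append(f"upload is larger than {limits['max_upload_mb']} MB")
    if settings.quality not in ALLOWED_QUALITIES:
        problems.append("quality must be 'standard' or 'high'")
    if settings.output_format not in ALLOWED_FORMATS:
        problems.append("format must be 'png' or 'webp'")
    if not 1 <= settings.variations <= limits["max_variations"]:
        problems.append(f"variations must be between 1 and {limits['max_variations']}")
    return problems


run = RunSettings(filename="cliffs.png", file_size_bytes=6 * MB, variations=4)
print(preflight(run, tier="free"))  # two problems: upload size and variation count
print(preflight(run, tier="paid"))  # no problems
```

Output size is left out of the check on purpose: the page gives only the two endpoints (1024×1024 and 2048×1152), so the full list of selectable sizes is unknown here.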

Honest limits · What this tool won't do

Where the AI falls short

Four cases where Pixazo's text-on-photo placement falls short. We list them up-front so you can pick the right tool for the job before you upload.

  • No pixel-precise drag-handle UI

    Placement is prompt-driven — you describe where the text should go, you don't drag it to a pixel coordinate. If the first result lands 50px off, edit the prompt ("a bit higher", "closer to the edge") and re-run. Each re-prompt is one credit.

  • Exact .woff brand-font loading isn't supported here

    GPT Image 2 draws each glyph from its own learned typography; it cannot load your custom .woff file. Describe the style in the prompt ("Bodoni-like high-contrast serif", "Helvetica-like neo-grotesque sans"), or pass a reference image via the Pixazo API for a closer match.

  • No animated text or video output

    This is a static-image tool: the output is a single PNG or WebP, not a video / GIF / Lottie. For animated captions (lyric videos, kinetic typography, scrolling subtitles), use Pixazo's AI Video Editor or Lyric Video Generator.

  • Knockout, masking & 3D text effects are out of scope

    GPT Image 2 places type on top of the photo. It doesn't wrap text behind a subject's outline, do knockout cut-throughs, or render 3D-extruded letterforms with consistent lighting. For those, finish in Photoshop, Figma, or Blender after generation.

Q & A

Frequently asked questions

Seven questions readers actually ask.

Q · 01 How does Pixazo decide where to put the text?

You decide — in plain English. Tell the prompt where you want the caption ("top center", "bottom-right corner", "across the sky", "below the subject") and GPT Image 2 places it there. If the first result isn't quite right, edit your prompt ("move slightly higher", "smaller text") and re-generate. There is no drag-handle UI — placement is prompt-driven.

Q · 02 How long can the text be?

There is no hard character cap, but shorter captions render sharpest. For a one-line headline (under 40 characters) the output is crisp at 2K resolution. For longer blocks (a paragraph, lyric, or quote) describe a smaller font in the prompt — e.g., "add the following block in a small caption font at bottom, max 8 lines" — and the AI will lay it out.
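The length guidance above can be turned into a small rule of thumb. This is a minimal sketch under stated assumptions: `caption_instruction` is a hypothetical helper, the 40-character threshold and the 8-line cap are taken from the answer's own wording, and the model does not enforce either number.

```python
HEADLINE_MAX_CHARS = 40   # "under 40 characters" in the answer above
LONG_BLOCK_MAX_LINES = 8  # from the example prompt in the answer above


def caption_instruction(text: str) -> str:
    """Pick a prompt phrasing based on how long the caption is."""
    if len(text) < HEADLINE_MAX_CHARS:
        # Short captions can be requested directly as a headline.
        return f'Add the headline "{text}".'
    # Longer blocks: ask for a smaller font and cap the number of lines.
    return (
        "Add the following block in a small caption font at bottom, "
        f'max {LONG_BLOCK_MAX_LINES} lines: "{text}"'
    )


print(caption_instruction("CLIFFSIDE MORNINGS"))
print(caption_instruction("A quiet hour before the tide, when the cliffs are still cold."))
```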

Q · 03 Can I match a specific brand font?

Describe the look in your prompt, e.g., "Bodoni-like high-contrast serif", "Helvetica-style neo-grotesque sans", "Playfair Display". GPT Image 2 doesn't load .woff files; it draws each glyph from its learned typography. For the closest match to a brand font, use Pixazo's API with a reference image; the model imitates the reference letterforms.

Q · 04 Will the text cover up faces or focal objects?

Only if you tell it to. The model reads your photo and your prompt together — if you say "above the subject" or "in the sky, avoiding faces", it honors that. If you don't specify, GPT Image 2 picks a reasonable default (usually upper or lower band, away from clear focal subjects). For sensitive shots, always specify placement in the prompt.

Q · 05 Can I add multiple lines or blocks of text in one go?

Yes — describe every block in a single prompt. Example: "Add 'CLIFFSIDE MORNINGS' centered at top in bold white serif, and add '— a quiet hour before the tide' in italic gray beneath it." The model handles up to ~3–4 distinct text blocks per generation reliably; for more, generate in passes.
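"Generate in passes" can be sketched as simple chunking: group the text blocks so that no single prompt carries more than a few of them. The helper below is hypothetical, and the default of 3 blocks per pass is just the conservative end of the "~3–4" range quoted above; the block descriptions are made-up examples.

```python
def split_into_passes(blocks, max_blocks_per_pass=3):
    """Group block descriptions so each prompt stays within the reliable range."""
    if max_blocks_per_pass < 1:
        raise ValueError("max_blocks_per_pass must be at least 1")
    return [
        blocks[i:i + max_blocks_per_pass]
        for i in range(0, len(blocks), max_blocks_per_pass)
    ]


# Illustrative block descriptions, each already phrased as a prompt fragment.
blocks = [
    "'NEW ARRIVAL' as a small kicker at top",
    "'60%' huge in lime green, centered",
    "'This week only' in white beneath the number",
    "'Free shipping' in small caps at bottom-left",
    "'Shop now' at bottom-right",
]

passes = split_into_passes(blocks)
prompts = ["Add " + ", and add ".join(group) + "." for group in passes]
for number, prompt in enumerate(prompts, start=1):
    print(f"Pass {number}: {prompt}")
```

With five blocks this yields two prompts: the first carries three blocks and the second carries two. The second pass would presumably take the first pass's output as its upload, though the page does not spell that workflow out.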

Q · 06 Does Pixazo support right-to-left languages like Arabic and Hebrew?

Yes. GPT Image 2 handles Arabic, Hebrew, Urdu, and Persian natively — direction, kerning, and ligatures all render correctly. Mixed Arabic + English (bidi) lines also work. Just write the text in your prompt in the target language; no special flag needed.

Q · 07 What happens to my photo after I add text? Does Pixazo store it?

On the free tier, your source image is held for 24 hours and then auto-deleted (it is used only to render the result and let you re-download). On Plus / Pro, you control retention via account settings (24h / 30d / never). Pixazo does not use your photos for model training. See the Privacy Policy for the full data-handling section.

DJ
Reviewed by

Lead AI Design Researcher, Pixazo · 8+ years in generative image systems

Deepak leads typography & placement quality for Pixazo's image-editing tools and owns the prompt-engineering playbook for the GPT Image 2 playground — the recommended prompt patterns, the fallback prompts when the model misreads a brief, and the QA harness that measures placement accuracy across 200+ photo / prompt pairs. Before Pixazo he led applied research for image-to-image diffusion at two AI startups.

Ready to caption a photo?

Upload, write the prompt, generate. About eight seconds of rendering turns a raw photo into a download-ready captioned image, powered by GPT Image 2.