Wan - AI Video Generator

Three specialized video generation models by Alibaba: character reference videos, image animation with synced sound, and realistic character animation.

Wan 2.6

Create Video with Character Reference

Generates character videos driven by visual references

UPLOAD REFERENCE (VIDEO 1)* — video upload, required
UPLOAD REFERENCE (VIDEO 2) — optional second video reference
PROMPT — up to 1000 characters
ASPECT RATIO — Portrait or Landscape only

WAN 2.5

Animate image with synced sound

Animates a static image matched to audio

REFERENCE IMAGE* — image upload, required
PROMPT — up to 3000 characters
DURATION — 5 sec or 10 sec dropdown

Wan 2.2 Animate

Create realistic character videos

Generates character animation from visual inputs only (a character image, plus an optional reference video)

UPLOAD CHARACTER IMAGE* — image upload, required
UPLOAD REFERENCE VIDEO — optional video upload
No prompt field — visual input only

Model Details & Workflows

Wan 2.6 — Two-Video Reference Workflow

Wan 2.6 accepts up to two reference videos (the first required, the second optional) and generates a new video from the visual patterns extracted from them. Supplying a second reference gives the model more material for character motion synthesis. You may also provide a text prompt (up to 1000 characters) to guide the generation. Aspect ratio is limited to Portrait or Landscape; Square output is not supported.

WAN 2.5 — Image-to-Video with Audio Sync

WAN 2.5 takes a static reference image and generates an animated video. You can provide a prompt (up to 3000 characters) and select output duration (5 or 10 seconds). This model is optimized for creating smooth animations from single-frame inputs with audio-synchronized motion where applicable.

Wan 2.2 Animate — Visual-Input-Only Generation

Wan 2.2 Animate works purely from visual inputs. Upload a character image (required) and optionally a reference video. This model does not use text prompts — all generation parameters come from image analysis and optional motion reference. This approach can produce highly consistent character animations without prompt interpretation overhead.
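The per-model limits described above can be captured in a small lookup table and checked before a job is submitted. The following is a minimal sketch, not Pixazo's or Alibaba's API: the model keys, the ModelSpec fields, and validate_request are hypothetical names, and only the numeric limits and option lists come from this page.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ModelSpec:
    """Input limits for one model as listed on this page (field names are hypothetical)."""
    label: str
    max_prompt_chars: Optional[int]           # None: the model has no prompt field
    aspect_ratios: Optional[Tuple[str, ...]]  # None: no aspect ratio field is listed
    durations_sec: Optional[Tuple[int, ...]]  # None: no duration field is listed


# Hypothetical keys; the values mirror the field lists for each model.
SPECS = {
    "wan-2.6": ModelSpec("Wan 2.6", 1000, ("portrait", "landscape"), None),
    "wan-2.5": ModelSpec("WAN 2.5", 3000, None, (5, 10)),
    "wan-2.2-animate": ModelSpec("Wan 2.2 Animate", None, None, None),
}


def validate_request(model, prompt=None, aspect_ratio=None, duration_sec=None):
    """Return a list of problems; an empty list means the settings fit the model."""
    spec = SPECS[model]
    problems = []

    if prompt is not None:
        if spec.max_prompt_chars is None:
            problems.append(f"{spec.label} has no prompt field")
        elif len(prompt) > spec.max_prompt_chars:
            problems.append(
                f"prompt is {len(prompt)} characters; limit is {spec.max_prompt_chars}"
            )

    if aspect_ratio is not None:
        if spec.aspect_ratios is None:
            problems.append(f"{spec.label} lists no aspect ratio option")
        elif aspect_ratio.lower() not in spec.aspect_ratios:
            problems.append(f"{spec.label} does not support {aspect_ratio} output")

    if duration_sec is not None:
        if spec.durations_sec is None:
            problems.append(f"{spec.label} lists no duration option")
        elif duration_sec not in spec.durations_sec:
            problems.append(f"duration must be one of {spec.durations_sec} seconds")

    return problems
```

For example, validate_request("wan-2.6", prompt="The character walks forward", aspect_ratio="Square") returns ['Wan 2.6 does not support Square output'], while the same prompt with aspect_ratio="Portrait" returns an empty list.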

Input Guidelines

Reference Videos for Wan 2.6

Upload clear, well-lit video clips with a single subject against a simple background. Avoid fast cuts, scene changes, or multiple people. Keep reference videos under 10 seconds for best motion extraction quality. The reference person's proportions do not need to match your target character exactly, but extreme differences may produce unnatural results. Gross body motion (walking, gesturing, turning) transfers well; subtle expressions and micro-gestures often get lost.

Character Images for Wan 2.2 Animate and WAN 2.5

Start with clear, well-lit still images of your character. Transparent or solid-color backgrounds reduce generation artifacts. Wan analyzes pose, proportions, and visual features, so consistency in these elements produces the most reliable results. Full-body and upper-body poses both work well.

Aspect Ratio and Duration

Wan 2.6 supports Portrait (vertical) and Landscape (horizontal) aspect ratios only — Square is not available. WAN 2.5 can generate videos in either 5-second or 10-second durations. Choose based on your animation complexity and output requirements.

Prompt Guide (Wan 2.6 & WAN 2.5)

"The character walks forward confidently, arms swinging naturally, looking straight ahead"

Clear directional movement with natural secondary motion. Both models handle walking animations well when direction and gait are described simply.

"The character raises their right hand and waves slowly, slight smile, head tilts to the left"

Specific body part instructions with emotional cues. Including which hand and direction improves accuracy for upper-body animations.

"The character dances with energetic hip-hop movements, arms and legs in full motion"

High-energy motion prompt. The models can generate dance movements, but the results tend to be generic interpretations. For precise choreography, use a reference video instead.
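The example prompts above share a pattern: one main action, followed by short details naming a body part, direction, or expression. A minimal sketch of assembling prompts that way and checking them against each model's character limit follows. build_prompt and PROMPT_LIMITS are hypothetical helpers for illustration, not part of any Wan or Pixazo API; the limits are the ones stated on this page.

```python
# Prompt length limits stated on this page; Wan 2.2 Animate is absent
# because it has no prompt field.
PROMPT_LIMITS = {"Wan 2.6": 1000, "WAN 2.5": 3000}


def build_prompt(action, details=(), model="Wan 2.6"):
    """Join a main action and optional details into one prompt sentence.

    Raises ValueError if the result exceeds the model's character limit.
    """
    parts = [action.strip()] + [d.strip() for d in details if d.strip()]
    prompt = "The character " + ", ".join(parts)

    limit = PROMPT_LIMITS[model]
    if len(prompt) > limit:
        raise ValueError(
            f"prompt is {len(prompt)} characters; {model} allows up to {limit}"
        )
    return prompt


# Rebuilds the second example prompt from this guide.
wave = build_prompt(
    "raises their right hand and waves slowly",
    details=["slight smile", "head tilts to the left"],
)
print(wave)
```

Keeping the action and the details separate makes it easy to vary one detail at a time (for instance, swapping "right hand" for "left hand") and compare results.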

Honest Limitations

Character proportions may shift. During animation, limb lengths, head size, and body proportions can drift from the original character image, especially in longer or complex animations.
Hands and fingers remain problematic. Like most current video generation models, Wan frequently produces malformed hands — extra fingers, merged digits, or anatomically impossible bending during motion.
Wan 2.6 has no Square aspect ratio. This model only supports Portrait and Landscape output. You cannot generate square videos with Wan 2.6.
Wan 2.2 Animate does not accept text prompts. This model relies entirely on image and optional reference video inputs. Text descriptions have no effect on generation for this model.
Reference video extraction has limits. Fast movements, occlusions, and unclear footage produce poor motion extraction. Best results come from clean, single-person reference clips with steady lighting.
No environmental interaction. Characters do not realistically interact with objects, furniture, or surfaces. A character attempting to sit may hover above a chair or clip through it.
Multi-character scenes are unreliable. All Wan models are optimized for single-character animation. Scenes with two or more characters often produce merging artifacts, inconsistent motion, or one character being ignored entirely.

Frequently Asked Questions

What is Wan and who made it?

Wan is a family of AI video generation models developed by Alibaba Cloud (the Wan Team), focused on character animation and video synthesis. Pixazo provides access to Wan through its platform layer. Pixazo does not own, train, or modify the underlying models; it serves as an interface and compute layer.

What is the difference between Wan 2.6, WAN 2.5, and Wan 2.2 Animate?

Wan 2.6 generates videos from one required reference video, an optional second reference video, and an optional text prompt. WAN 2.5 animates a static image with text guidance and lets you choose the duration (5 or 10 seconds). Wan 2.2 Animate uses only a character image and an optional reference video, with no text prompt. Each model has different strengths: use Wan 2.6 for multi-reference video-driven motion, WAN 2.5 for image animation with detailed prompts, and Wan 2.2 Animate for pure visual consistency without prompt interpretation.

Can I use Square aspect ratio with Wan 2.6?

No. Wan 2.6 supports only Portrait and Landscape aspect ratios. If you need square video output, WAN 2.5 or Wan 2.2 Animate may be alternatives, but neither lists an aspect ratio setting among its input fields, so check the output ratio they produce before relying on them.

Does Wan 2.2 Animate accept text prompts?

No. Wan 2.2 Animate is a visual-input-only model. It generates animations based on the character image and optional reference video alone. Text prompts are not supported and will have no effect on the output.

What are the main limitations across all Wan models?

Key limitations include: character proportions can shift during animation, hands and fingers are frequently malformed, reference video extraction works best with clean single-subject clips, environmental interaction is not realistic, and multi-character scenes often have artifacts. Additionally, Wan 2.6 is limited to Portrait/Landscape aspect ratios, and Wan 2.2 Animate does not accept text prompts.

Which Wan model should I use?

Choose based on your inputs and how much control you want. Use Wan 2.6 if you want to guide animation with reference videos and optional text. Use WAN 2.5 if you prefer to animate a single image with detailed text descriptions and a selectable 5- or 10-second duration. Use Wan 2.2 Animate if you want pure visual consistency without text interpretation, relying only on a character image and an optional reference video.
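The selection advice above can be expressed as a short decision function. This is only a sketch of one reasonable reading of that guidance: choose_model and its parameters are hypothetical, and the tie-breaking rule (a character image takes priority over a reference video) is an assumption, not something the platform specifies.

```python
def choose_model(has_character_image, has_reference_video, wants_text_prompt):
    """Suggest a Wan model from the inputs you have, or None if none fits.

    Assumption: when a character image is available it takes priority,
    because two of the three models require one.
    """
    if has_character_image:
        if wants_text_prompt:
            # WAN 2.5 animates an image with a prompt; it takes no reference video.
            return "WAN 2.5"
        # Wan 2.2 Animate is visual-only and can also use a reference video.
        return "Wan 2.2 Animate"
    if has_reference_video:
        # Wan 2.6 requires at least one reference video; the prompt is optional.
        return "Wan 2.6"
    # Every model on this page needs an image or a video as its required input.
    return None


print(choose_model(has_character_image=False, has_reference_video=True,
                   wants_text_prompt=True))
```

Note that a character image combined with a text prompt leads to WAN 2.5 even if a reference video is available, since WAN 2.5 has no video input; if motion transfer from the video matters more than the prompt, Wan 2.2 Animate is the closer fit.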