Wan 2.2 by Alibaba Wan AI

Official Website

Main Features

Text to Video: Generate high-quality videos based on text descriptions
Image to Video: Convert static images into dynamic videos
First-Last Frame Control: Specify the start and end frames and generate the transition between them
Advanced Control Features: Provide precise control over video generation, along with creative options
Character Reference & Motion Reference: Combine character style with reference motion to create personalized video content

Technical Characteristics

SOTA Performance: Reported to consistently outperform existing open-source models and commercial solutions across multiple benchmarks
Consumer-grade GPU Support: The T2V-1.3B model requires only 8.19 GB of VRAM, making it compatible with almost all consumer-grade GPUs
Multi-task Capability: Excels in Text-to-Video, Image-to-Video, Video Editing, Text-to-Image, and Video-to-Audio tasks
Visual Text Generation: First video model capable of rendering both Chinese and English text within generated videos
Powerful Video VAE: Wan-VAE delivers exceptional efficiency and performance, encoding and decoding 1080P videos of any length
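
The 8.19 GB figure above can be turned into a quick pre-flight check. The following is a minimal sketch, not part of any Wan tooling: the function name `fits_in_vram` and the `headroom_gb` parameter are hypothetical, and the card sizes in the loop are illustrative values rather than a tested compatibility list.

```python
# Figure taken from the text above: VRAM needed by the T2V-1.3B model.
T2V_1_3B_VRAM_GB = 8.19


def fits_in_vram(gpu_vram_gb: float,
                 required_gb: float = T2V_1_3B_VRAM_GB,
                 headroom_gb: float = 0.0) -> bool:
    """Return True if the GPU has at least required_gb + headroom_gb of VRAM.

    headroom_gb is a hypothetical safety margin for memory used by the
    desktop or other processes; the source does not specify one.
    """
    if gpu_vram_gb < 0 or required_gb < 0 or headroom_gb < 0:
        raise ValueError("VRAM sizes must be non-negative")
    return gpu_vram_gb >= required_gb + headroom_gb


if __name__ == "__main__":
    # Illustrative card sizes only.
    for vram in (6, 8, 12, 24):
        verdict = "ok" if fits_in_vram(vram) else "too small"
        print(f"{vram} GB card: {verdict}")
```

Under this naive comparison a nominal 8 GB card falls just short of 8.19 GB, so in practice a 12 GB or larger card is the safer assumption unless memory-saving options are used.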

Model Versions

Wan2.2-I2V: 14B parameter model supporting 480P and 720P resolutions
Wan2.2-T2V: 14B parameter model supporting 480P and 720P resolutions
Wan2.2-T2V-1.3B: Lightweight version suitable for consumer-grade GPUs
Wan2.2-FLF2V-14B-720P: First-Last-Frame-to-Video generation model
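
The version list above can be sketched as a small lookup table for picking a variant by task and resolution. This is an illustrative sketch only: the `Variant` type, the task keys (`"t2v"`, `"i2v"`, `"flf2v"`), and the `candidates` function are hypothetical names, the 720P entry for the FLF2V model is inferred from its name, and the resolutions of the 1.3B model are left unset because the list does not state them.

```python
from typing import List, NamedTuple, Optional, Tuple


class Variant(NamedTuple):
    name: str
    task: str
    # None means the listing above does not state supported resolutions.
    resolutions: Optional[Tuple[str, ...]]


VARIANTS = (
    Variant("Wan2.2-I2V", "i2v", ("480P", "720P")),
    Variant("Wan2.2-T2V", "t2v", ("480P", "720P")),
    Variant("Wan2.2-T2V-1.3B", "t2v", None),
    # Resolution inferred from the model name, not stated separately.
    Variant("Wan2.2-FLF2V-14B-720P", "flf2v", ("720P",)),
)


def candidates(task: str, resolution: str) -> List[str]:
    """Return the names of variants whose listed task and resolution match.

    Variants with unstated resolutions are skipped rather than guessed.
    """
    task, resolution = task.lower(), resolution.upper()
    return [
        v.name
        for v in VARIANTS
        if v.task == task
        and v.resolutions is not None
        and resolution in v.resolutions
    ]
```

For example, `candidates("t2v", "480p")` returns only the 14B text-to-video entry, because the lightweight model's supported resolutions are not given in the list.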

Typical Applications

Complex Motion Generation: Excels at generating realistic videos with extensive body movements, complex rotations, dynamic scene transitions, and fluid camera motions
Physical Simulation: Generates videos that accurately simulate real-world physics and realistic object interactions
Cinematic Quality: Offers movie-like visuals with rich textures and a variety of stylized effects
Controllable Editing: Features a universal editing model for precise edits using image or video references

Usage Instructions

Users generate videos through the web interface by entering a text description or uploading an image. The interface also offers smart expansion and a safety checker. Generating a 5-second 480P video costs 130 credits and takes approximately 4 minutes on an RTX 4090.
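
The credit and time figures above lend themselves to a quick budget estimate. The sketch below assumes that cost and time scale linearly with the number of 5-second 480P clips, which the source does not confirm; it gives only a single data point. The function names are hypothetical, and no attempt is made to price other durations or resolutions.

```python
from typing import Tuple

# Both figures come from the text above and apply to one 5-second 480P clip.
CREDITS_PER_CLIP = 130
MINUTES_PER_CLIP = 4  # approximate, quoted for an RTX 4090


def estimate_batch(n_clips: int) -> Tuple[int, int]:
    """Return (credits, minutes) for n_clips clips, assuming linear scaling."""
    if n_clips < 0:
        raise ValueError("n_clips must be non-negative")
    return n_clips * CREDITS_PER_CLIP, n_clips * MINUTES_PER_CLIP


def clips_affordable(credit_balance: int) -> int:
    """Return how many whole clips a credit balance covers."""
    if credit_balance < 0:
        raise ValueError("credit_balance must be non-negative")
    return credit_balance // CREDITS_PER_CLIP
```

Under these assumptions, ten clips would cost 1300 credits and take roughly 40 minutes, and a balance of 1000 credits would cover seven clips.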