Video Generation - SiliconFlow

1. Use Cases

Video generation models generate dynamic video content from text descriptions or input images. As the technology advances, their applications are becoming increasingly diverse. Potential application areas include:

Dynamic Content Generation: Video generation models can create dynamic visual content for describing and explaining information.
Multimodal Intelligent Interaction: By combining image and text inputs, video generation models can be used in more intelligent and interactive application scenarios.
Replacing Traditional Visual Technologies: Video generation models can replace or enhance traditional machine vision technologies to address more complex multimodal problems.

As the technology progresses, the multimodal capabilities of video generation models will integrate with vision-language models, driving broader applications in intelligent interaction, automated content generation, and complex scene simulation. Video generation models can also be combined with image generation models, for example by animating a generated image (image-to-video), which further expands their scope and enables richer, more diverse visual content.

2. Usage Recommendations

When writing prompts, describe actions and scenes in detail and in chronological order. Include specific actions, appearances, camera angles, and environmental details. Write everything as a single cohesive paragraph that starts directly with the action. Be specific and precise, as if you were a cinematographer describing a shot in a shooting script. Keep the prompt within 200 words. For best results, structure the prompt as follows:

Begin with a single sentence describing the main action.
- Example: A woman with light skin, wearing a blue jacket and a black hat with a veil, first looks down and to her right, then raises her head back up as she speaks.
Add specific details about actions and gestures.
- Example: She first looks down and to her right, then raises her head back up as she speaks.
Precisely describe the appearance of the character/object.
- Example: She has brown hair styled in an updo, light brown eyebrows, and is wearing a white collared shirt under her blue jacket.
Include details about the background and environment.
- Example: The background is out of focus but shows trees and people in period clothing.
Specify the camera angle and movement.
- Example: The camera remains stationary on her face as she speaks.
Describe the lighting and color effects.
- Example: The scene is captured in real-life footage, with natural lighting and true-to-life colors.
Note any changes or unexpected events.
- Example: A gust of wind blows through the trees, causing the woman’s veil to flutter slightly.
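The structure above can be captured in a small Python sketch that keeps the parts in the recommended order, joins them into a single paragraph, and enforces the 200-word guideline. `ShotDescription` and its field names are hypothetical and not part of any SiliconFlow SDK; word count is approximated by splitting on whitespace.

```python
from dataclasses import dataclass, fields

MAX_WORDS = 200  # length guideline from the recommendations above


@dataclass
class ShotDescription:
    """Hypothetical container for the prompt parts, in the recommended order."""
    main_action: str
    movement_details: str = ""
    appearance: str = ""
    environment: str = ""
    camera: str = ""
    lighting: str = ""
    changes: str = ""

    def to_prompt(self) -> str:
        # fields() preserves declaration order, so the main action always comes first.
        parts = (getattr(self, f.name).strip() for f in fields(self))
        # Skip empty parts and join the rest into one paragraph.
        prompt = " ".join(p for p in parts if p)
        word_count = len(prompt.split())  # whitespace split approximates word count
        if word_count > MAX_WORDS:
            raise ValueError(f"prompt has {word_count} words; keep it within {MAX_WORDS}")
        return prompt


if __name__ == "__main__":
    shot = ShotDescription(
        main_action=("A woman with light skin, wearing a blue jacket and a black hat "
                     "with a veil, first looks down and to her right, then raises "
                     "her head back up as she speaks."),
        appearance=("She has brown hair styled in an updo, light brown eyebrows, and "
                    "is wearing a white collared shirt under her blue jacket."),
        environment=("The background is out of focus but shows trees and people in "
                     "period clothing."),
        camera="The camera remains stationary on her face as she speaks.",
        lighting=("The scene is captured in real-life footage, with natural lighting "
                  "and true-to-life colors."),
        changes=("A gust of wind blows through the trees, causing the woman's veil "
                 "to flutter slightly."),
    )
    print(shot.to_prompt())
```

The example reuses the sample sentences above and leaves `movement_details` empty, because the main-action sentence already describes that motion.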

Example video generated from the above prompt:

3. Try It Online

Click Playground to try the video generation models in your browser.

4. Supported Models

4.1 Text-to-Video Models

Currently supported text-to-video models:

Wan-AI/Wan2.1-T2V-14B
Wan-AI/Wan2.1-T2V-14B-Turbo
Wan-AI/Wan2.2-T2V-A14B
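The following Python sketch shows how a text-to-video request for one of these models could be assembled using only the standard library. The endpoint `https://api.siliconflow.cn/v1/video/submit`, the JSON field names (`model`, `prompt`, `image_size`), the default size, and Bearer-token authentication are assumptions based on our reading of SiliconFlow's API reference and should be checked against its current version; `build_t2v_request` is a hypothetical helper. The function only builds the request, so it runs without network access.

```python
import json
import urllib.request

# ASSUMPTION: endpoint taken from our reading of the SiliconFlow API reference; verify before use.
SUBMIT_URL = "https://api.siliconflow.cn/v1/video/submit"

# Text-to-video models listed in section 4.1 (this list may change).
TEXT_TO_VIDEO_MODELS = {
    "Wan-AI/Wan2.1-T2V-14B",
    "Wan-AI/Wan2.1-T2V-14B-Turbo",
    "Wan-AI/Wan2.2-T2V-A14B",
}


def build_t2v_request(api_key: str, model: str, prompt: str,
                      image_size: str = "1280x720") -> urllib.request.Request:
    """Build, but do not send, a text-to-video submission request (hypothetical helper)."""
    if model not in TEXT_TO_VIDEO_MODELS:
        raise ValueError(f"unsupported text-to-video model: {model}")
    # ASSUMPTION: the API expects these JSON field names and a "WIDTHxHEIGHT" size string.
    body = {"model": model, "prompt": prompt, "image_size": image_size}
    return urllib.request.Request(
        SUBMIT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = build_t2v_request("YOUR_API_KEY", "Wan-AI/Wan2.2-T2V-A14B",
                            "A red kite rises over a windy beach at sunset.")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

To actually submit the request, you would pass it to `urllib.request.urlopen` with a valid API key. The model names are hard-coded from the list above, so they can go stale as the supported list changes.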

4.2 Image-to-Video Models and Resolution

Currently supported image-to-video models:

Wan-AI/Wan2.1-I2V-14B-720P
Wan-AI/Wan2.2-I2V-A14B

The output resolution is selected automatically based on the aspect ratio of the uploaded image:

16:9 👉 1280×720
9:16 👉 720×1280
1:1 👉 960×960
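The mapping above can be sketched as a lookup that returns the listed output size for an image's dimensions. How the platform handles images whose aspect ratio is not exactly 16:9, 9:16, or 1:1 is not stated here; this sketch assumes the closest listed ratio wins, compared on a log scale so that portrait and landscape images are treated symmetrically. `match_resolution` is a hypothetical helper.

```python
import math

# (aspect ratio, output width x height) pairs listed in section 4.2.
SIZE_TABLE = [
    (16 / 9, (1280, 720)),
    (9 / 16, (720, 1280)),
    (1.0, (960, 960)),
]


def match_resolution(width: int, height: int) -> tuple[int, int]:
    """Return the output size for an uploaded image (hypothetical helper).

    ASSUMPTION: ratios that are not an exact match are mapped to the closest
    listed ratio; the platform's actual rule may differ.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    log_ratio = math.log(width / height)
    # Distance on a log scale, so 3:4 and 4:3 are equally far from 1:1.
    _, size = min(SIZE_TABLE, key=lambda entry: abs(math.log(entry[0]) - log_ratio))
    return size


if __name__ == "__main__":
    print(match_resolution(1920, 1080))  # → (1280, 720)
```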

Note: The list of supported video models may change. To see the current list, filter by the “Video” tag on the “Models” page.