Video Generation

The Video Generation node creates short videos from text prompts, images, or existing videos. It supports multiple AI providers and generation modes.

Inputs & Outputs

Direction	Handle	Position	Type	Description
Input	`start-frame`	20%	🟣 Image	Starting image for image-to-video or interpolation
Input	`end-frame`	35%	🟣 Image	Ending image for video interpolation
Input	`references`	50%	🟣 Image	Style or content reference images
Input	`prompt-input`	65%	🟢 Text	Text description of the desired video
Output	`video-output`	—	🟠 Video	The generated video
Output	`first-frame-out`	—	🟣 Image	First frame extracted from the video
Output	`last-frame-out`	—	🟣 Image	Last frame extracted from the video

Not all inputs are used in every mode. The node adapts its interface based on the selected provider and connected inputs.

Generation Modes

Text to Video

Generate a video entirely from a text prompt — no image inputs needed.

How to use: Connect only a Text Prompt to the prompt-input handle.

Available on: Veo, Sora, fal.ai

Image to Video

Animate a single image into a video clip. The image becomes the starting frame and the AI generates motion from there.

How to use: Connect an image to the start-frame handle, plus a Text Prompt describing the desired motion.

Available on: Veo, Sora, fal.ai

Video Interpolation

Generate a video that transitions between two keyframe images. The AI creates smooth motion from the first image to the second.

How to use: Connect images to both start-frame and end-frame handles.

Available on: Veo only (duration is fixed at 8 seconds)

Video Remix

Remix an existing video with a new prompt to change its style or content while maintaining the original motion.

How to use: Connect a video source and a Text Prompt describing the desired changes.

Available on: Sora only

Mode Auto-Detection

The node automatically selects the appropriate mode based on your connections:

Veo:

Both start-frame and end-frame connected → Video Interpolation
Exactly one of start-frame or end-frame connected → Image to Video
No image inputs → Text to Video

Sora:

Remix video connected → Video Remix
start-frame connected → Image to Video
No image or remix video inputs → Text to Video
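
The rules above can be sketched as a small function. This is an illustrative sketch only, not the node's actual implementation: the function name `detect_mode`, the provider strings, and the boolean flags are assumptions made for the example. fal.ai is left out because this page does not list its detection rules.

```python
def detect_mode(provider: str,
                start_frame: bool = False,
                end_frame: bool = False,
                remix_video: bool = False) -> str:
    """Return the mode implied by the connected inputs (hypothetical helper)."""
    if provider == "veo":
        if start_frame and end_frame:
            return "interpolation"       # both keyframes connected
        if start_frame or end_frame:
            return "image-to-video"      # exactly one keyframe connected
        return "text-to-video"           # no image inputs
    if provider == "sora":
        if remix_video:
            return "remix"               # a remix video takes priority
        if start_frame:
            return "image-to-video"
        return "text-to-video"
    # Rules for other providers are not documented on this page.
    raise ValueError(f"no auto-detection rules known for {provider!r}")


print(detect_mode("veo", start_frame=True, end_frame=True))
print(detect_mode("sora", start_frame=True))
```

The checks run from most specific to least specific, mirroring the order of the lists above.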

Provider Comparison

Feature	Veo 3.1	Sora	fal.ai
Duration	4, 6, or 8 seconds	5, 8, 10, 15, or 20 seconds	5, 6, or 10 seconds
Aspect Ratio	16:9, 9:16	Size-based (see below)	16:9, 9:16, 1:1
Resolution	720p HD, 1080p Full HD	—	—
Audio	Yes (generated soundtrack)	No	No
Reference Images	Up to 3 (Asset) or 1 (Style)	—	—
Modes	Text-to-video, Image-to-video, Interpolation	Text-to-video, Image-to-video, Remix	Text-to-video, Image-to-video
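
Because each provider accepts only a fixed set of durations, a workflow that switches providers may need to snap a requested length to the nearest supported value. The sketch below uses the durations from the table; the dictionary keys and the helper name `nearest_allowed_duration` are hypothetical, and the node itself may handle unsupported values differently.

```python
# Durations (in seconds) taken from the comparison table above.
ALLOWED_DURATIONS = {
    "veo": (4, 6, 8),
    "sora": (5, 8, 10, 15, 20),
    "fal": (5, 6, 10),
}


def nearest_allowed_duration(provider: str, requested: int) -> int:
    """Snap a requested duration to the closest supported one.

    Ties are resolved in favor of the shorter duration.
    """
    allowed = ALLOWED_DURATIONS[provider]
    return min(allowed, key=lambda d: (abs(d - requested), d))


print(nearest_allowed_duration("sora", 12))
print(nearest_allowed_duration("veo", 5))
```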

Sora Size Options

Instead of aspect ratios, Sora uses explicit dimensions:

Size	Aspect Ratio
1280×720	16:9
720×1280	9:16
1792×1024	16:9 HD (approximate; the exact ratio is 7:4)
1024×1792	9:16 HD (approximate; the exact ratio is 4:7)
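
As a rough illustration, the size table can be treated as a lookup keyed by orientation and an HD flag. The names `SORA_SIZES` and `sora_size`, and the "landscape"/"portrait" labels, are assumptions made for this example rather than settings exposed by the node.

```python
# (orientation, hd) -> (width, height), copied from the table above.
SORA_SIZES = {
    ("landscape", False): (1280, 720),
    ("portrait", False): (720, 1280),
    ("landscape", True): (1792, 1024),
    ("portrait", True): (1024, 1792),
}


def sora_size(orientation: str, hd: bool = False) -> tuple:
    """Return the pixel dimensions for a Sora output (hypothetical helper)."""
    try:
        return SORA_SIZES[(orientation, hd)]
    except KeyError:
        raise ValueError(f"unknown orientation: {orientation!r}") from None


width, height = sora_size("portrait", hd=True)
print(f"{width}x{height}")
```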

Veo Reference Types

When using Veo with reference images, you can choose:

Type	Max Images	Use Case
Asset	3	Content reference — the AI incorporates elements from the images
Style	1	Style reference — the AI matches the visual style and mood
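
The image limits in the table can be checked before a request is sent. This is a minimal sketch under assumed names (`MAX_REFERENCES`, `check_references`); how the node itself reacts to extra reference images is not documented here.

```python
# Maximum number of reference images per Veo reference type (from the table).
MAX_REFERENCES = {"asset": 3, "style": 1}


def check_references(reference_type: str, images: list) -> list:
    """Raise if more images are supplied than the reference type allows."""
    limit = MAX_REFERENCES[reference_type]
    if len(images) > limit:
        raise ValueError(
            f"{reference_type} references allow at most {limit} image(s), "
            f"got {len(images)}"
        )
    return images


check_references("asset", ["a.png", "b.png", "c.png"])  # within the limit
```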

Configuration

Setting	Description	Availability
Model	AI model to use for generation	All providers
Duration	Video length in seconds	All providers
Aspect Ratio	Output dimensions	Veo, fal.ai
Size	Output pixel dimensions	Sora
Resolution	720p or 1080p	Veo only
Audio	Generate soundtrack with the video	Veo only
Reference Type	Asset (content) or Style (mood)	Veo only
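
To see at a glance which settings apply to a given provider, the Availability column can be expressed as data. The structure and the helper `settings_for` below are illustrative assumptions, not part of the node's interface.

```python
ALL = {"veo", "sora", "fal"}

# Setting name -> providers that expose it (from the Availability column).
SETTING_AVAILABILITY = {
    "Model": ALL,
    "Duration": ALL,
    "Aspect Ratio": {"veo", "fal"},
    "Size": {"sora"},
    "Resolution": {"veo"},
    "Audio": {"veo"},
    "Reference Type": {"veo"},
}


def settings_for(provider: str) -> list:
    """List the settings shown for a provider, in table order."""
    return [name for name, providers in SETTING_AVAILABILITY.items()
            if provider in providers]


print(settings_for("sora"))
```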

Frame Outputs

The Video Generation node also outputs the first and last frames of the generated video as separate image outputs. This is useful for:

Chaining videos — use the last frame of one video as the start frame of the next
Creating thumbnail images from video content
Feeding frames back into Image Generation for further refinement
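
Chaining can be pictured as a loop in which each clip's last frame becomes the next clip's start frame. The sketch below is conceptual: `Clip`, `chain_clips`, and the `generate` callable are hypothetical stand-ins for Video Generation nodes wired together on the canvas, not a real API.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Clip:
    video: str        # stands in for video-output
    first_frame: str  # stands in for first-frame-out
    last_frame: str   # stands in for last-frame-out


def chain_clips(prompts: List[str],
                generate: Callable[[str, Optional[str]], Clip],
                start_frame: Optional[str] = None) -> List[Clip]:
    """Generate one clip per prompt, feeding each last frame forward."""
    clips = []
    frame = start_frame
    for prompt in prompts:
        clip = generate(prompt, frame)  # frame is None for the first clip
        clips.append(clip)
        frame = clip.last_frame         # becomes the next start frame
    return clips
```

Passing a `start_frame` makes the first clip an Image to Video generation; leaving it out starts the sequence from a Text to Video clip.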

Video generation costs vary by model, duration, and resolution. Longer videos and higher resolutions cost more credits. Credits are deducted before generation starts and refunded automatically on failure.

See Credit System for details.
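
The deduct-then-refund behavior described above can be sketched as follows. This only illustrates the ordering of the steps; the function name `run_with_credits`, the numeric values, and the way a failure is reported are assumptions, and the real accounting is handled by the Credit System.

```python
from typing import Callable, Optional, Tuple


def run_with_credits(balance: int, cost: int,
                     generate: Callable[[], str]) -> Tuple[int, Optional[str]]:
    """Deduct credits up front and refund them if generation fails.

    Returns the new balance and the result (None on failure).
    """
    if cost > balance:
        raise ValueError("insufficient credits")
    balance -= cost              # deducted before generation starts
    try:
        result = generate()
    except Exception:
        balance += cost          # refunded automatically on failure
        return balance, None
    return balance, result
```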

Tips

Start with shorter durations (4–5 seconds) to iterate on prompts before committing to longer videos
Use Image to Video mode for more predictable results — the starting image anchors the output
For Veo, leave audio generation on (it is enabled by default) to get a matching soundtrack
Connect the last-frame-out to another Video Generation node's start-frame to create multi-shot sequences
