I've Been Using Seedance 2.0 for Twenty Days: Save This Set of Prompt Formulas

A summary of twenty days of hands-on experience with Seedance 2.0: the most reliable universal prompt formula, three templates you can copy directly, and several counterintuitive tips drawn from my own mistakes, so you can avoid the same detours.

I’ve been following the video generation space for almost three years, starting from Runway Gen-2 and testing my way through Sora, Kling, and Veo; I’ve used almost every mainstream tool on the market. Over the years I’ve heard too many “revolutionary” slogans, but most of these tools look amazing in demos and fall apart the moment you use them yourself.

So when Seedance 2.0 came out, I didn’t rush to write a review. I used it intensively for two weeks, made about fifty videos, hit plenty of pitfalls, and today I’m writing up the practical experience that actually proved useful.

First, the conclusion: this tool is really good, but only if you know how to write prompts.

Seedance 2.0 Twenty Days Usage Experience

1. First Understand What Makes Seedance 2.0 Special

Leaving other features aside, two points are most practical for ordinary creators:

First, it supports four types of input: image, video, audio, and text. You can reference any uploaded content in natural language - actions, effects, camera movements, characters, scenes, even sounds.

In plain language: before, when you had an image in your head, you had to translate it into “spells” that the model could understand. Now you can just throw the materials at it and tell it “reference the camera movement of this video, the character from this image”, and it will accurately understand your needs.

The official limit is a maximum of 9 images, 3 videos (total duration no more than 15 seconds), and 3 audio files, with no more than 12 materials in total. For most scenarios, 3-5 images plus 1 reference video are enough; adding too many materials tends to produce conflicting signals.
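If you script your uploads, the limits above are worth checking before you hit the API. Here is a minimal pre-flight check; the limit values come straight from the official figures quoted above, but the function and its argument shapes are hypothetical, not anything Seedance ships:

```python
# Hypothetical pre-flight check for Seedance 2.0 material limits.
# Limit values are the official figures quoted in the text above.
MAX_IMAGES = 9
MAX_VIDEOS = 3
MAX_VIDEO_SECONDS = 15  # combined duration of all reference videos
MAX_AUDIO = 3
MAX_TOTAL = 12

def validate_materials(images, videos, audios):
    """images/audios are counts; videos is a list of durations in seconds."""
    errors = []
    if images > MAX_IMAGES:
        errors.append(f"too many images: {images} > {MAX_IMAGES}")
    if len(videos) > MAX_VIDEOS:
        errors.append(f"too many videos: {len(videos)} > {MAX_VIDEOS}")
    if sum(videos) > MAX_VIDEO_SECONDS:
        errors.append(f"video duration {sum(videos)}s exceeds {MAX_VIDEO_SECONDS}s")
    if audios > MAX_AUDIO:
        errors.append(f"too many audio files: {audios} > {MAX_AUDIO}")
    total = images + len(videos) + audios
    if total > MAX_TOTAL:
        errors.append(f"total materials {total} > {MAX_TOTAL}")
    return errors

# The sweet spot recommended above: a few images plus one reference video.
assert validate_materials(4, [8], 0) == []
```

Note the total cap bites before the per-type caps do: 9 images + 3 videos + 3 audio files would be 15 materials, well over the limit of 12.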

Another point I didn’t notice at first but found extremely satisfying once I did: earlier video generation tools produce the visuals first and add audio afterward, while Seedance 2.0 generates audio and video simultaneously through a dual-branch diffusion Transformer architecture.

What does that mean? You write “a girl in a cafe smiling and saying ‘the weather is really nice today’”, and the video you get has matching lip movements and ambient sounds (coffee machine, faint chatter) built in; even the background music is already matched for you. This process used to take half an hour in CapCut; now it’s done in one pass.

2. The Universal Formula I Use Every Day Now

After tinkering with so many videos, the most stable prompt structure I’ve summarized is this:

Subject + Scene + Action + Lighting + Camera Language + Style + Constraints

It sounds complicated, but it’s just splitting a shot into 7 elements. An example will make it click immediately:

A young woman wearing a beige knitted sweater, sitting at a wooden table by the window (subject + scene), holding a hot coffee with both hands, gently blowing the steam from the cup mouth, then lifting her eyes to look out the window (action), afternoon sunlight filtering through blinds onto her face, forming dappled light and shadow (lighting), camera slowly pushing from side medium shot to facial close-up (camera language), cinematic Japanese fresh style, warm tones, film grain (style), stable and smooth picture, clear details, avoid hand deformities (constraints)

If you throw this directly into Seedance, the generated video will basically be usable right away.
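Since the formula is a fixed sequence of elements, you can treat it as a fill-in template. The sketch below is my own illustration, not an official tool; the field names are just labels for the seven elements described above:

```python
# Sketch: assemble a prompt from the seven elements of the formula.
# Field names are my own labels for the structure described above.
FIELDS = ["subject", "scene", "action", "lighting", "camera", "style", "constraints"]

def build_prompt(**parts):
    """Join the seven elements in the formula's fixed order."""
    missing = [f for f in FIELDS if f not in parts]
    if missing:
        raise ValueError(f"missing elements: {missing}")
    return ", ".join(parts[f] for f in FIELDS)

prompt = build_prompt(
    subject="a young woman wearing a beige knitted sweater",
    scene="sitting at a wooden table by the window",
    action="holding a hot coffee with both hands, blowing the steam, then looking out the window",
    lighting="afternoon sunlight filtering through blinds, dappled light and shadow",
    camera="camera slowly pushing from side medium shot to facial close-up",
    style="cinematic Japanese fresh style, warm tones, film grain",
    constraints="stable and smooth picture, clear details, avoid hand deformities",
)
```

Forcing every element to be present is the point: the failures I hit most often came from silently omitting lighting or constraints, not from writing them badly.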

3. Several Counterintuitive Points (I’ve Hit Every One of These Pitfalls)

1. Don’t Translate Chinese into English

I spent my first week writing prompts in English, assuming that, as with Midjourney, English would work better. It turns out Seedance 2.0 doesn’t have this problem at all; translating Chinese prompts into English is a complete waste of time.

Just write directly in Chinese, and the more colloquial, the more accurate: “a girl elegantly tucks her hair behind her ear” gives better results than “a girl elegantly touches her hair”, because the model’s training data is mainly Chinese.

2. Words Like “Good-looking”, “Beautiful”, and “Premium” Are Equivalent to Saying Nothing

Vague prompts only produce unpredictable results. Instead of writing “city at night”, write “cyberpunk night scene, neon lights reflecting on wet streets, flying cars in the background, rain falling”.

The principle is simple: write things that can actually be seen in the frame, not subjective feelings in your head.

3. Action Descriptions Should Be Written as Storyboards, Not as Stories

This is the point I want to emphasize the most. Words like “dancing”, “walking”, “laughing” are basically useless, you have to write more specifically:

❌ Wrong example: Girl dancing

✅ Correct example: Girl steps out with her left foot first, hands spread naturally, skirt fluttering as she turns, finally stopping in a side-profile pose

Core principle: Write prompts as storyboards, not as stories. Every sentence should describe something concrete that can be observed in the picture. I now have this sentence taped to my monitor so I read it every day.
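Points 2 and 3 boil down to the same habit: catch vague words before submitting. Here is a toy lint check in that spirit; it is my own illustration, not a Seedance feature, and the word list is just the examples from this section:

```python
# Toy lint check: flag the vague words this section calls "basically useless".
# The word list is only the examples given in the text, not exhaustive.
VAGUE_WORDS = {"dancing", "walking", "laughing", "beautiful", "good-looking", "premium"}

def flag_vague_words(prompt):
    """Return the vague words found in a prompt, sorted alphabetically."""
    words = prompt.lower().replace(",", " ").split()
    return sorted(set(words) & VAGUE_WORDS)

assert flag_vague_words("Girl dancing") == ["dancing"]
```

A real checker would need stemming and a much longer list, but even this crude version caught most of my early mistakes in review.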

4. Three Ready-made Templates, Copy and Use

I’ve compiled templates for three high-frequency scenarios. Copy them directly and adjust the details as needed.

Template 1: Character Emotion Close-up (Suitable for avatar videos, splash screens)

A long-haired girl leaning against a floor-to-ceiling window, wearing a white cotton shirt, slightly lowering her head, right hand gently brushing a strand of falling hair, then slowly lifting her head to look at the camera, corners of mouth naturally rising into a soft smile, morning soft light filtering through sheer curtains, forming delicate diffused light on her face, camera smoothly pushing from medium shot to facial close-up, shallow depth of field, cinematic Japanese portrait style, cream tones, realistic skin texture, smooth picture without shaking, avoid facial distortion.

Template 2: Atmospheric Landscape (Suitable for Vlog openings, background materials)

West Lake in Hangzhou on an autumn morning, thin mist covering the water, Three Pools Mirroring the Moon faintly visible in the distance, an old willow tree by the bank hanging over the water, several fallen leaves slowly falling onto the water, creating small ripples, camera slowly rising from low angle above the water, passing by willow branches, finally freezing on the distant lake view, Chinese ink painting style, ink wash tones, golden morning light, quiet and restrained picture, rich details, avoid camera shake.

Template 3: Product Display (Best for e-commerce, Xiaohongshu)

An amber essential oil dropper bottle standing still on a dark marble countertop, dried flowers and an open old book casually placed beside it, camera slowly rotating 360 degrees around the bottle, top light filtering through blinds forming oblique light beams, warm halo reflecting on the bottle body, high-end advertising film style, dark retro style, rich and saturated colors, exquisite picture, product clear without distortion, no extra text watermarks.


Master these techniques and you’ll avoid roughly 80% of the pitfalls, and the quality of your generated videos will improve noticeably. If you’ve used other AI video tools before, I think you’ll feel a significant efficiency gain the moment you switch to Seedance 2.0.