Make Realistic Lip-sync Music Videos with SeeDance 2.0

I just made this music video, and the lip-syncing in it is genuinely impressive.

I used SeeDance 2.0 Fast on Kie.ai, but you can also use the standard SeeDance 2.0 model, which supports up to 1080p resolution. For each generation, I used the following inputs and settings:

  • Prompt
  • Reference image (not a first-frame image)
  • Reference video (a black video containing only the audio clip)
  • “Generate audio” enabled
  • “Web search” disabled
  • Duration set to the length of the reference video
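If you make many clips, it helps to read the reference video's length automatically instead of checking it by hand. The sketch below uses ffprobe (assumed to be installed) to get the duration and collects the per-generation settings from the list above into one dictionary. The dictionary keys are hypothetical placeholders, not Kie.ai's actual API field names. Rounding up to a whole second is also an assumption about what the duration field accepts.

```python
import math
import subprocess


def ffprobe_duration_cmd(video_path: str) -> list[str]:
    """Build an ffprobe command that prints only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        video_path,
    ]


def probe_duration(video_path: str) -> float:
    """Run ffprobe (must be on PATH) and return the video's duration in seconds."""
    out = subprocess.run(
        ffprobe_duration_cmd(video_path), capture_output=True, text=True, check=True
    ).stdout
    return float(out.strip())


def build_generation_settings(prompt: str, image_path: str,
                              video_path: str, duration_s: float) -> dict:
    """Collect the inputs used for each generation.

    Key names are hypothetical, not the real Kie.ai request fields.
    Rounding up to whole seconds is an assumption about the duration field.
    """
    return {
        "prompt": prompt,
        "reference_image": image_path,  # reference image, not a first-frame image
        "reference_video": video_path,  # black video carrying the audio clip
        "generate_audio": True,
        "web_search": False,
        "duration": math.ceil(duration_s),
    }
```

For example, `build_generation_settings(prompt, "singer.png", "clip01_black.mp4", probe_duration("clip01_black.mp4"))` returns one settings dictionary per clip, ready to copy into the form or adapt to whatever request format your provider actually documents.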

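SeeDance 2.0 also supports reference audio directly, but when I tried it, the model didn't lip-sync the character from my reference image, and sometimes it changed the lyrics. That's why I passed the audio in as a reference video instead.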
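One way to make that black reference video is with ffmpeg. The sketch below only builds the command (pass the list to `subprocess.run(cmd, check=True)` to execute it). It assumes ffmpeg with libx264 is installed; the 1280x720 size and 24 fps are arbitrary defaults, not values required by SeeDance or Kie.ai.

```python
import shlex


def black_video_cmd(audio_path: str, out_path: str,
                    size: str = "1280x720", fps: int = 24) -> list[str]:
    """Build an ffmpeg command that wraps an audio clip in a solid-black video.

    The lavfi 'color' source generates an endless stream of black frames;
    -shortest stops encoding when the audio ends, so the output length
    follows the audio clip.
    """
    return [
        "ffmpeg", "-y",
        "-f", "lavfi", "-i", f"color=c=black:s={size}:r={fps}",  # input 0: black frames
        "-i", audio_path,                                          # input 1: the song clip
        "-map", "0:v", "-map", "1:a",               # video from input 0, audio from input 1
        "-c:v", "libx264", "-pix_fmt", "yuv420p",   # widely compatible H.264 video
        "-c:a", "aac",
        "-shortest",
        out_path,
    ]


if __name__ == "__main__":
    # Print the shell-ready command for a hypothetical clip file.
    print(shlex.join(black_video_cmd("clip01.mp3", "clip01_black.mp4")))
```

Keeping the video track black means the only useful signal in the reference video is the audio, which is what the prompt asks the model to take from it.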

Here’s a screenshot of the inputs.

Below are the inputs and outputs for various lip-sync clips.

Prompt:

use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.

Reference image:

Reference Video:

Output:

Prompt:

use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.

Reference image:

Reference Video:

Output:

Prompt:

use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.

Reference image:

Reference Video:

Output:

Prompt:

use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.

Reference image:

Reference Video:

Output:

Prompt:

use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.

Reference image:

Reference Video:

Output:

Prompt:

use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.

Reference image:

Reference Video:

Output:

Prompt:

use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.

Reference image:

Reference Video:

Output:

Prompt:

use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.

Reference image:

Reference Video:

Output:


Prompt:

use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.

A man (same subject, unchanged face and outfit) singing into a microphone at the center of a large ancient Roman-style amphitheater at night. Camera is positioned at chest height, medium close-up framing, stable and focused on the singer.

The audience fills the stone bleachers behind him, hundreds of people seated and standing, naturally animated: subtle head movements, clapping, cheering, shifting in seats, occasional phone screens glowing, realistic variation in motion without repetition.

Warm golden stage lighting illuminates the singer from the front and slightly below, creating a cinematic glow on his face. Behind the singer, rows of soft amber lights line the steps and columns. Moving stage lights sweep slowly across the audience and architecture, creating gentle light motion across the crowd and stone surfaces.

The night sky is clear with visible stars. Light atmospheric haze adds depth and catches the beams of moving lights. The columns and amphitheater remain stable and realistic.

The singer performs naturally: subtle head movement, mouth lip-syncing accurately, slight body sway, breathing and posture shifts.

Camera behavior: very subtle cinematic push-in (slow, minimal), no drifting or unintended orbit, no zoom jitter. Maintain subject as the clear focal point at all times.

Depth of field: subject sharp, audience slightly softened but still readable.

Lighting style: warm amber/yellow tones only, no harsh white light, no overexposure, cinematic contrast.

Reference image:

Reference Video:

Output:

Here are a few more clips made with the same prompt and reference image but different reference videos (that is, different audio clips).

Reference Video:

Output:

Reference Video:

Output:
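Because only the reference video changes between these clips, the setup is easy to batch. The sketch below pairs one fixed prompt and reference image with each audio clip. The `LipSyncJob` type and the `<name>_black.mp4` naming scheme are hypothetical conventions for illustration, not part of SeeDance or Kie.ai.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class LipSyncJob:
    prompt: str
    reference_image: str
    reference_video: str


def batch_jobs(prompt: str, image: str, audio_clips: list[str]) -> list[LipSyncJob]:
    """Create one job per audio clip; the prompt and reference image stay fixed."""
    jobs = []
    for clip in audio_clips:
        path = Path(clip)
        # Hypothetical naming scheme: clips/verse1.mp3 -> clips/verse1_black.mp4
        video = str(path.with_name(path.stem + "_black.mp4"))
        jobs.append(LipSyncJob(prompt, image, video))
    return jobs


if __name__ == "__main__":
    for job in batch_jobs("use the song from the given video ...", "singer.png",
                          ["verse1.mp3", "chorus.mp3"]):
        print(job.reference_video)
```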

SeeDance 2.0 also seems to support generating a video of a person playing a musical instrument in sync with the sounds in a reference clip. Here's an example:

Prompt:

use the song from the given video and use the character from the given image to make a music video of the man playing the guitar sounds in the song. sync the playing of the guitar to the guitar sounds in the song.

Reference image:

Reference Video:

Output: