Subtitle Edit is a free app that lets you generate and edit subtitles from a video. If you have a song that you want the lyrics for, you can export the song as a video and then add the video to Subtitle Edit. In the example below, I want the lyrics for an Italian music video.
Since my video is in Italian, I set the language option to Italian.
When processing is done, you will see the lyrics in the left pane. Clicking on a verse will jump the playhead to the point in the waveform where that verse begins. The “Text” field lets you edit subtitle text. In the waveform, you can also drag the vertical start and end lines for each verse, which will update the timestamp accordingly.
Most AI image-to-video generation tools support first-frame reference images. Considering how much more expensive video generation is compared to image generation, it makes sense to use image references, like a first frame, when generating videos. However, providing a first-frame image, with or without a last-frame reference, still fails when you need character consistency because the AI model only knows what the character looks like in the first-frame image. Fortunately, SeeDance 2.0 supports multiple reference images, so you can upload both a first-frame image and a character sheet containing different views of a character.
For example, I had the following character sheet.
I then used it to create the following first-frame image using Nano Banana 2 in OpenArt.ai.
If I zoom in, I can see that the face looks close enough to the one in the character sheet.
When I created a video using Kling 2.5 of the woman walking, using that image as the first frame, I got the following.
The video starts out fine because of the first-frame reference, but as it progresses, the woman’s face slowly changes and looks less and less like the one in the character sheet. Here’s a screenshot of just her face in one frame of the video.
What’s particularly different is the nose, but the width and height of her face looks somewhat different as well, especially compared to the character sheet.
Now, let’s see how the same video turned out using SeeDance 2.0 with multiple references. For this, I used Kie.ai.
Since I wanted to keep the setting and just replace the subject, I used Photoshop to “remove” the subject from the previous image. I selected the subject and clicked “Remove”, which used AI to remove the woman.
This is what I got.
Next, I upload the character sheet and setting image to Kie.ai (SeeDance 2.0 page), gave it the same prompt I used in Kling 2.5.
Here’s the resulting video.
Notice how the character looks EXACTLY like the one in the character sheet throughout the entire video clip.
Here’s a close-up of the face near the end of the clip.
I just made this music video, and the lip-sync portion is amazingly impressive.
I actually used SeeDance 2.0 Fast at Kie.ai, but you can use SeeDance 2.0 as well and get up to 1080p resolution. For each generation, I used
Prompt
Reference image (not first-frame image)
Reference video (this was just a black video containing the audio clip)
“Generate audio” enabled
“Web search” disabled
Duration = duration of reference video
SeeDance 2.0 supports generating videos up to 15 seconds long. But, if you give it a 15s reference video and you want to lipsync a character in it, the lipsync won’t work. So, when generating lipsync videos, always provide a reference video that is no longer than 13 seconds to be safe.
When creating reference videos, make sure the duration is a whole number, not a fraction, e.g., 5 full seconds, not 5.5 seconds. The reason is because in the UI, Kie.ai or another app may round down the duration to the nearest whole second, and if you tell SeeDance you want to generate a 5s video, then it will generate a 5s video, not a 5.5-second video, and your lip-sync video will be truncated. I use Capcut to generate my black reference videos. I put a playhead at a location where I want a segment to begin and end and set a marker at each location, making sure the time ends with :00 (no frames), e.g.
start 2:54:00
end: 3:04:00
duration = 10s
If I really need to split at a location between seconds, like 2:54:09, then make sure the end location includes the same number of frames, e.g., 3:04:09, so you end up with a duration in whole seconds.
SeeDance 2 supports reference audio, but for some reason, it didn’t lip-sync my reference image, and sometimes it would change the lyrics.
Also, the following method worked well for English audio. It may not work for other languages. If you find that it doesn’t work for your language, then see some options below.
Here’s a screenshot of the inputs.
Below are the inputs and outputs for various lip-sync clips.
Prompt:
use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.
Reference image:
Reference Video:
Output:
Prompt:
use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.
Reference image:
Reference Video:
Output:
Prompt:
use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.
Reference image:
Reference Video:
Output:
Prompt:
use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.
Reference image:
Reference Video:
Output:
Prompt:
use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.
Reference image:
Reference Video:
Output:
Prompt:
use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.
Reference image:
Reference Video:
Output:
Prompt:
use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.
Reference image:
Reference Video:
Output:
Prompt:
use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.
Reference image:
Reference Video:
Output:
Prompt:
use the song from the given video and use the character from the given image to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing.
A man (same subject, unchanged face and outfit) singing into a microphone at the center of a large ancient Roman-style amphitheater at night. Camera is positioned at chest height, medium close-up framing, stable and focused on the singer.
The audience fills the stone bleachers behind him, hundreds of people seated and standing, naturally animated: subtle head movements, clapping, cheering, shifting in seats, occasional phone screens glowing, realistic variation in motion without repetition.
Warm golden stage lighting illuminates the singer from the front and slightly below, creating a cinematic glow on his face. Behind the singer, rows of soft amber lights line the steps and columns. Moving stage lights sweep slowly across the audience and architecture, creating gentle light motion across the crowd and stone surfaces.
The night sky is clear with visible stars. Light atmospheric haze adds depth and catches the beams of moving lights. The columns and amphitheater remain stable and realistic.
The singer performs naturally: subtle head movement, mouth lip-syncing accurately, slight body sway, breathing and posture shifts.
Camera behavior: very subtle cinematic push-in (slow, minimal), no drifting or unintended orbit, no zoom jitter. Maintain subject as the clear focal point at all times.
Depth of field: subject sharp, audience slightly softened but still readable.
Lighting style: warm amber/yellow tones only, no harsh white light, no overexposure, cinematic contrast.
Reference image:
Reference Video:
Output:
Here are some similar clips using the same prompt and reference image but different reference videos (for the audio).
Reference Video:
Output:
Reference Video:
Output:
Playing Musical Instruments That Sync to Music
SeeDance 2.0 also seems to support making a video of a person playing a musical instrument in a way that matches the sounds in a reference source. Consider the following:
Prompt:
use the song from the given video and use the character from the given image to make a music video of the man playing the guitar sounds in the song. sync the playing of the guitar to the guitar sounds in the song.
Reference image:
Reference Video:
Output:
Here’s another example.
Prompt: use the song from the given video and use the character from the given image to make a music video of the man playing the saxophone such that the sound of the saxophone in the song is in sync with the playing of the saxophone. the background should be solid green as in the reference image for chroma key background removal later on. the camera remains fixed. do not zoom in or out. the man’s fingers move on the saxophone naturally and in sync with the sound from the song. the man moves his body naturally as he plays the saxophone.
The duration was set to the duration of the reference video. I used SeeDance 2.0 Fast and 720p resolution. I later upscaled the video to 4K using Topaz Video AI.
Result
SeeDance 2.0 Error – output audio may contain sensitive information
If you get an error that says “The request failed because the output audio may contain sensitive information.”, then disable audio generation.
For example, in order to make the following video,
I had to use the following settings in Kie.ai:
Prompt: use the song from the given video and use the character from the given image to make a music video of the man singing the song in front of a green screen, as shown in the reference image. he stands in place and sings the exact lyrics in the song audio as if lip-syncing to the audio with natural face and body movements, but keep his hands beside his body. The camera is fixed and doesn’t zoom in or out and doesn’t pan.
Do not add shadows, floor shadows, lighting gradients, reflections, stage lighting, environmental lighting, or any background elements. Camera locked off and completely static.
Reference Image:
Reference Video: (example)
generate_audio: off
The duration was set to the duration of the reference video. The resulting clip was
I then removed the green background in Capcut to overlay the singer on a series of background video clips.
Singing Lip Sync Videos Using HeyGen
If your song is not in English and SeeDance 2.0 can’t lip-sync it correctly, then use HyeGen with custom motion enabled, as follows.
Log in to HeyGen and create an avatar. You can simply upload a photo of your singer. I used the one below. I put my avatar on a green background so I can chroma key it out.
Open Avatar Studio and
in the Script section on the left, instead of typing your script, upload your song’s audio (mp3)
in the Avatar and Voice section on the right, under Voice, you can ignore this since you’ll be using the audio you uploaded
in the Avatar and Voice section on the right, under Motion Engine, choose “Avatar IV”
then, and this is important, click the “Advanced Settings” button.
Toggle on “More expressive motion” and enter a custom motion prompt.
Optionally, you may click the “Generate motion prompts” icon, which will generate motion tags as shown below.
Then, click the Generate button.
Following are examples comparing different settings.
HeyGen LipSync Using Avatar IV WITHOUT Custom Motion
HeyGen LipSync Using Avatar IV WITH Custom Motion
In this example, I didn’t click the “Generate motion prompt” button.
HeyGen LipSync Using Avatar IV WITH Custom Motion
In this example, I did click the “Generate motion prompt” button.
As you can see, in the first example, the avatar doesn’t look like he is singing at all, and in the last 2 examples, the avatar looks more expressive. It may be difficult to tell the difference for such a short clip, but the difference is actually huge when you lipsync a full song, as in the following example.
The lip-sync quality is definitely not as good as SeeDance 2.0, but it seems to be the best option when SeeDance 2.0 doesn’t work for a particular language.
UPDATE 6/5/2026
There’s another way to generate lip-sync videos using SeeDance 2.0, and it supports non-English languages. Here, I’m using Kie.ai. Instead of uploading a black video with audio, I upload an audio and include the lyrics in the prompt.
Inputs:
Prompt: Lyrics: Naik bajaj jingga bunyinya setengah mati
The guy in reference @image1 sings the verse in @audio1 in a music video way. The verse in the lyrics is in the Indonesian language. Keep the camera fixed. Don’t zoom in or out. Keep the background solid green as in reference @image1. The man in reference @image1 moves his body naturally in a music video way.
Reference Image:
Reference Audio:
Duration: 6s
Output:
Inputs:
Prompt: Lyrics: Naik bajaj jingga bunyinya setengah mati
The guy in reference @image1 sings the verse in @audio1 in a music video way. The verse in the lyrics is in the Indonesian language. Keep the camera fixed. Don’t zoom in or out. Keep the background solid green as in reference @image1. The man in reference @image1 moves his body naturally in a music video way.
Reference Image:
Reference Audio:
Duration: 6s
Output:
UPDATE 6/13/2026 – Actually, using a video reference containing the audio is better than an audio reference. See following example.
Inputs:
Prompt: Lyrics:
Còn tôi như cánh chim Sẽ bay đi muôn phương Mang về mầm xanh tươi
use the song from the given video (@video1) and use the character from the given image (@image1) to make a music video of the man singing the song. he sings the exact lyrics in the song as if lip-syncing. The lyrics are in Vietnamese. He sings passionately and moves his body naturally to the sound of the music. Keep the camera fixed. Don’t zoom in or out. Keep the background solid green as in reference @image1.
Reference Image:
Reference Video:
Duration: 11s
Output:
Create Cinematic, Multi-Shot Lip-sync Music Videos
To create cinematic, multi-shot lip-sync music videos in one SeeDance 2.0 video generation, do the following:
Give Claude or ChatGPT the lyrics to the whole song so it knows what the song is about
Create reference video clips in 720p containing audio segments that are 14s or less. Don’t split mid-word.
For each clip, give Claude the mp3 and the lyrics for that clip, if any, and tell Claude you want a SeeDance prompt to generate a music video. Specifically, tell Claude to give you the shots (scenes) similar to the example below.
Shot 1: Medium-close on the singer at golden hour along a cliffside coast, glowing amber coastline and ocean curving behind him, warm sun on his face. Camera slow gentle push-in. He is the only person in frame.
Shot 2: Medium shot of the singer standing at a coastal overlook, vast golden California coastline stretching into the distance behind him, soft waves and warm haze. Camera slow drift. He is the only person in frame.
Shot 3: Medium-close, front-on, on the singer with the blazing golden sunset coastline glowing behind him, the warmest light of the clip full on his face, a peaceful contented expression. Camera slow push-in. He is the only person in frame.
Then, append it to your base prompt, which is
LYRICS: “[enter lyrics for the clip / segment here]”
use the song from the given video (@video1) and use the character from the given image (@image1) to make a music video of the man singing the song
@image1 is the face and identity reference for the lead singer — match his face, afro, beard, and glasses to @image2 throughout, keeping his identity consistent.
The generated audio must match the audio in @video1 EXACTLY and the lip sync must match the vocal segments in @video1 EXACTLY.
4. Add your character sheet as the first reference image (@image1).
5. Add your reference video
6. Specify a duration that matches the reference video duration
Here’s how I created this overview video in 5 minutes with just a single prompt.
Get Content
Gather the content for the overview video you want to create. The content can be local files (PDFs, text files, etc), copied text, and website URLs. In my case, I got these URLs to pages explaining how credit cards work:
In the left pane of NotebookLM, add your source content.
Add a Prompt
In the middle pane, add a prompt describing what you want NotebookLM to do. In my example, I asked ChatGPT to give me a prompt to tell NotebookLM to generate an overview video of the content in my sources, which I then pasted into NotebookLM.
Generate Overview Video
In the right pane, click “Video Overview” to have NotebookLM generate an overview video based on the content and your prompt. My 4-minute video was generated in a few minutes.
Let’s say you have the rights to a song, e.g, a song that’s in the public domain, and you want to create a cover for it by only replacing the lyrics. Here’s how you can do it using ChatGPT and Suno AI.
Upload the original song to Suno
For this example, I uploaded this song.
Suno will add your uploaded song to your workspace, as shown below. When you click on the song’s title, you’ll see an auto-generated style description of the song and the lyrics.
Song Style
A French pop song with a moderate tempo and a romantic, dreamy atmosphere, The instrumentation features a prominent acoustic guitar playing arpeggiated chords, a bass guitar providing a smooth, walking bass line, and a drum kit with a soft, brushed snare sound, Synthesizers contribute to the ethereal quality with pad sounds and occasional melodic lines, The female lead vocalist sings with a soft, breathy tone, employing a gentle vibrato, The song structure follows a verse-chorus pattern with a bridge, The chord progression is primarily diatonic, creating a sense of warmth and familiarity, Reverb is applied generously to the vocals and some instrumental elements, enhancing the dreamy quality, The overall mix is balanced, with the vocals sitting clearly in the foreground
To create new lyrics, the easiest way is to ask ChatGPT to do it for you. For example, here’s a prompt I gave ChatGPT to generate new lyrics that meet my specific criteria:
I want to create a cover of a French song. The lyrics are below. Give me English lyrics that have a similar theme and can be used as a drop-in replacement for the French lyrics so I can use the Suno cover feature. If the French song has a rhyming pattern, the new lyrics should rhyme as well. If possible, try to match the syllable count, but prioritize the flow and meaning of the lyrics over the syllable count.
[Verse 1] Quand tu me souris Mon cœur s’envole Tout devient doux Tout prend son rôle …
ChatGPT will then respond with the lyrics. Here’s what it gave me:
[Intro]
(Para para pa) (Para para pa) (Para para pa) (Para para pa) (Para para pa) (Para para pa) (Para para pa) (Para para pa)
VERSE 1
When you smile at me, My heart goes free, Everything glows, Warm like it knows, Your name rings on, Soft as a song, In dreams each night, You are my light.
[Chorus]
Your love lifts me higher each day, Like an angel learning to fly, Every touch sends my worries away, Under moonlight, you open the sky.
[Verse 2]
Your eyes shine bright, like summer air, Every night I find you there, Your soft voice brings me into dream, Calling you flows like a stream.
[Chorus]
Your love lifts me higher each day, When your hands are holding mine, Time stands still in your gentle way, In your heart, all my echoes align.
[Bridge]
Even when the morning shines, And the world moves on, Still your love stays in my mind, You’re the dream I lean upon.
[Chorus]
Your love lifts me higher each day, Every moment feels so true, I feel you— So high in love with you.
[Outro]
Floating through the sky (Para para pa) (Para para pa) (Para para pa) (Para para pa) (Para para pa) (Para para pa) (Para para pa) (Para para pa)
Create cover song
Now that you have new lyrics, go back to Suno and click on the 3 dots next to the original song title, the click “Remix/Edit” > “Cover”
In the left pane, Suno will load the original song and lyrics. Replace the original lyrics with your new lyrics. In the “Style” field, paste the Suno-generated style description. Suno will also set some advanced options, like “weirdness”, “style influence”, and “audio influence”. You can keep the defaults.
Click “Create”. Suno will create 2 cover songs, as shown below.
Edit the cover song
After listening to the two cover songs, I like the second one more, but one section didn’t sound right. Some of the lines in verse 2 sounded rushed. To fix this, edit the song by clicking on “Open in Editor”.
With the editor open, you’ll see the song’s waveform, color-coded by section. Click on the section containing the lyrics you wan to edit. In this example, that’s the pink section shown below. When you click on it, the lyrics for the section will be selected in the lyrics box on the left. You can then type in revised lyrics in the “new lyrics” box below it. In this case, I made some of the lyrics shorter (fewer words).
Click the “Replace” button. Suno will generate two alternate versions of that section with the modified lyrics you provided. Click the play button beside each one to preview the alternate versions. If you don’t like either one, click “Regenerate” to generate more versions. When you like a version, click “Commit” to replace the section with the new section.
When you’re done editing, click “Save as new song”. The edited song will appear in your workspace.
You can then download the song.
Here’s the cover song in English. As you’ll hear, the backing instrumentals sound almost identical to the original French song, but the lyrics are new.
Recently, I needed some high-quality Mediterranean images. I tried searching stock photo libraries, but they were expensive, it took too long, and the images weren’t that good or what I was really looking for. I found several videos on YouTube that had the type of images I wanted, but I didn’t want to copy them exactly, so I used AI to create new images inspired by them. Here’s how I created them.
AI Model: SeeDream 4.0
Prompt: Create a Greek home using the same colors, lighting, and elements from the reference image, but it should look different from the reference image.
Reference Image:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image using the same colors, lighting, and elements from the reference image, but it should look different from the reference image. The perspective should be 45 degrees from the perspective of the reference image, facing the sea.
Reference Image:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same composition and view as the first one, but use colors and materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same composition and view as the first one, but use colors and materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same composition and view as the first one, but use colors and materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same, layout, composition and view as the first one, but use colors and materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same, layout, composition and view as the first one, but use colors and materials from the second one. Keep the blue and white tile in the first image as is.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same, layout, composition, elements and view/framing as the first one, but use colors and building materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same, layout, composition, elements and view/framing as the first one, but use colors and building materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same, layout, composition, elements and view/framing as the first one, but use colors and building materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same, layout, composition, elements and view/framing as the first one, but use colors and building materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same, layout, composition, elements and view/framing as the first one, but use colors and building materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same, layout, composition, elements and view/framing as the first one, but use colors and building materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same, layout, composition, elements and view/framing as the first one, but use colors and building materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same, layout, composition, elements and view/framing as the first one, but use colors and building materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same, layout, composition, elements and view/framing as the first one, but use colors and building materials from the second one.
Reference Images:
AI-Generated Image:
AI Model: SeeDream 4.0
Prompt: Create an image with the same, layout, composition, elements and view/framing as the first one, but use colors and building materials from the second one.
Reference Images:
AI-Generated Image:
I ended up using the images to create 5-second background videos using Kling AI for this music video:
There are many talking lipsync AI tools out there. The inputs are usually text and an image of a person. The results are almost indistinguishable from a non-AI talking video. But when it comes to lipsync videos involving singing, that’s a whole different story. Generating realistic singing lipsync videos is apparently very challenging. Tools like Kling AI and Runway ML, despite being very popular tools for video generation, do a horrible job at this. After trying a number of tools, the two best ones I’ve found are TopMediAI and HeyGen. In this post, I’ll share my experience using them.
UPDATE 12/19/2025: Longcat Avatar is a new option that is worth trying and comparing against.
UPDATE 12/8/2025: There’s a new singing lipsync generator called WaveSpeed MultiTalk (WAN 2.1). Preliminary testing indicates that, with respect to video quality, MultiTalk is better than TopMediAI but not as good as Heygen. With respect to lipsync, Multitalk is just as good as TopMediAI and better than Heygen.
This tool does a decent job at creating singing lipsync videos, and the interface is very simple and intuitive. Though it’s designed for singing, it’s far from perfect.
Inputs
upload an audio file (mp3) between 2 and 30 seconds
upload an image of the character you want to sing
When generating a video using TopMediAI, sometimes, generation will fail repeatedly. From my experience, you have to keep trying 3-5 times until generation succeeds. It’s annoying, but it’ll eventually work.
This tool was designed for creating talking lipsync videos, not for singing. Nevertheless, it’s most advanced motion engine (Avatar IV) does a pretty good job a generating a singing lipsync video if you choose the “Quality” mode with a “Custom motion” value of “singing”. If you use the “Avatar Unlimited” engine, the results are just not good enough, in my opinion.
Update 12/8/2025: If you use the “Faster” generation mode, the quality appears to be just as good as the “Quality” mode, so just choose that mode since it costs half the cost of the “Quality” mode.
The process to create a lipsync video using HeyGen is more complex. Here are the steps:
Click “Avatars” > “Create New” > “Start from a photo” >
Upload a photo and wait for it to be processed
Choose to create a new avatar or add the photo as a new “look” of an existing avatar. (One avatar can have multiple “looks”)
Click “Create with AI Studio”
Click “Audio” > “Upload Audio” , then upload your audio clip. You can upload a clip anywhere between 1 second and 3 minutes.
You can also choose from a previously uploaded audio.
Play and confirm the uploaded/selected audio.
HeyGen will attempt to transcribe the audio. If transcription fails, you won’t be able to proceed. In my experience, if it fails, it’s usually because the audio clip is too short. When I upload a longer clip, it usually can transcribe it. Note that the transcription can be wrong. This doesn’t appear to matter, as the video generation appears to be based on sound rather than words.
Click “Generate”.
Comparing HeyGen to TopMediAI
Body movements
Neither TopMediAI nor HeyGen will make your character dance, but they will animate your character’s body to some extent. This is good, because older technologies literally only animated the lips or face and left everything else frozen/static. I feel that TopMediAI generates stronger body and lip movements, which makes the results look more realistic from that perspective.
Lipsync accuracy
When uploading a audio clip, it’s better to isolate the vocals from the backing track to prevent TopMediAI and HeyGen from getting confused. Neverthless, even when you upload the vocal track of a song, both AI tools occasionally produce inaccurate results, e.g., instead of lip movements to sing the word “hati”, TopMediAI made the lip movements as if to sing the word “hapi”; it wasn’t able to detect the difference between the “t” and “p” sounds. HeyGen seems to do a better job at lipsync accuracy.
Sustained vocal sounds
TopMediAI animates both the subject’s body and their lips to try to match the sounds in the audio file. This is particularly necessary for sustained vocal sounds, like in the following example.
Using the same inputs, and using HeyGen’s most advanced model (Avatar IV in “Quality” mode with a “Custom Motion” value of “Singing”, you can see below that HeyGen failed.
Video picture quality
With TopMediAI, if you upload an image of a zoomed-out character, even if it’s a hi-res image, the tool will have difficulty detecting the facial features, and the resulting video will be blurry with lots of artifacts. For that reason, I only upload images containing close-up shots of the character from the waist up. However, even then, the picture quality of the generated lipsync video deteriorates, sometimes significantly. For example, here’s the source image I uploaded to TopMediAI:
And here’s a frame from the generated video:
That’s a big difference.
HeyGen, on the other hand, does a much better job at preserving picture quality of the source. For example, compare the source and generated (screenshot) images below.
Teeth
TopMediAI can’t seem to produce consistent and natural-looking teeth. Sometimes, the results are acceptable, but other times, they are not. Compare the following.
Teeth appear okay, but still not perfect.Teeth appear heavily chipped
HeyGen, on the other hand, does a very good job and showing natural, and almost perfect, teeth, as in this example:
Output resolution
With HeyGen, you can export videos up to 4K quality. With TopMediAI, there are no resolution options.
Recommendations
I would definitely use HeyGen’s Avatar IV with the “Quality” mode first to generating singing lipsync videos. If the results don’t look good, then I’d use TopMediaAI as a fallback.
When generating images and videos using AI, you need to include in your prompt how the camera takes the picture or video. Following are common camera shots and movements you can reference when creating your prompts. The example images below were all generated using Nano Banana. The videos were generated using Kling AI.
Shot Types (Distance and Framing)
Shot Type
Description
Extreme Close-Up (ECU)
Focuses tightly on one detail (eyes, mouth, hand). Great for emotional intensity.
Close-Up (CU)
Shows the subject’s head and shoulders — captures facial emotion clearly.
Medium Shot (MS)
Shows the subject from waist up — good balance between subject and background.
Full Shot (FS)
Shows the entire body of the subject within the frame.
Wide Shot (WS)
Shows the subject and full surroundings — emphasizes environment.
Extreme Wide Shot (EWS)
Subject is small within a large landscape — epic and cinematic.
Over-the-Shoulder (OTS)
Camera is behind one person’s shoulder, focusing on what they’re looking at.
POV (Point of View)
The camera shows what the character sees — immersive perspective.
Extreme Close-Up
Close-Up
Medium Shot
Full Shot
Wide Shot
Extreme Wide Shot
Over-the-Shoulder Shot
POV (Point of View) Shot
Camera Angles
Angle
Description
Eye-Level
Neutral, natural perspective — like the viewer’s eye line.
High Angle
Camera looks down on the subject — makes them seem small or vulnerable.
Low Angle
Camera looks up at the subject — makes them seem powerful or heroic.
Dutch Angle (Tilted)
Camera is tilted diagonally — adds tension or unease.
Bird’s Eye View / Top-Down
Shot from directly above — good for movement, choreography, or maps.
Eye-Level, medium shot
High-angle, medium shot
Low-angle, medium shot
Dutch Angle (Tilted), medium shot
Bird’s Eye View / Top-Down shot
Camera Movements (Dynamic Shots)
Movement
Description
Static
Camera doesn’t move — perfect for portraits or emotional moments.
Pan
Camera rotates horizontally left ↔ right.
Tilt
Camera moves vertically up ↕ down.
Dolly (Push/Pull)
Camera moves forward or backward smoothly on rails. Great for dramatic reveals.
Truck (Left/Right)
Camera moves side to side — similar to dolly but horizontally.
Crane / Jib
Camera moves up or down through large vertical space — majestic motion.
Orbit / 360° Move
Camera circles around the subject — cinematic hero shot.
Tracking / Follow Shot
Camera moves following the subject — conveys movement and energy.
Zoom In / Out
Lens zooms, not physical movement — adds focus or emotional punch.
Handheld
Shaky or organic motion — feels immersive or documentary-style.
Drone Shot
High-altitude or sweeping view — ideal for landscapes, travel, or sports.
Artificial Intelligence (AI) is about teaching computers to do smart things that normally require human intelligence — such as understanding language, recognizing faces, playing games, or creating art and music.
AI learns from lots of examples so it can notice patterns. For example, if you show AI thousands of pictures of cats and dogs, it will eventually know the difference between what a cat looks like and what a dog looks like. It doesn’t know what a cat or dog does; it just knows what a cat or dog looks like. Data is the fuel of AI. The more data and the cleaner the data, the better the AI is. AI improves by trial and error. Initially, AI will guess what a cat or dog looks like. If it makes a mistake and humans correct it, AI will learn and, eventually, not make the same mistake.
Huge Neural Networks
AI uses neural networks, which are like a simplified brain. Large Language Model (LLM) and Diffusion Model are two types of huge networks. These models are trained on billions of examples from the internet.
Imagine a big school containing teachers (humans) and students (AIs).
The Teacher (Humans)
The teachers (humans) give the students (AIs) tons of examples: books, images, songs, videos, code — everything. The students don’t just memorize — they practice until they can do similar things on their own.
The LLM Student (AI)
One student, Lucy the LLM, loves reading and writing. She studies every book in the library and learns:
“After the words ‘Once upon a’, the next word is usually ‘time’.”
She becomes amazing at predicting the next word and can write essays, stories, or even hold a conversation — because she knows how words fit together.
Lucy = Large Language Model (writes and speaks intelligently).
The Diffusion Student (AI)
Another student, Danny the Diffusion Model, loves art. His training exercise:
The teacher shows him a picture.
Then they cover it with random paint splatters.
Danny learns to carefully “un-splatter” the image until it looks clear again.
After years of practice, Danny can now start with a blank canvas (just random dots) and, when you say “a cat wearing sunglasses,” he paints that from scratch.
Danny = Diffusion Model (paints images from words).
The Big Picture
Both Lucy and Danny are smart in different ways:
Lucy talks and writes (text world 🌍).
Danny paints and visualizes (image world 🎨). They often work together — Lucy writes the idea, Danny draws it.
Types of AI
Category
Description
Example
Narrow AI
Specialized, task-based AI
ChatGPT
Generative AI
Creates new content
DALL·E, Runway
Agentic AI
Acts independently to achieve goals
AutoGPT, Devin
Analytical AI
Analyzes large data
Fraud detection
Predictive AI
Forecasts outcomes
Stock or weather models
Conversational AI
Talks with humans
ChatGPT, Siri
Robotic AI
Moves and interacts physically
Drones, factory robots
Machine Learning teaches AI patterns.
Generative AI teaches AI creativity.
Agentic AI teaches AI action and autonomy.
AI Leaderboard
There are many AI models by different creators, but some are more intelligent than others. This leaderboard ranks AI models based on intelligence score.
Some models are better than others at specific tasks. The following leaderboards compare leading models by
Here’s the current top-5 leaderboard for web development as of Nov 11, 2025.
AI, CPUs, and GPUs
Type
Analogy
Good For
CPU
A brilliant chef cooking one perfect dish at a time
Complex logic, single-thread tasks
GPU
A kitchen with 1,000 chefs making the same dish simultaneously
Massive parallel work (like AI math)
Originally, GPUs were built for graphics — drawing images and 3D scenes in video games.
CPUs (central processing units) are like smart workers — great at doing one task at a time, but carefully.
GPUs are like armies of workers — they can do thousands of small calculations at once.
What would take a CPU weeks to do, a GPU can do in hours.
AI depends heavily on GPUS.
Nvidia, the company that makes most AI GPUs (like the H100 and A100), has become one of the most valuable companies in the world because of AI demand.
Data centers around the world are being built specifically to host GPU farms — giant rooms filled with thousands of GPUs that power AI models.
Cloud providers (like AWS, RunPod, or Google Cloud) rent out GPU power so smaller developers can build and test AI apps without owning hardware.
AI Training VS Inference
There are two main stages of AI:
1. Training (Learning)
This is when the AI learns from data. For example, teaching an AI to recognize cats by showing it millions of cat photos. It requires huge amounts of computation and it needs massive GPU clusters — sometimes thousands of GPUs working together for weeks or months.
Think of it like “going to school.”
2. Inference (Using What It Learned)
Once trained, the AI can now use what it knows — answering questions, generating images, etc. It still uses GPUs, but fewer — since it’s now recalling knowledge rather than learning it.
Think of it like “taking an exam” — it’s using what it learned efficiently.
AI Hallucinations
An AI hallucination is when an AI makes up something false but presents it as true. It happens because AIs don’t know truth — they just predict what sounds right based on pattern-matching. For example,
AI Type
What “Hallucination” Looks Like
LLM (ChatGPT, Claude)
Makes up fake facts, quotes, sources, or people
Image Model (Midjourney, DALL·E)
Adds random visual details that weren’t in the prompt (like extra fingers 👋 or distorted objects)
Video / Audio Models
Create unrealistic motion, or mis-sync voices and faces
Hugging Face
Hugging Face is basically the GitHub of Artificial Intelligence. It’s a giant online platform where people share, explore, and use AI models, datasets, and tools — all in one place. At Hugging Face, you’ll find
Section
What It Offers
Example
Models
Pre-trained AI models (text, image, audio, etc.)
ChatGPT-like LLMs, Stable Diffusion, Whisper
Datasets
Large collections of data used to train AIs
Wikipedia text, image caption sets, code samples
Spaces
Interactive apps people build with AI
You can test image generators, chatbots, translators
Transformers Library
Hugging Face’s open-source code that makes it easy to use models
Used by researchers, developers, and hobbyists everywhere
Imagine you want to build an app that turns spoken words into text, translates it to French, and then summarizes it. On Hugging Face, you can:
Search for a speech-to-text model (like OpenAI’s Whisper).
Add a translation model (like Helsinki-NLP English-to-French).
Plug in a summarizer (like BART or T5).
Run it all with just a few lines of Python code using the transformers library.
You don’t have to train anything from scratch — it’s all there, ready to go.
You can also use models in your own code using Python. For example,
Install the transformers library:
pip install transformers
Load a model (for example, a text generator):
from transformers import pipeline
generator = pipeline("text-generation", model="gpt2")
result = generator("Once upon a time, there was a cat", max_length=30)
print(result[0]["generated_text"])
ComfyUI
ComfyUI is a visual, node-based interface for creating AI images and videos. It’s a drag-and-drop app that lets you build your own AI image or video generator. Instead of typing long code, you build a “workflow” by connecting blocks called nodes. Each node does one thing:
One node loads your model
One node reads your text prompt
One node generates an image
Another node might upscale, add depth, or edit colors
You can run ComfyUI locally on your computer, but you need a lot of space and a powerful computer, preferably with a GPU. These computers are expensive. Alternatively, you can run ComfyUI in a browser at ComfyCloud, RunComfy, and ThinkDiffusion, where you can rent powerful GPUs and pay only for when you’re using the service.
Image and Video AI Playground
You can test some of the leading image and video AI models at RunComfy’s playground.
After testing various AI models to generate images of a character based on one or two reference character images, I have found SeeDream 4.0 to be the best. I’ve even found it to be much better than Google’s Nano Banana for this purpose. Here are some example SeeDream-generated images from two photos of me.
My Reference Photos
Front shotSide shot
Prompts + Generated Images
Note: With consistent character images, you can bring them into Kling AI as references to generate consistent-character videos.
Photorealistic cinematic medium shot of the man skiing, face uncovered, wearing his black hat and sunglasses, snow crystals clinging to his eyebrows and beard, breath visible in the cold air, cheeks flushed from exertion, eyes focused ahead, crisp alpine sunlight illuminating his face — ultra-realistic GoPro portrait with shallow depth of field and cinematic lighting. His entire face is visible.
Wide shot of the same man skiing down a pristine mountain slope, wearing a black baseball cap and semi-transparent gradient sunglasses. The camera faces him from the front, capturing his body leaning into a turn, powder snow flying, blue sky above, cinematic lighting, GoPro ultra-wide perspective.
Cinematic chest-up shot of the man wearing a black baseball cap forward and gradient sunglasses, ocean droplets on his face and shoulders, sunlight reflecting in his lenses, deep blue waves behind him, captured from a GoPro mounted near the surfboard’s nose.
Wide front-side GoPro shot of the man surfing a curling turquoise wave, wearing his black baseball cap and gradient sunglasses, camera facing him from low on the wave surface, dynamic water spray and cinematic sunlight.
Chest-up shot of the man free-falling through a bright blue sky, wearing a black baseball cap forward and gradient sunglasses, wind sweeping his jacket and straps, sunlight reflecting off his glasses, clouds far below, cinematic color and realism.
Wide cinematic shot of the man skydiving, body stretched out and stable, camera facing him from the front at mid-distance, black cap and sunglasses clearly visible, clouds and the earth below in vast perspective, soft sun haze.
Chest-up shot of the man riding a sports motorcycle, wearing a black baseball cap facing forward and gradient sunglasses, wind flowing through his clothes, warm golden-hour lighting, GoPro mounted near the handlebars, cinematic clarity.
Wide front-side GoPro shot of the man on a motorcycle, black cap and gradient sunglasses visible, leaning slightly into a curve, mountain road ahead, trees and sunlight framing the motion, cinematic outdoor tone.
Wide underwater GoPro shot of the man swimming near a massive whale, camera facing him from the front, black swim cap and swimming goggles visible, sunlight streaming through the blue water, cinematic and peaceful tone.
Cinematic chest-up side-profile shot of a man in a helicopter cockpit, camera positioned slightly in front and to the left of him, showing the cockpit controls in front of him and his face in profile as he pilots, wearing a black baseball cap facing forward and semi-transparent gradient sunglasses, sunlight streaming through the windshield, clouds ahead, highly realistic lighting and detail.
Wide front-side shot from outside the helicopter, camera hovering just ahead of the cockpit, showing the man’s face through the glass, wearing a black baseball cap and gradient sunglasses, mountain clouds around the aircraft, cinematic aerial composition.
Cinematic chest-up side-angle shot of a man actively climbing a rugged rock wall, camera positioned slightly below and to his right, angled upward toward his face. He’s wearing a black baseball cap facing forward, semi-transparent gradient sunglasses, and mountain-climbing gear with a harness, rope, and gloves. His left hand grips a rock hold firmly while his right hand reaches upward, sunlight illuminating his face and the textured stone surface. Sharp, realistic detail with warm mountain tones.
Wide front-side GoPro shot showing the man mid-climb on a tall cliff face, wearing a black baseball cap, gradient sunglasses, and full climbing gear. Camera positioned several meters away at a diagonal angle showing his full body against a vast mountain backdrop. His left leg is bent, right leg extended, and the valley below stretches into hazy sunlight. Cinematic scale and clarity.
Chest-up shot from a GoPro mounted near the motorcycle’s front handlebars, camera angled slightly upward toward the man’s face as he rides. He’s wearing a black baseball cap facing forward, gradient sunglasses, and gloves. Dust trails swirl behind him under golden desert sunlight, with blurred dunes in the background. Cinematic and ultra-realistic motion energy.
Wide front-side shot of the man speeding across desert dunes on a dirt bike, camera hovering ahead and slightly to the right of the vehicle. He wears a black cap, gradient sunglasses, and gloves. Sand sprays from his back tire as he leans into a curve, sunlight casting long shadows across rippling dunes. Cinematic wide desert action.
Chest-up shot from a GoPro mounted on the kayak’s bow, camera angled back toward the man’s upper body and face. He’s wearing a black baseball cap forward, gradient sunglasses, a water-resistant jacket, and a life vest. His arms are mid-stroke with a double-bladed paddle, water splashing upward, sunlight sparkling on droplets, dynamic river motion, cinematic realism.
Wide front-side GoPro shot from low on the river surface, camera facing the man as he navigates a turbulent section of rapids. He’s wearing his black cap and gradient sunglasses, gripping the paddle firmly, kayak tilted slightly as water churns around him, cinematic sunlight filtering through mist.
Chest-up shot from a GoPro mounted on the paraglider harness bar, camera angled slightly upward toward the man’s face. He’s wearing a black baseball cap facing forward, gradient sunglasses, a light windbreaker, and safety harness straps. Clouds drift behind him, sunlight flaring through, mountain peaks faintly visible below, cinematic aerial serenity.
Drone shot showing the man from the chest up as he hikes through a dense green forest. The camera hovers at eye level slightly ahead of him, capturing his focused face under a black baseball cap and gradient sunglasses. Sunlight filters through tree leaves, creating a shadow across his jacket and backpack straps, cinematic natural lighting.
Wide drone shot slightly elevated in front of the man as he walks between tall pine trees. His cap and sunglasses reflect the early morning light while his backpack swings gently with his steps. Mist floats through the shafts of sunlight, cinematic forest depth.
Drone hovering close to the cliff edge on the side of the man, capturing his profile from the chest up as he stands poised ready to jump from above the turquoise ocean below. His black cap and gradient sunglasses face forward. Behind him, the vast sea glimmers beneath a bright sky, cinematic tension and anticipation.
Wide drone shot halfway between the top of a cliff and the ocean below showing the side/profile of the man mid-air after jumping from the cliff, face down, feet up, and body stretched toward the turquoise ocean water below. His face and cap visible from the side, cinematic natural lighting and energy.
Drone hovering ahead of the man, showing him from the chest up as he snowboards down a slope. He wears a black cap, gradient sunglasses, insulated jacket, and gloves. Snow trails burst behind him as he cuts through the powder under a bright winter sky, cinematic frozen landscape.
Drone shot showing the man from the chest up as he wakeboards across a shimmering blue lake. The drone hovers ahead and slightly above, facing him directly. He wears a black cap, gradient sunglasses, and life vest, holding the tow rope tightly, droplets splashing midair, cinematic high-speed energy.
Wide drone shot showing the man wakeboarding across open water, his body leaning slightly as he balances on the board. The drone moves ahead and to the side, keeping his face visible as water trails behind him in sparkling arcs, cinematic energy and clarity. The drone is not visible in the image. He wears a black cap, gradient sunglasses, and life vest.
Drone hovering ahead of the man at chest height as he runs across a wide desert plain. His black baseball cap and gradient sunglasses stand out against the golden dunes. He wears a sleeveless athletic shirt, sweat and dust on his skin, sunlight glowing in orange hues, cinematic desert light.
Drone hovering ahead of the man at chest height as he pedals up a rugged mountain trail, black baseball cap facing forward and gradient sunglasses catching the sunlight. His shirt and arms are dusted from the trail, and trees blur behind him as sunlight filters through pine branches, cinematic outdoor realism.
Wide drone shot from a front-side angle showing the man biking along a narrow ridge path surrounded by green valleys below. His black cap and sunglasses glint in the sun as the trail twists ahead, cinematic adventure composition.
Drone hovering ahead of the man, chest-up shot showing him ziplining through a lush tropical jungle. He wears a black baseball cap and gradient sunglasses, his shirt rippling from the wind, treetops blurring behind him, cinematic motion and sunlight streaming through green canopy.
Wide drone shot capturing the man ziplining high over dense jungle canopy, seen from a front angle that shows his face. His cap and sunglasses are visible, body extended midair, forest stretching below in cinematic scale. The drone is not visible in the image.
Drone hovering just above the sailboat deck, chest-up view of the man at the helm steering through calm blue water. He wears a black baseball cap and gradient sunglasses, soft wind rippling his shirt, ocean horizon stretching behind him, cinematic ocean serenity.
Cinematic chest-up shot of the man standing inside a colorful hot air balloon basket. The camera is positioned level with his chest, facing his face at a 45-degree angle from the front as the balloon floats above scattered clouds. He wears a black baseball cap and gradient sunglasses, his face softly illuminated by golden morning light. The colorful canopy rises above him, adding visual warmth and adventure.
Cinematic chest-up shot of a man standing on the deck of a fishing boat, holding a fishing rod angled toward the sea. The camera is positioned at chest height, facing him slightly from the side so that both his expression and the ocean behind him are visible. He wears a black baseball cap and gradient sunglasses, wind brushing against his shirt, sunlight reflecting on the waves, cinematic natural realism.
Cinematic chest-up shot of a man riding a horse across an open golden plain at sunset. The camera faces him slightly from the front at chest level. He wears a black baseball cap forward, gradient sunglasses, and short hair (no long hair at the back). The sunlight warms his face, cinematic natural beauty and motion.
Wide cinematic shot showing the man on horseback galloping across the open field, the horizon glowing orange behind him. His face is visible under a black cap and sunglasses, short hair beneath. Dust and grass swirl around, cinematic sweeping grandeur.
Cinematic chest-up shot of the man driving a vintage convertible along a coastal highway. The camera faces him from the passenger side, sunlight reflecting off his gradient sunglasses. He wears a black baseball cap forward and short hair, no long hair at the back. The ocean glimmers beyond the road, cinematic freedom and motion.
Cinematic chest-up shot of a man sandboarding down a tall golden dune, even though only his upper body is visible. The camera faces him from the front at chest level, capturing grains of sand flying around. He wears a black baseball cap forward, gradient sunglasses, and has short hair. His shirt ripples in the desert wind under warm sunset light, cinematic thrill and freedom.
Wide cinematic shot of the man carving across a massive dune while standing on a sandboard, his body angled into the slope as sand billows behind. The camera faces him from the front-side, 45 degrees from the front of his face, showing his all-black cap, and sunglasses illuminated by the sunset glow, cinematic desert scale. He has no hair at the back of his head.
Cinematic chest-up shot of a man trekking a rugged volcanic ridge, visible from chest up with his hiking backpack straps showing. The camera faces him slightly from below. He wears a black baseball cap, gradient sunglasses, and a determined look. Faint volcanic steam rises behind him, cinematic endurance and heat.
Wide cinematic shot from the side of the man walking along the sharp crest of a volcanic ridge. His face is visible under the black cap and sunglasses, catching the backlight, cinematic scale and strength.
Wide cinematic shot of a dune buggy going up the side of a steep sand dune, camera facing the front corner of the vehicle. The man’s face is visible through the open window, cap and sunglasses reflecting sunlight, cinematic rugged adventure.
Wide cinematic shot of the man crossing the middle of a long bridge stretched over a deep canyon, facing forward. His cap and sunglasses glint in sunlight, cinematic height and bravery.
Cinematic chest-up shot of a man riding a jet ski across turquoise tropical water, visible from the waist up with the handlebar in frame. He wears a black baseball cap, gradient sunglasses, and a determined smile as spray splashes around, cinematic speed and excitement.
Wide cinematic side/profile shot of the man cutting across tropical turquoise water on his jet ski, small islands in the background. His face remains visible as sunlight glints off his sunglasses, cinematic power and summer vibrance.
Cinematic chest-up underwater shot of a man swimming alongside dolphins, visible from the shoulders up. He wears a black swimming cap, gradient sunglasses, and a joyful expression. Dolphins swim beside him as sunlight patterns ripple across his face, cinematic harmony and wonder.
Cinematic chest-up shot of a man standing on a paddleboard on a still mountain lake, visible from chest to hat top. He’s holding a paddle. He wears a life vest, black baseball cap, gradient sunglasses, and calm focus, reflections of clouds in the water, cinematic serenity.
Wide cinematic side shot showing the man paddleboarding toward distant mountains on mirrored water. The entire paddleboard and the man standing on it are visible in the side profile shot. His face is visible as he looks to the camera, cap and sunglasses glinting in the sunlight, cinematic peaceful scale.
Cinematic chest-up shot of a man wearing a wingsuit mid-flight, camera facing him slightly from below to show air rushing around. He wears a black baseball cap, gradient sunglasses, with parts of the red wingsuit visible at his shoulders. Vast cliffs blur behind, cinematic thrill and velocity.
Cinematic chest-up shot of a man riding a camel across golden desert dunes, only his upper body and part of the camel’s saddle visible. He wears a black baseball cap, gradient sunglasses, and relaxed posture under warm sunlight. The horizon glows orange, cinematic calm and travel spirit.
Cinematic chest-up shot of a man ascending Mount Everest in climbing gear, rope visible over his shoulder, breath misting in the cold air. He wears a black beanie, gradient sunglasses, expression focused with snowflakes clinging to his jacket. Harsh sunlight reflects off snow, cinematic struggle and determination. He is not wearing a baseball cap.
Wide cinematic shot of the man scaling an icy slope near Everest’s peak, rope anchored below, prayer flags fluttering in the distance. His face is visible. He is wearing climbing gear, rope visible over his shoulder, thick gloves, a black beanie and sunglasses, cinematic grandeur and persistence. He is not wearing a baseball cap.
Cinematic chest-up shot of a man floating in space tethered to the International Space Station, wearing a detailed astronaut suit with a large glass helmet revealing his full face, he’s wearing a thin black beanie covering his hair and black sunglasses with a semi-transparent gradient lens. Earth glows blue below, cinematic awe and isolation. He is not wearing a baseball cap.
Cinematic chest-up shot of a man seated in a luxurious first-class cabin of an A380 commercial jet, eating a gourmet meal beside the window. He wears a black baseball cap, gradient sunglasses, short hair, relaxed smile, soft daylight illuminating his face and tableware. Background shows blurred clouds outside, cinematic comfort and refinement.