ComfyUI Sonic | Lip-Sync Portrait Animation

ComfyUI Sonic revolutionizes portrait animation by leveraging global audio perception for smoother, more expressive facial movements. By capturing the full audio context, ComfyUI Sonic ensures lifelike, emotionally resonant animations that go beyond phoneme-based methods. Experience the next generation of portrait animation with ComfyUI Sonic.

The ComfyUI Sonic nodes and their associated workflow were developed entirely by smthemex, and all credit for this work goes to smthemex. On the RunComfy platform, we are simply presenting smthemex's contributions to the community. There is currently no formal connection or partnership between RunComfy and smthemex. We deeply appreciate smthemex's work!

ComfyUI Sonic Description

ComfyUI Sonic redefines portrait animation by harnessing global audio perception for ultra-realistic facial movements and expressions. Unlike traditional methods, it captures the full context of speech—beyond phonemes—to generate fluid, emotionally rich animations. With cutting-edge AI technology, Sonic ensures seamless sync between voice and visuals, bringing characters to life with unmatched realism. Elevate your animations with ComfyUI Sonic and make every expression feel truly alive.

1.1 How to Use the ComfyUI Sonic Workflow

The nodes on the left are your inputs: the audio and the avatar image. The node in the middle is the Sonic processing node. The node on the right is the Video Combine node, which outputs the video.

Follow these steps:

Load your avatar image. Sonic animates this face to match the speech in the audio.
Load your audio. Sonic uses it to drive the lip-sync animation of the loaded image.
Click Queue Prompt.

Done! Your rendered video will be stored in the Outputs folder.
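
If you run ComfyUI yourself rather than through the hosted page, the Queue Prompt step can also be triggered over ComfyUI's HTTP API. The following is a minimal sketch, not part of the Sonic workflow itself. It assumes a local ComfyUI server on the default address 127.0.0.1:8188 and a workflow that has already been exported in API format; the client ID string is an arbitrary placeholder.

```python
import json
import urllib.request

# Assumption: a local ComfyUI server listening on its default address.
COMFYUI_URL = "http://127.0.0.1:8188"


def build_prompt_payload(workflow: dict, client_id: str = "sonic-example") -> bytes:
    """Wrap an API-format workflow in the JSON body sent to POST /prompt."""
    return json.dumps({"prompt": workflow, "client_id": client_id}).encode("utf-8")


def queue_workflow(workflow: dict, base_url: str = COMFYUI_URL) -> dict:
    """Queue the workflow on the server and return the server's JSON reply."""
    request = urllib.request.Request(
        f"{base_url}/prompt",
        data=build_prompt_payload(workflow),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

To use it, load the exported workflow JSON into a dict with `json.load` and pass it to `queue_workflow`. The rendered video still lands in the Outputs folder, exactly as when queuing from the interface.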

Strengths and Weaknesses of ComfyUI Sonic:

Strengths:

Sonic generates highly realistic and expressive portrait animations driven by audio.
Sonic is built on SVD (Stable Video Diffusion), so frame-to-frame flicker is minimal.
Consistency is better than in previously released audio-to-video models.

Weaknesses:

Because Sonic uses SVD, it may struggle to map the speech onto the face accurately in distant or full-body shots.
Side-view faces, or faces at complex angles, may produce distorted results.

1.2 ComfyUI Sonic Audio and Image Input

Upload your audio (dialogue or vocals) in the Load Audio node.
Upload your image (a close-up or medium shot of a person) in the Load Image node.

1.3 ComfyUI Sonic Processing Node

ComfyUI Sonic uses the SVD model under the hood, so its settings and results follow that model's behavior. The default settings are already tuned, and there is no need to change them.

If you see artifacts such as morphing or distorted hands, keep the min resolution setting at or below roughly 768.
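
When working with an exported API-format workflow instead of the interface, the same adjustment can be scripted. The sketch below is only illustrative: it assumes the setting appears in the exported JSON as an input named `min_resolution`, which may not match the name your export uses, and the node class names in the example are hypothetical placeholders.

```python
import copy


def set_input_on_matching_nodes(workflow: dict, input_name: str, value) -> dict:
    """Return a copy of an API-format workflow with `input_name` set to `value`
    on every node that already has an input of that name."""
    patched = copy.deepcopy(workflow)  # leave the caller's workflow untouched
    for node in patched.values():
        inputs = node.get("inputs", {})
        if input_name in inputs:
            inputs[input_name] = value
    return patched


# Toy workflow: the class names and the "min_resolution" key are assumptions.
example = {
    "1": {"class_type": "LoadImage", "inputs": {"image": "avatar.png"}},
    "2": {"class_type": "HypotheticalSonicNode", "inputs": {"min_resolution": 1024}},
}
lowered = set_input_on_matching_nodes(example, "min_resolution", 768)
```

Nodes that do not have the named input are left unchanged, so the helper is safe to run on a whole workflow without knowing node IDs in advance.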

ComfyUI Sonic transforms portrait animation by focusing on global audio perception for seamless, lifelike expressions. By capturing the full depth of speech, it creates animations that feel natural, emotive, and engaging. Whether for storytelling, virtual avatars, or content creation, ComfyUI Sonic delivers unmatched realism. Step into the future of animation with Sonic—where every word comes to life.