The Hallo2 technique was developed by Jiahao Cui, Hui Li, Yao Yao, Hao Zhu, Hanlin Shang, Kaihui Cheng, Hang Zhou, Siyu Zhu, and Jingdong Wang from Fudan University and Baidu Inc. The ComfyUI_Hallo2 nodes and workflow were developed by smthemex. All credit goes to them for their contributions.
Hallo2 is a cutting-edge model for generating high-quality, long-duration, 4K-resolution, audio-driven portrait animation videos. It builds on the original Hallo model with several key improvements, most notably support for much longer videos and 4K output.
Hallo2 achieves this by using advanced techniques like data augmentation to maintain consistency over long durations, vector quantization of latent codes for 4K resolution, and an improved denoising process guided by both audio and text.
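To give intuition for the vector-quantization idea mentioned above (a generic illustration, not Hallo2's actual code), each continuous latent vector is replaced by its nearest entry in a learned codebook, so the latent space can be stored and decoded at high resolution as discrete indices:

```python
import numpy as np

def vector_quantize(latents, codebook):
    """Map each latent vector to its nearest codebook entry.

    latents:  (N, D) array of continuous latent vectors
    codebook: (K, D) array of learned code vectors
    Returns the quantized latents and the chosen code indices.
    """
    # Squared Euclidean distance between every latent and every code.
    dists = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)      # index of the nearest code per latent
    return codebook[indices], indices

rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 4))      # toy codebook: 8 codes of dimension 4
latents = rng.normal(size=(3, 4))       # 3 continuous latent vectors
quantized, ids = vector_quantize(latents, codebook)
# quantized has the same shape as latents, but every row is a codebook entry
```

In a real VQ model the codebook is learned jointly with the encoder and decoder; here it is random purely for illustration.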
Hallo2 combines several advanced AI models and techniques to create its high-quality portrait videos.
In summary, Hallo2 takes an audio clip and a portrait image as input, generates video frames that match the audio while staying true to the original portrait, and applies additional techniques to keep everything synchronized and coherent even in long videos. All of these parts work together in a multi-step pipeline to produce the final result.
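A rough sketch of such a pipeline is shown below. Everything here (function names, shapes, update rule) is an illustrative stand-in rather than the actual Hallo2 API; the point is the shape of the loop, in which each denoising step blends an unconditional prediction with an audio-conditioned one via a classifier-free guidance scale:

```python
import numpy as np

def toy_denoise(ref_image, audio_emb, steps=30, cfg=3.5, seed=0):
    """Toy audio-conditioned denoising loop (illustrative only).

    ref_image: (H, W) reference portrait
    audio_emb: (F, D) one audio embedding per output frame
    Returns (F, H, W) "video" frames.
    """
    rng = np.random.default_rng(seed)
    frames = rng.normal(size=(len(audio_emb), *ref_image.shape))
    for _ in range(steps):
        # Unconditional prediction: drift toward the reference portrait.
        eps_uncond = ref_image - frames
        # Conditional prediction: additionally nudged by the audio features.
        eps_cond = eps_uncond + 0.01 * audio_emb.mean(axis=1)[:, None, None]
        # Classifier-free guidance: extrapolate from the unconditional
        # prediction toward the conditional one; cfg = 1.0 reproduces the
        # conditional prediction, larger values follow the audio harder.
        eps = eps_uncond + cfg * (eps_cond - eps_uncond)
        frames = frames + 0.1 * eps
    return frames

ref = np.ones((8, 8))                                      # toy 8x8 "portrait"
audio_emb = np.random.default_rng(1).normal(size=(25, 4))  # 25 frames of features
video = toy_denoise(ref, audio_emb)                        # shape (25, 8, 8)
```

The `cfg` value in this sketch plays the same role as the `cfg` parameter exposed by the sampler node described later: higher values make the output track the conditioning more closely at the cost of diversity.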
Hallo2 has been integrated into ComfyUI via a custom workflow with several specialized nodes. Here's how to use it:
1. Load your reference portrait with the LoadImage node. This should be a clear, front-facing portrait. (Tip: the better framed and lit your reference portrait is, the better the results will be. Avoid side profiles, occlusions, busy backgrounds, etc.)
2. Load your driving audio with the LoadAudio node. It should match the mood you want the portrait to emote.
3. Connect both to the HalloPreImgAndAudio node. This preprocesses the image and audio into embeddings. Key parameters:
   - audio_separator: model used to separate speech from background noise. Generally leave at the default.
   - face_expand_ratio: how much to expand the detected face region. Higher values include more of the hair and background.
   - width/height: generation resolution. Higher values are slower but more detailed; 512-1024 square is a good balance.
   - fps: target video frame rate. 25 is a good default.
4. Load the models with the HalloLoader node. Point it to your Hallo2 checkpoint, VAE, and motion module files.
5. Connect everything to the HalloSampler node. This performs the actual video generation. Key parameters:
   - seed: random seed that determines minor details. Change it if you don't like the first result.
   - pose_scale/face_scale/lip_scale: how much to scale the intensity of pose, facial expression, and lip movements. 1.0 = full intensity, 0.0 = frozen.
   - cfg: classifier-free guidance scale. Higher = follows the conditioning more closely but is less diverse.
   - steps: number of denoising steps. More steps = better quality but slower.
6. Optionally, add the HallosUpscaleloader and HallosVideoUpscale nodes to the end of the chain. The upscale loader reads in a pretrained upscaling model, while the upscaler node performs the actual upscaling to 4K.

© Copyright 2024 RunComfy. All Rights Reserved.