The Hallo2 technique was developed by Jiahao Cui, Hui Li, Yao Yao, Hao Zhu, Hanlin Shang, Kaihui Cheng, Hang Zhou, Siyu Zhu, and Jingdong Wang from Fudan University and Baidu Inc. For more information, visit the Hallo2 GitHub repository. The ComfyUI_Hallo2 nodes and workflow were developed by smthemex; for more details, visit the ComfyUI_Hallo2 GitHub repository. All credit goes to them for their contributions.
Hallo2 is a cutting-edge model for generating high-quality, long-duration, 4K-resolution, audio-driven portrait animation videos. It builds upon the original Hallo model with several key improvements.
Hallo2 achieves this by using advanced techniques like data augmentation to maintain consistency over long durations, vector quantization of latent codes for 4K resolution, and an improved denoising process guided by both audio and text.
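The vector-quantization idea works like this: each continuous latent vector is replaced by the index of its nearest entry in a codebook, so the model deals with discrete codes rather than raw values. The following is a minimal, generic sketch with a hand-picked 2-D codebook. The functions `quantize` and `dequantize` are hypothetical illustrations, not Hallo2's actual code. Real codebooks are learned and much higher-dimensional.

```python
# Generic vector quantization: map each continuous latent vector to the index
# of its nearest codebook entry (squared Euclidean distance).
# Toy illustration only; not Hallo2's implementation or codebook.
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(latents: Sequence[Vector], codebook: Sequence[Vector]) -> List[int]:
    """Return, for each latent vector, the index of the closest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]


def dequantize(indices: Sequence[int], codebook: Sequence[Vector]) -> List[Vector]:
    """Map code indices back to their codebook vectors."""
    return [codebook[i] for i in indices]


if __name__ == "__main__":
    codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    latents = [(0.9, 0.1), (0.1, 0.2), (0.2, 0.8)]
    codes = quantize(latents, codebook)
    print(codes)  # [1, 0, 2]
    print(dequantize(codes, codebook))
```

The reconstruction is lossy. Each latent is snapped to its nearest code. Predicting from a fixed set of codes is what makes the representation discrete.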
Hallo2 combines several advanced AI models and techniques to create its high-quality portrait videos.
In summary, Hallo2 takes in audio and a portrait image. A denoising network iteratively refines video frames to match the audio while staying true to the original portrait. Additional techniques keep the output synchronized and coherent, even in long videos. All of these parts work together in a multi-step pipeline to produce the final animation.
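One way to keep long videos coherent is to generate them clip by clip. Each new clip is conditioned on the last few frames of the previous one. The sketch below assumes that scheme. The names `generate_long_video`, `clip_len`, `n_motion`, and the stub `fake_generate_clip` are hypothetical, and the stub returns frame labels instead of images. The sketch also shows the frame-count arithmetic: at 25 fps, 2 seconds of audio needs 50 frames.

```python
# Toy sketch of clip-by-clip long-video generation.
# Assumption: frames are produced in fixed-size clips, each conditioned on the
# last few frames ("motion frames") of the previously generated clip.
import math
from typing import Callable, List, Sequence

Frame = str  # placeholder type; a real pipeline would use image tensors


def frame_count(audio_seconds: float, fps: int) -> int:
    """Number of video frames needed to cover the audio at the given fps."""
    # round() guards against floating-point noise just above an integer
    return math.ceil(round(audio_seconds * fps, 6))


def generate_long_video(
    total_frames: int,
    clip_len: int,
    n_motion: int,
    generate_clip: Callable[[int, int, Sequence[Frame]], List[Frame]],
) -> List[Frame]:
    """Generate `total_frames` frames in clips of at most `clip_len`.

    Each call receives the start index, the clip length, and the last
    `n_motion` frames generated so far (empty for the first clip).
    """
    frames: List[Frame] = []
    start = 0
    while start < total_frames:
        length = min(clip_len, total_frames - start)
        # frames[-0:] would return the whole list, so handle n_motion == 0
        motion = frames[-n_motion:] if frames and n_motion > 0 else []
        frames.extend(generate_clip(start, length, motion))
        start += length
    return frames


def fake_generate_clip(start: int, length: int, motion: Sequence[Frame]) -> List[Frame]:
    """Hypothetical stand-in for the diffusion model: just labels frames."""
    return [f"frame{start + i}" for i in range(length)]


if __name__ == "__main__":
    n = frame_count(audio_seconds=2.0, fps=25)
    video = generate_long_video(n, clip_len=16, n_motion=2, generate_clip=fake_generate_clip)
    print(len(video), video[0], video[-1])  # 50 frame0 frame49
```

In this toy model, adjacent clips share pose and appearance because a few frames are carried forward. Clips are not regenerated independently.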
Hallo2 has been integrated into ComfyUI via a custom workflow with several specialized nodes. Here's how to use it:
1. Load your reference portrait into the LoadImage node. This should be a clear, front-facing portrait. (Tip: the better framed and lit your reference portrait is, the better the results will be. Avoid side profiles, occlusions, busy backgrounds, etc.)
2. Load your audio clip into the LoadAudio node. The audio should match the mood you want the portrait to express.
3. Connect the image and audio to the HalloPreImgAndAudio node, which preprocesses them into embeddings. Key parameters:
   - audio_separator: Model for separating speech from background noise. Generally leave this at the default.
   - face_expand_ratio: How much to expand the detected face region. Higher values include more of the hair and background.
   - width/height: Generation resolution. Higher values are slower but more detailed; a square size between 512 and 1024 is a good balance.
   - fps: Target video frame rate. 25 is a good default.
4. In the HalloLoader node, point to your Hallo2 checkpoint, VAE, and motion module files.
5. The HalloSampler node performs the actual video generation. Key parameters:
   - seed: Random seed that determines minor details. Change it if you don't like the first result.
   - pose_scale/face_scale/lip_scale: How strongly to apply pose, facial expression, and lip movements. 1.0 = full intensity, 0.0 = frozen.
   - cfg: Classifier-free guidance scale. Higher values follow the conditioning more closely but produce less diverse results.
   - steps: Number of denoising steps. More steps give better quality but take longer.
6. For 4K output, add the HallosUpscaleloader and HallosVideoUpscale nodes to the end of the chain. The upscale loader reads in a pretrained upscaling model, while the upscaler node performs the actual upscaling to 4K.
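You can also run this workflow from a script instead of the graph editor. First, export the workflow in ComfyUI's API format; the exact menu item depends on your ComfyUI version. Then patch the parameters you care about and POST the result to the server's `/prompt` endpoint.

The sketch below makes several assumptions:
- A local server runs on ComfyUI's default port 8188.
- The sampler node's `class_type` is `HalloSampler`.
- Its inputs are named after the parameters listed above.

The helpers `set_inputs` and `queue_prompt` are hypothetical. `set_inputs` refuses to add inputs that the exported JSON does not already contain, so a wrong name fails loudly instead of being silently ignored.

```python
# Sketch: patch an API-format workflow exported from ComfyUI and queue it on a
# local server. Assumes the Hallo2 sampler node is exported with class_type
# "HalloSampler" and inputs named like the parameters described above.
import json
import urllib.request
from typing import Any, Dict

Workflow = Dict[str, Dict[str, Any]]  # node id -> {"class_type", "inputs"}


def set_inputs(workflow: Workflow, class_type: str, **values: Any) -> int:
    """Overwrite existing inputs on every node of `class_type`.

    Returns the number of nodes patched. Raises KeyError (before changing
    anything on that node) if a requested input is not already present.
    """
    patched = 0
    for node in workflow.values():
        if node.get("class_type") != class_type:
            continue
        inputs = node["inputs"]
        missing = [name for name in values if name not in inputs]
        if missing:
            raise KeyError(f"{class_type} has no input(s) {missing}")
        inputs.update(values)
        patched += 1
    return patched


def queue_prompt(workflow: Workflow, server: str = "http://127.0.0.1:8188") -> Dict[str, Any]:
    """POST the workflow to ComfyUI's /prompt endpoint and return the JSON reply."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        f"{server}/prompt", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical fragment standing in for a real exported workflow;
    # the numbers are placeholders, not recommended settings.
    workflow: Workflow = {
        "7": {"class_type": "HalloSampler",
              "inputs": {"seed": 0, "cfg": 3.5, "steps": 40,
                         "pose_scale": 1.0, "face_scale": 1.0, "lip_scale": 1.0}},
    }
    n = set_inputs(workflow, "HalloSampler", seed=42, pose_scale=0.8)
    print(n, workflow["7"]["inputs"]["seed"])  # 1 42
    # queue_prompt(workflow)  # uncomment with a running ComfyUI server
```

Patching an exported file avoids guessing how the Hallo2 nodes are wired together. Only the values change, while the node graph stays exactly as ComfyUI saved it.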