MMAudio | Video-to-Audio

MMAudio generates synchronized audio from video and text inputs. Through multimodal joint training on both audio-visual and audio-text datasets, it adapts to a wide range of content, and its synchronization module keeps the generated sound aligned with the on-screen action.

The ComfyUI-MMAudio nodes and their associated workflow were developed entirely by Kijai, and all credit for this work goes to Kijai. On the RunComfy platform, we are simply presenting Kijai’s contributions to the community. There is currently no formal connection or partnership between RunComfy and Kijai. We deeply appreciate Kijai’s work!

MMAudio

MMAudio creates synchronized audio from video and text inputs. Multimodal joint training lets it learn from both audio-visual and audio-text datasets, so it adapts to many kinds of content, and its synchronization module aligns the generated audio with the video frames. The result is a faster, simpler audio-creation process for creators.

1.1 How to Use the MMAudio Workflow

This is the MMAudio workflow. The nodes on the left are the inputs for uploading your video, the MMAudio processing nodes are in the middle, and the output node is on the right.

Upload your video in the input nodes.
Write your audio generation prompts.
Click Render.
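Clicking Render queues the workflow for you. If you run the same workflow on your own local ComfyUI server instead, the equivalent step is to POST the workflow, exported in ComfyUI's API format, to the server's /prompt endpoint. The sketch below assumes ComfyUI's default local address (127.0.0.1:8188); the node id and input name you pass in depend on your own export and are hypothetical here, not taken from this workflow.

```python
import json
import urllib.request


def build_queue_request(workflow: dict, node_id: str, field: str, text: str,
                        server: str = "http://127.0.0.1:8188") -> urllib.request.Request:
    """Copy an API-format workflow, set one text input, and wrap it in a POST to /prompt."""
    # Deep copy via a JSON round-trip so the caller's template is not mutated.
    patched = json.loads(json.dumps(workflow))
    patched[node_id]["inputs"][field] = text
    body = json.dumps({"prompt": patched}).encode("utf-8")
    return urllib.request.Request(
        f"{server}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical API-format workflow: node id "12", its class name and the "prompt" input are placeholders.
template = {"12": {"class_type": "HypotheticalSamplerNode", "inputs": {"prompt": ""}}}
req = build_queue_request(template, "12", "prompt", "rain on a tin roof")
# urllib.request.urlopen(req) would submit it to a running local ComfyUI server.
```

Building the request separately from sending it keeps the prompt-editing logic testable without a running server.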

1.2 Video Input

Click to upload your reference video.

The video is downscaled to ?x512 (512 px tall, with the width scaled to preserve the aspect ratio), because processing HD or longer videos may run out of memory.
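The downscale setting fixes the height at 512 px. Assuming the "?" stands for a width derived from the source aspect ratio, the arithmetic can be sketched as below. Rounding the width to a multiple of 8 is a hypothetical choice made here, not confirmed from the loader node, which may round differently.

```python
def fit_to_height(width: int, height: int, target_height: int = 512,
                  multiple: int = 8) -> tuple[int, int]:
    """Return (new_width, target_height), scaling the width to keep the aspect ratio.

    Rounding to a multiple of `multiple` is an assumption, not the node's confirmed behavior.
    """
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    scaled = width * target_height / height
    new_width = max(multiple, round(scaled / multiple) * multiple)
    return new_width, target_height


def pixel_ratio(original: tuple[int, int], resized: tuple[int, int]) -> float:
    """How many times fewer pixels each frame has after resizing."""
    return (original[0] * original[1]) / (resized[0] * resized[1])


resized = fit_to_height(1920, 1080)
print(resized, pixel_ratio((1920, 1080), resized))
```

For a 1920x1080 clip this gives 912x512, roughly 4.4 times fewer pixels per frame, and the saving is multiplied across every frame of a longer video.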

1.3 MMAudio Processing

Positive: Describe the audio you want to generate.
Negative: Describe what you don't want to hear.
Steps: More steps may improve audio quality, at the cost of longer generation time.
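The Positive, Negative, and Steps settings correspond to two common ingredients of diffusion and flow-matching samplers: classifier-free guidance, which steers each prediction toward the positive prompt and away from the negative one, and a numerical ODE solver whose accuracy improves with more steps. The toy scalar sketch below is not MMAudio's code; it assumes a generic Euler solver, and the velocity functions and the `cfg` strength are hypothetical stand-ins for the model's predictions and settings.

```python
import math
from typing import Callable

# A velocity field maps (x, t) to dx/dt; here x is a single float for illustration.
Velocity = Callable[[float, float], float]


def guided_velocity(v_pos: Velocity, v_neg: Velocity, cfg: float) -> Velocity:
    """Classifier-free guidance: start from the negative prediction and move
    `cfg` times the gap toward the positive prediction."""
    def v(x: float, t: float) -> float:
        neg = v_neg(x, t)
        return neg + cfg * (v_pos(x, t) - neg)
    return v


def euler_sample(v: Velocity, x0: float, steps: int) -> float:
    """Integrate dx/dt = v(x, t) from t=0 to t=1 using `steps` uniform Euler steps."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += dt * v(x, i * dt)
    return x


# Toy check: with v = -x the exact value at t=1 is x0 * e^-1; the error shrinks as steps grow.
v = guided_velocity(lambda x, t: -x, lambda x, t: 0.0, cfg=1.0)
for n in (5, 25, 100):
    print(n, abs(euler_sample(v, 1.0, n) - math.exp(-1)))
```

In a real sampler x is the whole audio latent and each velocity call is a network forward pass (two per step with guidance, unless batched), which is why extra steps add generation time.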

1.4 MMAudio Models

These are the model downloader nodes. They automatically download the required models into your ComfyUI installation, which takes about 2-3 minutes.

MMAudio models: https://github.com/hkchengrex/MMAudio
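Loader nodes of this kind typically download only when the model files are missing, so the wait mainly applies to the first run. A minimal sketch of that download-on-first-use pattern follows; it is an assumption about typical behavior, not Kijai's implementation, and the folder name, file name, and `download` callback are hypothetical.

```python
import tempfile
from pathlib import Path
from typing import Callable


def ensure_model(model_dir: Path, filename: str,
                 download: Callable[[Path], None]) -> Path:
    """Return the local path of `filename`, calling `download` only if the file is missing."""
    target = model_dir / filename
    if not target.exists():
        model_dir.mkdir(parents=True, exist_ok=True)
        download(target)  # hypothetical fetcher, e.g. an HTTP or Hugging Face download
    return target


calls = []


def fake_download(path: Path) -> None:
    calls.append(path)
    path.write_bytes(b"weights")  # stand-in for the real model file


with tempfile.TemporaryDirectory() as tmp:
    models = Path(tmp) / "mmaudio"  # hypothetical folder name
    ensure_model(models, "model.safetensors", fake_download)
    ensure_model(models, "model.safetensors", fake_download)
    print(len(calls))  # prints 1: only the first call triggers a download
```

Passing the downloader in as a callback keeps the caching check independent of where the weights actually come from.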

With multimodal training and precise synchronization, MMAudio produces high-quality audio that matches your video. Whether you're crafting videos, animations, or immersive experiences, MMAudio helps you add fitting sound to your projects.
