ComfyUI Node: CosyVoiceDubbingNode

Class Name: CosyVoiceDubbingNode
Category: AIFSH_CosyVoice
Author: AIFSH (Account age: 516 days)
Extension: CosyVoice-ComfyUI
Last Updated: 2024-09-10
GitHub Stars: 0.25K

Table of Contents

Description
CosyVoiceDubbingNode:
CosyVoiceDubbingNode Input Parameters:
CosyVoiceDubbingNode Output Parameters:
CosyVoiceDubbingNode Usage Tips:
CosyVoiceDubbingNode Common Errors and Solutions:
Related Nodes

How to Install CosyVoice-ComfyUI

Install this extension via the ComfyUI Manager by searching for CosyVoice-ComfyUI:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter CosyVoice-ComfyUI in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

CosyVoiceDubbingNode Description

Facilitates voice dubbing with advanced TTS technology, supports multiple languages and inference modes for AI artists.

CosyVoiceDubbingNode:

The CosyVoiceDubbingNode performs voice dubbing with text-to-speech (TTS) technology. You supply the text to speak and an audio prompt, and the node generates speech in the language you select. It supports multiple inference modes, including zero-shot, cross-lingual, and instruct-based inference, which makes it usable across different dubbing scenarios. The node is particularly useful for AI artists creating multilingual voiceovers, because it can switch between languages and adapt to different speech styles while keeping the dubbed voice natural and expressive.

CosyVoiceDubbingNode Input Parameters:

tts_srt

This parameter accepts an SRT file, a standard subtitle format that stores each piece of text together with its start and end time. The SRT file tells the node what text to convert to speech and when each line should be spoken, so that the generated audio follows the intended timing. This is essential for synchronizing the dubbed voice with visual content.
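To make the structure of an SRT file concrete, here is a minimal sketch of how its cues could be read into start time, end time, and text. It is an illustration only and not the node's actual parser; the names `Cue`, `parse_srt`, and `SAMPLE` are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)


@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str


def _to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(content):
    """Parse SRT text into a list of cues; blocks without a timing line are skipped."""
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", content.strip().replace("\r\n", "\n")):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                # Everything after the timing line is the text to speak.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append(Cue(_to_seconds(*g[:4]), _to_seconds(*g[4:]), text))
                break
    return cues


# Hypothetical two-cue subtitle file used only for demonstration.
SAMPLE = """1
00:00:01,000 --> 00:00:03,500
Hello, and welcome.

2
00:00:04,000 --> 00:00:06,250
This line is spoken second.
"""

cues = parse_srt(SAMPLE)
for cue in cues:
    print(f"{cue.start:.2f}s to {cue.end:.2f}s: {cue.text}")
```

Each cue's end time minus its start time gives the window the spoken line has to fit into, which is why accurate timing in the file matters.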

prompt_wav

This parameter takes an audio file in WAV format, which serves as a prompt for the TTS model. The prompt helps the model understand the desired voice characteristics, such as tone, pitch, and speaking style. By providing a sample of the target voice, you can achieve more accurate and personalized dubbing results.
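Before using a file as a prompt, it can help to check its basic properties. The following sketch uses Python's standard `wave` module to report channel count, sample rate, and duration. The helper names are hypothetical, and the silent in-memory file exists only so the example runs without an external recording.

```python
import io
import wave


def describe_wav(source):
    """Return basic properties of a WAV file given a path string or file-like object."""
    with wave.open(source, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
        }


def make_silent_wav(seconds=1.0, rate=16000):
    """Build an in-memory mono 16-bit WAV of silence, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    buffer.seek(0)
    return buffer


info = describe_wav(make_silent_wav(seconds=2.0, rate=16000))
print(info)
```

If the file is not a WAV that the `wave` module can read, `wave.open` raises `wave.Error`, which is a quick signal that the prompt needs converting first.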

language

This parameter allows you to select the language for the generated speech. The available options are <|zh|>, <|en|>, <|jp|>, <|yue|>, and <|ko|>. Choosing the correct language ensures that the TTS model uses the appropriate phonetic and linguistic rules, resulting in more natural and intelligible speech.
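The language options are written as tags. As a rough sketch, the snippet below validates a tag against the listed options and places it in front of the text. That the node applies the tag as a text prefix is an assumption made for illustration, and `tag_text` is a hypothetical helper, not part of the extension.

```python
# The five options listed for the language parameter.
SUPPORTED_LANGUAGES = ("<|zh|>", "<|en|>", "<|jp|>", "<|yue|>", "<|ko|>")


def tag_text(text, language):
    """Prefix text with a language tag, rejecting tags outside the listed set."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(
            f"Language not supported: {language!r}; "
            f"choose one of {SUPPORTED_LANGUAGES}"
        )
    # Assumption: the tag goes directly before the text, with no separator.
    return f"{language}{text.strip()}"


tagged = tag_text("Good morning, everyone.", "<|en|>")
print(tagged)
```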

CosyVoiceDubbingNode Output Parameters:

audio

The output is a dictionary containing the generated audio waveform and its sample rate. The waveform is a tensor representing the audio signal, and the sample rate is the number of samples per second. This output can be passed on for further processing, or saved and opened in audio editing software.
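As an illustration of how such a dictionary can be consumed, the sketch below computes the clip duration from the waveform shape and the sample rate. The key names `waveform` and `sample_rate`, and the layout with samples on the last axis, are assumptions based on ComfyUI's usual audio convention rather than something stated here. A simple stand-in object replaces the real tensor so the example has no dependencies.

```python
from types import SimpleNamespace


def audio_duration_seconds(audio):
    """Duration of an audio dictionary, assuming samples lie on the last axis."""
    num_samples = audio["waveform"].shape[-1]
    return num_samples / audio["sample_rate"]


# Stand-in for a tensor: only the .shape attribute is used in this sketch.
# The shape is assumed to be (batch, channels, samples).
fake_waveform = SimpleNamespace(shape=(1, 1, 66150))
audio = {"waveform": fake_waveform, "sample_rate": 22050}

duration = audio_duration_seconds(audio)
print(f"{duration:.2f} seconds")
```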

CosyVoiceDubbingNode Usage Tips:

Ensure that your SRT file is accurately timed and contains the correct text to achieve precise synchronization between the audio and visual content.
Use a high-quality WAV file as the prompt to provide the TTS model with a clear example of the desired voice characteristics.
Experiment with different languages and inference modes to find the best settings for your specific dubbing project.
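The first tip can be checked before running the node. A minimal sketch, assuming cues have already been read into `(start, end, text)` tuples with times in seconds; `find_timing_problems` is a hypothetical helper and the checks are examples, not rules enforced by the node.

```python
def find_timing_problems(cues):
    """Return human-readable problems for (start, end, text) cues in seconds."""
    problems = []
    previous_end = 0.0
    for index, (start, end, text) in enumerate(cues, start=1):
        if end <= start:
            problems.append(f"cue {index}: end time is not after start time")
        if start < previous_end:
            problems.append(f"cue {index}: overlaps the previous cue")
        if not text.strip():
            problems.append(f"cue {index}: empty text")
        previous_end = max(previous_end, end)
    return problems


# Hypothetical cues with two deliberate mistakes.
cues = [
    (1.0, 3.5, "Hello, and welcome."),
    (3.0, 5.0, "This cue starts before the first one ends."),
    (6.0, 6.0, "Zero-length cue."),
]
problems = find_timing_problems(cues)
for problem in problems:
    print(problem)
```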

CosyVoiceDubbingNode Common Errors and Solutions:

"Invalid SRT file format"

Explanation: The provided SRT file does not conform to the standard subtitle format.
Solution: Verify that your SRT file is correctly formatted with proper timing and text entries.

"Unsupported audio format"

Explanation: The provided audio file is not in WAV format.
Solution: Convert your audio file to WAV format before using it as a prompt.
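One common way to do the conversion is with the ffmpeg command-line tool, which must be installed separately. The sketch below builds and runs such a command from Python. The file names are hypothetical, and the optional resampling and mono downmix are illustrative choices rather than requirements documented for this node.

```python
import subprocess


def build_ffmpeg_command(source, target, sample_rate=None, mono=False):
    """Build an ffmpeg argument list that converts `source` to `target`."""
    command = ["ffmpeg", "-y", "-i", source]  # -y overwrites an existing target
    if sample_rate is not None:
        command += ["-ar", str(sample_rate)]  # output sample rate in Hz
    if mono:
        command += ["-ac", "1"]               # downmix to one channel
    return command + [target]


def convert_to_wav(source, target, **options):
    """Run the conversion; raises CalledProcessError if ffmpeg reports failure."""
    subprocess.run(build_ffmpeg_command(source, target, **options), check=True)


# Only the command is built here, so the example runs without ffmpeg installed.
command = build_ffmpeg_command("voice_sample.mp3", "voice_sample.wav", mono=True)
print(" ".join(command))
```

Calling `convert_to_wav("voice_sample.mp3", "voice_sample.wav")` would perform the actual conversion, with ffmpeg choosing the WAV container from the target file's extension.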

"Language not supported"

Explanation: The selected language is not among the supported options.
Solution: Choose one of the supported languages: <|zh|>, <|en|>, <|jp|>, <|yue|>, or <|ko|>.

"Inference mode not recognized"

Explanation: The specified inference mode is not valid.
Solution: Ensure that you are using one of the supported inference modes: zero-shot, cross-lingual, or instruct-based inference.

CosyVoiceDubbingNode Related Nodes

Go back to the extension to check out more related nodes.

CosyVoice-ComfyUI
