Generates dubbed speech from subtitles with CosyVoice text-to-speech (TTS), supporting multiple languages and inference modes for AI artists.
The CosyVoiceDubbingNode produces voice dubbing with text-to-speech (TTS). You supply the text to be spoken, as timed subtitles, together with an audio prompt of the target voice, and the node synthesizes speech in the selected language that follows that voice. It supports several inference modes, including zero-shot, cross-lingual, and instruct-based inference, so the same node covers different dubbing scenarios, such as reproducing a voice in its original language or having that voice speak another language. This makes it useful for AI artists creating multilingual voiceovers that keep a consistent speaker across languages and speaking styles.
This parameter accepts an SRT file, the standard subtitle format that pairs each piece of text with start and end timestamps. The SRT file tells the node what text to convert to speech and when each line should be spoken, so the generated audio follows the subtitle timing. This is what keeps the dubbed voice synchronized with the visual content.
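An SRT file is plain text made of numbered blocks separated by blank lines; each block has a timing line such as 00:00:01,000 --> 00:00:03,500 followed by one or more lines of text. The minimal Python sketch below shows how such a file can be split into timed cues. The Cue class and the parse_srt function are hypothetical names for illustration, not the node's actual code.

```python
import re
from dataclasses import dataclass

# Timestamp such as "00:01:02,500": hours, minutes, seconds, milliseconds.
_TIME = r"(\d+):(\d{2}):(\d{2})[,.](\d{3})"
# Timing line such as "00:00:01,000 --> 00:00:03,500".
_TIMING_LINE = re.compile(_TIME + r"\s*-->\s*" + _TIME)


@dataclass
class Cue:
    """One subtitle entry (hypothetical type): when to speak, in seconds, and what to say."""
    start: float
    end: float
    text: str


def _to_seconds(hours: str, minutes: str, seconds: str, millis: str) -> float:
    return int(hours) * 3600 + int(minutes) * 60 + int(seconds) + int(millis) / 1000


def parse_srt(content: str) -> list[Cue]:
    """Hypothetical helper: split SRT text into cues (blocks are separated by blank lines)."""
    cues = []
    for block in re.split(r"\n\s*\n", content.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        for i, line in enumerate(lines):
            match = _TIMING_LINE.search(line)
            if match:
                groups = match.groups()
                cues.append(Cue(
                    start=_to_seconds(*groups[:4]),
                    end=_to_seconds(*groups[4:]),
                    # Lines after the timing line are the spoken text; the index line is skipped.
                    text=" ".join(lines[i + 1:]),
                ))
                break
    return cues


sample = """1
00:00:01,000 --> 00:00:03,500
Hello there.

2
00:00:04,000 --> 00:00:06,250
How are you?
"""
for cue in parse_srt(sample):
    print(cue.start, cue.end, cue.text)
# 1.0 3.5 Hello there.
# 4.0 6.25 How are you?
```

Multi-line subtitle text is joined with spaces, since a TTS model reads each cue as a single utterance.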
This parameter takes an audio file in WAV format that serves as the voice prompt for the TTS model. From this sample of the target voice, the model picks up characteristics such as tone, pitch, and speaking style, so the closer the sample is to the voice you want, the more closely the dubbed speech will match it.
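To illustrate what a WAV voice prompt contains, the sketch below reads one with Python's standard wave module, averages its channels into mono samples between -1 and 1, and reports the sample rate. It assumes 16-bit PCM input, and read_wav_mono is a hypothetical helper. The upstream CosyVoice examples load prompt speech at 16 kHz, so a real pipeline would also resample to that rate; this sketch does not resample.

```python
import array
import io
import wave


def read_wav_mono(source) -> tuple[list[float], int]:
    """Hypothetical helper: read a 16-bit PCM WAV (path or file object).

    Returns (mono samples scaled to [-1.0, 1.0), sample rate in Hz).
    """
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        # "h" = signed 16-bit integers, interleaved by channel.
        pcm = array.array("h", wav.readframes(wav.getnframes()))
    # Average each frame's channels into one value, then scale int16 to floats.
    mono = [
        sum(pcm[i:i + channels]) / channels / 32768.0
        for i in range(0, len(pcm), channels)
    ]
    return mono, rate


# Usage: build a tiny two-frame stereo WAV in memory, then read it back.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)
    out.setframerate(16000)
    out.writeframes(array.array("h", [16384, 0, -32768, -32768]).tobytes())
buffer.seek(0)
samples, rate = read_wav_mono(buffer)
print(rate, samples)  # 16000 [0.25, -1.0]
```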
This parameter selects the language of the generated speech. The available options are <|zh|> (Mandarin Chinese), <|en|> (English), <|jp|> (Japanese), <|yue|> (Cantonese), and <|ko|> (Korean). Choosing the language that matches the subtitle text lets the TTS model apply the appropriate pronunciation and linguistic rules, resulting in more natural and intelligible speech.
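In the upstream CosyVoice cross-lingual examples, the language tag is written directly in front of the text to synthesize, for instance <|en|> followed by an English sentence. Assuming the node uses the tag the same way, applying the selected language could look like the sketch below; LANGUAGE_TAGS and tag_text are hypothetical names, and the check simply rejects anything outside the five options listed above.

```python
# The five tags offered by the language parameter, with the language each denotes.
LANGUAGE_TAGS = {
    "<|zh|>": "Mandarin Chinese",
    "<|en|>": "English",
    "<|jp|>": "Japanese",
    "<|yue|>": "Cantonese",
    "<|ko|>": "Korean",
}


def tag_text(text: str, language: str) -> str:
    """Hypothetical helper: prefix text with its language tag, CosyVoice cross-lingual style."""
    if language not in LANGUAGE_TAGS:
        supported = ", ".join(LANGUAGE_TAGS)
        raise ValueError(f"unsupported language {language!r}; expected one of: {supported}")
    return language + text.strip()


print(tag_text("Nice to meet you.", "<|en|>"))  # <|en|>Nice to meet you.
print(LANGUAGE_TAGS["<|yue|>"])  # Cantonese
```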
The node outputs a single audio value: a dictionary holding the generated waveform and its sample rate. The waveform is a tensor of audio samples, and the sample rate is the number of samples per second, so a clip's duration in seconds equals its number of samples divided by the sample rate. The output can be connected to other audio nodes, for example to preview it or save it as a file for audio editing software, or processed further.
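Combining the subtitle timing with the output format, the following sketch places each synthesized clip at its subtitle start time on a silent track and returns a dictionary shaped like ComfyUI's audio value. In ComfyUI the waveform is a torch tensor shaped [batch, channels, samples]; to stay dependency-free, this sketch uses nested Python lists in the same order. The function place_clips is hypothetical, and its policy of leaving gaps silent and summing overlapping clips is an assumption, not documented node behavior.

```python
def place_clips(clips: list[tuple[float, list[float]]], sample_rate: int) -> dict:
    """Hypothetical helper: place (start_seconds, mono_samples) clips on a silent track.

    Returns a dict shaped like ComfyUI's audio value. The waveform is nested as
    [batch][channels][samples]; a real node would hold it in a torch tensor instead.
    """
    # A start time in seconds becomes a sample offset: seconds * samples per second.
    offsets = [round(start * sample_rate) for start, _ in clips]
    # Make the track just long enough for the clip that ends last.
    length = max((off + len(samples) for off, (_, samples) in zip(offsets, clips)), default=0)
    track = [0.0] * length  # gaps between subtitles stay silent
    for off, (_, samples) in zip(offsets, clips):
        for i, value in enumerate(samples):
            track[off + i] += value  # overlapping clips are summed (an assumption of this sketch)
    return {"waveform": [[track]], "sample_rate": sample_rate}


audio = place_clips([(0.5, [0.1, 0.2]), (1.0, [0.3])], sample_rate=4)
print(audio["waveform"])  # [[[0.0, 0.0, 0.1, 0.2, 0.3]]]
duration = len(audio["waveform"][0][0]) / audio["sample_rate"]
print(duration)  # 1.25
```

Because the sample rate counts samples per second, the five samples at 4 Hz in this toy example last 1.25 seconds; real speech uses rates in the tens of kilohertz.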
© Copyright 2024 RunComfy. All Rights Reserved.