Generate lifelike speech from text using pre-trained voice models for realistic audio outputs.
The CosyVoiceSFTNode generates speech from text using pre-trained voice models. It produces audio that mimics the natural characteristics of human speech, which suits applications that need consistent, realistic voices, such as virtual assistants, audiobooks, and other multimedia content. Because the models are pre-trained, you can add voice synthesis in several languages and speaker styles to a project without collecting training data or going through a complex setup.
The tts_text parameter is the text that you want to convert into speech. It is a string input and the primary content for the synthesis process, so the quality and clarity of the generated audio depend heavily on it. No minimum or maximum length is specified, but the text should be coherent and grammatically correct for the best results.
The speaker_name parameter selects the voice model used to generate the speech. It offers a list of pre-trained speaker models: 中文女 (Chinese female), 中文男 (Chinese male), 日语男 (Japanese male), 粤语女 (Cantonese female), 英文女 (English female), 英文男 (English male), and 韩语女 (Korean female). The default value is 中文女. Use this parameter to match the voice output to the language and gender your project needs.
The speed parameter controls how fast the generated speech is delivered. It is a float value with a default of 1.0, which represents the normal speaking rate. Adjusting it speeds up or slows down the speech, which is useful for matching the pacing to specific content requirements or audience preferences.
The seed parameter is an integer that sets the random seed for the speech synthesis process. Its default value is 42. By setting a specific seed, you can ensure that the speech generation process is deterministic, meaning that the same input will consistently produce the same output. This is useful for maintaining consistency across multiple runs or when fine-tuning the output for specific applications.
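The effect of a fixed seed can be illustrated with a minimal sketch. It uses Python's standard random module as a stand-in for the synthesis model's random number generator; the function name sample_noise is hypothetical and is not part of the node.

```python
import random


def sample_noise(seed: int, n: int = 5) -> list:
    """Draw n pseudo-random values from a generator seeded with `seed`.

    Hypothetical stand-in for the random draws made during synthesis.
    """
    rng = random.Random(seed)  # independent generator; global state is untouched
    return [rng.random() for _ in range(n)]


first = sample_noise(42)
second = sample_noise(42)
different = sample_noise(43)

print(first == second)     # same seed, same sequence
print(first == different)  # a different seed gives a different sequence
```

The same idea applies to the node: keep seed fixed while you tune other inputs, and change it only when you want a different rendition of the same text.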
This boolean parameter determines whether the speech synthesis should utilize a 25Hz sampling rate. The default value is False, meaning that the standard sampling rate is used unless specified otherwise. Enabling this option can be beneficial for certain applications that require a lower sampling rate, potentially reducing file size or meeting specific technical requirements.
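The inputs described above could be declared roughly as follows. This is a sketch based on the general ComfyUI custom-node convention (an INPUT_TYPES classmethod plus RETURN_TYPES, FUNCTION, and CATEGORY attributes), not the node's actual source. The class name, the use_25hz parameter name, the multiline option, the AUDIO return type, and the category are all assumptions made for illustration; only the defaults and the speaker list come from the text.

```python
class CosyVoiceSFTSketch:
    """Hypothetical stand-in for the node; not the actual implementation."""

    # Speaker options as listed in the documentation.
    SPEAKERS = ["中文女", "中文男", "日语男", "粤语女", "英文女", "英文男", "韩语女"]

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "tts_text": ("STRING", {"multiline": True, "default": ""}),
                "speaker_name": (cls.SPEAKERS, {"default": "中文女"}),
                "speed": ("FLOAT", {"default": 1.0}),
                "seed": ("INT", {"default": 42}),
                # Assumed name for the boolean 25Hz option.
                "use_25hz": ("BOOLEAN", {"default": False}),
            }
        }

    RETURN_TYPES = ("AUDIO",)  # assumed output type
    FUNCTION = "generate"
    CATEGORY = "audio"         # assumed category

    def generate(self, tts_text, speaker_name, speed, seed, use_25hz):
        # Synthesis is out of scope for this sketch.
        raise NotImplementedError("illustrative stub only")


required = CosyVoiceSFTSketch.INPUT_TYPES()["required"]
for name, spec in required.items():
    print(name, spec[1]["default"])
```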
The output of the CosyVoiceSFTNode is an audio file that contains the synthesized speech. This audio output is the result of converting the input text into spoken words using the selected speaker model and specified parameters. The quality of the audio is designed to be high, capturing the nuances of human speech and providing a realistic listening experience. This output can be used directly in various applications, such as multimedia projects, virtual assistants, or any other context where synthesized speech is required.
Ensure that tts_text is clear and grammatically correct to achieve the best audio quality.
Experiment with different speaker_name options to find the most suitable voice for your project.
Adjust the speed parameter to match the desired pacing of your audio output, especially if the content requires a specific delivery speed.
Set the seed parameter to maintain consistency across multiple runs, ensuring that the same input consistently produces the same output.
Provide a tts_text input before running the node.
An error can occur when speaker_name does not match any of the available pre-trained models.
Ensure that speaker_name is correctly spelled and matches one of the available options: 中文女, 中文男, 日语男, 粤语女, 英文女, 英文男, 韩语女.
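Both problems, a missing tts_text and an unrecognized speaker_name, can be caught before the node runs. The following is a minimal sketch of such a pre-check. The function validate_inputs and its error messages are hypothetical and do not reproduce the node's own error text; the speaker list is taken from the documentation.

```python
AVAILABLE_SPEAKERS = ("中文女", "中文男", "日语男", "粤语女", "英文女", "英文男", "韩语女")


def validate_inputs(tts_text: str, speaker_name: str) -> None:
    """Raise ValueError if the inputs would likely make the node fail."""
    # Empty or whitespace-only text gives the model nothing to synthesize.
    if not tts_text or not tts_text.strip():
        raise ValueError("tts_text is empty; provide text before running the node")
    # The speaker must exactly match one of the pre-trained options.
    if speaker_name not in AVAILABLE_SPEAKERS:
        options = ", ".join(AVAILABLE_SPEAKERS)
        raise ValueError(
            f"unknown speaker_name {speaker_name!r}; expected one of: {options}"
        )


validate_inputs("Hello, world.", "英文女")  # passes silently

try:
    validate_inputs("Hello, world.", "English female")
except ValueError as err:
    print(err)
```

The membership check is an exact string comparison, so a translated or misspelled name such as "English female" is rejected even though it describes a valid voice.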