Text-to-speech node for ComfyUI, built on the MegaTTS3 model.
MegaTTS3Run is a ComfyUI node that generates speech audio from text using the MegaTTS3 text-to-speech (TTS) model. It accepts English and Chinese input. The node exposes controls over the synthesis process, including the number of time steps and the weights applied to phoneme and tone predictions. The resulting audio can be used in applications ranging from virtual assistants to multimedia content creation.
The speaker parameter specifies the voice model to be used for generating the audio. It is a string that corresponds to a specific speaker's voice profile stored in the system. This parameter determines the vocal characteristics of the generated speech, such as pitch, tone, and accent. There are no explicit minimum or maximum values, but it must match a valid speaker profile available in the system.
The text parameter is the input string that you want to convert into speech. It is a required parameter and should be a well-formed sentence or phrase in the language specified by the text_language parameter. The quality and clarity of the generated audio depend significantly on the input text's structure and content.
The text_language parameter defines the language of the input text. It accepts two options: "en" for English and "zh" for Chinese, with "zh" being the default. This parameter ensures that the text is processed according to the linguistic rules of the specified language, which affects pronunciation and intonation.
The time_step parameter is an integer that controls the granularity of the speech synthesis process. It has a default value of 32 and a minimum value of 1. Adjusting it influences both the smoothness of the generated audio and the speed of generation: higher values may yield more detailed speech but typically take longer to compute.
The p_w parameter is a floating-point value that adjusts the weight of phoneme predictions during the synthesis process (the upstream MegaTTS3 README calls it the intelligibility weight). It has a default value of 1.6 and a minimum value of 0.1. Raising it places more emphasis on phonetic accuracy, which can improve the clarity of the speech.
The t_w parameter is a floating-point value that modifies the weight of tone predictions (the upstream MegaTTS3 README calls it the similarity weight). It has a default value of 2.5 and a minimum value of 0.1. Adjusting it changes the tonal quality of the speech, which is particularly important for tonal languages like Chinese.
The unload_model parameter is a boolean that determines whether the model should be unloaded from memory after processing. It defaults to False. Setting this to True can help manage system resources, especially when running multiple instances or when memory usage is a concern.
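The inputs described above can be summarized as a ComfyUI-style input declaration. This is a minimal sketch, not the node's actual source: the class name MegaTTS3RunSketch, the placeholder speaker names, the category, and the helper default_inputs are all hypothetical, and only the defaults and minimums come from the descriptions above.

```python
class MegaTTS3RunSketch:
    """Hypothetical stand-in that mirrors the documented inputs; not the real node."""

    # Placeholder names (assumption). A real node would list the installed speaker profiles.
    SPEAKERS = ["speaker_a", "speaker_b"]

    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI convention: each input maps to (type, options).
        return {
            "required": {
                "speaker": (cls.SPEAKERS,),
                "text": ("STRING", {"multiline": True, "default": ""}),
                "text_language": (["en", "zh"], {"default": "zh"}),
                "time_step": ("INT", {"default": 32, "min": 1}),
                "p_w": ("FLOAT", {"default": 1.6, "min": 0.1}),
                "t_w": ("FLOAT", {"default": 2.5, "min": 0.1}),
                "unload_model": ("BOOLEAN", {"default": False}),
            }
        }

    RETURN_TYPES = ("AUDIO",)
    FUNCTION = "run"
    CATEGORY = "audio"  # assumed category name

    def run(self, **inputs):
        # Synthesis is intentionally left out of this sketch.
        raise NotImplementedError("illustrative declaration only")


def default_inputs(node_cls):
    """Collect the default declared for each input, skipping inputs without one."""
    defaults = {}
    for name, spec in node_cls.INPUT_TYPES()["required"].items():
        options = spec[1] if len(spec) > 1 else {}
        if "default" in options:
            defaults[name] = options["default"]
    return defaults
```

Calling default_inputs(MegaTTS3RunSketch) returns the documented defaults for every input except speaker, which has no default and must always be chosen explicitly.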
The audio output is the result of the text-to-speech conversion. It contains both the waveform data and the sample rate, so the audio can be used directly in other applications. Audio quality depends on the input parameters and on the capabilities of the underlying model.
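Because the output carries both waveform data and a sample rate, it can be written to a standard audio file. The sketch below assumes the output is a dictionary with the keys "waveform" and "sample_rate", and that the waveform has already been flattened to a mono sequence of floats in the range -1.0 to 1.0. In ComfyUI the waveform is normally a tensor, so a real workflow would need to convert it first. The function name save_audio_as_wav is hypothetical.

```python
import struct
import wave


def save_audio_as_wav(audio, path):
    """Write a mono audio dict to a 16-bit PCM WAV file; return its duration in seconds."""
    samples = audio["waveform"]         # assumed: flat sequence of floats in [-1.0, 1.0]
    sample_rate = audio["sample_rate"]  # assumed: integer samples per second

    frames = bytearray()
    for value in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(value)))
        frames += struct.pack("<h", int(clipped * 32767))

    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)   # mono
        wav_file.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))

    return len(samples) / sample_rate
```

The sample rate matters as much as the waveform here: writing the same samples with the wrong rate changes both the pitch and the duration of the speech.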
Make sure the speaker parameter matches a valid speaker profile to achieve the desired vocal characteristics in the output audio.
Experiment with the time_step, p_w, and t_w parameters to find the best balance between speed, clarity, and naturalness for your application.
Use the unload_model parameter to manage system resources, especially when working with large datasets or running multiple processes.
If the speaker cannot be found, check that the speaker parameter is set to a valid and existing speaker profile.
If the text input is rejected, check that the text parameter is a non-empty string and is properly formatted.
If memory is the problem, set unload_model to True to free up memory after processing.
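The checks above can be run before a request reaches the node. The following is a minimal sketch under stated assumptions: validate_inputs and its available_speakers argument are hypothetical names, and the numeric limits are the minimums given in the parameter descriptions.

```python
def validate_inputs(speaker, text, available_speakers, time_step=32, p_w=1.6, t_w=2.5):
    """Return a list of problems; an empty list means the inputs look usable."""
    problems = []
    if speaker not in available_speakers:
        problems.append(f"unknown speaker profile: {speaker!r}")
    if not isinstance(text, str) or not text.strip():
        problems.append("text must be a non-empty string")
    if time_step < 1:
        problems.append("time_step must be at least 1")
    if p_w < 0.1:
        problems.append("p_w must be at least 0.1")
    if t_w < 0.1:
        problems.append("t_w must be at least 0.1")
    return problems
```

Returning every problem at once, rather than raising on the first, lets a caller report all invalid fields in a single pass.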