Text-to-speech node with a range of voice options, built on the Kokoro ONNX model.
Kokoro TTS is a text-to-speech node that converts written text into spoken audio using the Kokoro ONNX model. It supports multiple speakers, so AI artists and developers can produce a range of voices for voiceovers, narration, or any other project that needs speech generated from text. You enter the text and select a speaker; the node handles the model processing in the background.
The text parameter is a string input containing the content you want to convert into speech. It supports multiline text, so you can enter longer passages or scripts. The default value is a promotional message for BS Labs' YouTube channel; replace it with any text you want vocalized. The spoken output follows this text directly.
The speaker parameter selects the voice used to generate the speech. Options include "af_sarah", "am_adam", and "bf_emma", among others; the default is "af_sarah". The choice determines the vocal characteristics of the output, such as tone, pitch, and accent, so you can match the voice to your project.
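For readers unfamiliar with how ComfyUI nodes declare their inputs, the two parameters above could be declared roughly as follows. This is a minimal sketch following ComfyUI's usual custom-node conventions, not the node's actual source: the class name KokoroTTSSketch, the method name generate, the category string, and the placeholder default text are assumptions, and the voice list contains only the three voices named above.

```python
# Illustrative sketch only; this is not the actual Kokoro TTS node source.
# Only the three voices named in the text are listed; the real node offers more.
SPEAKERS = ["af_sarah", "am_adam", "bf_emma"]


class KokoroTTSSketch:
    """Hypothetical node class showing how the two inputs might be declared."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # Multiline string widget; this default text is a placeholder.
                "text": ("STRING", {"multiline": True, "default": "Hello from Kokoro TTS."}),
                # A list of strings is rendered as a dropdown in ComfyUI.
                "speaker": (SPEAKERS, {"default": "af_sarah"}),
            }
        }

    RETURN_TYPES = ("AUDIO",)
    FUNCTION = "generate"  # name of the method ComfyUI calls
    CATEGORY = "audio"     # assumed menu category

    def generate(self, text, speaker):
        # The real node would run the Kokoro ONNX model here.
        raise NotImplementedError("synthesis is omitted in this sketch")
```

Declaring speaker as a list rather than a free-form string means the workflow editor can only pass a voice name the node knows about.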
The audio output is a dictionary containing the generated waveform and its sample rate. The waveform is a tensor representing the audio signal, formatted to match ComfyUI's audio output requirements. The sample rate is the number of samples per second, which playback software needs in order to reproduce the audio at the correct speed and pitch. This output is the final audio, ready to pass to other nodes or use in applications ranging from multimedia projects to interactive installations.
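To make the shape of this output concrete, here is a small sketch that reads such a dictionary, computes the clip length, and writes it to a WAV file using only the Python standard library. The key names "waveform" and "sample_rate", the [batch][channel][sample] layout, and the 24 kHz rate in the example are assumptions based on ComfyUI's usual audio convention, and a nested list stands in for the tensor; the helper names are hypothetical.

```python
import struct
import wave


def audio_duration_seconds(audio):
    """Length of the clip in seconds: sample count divided by sample rate."""
    num_samples = len(audio["waveform"][0][0])  # assumed [batch][channel][sample]
    return num_samples / audio["sample_rate"]


def save_first_channel_as_wav(audio, path):
    """Write batch item 0, channel 0 as a 16-bit mono WAV file."""
    samples = audio["waveform"][0][0]
    # Clamp each sample to [-1.0, 1.0] and scale to signed 16-bit integers.
    pcm = [int(max(-1.0, min(1.0, float(s))) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(audio["sample_rate"])
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


# Stand-in for the node's output: one second of silence at an assumed 24 kHz.
example_audio = {"waveform": [[[0.0] * 24000]], "sample_rate": 24000}
```

Calling save_first_channel_as_wav(example_audio, "out.wav") would produce a one-second file; writing the wrong sample rate into the header is what makes speech play back too fast or too slow.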
Download the model and voices files from {MODEL_URL} and {VOICES_URL} and place them in the same folder as the node.