ComfyUI Node: Kokoro TextToSpeech

Class Name: Kokoro TextToSpeech
Category: kokoro
Author: benjiyaya (Account age: 397 days)
Extension: ComfyUI-KokoroTTS
Latest Updated: 2025-03-18
GitHub Stars: 0.04K

Table of Contents

Description
Kokoro TextToSpeech:
Kokoro TextToSpeech Input Parameters:
Kokoro TextToSpeech Output Parameters:
Kokoro TextToSpeech Usage Tips:
Kokoro TextToSpeech Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-KokoroTTS

Install this extension via the ComfyUI Manager by searching for ComfyUI-KokoroTTS

1. Click the Manager button in the main menu.
2. Click the Custom Nodes Manager button.
3. Enter ComfyUI-KokoroTTS in the search bar and install the extension from the results.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Kokoro TextToSpeech Description

Convert text to speech with the Kokoro TTS engine for AI projects, producing natural-sounding audio with a choice of voices.

Kokoro TextToSpeech:

Kokoro TextToSpeech converts written text into spoken audio using the Kokoro TTS engine. It is aimed at AI artists and creators who want to add voiceovers or narration to their projects. The node uses pre-trained models and offers a selection of speaker voices, so the same text can be rendered in different vocal styles.

Kokoro TextToSpeech Input Parameters:

text

The text parameter is a string containing the content you want to convert into speech. Clear, grammatically correct sentences produce the most natural and understandable speech. No minimum or maximum length is documented, but concise text tends to keep the audio clear.
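The advice above about clean, concise text can be applied before the string ever reaches the node. The following is a minimal sketch, not part of the extension: `prepare_text` and its `max_chars` limit of 300 are hypothetical names and values chosen for illustration. It collapses stray whitespace, rejects empty input, and groups sentences into chunks that stay under the chosen length.

```python
import re
from typing import List


def prepare_text(raw: str, max_chars: int = 300) -> List[str]:
    """Normalize whitespace and group sentences into chunks of at most max_chars.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    # Collapse runs of whitespace (including newlines) into single spaces.
    cleaned = re.sub(r"\s+", " ", raw).strip()
    if not cleaned:
        raise ValueError("text input is empty")

    # Split after sentence-ending punctuation that is followed by a space.
    sentences = re.split(r"(?<=[.!?])\s+", cleaned)

    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit, so start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


chunks = prepare_text("Hello   world.\nHow are you?", max_chars=15)
print(chunks)
```

Each chunk could then be passed to the text input in turn; whether chunking improves results for a given voice is something to verify by listening.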

speaker

The speaker parameter selects the voice used to generate the speech. Options include "af_sarah", "af_bella", "am_adam", and others, each with its own vocal tone and style. The default is "af_sarah". Because the voice shapes the emotional and stylistic delivery of the text, choose the speaker that matches the tone or character you want.
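The speaker names appear to follow a two-letter prefix pattern. As a rough sketch, the helper below reads the first letter as an accent and the second as a gender. This reading is an assumption based on the naming convention commonly used for Kokoro voices; it is not stated on this page, and `describe_speaker` is a hypothetical helper, not part of the node.

```python
# Assumed meaning of the two prefix letters; not confirmed by this page.
ACCENTS = {"a": "American English", "b": "British English"}
GENDERS = {"f": "female", "m": "male"}


def describe_speaker(speaker: str) -> str:
    """Return a short description guessed from a name such as 'af_sarah'."""
    prefix, sep, name = speaker.partition("_")
    if not sep or len(prefix) != 2 or not name:
        return f"{speaker}: unrecognized naming pattern"
    accent = ACCENTS.get(prefix[0], "unknown accent")
    gender = GENDERS.get(prefix[1], "unknown gender")
    return f"{name.capitalize()}: {accent}, {gender}"


for speaker in ("af_sarah", "af_bella", "am_adam"):
    print(describe_speaker(speaker))
```

Treat the description as a hint for narrowing down candidates; listening to a short sample remains the reliable way to choose a voice.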

Kokoro TextToSpeech Output Parameters:

audio

The audio output contains the generated speech as a waveform tensor together with a sample rate. The waveform is the audio signal itself; the sample rate is the number of samples per second and is needed to play back or process the waveform at the correct speed. Use this output for playback or further processing when integrating the generated speech into a multimedia project.
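To illustrate how a waveform and a sample rate combine into playable audio, here is a minimal standard-library sketch that writes mono samples to a 16-bit WAV file. It is not the node's own code: `save_wav` is a hypothetical helper, the sine tone stands in for real speech, a plain Python list stands in for the waveform tensor, and the 24000 Hz rate is an assumed example value.

```python
import math
import os
import struct
import tempfile
import wave


def save_wav(samples, sample_rate, path):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM; return duration in seconds."""
    frames = bytearray()
    for sample in samples:
        # Clip to the valid range, then scale to a signed 16-bit integer.
        clipped = max(-1.0, min(1.0, float(sample)))
        frames += struct.pack("<h", int(clipped * 32767))
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)           # mono
        wav_file.setsampwidth(2)           # 2 bytes per sample = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(bytes(frames))
    return len(samples) / sample_rate


# Stand-in for the node output: half a second of a 440 Hz tone.
sample_rate = 24000  # assumed example value
tone = [0.3 * math.sin(2 * math.pi * 440 * n / sample_rate)
        for n in range(sample_rate // 2)]

out_path = os.path.join(tempfile.gettempdir(), "kokoro_demo.wav")
duration = save_wav(tone, sample_rate, out_path)
print(f"wrote {out_path} ({duration:.2f} s)")
```

The duration calculation shows why the sample rate must travel with the waveform: the same samples written at a different rate would play back faster or slower.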

Kokoro TextToSpeech Usage Tips:

Ensure that the text input is clear and free of errors to produce the best quality audio output. Proper punctuation and grammar can enhance the naturalness of the generated speech.
Experiment with different speaker options to find the voice that best fits the tone and style of your project. Each speaker has a unique vocal quality that can influence the overall impact of the audio.

Kokoro TextToSpeech Common Errors and Solutions:

ERROR: could not load kokoro-onnx in generate

Explanation: This error occurs when the Kokoro TTS engine fails to initialize, possibly due to missing or incorrect model files.
Solution: Verify that the model and voice files are correctly placed in the specified directory. Ensure that the paths to these files are correct and that the files are not corrupted.

ERROR: could not generate speech using kokoro.create

Explanation: This error indicates a failure in the speech generation process, which could be due to an invalid text input or an issue with the selected speaker.
Solution: Check the text input for any errors or unsupported characters. Ensure that the selected speaker is available and correctly specified.

ERROR: the text-to-speech generation did not return audio

Explanation: This error suggests that the text-to-speech process did not produce any audio output, possibly due to an empty or invalid text input.
Solution: Make sure the text input is not empty and is formatted correctly. Double-check the input parameters to ensure they are valid and properly configured.
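The three solutions above share the same checks: the model and voice files exist, the speaker is one the node offers, and the text is not empty. The sketch below gathers them into one preflight function. It is a hypothetical helper rather than part of the extension; the file paths and the set of available speakers are placeholders that you would supply from your own installation.

```python
from pathlib import Path


def preflight(text, speaker, available_speakers, required_files):
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems = []

    # Missing or zero-byte model/voice files can prevent the engine from loading.
    for raw in required_files:
        path = Path(raw)
        if not path.is_file():
            problems.append(f"missing file: {path}")
        elif path.stat().st_size == 0:
            problems.append(f"empty file: {path}")

    # An unavailable speaker can cause speech generation to fail.
    if speaker not in available_speakers:
        problems.append(f"unknown speaker: {speaker}")

    # Empty or whitespace-only text can result in no audio being returned.
    if not text.strip():
        problems.append("text is empty")

    return problems


# Example call with placeholder values; the path here is intentionally fictional.
issues = preflight(
    text="  ",
    speaker="xx_nobody",
    available_speakers={"af_sarah", "af_bella", "am_adam"},
    required_files=["/path/that/does/not/exist/model.onnx"],
)
for issue in issues:
    print(issue)
```

Running such a check before queueing a workflow surfaces all problems at once instead of one error per attempt.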

Kokoro TextToSpeech Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-KokoroTTS
