Generate lifelike speech from text input with various speaker styles and languages using the Kokoro ONNX model.
The KokoroGenerator node synthesizes audio from text input, letting you create lifelike speech in a variety of speaker styles and languages. It uses the Kokoro ONNX model to turn written text into spoken words, a practical tool for AI artists who want realistic voiceovers in their projects. By specifying parameters such as the speaker's voice, speech speed, and language, you can generate customized audio output to suit your creative needs. KokoroGenerator is particularly useful when you want to add a human touch to AI-generated content, producing high-quality audio with minimal effort.
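The node's text-in, audio-out contract can be sketched as a plain function. Note that the synthesis step below is stubbed with silence, and the function name, 24 kHz sample rate, and per-character timing are illustrative assumptions, not the node's actual internals:

```python
# Sketch of a KokoroGenerator-style contract: text plus voice settings in,
# a {"waveform", "sample_rate"} dictionary out. The real node delegates the
# waveform computation to the Kokoro ONNX model; here it is stubbed.

def generate(text, speaker="af", speed=1.0, lang="en-us", sample_rate=24000):
    """Return audio in the node's documented output shape."""
    # Stub: roughly 0.08 s of silence per character, shortened as speed rises.
    n_samples = int(len(text) * 0.08 * sample_rate / speed)
    waveform = [0.0] * n_samples  # the real output is a tensor, not a list
    return {"waveform": waveform, "sample_rate": sample_rate}

audio = generate("I am a synthesized robot", speed=1.0)
```

Downstream nodes only need the two dictionary keys, so any backend that fills them in the same way is interchangeable from the caller's point of view.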
The `text` parameter is the string you wish to convert into speech. It supports multiline input, so longer passages can be synthesized in one pass. The default value is "I am a synthesized robot". This parameter directly determines the audio output: the spoken words reflect the text provided.
The `speaker` parameter specifies the voice used for audio generation. It is of the custom type `KOKORO_SPEAKER`, which represents the available speaker profiles. You can choose from a variety of pre-defined voices, each with its own characteristics, to match the tone and style of your project.
The `speed` parameter is a float that controls the rate of speech in the generated audio. It ranges from 0.1 to 4, with a default of 1: higher values produce faster speech, lower values slow it down, letting you tailor the pacing to your content.
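A value outside that range would need to be rejected or clamped before synthesis; a minimal clamping helper (illustrative only, not the node's actual behavior) could look like this:

```python
def clamp_speed(speed, lo=0.1, hi=4.0):
    """Clamp a requested speech rate into the node's accepted 0.1-4 range."""
    return max(lo, min(hi, float(speed)))

clamp_speed(10)    # too fast -> 4.0
clamp_speed(0.05)  # too slow -> 0.1
clamp_speed(1.5)   # in range -> 1.5
```

Clamping (rather than raising an error) keeps a workflow running when an upstream node feeds in an unexpected value.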
The `lang` parameter is a string that sets the language of the synthesized speech. It is single-line and defaults to "en-us". It ensures that the pronunciation and intonation of the generated audio match the specified language, enhancing the authenticity of the speech.
The `audio` output is a dictionary containing the generated waveform and its sample rate. The waveform is a tensor (a multi-dimensional array) holding the audio data; the sample rate is the number of samples per second, which determines the quality and fidelity of the sound. This output carries the actual audio content and is what downstream nodes use for playback or further processing.
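Given that dictionary shape, downstream code can derive the clip's duration from the sample count and sample rate. In this sketch a flat list stands in for the tensor, and the 24 kHz rate is an assumption for illustration:

```python
def duration_seconds(audio):
    """Duration of the node's audio output in seconds.

    Expects the documented shape: {"waveform": <samples>, "sample_rate": int}.
    A real waveform is a tensor; a flat list of samples stands in for it here.
    """
    return len(audio["waveform"]) / audio["sample_rate"]

clip = {"waveform": [0.0] * 48000, "sample_rate": 24000}
duration_seconds(clip)  # 48000 samples at 24 kHz -> 2.0 seconds
```

The same two fields are all a save-audio or preview node needs to write a correctly timed file.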