Facilitates synthetic voice creation with SparkTTS, producing customizable, natural-sounding voices within ComfyUI.
The SparkTTS_VoiceCreator node is designed to facilitate the creation of synthetic voices using the SparkTTS text-to-speech synthesis system. This node is part of the ComfyUI-SparkTTS integration, which provides a robust platform for generating high-quality speech from text inputs. The primary goal of the SparkTTS_VoiceCreator is to enable users to create unique and natural-sounding voices by leveraging advanced machine learning models. This node is particularly beneficial for AI artists and developers who wish to incorporate custom voice synthesis into their projects, offering a seamless way to generate speech that can be tailored to specific needs. By utilizing the SparkTTS model, which supports multiple languages and offers fine control over speech characteristics, users can achieve a high degree of customization and realism in their voice outputs.
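To make the node's interface concrete, here is a minimal sketch of a ComfyUI custom node exposing the parameters described below. It follows ComfyUI's standard node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `FUNCTION`); the real SparkTTS_VoiceCreator implementation in ComfyUI-SparkTTS will differ in detail, and the inference step is omitted here.

```python
# Sketch of a ComfyUI node with the SparkTTS_VoiceCreator interface described
# in this page. Defaults and category are illustrative assumptions, not taken
# from the ComfyUI-SparkTTS source.
class SparkTTS_VoiceCreator:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "text": ("STRING", {"multiline": True, "default": "Hello from SparkTTS."}),
                "reference_audio": ("AUDIO",),
                "reference_text": ("STRING", {"multiline": True, "default": ""}),
                "max_tokens": ("INT", {"default": 3000, "min": 500, "max": 5000}),
            }
        }

    RETURN_TYPES = ("AUDIO",)
    FUNCTION = "create_voice"
    CATEGORY = "audio"

    def create_voice(self, text, reference_audio, reference_text, max_tokens):
        # The real node runs SparkTTS inference here; omitted in this sketch.
        raise NotImplementedError
```

ComfyUI discovers nodes like this via a `NODE_CLASS_MAPPINGS` dictionary in the extension's package; the class itself needs no ComfyUI import to define.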
The text parameter is a string input that specifies the text you want to convert into speech. This parameter supports multiline input, enabling you to enter longer passages. The default value is a sample text that demonstrates the node's capabilities. You can use double line breaks to separate paragraphs, which helps structure the speech output. This parameter is crucial as it directly determines the content of the generated speech.
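The double-line-break convention can be illustrated with a short snippet that splits multiline input into paragraphs, mirroring how the node segments the speech output (the splitting itself is illustrative, not the node's actual code):

```python
# Multiline input text; a blank line ("\n\n") separates paragraphs.
text = (
    "Welcome to SparkTTS voice synthesis.\n"
    "This is the first paragraph.\n"
    "\n"
    "This second paragraph is rendered as a separate speech segment."
)

# Split on double line breaks and drop empty fragments.
paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
print(len(paragraphs))  # 2
```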
The reference_audio parameter is an audio input that serves as a sample for cloning a voice. This parameter is essential for creating a voice that closely resembles the characteristics of the provided sample. By analyzing the reference audio, the node captures unique voice traits, such as tone and accent, to produce a more personalized and accurate synthesis.
The reference_text parameter is a string input that should contain the exact text spoken in the reference audio. This input significantly improves the quality of voice cloning by helping the model understand the speaker's pronunciation patterns. Providing an accurate transcript ensures that the synthesized voice closely matches the original speaker's style and intonation.
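A hypothetical example of pairing a reference clip with its transcript, plus a trivial sanity check before submitting it to the node. The file path and sentence are placeholders, not from the ComfyUI-SparkTTS repository:

```python
# Placeholder voice-cloning inputs; the path and transcript are illustrative.
reference_audio = "samples/speaker.wav"  # clean recording of the target voice
reference_text = "The quick brown fox jumps over the lazy dog."  # exact words spoken in the clip

def transcript_provided(text: str) -> bool:
    """Return True if a non-blank transcript accompanies the reference audio."""
    return bool(text and text.strip())

print(transcript_provided(reference_text))  # True
```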
The max_tokens parameter is an integer input that controls the maximum length of the generated speech. It has a default value of 3000, with a minimum of 500 and a maximum of 5000. This parameter is important for managing memory usage and ensuring that the node can handle longer texts without running into out-of-memory errors. Adjusting this value lets you balance speech length against available computational resources.
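The stated range (500–5000, default 3000) can be enforced with a small clamping helper. This is an illustrative utility, not code from the node itself:

```python
def clamp_max_tokens(value, lo=500, hi=5000, default=3000):
    """Keep a max_tokens request inside the node's accepted range.

    Bounds and default match the documented parameter; the helper itself
    is a hypothetical convenience, not part of ComfyUI-SparkTTS.
    """
    if value is None:
        return default
    return max(lo, min(hi, int(value)))

print(clamp_max_tokens(10000))  # 5000 (capped to reduce memory pressure)
print(clamp_max_tokens(None))   # 3000 (fall back to the default)
```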
The wav output parameter provides the generated audio in waveform format. This output is the result of the text-to-speech synthesis process, where the input text is converted into a natural-sounding voice. The waveform can be used in various applications, such as voiceovers, virtual assistants, or any project requiring synthetic speech. The quality and characteristics of the output audio depend on the input parameters and the reference audio provided.
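For downstream use, a waveform typically ends up on disk as a WAV file. The snippet below writes a synthetic 16-bit mono waveform using only the standard library; the sample rate and the sine-wave stand-in are assumptions for illustration (the node's actual output is a ComfyUI audio object, not raw samples):

```python
import math
import struct
import wave

# Stand-in waveform: half a second of a 440 Hz tone at moderate amplitude.
sample_rate = 16000
samples = [
    int(32767 * 0.3 * math.sin(2 * math.pi * 440 * t / sample_rate))
    for t in range(sample_rate // 2)
]

# Write 16-bit mono PCM with the stdlib wave module.
with wave.open("out.wav", "wb") as f:
    f.setnchannels(1)           # mono
    f.setsampwidth(2)           # 16-bit samples
    f.setframerate(sample_rate)
    f.writeframes(struct.pack(f"<{len(samples)}h", *samples))
```

In a real workflow you would feed the node's wav output into a save-audio node instead; this sketch only shows what a playable file of that waveform looks like at the byte level.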
- Ensure the reference_audio sample is clear and of high quality to achieve the best voice cloning results.
- Provide an accurate reference_text transcript to improve the model's understanding of pronunciation patterns, leading to more natural-sounding speech.
- Adjust the max_tokens parameter based on the length of your text and available memory resources to prevent out-of-memory errors.
- Make sure the sparktts model folder is present in the specified path.
- Install the huggingface_hub library using a package manager like pip to enable automatic model downloads.
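The folder check from the tips above can be scripted. This is an illustrative helper under the assumption that the integration expects a non-empty `sparktts` directory inside a models path; the exact layout may differ in the actual ComfyUI-SparkTTS installation:

```python
from pathlib import Path
import tempfile

def sparktts_model_ready(models_dir) -> bool:
    """Return True if a non-empty 'sparktts' folder exists under models_dir.

    The directory name and layout are assumptions for illustration.
    """
    model_dir = Path(models_dir) / "sparktts"
    return model_dir.is_dir() and any(model_dir.iterdir())

# Demo against a throwaway directory structure.
with tempfile.TemporaryDirectory() as tmp:
    missing = sparktts_model_ready(tmp)                        # no folder yet
    (Path(tmp) / "sparktts").mkdir()
    empty = sparktts_model_ready(tmp)                          # folder but no files
    (Path(tmp) / "sparktts" / "model.safetensors").touch()
    ready = sparktts_model_ready(tmp)                          # populated

print(missing, empty, ready)  # False False True
```

If the folder is absent and huggingface_hub is installed, the integration can fetch the model automatically; `huggingface_hub.snapshot_download` is the usual API for that kind of download.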