ComfyUI Node: SparkTTS Voice Clone

Class Name: SparkTTS_VoiceClone
Category: 🧪AILab/🔊Audio
Author: 1038lab (Account age: 774 days)
Extension: Comfyui-Spark-TTS
Latest Updated: 2025-04-15
GitHub Stars: 0.09K

Table of Contents

Description
SparkTTS_VoiceClone:
SparkTTS_VoiceClone Input Parameters:
SparkTTS_VoiceClone Output Parameters:
SparkTTS_VoiceClone Usage Tips:
SparkTTS_VoiceClone Common Errors and Solutions:
Related Nodes

How to Install Comfyui-Spark-TTS

Install this extension through the ComfyUI Manager by searching for Comfyui-Spark-TTS:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter Comfyui-Spark-TTS in the search bar and install the matching result.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

SparkTTS Voice Clone Description

Facilitates voice cloning through text-to-speech synthesis for personalized, high-quality speech replication in English and Chinese.

SparkTTS Voice Clone:

The SparkTTS_VoiceClone node clones a voice through text-to-speech synthesis. Given a reference audio sample, it generates synthetic speech that closely mimics the original speaker's voice. This suits applications that need a specific voice, such as virtual assistants, audiobooks, or other creative projects. The node supports English and Chinese and aims to provide a simple interface for generating natural-sounding speech.

SparkTTS Voice Clone Input Parameters:

text

This string input holds the text to synthesize in the cloned voice. It accepts multiline input, and paragraphs can be separated with double line breaks. The default text is "This is the SparkTTS voice clone node, you can clone the voice from a reference audio. Enter reference text to improve voice cloning quality. Currently we only support English and Chinese." This input defines the content of the generated speech.
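To illustrate the paragraph convention, here is a minimal Python sketch that splits multiline input on blank lines. The helper name split_paragraphs is hypothetical, and the source does not say how the node itself splits text, so treat this as a way to preview paragraph boundaries rather than as the node's actual logic.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split multiline input into paragraphs on blank lines (hypothetical helper)."""
    # Normalize Windows line endings so "\r\n\r\n" counts as a double line break.
    normalized = text.replace("\r\n", "\n")
    # One or more blank lines (optionally containing whitespace) end a paragraph.
    parts = re.split(r"\n\s*\n", normalized)
    # Trim each fragment and drop the empty ones.
    return [p.strip() for p in parts if p.strip()]


sample = "First paragraph.\n\nSecond paragraph,\nstill second.\n\n\nThird."
paragraphs = split_paragraphs(sample)
for index, paragraph in enumerate(paragraphs, start=1):
    print(index, repr(paragraph))
```

A single line break stays inside its paragraph; only blank lines start a new one.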

reference_audio

This parameter takes the audio sample that serves as the reference for voice cloning. The sample should contain the voice you want to replicate; the node analyzes it to extract the speaker's vocal characteristics, which it then uses to synthesize new speech.

reference_text

This string input should contain the exact text spoken in the reference audio. Providing it improves cloning quality by helping the model learn the speaker's pronunciation patterns. It accepts multiline input and is empty by default.

max_tokens

This integer parameter controls the maximum length of the generated speech. It ranges from 500 to 5000, with a default value of 3000. Higher values allow for longer text synthesis but require more memory. If you encounter out-of-memory errors, consider reducing this value. Conversely, increase it for synthesizing very long texts.
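The advice to lower max_tokens after an out-of-memory error can be sketched in Python as below. The range (500 to 5000) and default (3000) come from the description above; everything else is an assumption for illustration: the helper names, the 500-token step, and the use of MemoryError, since a real GPU out-of-memory failure may surface as a different exception type.

```python
# Range and default taken from the parameter description above.
MIN_TOKENS, MAX_TOKENS, DEFAULT_TOKENS = 500, 5000, 3000


def clamp_max_tokens(value: int = DEFAULT_TOKENS) -> int:
    """Keep max_tokens inside the documented 500-5000 range."""
    return max(MIN_TOKENS, min(MAX_TOKENS, int(value)))


def synthesize_with_backoff(run, max_tokens: int = DEFAULT_TOKENS, step: int = 500):
    """Call run(tokens), lowering tokens by `step` after each MemoryError.

    `run` is a hypothetical callable standing in for whatever triggers the node.
    Returns a (result, tokens_used) tuple.
    """
    tokens = clamp_max_tokens(max_tokens)
    while True:
        try:
            return run(tokens), tokens
        except MemoryError:
            if tokens <= MIN_TOKENS:
                raise  # already at the documented minimum, nothing left to reduce
            tokens = max(MIN_TOKENS, tokens - step)


def fake_run(tokens: int) -> str:
    """Simulated runner that only succeeds at 2000 tokens or fewer."""
    if tokens > 2000:
        raise MemoryError("simulated out-of-memory")
    return f"audio generated with max_tokens={tokens}"


result, used = synthesize_with_backoff(fake_run, max_tokens=3000)
print(result)
```

Stepping down gradually keeps as much headroom for long texts as the hardware allows, instead of jumping straight to the minimum.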

SparkTTS Voice Clone Output Parameters:

synthesized_audio

The node outputs synthesized_audio, an audio file containing the input text spoken in the voice of the reference audio. The quality and naturalness of the result depend on the clarity of the reference audio and the accuracy of the reference text. The audio can be used for voiceovers, virtual assistants, or any project that needs a specific voice.
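For readers who drive ComfyUI from scripts, the following Python sketch shows how the node's inputs and output might be wired in an API-format workflow. Several details are assumptions rather than facts from this page: the LoadAudio and SaveAudio node names and their input names, the use of output index 0 for synthesized_audio, the file name speaker.wav, and the helper name build_voice_clone_prompt. Compare against a workflow exported from your own ComfyUI instance before relying on it.

```python
import json


def build_voice_clone_prompt(text, reference_audio_file, reference_text="", max_tokens=3000):
    """Return an API-format prompt dict: LoadAudio -> SparkTTS_VoiceClone -> SaveAudio."""
    return {
        "1": {  # assumed built-in audio loader; names may differ between versions
            "class_type": "LoadAudio",
            "inputs": {"audio": reference_audio_file},
        },
        "2": {  # class name and input names as documented on this page
            "class_type": "SparkTTS_VoiceClone",
            "inputs": {
                "text": text,
                "reference_audio": ["1", 0],  # link to output 0 of node "1"
                "reference_text": reference_text,
                "max_tokens": max_tokens,
            },
        },
        "3": {  # assumed built-in audio saver
            "class_type": "SaveAudio",
            "inputs": {"audio": ["2", 0], "filename_prefix": "sparktts_clone"},
        },
    }


prompt = build_voice_clone_prompt(
    text="Hello from the cloned voice.",
    reference_audio_file="speaker.wav",  # hypothetical file name
    reference_text="The exact words spoken in speaker.wav.",
)
payload = json.dumps({"prompt": prompt}, indent=2)
print(payload)
```

The sketch only builds and prints the JSON payload; submitting it to a running server is left out on purpose.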

SparkTTS Voice Clone Usage Tips:

Ensure that the reference audio is clear and free from background noise to improve the quality of the voice clone.
Provide accurate reference text that matches the spoken content in the reference audio to enhance pronunciation accuracy.
Adjust the max_tokens parameter based on the length of the text you wish to synthesize, keeping in mind the memory limitations of your system.
Experiment with different text inputs to explore the versatility of the cloned voice in various contexts.

SparkTTS Voice Clone Common Errors and Solutions:

"Out of memory error"

Explanation: This error occurs when the system runs out of memory while processing a large text input.
Solution: Reduce the max_tokens parameter to a lower value to decrease memory usage.

"Reference audio not found"

Explanation: The node cannot locate the specified reference audio file.
Solution: Ensure the correct file path is provided and that the file exists in the specified location.

"Mismatch between reference text and audio"

Explanation: The reference text does not match the spoken content in the reference audio, leading to poor voice cloning quality.
Solution: Verify that the reference text accurately reflects the speech in the reference audio and make necessary corrections.
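Some of these problems can be caught before the node runs. The Python sketch below is a hypothetical pre-flight check, not part of the extension: it verifies that the reference file exists, that max_tokens is within the documented range, and that reference text was supplied. It cannot detect a mismatch between the reference text and the audio, which still requires listening to the sample.

```python
from pathlib import Path


def preflight(text, reference_audio_path, reference_text="", max_tokens=3000):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if not text.strip():
        problems.append("text is empty")
    if not Path(reference_audio_path).is_file():
        problems.append(f"reference audio not found: {reference_audio_path}")
    if not reference_text.strip():
        problems.append("reference_text is empty; cloning quality may suffer")
    if not 500 <= max_tokens <= 5000:
        problems.append("max_tokens is outside the documented 500-5000 range")
    return problems


issues = preflight(
    text="Hello there.",
    reference_audio_path="no_such_dir/missing_reference.wav",  # deliberately missing
    reference_text="",
    max_tokens=6000,
)
for issue in issues:
    print("-", issue)
```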

SparkTTS Voice Clone Related Nodes

Go back to the extension to check out more related nodes.

Comfyui-Spark-TTS
