
ComfyUI Node: SparkTTS Voice Clone

Class Name: SparkTTS_VoiceClone
Category: 🧪AILab/🔊Audio
Author: 1038lab (Account age: 774 days)
Extension: Comfyui-Spark-TTS
Last Updated: 2025-04-15
GitHub Stars: 0.09K

How to Install Comfyui-Spark-TTS

Install this extension through the ComfyUI Manager by searching for Comfyui-Spark-TTS:
  • 1. Click the Manager button in the main menu.
  • 2. Select the Custom Nodes Manager button.
  • 3. Enter Comfyui-Spark-TTS in the search bar and install the extension.
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


SparkTTS Voice Clone Description

Clones a voice from a reference audio sample and synthesizes new speech from text in that voice, in English or Chinese.

SparkTTS Voice Clone:

The SparkTTS_VoiceClone node performs voice cloning with text-to-speech synthesis. You supply a reference audio sample of a speaker, and the node generates new speech from your text that closely mimics that speaker's voice. This is useful wherever output in a specific voice is needed, such as virtual assistants, audiobooks, or creative projects. The node is based on the Spark-TTS model and currently supports English and Chinese.

SparkTTS Voice Clone Input Parameters:

text

A multiline string containing the text to synthesize in the cloned voice. Separate paragraphs with a double line break (a blank line). The default text is "This is the SparkTTS voice clone node, you can clone the voice from a reference audio. Enter reference text to improve voice cloning quality. Currently we only support English and Chinese." This input determines what the generated speech says.
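The page says paragraphs are separated by double line breaks, but it does not document how the node processes them internally. The following is a minimal sketch of that convention only: a hypothetical helper, `split_paragraphs`, that shows which pieces of a multiline `text` value count as separate paragraphs (a single line break stays inside a paragraph).

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split multiline text on blank lines (double line breaks).

    Hypothetical helper illustrating the paragraph convention; it is not
    part of the Comfyui-Spark-TTS extension.
    """
    # A line break, optional whitespace-only lines, then another line break.
    parts = re.split(r"\n\s*\n", text.strip())
    # Drop empty fragments and surrounding whitespace.
    return [p.strip() for p in parts if p.strip()]


# Usage: two paragraphs, the first containing a single (non-splitting) line break.
print(split_paragraphs("First line.\nStill first paragraph.\n\nSecond paragraph."))
```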

reference_audio

The audio sample containing the voice you want to replicate. The node analyzes it to extract the characteristics of the speaker's voice, which it then uses to synthesize the new speech.
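The page does not specify a required format, sample rate, or length for the reference audio. If you keep your reference clips as PCM WAV files, a quick check of their basic properties before loading them into a workflow can catch obvious problems such as an empty or unexpectedly short clip. This sketch assumes PCM WAV input and uses only Python's standard-library `wave` module; `inspect_wav` and `WavInfo` are hypothetical names, and the check says nothing about background noise.

```python
import wave
from dataclasses import dataclass


@dataclass
class WavInfo:
    channels: int
    sample_rate: int
    duration_s: float


def inspect_wav(path: str) -> WavInfo:
    """Read channel count, sample rate, and duration of a PCM WAV file.

    Hypothetical pre-check for a reference clip; not part of the node.
    """
    with wave.open(path, "rb") as wf:
        rate = wf.getframerate()
        return WavInfo(
            channels=wf.getnchannels(),
            sample_rate=rate,
            # Number of frames divided by frames per second gives seconds.
            duration_s=wf.getnframes() / rate,
        )
```

Compressed formats such as MP3 are not readable with `wave`; you would need a separate decoding library for those.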

reference_text

A multiline string containing the exact transcript of the reference audio. Providing it helps the model learn the speaker's pronunciation patterns and significantly improves cloning quality. It is empty by default; the more accurately it matches what is spoken in the reference audio, the higher the fidelity of the clone.

max_tokens

This integer parameter sets the maximum number of tokens the model generates, which caps the length of the generated speech. It ranges from 500 to 5000, with a default of 3000. Higher values allow longer output but use more memory: reduce the value if you encounter out-of-memory errors, and increase it when synthesizing very long texts.

SparkTTS Voice Clone Output Parameters:

synthesized_audio

The node outputs synthesized_audio: your input text spoken in the voice of the reference audio. Its quality and naturalness depend on how clean the reference audio is and how accurately the reference text matches it. You can use the result for voiceovers, virtual assistants, or any project that needs a specific voice.
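To summarize the interface described above, here is a hypothetical reconstruction written in ComfyUI's custom-node conventions (`INPUT_TYPES`, `RETURN_TYPES`, `RETURN_NAMES`, `FUNCTION`, `CATEGORY`). It is built only from the parameters documented on this page and is NOT the extension's actual source: the use of ComfyUI's `AUDIO` type, the split into required and optional inputs, and the method name `clone_voice` are all assumptions, and no model inference is performed.

```python
class SparkTTSVoiceCloneSketch:
    """Hypothetical sketch of the documented SparkTTS_VoiceClone interface.

    Not the extension's source; types and grouping are assumptions.
    """

    DEFAULT_TEXT = (
        "This is the SparkTTS voice clone node, you can clone the voice from a "
        "reference audio. Enter reference text to improve voice cloning quality. "
        "Currently we only support English and Chinese."
    )

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "text": ("STRING", {"multiline": True, "default": cls.DEFAULT_TEXT}),
                "reference_audio": ("AUDIO",),  # assumed ComfyUI audio type
                "max_tokens": ("INT", {"default": 3000, "min": 500, "max": 5000}),
            },
            "optional": {
                # Empty by default, so treated here as optional (assumption).
                "reference_text": ("STRING", {"multiline": True, "default": ""}),
            },
        }

    RETURN_TYPES = ("AUDIO",)
    RETURN_NAMES = ("synthesized_audio",)
    FUNCTION = "clone_voice"  # hypothetical method name
    CATEGORY = "🧪AILab/🔊Audio"

    def clone_voice(self, text, reference_audio, max_tokens, reference_text=""):
        # The real node would run Spark-TTS here; this sketch deliberately does not.
        raise NotImplementedError("model inference is outside the scope of this sketch")
```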

SparkTTS Voice Clone Usage Tips:

  • Ensure that the reference audio is clear and free from background noise to improve the quality of the voice clone.
  • Provide accurate reference text that matches the spoken content in the reference audio to enhance pronunciation accuracy.
  • Adjust the max_tokens parameter based on the length of the text you wish to synthesize, keeping in mind the memory limitations of your system.
  • Experiment with different text inputs to explore the versatility of the cloned voice in various contexts.
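Several of these tips can be checked mechanically before you queue a run. The sketch below is a hypothetical helper, `preflight`, that returns warnings for an empty text, an empty reference_text, and a max_tokens value outside the documented 500 to 5000 range. It cannot judge audio quality or whether the transcript is correct.

```python
MIN_TOKENS, MAX_TOKENS = 500, 5000  # documented range for max_tokens


def preflight(text: str, reference_text: str, max_tokens: int) -> list[str]:
    """Return human-readable warnings based on the usage tips.

    Hypothetical helper; the node itself may validate inputs differently.
    """
    warnings = []
    if not text.strip():
        warnings.append("text is empty: there is nothing to synthesize")
    if not reference_text.strip():
        warnings.append("reference_text is empty: a matching transcript improves cloning quality")
    if not MIN_TOKENS <= max_tokens <= MAX_TOKENS:
        warnings.append(f"max_tokens={max_tokens} is outside {MIN_TOKENS}-{MAX_TOKENS}")
    return warnings


# Usage: a missing transcript produces a single warning.
for w in preflight("Hello there.", "", 3000):
    print("warning:", w)
```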

SparkTTS Voice Clone Common Errors and Solutions:

"Out of memory error"

  • Explanation: This error occurs when the system runs out of memory during synthesis, typically with a long text input and a high max_tokens value.
  • Solution: Reduce the max_tokens parameter to a lower value to decrease memory usage.
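If you drive synthesis from a script rather than the ComfyUI graph, the "reduce max_tokens" advice can be automated as a retry loop. This is a minimal sketch under stated assumptions: `synthesize` is a hypothetical callable that takes a max_tokens value, and the stdlib `MemoryError` stands in for whatever out-of-memory exception your backend raises. With PyTorch on CUDA that is `torch.cuda.OutOfMemoryError`, which you would pass via `oom_errors`.

```python
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")
MIN_TOKENS = 500  # documented lower bound of max_tokens


def synthesize_with_backoff(
    synthesize: Callable[[int], T],
    max_tokens: int = 3000,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
    factor: float = 0.5,
) -> T:
    """Call synthesize(max_tokens); on an out-of-memory error, retry with fewer tokens.

    Re-raises the error once max_tokens has reached the documented minimum.
    `synthesize` is a hypothetical callable, not an API of the extension.
    """
    if not 0 < factor < 1:
        raise ValueError("factor must be between 0 and 1 so max_tokens shrinks")
    tokens = max_tokens
    while True:
        try:
            return synthesize(tokens)
        except oom_errors:
            if tokens <= MIN_TOKENS:
                raise  # nothing left to reduce
            tokens = max(MIN_TOKENS, int(tokens * factor))
```

Keep the trade-off in mind: a lower max_tokens also caps the length of the generated speech, so a long text may be cut short at the value that finally fits in memory.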

"Reference audio not found"

  • Explanation: The node cannot locate the specified reference audio file.
  • Solution: Ensure the correct file path is provided and that the file exists in the specified location.
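If you prepare reference clips by path in your own scripts, failing early with a clear message makes this error easier to diagnose. The helper below, `check_reference_audio`, is a hypothetical sketch using only `pathlib`; it expands `~`, resolves the path, and distinguishes a missing file from a path that points at a directory.

```python
from pathlib import Path


def check_reference_audio(path: str) -> Path:
    """Return the resolved path of a reference clip, or raise a descriptive error.

    Hypothetical pre-check; not part of the Comfyui-Spark-TTS extension.
    """
    p = Path(path).expanduser().resolve()
    if not p.exists():
        raise FileNotFoundError(f"Reference audio not found: {p}")
    if not p.is_file():
        raise IsADirectoryError(f"Reference audio path is not a file: {p}")
    return p
```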

"Mismatch between reference text and audio"

  • Explanation: The reference text does not match the spoken content in the reference audio, leading to poor voice cloning quality.
  • Solution: Verify that the reference text accurately reflects the speech in the reference audio and make necessary corrections.
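One way to verify the transcript is to compare your reference_text against a transcript of the reference audio produced by a speech-recognition tool of your choice (the transcript source is an assumption; the node does not provide one). This sketch normalizes both strings (lowercase, punctuation removed, whitespace collapsed) and scores them with the standard-library `difflib.SequenceMatcher`. Matching is character-based, so it also works for Chinese text without word segmentation. `transcript_similarity` is a hypothetical helper, and what score counts as "close enough" is your call.

```python
import difflib
import re


def normalize(s: str) -> str:
    """Lowercase, drop punctuation, and collapse whitespace; \\w keeps CJK characters."""
    s = re.sub(r"[^\w\s]", "", s.lower())
    return " ".join(s.split())


def transcript_similarity(reference_text: str, transcript: str) -> float:
    """Similarity in [0, 1] between reference_text and a transcript of the audio."""
    return difflib.SequenceMatcher(None, normalize(reference_text), normalize(transcript)).ratio()


# Usage: punctuation and case differences do not lower the score.
print(transcript_similarity("Hello, world!", "hello world"))
```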

Copyright 2025 RunComfy. All Rights Reserved.

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, enabling artists to harness the latest AI tools to create incredible art.