ComfyUI Node: SparkTTS Advanced Voice Clone

Class Name: SparkTTS_AdvVoiceClone
Category: 🧪AILab/🔊Audio
Author: 1038lab (Account age: 774 days)
Extension: Comfyui-Spark-TTS
Latest Updated: 2025-04-15
Github Stars: 0.09K

Table of Contents

Description
SparkTTS_AdvVoiceClone:
SparkTTS_AdvVoiceClone Input Parameters:
SparkTTS_AdvVoiceClone Output Parameters:
SparkTTS_AdvVoiceClone Usage Tips:
SparkTTS_AdvVoiceClone Common Errors and Solutions:
Related Nodes

How to Install Comfyui-Spark-TTS

Install this extension via the ComfyUI Manager by searching for Comfyui-Spark-TTS:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter Comfyui-Spark-TTS in the search bar and install the matching result.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

SparkTTS Advanced Voice Clone Description

Advanced voice cloning node with pitch and speed control for personalized text-to-speech outputs in English and Chinese.

SparkTTS Advanced Voice Clone:

The SparkTTS_AdvVoiceClone node clones a voice from a reference audio sample and adds control over pitch and speed. It builds on SparkTTS and supports both English and Chinese. The synthesized speech is intended to closely resemble the original speaker's voice, while the pitch and speed settings let you adjust tone and tempo to suit a particular project.

SparkTTS Advanced Voice Clone Input Parameters:

text

This parameter is the text you wish to synthesize using the cloned voice. It supports multiline input, allowing you to enter longer passages of text. The default text is a placeholder that explains the node's function. You can separate paragraphs with double line breaks to structure the output speech. This input is crucial as it defines the content of the synthesized speech.
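
As a rough illustration of the double-line-break convention, the sketch below splits input text into paragraphs before it is pasted into the text field. The function name split_paragraphs is hypothetical and not part of Comfyui-Spark-TTS; how the node handles paragraphs internally is not described here.

```python
import re


def split_paragraphs(text: str) -> list[str]:
    """Split text on blank lines (double line breaks) and drop empty pieces."""
    # Normalize Windows line endings so "\r\n\r\n" also counts as a paragraph break.
    normalized = text.replace("\r\n", "\n")
    # A paragraph break is a newline, optional whitespace, then another newline.
    parts = re.split(r"\n\s*\n", normalized)
    return [part.strip() for part in parts if part.strip()]


sample = "First paragraph.\n\nSecond paragraph,\nstill second.\n\n\nThird."
paragraphs = split_paragraphs(sample)
for index, paragraph in enumerate(paragraphs, start=1):
    print(index, repr(paragraph))
```

A single line break stays inside its paragraph; only blank lines separate paragraphs.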

reference_audio

The reference_audio parameter is an audio sample from which the voice will be cloned. This audio serves as the basis for capturing the unique vocal characteristics of the speaker, such as tone, accent, and style. Providing a clear and high-quality audio sample will significantly enhance the accuracy of the voice cloning process.

reference_text

This parameter requires the exact text spoken in the reference audio. By providing this text, you help the model understand the speaker's pronunciation patterns, which significantly improves the quality of the voice cloning. It is especially important for capturing nuances in speech and ensuring that the synthesized voice closely matches the original.

pitch

The pitch parameter allows you to adjust the pitch of the synthesized voice. You can choose from options like "very_low," "low," "moderate," "high," and "very_high," with "moderate" being the default. Adjusting the pitch can help match the emotional tone or artistic style you are aiming for in your project.

speed

This parameter controls the speed of the synthesized speech. Similar to pitch, you can select from "very_low," "low," "moderate," "high," and "very_high," with "moderate" as the default. Modifying the speed can be useful for creating different pacing effects, such as a slow, dramatic narration or a fast-paced, energetic delivery.

max_tokens

The max_tokens parameter determines the maximum length of the generated speech in terms of tokens. It ranges from 500 to 5000, with a default value of 3000. Higher values allow for longer text synthesis but require more memory. If you encounter out-of-memory errors, consider reducing this value. Conversely, increase it for very long texts to ensure the entire content is synthesized.
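
The option names and numeric bounds described above can be collected into a small validation helper, for example when preparing settings in a script before entering them in the node. This is a minimal sketch: the CloneSettings class is hypothetical and is not the extension's own code, and clamping max_tokens to the nearest bound is an assumption made for illustration.

```python
from dataclasses import dataclass

# Option names and numeric bounds are taken from the parameter descriptions above.
LEVELS = ("very_low", "low", "moderate", "high", "very_high")
MAX_TOKENS_MIN = 500
MAX_TOKENS_MAX = 5000
MAX_TOKENS_DEFAULT = 3000


@dataclass
class CloneSettings:
    """Hypothetical container for the node's pitch, speed, and max_tokens values."""

    pitch: str = "moderate"
    speed: str = "moderate"
    max_tokens: int = MAX_TOKENS_DEFAULT

    def __post_init__(self) -> None:
        if self.pitch not in LEVELS:
            raise ValueError(f"pitch must be one of {LEVELS}, got {self.pitch!r}")
        if self.speed not in LEVELS:
            raise ValueError(f"speed must be one of {LEVELS}, got {self.speed!r}")
        # Assumption: clamp out-of-range values to the nearest bound instead of rejecting them.
        self.max_tokens = max(MAX_TOKENS_MIN, min(MAX_TOKENS_MAX, int(self.max_tokens)))


defaults = CloneSettings()
narration = CloneSettings(pitch="low", speed="very_low", max_tokens=9000)
print(defaults)
print(narration)
```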

SparkTTS Advanced Voice Clone Output Parameters:

synthesized_audio

The synthesized_audio output is the audio generated by the node from the input text. It reflects the cloned voice characteristics together with the selected pitch and speed settings.

SparkTTS Advanced Voice Clone Usage Tips:

Ensure the reference audio is clear and free from background noise to improve the accuracy of the voice cloning process.
Make sure the reference text closely matches the spoken content of the reference audio, as this helps the model replicate the speaker's pronunciation and style.
Experiment with different pitch and speed settings to achieve the desired vocal effect, whether it's for artistic expression or matching a specific character's voice.
Adjust the max_tokens parameter based on the length of your text to avoid memory issues and ensure complete synthesis of longer passages.

SparkTTS Advanced Voice Clone Common Errors and Solutions:

"Out of memory error"

Explanation: This error occurs when the system runs out of memory while processing a large text input.
Solution: Reduce the max_tokens parameter to a lower value to decrease memory usage, or try processing shorter text segments.
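
One way to process shorter text segments is to group sentences into chunks below a character budget and synthesize each chunk in a separate run. The sketch below is an assumption-laden illustration: chunk_text and the max_chars budget are hypothetical, the sentence splitter is a simple punctuation rule (it would also split inside numbers such as "3.14"), and joining the resulting audio clips is not shown.

```python
import re


def chunk_text(text: str, max_chars: int = 400) -> list[str]:
    """Greedily group sentences into segments of at most max_chars characters.

    A single sentence longer than max_chars is kept whole rather than cut mid-sentence.
    """
    # Split after English or Chinese sentence-ending punctuation.
    pieces = re.split(r"(?<=[.!?。！？])\s*", text)
    sentences = [piece.strip() for piece in pieces if piece.strip()]

    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Sentences are joined with a space; for Chinese text you may prefer no separator.
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


segments = chunk_text("One. Two. Three. Four.", max_chars=10)
print(segments)
```

Each segment can then be entered as the text input in its own run, keeping the same reference audio, reference text, pitch, and speed so the voice stays consistent.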

"Reference audio not found"

Explanation: The node cannot locate the specified reference audio file.
Solution: Ensure the reference audio file path is correct and the file is accessible. Check for any typos or incorrect directory paths.

"Invalid reference text"

Explanation: The reference text does not match the spoken content in the reference audio.
Solution: Verify that the reference text accurately reflects the words spoken in the reference audio to improve cloning quality.

SparkTTS Advanced Voice Clone Related Nodes

Go back to the extension to check out more related nodes.

Comfyui-Spark-TTS
