
ComfyUI Node: MARS5-TTS Node

Class Name: MARS5TTS_Node
Category: AIFSH_MARS5_TTS
Author: AIFSH (account age: 253 days)
Extension: ComfyUI-MARS5-TTS
Last Updated: 2024-07-02
GitHub Stars: 0.02K

How to Install ComfyUI-MARS5-TTS

Install this extension through the ComfyUI Manager by searching for ComfyUI-MARS5-TTS:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-MARS5-TTS in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

MARS5-TTS Node Description

Text-to-speech node that uses the MARS5-TTS model for high-quality speech synthesis, with voice cloning and customization options.

MARS5-TTS Node:

The MARS5TTS_Node converts text into speech using MARS5-TTS, a pre-trained deep learning model designed to generate high-quality, natural-sounding speech. Besides plain synthesis, the node can clone a voice from a reference audio file, which makes it useful for AI artists creating personalized audio content. Several parameters let you tune the synthesis to your needs: for example, raising the temperature produces more varied, creative output, and enabling deep cloning (which requires a transcript of the reference audio) replicates the reference voice more accurately.

MARS5-TTS Node Input Parameters:

text

The text to convert into speech, given as a string. The clarity and structure of the input text (for example, its punctuation and sentence breaks) affect how natural the generated speech sounds.

ref_voice

The file path to a reference audio file containing the voice you want to clone. The model uses this recording to mimic the tone, pitch, and speaking style of that voice. The file should be in a format the librosa library can read, such as WAV.

if_deep_clone

A boolean that enables deep cloning for voice replication. When set to True, the model also needs a transcript of the reference audio (ref_transcript) and in return replicates the reference voice with higher fidelity. Default value: False.

rep_penalty_window

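The size of the repetition penalty window, as an integer: the number of most recently generated tokens that are taken into account when penalizing repetition. Penalizing repeats helps reduce repetitive patterns in the generated speech, and a larger window makes the penalty cover a longer stretch of output, which can lead to more varied and natural-sounding speech.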
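
To make the idea of a window concrete, here is a minimal sketch of how a repetition window is commonly implemented in autoregressive samplers: only the last rep_penalty_window generated tokens are counted when deciding what to penalize. It is not taken from the node's or the MARS5 source; the helper name window_counts and the use of plain integer token IDs are illustrative assumptions.

```python
from collections import Counter


def window_counts(history: list[int], rep_penalty_window: int) -> Counter:
    """Count token occurrences among the most recent `rep_penalty_window` tokens.

    Hypothetical helper for illustration: tokens older than the window are ignored.
    """
    if rep_penalty_window <= 0:
        # history[-0:] would return the whole list, so handle "no window" explicitly.
        return Counter()
    return Counter(history[-rep_penalty_window:])


history = [5, 5, 9, 2, 5, 7, 7]
print(window_counts(history, 3))  # only the last three tokens [5, 7, 7] are counted
```

With a window of 3, the earlier repeats of token 5 no longer count against it; a larger window would remember them.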

top_k

The number of most probable tokens the model samples from at each generation step, as an integer. A higher value allows more diversity in the generated speech, while a lower value makes the output more deterministic.
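
The following sketch shows what top-k filtering typically does to a model's raw scores (logits) before sampling: everything outside the k highest scores is masked out so it can never be chosen. It is a generic illustration under stated assumptions, not the node's code; the function name top_k_filter and the convention that top_k <= 0 disables filtering are hypothetical.

```python
import math


def top_k_filter(logits: list[float], top_k: int) -> list[float]:
    """Keep the `top_k` highest logits and mask the rest with -inf.

    Assumption for this sketch: top_k <= 0 or top_k >= len(logits) disables filtering.
    Ties at the threshold are all kept, so slightly more than top_k may survive.
    """
    if top_k <= 0 or top_k >= len(logits):
        return list(logits)
    threshold = sorted(logits, reverse=True)[top_k - 1]
    return [x if x >= threshold else -math.inf for x in logits]


print(top_k_filter([2.0, 0.5, 1.0, -1.0], 2))  # [2.0, -inf, 1.0, -inf]
```

A masked logit of -inf becomes a probability of zero after softmax, which is why a small top_k makes the output more deterministic.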

temperature

This parameter adjusts the randomness of the speech generation process. A higher temperature results in more creative and varied outputs, while a lower temperature produces more stable and predictable speech. The value should be a float, typically between 0.7 and 1.5.
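
As a rough illustration of the temperature parameter, the sketch below applies the standard temperature-scaled softmax: logits are divided by the temperature before being turned into probabilities. This is the textbook technique, assumed here rather than confirmed from the MARS5 source; the function name softmax_with_temperature is hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Divide logits by `temperature`, then apply a numerically stable softmax."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp() to avoid overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
for t in (0.7, 1.0, 1.5):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Running the loop shows the most likely token's probability shrinking as the temperature rises, which is why higher temperatures give more varied output and lower ones more predictable speech.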

freq_penalty

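This parameter penalizes tokens in proportion to how often they have already appeared in the recently generated output, discouraging the model from repeating the same audio patterns. Because the words themselves are fixed by the input text, this penalty makes the speech itself more varied; it does not change which words are spoken. The value should be a float.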
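
Here is a minimal sketch of one common way a frequency penalty is applied: each candidate token's logit is lowered by freq_penalty times the number of times that token appears in the recent window. The exact formula MARS5 uses may differ; the function name apply_freq_penalty and the assumption that token IDs index directly into the logits list are illustrative only.

```python
from collections import Counter


def apply_freq_penalty(logits: list[float], history: list[int],
                       freq_penalty: float, rep_penalty_window: int) -> list[float]:
    """Subtract freq_penalty * count for every token seen in the recent window.

    Assumption: token IDs are indices into `logits`; tokens outside the window are ignored.
    """
    recent = history[-rep_penalty_window:] if rep_penalty_window > 0 else []
    counts = Counter(recent)  # missing tokens count as 0
    return [x - freq_penalty * counts[i] for i, x in enumerate(logits)]


logits = [1.0, 1.0, 1.0]
print(apply_freq_penalty(logits, history=[0, 0, 2], freq_penalty=0.5, rep_penalty_window=10))
# [0.0, 1.0, 0.5]
```

Token 0 appeared twice and loses twice as much as token 2, while the unseen token 1 is untouched, so the sampler is nudged away from recently repeated tokens.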

ref_transcript

This optional parameter is the transcript of the reference audio file. It is required if if_deep_clone is set to True. The transcript helps the model to better understand and replicate the reference voice.

MARS5-TTS Node Output Parameters:

outfile

The file path of the generated speech: a WAV file containing the speech synthesized from the input text in the reference voice. The file is saved to the output directory under a timestamp-based filename, so earlier outputs are not overwritten.
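
The snippet below sketches one way to build a timestamp-based output path that never overwrites an existing file. It is a hypothetical helper, not the node's actual naming scheme: the prefix "mars5", the timestamp format, and the counter suffix used when two runs share the same second are all assumptions.

```python
import os
import tempfile
import time


def make_output_path(out_dir: str, prefix: str = "mars5", ext: str = ".wav") -> str:
    """Return a timestamp-based path inside out_dir that does not exist yet.

    Hypothetical helper: if two calls land on the same timestamp, a counter
    suffix is appended so an existing file is never overwritten.
    """
    os.makedirs(out_dir, exist_ok=True)
    stamp = time.strftime("%Y%m%d_%H%M%S")
    candidate = os.path.join(out_dir, f"{prefix}_{stamp}{ext}")
    counter = 1
    while os.path.exists(candidate):
        candidate = os.path.join(out_dir, f"{prefix}_{stamp}_{counter}{ext}")
        counter += 1
    return candidate


with tempfile.TemporaryDirectory() as d:
    first = make_output_path(d)
    open(first, "wb").close()  # simulate a saved output file
    second = make_output_path(d)
    print(first != second)  # True
```

A timestamp alone can collide when two generations finish within the same second, which is why the sketch falls back to a counter.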

MARS5-TTS Node Usage Tips:

  • Ensure that the reference audio file is clear and of high quality to achieve the best voice cloning results.
  • Experiment with the temperature parameter to find the right balance between creativity and stability in the generated speech.
  • Use the rep_penalty_window parameter to reduce repetitive patterns and make the speech sound more natural.
  • If using deep cloning, provide an accurate transcript that matches the reference audio word for word to improve the fidelity of the voice replication.

MARS5-TTS Node Common Errors and Solutions:

"deep clone need ref_transcript,but you give nothing!"

  • Explanation: This error occurs when if_deep_clone is set to True, but no reference transcript is provided.
  • Solution: Ensure that you provide a valid transcript of the reference audio file when using deep cloning.
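
To catch this error before a long generation run, you can validate the two inputs together. The sketch below is a hypothetical pre-flight check that reuses the node's error message; the function name, the use of ValueError, and treating a whitespace-only transcript as empty are assumptions, not the node's documented behavior.

```python
from typing import Optional


def check_deep_clone_inputs(if_deep_clone: bool, ref_transcript: Optional[str]) -> None:
    """Raise if deep cloning is requested without a usable transcript.

    Hypothetical check; a transcript that is empty or only whitespace counts as missing.
    """
    if if_deep_clone and not (ref_transcript and ref_transcript.strip()):
        raise ValueError("deep clone need ref_transcript,but you give nothing!")


check_deep_clone_inputs(False, None)  # fine: without deep cloning no transcript is needed
try:
    check_deep_clone_inputs(True, "   ")
except ValueError as err:
    print(err)
```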

"File not found: ref_voice"

  • Explanation: This error indicates that the specified reference audio file could not be found.
  • Solution: Verify the file path and ensure that the reference audio file exists and is accessible.
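
A quick way to verify the reference file is a small pre-flight check. The sketch below is a hypothetical helper using only the Python standard library: it confirms the path points to a file and, for .wav files, reports the duration. Note that the stdlib wave module reads only uncompressed PCM WAV, so a file that librosa can open may still fail this check; other formats are only checked for existence and left for librosa to decode.

```python
import os
import tempfile
import wave
from typing import Optional


def check_ref_voice(path: str) -> Optional[float]:
    """Pre-flight check for the ref_voice file (hypothetical helper).

    Raises FileNotFoundError if the path does not point to a file.
    For .wav files, returns the duration in seconds (PCM WAV only);
    other formats return None.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError(f"File not found: {path}")
    if not path.lower().endswith(".wav"):
        return None
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "ref.wav")
    with wave.open(path, "wb") as wf:  # write 1 second of 16-bit mono silence
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(16000)
        wf.writeframes(b"\x00\x00" * 16000)
    print(check_ref_voice(path))  # 1.0
```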

"Invalid value for temperature"

  • Explanation: This error occurs when the temperature parameter is set to a value outside the acceptable range.
  • Solution: Set the temperature parameter to a float value between 0.7 and 1.5.

"Model loading failed"

  • Explanation: This error indicates that the MARS5-TTS model could not be loaded.
  • Solution: Ensure that the model files are in the directory the extension expects and that the device has enough memory (GPU or system RAM) to load the model.

© Copyright 2024 RunComfy. All Rights Reserved.
