Install this extension through the ComfyUI Manager by searching for ComfyUI-MARS5-TTS:
1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-MARS5-TTS in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.
Visit ComfyUI Online for a ready-to-use ComfyUI environment.
ComfyUI-MARS5-TTS is a custom node for ComfyUI that integrates the MARS5-TTS text-to-speech system, adding speech synthesis and voice cloning to ComfyUI workflows.
ComfyUI-MARS5-TTS Introduction
ComfyUI-MARS5-TTS is a custom node extension for the ComfyUI interface, designed to integrate the powerful MARS5 Text-to-Speech (TTS) model. This extension allows AI artists to generate high-quality, natural-sounding speech from text inputs using the MARS5 model. Whether you're creating voiceovers for animations, generating dialogue for virtual characters, or experimenting with AI-generated speech, ComfyUI-MARS5-TTS provides a user-friendly way to harness the capabilities of advanced TTS technology.
How ComfyUI-MARS5-TTS Works
At its core, ComfyUI-MARS5-TTS leverages the MARS5 model, which uses a two-stage process to generate speech. The first stage involves an autoregressive (AR) model that generates coarse speech features from the input text and reference audio. The second stage refines these features using a non-autoregressive (NAR) model to produce the final high-quality audio output. This process allows the model to handle complex prosody and diverse speech scenarios, making it suitable for a wide range of applications.
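The control flow of the two stages can be illustrated with a toy sketch. This is not MARS5 code: `ar_stage`, `nar_stage`, and `synthesize` are hypothetical names, and the arithmetic is a stand-in for real neural networks. It only shows that the first stage builds its output one step at a time, each step depending on the previous one, while the second stage refines every position independently.

```python
from typing import List


def ar_stage(text: str, ref_codes: List[int]) -> List[int]:
    """Toy autoregressive stage: each coarse code depends on the previous code."""
    codes: List[int] = []
    # Start from the last reference code so the output is conditioned on the reference.
    prev = ref_codes[-1] if ref_codes else 0
    for ch in text:
        prev = (prev + ord(ch)) % 256  # sequential dependency on the previous step
        codes.append(prev)
    return codes


def nar_stage(coarse: List[int], levels: int = 3) -> List[List[int]]:
    """Toy non-autoregressive stage: every position is refined independently."""
    # No position reads another position's refined output, so this could run in parallel.
    return [[(code * (level + 1)) % 256 for level in range(levels)] for code in coarse]


def synthesize(text: str, ref_codes: List[int]) -> List[List[int]]:
    """Run the coarse stage first, then refine its output."""
    coarse = ar_stage(text, ref_codes)
    return nar_stage(coarse)
```

In the real model the coarse and refined values are learned speech features rather than integers, but the ordering is the same: the refinement stage always consumes the output of the coarse stage.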
Example Workflow
Input Reference Audio: Provide a short audio clip (2-12 seconds) that the model will use to mimic the voice.
Input Text: Provide the text that you want to be converted into speech.
Example Text:
we're going to make America great again. we're a failing nation right now. we're a seriously failing nation
Output: The model generates the speech audio based on the input text and reference audio.
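Before loading a reference clip into the workflow, it can help to confirm that its length falls inside the 2-12 second window mentioned above. The following is a minimal sketch using only the Python standard library; it assumes the clip is an uncompressed PCM WAV file, and the helper names (`wav_duration_seconds`, `is_valid_reference`, `make_silent_wav`) are hypothetical, not part of the extension.

```python
import io
import wave

MIN_SECONDS = 2.0   # lower bound for the reference clip, taken from the text above
MAX_SECONDS = 12.0  # upper bound for the reference clip, taken from the text above


def wav_duration_seconds(source) -> float:
    """Return the duration of a PCM WAV file, given a path or a file-like object."""
    with wave.open(source, "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def is_valid_reference(source) -> bool:
    """Check that the clip length is inside the recommended window."""
    return MIN_SECONDS <= wav_duration_seconds(source) <= MAX_SECONDS


def make_silent_wav(seconds: float, sample_rate: int = 24000) -> io.BytesIO:
    """Build an in-memory mono 16-bit WAV of silence, for trying the check without a file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    buffer.seek(0)
    return buffer
```

Calling `is_valid_reference("my_clip.wav")` with a real path works the same way as with the in-memory buffer. Compressed formats such as MP3 would need a different reader.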
ComfyUI-MARS5-TTS Features
High-Quality Speech Generation: Produces natural and expressive speech, suitable for various applications.
Voice Cloning: Mimics the voice from a reference audio clip, allowing for personalized speech synthesis.
Customizable Prosody: Adjusts speech patterns using punctuation and capitalization in the input text.
Deep and Shallow Cloning: Offers two modes of operation for different quality and speed requirements.
Customization Options
Deep Clone: Provides higher quality by using both the reference audio and its transcript. This mode is slower but results in more accurate voice cloning.
Shallow Clone: Faster and requires only the reference audio, suitable for quick and less detailed speech generation.
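The difference between the two modes comes down to one extra input: deep clone needs the transcript of the reference audio, shallow clone does not. The sketch below captures that rule in plain Python. `CloneRequest` and `choose_deep_clone` are hypothetical names used for illustration; they are not the node's actual inputs or the MARS5 API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CloneRequest:
    """Hypothetical container for the inputs described above."""
    text: str
    reference_audio_path: str
    reference_transcript: Optional[str] = None
    deep_clone: bool = False

    def validate(self) -> None:
        """Raise ValueError if the inputs do not match the selected mode."""
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.deep_clone and not (self.reference_transcript or "").strip():
            raise ValueError("deep clone needs the transcript of the reference audio")


def choose_deep_clone(reference_transcript: Optional[str], prefer_speed: bool) -> bool:
    """Return True for deep clone, False for shallow clone."""
    has_transcript = bool((reference_transcript or "").strip())
    # Deep clone is only possible with a transcript, and it is the slower mode.
    return has_transcript and not prefer_speed
```

A request with `deep_clone=True` and no transcript fails validation early, instead of failing later during generation.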
ComfyUI-MARS5-TTS Models
The extension uses the MARS5 model, which includes two main components:
Autoregressive (AR) Model: Generates initial coarse speech features from the input text and reference audio.
Non-Autoregressive (NAR) Model: Refines the coarse features to produce the final high-quality audio output.
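How the Two Models Work Together
AR Model: Runs first and establishes the initial structure of the speech, which matters most for complex text inputs.
NAR Model: Runs second and refines the AR output into high-quality, natural-sounding audio.
Both models run in sequence for every generation; they are stages of one pipeline, not alternative modes to choose between.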
Troubleshooting ComfyUI-MARS5-TTS
Common Issues and Solutions
Model Not Loading:
Solution: Ensure that all dependencies are installed correctly. Run pip install -r requirements.txt in the ComfyUI-MARS5-TTS directory.
Poor Audio Quality:
Solution: Use a clean, clear reference audio clip between 2 and 12 seconds long. Ensure the input text is well-punctuated and correctly capitalized.
Slow Performance:
Solution: Use the shallow clone mode for faster results. Ensure your hardware meets the necessary requirements for running the model.
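For the "Model Not Loading" case, a quick way to see which packages are missing is to ask Python whether each module can be found, without importing it. This is a minimal sketch: `find_missing_modules` is a hypothetical helper, and the module names you pass in are your own assumption. The authoritative list is the extension's requirements.txt.

```python
import importlib.util
from typing import Iterable, List


def find_missing_modules(module_names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [name for name in module_names if importlib.util.find_spec(name) is None]
```

For example, `find_missing_modules(["torch", "librosa"])` returns an empty list when both are installed (these two names are an assumption about the dependencies, not taken from the extension). Run it with the same Python interpreter that ComfyUI uses, since a package installed in a different environment will not be visible.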
Frequently Asked Questions
Q: Can I use any audio clip as a reference?
A: Yes, but for best results, use a clean audio clip between 2 and 12 seconds long.
Q: How do I improve the prosody of the generated speech?
A: Use proper punctuation and capitalization in the input text to guide the model.
Learn More about ComfyUI-MARS5-TTS
For additional resources, tutorials, and community support, check out the following links:
ComfyUI-MARS5-TTS Tutorial Video (https://b23.tv/etjjwVd)
By exploring these resources, you can gain a deeper understanding of how to use ComfyUI-MARS5-TTS effectively and get the most out of its features.