
ComfyUI Extension: ComfyUI-MARS5-TTS

Repo Name: ComfyUI-MARS5-TTS
Author: AIFSH (Account age: 253 days)
Nodes: 4
Latest Updated: 2024-07-02
GitHub Stars: 0.02K

How to Install ComfyUI-MARS5-TTS

Install this extension through the ComfyUI Manager:

  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-MARS5-TTS in the search bar and install the extension.

After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.


ComfyUI-MARS5-TTS Description

ComfyUI-MARS5-TTS is a custom node extension for ComfyUI that integrates the MARS5-TTS text-to-speech system. It adds speech synthesis and voice cloning from a short reference clip to ComfyUI workflows.

ComfyUI-MARS5-TTS Introduction

ComfyUI-MARS5-TTS is a custom node extension for the ComfyUI interface, designed to integrate the powerful MARS5 Text-to-Speech (TTS) model. This extension allows AI artists to generate high-quality, natural-sounding speech from text inputs using the MARS5 model. Whether you're creating voiceovers for animations, generating dialogue for virtual characters, or experimenting with AI-generated speech, ComfyUI-MARS5-TTS provides a user-friendly way to harness the capabilities of advanced TTS technology.

How ComfyUI-MARS5-TTS Works

At its core, ComfyUI-MARS5-TTS leverages the MARS5 model, which uses a two-stage process to generate speech. The first stage involves an autoregressive (AR) model that generates coarse speech features from the input text and reference audio. The second stage refines these features using a non-autoregressive (NAR) model to produce the final high-quality audio output. This process allows the model to handle complex prosody and diverse speech scenarios, making it suitable for a wide range of applications.

Example Workflow

  1. Input Reference Audio: Provide a short audio clip (2-12 seconds) that the model will use to mimic the voice.

  2. Input Text: Provide the text that you want to be converted into speech.

  • Example Text:
we're going to make America great again. we're a failing nation right now. we're a seriously failing nation

  3. Output: The model generates the speech audio based on the input text and reference audio.
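Before wiring a clip into the workflow, it can help to confirm that it falls inside the 2-12 second window mentioned in step 1. The following is a minimal sketch using only the Python standard library; the function names are hypothetical and not part of the extension, and the `wave` module reads only uncompressed PCM WAV files, so other formats would need converting first.

```python
import wave

# Bounds taken from the workflow description above (2-12 seconds).
MIN_SECONDS = 2.0
MAX_SECONDS = 12.0


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def is_usable_reference(path, lo=MIN_SECONDS, hi=MAX_SECONDS):
    """Hypothetical helper: True if the clip length lies within [lo, hi] seconds."""
    return lo <= wav_duration_seconds(path) <= hi
```

A call such as is_usable_reference("my_voice.wav") (with a hypothetical file name) returns False for clips that are too short or too long, which is a cheap check to run before spending GPU time on generation.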

ComfyUI-MARS5-TTS Features

Key Features

  • High-Quality Speech Generation: Produces natural and expressive speech, suitable for various applications.
  • Voice Cloning: Mimics the voice from a reference audio clip, allowing for personalized speech synthesis.
  • Customizable Prosody: Adjusts speech patterns using punctuation and capitalization in the input text.
  • Deep and Shallow Cloning: Offers two modes of operation for different quality and speed requirements.

Customization Options

  • Deep Clone: Provides higher quality by using both the reference audio and its transcript. This mode is slower but results in more accurate voice cloning.
  • Shallow Clone: Faster and requires only the reference audio, suitable for quick and less detailed speech generation.
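The choice between the two modes comes down to whether a transcript of the reference audio is available. The sketch below shows one way to express that choice in Python. `clone_settings` is a hypothetical helper, not part of the extension. The `synthesize` function follows the torch.hub usage shown in the upstream MARS5-TTS README as best understood here (repository name, entry point, `deep_clone` config field, and `tts` call are assumptions to verify against that repository); it is not the ComfyUI node's own code and needs `torch`, `librosa`, and a model download to run.

```python
def clone_settings(ref_transcript=None):
    """Hypothetical helper: pick deep clone only when a transcript is given."""
    transcript = (ref_transcript or "").strip()
    return {"deep_clone": bool(transcript), "ref_transcript": transcript}


def synthesize(text, ref_wav_path, ref_transcript=None):
    """Sketch of a MARS5 call; assumes the upstream torch.hub interface."""
    # Imported lazily so the settings helper works without these packages.
    import torch
    import librosa

    settings = clone_settings(ref_transcript)
    # Assumption: hub repo and entry point as listed in the MARS5-TTS README.
    mars5, config_class = torch.hub.load(
        "Camb-ai/mars5-tts", "mars5_english", trust_repo=True
    )
    # Load the reference clip at the model's sample rate, as mono.
    wav, _ = librosa.load(ref_wav_path, sr=mars5.sr, mono=True)
    # Assumption: the config class accepts deep_clone and has defaults otherwise.
    cfg = config_class(deep_clone=settings["deep_clone"])
    ar_codes, audio = mars5.tts(
        text, torch.from_numpy(wav), settings["ref_transcript"], cfg=cfg
    )
    return audio, mars5.sr
```

Passing no transcript falls back to shallow clone, which matches the trade-off described above: faster, but less faithful to the reference voice.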

ComfyUI-MARS5-TTS Models

The extension uses the MARS5 model, which includes two main components:

  1. Autoregressive (AR) Model: Generates initial coarse speech features from the input text and reference audio.
  2. Non-Autoregressive (NAR) Model: Refines the coarse features to produce the final high-quality audio output.

When to Use Each Model

The two models are not alternatives: every generation runs both, in sequence. Each one has a distinct role:

  • AR Model: Generates the initial structure of the speech, which matters most for complex text inputs.
  • NAR Model: Refines that structure into high-quality, natural-sounding audio.

Troubleshooting ComfyUI-MARS5-TTS

Common Issues and Solutions

  1. Model Not Loading:
  • Solution: Ensure that all dependencies are installed correctly. Run pip install -r requirements.txt in the ComfyUI-MARS5-TTS directory.
  2. Poor Audio Quality:
  • Solution: Use a clean, clear reference audio clip between 2 and 12 seconds long. Ensure the input text is well-punctuated and correctly capitalized.
  3. Slow Performance:
  • Solution: Use the shallow clone mode for faster results. Ensure your hardware meets the requirements for running the model.
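For the first issue, a quick way to see which dependencies are missing is to compare requirements.txt against the packages installed in the Python environment that ComfyUI runs in. This is a rough sketch using the standard library; `missing_requirements` is a hypothetical helper, and it only checks whether each named package is present, not whether the installed version satisfies the pinned constraint.

```python
import re
from importlib import metadata

# Leading distribution name of a requirement line, e.g. "torch" in "torch>=2.0".
_NAME = re.compile(r"^[A-Za-z0-9][A-Za-z0-9._-]*")


def missing_requirements(requirements_path):
    """Return the names in a requirements file that are not installed."""
    missing = []
    with open(requirements_path, encoding="utf-8") as fh:
        for raw in fh:
            # Drop inline comments and surrounding whitespace.
            line = raw.split("#", 1)[0].strip()
            # Skip blanks, pip options (-r, --index-url) and URL requirements.
            if not line or line.startswith("-") or "://" in line:
                continue
            match = _NAME.match(line)
            if match is None:
                continue
            name = match.group(0)
            try:
                metadata.version(name)
            except metadata.PackageNotFoundError:
                missing.append(name)
    return missing
```

Run it with the path to the extension's requirements.txt; an empty list suggests the model failing to load has a different cause, such as missing model weights or insufficient memory.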

Frequently Asked Questions

  • Q: Can I use any audio clip as a reference?
  • A: Yes, but for best results, use a clean audio clip between 2 and 12 seconds long.
  • Q: How do I improve the prosody of the generated speech?
  • A: Use proper punctuation and capitalization in the input text to guide the model.

Learn More about ComfyUI-MARS5-TTS

For additional resources, tutorials, and community support, check out the following links:

  • MARS5-TTS GitHub Repository
  • MARS5 Model Architecture
  • MARS5-TTS Samples (https://6b1a3a8e53ae.ngrok.app/)
  • ComfyUI-MARS5-TTS Tutorial Video (https://b23.tv/etjjwVd)

By exploring these resources, you can gain a deeper understanding of how to use ComfyUI-MARS5-TTS effectively and get the most out of its features.
