ComfyUI Extension: ComfyUI-FunAudioLLM

  • Repo Name: ComfyUI-FunAudioLLM
  • Author: SpenserCai (Account age: 2873 days)
  • Nodes: 8
  • Latest Updated: 2024-11-27
  • GitHub Stars: 0.05K

How to Install ComfyUI-FunAudioLLM

Install this extension through the ComfyUI Manager:

  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-FunAudioLLM in the search bar.

After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.
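
If the nodes still do not appear after a restart, it can help to confirm that the extension folder exists on disk. The sketch below is a minimal check, not part of the extension. It assumes the Manager places the extension in a folder named after the repository inside ComfyUI's custom_nodes directory; the helper name is_extension_installed is hypothetical.

```python
from pathlib import Path


def is_extension_installed(comfy_root: str, name: str = "ComfyUI-FunAudioLLM") -> bool:
    """Return True if a folder called `name` exists under <comfy_root>/custom_nodes.

    Assumption: the extension folder is named after the repository.
    """
    return (Path(comfy_root) / "custom_nodes" / name).is_dir()
```

For example, is_extension_installed("ComfyUI") checks an installation located in a folder named ComfyUI in the current working directory. Replace that placeholder with the path to your own installation.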

ComfyUI-FunAudioLLM Description

ComfyUI-FunAudioLLM is a set of custom nodes that integrates the FunAudioLLM models, CosyVoice and SenseVoice, into ComfyUI for speech generation and audio understanding.

ComfyUI-FunAudioLLM Introduction

ComfyUI-FunAudioLLM is an extension that adds audio models to ComfyUI. It has two main components, CosyVoice and SenseVoice, both part of the FunAudioLLM suite for audio understanding and generation. CosyVoice handles natural voice generation, with support for multiple languages and voice cloning. SenseVoice handles audio understanding tasks such as speech recognition and emotion detection. The extension is aimed at AI artists who want to add speech generation or audio analysis to their ComfyUI workflows.

How ComfyUI-FunAudioLLM Works

ComfyUI-FunAudioLLM uses pre-trained models to generate and analyze audio. CosyVoice generates natural-sounding speech in multiple languages and styles. In zero-shot mode it clones a voice from a short reference recording without any speaker-specific training, and in cross-lingual mode it carries that voice into a different language. SenseVoice recognizes speech, detects emotions, and classifies acoustic events. Because both models are exposed as ComfyUI nodes, these capabilities can be used directly inside a workflow.

ComfyUI-FunAudioLLM Features

CosyVoice

  • Version: 2024-10-04
  • Capabilities: Supports SFT (Supervised Fine-Tuning), zero-shot, cross-lingual, and instruct modes.
  • Models: CosyVoice-300M-25Hz for zero-shot and cross-lingual tasks.
  • Customization: Users can save and load speaker models in zero-shot mode, allowing for personalized voice generation.

SenseVoice

  • Version: 2024-10-04
  • Capabilities: Includes the SenseVoice-Small model for efficient audio understanding.
  • Features: Supports punctuation segmentation, which is available when fast mode is disabled.

ComfyUI-FunAudioLLM Models

The extension includes several models, each tailored for specific tasks:

  • CosyVoice-300M: Ideal for general voice generation tasks.
  • CosyVoice-300M-25Hz: Optimized for zero-shot and cross-lingual voice generation.
  • CosyVoice-300M-SFT: A supervised fine-tuned variant, used in SFT mode to generate speech with its built-in preset voices.
  • CosyVoice-300M-Instruct: Suitable for instruction-following voice generation.
  • SenseVoice-Small: A compact model for efficient speech recognition and emotion detection.

These models can be selected based on the specific needs of your project, whether it's generating speech in a new language or analyzing the emotional tone of an audio clip.
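
The model list above can be read as a lookup from task to model. The following sketch expresses that reading as a small table. The mode keys and the pick_model helper are hypothetical names chosen for illustration and are not part of the extension's API; only the model names come from the list.

```python
# Task -> model name, following the model list above.
# The keys are illustrative labels, not identifiers used by the extension.
MODEL_FOR_MODE = {
    "general": "CosyVoice-300M",
    "zero_shot": "CosyVoice-300M-25Hz",
    "cross_lingual": "CosyVoice-300M-25Hz",
    "sft": "CosyVoice-300M-SFT",
    "instruct": "CosyVoice-300M-Instruct",
    "understanding": "SenseVoice-Small",
}


def pick_model(mode: str) -> str:
    """Return the model name listed for `mode`, or raise ValueError if unknown."""
    try:
        return MODEL_FOR_MODE[mode]
    except KeyError:
        known = ", ".join(sorted(MODEL_FOR_MODE))
        raise ValueError(f"unknown mode {mode!r}; expected one of: {known}") from None
```

Keeping the mapping in one dictionary makes it easy to adjust if a different model turns out to suit a given task better in practice.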

Troubleshooting ComfyUI-FunAudioLLM

If you encounter issues while using ComfyUI-FunAudioLLM, here are some common solutions:

  • Model Loading Issues: Ensure that the models are correctly downloaded and placed in the specified directories. Check the paths and filenames for any discrepancies.
  • Audio Processing Errors: Verify that the input audio files are in a supported format and within the recommended duration limits.
  • Performance Problems: If the extension is running slowly, consider using a smaller model like SenseVoice-Small or adjusting the batch size settings.

For further assistance, refer to the FunAudioLLM documentation or community forums.
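
For the audio processing check above, a quick pre-flight inspection of an input file can rule out the most common problems. The sketch below uses only Python's standard wave module, so it handles PCM WAV files only. That restriction is an assumption of this example and not a statement about which formats the extension accepts. The max_seconds limit is likewise a hypothetical parameter, since this page does not state a recommended duration.

```python
import wave
from pathlib import Path


def check_wav(path: str, max_seconds: float) -> list[str]:
    """Return a list of problems found; an empty list means the file passed."""
    p = Path(path)
    if not p.is_file():
        return [f"file not found: {p}"]
    try:
        # wave.open raises wave.Error (or EOFError on truncated files)
        # when the file is not a PCM WAV file it can parse.
        with wave.open(str(p), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
    except (wave.Error, EOFError) as exc:
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    duration = frames / rate if rate else 0.0
    if duration == 0.0:
        problems.append("file contains no audio")
    elif duration > max_seconds:
        problems.append(
            f"duration {duration:.1f}s exceeds limit {max_seconds:.1f}s"
        )
    return problems
```

Calling check_wav("prompt.wav", max_seconds=30.0) returns an empty list when the hypothetical file prompt.wav is a readable WAV no longer than 30 seconds, and a list of messages otherwise.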
