Install this extension via the ComfyUI Manager by searching for ComfyUI-FunAudioLLM:
1. Click the Manager button in the main menu
2. Select the Custom Nodes Manager button
3. Enter ComfyUI-FunAudioLLM in the search bar
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.
Visit ComfyUI Online for a ready-to-use ComfyUI environment.
ComfyUI-FunAudioLLM is a custom node for integrating FunAudioLLM, including CosyVoice and SenseVoice, into ComfyUI, enhancing audio processing capabilities.
ComfyUI-FunAudioLLM Introduction
ComfyUI-FunAudioLLM is an extension designed to enhance the capabilities of the ComfyUI platform by integrating advanced audio processing models. This extension includes two main components: CosyVoice and SenseVoice. These components are part of the FunAudioLLM suite, which focuses on audio understanding and generation. CosyVoice is tailored for natural voice generation, supporting multiple languages and voice cloning, while SenseVoice excels in audio understanding tasks such as speech recognition and emotion detection. This extension is particularly beneficial for AI artists looking to incorporate sophisticated audio features into their projects, enabling them to create more immersive and interactive audio experiences.
How ComfyUI-FunAudioLLM Works
ComfyUI-FunAudioLLM uses pre-trained models to process and generate audio. CosyVoice generates natural-sounding speech in various languages and styles, and SenseVoice analyzes audio inputs. CosyVoice supports zero-shot voice generation, where it reproduces a speaker's voice from a short reference clip without being fine-tuned on that speaker, and cross-lingual voice cloning, where it carries a voice over into a different language. SenseVoice can recognize speech, detect emotions, and classify acoustic events, which makes it a versatile tool for audio analysis. With both models integrated into ComfyUI, users can apply these audio capabilities directly in their creative projects.
ComfyUI-FunAudioLLM Features
CosyVoice
Version: 2024-10-04
Capabilities: Supports SFT (Supervised Fine-Tuning), zero-shot, cross-lingual, and instruct modes.
Models: CosyVoice-300M-25Hz for zero-shot and cross-lingual tasks.
Customization: Users can save and load speaker models in zero-shot mode, allowing for personalized voice generation.
SenseVoice
Version: 2024-10-04
Capabilities: Includes the SenseVoice-Small model for efficient audio understanding.
Features: Supports punctuation-based segmentation of the recognized text. Fast mode must be turned off to enable it.
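To make the idea of punctuation segmentation concrete, here is a minimal Python sketch that splits a recognized transcript into segments at sentence-ending punctuation. It is an illustration only, not the extension's or SenseVoice's actual implementation; the function name split_on_punctuation and the chosen punctuation set are assumptions.

```python
import re
from typing import List

# Assumed set of sentence-ending marks: ASCII plus full-width CJK equivalents.
# The lookbehind keeps each mark attached to the segment it ends.
_SENTENCE_END = re.compile(r"(?<=[.!?。！？])\s*")


def split_on_punctuation(transcript: str) -> List[str]:
    """Split a transcript into segments that each end at a punctuation mark.

    Hypothetical helper for illustration. It is deliberately naive and will
    also split inside decimals or abbreviations such as "3.14".
    """
    parts = _SENTENCE_END.split(transcript)
    # Drop empty pieces, such as the one produced after the final mark.
    return [part.strip() for part in parts if part.strip()]


if __name__ == "__main__":
    for segment in split_on_punctuation("Hello there. How are you? 好的。谢谢！"):
        print(segment)
```

Segmenting this way yields one entry per sentence, which is easier to align with subtitles or to pass to later nodes than a single unbroken string.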
ComfyUI-FunAudioLLM Models
The extension includes several models, each tailored for specific tasks:
CosyVoice-300M: Ideal for general voice generation tasks.
CosyVoice-300M-25Hz: Optimized for zero-shot and cross-lingual voice generation.
CosyVoice-300M-SFT: A supervised fine-tuned variant with preset speaker voices, used in SFT mode.
CosyVoice-300M-Instruct: Suitable for instruction-following voice generation.
SenseVoice-Small: A compact model for efficient speech recognition and emotion detection.
These models can be selected based on the specific needs of your project, whether it's generating speech in a new language or analyzing the emotional tone of an audio clip.
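The model list above can be written down as a simple lookup table. The sketch below is one way to do that in Python; the task labels (such as "zero_shot") and the pick_model helper are hypothetical names chosen for illustration and are not part of the extension.

```python
# Task-to-model lookup built from the model list above.
# The task keys are hypothetical labels, not identifiers used by the extension.
MODEL_FOR_TASK = {
    "general": "CosyVoice-300M",
    "zero_shot": "CosyVoice-300M-25Hz",
    "cross_lingual": "CosyVoice-300M-25Hz",
    "sft": "CosyVoice-300M-SFT",
    "instruct": "CosyVoice-300M-Instruct",
    "speech_recognition": "SenseVoice-Small",
    "emotion_detection": "SenseVoice-Small",
}


def pick_model(task: str) -> str:
    """Return the model name listed for a task, or raise ValueError."""
    try:
        return MODEL_FOR_TASK[task]
    except KeyError:
        known = ", ".join(sorted(MODEL_FOR_TASK))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(pick_model("cross_lingual"))
```

Keeping the mapping in one place means a workflow script fails with a clear message on a mistyped task instead of silently loading the wrong model.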
Troubleshooting ComfyUI-FunAudioLLM
If you encounter issues while using ComfyUI-FunAudioLLM, here are some common solutions:
Model Loading Issues: Ensure that the models are correctly downloaded and placed in the specified directories. Check the paths and filenames for any discrepancies.
Audio Processing Errors: Verify that the input audio files are in a supported format and within the recommended duration limits.
Performance Problems: If the extension is running slowly, consider using a smaller model like SenseVoice-Small or adjusting the batch size settings.
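The first two checks can be automated with a small preflight script. The Python sketch below uses only the standard library and rests on several assumptions: the models root directory and the duration limit are values you supply yourself, and only WAV input is handled because that is what the standard library can read. The formats and limits the extension actually accepts are not stated here.

```python
import wave
from pathlib import Path
from typing import List, Optional


def missing_model_dirs(models_root: str, expected: List[str]) -> List[str]:
    """Return the expected model folder names not found under models_root."""
    root = Path(models_root)
    return [name for name in expected if not (root / name).is_dir()]


def wav_duration_seconds(path: str) -> float:
    """Read a WAV header and return the clip length in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def check_audio(path: str, max_seconds: float) -> Optional[str]:
    """Return a problem description, or None if the clip passes.

    max_seconds is a caller-chosen limit, not a value taken from the extension.
    """
    audio = Path(path)
    if audio.suffix.lower() != ".wav":
        return f"{audio.name}: this sketch only checks .wav files"
    try:
        duration = wav_duration_seconds(str(audio))
    except (wave.Error, EOFError, OSError) as exc:
        return f"{audio.name}: could not be read as WAV ({exc})"
    if duration > max_seconds:
        return f"{audio.name}: {duration:.1f}s exceeds the {max_seconds:.1f}s limit"
    return None
```

For example, missing_model_dirs("path/to/models", ["CosyVoice-300M", "SenseVoice-Small"]) returns the folders that still need to be downloaded, where "path/to/models" stands in for wherever your installation keeps them.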
For further assistance, refer to the FunAudioLLM documentation or community forums.
Learn More about ComfyUI-FunAudioLLM
To deepen your understanding of ComfyUI-FunAudioLLM and its capabilities, explore the following resources:
ModelScope demos for SenseVoice
These resources provide tutorials, documentation, and community support to help you make the most of the ComfyUI-FunAudioLLM extension in your creative projects.