ComfyUI Extension: ComfyUI-FunAudioLLM

Repo Name: ComfyUI-FunAudioLLM
Author: SpenserCai (Account age: 3000 days)
Nodes: 8
Latest Updated: 2024-11-27
GitHub Stars: 0.08K

Table of Contents

Description
ComfyUI-FunAudioLLM Introduction
How ComfyUI-FunAudioLLM Works
ComfyUI-FunAudioLLM Features
ComfyUI-FunAudioLLM Models
Troubleshooting ComfyUI-FunAudioLLM
Learn More about ComfyUI-FunAudioLLM
Related Nodes

How to Install ComfyUI-FunAudioLLM

Install this extension via the ComfyUI Manager by searching for ComfyUI-FunAudioLLM:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-FunAudioLLM in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

ComfyUI-FunAudioLLM Description

ComfyUI-FunAudioLLM is a custom node for integrating FunAudioLLM, including CosyVoice and SenseVoice, into ComfyUI, enhancing audio processing capabilities.

ComfyUI-FunAudioLLM Introduction

ComfyUI-FunAudioLLM is an extension designed to enhance the capabilities of the ComfyUI platform by integrating advanced audio processing models. This extension includes two main components: CosyVoice and SenseVoice. These components are part of the FunAudioLLM suite, which focuses on audio understanding and generation. CosyVoice is tailored for natural voice generation, supporting multiple languages and voice cloning, while SenseVoice excels in audio understanding tasks such as speech recognition and emotion detection. This extension is particularly beneficial for AI artists looking to incorporate sophisticated audio features into their projects, enabling them to create more immersive and interactive audio experiences.

How ComfyUI-FunAudioLLM Works

ComfyUI-FunAudioLLM uses pre-trained models to process and generate audio. CosyVoice generates natural-sounding speech in various languages and styles, and SenseVoice analyzes audio inputs. CosyVoice supports zero-shot voice cloning, where it reproduces a speaker's voice from a short reference clip without any fine-tuning, and cross-lingual voice cloning, where it carries a voice over into a different language. SenseVoice can recognize speech, detect emotions, and classify acoustic events. Because both models are exposed as ComfyUI nodes, these capabilities can be wired directly into a workflow.

ComfyUI-FunAudioLLM Features

CosyVoice

Version: 2024-10-04
Capabilities: Supports SFT (Supervised Fine-Tuning), zero-shot, cross-lingual, and instruct modes.
Models: CosyVoice-300M-25Hz for zero-shot and cross-lingual tasks.
Customization: Users can save and load speaker models in zero-shot mode, allowing for personalized voice generation.

SenseVoice

Version: 2024-10-04
Capabilities: Includes SenseVoice-Small model for efficient audio understanding.
Features: Supports punctuation segmentation, which becomes available when fast mode is turned off.
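
To make "punctuation segmentation" concrete, the sketch below splits an already punctuated transcript into segments at sentence-ending marks. It only illustrates the idea and is not the extension's or SenseVoice's implementation; the function name `split_on_punctuation` and the chosen set of punctuation marks are assumptions made for this example.

```python
import re

# Sentence-ending marks for English and Chinese text (an illustrative set,
# not taken from the extension).
_SENTENCE = re.compile(r"[^.!?。！？]+[.!?。！？]*")


def split_on_punctuation(transcript: str) -> list[str]:
    """Split a transcript into segments that each end at sentence punctuation."""
    return [m.strip() for m in _SENTENCE.findall(transcript) if m.strip()]


print(split_on_punctuation("Hello there. How are you? 你好。再见！"))
# ['Hello there.', 'How are you?', '你好。', '再见！']
```

A plain split like this also breaks on decimal points and abbreviations, so it is only suitable as an illustration.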

ComfyUI-FunAudioLLM Models

The extension includes several models, each tailored for specific tasks:

CosyVoice-300M: Ideal for general voice generation tasks.
CosyVoice-300M-25Hz: Optimized for zero-shot and cross-lingual voice generation.
CosyVoice-300M-SFT: A supervised fine-tuned (SFT) model, used for generation with the built-in pretrained voices.
CosyVoice-300M-Instruct: Suitable for instruction-following voice generation.
SenseVoice-Small: A compact model for efficient speech recognition and emotion detection.

These models can be selected based on the specific needs of your project, whether it's generating speech in a new language or analyzing the emotional tone of an audio clip.
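
The mapping from task to model described above can be written down as a small lookup table. This is a minimal sketch: the model names come from the list above, while the task keys and the helper `pick_model` are hypothetical labels invented for this example and are not identifiers used by the extension.

```python
# Hypothetical lookup table built only from the model list above.
# The task keys are illustrative labels, not names used by the extension.
MODEL_FOR_TASK = {
    "general": "CosyVoice-300M",
    "zero_shot": "CosyVoice-300M-25Hz",
    "cross_lingual": "CosyVoice-300M-25Hz",
    "sft": "CosyVoice-300M-SFT",
    "instruct": "CosyVoice-300M-Instruct",
    "speech_recognition": "SenseVoice-Small",
    "emotion_detection": "SenseVoice-Small",
}


def pick_model(task: str) -> str:
    """Return the model name listed for a task, or raise a readable error."""
    try:
        return MODEL_FOR_TASK[task]
    except KeyError:
        known = ", ".join(sorted(MODEL_FOR_TASK))
        raise ValueError(f"Unknown task {task!r}; expected one of: {known}") from None


print(pick_model("cross_lingual"))  # CosyVoice-300M-25Hz
```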

Troubleshooting ComfyUI-FunAudioLLM

If you encounter issues while using ComfyUI-FunAudioLLM, here are some common solutions:

Model Loading Issues: Ensure that the models are correctly downloaded and placed in the specified directories. Check the paths and filenames for any discrepancies.
Audio Processing Errors: Verify that the input audio files are in a supported format and within the recommended duration limits.
Performance Problems: If the extension is running slowly, consider using a smaller model like SenseVoice-Small or adjusting the batch size settings.

For further assistance, refer to the FunAudioLLM documentation or community forums.
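
The first two checks above (model folders present, input audio within a duration limit) can be scripted with the Python standard library. The sketch below makes several assumptions: each model is stored in a folder named after the model, the input is a WAV file (the standard `wave` module reads nothing else), and the duration limit is whatever value you pass in. The page does not state the actual directories, supported formats, or limits, so substitute the values from your own installation.

```python
import wave
from pathlib import Path


def missing_models(model_root: Path, expected: list[str]) -> list[str]:
    """Return the expected model folder names that are absent under model_root."""
    return [name for name in expected if not (model_root / name).is_dir()]


def wav_duration_seconds(path: Path) -> float:
    """Read a WAV header and return the clip length in seconds."""
    with wave.open(str(path), "rb") as clip:
        return clip.getnframes() / clip.getframerate()


def check_audio(path: Path, max_seconds: float) -> str:
    """Classify an input file as 'ok', 'too long', or 'not wav'."""
    if path.suffix.lower() != ".wav":  # assumption: this sketch only inspects WAV
        return "not wav"
    return "ok" if wav_duration_seconds(path) <= max_seconds else "too long"
```

For example, `missing_models(Path("path/to/models"), ["CosyVoice-300M", "SenseVoice-Small"])` returns the names that still need to be downloaded, where `path/to/models` is a placeholder for your model directory.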

Learn More about ComfyUI-FunAudioLLM

To deepen your understanding of ComfyUI-FunAudioLLM and its capabilities, consult the tutorials, documentation, and community support available for the extension.

ComfyUI-FunAudioLLM Related Nodes

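CosyVoice 跨语言克隆 (Cross-Lingual Cloning)

CosyVoice 自然语言控制 (Natural Language Control)

CosyVoice 从URL加载说话人模型 (Load Speaker Model from URL)

CosyVoice 加载说话人模型 (Load Speaker Model)

CosyVoice 预训练音色 (Pretrained Voices)

CosyVoice 保存说话人模型 (Save Speaker Model)

CosyVoice 3s极速克隆 (3s Rapid Cloning)

SenseVoice 语音识别 (Speech Recognition)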
