
ComfyUI Node: SenseVoice 语音识别 (Speech Recognition)

Class Name: SenseVoiceNode
Category: FunAudioLLM - SenseVoice
Author: SpenserCai (Account age: 3000 days)
Extension: ComfyUI-FunAudioLLM
Latest Updated: 2024-11-27
Github Stars: 0.08K


Table of Contents

Description
SenseVoiceNode:
SenseVoiceNode Input Parameters:
SenseVoiceNode Output Parameters:
SenseVoiceNode Usage Tips:
SenseVoiceNode Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-FunAudioLLM

Install this extension via the ComfyUI Manager by searching for ComfyUI-FunAudioLLM:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-FunAudioLLM in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


SenseVoice 语音识别 (Speech Recognition) Description

Speech recognition node in the ComfyUI-FunAudioLLM suite that converts audio input into text.

SenseVoiceNode:

SenseVoiceNode performs speech recognition: it takes audio input and returns the transcribed text. The node is part of the ComfyUI-FunAudioLLM suite and is aimed at applications that need real-time or near-real-time speech-to-text conversion, such as voice-controlled interfaces, transcription services, and accessibility tools. Two options tune its behavior: use_fast_mode speeds up processing for audio of 30 seconds or less, and punc_segment applies punctuation segmentation to the transcribed text. Together they let developers and AI artists adapt the node to the length of their audio and to how readable the output needs to be.

SenseVoiceNode Input Parameters:

audio

The audio parameter is a dictionary containing the audio data to be processed. It includes the waveform of the audio and its sample rate. This parameter is crucial as it provides the raw input that the node will transcribe into text. The waveform should be properly formatted and sampled to ensure accurate transcription results.
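As a rough illustration of the dictionary described above, the sketch below computes the duration of an audio input. It assumes the keys are named "waveform" and "sample_rate" and that the waveform is indexed as [batch][channel][sample], which is how ComfyUI audio is commonly laid out; nested lists stand in for a tensor so the example runs without extra dependencies. The helper name audio_duration_seconds is hypothetical and not part of the node.

```python
# Assumption: the audio dict has the keys "waveform" and "sample_rate",
# and the waveform is indexed as [batch][channel][sample].
# Nested lists stand in for a tensor so this runs with the standard library only.


def audio_duration_seconds(audio: dict) -> float:
    """Return the duration in seconds of the first item in the batch."""
    sample_rate = audio["sample_rate"]
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    # All channels of one item have the same length, so the first is enough.
    first_channel = audio["waveform"][0][0]
    return len(first_channel) / sample_rate


# Hypothetical input: one second of mono silence at 16 kHz.
example_audio = {"waveform": [[[0.0] * 16000]], "sample_rate": 16000}
print(audio_duration_seconds(example_audio))  # 1.0
```

Knowing the duration up front is useful because use_fast_mode only accepts audio of 30 seconds or less.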

use_fast_mode

The use_fast_mode parameter is a boolean that determines whether the node should operate in a faster processing mode. When set to True, the node processes audio more quickly but does not accept audio longer than 30 seconds; longer input raises the "Audio length is too long" error. This mode is ideal for applications where speed is prioritized over handling longer audio segments.

punc_segment

The punc_segment parameter is a boolean that indicates whether punctuation segmentation should be applied to the transcribed text. When enabled, the node uses a punctuation model to enhance the readability of the output by adding appropriate punctuation marks. This is particularly useful for generating more natural and human-readable transcriptions.

SenseVoiceNode Output Parameters:

rich_transcription

The rich_transcription output parameter provides the transcribed text from the audio input, enhanced with punctuation and other post-processing features. This output is the final result of the node's processing and is designed to be as accurate and readable as possible, making it suitable for direct use in applications that require text representation of spoken language.

SenseVoiceNode Usage Tips:

For audio clips of 30 seconds or less, enable use_fast_mode to reduce processing time.
Enable punc_segment to improve the readability of the transcribed text by adding punctuation, which is especially useful for creating transcripts that are easy to understand and follow.
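The two tips above can be combined into a small helper that picks input values from the clip length. This is only a sketch: pick_settings is a hypothetical name, and the 30-second threshold is taken from this page rather than from the node's source code.

```python
# Limit stated on this page for use_fast_mode; not read from the node itself.
FAST_MODE_LIMIT_SECONDS = 30.0


def pick_settings(duration_seconds: float, want_punctuation: bool = True) -> dict:
    """Suggest SenseVoiceNode inputs for a clip of the given length."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must not be negative")
    return {
        # Fast mode is only safe when the clip does not exceed the limit.
        "use_fast_mode": duration_seconds <= FAST_MODE_LIMIT_SECONDS,
        # Punctuation segmentation is a readability choice, independent of length.
        "punc_segment": want_punctuation,
    }


print(pick_settings(12.5))
print(pick_settings(95.0, want_punctuation=False))
```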

SenseVoiceNode Common Errors and Solutions:

Audio length is too long, please set use_fast_mode to False.

Explanation: This error occurs when the audio input exceeds the length limit of 30 seconds while use_fast_mode is enabled.
Solution: Disable use_fast_mode to allow processing of longer audio segments, or shorten the audio input to meet the length requirement.
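If fast mode must stay enabled, one way to shorten the input is to cut it into pieces of at most 30 seconds and transcribe each piece separately. The sketch below only computes the sample ranges for such pieces; split_into_chunks is a hypothetical helper, and cutting at fixed boundaries can split a word in half, so treat it as a starting point rather than a recommended pipeline.

```python
def split_into_chunks(num_samples: int, sample_rate: int, max_seconds: float = 30.0):
    """Return (start, end) sample ranges, each covering at most max_seconds."""
    if sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("sample_rate and max_seconds must be positive")
    chunk_size = int(max_seconds * sample_rate)
    if chunk_size < 1:
        raise ValueError("max_seconds is too small for this sample rate")
    # The end index is exclusive, matching Python slicing.
    return [
        (start, min(start + chunk_size, num_samples))
        for start in range(0, num_samples, chunk_size)
    ]


# Hypothetical input: 70 seconds of audio sampled at 16 kHz.
for start, end in split_into_chunks(70 * 16000, 16000):
    print(start, end)
```

Each range can then be used to slice the waveform along its sample axis before passing the piece to the node.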

Model loading error

Explanation: This error might occur if there is an issue with downloading or loading the required model files.
Solution: Ensure that the model files are correctly downloaded and accessible. Check your internet connection and the file paths specified in the configuration.
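A quick way to rule out path problems is to check that the model directory exists and is not empty. The sketch below is generic: diagnose_model_dir is a hypothetical helper, and the path in the usage line is a placeholder, since this page does not state where the extension stores its models.

```python
from pathlib import Path


def diagnose_model_dir(model_dir: str) -> str:
    """Return "ok" or a short description of what is wrong with the directory."""
    path = Path(model_dir)
    if not path.exists():
        return f"missing: {path}"
    if not path.is_dir():
        return f"not a directory: {path}"
    if not any(path.iterdir()):
        return f"empty: {path}"
    return "ok"


# Placeholder path; replace it with the directory from your own configuration.
print(diagnose_model_dir("path/to/your/model/dir"))
```

A result other than "ok" suggests the download did not finish or the configured path points to the wrong place.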

SenseVoice 语音识别 (Speech Recognition) Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-FunAudioLLM
