ComfyUI Node: SenseVoice 语音识别 (SenseVoice Speech Recognition)

Class Name: SenseVoiceNode
Category: FunAudioLLM - SenseVoice
Author: SpenserCai (Account age: 2873 days)
Extension: ComfyUI-FunAudioLLM
Last Updated: 2024-11-27
GitHub Stars: 0.05K

How to Install ComfyUI-FunAudioLLM

Install this extension via the ComfyUI Manager by searching for ComfyUI-FunAudioLLM:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-FunAudioLLM in the search bar.
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

SenseVoice 语音识别 Description

A speech recognition node in the ComfyUI-FunAudioLLM suite that converts audio input into text.

SenseVoice 语音识别:

SenseVoiceNode performs speech recognition: it takes audio input and returns the transcribed text. The node is part of the ComfyUI-FunAudioLLM suite. It suits applications that need real-time or near-real-time speech-to-text conversion, such as voice-controlled interfaces, transcription services, and accessibility tools. Two options adjust its behavior: use_fast_mode speeds up processing but limits the audio to 30 seconds, and punc_segment applies punctuation segmentation to the transcribed text.

SenseVoice 语音识别 Input Parameters:

audio

The audio parameter is a dictionary containing the audio data to be processed: the waveform and its sample rate. This is the raw input that the node transcribes into text, so the waveform and sample rate must describe the same signal for the transcription to be accurate.
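
As a rough illustration, the sketch below builds a dictionary of that kind and derives the clip duration from it. The key names waveform and sample_rate and the [batch, channels, samples] layout follow ComfyUI's usual AUDIO convention and are an assumption here, since this page does not spell them out. Nested lists stand in for the tensor ComfyUI would normally pass, and audio_duration_seconds is a hypothetical helper, not part of the extension.

```python
from typing import Any, Dict


def audio_duration_seconds(audio: Dict[str, Any]) -> float:
    """Return the clip length in seconds for an AUDIO-style dictionary."""
    waveform = audio["waveform"]        # assumed key name
    sample_rate = audio["sample_rate"]  # assumed key name
    if sample_rate <= 0:
        raise ValueError("sample_rate must be a positive number")
    # Tensors and arrays expose .shape; plain nested lists do not.
    shape = getattr(waveform, "shape", None)
    if shape is not None:
        num_samples = shape[-1]
    else:
        num_samples = len(waveform[0][0])  # first batch item, first channel
    return num_samples / sample_rate


# Two seconds of silence at 16 kHz: 1 batch item, 1 channel, 32000 samples.
example_audio = {"waveform": [[[0.0] * 32000]], "sample_rate": 16000}
duration = audio_duration_seconds(example_audio)
```

Knowing the duration up front makes it easy to decide whether use_fast_mode is appropriate for a given clip.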

use_fast_mode

The use_fast_mode parameter is a boolean that selects a faster processing mode. When set to True, the node processes audio more quickly but rejects input longer than 30 seconds. Use this mode when speed matters more than the ability to handle long audio.
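
The 30-second rule can be sketched as a simple guard. This is a minimal illustration of the behavior described on this page, not the extension's actual code: the function names, the constant, and the choice of ValueError are all hypothetical, and only the error message text is taken from this documentation.

```python
FAST_MODE_LIMIT_SECONDS = 30.0  # limit quoted in this documentation


def check_fast_mode(duration_seconds: float, use_fast_mode: bool) -> None:
    """Raise if fast mode is requested for audio longer than the limit."""
    if use_fast_mode and duration_seconds > FAST_MODE_LIMIT_SECONDS:
        # Exception type is an assumption; the message is the documented one.
        raise ValueError(
            "Audio length is too long, please set use_fast_mode to False."
        )


def choose_fast_mode(duration_seconds: float) -> bool:
    """Pick use_fast_mode=True only when the clip fits within the limit."""
    return duration_seconds <= FAST_MODE_LIMIT_SECONDS
```

A workflow that handles clips of mixed length could call something like choose_fast_mode first, so that long recordings fall back to the slower mode instead of failing.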

punc_segment

The punc_segment parameter is a boolean that indicates whether punctuation segmentation should be applied to the transcribed text. When enabled, the node uses a punctuation model to enhance the readability of the output by adding appropriate punctuation marks. This is particularly useful for generating more natural and human-readable transcriptions.

SenseVoice 语音识别 Output Parameters:

rich_transcription

The rich_transcription output is the text transcribed from the audio input, with punctuation and other post-processing applied. It is the node's final result and can be used directly wherever a text representation of the speech is needed.

SenseVoice 语音识别 Usage Tips:

  • For short audio clips (30 seconds or less), enable use_fast_mode to reduce processing time.
  • Enable punc_segment to improve the readability of the transcribed text by adding punctuation, which is especially useful for creating transcripts that are easy to understand and follow.

SenseVoice 语音识别 Common Errors and Solutions:

Audio length is too long, please set use_fast_mode to False.

  • Explanation: This error occurs when the audio input exceeds the length limit of 30 seconds while use_fast_mode is enabled.
  • Solution: Disable use_fast_mode to allow processing of longer audio segments, or shorten the audio input to meet the length requirement.
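
If fast mode is still wanted for a long recording, one way to shorten the input is to split it into pieces of at most 30 seconds and transcribe each piece separately. The sketch below only computes the sample ranges for such pieces. chunk_bounds is a hypothetical helper, not part of the extension, and it assumes the number of samples and the sample rate are already known.

```python
from typing import List, Tuple


def chunk_bounds(num_samples: int, sample_rate: int,
                 max_seconds: float = 30.0) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices for chunks of at most max_seconds."""
    if num_samples < 0 or sample_rate <= 0 or max_seconds <= 0:
        raise ValueError("num_samples must be >= 0; other arguments must be > 0")
    # Largest whole number of samples that fits in max_seconds (at least 1).
    chunk_len = max(1, int(sample_rate * max_seconds))
    return [
        (start, min(start + chunk_len, num_samples))
        for start in range(0, num_samples, chunk_len)
    ]


# A 70-second clip at 16 kHz splits into 30 s + 30 s + 10 s.
bounds = chunk_bounds(num_samples=16000 * 70, sample_rate=16000)
```

Fixed boundaries like these can cut a word in half, so the joined transcript may need manual cleanup around each split point.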

Model loading error

  • Explanation: This error might occur if there is an issue with downloading or loading the required model files.
  • Solution: Ensure that the model files are correctly downloaded and accessible. Check your internet connection and the file paths specified in the configuration.

© Copyright 2024 RunComfy. All Rights Reserved.