Convert spoken language from audio to text with timestamps, spell-checking, and advanced models for efficient transcription.
The Speech Recognition node converts spoken language in audio files into written text, which is useful for AI artists who need to transcribe audio content efficiently. It uses models such as Wav2Vec2 to process the audio and generate a transcription, including a timestamp for each word. It also offers spell-checking to clean up the transcribed text. The node is particularly suited to tasks such as creating subtitles, transcribing interviews, or converting spoken notes into text.
This parameter specifies the path to the audio file that you want to transcribe. The audio file should be in a format supported by the librosa library, such as WAV. The quality and clarity of the audio can significantly affect the accuracy of the transcription.
This parameter indicates the specific Wav2Vec2 model to be used for transcription. Different models may offer varying levels of accuracy and performance, so selecting the appropriate model can influence the quality of the transcription.
This parameter sets the language for spell-checking the transcription. It accepts language names like "English", "Spanish", "French", etc. The spell checker will correct the transcription based on the selected language, improving the overall accuracy and readability of the text.
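As a rough illustration of how a language name could drive spell-checking, the sketch below maps names such as "English" to the language codes that the pyspellchecker library expects and then corrects unknown words. The `LANGUAGE_CODES` table and the `language_code` and `spell_check` helpers are hypothetical names chosen for this example; the node's own implementation is not shown on this page and may differ.

```python
# Hypothetical mapping from display names to pyspellchecker language codes.
LANGUAGE_CODES = {
    "English": "en",
    "Spanish": "es",
    "French": "fr",
}


def language_code(name: str) -> str:
    """Translate a language name such as "English" into a code such as "en"."""
    try:
        return LANGUAGE_CODES[name.strip().title()]
    except KeyError:
        raise ValueError(f"Unsupported spell-check language: {name!r}") from None


def spell_check(text: str, language: str = "English") -> str:
    """Replace words the dictionary does not know with the top suggestion."""
    # Imported lazily so the mapping above works without the dependency.
    from spellchecker import SpellChecker  # pip install pyspellchecker

    spell = SpellChecker(language=language_code(language))
    words = text.split()
    unknown = spell.unknown(words)  # lower-cased set of unrecognized words
    return " ".join(
        # Keep the original word when no suggestion is available.
        (spell.correction(w) or w) if w.lower() in unknown else w
        for w in words
    )
```

Calling `spell_check("some transcribed text", "English")` requires pyspellchecker to be installed. Words that the dictionary already knows are passed through unchanged.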
This parameter defines the maximum number of characters allowed per frame in the transcription output. It helps in structuring the transcription into manageable segments, especially useful for creating subtitles or other time-coded text formats.
This optional parameter sets the frames per second for the transcription output. The default value is 30 fps. Adjusting this value can help synchronize the transcription with video content more accurately.
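One plausible way to read the character limit and the fps setting together is sketched below: word timestamps in seconds are converted to frame indices, and words are packed greedily into segments that stay within the character limit. The `Word` class and the `seconds_to_frame` and `group_words` functions are assumptions made for illustration, not the node's actual code or output format.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # start time in seconds
    end: float    # end time in seconds


def seconds_to_frame(t: float, fps: int = 30) -> int:
    """Map a timestamp in seconds to the nearest frame index."""
    return round(t * fps)


def _close(chunk, fps):
    """Turn a list of words into a (start_frame, end_frame, text) tuple."""
    text = " ".join(w.text for w in chunk)
    return (
        seconds_to_frame(chunk[0].start, fps),
        seconds_to_frame(chunk[-1].end, fps),
        text,
    )


def group_words(words, max_chars: int, fps: int = 30):
    """Greedily pack words into segments of at most max_chars characters."""
    segments = []
    current = []
    for word in words:
        candidate = " ".join(w.text for w in current + [word])
        if current and len(candidate) > max_chars:
            # Adding this word would exceed the limit: close the segment.
            segments.append(_close(current, fps))
            current = [word]
        else:
            current.append(word)
    if current:
        segments.append(_close(current, fps))
    return segments


words = [
    Word("hello", 0.0, 0.4),
    Word("world", 0.5, 0.9),
    Word("again", 1.0, 1.5),
]
segments = group_words(words, max_chars=11, fps=30)
# segments == [(0, 27, "hello world"), (30, 45, "again")]
```

With a lower fps the same timestamps map to smaller frame indices, which is why the value should match the frame rate of the target video.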
This optional parameter determines whether the transcription should be converted to uppercase. If set to True, the entire transcription will be in uppercase letters. This can be useful for specific formatting requirements.
This output parameter returns the path to the transcription file. The file contains the transcription in a structured format, including timestamps and any applied spell-check corrections.
Use the spell_check_language parameter to automatically correct common spelling errors in the transcription.
Adjust the framestamps_max_chars and fps parameters to better align the transcription with video content, if applicable.
Set the uppercase parameter to True if you need the transcription in uppercase for specific formatting purposes.
Install the spell-checking library with pip install pyspellchecker and ensure it is accessible in your environment.
© Copyright 2024 RunComfy. All Rights Reserved.