Install this extension via the ComfyUI Manager by searching for ComfyUI-WhisperX:
1. Click the Manager button in the main menu
2. Select the Custom Nodes Manager button
3. Enter ComfyUI-WhisperX in the search bar
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and access the updated list of nodes.
ComfyUI-WhisperX is a custom node for ComfyUI that facilitates audio subtitling by integrating functionality from the WhisperX and Translators repositories.
ComfyUI-WhisperX Introduction
ComfyUI-WhisperX is an extension designed to enhance the capabilities of ComfyUI by integrating advanced audio subtitling features. This extension leverages the power of WhisperX and Translators to provide accurate and efficient transcription and translation of audio files. It is particularly useful for AI artists who need to generate subtitles for their audio content, offering a seamless way to create and translate subtitles with multiple speaker identification.
How ComfyUI-WhisperX Works
ComfyUI-WhisperX works by processing audio files to generate subtitles and translations. Here’s a simplified breakdown of how it operates:
Audio Input: You provide an audio file to the extension.
Transcription: The extension uses WhisperX to transcribe the audio into text. WhisperX builds on OpenAI's Whisper models and adds batched inference and word-level timestamps, which makes it well suited to transcribing recorded audio files quickly.
Translation: If needed, the transcribed text can be translated into multiple languages using the Translators library, which supports a wide range of translation engines.
Speaker Diarization: The extension can identify and label different speakers in the audio using Pyannote-Audio, which helps in creating more organized and understandable subtitles.
Output: The final output can be exported as an SRT file, which is a common format for subtitles.
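The output step above can be illustrated with a minimal sketch of SRT rendering. It assumes transcription has already produced timed segments; the `Segment` class and the `srt_timestamp` and `to_srt` helpers are hypothetical names for illustration, not part of ComfyUI-WhisperX or WhisperX, and the `[SPEAKER_00]` prefix is just one possible way to show a speaker label.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One subtitle cue (hypothetical structure, not the extension's own type)."""
    start: float                   # cue start, in seconds
    end: float                     # cue end, in seconds
    text: str
    speaker: Optional[str] = None  # set only when diarization was run


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Render cues as SRT: index line, time range line, text, blank line."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        text = seg.text.strip()
        if seg.speaker:
            text = f"[{seg.speaker}] {text}"
        time_range = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        cues.append(f"{index}\n{time_range}\n{text}\n")
    return "\n".join(cues)
```

Calling `to_srt` on a list of segments returns a string that can be written to a `.srt` file with UTF-8 encoding.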
ComfyUI-WhisperX Features
Export SRT Files: The extension supports exporting subtitles in the SRT format, which is widely used for video subtitles.
Translation Support: With the help of the Translators library, the extension can translate subtitles into multiple languages, making your content accessible to a global audience.
Multiple Speaker Diarization: Using Pyannote-Audio, the extension can distinguish between different speakers in the audio, providing more detailed and accurate subtitles.
Custom Nodes Integration: ComfyUI-WhisperX is itself a custom node, so it can be combined with other nodes in a ComfyUI workflow to extend its functionality according to your specific needs.
ComfyUI-WhisperX Models
ComfyUI-WhisperX utilizes different models for transcription and speaker diarization:
WhisperX Models: These Whisper-based speech recognition models are used for transcribing audio into text.
Pyannote-Audio Models: These models are used for speaker diarization, which helps in identifying and labeling different speakers in the audio.
When to Use Each Model
WhisperX Models: Use these models when you need accurate and fast transcription of audio files.
Pyannote-Audio Models: Use these models when your audio contains multiple speakers, and you need to identify and label each speaker separately.
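To make the relationship between the two kinds of model concrete, here is a minimal sketch of how transcript segments and diarization results could be combined: each segment receives the speaker whose turn overlaps it the most. This is an illustrative technique under stated assumptions, not necessarily how WhisperX or the extension does it; the function names, the dictionary keys, and the labels such as `SPEAKER_00` are all hypothetical.

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the intersection of two time intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments, turns):
    """Label each transcript segment with the speaker of the best-overlapping turn.

    segments: list of dicts with "start", "end", "text" (assumed transcription output)
    turns:    list of (start, end, speaker) tuples (assumed diarization output)
    """
    labeled = []
    for seg in segments:
        best_speaker, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], t_start, t_end)
            if shared > best_overlap:
                best_speaker, best_overlap = speaker, shared
        # A segment that overlaps no turn keeps speaker=None.
        labeled.append({**seg, "speaker": best_speaker})
    return labeled
```

With single-speaker audio the diarization step adds nothing, so the segments can be used without speaker labels.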
Troubleshooting ComfyUI-WhisperX
Here are some common issues you might encounter while using ComfyUI-WhisperX and their solutions:
Common Issues and Solutions
FFmpeg Not Working:
Solution: Ensure that FFmpeg is installed and accessible from the command line. On Debian- or Ubuntu-based Linux, you can install it using apt install ffmpeg. For Windows, you can use WingetUI to install it automatically.
Hugging Face Weights Not Downloading:
Solution: Make sure your internet connection is stable and that you have access to Hugging Face. If you are in China, you might need to configure your environment to use hf-mirror (https://hf-mirror.com/).
Speaker Diarization Not Working:
Solution: Ensure you have accepted the user conditions for the required Pyannote models and created an access token on Hugging Face. Follow the steps provided in the setup instructions.
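The checks above can be sketched as a small preflight script. It looks for `ffmpeg` on the PATH, optionally points Hugging Face downloads at the hf-mirror endpoint through the `HF_ENDPOINT` environment variable, and checks for a token. The `preflight` function is a hypothetical helper, and reading the token from `HF_TOKEN` is an assumption: the extension may expect the token somewhere else, such as a node input.

```python
import os
import shutil

HF_MIRROR = "https://hf-mirror.com"  # mirror URL mentioned in the text above


def preflight(env, use_mirror=False):
    """Collect obvious setup problems; an empty list means none were found."""
    problems = []

    # FFmpeg has to be resolvable on PATH to be callable from the command line.
    if shutil.which("ffmpeg", path=env.get("PATH", "")) is None:
        problems.append("ffmpeg not found on PATH")

    # huggingface_hub reads HF_ENDPOINT when it is imported, so set it before
    # anything imports that library. An existing value is left untouched.
    if use_mirror:
        env.setdefault("HF_ENDPOINT", HF_MIRROR)

    # Assumption: the access token is supplied through HF_TOKEN.
    if not env.get("HF_TOKEN"):
        problems.append("no Hugging Face token in HF_TOKEN")

    return problems
```

For example, `preflight(os.environ, use_mirror=True)` could be run before starting ComfyUI. The script cannot check whether the Pyannote model conditions were accepted, so that still has to be confirmed on the Hugging Face website.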
Frequently Asked Questions
Q: How do I install FFmpeg?
A: On Debian- or Ubuntu-based Linux, use apt install ffmpeg. For Windows, use WingetUI to install it automatically.
Q: How do I get the Hugging Face access token?
A: Create an access token at Hugging Face Tokens (https://hf.co/settings/tokens) and use it in your configuration.
Learn More about ComfyUI-WhisperX
To learn more about ComfyUI-WhisperX, you can explore the following resources:
Demo Video: Watch a demo to see ComfyUI-WhisperX in action.
By leveraging these resources, you can get the most out of ComfyUI-WhisperX and enhance your audio subtitling projects.