ComfyUI Extension: ComfyUI-WhisperX

Repo Name: ComfyUI-WhisperX
Author: AIFSH (Account age: 271 days)
Nodes: 2
Latest Updated: 2024-06-14
Github Stars: 0.03K

How to Install ComfyUI-WhisperX

Install this extension via the ComfyUI Manager by searching for ComfyUI-WhisperX:

  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-WhisperX in the search bar and install the matching entry.

After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

ComfyUI-WhisperX Description

ComfyUI-WhisperX is a custom node for ComfyUI that facilitates audio subtitling by integrating functionality from the WhisperX and Translators repositories.

ComfyUI-WhisperX Introduction

ComfyUI-WhisperX is an extension that adds audio subtitling to ComfyUI. It uses WhisperX to transcribe audio files and the Translators library to translate the resulting text, and it can label multiple speakers. It is particularly useful for AI artists who need to generate and translate subtitles for their audio content.

How ComfyUI-WhisperX Works

ComfyUI-WhisperX works by processing audio files to generate subtitles and translations. Here’s a simplified breakdown of how it operates:

  1. Audio Input: You provide an audio file to the extension.
  2. Transcription: The extension uses WhisperX to transcribe the audio into text. WhisperX builds on OpenAI's Whisper and is designed for fast, accurate transcription with timestamps.
  3. Translation: If needed, the transcribed text can be translated into multiple languages using the Translators library, which supports a wide range of translation engines.
  4. Speaker Diarization: The extension can identify and label different speakers in the audio using Pyannote-Audio, which helps in creating more organized and understandable subtitles.
  5. Output: The final output can be exported as an SRT file, which is a common format for subtitles.
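
The output step can be sketched as follows. This is a minimal illustration of the SRT format, not the extension's own code. It assumes each transcribed segment is a dict with "start" and "end" times in seconds, a "text" string, and an optional "speaker" label; the function names are hypothetical.

```python
def format_timestamp(seconds: float) -> str:
    """Convert seconds to the SRT timestamp form HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments) -> str:
    """Render segment dicts (assumed keys: start, end, text, speaker) as SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        text = seg["text"].strip()
        speaker = seg.get("speaker")
        if speaker:
            # Prefix the line with the speaker label when diarization was run.
            text = f"[{speaker}] {text}"
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{text}\n")
    # SRT cues are separated by a blank line.
    return "\n".join(blocks)
```

Each cue consists of a running index, a time range, and the subtitle text, which is why the SRT format is easy to produce from any list of timestamped segments.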

ComfyUI-WhisperX Features

  • Export SRT Files: The extension supports exporting subtitles in the SRT format, which is widely used for video subtitles.
  • Translation Support: With the help of the Translators library, the extension can translate subtitles into multiple languages, making your content accessible to a global audience.
  • Multiple Speaker Diarization: Using Pyannote-Audio, the extension can distinguish between different speakers in the audio, providing more detailed and accurate subtitles.
  • Custom Nodes Integration: ComfyUI-WhisperX is delivered as ComfyUI custom nodes, so it can be combined with other nodes in a workflow.
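
As a rough sketch of the translation feature, the snippet below applies a translation callable to every segment while keeping the timestamps. The segment layout and the name `translate_segments` are assumptions made for illustration; `translate_fn` stands in for whatever engine is used, for instance a thin wrapper around the Translators library.

```python
from typing import Callable, Dict, List


def translate_segments(
    segments: List[Dict],
    translate_fn: Callable[[str], str],
) -> List[Dict]:
    """Return copies of the segments with their text passed through translate_fn."""
    translated = []
    for seg in segments:
        source_text = seg["text"].strip()
        new_seg = dict(seg)  # copy so the original transcript is left untouched
        # Skip empty segments to avoid needless calls to the translation engine.
        new_seg["text"] = translate_fn(source_text) if source_text else ""
        translated.append(new_seg)
    return translated
```

Passing the translator in as a function keeps the timing logic independent of any particular translation engine and makes the step easy to test with a stand-in.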

ComfyUI-WhisperX Models

ComfyUI-WhisperX utilizes different models for transcription and speaker diarization:

  • WhisperX Models: These models are used for transcribing audio into text. They are known for their high accuracy and speed.
  • Pyannote-Audio Models: These models are used for speaker diarization, which helps in identifying and labeling different speakers in the audio.

When to Use Each Model

  • WhisperX Models: These models handle the transcription step, so they are used whenever you need text from an audio file.
  • Pyannote-Audio Models: Use these models in addition when your audio contains multiple speakers and you need to identify and label each speaker separately.
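
To show how the two model outputs can be combined, here is a minimal sketch that labels each transcript segment with the speaker whose diarization turns overlap it the most. It is an illustration under stated assumptions, not necessarily how the extension or WhisperX performs the assignment: segments and turns are assumed to be dicts with "start" and "end" in seconds, turns carry a "speaker" label, and `assign_speakers` is a hypothetical name.

```python
def assign_speakers(segments, turns):
    """Label each segment with the speaker that overlaps it longest."""
    labeled = []
    for seg in segments:
        overlap_by_speaker = {}
        for turn in turns:
            # Length of the time interval shared by the segment and the turn.
            overlap = min(seg["end"], turn["end"]) - max(seg["start"], turn["start"])
            if overlap > 0:
                speaker = turn["speaker"]
                overlap_by_speaker[speaker] = overlap_by_speaker.get(speaker, 0.0) + overlap
        new_seg = dict(seg)
        if overlap_by_speaker:
            new_seg["speaker"] = max(overlap_by_speaker, key=overlap_by_speaker.get)
        # Segments with no overlapping turn are left without a speaker label.
        labeled.append(new_seg)
    return labeled
```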

Troubleshooting ComfyUI-WhisperX

Here are some common issues you might encounter while using ComfyUI-WhisperX and their solutions:

Common Issues and Solutions

  1. FFmpeg Not Working:
     • Solution: Ensure that FFmpeg is installed and accessible from the command line. For Linux, you can install it using apt install ffmpeg. For Windows, you can use WingetUI to install it automatically.
  2. Hugging Face Weights Not Downloading:
     • Solution: Make sure your internet connection is stable and that you have access to Hugging Face. If you are in China, you might need to configure your environment to use hf-mirror (https://hf-mirror.com/).
  3. Speaker Diarization Not Working:
     • Solution: Ensure you have accepted the user conditions for the required Pyannote models and created an access token on Hugging Face. Follow the steps provided in the setup instructions.
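
A quick way to check these three points from Python is sketched below. It assumes the Hugging Face libraries are used for downloads, in which case the `HF_ENDPOINT` environment variable selects the download host (for example https://hf-mirror.com). Reading the token from `HF_TOKEN` is an assumption for illustration only, since the extension may expect the token elsewhere in its configuration; `check_environment` is a hypothetical helper.

```python
import os
import shutil


def check_environment(env=None):
    """Report whether FFmpeg, the Hugging Face endpoint, and a token are set up."""
    env = os.environ if env is None else env
    return {
        # None means no ffmpeg executable was found on the search path.
        "ffmpeg_path": shutil.which("ffmpeg", path=env.get("PATH")),
        # Falls back to the default Hugging Face host when no mirror is configured.
        "hf_endpoint": env.get("HF_ENDPOINT", "https://huggingface.co"),
        # Assumption: the access token is exposed through the HF_TOKEN variable.
        "has_hf_token": bool(env.get("HF_TOKEN")),
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If `ffmpeg_path` comes back as None, FFmpeg is not reachable from the command line and needs to be installed or added to PATH.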

Frequently Asked Questions

  • Q: How do I install FFmpeg?
  • A: For Linux, use apt install ffmpeg. For Windows, use WingetUI to install it automatically.
  • Q: How do I get the Hugging Face access token?
  • A: Create an access token at Hugging Face Tokens (https://hf.co/settings/tokens) and use it in your configuration.
