ComfyUI-WhisperX Introduction
ComfyUI-WhisperX is an extension designed to enhance the capabilities of ComfyUI by integrating advanced audio subtitling features. This extension leverages the power of WhisperX and the Translators library to provide accurate and efficient transcription and translation of audio files. It is particularly useful for AI artists who need to generate subtitles for their audio content, offering a seamless way to create and translate subtitles with multiple speaker identification.
How ComfyUI-WhisperX Works
ComfyUI-WhisperX works by processing audio files to generate subtitles and translations. Here’s a simplified breakdown of how it operates:
- Audio Input: You provide an audio file to the extension.
- Transcription: The extension uses WhisperX to transcribe the audio into text. WhisperX is known for its high accuracy and speed, making it ideal for real-time applications.
- Translation: If needed, the transcribed text can be translated into multiple languages using the Translators library, which supports a wide range of translation engines.
- Speaker Diarization: The extension can identify and label different speakers in the audio using Pyannote-Audio, which helps in creating more organized and understandable subtitles.
- Output: The final output can be exported as an SRT file, which is a common format for subtitles.
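The output step above can be sketched in a few lines of plain Python. This is an illustrative helper, not the extension's actual export code; the segment shape assumed here (dicts with start, end, text, and an optional speaker key) mirrors typical WhisperX output but is an assumption for this sketch:

```python
from datetime import timedelta

def to_srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm."""
    total_ms = int(timedelta(seconds=seconds).total_seconds() * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"

def segments_to_srt(segments) -> str:
    """Render a list of {"start", "end", "text", "speaker"?} dicts as SRT text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        speaker = seg.get("speaker")
        text = f"[{speaker}] {seg['text']}" if speaker else seg["text"]
        blocks.append(
            f"{i}\n{to_srt_timestamp(seg['start'])} --> "
            f"{to_srt_timestamp(seg['end'])}\n{text}"
        )
    return "\n\n".join(blocks) + "\n"
```

Each SRT cue is a running index, a `start --> end` timestamp line, and the text; prefixing the speaker label in brackets is one common convention for diarized subtitles.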
ComfyUI-WhisperX Features
- Export SRT Files: The extension supports exporting subtitles in the SRT format, which is widely used for video subtitles.
- Translation Support: With the help of the Translators library, the extension can translate subtitles into multiple languages, making your content accessible to a global audience.
- Multiple Speaker Diarization: Using Pyannote-Audio, the extension can distinguish between different speakers in the audio, providing more detailed and accurate subtitles.
- Custom Nodes Integration: ComfyUI-WhisperX allows the integration of custom nodes, enabling you to extend its functionality according to your specific needs.
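Speaker diarization conceptually means matching each transcribed segment to whichever speaker turn overlaps it most in time. A minimal sketch of that matching step, assuming diarization output as simple (start, end, label) tuples (the real Pyannote-Audio output format is richer than this):

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds of the time overlap between two intervals (0 if none)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))

def assign_speakers(segments, turns):
    """Label each transcription segment with the speaker turn it overlaps most.

    segments: list of {"start", "end", "text"} dicts (transcription output)
    turns:    list of (start, end, speaker_label) tuples (diarization output)
    """
    for seg in segments:
        best = max(
            turns,
            key=lambda t: overlap(seg["start"], seg["end"], t[0], t[1]),
            default=None,
        )
        if best and overlap(seg["start"], seg["end"], best[0], best[1]) > 0:
            seg["speaker"] = best[2]
    return segments
```

Maximum-overlap assignment is a simple heuristic; segments that span a speaker change get the label of whichever speaker talked longer within them.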
ComfyUI-WhisperX Models
ComfyUI-WhisperX utilizes different models for transcription and speaker diarization:
- WhisperX Models: These models are used for transcribing audio into text. They are known for their high accuracy and speed.
- Pyannote-Audio Models: These models are used for speaker diarization, which helps in identifying and labeling different speakers in the audio.
When to Use Each Model
- WhisperX Models: Use these models when you need accurate and fast transcription of audio files.
- Pyannote-Audio Models: Use these models when your audio contains multiple speakers, and you need to identify and label each speaker separately.
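The two model families above are typically chained in one pipeline: transcribe, align word timestamps, then diarize. A hedged sketch of such a pipeline using the WhisperX Python API (the "large-v2" model name and the exact call signatures may differ between whisperx versions; the import is deferred so the sketch stays self-contained):

```python
def transcribe_with_speakers(audio_path: str, hf_token: str, device: str = "cpu"):
    """Sketch: transcribe with a WhisperX model, then label speakers via Pyannote.

    Assumes the whisperx package is installed; API details may vary by version.
    """
    import whisperx  # deferred import: only needed when the pipeline actually runs

    # 1. Transcribe with a WhisperX model (fast, accurate ASR).
    model = whisperx.load_model("large-v2", device)
    audio = whisperx.load_audio(audio_path)
    result = model.transcribe(audio)

    # 2. Align the transcript to precise word-level timestamps.
    align_model, metadata = whisperx.load_align_model(
        language_code=result["language"], device=device
    )
    result = whisperx.align(result["segments"], align_model, metadata, audio, device)

    # 3. Diarize with Pyannote and merge speaker labels into the segments.
    diarize = whisperx.DiarizationPipeline(use_auth_token=hf_token, device=device)
    diarize_segments = diarize(audio)
    return whisperx.assign_word_speakers(diarize_segments, result)
```

If your audio has a single speaker, steps 1 and 2 alone are enough; the diarization step is only worth its extra model download and runtime when multiple speakers must be told apart.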
Troubleshooting ComfyUI-WhisperX
Here are some common issues you might encounter while using ComfyUI-WhisperX and their solutions:
Common Issues and Solutions
- FFmpeg Not Working:
- Solution: Ensure that FFmpeg is installed and accessible from the command line. On Linux, you can install it with apt install ffmpeg. On Windows, the extension's setup may install it automatically; otherwise, download a prebuilt FFmpeg binary and add it to your PATH.
- Hugging Face Weights Not Downloading:
- Solution: Make sure your internet connection is stable and that you have access to Hugging Face. If you are in China, you might need to configure your environment to use hf-mirror (https://hf-mirror.com/).
- Speaker Diarization Not Working:
- Solution: Ensure you have accepted the user conditions for the required Pyannote models and created an access token on Hugging Face. Follow the steps provided in the setup instructions.
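For the Hugging Face download issue above, the huggingface_hub client can be pointed at hf-mirror through the HF_ENDPOINT environment variable. A minimal sketch (the HF_TOKEN value is a placeholder, not a real token; set both variables before any Hugging Face library is imported):

```python
import os

# Redirect Hugging Face Hub downloads through the hf-mirror proxy.
# Must be set before importing huggingface_hub / transformers / whisperx.
os.environ["HF_ENDPOINT"] = "https://hf-mirror.com"

# The access token for gated Pyannote models can also be supplied via env var.
# Placeholder value: substitute your real token from hf.co/settings/tokens.
os.environ.setdefault("HF_TOKEN", "hf_your_token_here")
```

Setting these in the shell that launches ComfyUI (rather than in Python) works equally well and avoids ordering problems with imports.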
Frequently Asked Questions
- Q: How do I install FFmpeg?
- A: On Linux, use apt install ffmpeg. On Windows, the extension's setup may install it automatically; otherwise, download a prebuilt FFmpeg binary and add it to your PATH.
- Q: How do I get the Hugging Face access token?
- A: Create an access token at Hugging Face Tokens (https://hf.co/settings/tokens) and use it in your configuration.
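A quick way to verify the FFmpeg answer above before launching ComfyUI is to probe for the binary from Python; a small sketch:

```python
import shutil
import subprocess

def ffmpeg_available() -> bool:
    """Return True if an ffmpeg binary is reachable on PATH."""
    return shutil.which("ffmpeg") is not None

def ffmpeg_version() -> str:
    """Return the first line of `ffmpeg -version`, or a hint if it is missing."""
    if not ffmpeg_available():
        return "ffmpeg not found: install it and ensure it is on your PATH"
    out = subprocess.run(
        ["ffmpeg", "-version"], capture_output=True, text=True, check=True
    )
    return out.stdout.splitlines()[0]
```

Running ffmpeg_version() in the same environment that launches ComfyUI catches the common case where FFmpeg is installed but not on the PATH that ComfyUI sees.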
Learn More about ComfyUI-WhisperX
To learn more about ComfyUI-WhisperX, you can explore the following resources:
- WhisperX: For detailed information on WhisperX and its capabilities.
- Translators: To understand the translation capabilities and supported languages.
- Pyannote-Audio: For more information on speaker diarization and related models.
- Demo video: Watch a demo to see ComfyUI-WhisperX in action.
By leveraging these resources, you can get the most out of ComfyUI-WhisperX and enhance your audio subtitling projects.