Install this extension via the ComfyUI Manager by searching for ComfyUI Whisper:
1. Click the Manager button in the main menu
2. Select the Custom Nodes Manager button
3. Enter ComfyUI Whisper in the search bar
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and access the updated list of nodes.
ComfyUI Whisper enables audio transcription and video subtitling within ComfyUI, streamlining the process of converting spoken content into text and adding accurate subtitles to video files.
ComfyUI Whisper Introduction
ComfyUI-Whisper is an extension that allows you to transcribe audio and add subtitles to videos using the Whisper model by OpenAI, integrated within the ComfyUI framework. This extension is particularly useful for AI artists who want to add subtitles to their video content effortlessly. Whether you are creating tutorials, art videos, or any other type of multimedia content, ComfyUI-Whisper can help you generate accurate transcriptions and subtitles, enhancing the accessibility and reach of your work.
ComfyUI-Whisper leverages the Whisper model, a state-of-the-art speech recognition system developed by OpenAI. The model processes audio input to generate text transcriptions and timestamps for each segment and word. These transcriptions can then be overlaid onto video frames as subtitles. The extension simplifies this process by providing easy-to-use nodes within the ComfyUI environment, allowing you to focus on your creative work without worrying about the technical details.
Basic Workflow
1. Audio Input: The audio from your video is extracted and fed into the Whisper model.
2. Transcription: The Whisper model transcribes the audio, generating text and timestamps.
3. Subtitle Overlay: The transcriptions are then added to the video frames as subtitles, which can be customized in terms of font, color, and position.
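The overlay step depends on matching each video frame to the segment being spoken at that moment. The following is a minimal sketch of that lookup, not the extension's actual code: the segment data, the function names, and the assumption that segments are sorted and non-overlapping are all illustrative.

```python
from bisect import bisect_right

# Hypothetical segment data shaped like a speech-to-text result:
# each entry has a start time, an end time (in seconds), and the spoken text.
SEGMENTS = [
    {"start": 0.0, "end": 2.0, "text": "Hello and welcome."},
    {"start": 2.5, "end": 5.0, "text": "Today we paint a landscape."},
]


def subtitle_for_frame(frame_index, fps, segments):
    """Return the subtitle text visible at a frame, or "" during silence.

    Assumes segments are sorted by start time and do not overlap.
    """
    t = frame_index / fps  # timestamp of this frame in seconds
    starts = [seg["start"] for seg in segments]
    i = bisect_right(starts, t) - 1  # last segment that starts at or before t
    if i >= 0 and t < segments[i]["end"]:
        return segments[i]["text"]
    return ""


def subtitles_per_frame(frame_count, fps, segments):
    """Build one subtitle string per video frame."""
    return [subtitle_for_frame(i, fps, segments) for i in range(frame_count)]
```

Calling subtitles_per_frame with the frame count and frame rate of a clip gives a list that can be walked alongside the frames, so each frame is drawn with its own text (or none during pauses).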
ComfyUI Whisper Features
Apply Whisper
Function: Transcribes audio and provides timestamps for each segment and word.
Customization: You can choose different models based on your needs (e.g., faster transcription vs. higher accuracy).
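To illustrate what segment-level and word-level timestamps look like outside the node graph, here is a rough sketch that assumes the openai-whisper Python package as the backend. How the Apply Whisper node calls the model internally is not described here, so treat this as an approximation. The helper names transcribe and flatten_words are hypothetical.

```python
def transcribe(audio_path, model_name="base"):
    """Transcribe a file with the openai-whisper package (imported lazily).

    Assumption: openai-whisper and ffmpeg are installed on the machine.
    """
    import whisper  # pip install openai-whisper

    model = whisper.load_model(model_name)
    # word_timestamps=True adds a "words" list to every segment
    return model.transcribe(audio_path, word_timestamps=True)


def flatten_words(result):
    """Collect (word, start, end) tuples from all segments of a result dict."""
    words = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append((w["word"].strip(), w["start"], w["end"]))
    return words
```

The result dictionary carries the full text under "text" and one entry per segment under "segments". flatten_words turns the nested word lists into a flat sequence, which is handy for word-by-word subtitle effects.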
Add Subtitles To Frames
Function: Adds subtitles directly onto video frames.
Customization: You can specify the font family, font color, and x/y positions of the subtitles.
Example: Adjusting the font size and color to match the aesthetic of your video.
Add Subtitles To Background (Experimental)
Function: Renders subtitles in a word-cloud style on blank frames.
Customization: This feature is experimental and may require some tweaking to get the desired effect.
ComfyUI Whisper Models
ComfyUI-Whisper supports various models from the Whisper suite, each offering different trade-offs between speed and accuracy:
Tiny: Fastest but least accurate, suitable for quick transcriptions.
Base: Balanced speed and accuracy.
Small: More accurate, slower than Tiny and Base.
Medium: High accuracy, slower processing.
Large: Most accurate, slowest processing, and requires the most VRAM.
When to Use Each Model
Tiny/Base: Use these models for quick drafts, or when the audio is clean and clearly spoken so that a smaller model is usually sufficient.
Small/Medium: Ideal for more detailed work where accuracy is important.
Large: Best for final transcriptions where the highest accuracy is required.
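The guidance above can be turned into a small selection helper. This is only a sketch: the VRAM figures are the approximate values given in the openai/whisper README and should be read as rough guidance, and pick_model is a hypothetical function, not part of the extension.

```python
# Approximate VRAM needs in GB, as listed in the openai/whisper README.
# These are rough guidance values, not guarantees.
APPROX_VRAM_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}

# Models ordered from least to most accurate (and fastest to slowest).
BY_ACCURACY = ["tiny", "base", "small", "medium", "large"]


def pick_model(available_vram_gb, draft=False):
    """Pick a model name, or None if even the smallest one does not fit.

    draft=True favors speed and always returns "tiny"; otherwise the most
    accurate model whose approximate VRAM need fits is returned.
    """
    if draft:
        return "tiny"
    fitting = [m for m in BY_ACCURACY if APPROX_VRAM_GB[m] <= available_vram_gb]
    return fitting[-1] if fitting else None
```

For example, with roughly 8 GB of free VRAM this sketch settles on Medium, while a quick draft run always uses Tiny regardless of the hardware.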
Troubleshooting ComfyUI Whisper
Common Issues and Solutions
Model Loading Errors:
Solution: Ensure you have enough VRAM available. Try using a smaller model if you encounter memory issues.
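One way to automate "try a smaller model" is to attempt the models from largest to smallest and keep the first one that loads. The sketch below takes the loader as a parameter, so it makes no claim about how the extension loads models. It assumes that memory failures surface as MemoryError or RuntimeError, which is typically how PyTorch reports out-of-memory conditions. The function name load_with_fallback is hypothetical.

```python
def load_with_fallback(load_model,
                       names=("large", "medium", "small", "base", "tiny")):
    """Try each model name in order and return (name, model) for the first success.

    load_model is any callable that takes a model name and returns a model.
    Assumption: memory failures raise MemoryError or RuntimeError.
    """
    last_error = None
    for name in names:
        try:
            return name, load_model(name)
        except (MemoryError, RuntimeError) as exc:
            last_error = exc  # remember the failure and try the next size down
    raise RuntimeError("no model could be loaded") from last_error
```

Passing the loader in keeps the fallback logic independent of any particular backend and easy to try out with a stand-in function.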
Inaccurate Transcriptions:
Solution: Use a higher-accuracy model such as Medium or Large, and make sure the audio is clear.
Subtitle Positioning Issues:
Solution: Adjust the x/y positions in the Add Subtitles To Frames node to better fit your video layout.
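When picking x/y values by hand, a little arithmetic gives a sensible starting point. The sketch below assumes that x and y are pixel coordinates of the text's top-left corner, which may not match how the node interprets them, and that the rendered text size is already known. bottom_center is a hypothetical helper.

```python
def bottom_center(frame_w, frame_h, text_w, text_h, margin=20):
    """Return a top-left (x, y) that centers text horizontally near the bottom.

    Assumption: coordinates are in pixels with the origin at the frame's
    top-left corner. Values are clamped so they never go negative.
    """
    x = (frame_w - text_w) // 2          # equal space left and right
    y = frame_h - text_h - margin        # leave a margin above the bottom edge
    return max(0, x), max(0, y)
```

For a 1920x1080 frame and a text box of 600x60 pixels, this gives (660, 1000) with the default margin.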
Frequently Asked Questions
Q: Can I use ComfyUI-Whisper for non-English languages?
A: Yes, Whisper supports multiple languages. Choose a multilingual model rather than an English-only variant (Whisper's English-only checkpoints have names ending in .en).
Q: How do I improve the accuracy of the transcriptions?
A: Use higher-accuracy models and ensure your audio is clear and free from background noise.
Learn More about ComfyUI Whisper
For more detailed tutorials, documentation, and community support, you can explore the following resources: