ComfyUI Extension: ComfyUI Whisper

Repo Name: ComfyUI-Whisper
Author: yuvraj108c (Account age: 2437 days)
Nodes: 4
Latest Updated: 2024-08-06
GitHub Stars: 0.1K

Table of Contents

Description
How ComfyUI Whisper Works
ComfyUI Whisper Features
ComfyUI Whisper Models
Troubleshooting ComfyUI Whisper
Learn More about ComfyUI Whisper
Related Nodes

How to Install ComfyUI Whisper

Install this extension via the ComfyUI Manager by searching for ComfyUI Whisper:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI Whisper in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

ComfyUI Whisper Description

ComfyUI Whisper enables audio transcription and video subtitling within ComfyUI, streamlining the process of converting spoken content into text and adding accurate subtitles to video files.

ComfyUI Whisper Introduction

ComfyUI-Whisper is an extension that allows you to transcribe audio and add subtitles to videos using the Whisper model by OpenAI, integrated within the ComfyUI framework. This extension is particularly useful for AI artists who want to add subtitles to their video content effortlessly. Whether you are creating tutorials, art videos, or any other type of multimedia content, ComfyUI-Whisper can help you generate accurate transcriptions and subtitles, enhancing the accessibility and reach of your work.

How ComfyUI Whisper Works

ComfyUI-Whisper leverages the Whisper model, a state-of-the-art speech recognition system developed by OpenAI. The model processes audio input to generate text transcriptions and timestamps for each segment and word. These transcriptions can then be overlaid onto video frames as subtitles. The extension simplifies this process by providing easy-to-use nodes within the ComfyUI environment, allowing you to focus on your creative work without worrying about the technical details.

Basic Workflow

Audio Input: The audio from your video is extracted and fed into the Whisper model.
Transcription: The Whisper model transcribes the audio, generating text and timestamps.
Subtitle Overlay: The transcriptions are then added to the video frames as subtitles, which can be customized in terms of font, color, and position.
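
The timestamp-to-frame step in this workflow can be sketched in plain Python. This is a minimal illustration, not the extension's actual code: it assumes segments shaped like the output of the openai-whisper library (dictionaries with "start", "end", and "text" keys, times in seconds), and the function name subtitles_per_frame is hypothetical.

```python
import math
from typing import Dict, List, Optional, Union

Segment = Dict[str, Union[float, str]]


def subtitles_per_frame(
    segments: List[Segment], fps: float, frame_count: int
) -> List[Optional[str]]:
    """Return, for each frame index, the subtitle text to draw (or None).

    Frame i is treated as being shown at time i / fps, and a segment
    covers every frame whose time t satisfies start <= t < end.
    """
    captions: List[Optional[str]] = [None] * frame_count
    for seg in segments:
        first = max(0, math.ceil(float(seg["start"]) * fps))
        last = min(frame_count, math.ceil(float(seg["end"]) * fps))
        for i in range(first, last):
            # Whisper-style segment text often starts with a space.
            captions[i] = str(seg["text"]).strip()
    return captions


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 0.5, "text": " Hello"},
        {"start": 1.0, "end": 1.5, "text": " world"},
    ]
    print(subtitles_per_frame(demo, fps=4, frame_count=8))
```

Frames that fall between segments get None, so the overlay step can skip them.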

ComfyUI Whisper Features

Apply Whisper

Function: Transcribes audio and provides timestamps for each segment and word.
Customization: You can choose different models based on your needs (e.g., faster transcription vs. higher accuracy).
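
For readers who want to see what segment and word timestamps look like outside ComfyUI, here is a rough sketch that uses the openai-whisper Python package directly. It assumes that package is installed and that the node wraps it in a similar way, which this page does not confirm. The helper names transcribe_with_words and flatten_words are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def transcribe_with_words(audio_path: str, model_name: str = "base") -> Dict[str, Any]:
    """Transcribe a file with openai-whisper, requesting word timestamps."""
    # Imported lazily so flatten_words below can be used without the package.
    import whisper

    model = whisper.load_model(model_name)
    return model.transcribe(audio_path, word_timestamps=True)


def flatten_words(result: Dict[str, Any]) -> List[Tuple[str, float, float]]:
    """Collect (word, start, end) tuples from every segment of a result."""
    words: List[Tuple[str, float, float]] = []
    for segment in result.get("segments", []):
        for w in segment.get("words", []):
            words.append((w["word"].strip(), float(w["start"]), float(w["end"])))
    return words
```

A call such as flatten_words(transcribe_with_words("speech.wav", "small")) would then give one entry per spoken word; the file name here is only a placeholder.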

Add Subtitles To Frames

Function: Adds subtitles directly onto video frames.
Customization: You can specify the font family, font color, and x/y positions of the subtitles.
Example: Adjusting the font size and color to match the aesthetic of your video.

Add Subtitles To Background (Experimental)

Function: Adds subtitles like a word cloud on blank frames.
Customization: This feature is experimental and may require some tweaking to get the desired effect.

ComfyUI Whisper Models

ComfyUI-Whisper supports various models from the Whisper suite, each offering different trade-offs between speed and accuracy:

Tiny: Fastest but less accurate, suitable for quick transcriptions.
Base: Balanced speed and accuracy.
Small: More accurate, slower than Tiny and Base.
Medium: High accuracy, slower processing.
Large: Most accurate, slowest processing, and requires the most VRAM.

When to Use Each Model

Tiny/Base: Use these models for quick drafts, or when the audio is clean enough that a smaller model still transcribes it well.
Small/Medium: Ideal for more detailed work where accuracy is important.
Large: Best for final transcriptions where the highest accuracy is required.
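
The trade-off above can be expressed as a small selection helper. This is only a sketch: the VRAM figures are the approximate values listed in the OpenAI Whisper README, not measurements of this extension, and pick_model is a hypothetical name.

```python
from typing import List, Optional, Tuple

# Approximate VRAM needs in GB, smallest model first (assumed from the
# OpenAI Whisper README; real usage varies with hardware and settings).
APPROX_VRAM_GB: List[Tuple[str, float]] = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def pick_model(available_vram_gb: float, prefer: str = "large") -> Optional[str]:
    """Return the largest model, no larger than `prefer`, that fits in VRAM.

    Returns None when even the smallest model is estimated not to fit.
    Raises ValueError if `prefer` is not a known model name.
    """
    names = [name for name, _ in APPROX_VRAM_GB]
    ceiling = names.index(prefer)
    for name, needed in reversed(APPROX_VRAM_GB[: ceiling + 1]):
        if needed <= available_vram_gb:
            return name
    return None


if __name__ == "__main__":
    print(pick_model(8))                   # a card with about 8 GB free
    print(pick_model(24, prefer="small"))  # cap the choice for faster drafts
```

Capping the choice with prefer is useful for draft passes, where speed matters more than accuracy even if the GPU could hold a larger model.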

Troubleshooting ComfyUI Whisper

Common Issues and Solutions

Model Loading Errors:

Solution: Ensure you have enough VRAM available. Try using a smaller model if you encounter memory issues.

Inaccurate Transcriptions:

Solution: Use a higher accuracy model like Medium or Large. Ensure your audio quality is good and clear.

Subtitle Positioning Issues:

Solution: Adjust the x/y positions in the Add Subtitles To Frames node to better fit your video layout.
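
When picking x/y values by hand, it can help to compute a starting point from the frame and text dimensions. The sketch below assumes the position refers to the top-left corner of the rendered text in pixels, which this page does not state. The function name bottom_center_position and the margin parameter are hypothetical.

```python
from typing import Tuple


def bottom_center_position(
    frame_w: int, frame_h: int, text_w: int, text_h: int, margin: int = 20
) -> Tuple[int, int]:
    """Return (x, y) that centers text horizontally near the bottom edge.

    The result is clamped so the top-left corner never leaves the frame,
    even when the text is wider or taller than the frame itself.
    """
    x = (frame_w - text_w) // 2
    y = frame_h - text_h - margin
    x = max(0, min(x, max(0, frame_w - text_w)))
    y = max(0, min(y, max(0, frame_h - text_h)))
    return x, y


if __name__ == "__main__":
    # 1920x1080 frame, subtitle measured at 600x48 pixels.
    print(bottom_center_position(1920, 1080, 600, 48, margin=40))
```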

Frequently Asked Questions

Q: Can I use ComfyUI-Whisper for non-English languages?
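A: Yes, Whisper supports multiple languages. Make sure to select a model that covers your language: Whisper's English-only variants (names ending in .en) handle only English, so pick a multilingual model if such variants appear in the model list.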
Q: How do I improve the accuracy of the transcriptions?
A: Use higher accuracy models and ensure your audio is clear and free from background noise.

Learn More about ComfyUI Whisper

For more detailed tutorials, documentation, and community support, you can explore the following resources:

Whisper GitHub Repository
ComfyUI GitHub Repository
Whisper Blog
Whisper Paper
ComfyUI Community Manual

These resources provide comprehensive guides and examples to help you get the most out of ComfyUI-Whisper.

ComfyUI Whisper Related Nodes

Add Subtitles To Background

Add Subtitles To Frames

Apply Whisper

Resize Cropped Subtitles
