ComfyUI Extension: ComfyUI Whisper

yuvraj108c (Account age: 2153 days)

How to Install ComfyUI Whisper

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI Whisper in the search bar and install the extension from the results.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.

ComfyUI Whisper Description

ComfyUI Whisper enables audio transcription and video subtitling within ComfyUI, streamlining the process of converting spoken content into text and adding accurate subtitles to video files.

ComfyUI Whisper Introduction

ComfyUI-Whisper is an extension that transcribes audio and adds subtitles to videos using OpenAI's Whisper model from within ComfyUI. It is aimed at AI artists who want to subtitle their video content. Whether you are creating tutorials, art videos, or other multimedia content, ComfyUI-Whisper can generate transcriptions and subtitles that make your work more accessible and extend its reach.

How ComfyUI Whisper Works

ComfyUI-Whisper uses Whisper, a speech recognition model developed by OpenAI. The model processes audio input and produces a text transcription with timestamps for each segment and each word. The transcription can then be overlaid onto video frames as subtitles. The extension wraps these steps in ComfyUI nodes, so you can stay in the node graph and focus on your creative work.

Basic Workflow

  1. Audio Input: The audio from your video is extracted and fed into the Whisper model.
  2. Transcription: The Whisper model transcribes the audio, generating text and timestamps.
  3. Subtitle Overlay: The transcriptions are then added to the video frames as subtitles, which can be customized in terms of font, color, and position.
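
The link between steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not the extension's code: it assumes the transcription is available as a list of segments with "start" and "end" times in seconds and a "text" field (the shape used by OpenAI's reference Whisper package), and the helper name subtitle_for_frame is hypothetical. It shows how a frame index and a frame rate determine which subtitle, if any, is visible on that frame.

```python
from typing import Optional


def subtitle_for_frame(segments: list[dict], frame_index: int, fps: float) -> Optional[str]:
    """Return the text of the segment active at this frame, or None if there is none."""
    t = frame_index / fps  # time of this frame in seconds
    for seg in segments:
        # A segment covers the half-open interval [start, end).
        if seg["start"] <= t < seg["end"]:
            return seg["text"].strip()
    return None


# Hypothetical transcription output, for illustration only.
segments = [
    {"start": 0.0, "end": 2.0, "text": " Hello and welcome."},
    {"start": 2.5, "end": 4.0, "text": " Let's begin."},
]
fps = 24.0

for frame_index in (0, 47, 48, 60, 96):
    print(frame_index, subtitle_for_frame(segments, frame_index, fps))
```

Frames that fall in the gap between two segments get no subtitle, which is why the lookup returns None instead of carrying the previous text forward.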

ComfyUI Whisper Features

Apply Whisper

  • Function: Transcribes audio and provides timestamps for each segment and word.
  • Customization: You can choose different models based on your needs (e.g., faster transcription vs. higher accuracy).
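
To make the difference between segment-level and word-level timestamps concrete, here is a minimal Python sketch. It assumes each word is a dictionary with "word", "start", and "end" keys, as in OpenAI's reference Whisper package; the exact structure the node outputs may differ, and chunk_words is a hypothetical helper, not part of the extension. It regroups word timestamps into short captions, which is one way word-level timing can be used.

```python
def chunk_words(words: list[dict], max_words: int = 3) -> list[dict]:
    """Group word-level timestamps into captions of at most max_words words."""
    chunks = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        chunks.append({
            "start": group[0]["start"],   # caption starts with its first word
            "end": group[-1]["end"],      # and ends with its last word
            "text": " ".join(w["word"].strip() for w in group),
        })
    return chunks


# Hypothetical word-level output, for illustration only.
words = [
    {"word": " Hello", "start": 0.0, "end": 0.4},
    {"word": " and", "start": 0.4, "end": 0.6},
    {"word": " welcome", "start": 0.6, "end": 1.1},
    {"word": " everyone", "start": 1.2, "end": 1.8},
]

for caption in chunk_words(words, max_words=2):
    print(caption)
```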

Add Subtitles To Frames

  • Function: Adds subtitles directly onto video frames.
  • Customization: You can specify the font family, font color, and x/y positions of the subtitles.
  • Example: Adjusting the font size and color to match the aesthetic of your video.

Add Subtitles To Background (Experimental)

  • Function: Adds subtitles like a word cloud on blank frames.
  • Customization: This feature is experimental and may require some tweaking to get the desired effect.

ComfyUI Whisper Models

ComfyUI-Whisper supports various models from the Whisper suite, each offering different trade-offs between speed and accuracy:

  • Tiny: Fastest but less accurate, suitable for quick transcriptions.
  • Base: Balanced speed and accuracy.
  • Small: More accurate, slower than Tiny and Base.
  • Medium: High accuracy, slower processing.
  • Large: Most accurate, slowest processing, and requires the most VRAM.

When to Use Each Model

  • Tiny/Base: Use these models for quick drafts, or when the audio is clean enough that a smaller model transcribes it well.
  • Small/Medium: Ideal for more detailed work where accuracy is important.
  • Large: Best for final transcriptions where the highest accuracy is required.
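
The guidance above can be turned into a small selection helper. The sketch below is an assumption-laden illustration: the VRAM figures are the approximate values listed in the openai/whisper README for the reference implementation, actual usage inside ComfyUI may differ, and pick_model is a hypothetical function, not something the extension provides.

```python
# Approximate VRAM needs in GB (from the openai/whisper README); rough guidance only.
# Ordered from most accurate to fastest.
APPROX_VRAM_GB = [
    ("large", 10),
    ("medium", 5),
    ("small", 2),
    ("base", 1),
    ("tiny", 1),
]


def pick_model(available_vram_gb: float, prefer_speed: bool = False) -> str:
    """Pick the most accurate model that fits, or the fastest one if prefer_speed is set."""
    candidates = [name for name, need in APPROX_VRAM_GB if need <= available_vram_gb]
    if not candidates:
        raise ValueError("not enough VRAM for any model in the table")
    return candidates[-1] if prefer_speed else candidates[0]


print(pick_model(6))                     # most accurate model that fits in about 6 GB
print(pick_model(6, prefer_speed=True))  # fastest model for a quick draft
```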

Troubleshooting ComfyUI Whisper

Common Issues and Solutions

  1. Model Loading Errors:
     • Solution: Ensure you have enough VRAM available. Try using a smaller model if you encounter memory issues.
  2. Inaccurate Transcriptions:
     • Solution: Use a higher accuracy model like Medium or Large. Ensure your audio quality is good and clear.
  3. Subtitle Positioning Issues:
     • Solution: Adjust the x/y positions in the Add Subtitles To Frames node to better fit your video layout.
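
The "try a smaller model" advice for memory errors can be expressed as a fallback loop. This is a sketch under stated assumptions, not how the extension loads models: load_with_fallback and fake_load are hypothetical, the real loading call is passed in as a function, and the loop assumes that out-of-memory failures surface as MemoryError or RuntimeError (PyTorch reports CUDA out-of-memory conditions as a RuntimeError subclass).

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")


def load_with_fallback(
    load: Callable[[str], T],
    names: Sequence[str] = ("large", "medium", "small", "base", "tiny"),
) -> Tuple[str, T]:
    """Try each model name in order and return the first one that loads."""
    last_error = None
    for name in names:
        try:
            return name, load(name)
        except (MemoryError, RuntimeError) as err:
            last_error = err  # remember the failure and try the next, smaller model
    raise RuntimeError("no model could be loaded") from last_error


def fake_load(name: str) -> str:
    """Stand-in loader that simulates running out of memory on the larger models."""
    if name in ("large", "medium"):
        raise RuntimeError("CUDA out of memory (simulated)")
    return f"<{name} model>"


name, model = load_with_fallback(fake_load)
print(name, model)
```

Passing the loader in as a function keeps the sketch independent of any particular Whisper package and makes the fallback order easy to change.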

Frequently Asked Questions

  • Q: Can I use ComfyUI-Whisper for non-English languages?
  • A: Yes, Whisper supports many languages. Make sure to select a multilingual model: Whisper's English-only variants (names ending in .en), if your model list includes them, only transcribe English.
  • Q: How do I improve the accuracy of the transcriptions?
  • A: Use higher accuracy models and ensure your audio is clear and free from background noise.


© Copyright 2024 RunComfy. All Rights Reserved.