ComfyUI Node: IF Whisper Speech🌬️

Class Name: IF_WhisperSpeech
Category: ImpactFrames💥🎞️
Author: if-ai (Account age: 3147 days)
Extension: ComfyUI-IF_AI_WishperSpeechNode
Latest Updated: 2025-03-09
Github Stars: 0.04K

Table of Contents

Description
IF Whisper Speech🌬️:
IF Whisper Speech🌬️ Input Parameters:
IF Whisper Speech🌬️ Output Parameters:
IF Whisper Speech🌬️ Usage Tips:
IF Whisper Speech🌬️ Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-IF_AI_WishperSpeechNode

Install this extension via the ComfyUI Manager by searching for ComfyUI-IF_AI_WishperSpeechNode:

1. Click the Manager button in the main menu
2. Click the Custom Nodes Manager button
3. Enter ComfyUI-IF_AI_WishperSpeechNode in the search bar and install the extension from the results

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

IF Whisper Speech🌬️ Description

Converts text to natural-sounding speech audio using the WhisperSpeech pipeline, with adjustable voice, speed, and chunk overlap.

IF Whisper Speech🌬️:

The IF_WhisperSpeech node converts text into speech audio using the WhisperSpeech pipeline. You can select the speaker's voice, set the speed of speech, and adjust the overlap between audio chunks to fine-tune the output. The node is useful for voiceovers, narrations, and other audio content that needs clear, natural-sounding speech, and it lets you automate speech generation while keeping audio quality consistent.

IF Whisper Speech🌬️ Input Parameters:

text

This parameter accepts the text that you want to convert into speech. It supports multiline input, allowing you to provide long passages of text. The default value is a sample text about electromagnetism. The text you input here will be processed and converted into audio.

file_name

This parameter specifies the base name for the output audio file. The node will append a timestamp to this base name to create a unique file name for each generated audio. The default value is IF_whisper_speech.
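The naming scheme described above can be sketched as follows. This is a minimal illustration, not the node's source: the helper name build_output_path, the timestamp format, and the .wav extension are all assumptions.

```python
from datetime import datetime
from pathlib import Path


def build_output_path(output_dir, file_name="IF_whisper_speech", now=None):
    """Return a unique audio path by appending a timestamp to the base name."""
    # Assumption: the real node may format its timestamp differently.
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(output_dir) / f"{file_name}_{stamp}.wav"


# Each call made at a different second yields a different file name.
print(build_output_path("output"))
```

Because the timestamp changes between runs, repeated generations with the same file_name do not overwrite each other.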

speaker

This parameter allows you to choose the voice of the speaker from a list of available audio files. The options include various pre-recorded voices stored in the whisperspeech/audio directory. The default option is None, which uses the default speaker voice.

torch_compile

This boolean parameter determines whether to use PyTorch's torch.compile feature to optimize the model. The default value is False. Enabling it can speed up audio generation, but the first run takes longer while the model compiles and may use additional memory.

cps

This optional parameter stands for "characters per second" and controls the speed of the generated speech. The default value is 14.0, with a minimum of 10.0 and a maximum of 20.0. Adjusting this value allows you to make the speech faster or slower.
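Since cps is a rate in characters per second, a rough duration estimate is the character count divided by cps. The sketch below shows that arithmetic only; estimate_duration_seconds is a hypothetical helper, and the actual audio length will differ because of pauses and punctuation.

```python
def estimate_duration_seconds(text, cps=14.0):
    """Rough estimate: number of characters divided by characters per second."""
    if cps <= 0:
        raise ValueError("cps must be positive")
    return len(text) / cps


sample = "a" * 280  # stand-in for a 280-character passage
for cps in (10.0, 14.0, 20.0):
    print(f"cps={cps}: about {estimate_duration_seconds(sample, cps):.1f} s")
```

For the same passage, the minimum cps of 10.0 gives an estimate twice as long as the maximum of 20.0.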

overlap

This optional parameter specifies the overlap between audio chunks in milliseconds. The default value is 100.0, with a minimum of 0.0 and a maximum of 200.0. Increasing the overlap can help create smoother transitions between chunks, improving the naturalness of the speech.
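For readers who write their own nodes, the parameters above could be declared roughly as follows using ComfyUI's usual INPUT_TYPES convention. This is a sketch, not the extension's code: the class name, the socket types (including "AUDIO" and "STRING" for the outputs), the placeholder default text, and the single-entry speaker list are assumptions; only the defaults and ranges come from the descriptions above.

```python
class WhisperSpeechNodeSketch:
    """Illustrative declaration only; not the extension's actual source."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # The real default is a sample text about electromagnetism.
                "text": ("STRING", {"multiline": True, "default": "Sample text"}),
                "file_name": ("STRING", {"default": "IF_whisper_speech"}),
                # Assumption: the real node lists files from whisperspeech/audio.
                "speaker": (["None"], {"default": "None"}),
                "torch_compile": ("BOOLEAN", {"default": False}),
            },
            "optional": {
                "cps": ("FLOAT", {"default": 14.0, "min": 10.0, "max": 20.0}),
                "overlap": ("FLOAT", {"default": 100.0, "min": 0.0, "max": 200.0}),
            },
        }

    RETURN_TYPES = ("AUDIO", "STRING")  # assumed socket types
    RETURN_NAMES = ("audios", "wav_16k_path")
    FUNCTION = "generate"
    CATEGORY = "ImpactFrames💥🎞️"

    def generate(self, text, file_name, speaker, torch_compile,
                 cps=14.0, overlap=100.0):
        # Placeholder: the real node runs the WhisperSpeech pipeline here.
        raise NotImplementedError("sketch only")


print(WhisperSpeechNodeSketch.INPUT_TYPES()["optional"])
```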

IF Whisper Speech🌬️ Output Parameters:

audios

This output parameter contains the generated audio data in a format that can be further processed or directly used in your projects. The audio is generated based on the input text and the specified parameters, ensuring high-quality and natural-sounding speech.

wav_16k_path

This output parameter provides the file path to the generated audio file, resampled to 16kHz. This file is saved in the output directory with a unique name based on the provided file_name and a timestamp. The 16kHz resampling ensures compatibility with various audio processing tools and applications.

IF Whisper Speech🌬️ Usage Tips:

To achieve the best results, provide clear and well-punctuated text. This helps the model generate more natural and accurate speech.
Experiment with different cps values to find the optimal speech speed for your specific use case. A lower value will result in slower speech, while a higher value will make the speech faster.
Use the overlap parameter to smooth out transitions between audio chunks, especially for longer texts. This can significantly enhance the naturalness of the generated speech.
If you have specific voice requirements, select an appropriate speaker from the available audio files. This allows you to customize the voice to better match your project's needs.

IF Whisper Speech🌬️ Common Errors and Solutions:

FileNotFoundError: [Errno 2] No such file or directory: 'whisperspeech/audio/<speaker>'

Explanation: This error occurs when the specified speaker file is not found in the whisperspeech/audio directory.
Solution: Ensure that the speaker file exists in the specified directory and that the file name is correct. If the file is missing, you may need to add it to the directory or choose a different speaker.
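A quick way to confirm the speaker file is present before running the node is to check the directory yourself. The sketch below is a hypothetical helper, not part of the extension; it assumes the relative path whisperspeech/audio from the error message and treats "None" as the default voice.

```python
from pathlib import Path


def list_speakers(audio_dir="whisperspeech/audio"):
    """Return the sorted file names found in the speaker directory."""
    directory = Path(audio_dir)
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.iterdir() if p.is_file())


def resolve_speaker(name, audio_dir="whisperspeech/audio"):
    """Return the speaker file path, or None to use the default voice."""
    if name in (None, "None"):
        return None
    path = Path(audio_dir) / name
    if not path.is_file():
        available = ", ".join(list_speakers(audio_dir)) or "no files found"
        raise FileNotFoundError(
            f"Speaker file '{path}' not found. Available: {available}"
        )
    return path


print(list_speakers())  # empty list if the directory does not exist here
```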

ValueError: Invalid cps value

Explanation: This error occurs when the cps value is outside the allowed range of 10.0 to 20.0.
Solution: Adjust the cps value to be within the specified range. The default value is 14.0, which is a good starting point.
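If cps is set programmatically, the value can be checked or clamped before it reaches the node. The helpers below are hypothetical and only encode the documented range of 10.0 to 20.0 and the default of 14.0.

```python
CPS_MIN, CPS_MAX, CPS_DEFAULT = 10.0, 20.0, 14.0


def validate_cps(cps=CPS_DEFAULT):
    """Return cps as a float, or raise ValueError if it is out of range."""
    value = float(cps)
    if not CPS_MIN <= value <= CPS_MAX:
        raise ValueError(
            f"Invalid cps value {value}: expected {CPS_MIN} to {CPS_MAX}"
        )
    return value


def clamp_cps(cps):
    """Pull an out-of-range value back to the nearest allowed bound."""
    return max(CPS_MIN, min(CPS_MAX, float(cps)))


print(validate_cps(14))
print(clamp_cps(25))
```

Validation is the stricter choice when a bad value signals a bug upstream; clamping is more forgiving when the value comes from a slider or user input.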

RuntimeError: CUDA out of memory

Explanation: This error occurs when the GPU runs out of memory during the audio generation process.
Solution: Try shortening the input text or splitting it into smaller pieces, or disable the torch_compile option. You can also run the node on a machine with more GPU memory.
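One way to reduce how much text is processed at once is to split a long passage at sentence boundaries and generate each piece separately. The sketch below is a hypothetical pre-processing helper, not something the node provides; the 400-character limit is an arbitrary assumption, and whether this lowers memory use enough depends on your setup.

```python
import re


def split_text(text, max_chars=400):
    """Group whole sentences into chunks of at most max_chars where possible."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would exceed the limit: start a new chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    # Note: a single sentence longer than max_chars is kept whole.
    return chunks


for chunk in split_text("First sentence. Second sentence! Third one?", max_chars=32):
    print(chunk)
```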

AssertionError: Length of stoks exceeds limit

Explanation: This error occurs when the length of the generated tokens exceeds the allowed limit, especially when overlap is too high.
Solution: Reduce the overlap value to ensure that the length of the tokens stays within the allowed limit. The default value of 100.0 is usually sufficient.

IF Whisper Speech🌬️ Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-IF_AI_WishperSpeechNode
