RunComfy

ComfyUI Node: F5-TTS Audio

Class Name: F5TTSAudio
Category: audio
Author: niknah (Account age: 5004 days)
Extension: ComfyUI-F5-TTS
Latest Updated: 2025-04-05
GitHub Stars: 0.16K

Table of Contents

Description
F5-TTS Audio
F5-TTS Audio Input Parameters
F5-TTS Audio Output Parameters
F5-TTS Audio Usage Tips
F5-TTS Audio Common Errors and Solutions
Related Nodes

How to Install ComfyUI-F5-TTS

Install this extension through the ComfyUI Manager by searching for ComfyUI-F5-TTS:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-F5-TTS in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

F5-TTS Audio Description

Generate high-quality speech audio from text using text-to-speech (TTS) models, with controls for customizing the synthesized speech.

F5-TTS Audio:

F5TTSAudio generates speech audio from text using text-to-speech (TTS) models. It takes a reference audio clip and its corresponding text to guide the voice, then synthesizes the text you supply, aiming for natural human intonation and rhythm. Typical applications include voiceovers, audiobooks, and interactive AI systems. You can adjust the speech speed and the cross-fade duration between audio segments to fit the output to your requirements.

F5-TTS Audio Input Parameters:

ref_audio_orig

The original reference audio that guides the synthesis. It serves as a template for the generated speech and helps keep voice characteristics and style consistent, so its quality has a significant impact on the final output.

ref_text

The text that corresponds to the reference audio. It is used to align the generated speech with the intended content and style, so it should match what is spoken in the reference audio.

gen_text

The text to convert into speech. This is the primary content to be synthesized; clear, well-structured text improves the intelligibility and naturalness of the generated speech.

model

Selects the TTS model used for synthesis. Options typically include "F5-TTS" and "E2-TTS", which differ in characteristics and capabilities, so the choice affects the quality and style of the synthesized audio.

remove_silence

A boolean that determines whether silence is removed from the generated audio. Enabling it produces more concise, fluid output, which is particularly useful for applications that require continuous speech.
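To make the idea of silence removal concrete, here is a minimal sketch that drops long runs of near-silent samples from a list of floats. It is not the node's implementation: the helper name `strip_silence`, the amplitude threshold, and the minimum run length are all assumptions chosen for illustration.

```python
def strip_silence(samples, threshold=0.01, min_run=4):
    """Drop runs of at least `min_run` consecutive samples whose absolute
    value is below `threshold`. Hypothetical helper, for illustration only."""
    kept = []
    run = []  # current run of quiet samples
    for s in samples:
        if abs(s) < threshold:
            run.append(s)
        else:
            # A short quiet run is a normal part of speech, so keep it.
            if len(run) < min_run:
                kept.extend(run)
            run = []
            kept.append(s)
    # Handle a quiet run at the very end of the signal.
    if len(run) < min_run:
        kept.extend(run)
    return kept


audio = [0.5, 0.0, 0.0, 0.0, 0.0, 0.0, -0.4, 0.0, 0.3]
trimmed = strip_silence(audio)  # the five-sample pause is removed
```

Short pauses survive because only runs of `min_run` or more quiet samples are dropped; a real implementation would work on much longer windows and typically on decibel levels.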

cross_fade_duration

The duration, in seconds, of the cross-fade between audio segments. Cross-fading smooths the transitions between segments and reduces abrupt changes in the audio. The default value is typically 0.15 seconds.
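A cross-fade overlaps the end of one segment with the start of the next and blends them. The sketch below shows a simple linear cross-fade over plain Python lists; the function name `cross_fade` and the linear weighting are assumptions for illustration, and the node may blend segments differently.

```python
def cross_fade(a, b, sample_rate, duration=0.15):
    """Join segments `a` and `b` with a linear cross-fade of `duration`
    seconds. Hypothetical helper, not the node's own code."""
    # Number of overlapping samples, limited by the shorter segment.
    n = min(round(duration * sample_rate), len(a), len(b))
    if n <= 0:
        return list(a) + list(b)
    overlap = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of `b` rises across the overlap
        overlap.append(a[len(a) - n + i] * (1 - w) + b[i] * w)
    return list(a[:-n]) + overlap + list(b[n:])


loud = [1.0] * 10
quiet = [0.0] * 10
# A tiny sample rate keeps the numbers readable: 0.15 s * 20 Hz = 3 samples.
joined = cross_fade(loud, quiet, sample_rate=20, duration=0.15)
```

The joined signal is shorter than the two segments laid end to end by the length of the overlap, which is worth remembering when estimating the total duration of long passages.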

speed

Controls the playback speed of the synthesized audio, letting you speed up or slow down the speech to match the desired pacing and tempo. The default is usually 1, which represents normal speed.

F5-TTS Audio Output Parameters:

final_sample_rate

The sample rate of the synthesized audio, that is, the number of audio samples per second. Playback systems and downstream processing need this value to interpret the waveform correctly and preserve audio quality.

final_wave

The waveform data of the synthesized speech. This is the primary output of the TTS process and can be played back, processed further, or stored.
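A waveform and its sample rate together are enough to store or play the audio. The sketch below encodes samples as a 16-bit PCM WAV using only the Python standard library. It assumes the waveform can be treated as a flat sequence of mono floats in the range -1.0 to 1.0 and uses 24000 Hz purely as an example rate; the helper name `to_wav_bytes` is hypothetical.

```python
import io
import struct
import wave


def to_wav_bytes(samples, sample_rate):
    """Encode mono float samples in [-1.0, 1.0] as 16-bit PCM WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)            # mono
        wf.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wf.setframerate(sample_rate)
        clipped = [max(-1.0, min(1.0, s)) for s in samples]
        ints = [int(s * 32767) for s in clipped]
        wf.writeframes(struct.pack("<%dh" % len(ints), *ints))
    return buf.getvalue()


# Example values only: half a second of silence at an assumed 24000 Hz.
data = to_wav_bytes([0.0] * 12000, 24000)

# Duration in seconds is the frame count divided by the sample rate.
with wave.open(io.BytesIO(data), "rb") as wf:
    duration = wf.getnframes() / wf.getframerate()
```

Writing to a file instead of memory only requires passing a file path to `wave.open` in place of the `BytesIO` buffer.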

spectrogram_path

The file path to a spectrogram image of the synthesized audio. A spectrogram shows how the frequency content of the audio signal varies over time, which is useful for analyzing the audio and verifying synthesis quality.
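To show what a spectrogram contains, the sketch below computes one by hand: it slices a signal into overlapping frames and takes the magnitude of a discrete Fourier transform of each frame. This is a deliberately tiny, slow illustration in pure Python, not how the node produces its image; the function name, frame size, and hop size are assumptions.

```python
import cmath
import math


def spectrogram(samples, frame_size=8, hop=4):
    """Return a list of frames, each a list of DFT magnitudes for
    frequency bins 0 .. frame_size // 2. Illustrative only."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = samples[start:start + frame_size]
        mags = []
        for k in range(frame_size // 2 + 1):
            acc = sum(
                x * cmath.exp(-2j * math.pi * k * n / frame_size)
                for n, x in enumerate(frame)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


# A pure tone that completes 2 cycles per 8-sample frame.
tone = [math.sin(2 * math.pi * 2 * n / 8) for n in range(32)]
spec = spectrogram(tone)
# Strongest frequency bin in each frame.
peak_bins = [frame.index(max(frame)) for frame in spec]
```

Each row of `spec` is one column of the spectrogram image: time runs across frames and frequency runs across bins. For a steady tone the peak stays in the same bin in every frame, whereas speech shows energy moving between bins over time.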

F5-TTS Audio Usage Tips:

Ensure that the reference audio and text are well-aligned to achieve the best synthesis results, as discrepancies can lead to unnatural speech output.
Experiment with different models to find the one that best suits your needs, as each model may offer unique advantages in terms of voice quality and style.
Use the cross_fade_duration parameter to smooth out transitions in the audio, especially when synthesizing long passages of text.

F5-TTS Audio Common Errors and Solutions:

"Model not found"

Explanation: This error occurs when the specified TTS model is not available or incorrectly specified.
Solution: Verify that the model name is correctly entered and that the model is installed and accessible in your environment.

"Audio processing failed"

Explanation: This error indicates a problem during the audio synthesis process, possibly due to incompatible input parameters or system limitations.
Solution: Check the input parameters for correctness and ensure that your system meets the necessary requirements for audio processing. Adjust parameters like speed and cross-fade duration if needed.
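Checking inputs before running the node can catch both kinds of problem early. The sketch below is a hypothetical pre-flight check: the model names come from the options mentioned above, while the function name `check_inputs` and the specific rules (non-empty text, positive speed, non-negative cross-fade) are assumptions rather than the node's documented limits.

```python
# Model names quoted in the parameter description; the real list may differ.
KNOWN_MODELS = {"F5-TTS", "E2-TTS"}


def check_inputs(model, gen_text, speed, cross_fade_duration):
    """Return a list of human-readable problems; empty means none found."""
    problems = []
    if model not in KNOWN_MODELS:
        problems.append(f"unknown model: {model!r}")
    if not gen_text.strip():
        problems.append("gen_text is empty")
    if speed <= 0:
        problems.append("speed must be positive")
    if cross_fade_duration < 0:
        problems.append("cross_fade_duration must not be negative")
    return problems


ok = check_inputs("F5-TTS", "Hello there.", 1.0, 0.15)
bad = check_inputs("f5", "   ", 0, -1)
```

Returning a list of problems rather than raising on the first one makes it easier to fix several input mistakes in a single pass.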

F5-TTS Audio Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-F5-TTS
