Powerful text-to-speech node with advanced voice synthesis for high-quality spoken audio in AI projects.
PiperTTS is a powerful text-to-speech (TTS) node designed to convert written text into high-quality spoken audio. This node leverages advanced voice synthesis models to generate natural-sounding speech, making it an invaluable tool for AI artists looking to add a vocal element to their projects. Whether you are creating voiceovers for animations, generating audio for interactive applications, or simply experimenting with TTS technology, PiperTTS provides a seamless and efficient way to produce professional-grade audio. The node supports multiple voices and quality settings, allowing you to customize the output to suit your specific needs. By automating the process of downloading and managing voice models, PiperTTS ensures that you have access to the latest and most accurate TTS capabilities without the need for extensive technical knowledge.
The text parameter is a string input containing the written content you want to convert into speech. This parameter is essential, as it forms the basis of the audio output. The text can be multiline, allowing longer passages to be synthesized. Make sure the text is not empty: the node raises a ValueError if no text is provided. There is no fixed minimum or maximum length, but practical limits may be imposed by the system's memory and processing capabilities.
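The empty-text check described above can be sketched as follows. This is a hedged illustration only: the function name and error message are assumptions, not the node's actual code.

```python
# Hypothetical sketch of PiperTTS's empty-text guard; the function name
# and error message are assumptions, not the node's real implementation.
def validate_text(text: str) -> str:
    """Return the text stripped of surrounding whitespace, or raise ValueError."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("PiperTTS received empty text; provide content to synthesize.")
    return cleaned
```

Whitespace-only input is treated as empty here, which matches the spirit of the rule that the text must not be empty.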
The voice parameter selects the voice used for the TTS output. It is populated with a list of available voices (quality specifications excluded), retrieved dynamically from the TTS voice repository, so you always have access to a variety of options. The choice of voice significantly affects the character and tone of the synthesized speech. There is no default value; you must select a voice from the provided list.
The quality parameter determines the quality level of the synthesized speech. It offers three options: "high", "medium", and "low", with "high" as the default. Higher quality settings produce clearer, more natural speech at the cost of increased processing time and resource usage; choose the level that fits your project requirements and available computational resources.
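Piper publishes voice models under names of the form voice-quality (for example, en_US-lessac-medium in Piper's VOICES.md list). The sketch below assumes that naming convention to show how voice and quality might be combined into a model identifier; the node's internal logic may differ.

```python
# Sketch only: assumes Piper's "<voice>-<quality>" model naming
# (e.g. "en_US-lessac-high"); not the node's actual code.
VALID_QUALITIES = ("high", "medium", "low")

def model_id(voice: str, quality: str = "high") -> str:
    """Combine a voice name and quality level into a Piper model identifier."""
    if quality not in VALID_QUALITIES:
        raise ValueError(f"quality must be one of {VALID_QUALITIES}, got {quality!r}")
    return f"{voice}-{quality}"
```

For example, `model_id("en_US-lessac")` returns `"en_US-lessac-high"`, reflecting the "high" default described above.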
The audio_path output is a string giving the file path to the generated audio: a .wav file containing the synthesized speech produced from the input text, voice, and quality settings. The file is saved in a designated output directory, making it easy to locate. The audio_path is how you access the generated audio, whether for playback, further processing, or integration into other applications.
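A minimal sketch of how such an output path could be derived. The directory layout and file-naming scheme here are assumptions for illustration; the node's actual scheme is not documented in this section.

```python
import os
import time

# Hypothetical path builder: the real PiperTTS output directory and
# naming scheme are assumptions, not the node's actual behavior.
def build_audio_path(output_dir: str, stem: str = "piper_tts") -> str:
    """Return a timestamped .wav path inside the given output directory."""
    os.makedirs(output_dir, exist_ok=True)  # ensure the output directory exists
    filename = f"{stem}_{int(time.time())}.wav"
    return os.path.join(output_dir, filename)
```

Timestamping the filename avoids overwriting earlier results when the node is run repeatedly.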
Usage Tips:
- Ensure the text parameter is not empty to avoid errors and ensure meaningful audio output.
- Experiment with different voice options to find the one that best suits the tone and style of your project.
- Choose the quality setting based on your needs: use "high" for the best audio quality and "low" for faster processing times.
- Use the audio_path to easily locate and manage your generated audio files.

Common Errors and Solutions:
- A ValueError is raised when the text parameter is left empty. Provide a non-empty string for the text parameter.
- "<voice_with_quality> does not exist. Refer to https://github.com/rhasspy/piper/blob/master/VOICES.md": this error occurs when the combination of voice and quality does not match any available models. Consult the linked voice list and select a valid voice and quality combination.

© Copyright 2024 RunComfy. All Rights Reserved.