ComfyUI Node: Zonos Generate

Class Name: ZonosGenerate
Category: audio
Author: BuffMcBigHuge (Account age: 3170 days)
Extension: ComfyUI-Zonos
Last Updated: 2025-03-07
GitHub Stars: 0.05K

How to Install ComfyUI-Zonos

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-Zonos in the search bar and install the extension.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

Zonos Generate Description

Node for generating expressive audio with the text-to-speech models in the Zonos suite.

Zonos Generate:

ZonosGenerate is the audio generation node of the Zonos suite, a set of text-to-speech (TTS) nodes aimed at high-quality, emotionally expressive speech. It synthesizes an audio segment from the conditioning inputs it receives, optionally continuing from an existing audio prefix, and returns the waveform together with its sampling rate. Parameters such as cfg_scale, max_new_tokens, and sampling_params control how closely the output follows the conditioning, how long it is, and how varied it sounds.

Zonos Generate Input Parameters:

prefix_conditioning

The prefix_conditioning parameter is a tensor that serves as the initial condition for audio generation. It sets the context of the output and influences its overall tone and style. It typically has the shape [bsz, cond_seq_len, d_model], where bsz is the batch size, cond_seq_len is the sequence length of the conditioning input, and d_model is the model's hidden dimension. The values in this tensor directly shape the characteristics of the generated audio, including its emotional delivery.

audio_prefix_codes

The audio_prefix_codes parameter is an optional tensor that provides additional audio context to the generation process. It has a shape of [bsz, 9, prefix_audio_seq_len] and can be used to guide the model in producing audio that aligns with specific audio patterns or sequences. This parameter is particularly useful when you want to maintain consistency with existing audio content or when you need to integrate specific audio motifs into the generated output. If not provided, the model will rely solely on the prefix_conditioning for guidance.

max_new_tokens

The max_new_tokens parameter sets the maximum number of new tokens the model may generate during synthesis. The default is 86 * 30 (2580 tokens); the form of that expression suggests about 86 tokens per second of audio, for roughly 30 seconds of output. Higher values allow longer clips and lower values cap the output sooner, so adjust this parameter to fit the duration your project needs.
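The relationship between max_new_tokens and clip length can be sketched as simple arithmetic. This is a minimal illustration, assuming the rate of 86 tokens per second implied by the default of 86 * 30; the helper names are hypothetical and not part of the node.

```python
# Assumption: about 86 audio tokens per second, inferred from the
# default being written as 86 * 30. Verify against your model.
TOKENS_PER_SECOND = 86


def tokens_for_duration(seconds: float) -> int:
    """Return a max_new_tokens budget for the requested clip length."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return int(round(TOKENS_PER_SECOND * seconds))


def duration_for_tokens(max_new_tokens: int) -> float:
    """Return the approximate clip length in seconds for a token budget."""
    return max_new_tokens / TOKENS_PER_SECOND


print(tokens_for_duration(30))   # 2580, the documented default
print(duration_for_tokens(860))  # 10.0
```

Because max_new_tokens is an upper bound, the actual clip may be shorter if generation stops early.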

cfg_scale

The cfg_scale parameter is a float that controls the strength of classifier-free guidance, that is, how strongly the model is pushed toward the input conditions. The default value is 2.0. A higher cfg_scale makes the output follow the conditioning more closely, usually at the cost of variety, while a lower value loosens that adherence and allows more varied results. This parameter is key for fine-tuning the expressiveness of the generated audio.
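To make the effect of cfg_scale concrete, here is a minimal sketch of the standard classifier-free guidance formula on plain Python lists. It assumes the node follows the usual formulation (unconditional prediction plus a scaled difference); the function name and the toy logits are illustrative, not taken from the extension.

```python
from typing import List


def apply_cfg(cond_logits: List[float],
              uncond_logits: List[float],
              cfg_scale: float) -> List[float]:
    """Combine conditional and unconditional logits with guidance.

    guided = uncond + cfg_scale * (cond - uncond)
    """
    if len(cond_logits) != len(uncond_logits):
        raise ValueError("logit vectors must have the same length")
    return [u + cfg_scale * (c - u)
            for c, u in zip(cond_logits, uncond_logits)]


cond = [2.0, 0.0, -1.0]    # toy logits with conditioning
uncond = [1.0, 0.0, 0.0]   # toy logits without conditioning

print(apply_cfg(cond, uncond, 1.0))  # [2.0, 0.0, -1.0], plain conditional
print(apply_cfg(cond, uncond, 2.0))  # [3.0, 0.0, -2.0], differences amplified
```

With a scale of 1.0 the result equals the conditional logits; larger scales exaggerate whatever the conditioning changed, which is why higher values track the input conditions more tightly.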

batch_size

The batch_size parameter specifies the number of audio samples to generate in a single batch. It is set to a default value of 1, meaning that the model will generate one audio sample per execution. Increasing the batch size can speed up the generation process when multiple samples are needed, but it may also require more computational resources. This parameter is important for optimizing the efficiency of the audio generation workflow.

sampling_params

The sampling_params parameter is a dictionary of additional settings for the sampling process. By default it contains a min_p value of 0.1. In min-p sampling, tokens whose probability falls below min_p times the probability of the most likely token are discarded before sampling, so a higher min_p gives more conservative, coherent output and a lower min_p allows more diversity. Use this parameter to tune the balance between diversity and coherence in the audio output.
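The min_p setting can be illustrated with a small, self-contained filter over a probability list. This is a sketch of min-p sampling in general, not the extension's own code; the function name and the example distribution are hypothetical.

```python
from typing import List


def min_p_filter(probs: List[float], min_p: float = 0.1) -> List[float]:
    """Zero out tokens below min_p * max(probs), then renormalize."""
    if not probs:
        raise ValueError("probs must not be empty")
    threshold = min_p * max(probs)
    kept = [p if p >= threshold else 0.0 for p in probs]
    total = sum(kept)  # always > 0 because the top token is kept
    return [p / total for p in kept]


probs = [0.6, 0.3, 0.05, 0.05]
filtered = min_p_filter(probs, min_p=0.1)

# The threshold is 0.1 * 0.6 = 0.06, so both 0.05 entries are removed
# and the remaining mass is rescaled to sum to 1.
print(filtered)
```

Raising min_p toward 1.0 keeps only tokens nearly as likely as the top choice; lowering it toward 0.0 keeps almost everything.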

progress_bar

The progress_bar parameter is a boolean that determines whether a progress bar is displayed during the audio generation process. With a default value of True, this parameter provides visual feedback on the progress of the generation, making it easier to monitor and manage longer tasks. Disabling the progress bar can be useful in automated or batch processing scenarios where visual feedback is not necessary.

disable_torch_compile

The disable_torch_compile parameter is a boolean that controls whether PyTorch compilation is skipped during generation. By default it is set to False, so the model uses Torch's compilation features for better performance. Setting it to True can help when debugging or when compilation fails on a particular hardware or software configuration.

callback

The callback parameter is an optional callable function that can be used to execute custom code during the audio generation process. It accepts a tensor, an integer, and another integer as inputs, providing a flexible mechanism for integrating additional logic or monitoring into the generation workflow. This parameter is particularly useful for advanced users who need to implement custom behaviors or track specific metrics during the audio synthesis.
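A callback with the described signature might look like the sketch below. The meaning of the two integers (current step and total steps) and the boolean return value are assumptions, since the description above only states the argument types; the loop at the end is a stand-in for the generator calling the callback.

```python
from typing import Any, Callable, List


def make_progress_callback(log: List[str]) -> Callable[[Any, int, int], bool]:
    """Build a callback that records progress every 10 steps.

    Assumed arguments: (frame tensor, current step, total steps).
    """
    def on_step(frame: Any, step: int, total_steps: int) -> bool:
        if total_steps > 0 and (step % 10 == 0 or step == total_steps):
            percent = 100.0 * step / total_steps
            log.append(f"step {step}/{total_steps} ({percent:.0f}%)")
        # Assumption: returning True tells the generator to keep going.
        return True

    return on_step


# Stand-in for the generation loop; None replaces the tensor argument.
progress_log: List[str] = []
callback = make_progress_callback(progress_log)
for step in range(1, 31):
    callback(None, step, 30)

print(progress_log)
```

Keeping the callback cheap matters, because it would run once per generated step.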

Zonos Generate Output Parameters:

final_wave

The final_wave output is a tensor containing the generated audio waveform. It is the primary output of the ZonosGenerate node and can be played back or passed to other nodes for further processing.

sampling_rate

The sampling_rate output is an integer giving the sampling rate of the generated audio, in samples per second. It is derived from the model's autoencoder. Downstream nodes and players need this value to play final_wave at the correct speed and pitch.
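As an illustration of how the two outputs fit together, the sketch below writes a mono waveform to 16-bit WAV with Python's standard wave module. It assumes the waveform has already been converted to a flat list of floats in [-1.0, 1.0]; the 44100 value and the sine tone are placeholders for the node's real outputs, and write_wav is a hypothetical helper.

```python
import io
import math
import struct
import wave
from typing import BinaryIO, Sequence


def write_wav(samples: Sequence[float], sampling_rate: int,
              target: BinaryIO) -> None:
    """Write mono float samples in [-1, 1] as 16-bit PCM WAV."""
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit
        wav.setframerate(sampling_rate)
        frames = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
            for s in samples
        )
        wav.writeframes(frames)


# Placeholder data: 0.1 s of a 440 Hz tone instead of final_wave.
sampling_rate = 44100  # placeholder; use the value the node returns
tone = [math.sin(2 * math.pi * 440 * n / sampling_rate)
        for n in range(sampling_rate // 10)]

buffer = io.BytesIO()
write_wav(tone, sampling_rate, buffer)
print(len(tone) / sampling_rate)  # clip duration in seconds: 0.1
```

Writing the file with a different rate than the one the node reports would change both the playback speed and the pitch.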

Zonos Generate Usage Tips:

  • Experiment with different prefix_conditioning values to explore a wide range of emotional expressions in your audio outputs.
  • Adjust the cfg_scale parameter to find the right balance between creativity and adherence to input conditions, depending on your project's needs.
  • Utilize the audio_prefix_codes parameter to maintain consistency with existing audio content or to incorporate specific audio motifs.
  • Increase the batch_size if you need to generate multiple audio samples quickly, but be mindful of the computational resources required.

Zonos Generate Common Errors and Solutions:

"Invalid tensor shape for prefix_conditioning"

  • Explanation: This error occurs when the prefix_conditioning tensor does not match the expected shape [bsz, cond_seq_len, d_model].
  • Solution: Ensure that the input tensor is correctly shaped and matches the model's requirements.
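A shape check of this kind can be sketched in plain Python, operating on a shape tuple such as tuple(tensor.shape). The helper name and the d_model value of 1024 are hypothetical; substitute the dimension your model actually uses.

```python
from typing import Sequence


def check_prefix_conditioning_shape(shape: Sequence[int],
                                    d_model: int) -> None:
    """Raise ValueError unless shape is [bsz, cond_seq_len, d_model]."""
    if len(shape) != 3:
        raise ValueError(
            "expected 3 dimensions [bsz, cond_seq_len, d_model], "
            f"got {len(shape)}: {tuple(shape)}"
        )
    bsz, cond_seq_len, dim = shape
    if bsz < 1 or cond_seq_len < 1:
        raise ValueError(f"empty batch or sequence: {tuple(shape)}")
    if dim != d_model:
        raise ValueError(f"last dimension is {dim}, expected {d_model}")


# Passes silently for a well-formed shape (1024 is a placeholder).
check_prefix_conditioning_shape((1, 64, 1024), d_model=1024)

# A missing batch dimension is reported with a readable message.
try:
    check_prefix_conditioning_shape((64, 1024), d_model=1024)
except ValueError as err:
    print(err)
```

Running such a check before generation turns a shape mismatch into an immediate, descriptive error rather than a failure deep inside the model.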

"Audio generation exceeded max_new_tokens limit"

  • Explanation: The generated audio segment exceeds the specified max_new_tokens limit.
  • Solution: Increase the max_new_tokens parameter to allow for longer audio generation, or adjust the input conditions to fit within the current limit.

"Torch compilation error"

  • Explanation: An error related to Torch's compilation features, possibly due to hardware or software compatibility issues.
  • Solution: Set disable_torch_compile to True to bypass Torch compilation and resolve compatibility problems.
