ComfyUI Node: NTCosyVoiceInstruct2Sampler

Class Name
NTCosyVoiceInstruct2Sampler

Category
Nineton Nodes

Author
muxueChen (Account age: 3245 days)

Extension
CosyVoice2 for ComfyUI

Latest Updated
2025-02-11

GitHub Stars
0.12K

Table of Contents

Description
NTCosyVoiceInstruct2Sampler:
NTCosyVoiceInstruct2Sampler Input Parameters:
NTCosyVoiceInstruct2Sampler Output Parameters:
NTCosyVoiceInstruct2Sampler Usage Tips:
NTCosyVoiceInstruct2Sampler Common Errors and Solutions:
Related Nodes

How to Install CosyVoice2 for ComfyUI

Install this extension through the ComfyUI Manager:

1. Click the Manager button in the main menu.
2. Click the Custom Nodes Manager button.
3. Enter CosyVoice2 for ComfyUI in the search bar and install the extension from the results.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

NTCosyVoiceInstruct2Sampler Description

A node that generates speech audio from text, guided by an instruction prompt, using text-to-speech (TTS).

NTCosyVoiceInstruct2Sampler:

NTCosyVoiceInstruct2Sampler converts text into speech using text-to-speech (TTS) synthesis. In addition to the text to be spoken, it accepts an instruction prompt that steers how the speech is delivered, which gives finer control over the result than text alone. The node belongs to the Nineton Nodes category of the CosyVoice2 for ComfyUI extension.

NTCosyVoiceInstruct2Sampler Input Parameters:

audio

The audio parameter is a required input that supplies an audio waveform together with its sample rate. It serves as the base audio from which the node generates the new speech, and the generated speech is aligned with its characteristics. The waveform must be a tensor, and the sample rate must be an integer number of samples per second.

speed

The speed parameter sets the tempo of the generated speech. It is a float with a default of 1.0 and a range of 0.5 to 1.5 in steps of 0.1. Values below 1.0 slow the speech down and values above 1.0 speed it up, which helps when the output has to match timing requirements or artistic preferences.
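The range and step described above can be sketched as a small helper. This is a hypothetical pre-processing function: `snap_speed` and the three constants are illustrative names, not part of the extension, and the choice to clamp out-of-range values rather than reject them is an assumption.

```python
# Limits taken from the parameter description; the names are hypothetical.
SPEED_MIN, SPEED_MAX, SPEED_STEP = 0.5, 1.5, 0.1


def snap_speed(value: float) -> float:
    """Clamp value to [SPEED_MIN, SPEED_MAX] and snap it to the 0.1 grid."""
    clamped = max(SPEED_MIN, min(SPEED_MAX, value))
    # Count how many whole steps above the minimum the value sits.
    steps = round((clamped - SPEED_MIN) / SPEED_STEP)
    # Round to one decimal to hide float noise; this matches the 0.1 step.
    return round(SPEED_MIN + steps * SPEED_STEP, 1)
```

For example, `snap_speed(2.0)` falls back to the upper limit and `snap_speed(0.74)` snaps to the nearest step below it.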

text

The text parameter is a multiline string containing the text to be spoken. It defines the content of the speech output. Because the input is multiline, longer passages can be entered directly.

instruct

The instruct parameter is a multiline string with instructions that shape how the text is spoken, such as its style or tone. It offers finer control than the text alone and makes the output more context-aware.
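The four inputs described above map naturally onto ComfyUI's usual `INPUT_TYPES` declaration. The following is a minimal sketch of how such a node could declare them, not the extension's actual source. The class name `InstructSamplerSketch` is hypothetical, and listing all four inputs under `required` is an assumption (the description only states this explicitly for audio).

```python
class InstructSamplerSketch:
    """Hypothetical stand-in showing how the four inputs could be declared."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                # Audio waveform plus sample rate.
                "audio": ("AUDIO",),
                # Default, range and step as given in the description.
                "speed": ("FLOAT", {"default": 1.0, "min": 0.5, "max": 1.5, "step": 0.1}),
                # Text to be spoken.
                "text": ("STRING", {"multiline": True}),
                # Instructions that shape style or tone.
                "instruct": ("STRING", {"multiline": True}),
            }
        }

    # Single audio output, named as in the output section of this page.
    RETURN_TYPES = ("AUDIO",)
    RETURN_NAMES = ("tts_speech",)
    CATEGORY = "Nineton Nodes"
```

The class runs without ComfyUI installed because it only returns plain dictionaries and tuples. It omits the processing function a real node would also define.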

NTCosyVoiceInstruct2Sampler Output Parameters:

tts_speech

The tts_speech output is the generated audio, returned as a dictionary containing the waveform and the sample rate. It is the speech synthesized from the provided text and instructions at the specified speed. The waveform is a tensor that other audio nodes or applications can process further, and the sample rate tells downstream consumers how to play it back correctly.
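As an illustration of how the waveform and sample rate work together, the sketch below derives the clip length from such a dictionary. It assumes the keys are named `waveform` and `sample_rate` (ComfyUI's usual audio convention) and that time is the last axis of the waveform. `describe_audio` is a hypothetical helper, not part of the extension.

```python
def describe_audio(audio: dict) -> dict:
    """Summarize an audio dictionary without modifying it."""
    waveform = audio["waveform"]        # assumed key name
    sample_rate = audio["sample_rate"]  # assumed key name
    # Assumption: the last axis of the waveform holds the samples over time.
    num_samples = waveform.shape[-1]
    return {
        "sample_rate": sample_rate,
        "num_samples": num_samples,
        "duration_s": num_samples / sample_rate,
    }
```

The helper only reads the `shape` attribute, so it works with any tensor-like object that exposes one.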

NTCosyVoiceInstruct2Sampler Usage Tips:

For the best results, use a high-quality audio input whose sample rate matches the one your project needs. This helps preserve the clarity and fidelity of the generated speech.
Experiment with the speed parameter to find the right tempo. It noticeably changes how the synthesized speech comes across, especially in artistic or narrative contexts.

NTCosyVoiceInstruct2Sampler Common Errors and Solutions:

"Invalid audio input format"

Explanation: This error occurs when the audio input does not conform to the expected format, such as an incorrect waveform tensor or sample rate.
Solution: Ensure that the audio input is a valid tensor with the correct dimensions and that the sample rate is an integer representing the number of samples per second.

"Text input is empty"

Explanation: This error is triggered when the text parameter is left empty, which prevents the node from generating any speech output.
Solution: Provide a valid string in the text parameter to enable the node to process and generate the desired speech.

"Instruct input is empty"

Explanation: This error occurs when the instruct parameter is left empty, so the node has no instructions with which to customize the speech.
Solution: Add relevant instructions to the instruct parameter to set the style and context of the generated speech.
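The three conditions above can be checked before a workflow is queued. The sketch below is a hypothetical pre-check, not the node's own validation logic, which this page does not show. `check_inputs` is an illustrative name, and the `waveform` and `sample_rate` key names are assumed from ComfyUI's usual audio convention.

```python
def check_inputs(audio, text: str, instruct: str) -> list:
    """Return the list of problems found; an empty list means none."""
    problems = []
    # The audio must be a dict holding a tensor-like waveform and an
    # integer sample rate greater than zero.
    audio_ok = (
        isinstance(audio, dict)
        and hasattr(audio.get("waveform"), "shape")
        and isinstance(audio.get("sample_rate"), int)
        and audio["sample_rate"] > 0
    )
    if not audio_ok:
        problems.append("Invalid audio input format")
    # Whitespace-only strings are treated as empty.
    if not text.strip():
        problems.append("Text input is empty")
    if not instruct.strip():
        problems.append("Instruct input is empty")
    return problems
```

Returning a list instead of raising on the first failure reports every problem in one pass.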

NTCosyVoiceInstruct2Sampler Related Nodes

Go back to the extension to check out more related nodes.

CosyVoice2 for ComfyUI
