A node for cross-lingual text-to-speech synthesis that uses the CosyVoice2 model to provide multilingual support.
The NTCosyVoiceCrossLingualSampler is a node designed for cross-lingual text-to-speech synthesis. It uses the CosyVoice2 model to convert text input into speech, allowing seamless language transitions and natural-sounding audio output. The node is particularly useful for applications that need multilingual support, since it can generate speech in different languages with a single model, and because it relies on a pre-trained model it delivers high-quality synthesis with minimal setup. Its primary function is to take textual input and produce the corresponding audio output, making it a practical tool for developers and artists who want to integrate voice synthesis into their projects.
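For orientation, the following is a minimal sketch of cross-lingual synthesis using the upstream CosyVoice2 Python API from the FunAudioLLM/CosyVoice project; the model path, prompt file, and sample text are assumptions for illustration, and this is not a description of the node's exact internals.

```python
from cosyvoice.cli.cosyvoice import CosyVoice2
from cosyvoice.utils.file_utils import load_wav

# Load a pre-trained CosyVoice2 model (path is an assumption; point it at your own install).
cosyvoice = CosyVoice2('pretrained_models/CosyVoice2-0.5B')

# A short reference recording whose voice characteristics the synthesis should follow.
prompt_speech_16k = load_wav('prompt.wav', 16000)

# Cross-lingual synthesis: the text language does not have to match the prompt language.
for chunk in cosyvoice.inference_cross_lingual(
        'Bonjour, ceci est un exemple de synthèse vocale.',
        prompt_speech_16k,
        stream=False,
        speed=1.0):
    waveform = chunk['tts_speech']            # tensor of synthesized audio samples
    print(waveform.shape, cosyvoice.sample_rate)
```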
The audio parameter is a required input that provides the initial audio data to the node. It is expected to be a dictionary containing a waveform and a sample rate. This audio serves as a prompt for the cross-lingual synthesis process, helping the model generate speech that matches the characteristics of the input audio. The waveform should be a tensor, and the sample rate should be an integer giving the number of samples per second.
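As a sketch (not the node's own loader), an audio dictionary in this shape could be built from a file with torchaudio; the waveform/sample_rate keys and the extra batch dimension follow the usual ComfyUI AUDIO convention and are assumptions about how this node consumes the input.

```python
import torchaudio

def load_audio_prompt(path: str) -> dict:
    # torchaudio.load returns a [channels, samples] tensor plus the sample rate.
    waveform, sample_rate = torchaudio.load(path)
    # ComfyUI audio dicts usually carry a leading batch dimension: [batch, channels, samples].
    return {"waveform": waveform.unsqueeze(0), "sample_rate": sample_rate}

audio = load_audio_prompt("prompt.wav")
print(audio["waveform"].shape, audio["sample_rate"])
```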
The speed parameter controls the playback speed of the synthesized speech. It is a floating-point value ranging from 0.5 to 1.5, with a default of 1.0. Adjusting it lets you speed up or slow down the speech output, which is useful for matching the tempo of the audio to specific requirements or preferences. A lower value slows the speech down, while a higher value speeds it up.
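If the speed value comes from user input or another node, a small guard like the one below (a hypothetical helper, not part of the node) keeps it inside the accepted 0.5 to 1.5 range:

```python
def clamp_speed(speed: float, low: float = 0.5, high: float = 1.5) -> float:
    # Keep the playback-speed factor inside the range the node accepts.
    return max(low, min(high, float(speed)))

print(clamp_speed(2.0))   # 1.5 (clamped down)
print(clamp_speed(0.25))  # 0.5 (clamped up)
print(clamp_speed(1.0))   # 1.0 (default, unchanged)
```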
The text parameter is a required input containing the textual content to be converted into speech. It is a string that supports multiline input, allowing longer passages to be synthesized. This parameter defines the content of the speech output; the node processes the text to generate the corresponding audio in the desired language.
The tts_speech output parameter is the result of the text-to-speech synthesis. It is an audio object containing the waveform of the synthesized speech and the sample rate at which it was generated. This output can be used for audio playback or further processing, providing the final speech output for multimedia projects.
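To inspect or reuse the result outside ComfyUI, a sketch like the following (assuming the same waveform/sample_rate dictionary layout as the audio input) writes the synthesized speech to a WAV file with torchaudio:

```python
import torchaudio

def save_tts_speech(tts_speech: dict, path: str) -> None:
    waveform = tts_speech["waveform"]
    # Drop the batch dimension if present: [batch, channels, samples] -> [channels, samples].
    if waveform.dim() == 3:
        waveform = waveform[0]
    torchaudio.save(path, waveform, tts_speech["sample_rate"])

# 'tts_speech' stands for the dictionary produced by the node's output.
# save_tts_speech(tts_speech, "output.wav")
```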
Experiment with the speed parameter to find the optimal playback speed for your specific application, as this can significantly affect the naturalness and intelligibility of the synthesized speech. Ensure that the model directory variable is set correctly in your environment so the pre-trained CosyVoice2 model can be loaded.