
ComfyUI Node: NTCosyVoiceCrossLingualSampler

Class Name

NTCosyVoiceCrossLingualSampler

Category
Nineton Nodes
Author
muxueChen (Account age: 3218 days)
Extension
CosyVoice2 for ComfyUI
Last Updated
2025-02-11
GitHub Stars
0.1K

How to Install CosyVoice2 for ComfyUI

Install this extension via the ComfyUI Manager by searching for CosyVoice2 for ComfyUI:
  1. Click the Manager button in the main menu.
  2. Select Custom Nodes Manager.
  3. Enter CosyVoice2 for ComfyUI in the search bar and install the extension.
After installation, click the Restart button to restart ComfyUI, then refresh your browser to clear the cache and load the updated list of nodes.

NTCosyVoiceCrossLingualSampler Description

Cross-lingual text-to-speech node that uses the CosyVoice2 model to synthesize speech in multiple languages from a single reference voice prompt.

NTCosyVoiceCrossLingualSampler:

The NTCosyVoiceCrossLingualSampler node performs cross-lingual text-to-speech synthesis. It uses the CosyVoice2 model to convert text input into natural-sounding speech, handling transitions between languages within a single model, which makes it well suited to applications that need multilingual support. Because the node relies on a pre-trained model, it delivers high-quality synthesis with minimal setup: it takes text (together with a reference audio prompt) and produces the corresponding audio output, making it a practical building block for developers and artists who want to integrate voice synthesis into their projects.

NTCosyVoiceCrossLingualSampler Input Parameters:

audio

The audio parameter is a required input that provides the initial audio data to the node. It is expected to be in the form of a dictionary containing a waveform and a sample rate. This audio serves as a prompt for the cross-lingual synthesis process, helping the model to generate speech that matches the characteristics of the input audio. The waveform should be a tensor, and the sample rate should be an integer representing the number of samples per second.
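
For orientation, the snippet below is a minimal sketch of building such an input from an audio file with torchaudio. The dictionary keys follow the common ComfyUI audio convention; the exact tensor shape this node expects is an assumption here, so treat the sketch as illustrative:

    import torchaudio

    # Load a short clip of the reference voice; torchaudio.load returns
    # (waveform, sample_rate) with waveform shaped [channels, samples].
    waveform, sample_rate = torchaudio.load("prompt_voice.wav")

    # ComfyUI audio sockets commonly carry a dict like the one below, with the
    # waveform expanded to [batch, channels, samples]; that shape is an
    # assumption for this node.
    audio_input = {
        "waveform": waveform.unsqueeze(0),
        "sample_rate": sample_rate,
    }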

speed

The speed parameter controls the playback speed of the synthesized speech. It is a floating-point value that can range from 0.5 to 1.5, with a default value of 1.0. Adjusting this parameter allows you to speed up or slow down the speech output, which can be useful for matching the tempo of the audio to specific requirements or preferences. A lower value will slow down the speech, while a higher value will speed it up.
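
For example, assuming the factor scales the speaking rate roughly linearly, a sentence that takes about 10 seconds at the default speed of 1.0 would run for roughly 20 seconds at 0.5 and roughly 6.7 seconds at 1.5.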

text

The text parameter is a required input that contains the textual content to be converted into speech. It is a string that can support multiline input, allowing for the synthesis of longer passages of text. This parameter is crucial as it defines the content of the speech output, and the node will process this text to generate the corresponding audio in the desired language.
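
Putting the three inputs together, the sketch below shows a hypothetical direct call to the node class from Python. The import path and the method name generate are assumptions made purely for illustration; inside ComfyUI the workflow graph supplies these inputs for you.

    import torchaudio

    # Hypothetical import path; the class name matches the node's Class Name above.
    from ntcosyvoice_nodes import NTCosyVoiceCrossLingualSampler

    sampler = NTCosyVoiceCrossLingualSampler()

    # Reference voice prompt, packed into the dict layout shown earlier.
    waveform, sample_rate = torchaudio.load("prompt_voice.wav")
    audio_input = {"waveform": waveform.unsqueeze(0), "sample_rate": sample_rate}

    result = sampler.generate(              # method name is an assumption
        audio=audio_input,
        speed=1.0,
        text="Hello! 你好，很高兴见到你。",    # multiline, mixed-language text
    )
    tts_speech = result[0]                  # ComfyUI nodes typically return a tuple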

NTCosyVoiceCrossLingualSampler Output Parameters:

tts_speech

The tts_speech output parameter is the result of the text-to-speech synthesis process. It is an audio object containing the waveform of the synthesized speech and the sample rate at which it was generated. This output is crucial for applications that require audio playback or further audio processing, as it provides the final speech output that can be used in various multimedia projects.
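
If you export the result outside the graph, the waveform and sample rate can be written straight to disk. A minimal sketch, assuming tts_speech uses the common ComfyUI audio dict layout with a [batch, channels, samples] waveform (adjust if the node returns a different shape):

    import torchaudio

    def save_tts_speech(tts_speech, path="tts_output.wav"):
        # Assumes tts_speech = {"waveform": Tensor, "sample_rate": int}.
        waveform = tts_speech["waveform"]
        if waveform.dim() == 3:      # drop the batch dimension for torchaudio.save
            waveform = waveform[0]
        torchaudio.save(path, waveform, tts_speech["sample_rate"])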

NTCosyVoiceCrossLingualSampler Usage Tips:

  • Ensure that the input audio is of high quality and matches the desired characteristics of the output speech to achieve the best results.
  • Experiment with the speed parameter to find the optimal playback speed for your specific application, as this can significantly affect the naturalness and intelligibility of the synthesized speech.

NTCosyVoiceCrossLingualSampler Common Errors and Solutions:

Model not found at specified path

  • Explanation: This error occurs when the CosyVoice2 model cannot be located at the specified path.
  • Solution: Verify that the model path is correct and that the model files are present in the expected directory. Ensure that the nor_dir variable is set correctly in your environment.

Invalid audio input format

  • Explanation: This error arises when the input audio does not conform to the expected format, such as missing waveform or sample rate.
  • Solution: Check that the input audio is a dictionary containing both a waveform tensor and a sample rate integer. Ensure that the waveform is correctly formatted and the sample rate is a valid integer value.

Text input is empty

  • Explanation: This error occurs when the text input provided to the node is empty or null.
  • Solution: Ensure that the text parameter contains a valid, non-empty string so the synthesis process can run; a quick pre-flight check is sketched below.
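
To catch the last two errors before they reach the node, you can validate the inputs up front. A minimal sketch, assuming the audio dict layout described above (the helper name is hypothetical):

    import torch

    def validate_inputs(audio, text):
        # Hypothetical helper: raises a clear error instead of failing inside the node.
        if not isinstance(audio, dict) or "waveform" not in audio or "sample_rate" not in audio:
            raise ValueError("audio must be a dict with 'waveform' and 'sample_rate' keys")
        if not isinstance(audio["waveform"], torch.Tensor):
            raise ValueError("audio['waveform'] must be a torch tensor")
        if not isinstance(audio["sample_rate"], int) or audio["sample_rate"] <= 0:
            raise ValueError("audio['sample_rate'] must be a positive integer")
        if not isinstance(text, str) or not text.strip():
            raise ValueError("text must be a non-empty string")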

NTCosyVoiceCrossLingualSampler Related Nodes

Go back to the extension to check out more related nodes.
CosyVoice2 for ComfyUI