ComfyUI Node: CosyVoice 跨语言克隆 (Cross-Lingual Cloning)

Class Name
CosyVoiceCrossLingualNode

Category
FunAudioLLM - CosyVoice

Author
SpenserCai (Account age: 3000 days)

Extension
ComfyUI-FunAudioLLM

Last Updated
2024-11-27

GitHub Stars
0.08K

Table of Contents

Description
CosyVoiceCrossLingualNode:
CosyVoiceCrossLingualNode Input Parameters:
CosyVoiceCrossLingualNode Output Parameters:
CosyVoiceCrossLingualNode Usage Tips:
CosyVoiceCrossLingualNode Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-FunAudioLLM

Install this extension via the ComfyUI Manager by searching for ComfyUI-FunAudioLLM:

1. Click the Manager button in the main menu
2. Select the Custom Nodes Manager button
3. Enter ComfyUI-FunAudioLLM in the search bar and install the extension from the results

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

CosyVoice 跨语言克隆 (Cross-Lingual Cloning) Description

Facilitates cross-lingual text-to-speech synthesis for multilingual voice applications with natural-sounding output.

CosyVoiceCrossLingualNode:

The CosyVoiceCrossLingualNode performs cross-lingual voice cloning with CosyVoice: it speaks the text given in tts_text using the voice captured from a reference recording (prompt_wav), and the recording may be in a different language from the text. This is useful for multilingual voice work, such as voiceovers for international audiences or language learning tools, where one speaker's voice has to carry across languages. How natural the result sounds depends on the loaded CosyVoice model and on the clarity of the reference recording.

CosyVoiceCrossLingualNode Input Parameters:

tts_text

The tts_text parameter is a string containing the text you want to convert into speech. It determines what is spoken, and in cross-lingual cloning it may be in a different language from the prompt_wav recording; which languages work depends on the CosyVoice model in use. Clear, correctly spelled and punctuated text generally gives more accurate speech.

prompt_wav

The prompt_wav parameter is an audio input that supplies the reference voice. The node uses it to guide synthesis so that the output matches the speaker's voice characteristics and style; the recording does not have to be in the same language as tts_text. Use a clear recording that is representative of the voice you want to clone.

speed

The speed parameter is a float that controls the rate of speech in the generated audio. The default, 1.0, is normal speed; values above 1.0 speed the speech up and values below 1.0 slow it down. Adjust it to match the pacing your project needs.
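
As a rough illustration of what a given speed value means for timing, the sketch below estimates clip length under the assumption that duration scales inversely with speed. The helper name estimated_duration is hypothetical and is not part of the node; real durations depend on the model and the text.

```python
def estimated_duration(base_seconds: float, speed: float) -> float:
    """Estimate clip length, ASSUMING duration scales as 1 / speed."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return base_seconds / speed


# Compare a clip that lasts 12 seconds at speed 1.0 across several settings.
for speed in (0.5, 1.0, 1.5, 2.0):
    print(f"speed={speed}: about {estimated_duration(12.0, speed):.1f} s")
```

Under this assumption, doubling the speed halves the running time, which is a useful first guess when fitting narration to a fixed-length video.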

seed

The seed parameter is an integer used to initialize the random number generator, ensuring reproducibility of the results. By setting a specific seed value, you can generate consistent outputs across different runs, which is useful for testing and comparison purposes. The default value is 42.
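
The idea behind seeding can be shown with Python's standard library alone. In the sketch below, a standard-library generator stands in for the model's sampling; how the node itself applies the seed is not described here, so treat this as an analogy rather than the node's implementation. The function name sample_with_seed is hypothetical.

```python
import random


def sample_with_seed(seed: int, n: int = 5) -> list:
    """Draw n pseudo-random numbers from a generator initialised with seed."""
    rng = random.Random(seed)  # isolated generator; global state is untouched
    return [rng.random() for _ in range(n)]


run_a = sample_with_seed(42)
run_b = sample_with_seed(42)
run_c = sample_with_seed(43)

print("same seed, same values:", run_a == run_b)
print("different seed, same values:", run_a == run_c)
```

The same seed reproduces the same sequence, while a different seed gives a different one. Changing the seed is therefore a simple way to get an alternative take of the same text and reference audio.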

use_25hz

The use_25hz parameter is a boolean that switches the node to the 25 Hz variant of the CosyVoice model. The default value is False, which means the node uses the standard model. The 25 Hz figure most likely refers to the model's speech token rate rather than an audio sampling rate (a 25 Hz sampling rate could not represent speech), so this option changes which model is used, not the sample rate of the output.

CosyVoiceCrossLingualNode Output Parameters:

AUDIO

The AUDIO output is the synthesized speech generated by the node: the content of tts_text spoken in the voice cloned from prompt_wav. It can be connected to any downstream node that accepts AUDIO.
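
To show how the inputs and output above fit ComfyUI's custom-node conventions (INPUT_TYPES, RETURN_TYPES, FUNCTION, CATEGORY), here is a minimal skeleton. It is an illustration only, not the extension's source: the class name CrossLingualNodeSketch, the method name generate, and the min, max, and step values are assumptions, and the stub does no synthesis.

```python
class CrossLingualNodeSketch:
    """Illustrative skeleton only; NOT the extension's actual code."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "tts_text": ("STRING", {"multiline": True}),
                "prompt_wav": ("AUDIO",),
                # min, max and step are assumed values for illustration
                "speed": ("FLOAT", {"default": 1.0, "min": 0.5, "max": 2.0, "step": 0.1}),
                "seed": ("INT", {"default": 42, "min": 0, "max": 2**32 - 1}),
                "use_25hz": ("BOOLEAN", {"default": False}),
            }
        }

    RETURN_TYPES = ("AUDIO",)
    FUNCTION = "generate"  # name of the method ComfyUI calls
    CATEGORY = "FunAudioLLM - CosyVoice"

    def generate(self, tts_text, prompt_wav, speed, seed, use_25hz):
        # A real node would run CosyVoice here; this stub returns the
        # reference audio unchanged so the skeleton stays runnable.
        return (prompt_wav,)


# List the declared inputs and their defaults, where a default exists.
for name, spec in CrossLingualNodeSketch.INPUT_TYPES()["required"].items():
    options = spec[1] if len(spec) > 1 else {}
    print(name, spec[0], options.get("default"))
```

The defaults for speed, seed, and use_25hz match the values documented above; everything else in the option dictionaries is a placeholder.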

CosyVoiceCrossLingualNode Usage Tips:

Ensure that the prompt_wav audio is clear and representative of the voice you want to clone to achieve the best synthesis results.
Experiment with the speed parameter to find the optimal speech rate for your specific application needs.
Use the seed parameter to maintain consistency in outputs across different runs, which is useful for testing and iterative development.

CosyVoiceCrossLingualNode Common Errors and Solutions:

ValueError: 'model_dir' do not support cross_lingual inference

Explanation: This error occurs when the loaded model does not support cross-lingual inference. In upstream CosyVoice this check typically triggers for instruct-type models; an incorrect model directory or missing files may also be involved.
Solution: Verify that the model directory points to a model that supports cross-lingual inference (for example, a base model rather than an instruct model) and that all of its files are present.

Audio input error

Explanation: This error may arise if the prompt_wav audio input is not in the correct format or is corrupted.
Solution: Check the format and integrity of the prompt_wav file. Ensure it is a valid audio file and matches the expected sample rate and format requirements.
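
One way to check a reference file before loading it into ComfyUI is to read its header with Python's standard wave module. This is a sketch under two assumptions: the file is an uncompressed PCM WAV (other formats such as MP3 or FLAC are not handled and will raise an error here), and the helper name inspect_wav is hypothetical. The sample rate the node expects is not stated above, so the function only reports what it finds.

```python
import wave


def inspect_wav(path: str) -> dict:
    """Return basic properties of a PCM WAV file.

    Raises wave.Error if the file is not a readable PCM WAV,
    which is itself a useful signal that the file is the problem.
    """
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate if rate else 0.0,
        }


# Example call ("prompt.wav" is a placeholder path):
# print(inspect_wav("prompt.wav"))
```

A zero duration or an exception from this check points to a truncated or corrupted file; re-exporting the recording from an audio editor usually resolves it.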

CosyVoice 跨语言克隆 (Cross-Lingual Cloning) Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-FunAudioLLM
