ComfyUI Node: CosyVoice 预训练音色 (CosyVoice Pre-trained Voices)

Class Name: CosyVoiceSFTNode
Category: FunAudioLLM - CosyVoice
Author: SpenserCai (Account age: 3000 days)
Extension: ComfyUI-FunAudioLLM
Latest Updated: 2024-11-27
Github Stars: 0.08K

Table of Contents

Description
CosyVoiceSFTNode:
CosyVoiceSFTNode Input Parameters:
CosyVoiceSFTNode Output Parameters:
CosyVoiceSFTNode Usage Tips:
CosyVoiceSFTNode Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-FunAudioLLM

Install this extension through the ComfyUI Manager:

1. Click the Manager button in the main menu.
2. Click the Custom Nodes Manager button.
3. Enter ComfyUI-FunAudioLLM in the search bar and install the matching extension.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

CosyVoice 预训练音色 Description

Generate lifelike speech from text using pre-trained voice models for realistic audio outputs.

CosyVoiceSFTNode:

The CosyVoiceSFTNode generates speech from text using CosyVoice's pre-trained voice models. You choose one of the built-in speakers, supply the text, and the node returns synthesized audio in that voice. Because the voices are already trained, you can add lifelike speech to a project without extensive training data or a complex setup process. The node suits work that needs a consistent, realistic voice across many clips, such as virtual assistants, audiobooks, and other multimedia content, and its speakers cover several languages and both male and female voices.

CosyVoiceSFTNode Input Parameters:

tts_text

This parameter is the text that you want to convert into speech, given as a string. No minimum or maximum length is specified, but the text must not be empty: an empty value triggers the first error listed under Common Errors and Solutions. Because the text is the sole basis of the spoken output, coherent and correctly punctuated text gives the best results.

speaker_name

The speaker_name parameter selects the pre-trained voice used to generate the speech. The available options are 中文女 (Chinese, female), 中文男 (Chinese, male), 日语男 (Japanese, male), 粤语女 (Cantonese, female), 英文女 (English, female), 英文男 (English, male), and 韩语女 (Korean, female). The default value is 中文女. Choose the speaker whose language matches your tts_text and whose gender fits the project.
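As a small illustration of choosing a voice, the sketch below maps each documented speaker_name to an English description and looks one up by language and gender. The `SPEAKERS` table and the `pick_speaker` helper are hypothetical names introduced here for illustration; they are not part of the extension.

```python
# English glosses for the documented speaker_name options.
# The (language, gender) labels are this sketch's own, not extension metadata.
SPEAKERS = {
    "中文女": ("Chinese", "female"),
    "中文男": ("Chinese", "male"),
    "日语男": ("Japanese", "male"),
    "粤语女": ("Cantonese", "female"),
    "英文女": ("English", "female"),
    "英文男": ("English", "male"),
    "韩语女": ("Korean", "female"),
}


def pick_speaker(language, gender):
    """Return the speaker_name matching language and gender, or None."""
    for name, traits in SPEAKERS.items():
        if traits == (language, gender):
            return name
    return None
```

Not every language has both a male and a female voice in the documented list, so a lookup such as `pick_speaker("Japanese", "female")` returns None and the caller has to fall back to another speaker.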

speed

This parameter controls the speaking rate of the generated speech. It is a float with a default of 1.0, which represents the normal rate; larger values speed the speech up and smaller values slow it down. Use it to match the pacing of the audio to your content or audience.
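To get a feel for the numbers, the sketch below estimates how long a clip would run at a given speed. It assumes that speed acts as a simple rate multiplier, so that doubling it halves the duration; this is an assumption for illustration, and `estimated_duration` is a hypothetical helper rather than anything the node exposes.

```python
def estimated_duration(base_seconds, speed):
    """Estimate clip length, assuming speed is a plain rate multiplier.

    base_seconds is the duration at speed 1.0 (the documented default).
    """
    if speed <= 0:
        raise ValueError("speed must be greater than 0")
    return base_seconds / speed
```

Under this assumption, a clip that lasts 10 seconds at the default speed would last about 5 seconds at 2.0 and about 20 seconds at 0.5.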

seed

The seed parameter is an integer that sets the random seed for the speech synthesis process. Its default value is 42. Synthesis involves random sampling, so fixing the seed makes the process repeatable: the same text, speaker, and settings with the same seed should produce the same audio. This is useful for keeping results consistent across runs, or for changing one parameter at a time while everything else stays fixed.
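The effect of a fixed seed can be shown with Python's standard `random` module as a stand-in. This is only an analogy: the node presumably seeds its own model-side random number generators, and `sample_noise` is a hypothetical function invented for this sketch.

```python
import random


def sample_noise(seed, count=3):
    """Draw a few Gaussian values from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, so global state is untouched
    return [rng.gauss(0.0, 1.0) for _ in range(count)]


# The same seed reproduces the same draws; a different seed gives different ones.
same = sample_noise(42) == sample_noise(42)
different = sample_noise(42) != sample_noise(43)
```

Here `same` and `different` are both True, which mirrors the behavior described for the seed parameter: identical inputs and seed give identical results, and changing the seed gives a different take.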

use_25hz

This boolean parameter selects whether the node uses the 25Hz variant of the speech model. The default value is False, meaning the standard model is used unless you enable the option. Note that 25 Hz is far too low to be an audio sampling rate; in the CosyVoice model family the 25Hz label most likely refers to the rate of the model's internal speech tokens. Enabling the option is therefore better understood as switching to a different pre-trained model than as lowering the sample rate or file size of the output audio.
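The five inputs above can be summarized in the form ComfyUI custom nodes use to declare their parameters. The class below is a hedged sketch, not the extension's actual source: the class name `CosyVoiceSFTSketch`, the method name `generate`, and the `multiline` option are assumptions, while the parameter names, defaults, output type, and category come from this page.

```python
# Sketch only: mirrors the documented parameters, not the real implementation.
SPEAKER_NAMES = ["中文女", "中文男", "日语男", "粤语女", "英文女", "英文男", "韩语女"]


class CosyVoiceSFTSketch:
    @classmethod
    def INPUT_TYPES(cls):
        # ComfyUI reads this dict to build the node's input widgets.
        return {
            "required": {
                "tts_text": ("STRING", {"multiline": True, "default": ""}),
                "speaker_name": (SPEAKER_NAMES, {"default": "中文女"}),
                "speed": ("FLOAT", {"default": 1.0}),
                "seed": ("INT", {"default": 42}),
                "use_25hz": ("BOOLEAN", {"default": False}),
            }
        }

    RETURN_TYPES = ("AUDIO",)
    FUNCTION = "generate"  # hypothetical method name
    CATEGORY = "FunAudioLLM - CosyVoice"

    def generate(self, tts_text, speaker_name, speed, seed, use_25hz):
        # The real node runs the speech model here; this sketch does not.
        raise NotImplementedError("synthesis is outside the scope of this sketch")
```

Reading the declaration this way makes the defaults easy to check at a glance: 中文女 for the speaker, 1.0 for speed, 42 for seed, and False for use_25hz.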

CosyVoiceSFTNode Output Parameters:

AUDIO

The node outputs the synthesized speech as ComfyUI AUDIO data: the input text spoken by the selected speaker with the specified speed, seed, and model settings. The output is passed along inside the workflow rather than written to disk by this node, so connect it to a downstream node that previews or saves audio to listen to it or export a file. From there it can be used in multimedia projects, virtual assistants, or any other context that needs synthesized speech.
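If you need the audio outside ComfyUI, one option is to write the samples to a WAV file yourself. The sketch below uses only the Python standard library and assumes you already have the audio as a flat list of mono float samples in the range -1.0 to 1.0 plus a sample rate; how you extract those from the node's AUDIO output is not covered here, and `write_wav` is a hypothetical helper.

```python
import io
import struct
import wave


def write_wav(samples, sample_rate, target):
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM WAV.

    `target` may be a file path or a writable, seekable binary file object.
    """
    frames = b"".join(
        # Clamp each sample, then scale to the signed 16-bit range.
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 2 bytes = 16 bits per sample
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


# Usage: four placeholder samples at a placeholder rate, written to memory.
buffer = io.BytesIO()
write_wav([0.0, 0.5, -0.5, 1.0], 22050, buffer)
```

The sample rate in the usage line is a placeholder; use whatever rate accompanies the audio you are exporting, otherwise playback will be too fast or too slow.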

CosyVoiceSFTNode Usage Tips:

Ensure that the tts_text is clear and grammatically correct to achieve the best audio quality.
Experiment with different speaker_name options to find the most suitable voice for your project.
Adjust the speed parameter to match the desired pacing of your audio output, especially if the content requires a specific delivery speed.
Keep the seed parameter fixed when you need reproducible results, so that the same input produces the same output on every run.

CosyVoiceSFTNode Common Errors and Solutions:

"prompt文本为空，您是否忘记输入prompt文本？" ("The prompt text is empty. Did you forget to enter the prompt text?")

Explanation: This error occurs when the required text input for the speech synthesis is missing.
Solution: Ensure that you provide a valid tts_text input before running the node.

"Invalid speaker name"

Explanation: This error indicates that the specified speaker_name does not match any of the available pre-trained models.
Solution: Verify that the speaker_name is correctly spelled and matches one of the available options: 中文女, 中文男, 日语男, 粤语女, 英文女, 英文男, 韩语女.

"Audio output is not generated"

Explanation: This issue may arise if there is a problem with the input parameters or the synthesis process.
Solution: Double-check all input parameters for correctness and ensure that the node is properly configured. If the problem persists, try using different parameter values or consult the documentation for further troubleshooting steps.
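The first two errors above can be caught before the node runs by checking the inputs up front. The following is a minimal sketch of such a check; `validate_inputs` is a hypothetical helper, its error messages are this sketch's own wording, and the speaker list is copied from this page.

```python
VALID_SPEAKERS = ("中文女", "中文男", "日语男", "粤语女", "英文女", "英文男", "韩语女")


def validate_inputs(tts_text, speaker_name):
    """Raise ValueError if the text is empty or the speaker is unknown."""
    if not tts_text or not tts_text.strip():
        raise ValueError("tts_text is empty; enter the text to synthesize")
    if speaker_name not in VALID_SPEAKERS:
        options = ", ".join(VALID_SPEAKERS)
        raise ValueError(
            f"Invalid speaker name {speaker_name!r}; expected one of: {options}"
        )
```

A check like this is most useful when tts_text or speaker_name is produced by another node or script rather than typed by hand, since a blank string or a misspelled speaker is then easy to miss.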

CosyVoice 预训练音色 Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-FunAudioLLM
