ComfyUI Node: CosyVoice 预训练音色 (Pre-trained Voice)

Class Name

CosyVoiceSFTNode

Category
FunAudioLLM - CosyVoice
Author
SpenserCai (Account age: 2873 days)
Extension
ComfyUI-FunAudioLLM
Last Updated
2024-11-27
Github Stars
0.05K

How to Install ComfyUI-FunAudioLLM

Install this extension through the ComfyUI Manager by searching for ComfyUI-FunAudioLLM:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-FunAudioLLM in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

CosyVoice 预训练音色 Description

Generate speech from text using one of the pre-trained CosyVoice speaker voices.

CosyVoice 预训练音色:

The CosyVoiceSFTNode converts text to speech using pre-trained voice models. You choose one of the built-in speakers, and the node synthesizes the text in that voice, so you do not need training data of your own or a complex setup. This suits applications that need a consistent, natural-sounding voice across many clips, such as virtual assistants, audiobooks, and other multimedia content. The built-in speakers cover several languages and both male and female voices.

CosyVoice 预训练音色 Input Parameters:

tts_text

The text to convert into speech, given as a string. It is the only content the node speaks, so the quality and clarity of the audio depend heavily on it. There is no specific minimum or maximum length, but the text must not be empty, and coherent, grammatically correct text gives the best results.

speaker_name

The speaker_name parameter selects the pre-trained voice used to generate the speech. The available options are 中文女 (Chinese, female), 中文男 (Chinese, male), 日语男 (Japanese, male), 粤语女 (Cantonese, female), 英文女 (English, female), 英文男 (English, male), and 韩语女 (Korean, female). The default value is 中文女. Enter the name exactly as listed; the English glosses are translations only. Choose the speaker whose language and gender match your project.
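
The list of speakers above can be checked before a workflow is queued. The following is a minimal sketch, not the node's own code: SPEAKER_GLOSSES, DEFAULT_SPEAKER, and check_speaker are hypothetical names introduced here, and the English glosses are translations added for readability.

```python
# Speaker names as listed in this document. The English glosses are
# translations for readers and are NOT values the node accepts.
SPEAKER_GLOSSES = {
    "中文女": "Chinese, female",
    "中文男": "Chinese, male",
    "日语男": "Japanese, male",
    "粤语女": "Cantonese, female",
    "英文女": "English, female",
    "英文男": "English, male",
    "韩语女": "Korean, female",
}
DEFAULT_SPEAKER = "中文女"


def check_speaker(name: str) -> str:
    """Return the name unchanged if it is a listed speaker, else raise."""
    if name not in SPEAKER_GLOSSES:
        options = ", ".join(SPEAKER_GLOSSES)
        raise ValueError(
            f"Invalid speaker name {name!r}; expected one of: {options}"
        )
    return name
```

Calling check_speaker("英文女") returns the name unchanged, while a gloss such as "English, female" is rejected because only the original names are valid options.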

speed

The speaking rate of the generated speech, given as a float. The default of 1.0 is the normal speaking rate. Change it to speed the speech up or slow it down, for example to match the pacing a piece of content requires or an audience prefers.
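
As a rough planning aid, the effect of speed on clip length can be estimated. This sketch assumes that speed acts as a simple multiplier on the speaking rate, so that the duration scales by 1 / speed; the document does not state this or give a valid range, and estimate_duration is a hypothetical helper, not part of the node.

```python
def estimate_duration(base_seconds: float, speed: float) -> float:
    """Estimate clip length at a given speed.

    Assumption: speed multiplies the speaking rate, so a clip that lasts
    base_seconds at speed 1.0 lasts base_seconds / speed otherwise.
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return base_seconds / speed
```

Under that assumption, a clip that runs 10 seconds at the default speed would take about 5 seconds at speed 2.0 and about 20 seconds at speed 0.5.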

seed

The seed parameter is an integer that sets the random seed for the speech synthesis process. Its default value is 42. Fixing the seed makes generation repeatable: the same text and parameters should produce the same output on every run. This is useful when you compare the effect of other parameters or need to regenerate a clip exactly.
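
The principle behind a fixed seed can be illustrated with Python's standard random module. This is only an analogy: how the node actually seeds its generator is not described here, and fake_sample is a hypothetical stand-in for the random choices made during synthesis.

```python
import random


def fake_sample(seed: int, n: int = 5) -> list[float]:
    """Draw n pseudo-random numbers from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator, fully determined by seed
    return [rng.random() for _ in range(n)]


# The same seed reproduces the same sequence; a different seed does not.
same = fake_sample(42) == fake_sample(42)
different = fake_sample(42) != fake_sample(43)
```

In the same way, rerunning the node with an unchanged seed and unchanged inputs is expected to repeat the result, while changing the seed gives a different rendition of the same text.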

use_25hz

This boolean parameter defaults to False. Despite the name, it is unlikely to set an audio sampling rate: 25 Hz is far too low to carry speech as a sample rate. The value most likely refers to a 25 Hz variant of the CosyVoice model, where 25 Hz describes the rate of the model's internal speech tokens, and enabling the option presumably switches synthesis to that variant. Leave it at False unless you specifically need that model.

CosyVoice 预训练音色 Output Parameters:

AUDIO

The node outputs the synthesized speech as a ComfyUI AUDIO value, produced from the input text with the selected speaker and parameters. Connect it to any node that accepts AUDIO, for example to preview or save the result. The output can then be used in multimedia projects, virtual assistants, or any other context that needs synthesized speech.
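
In ComfyUI, an AUDIO value is generally passed between nodes as a dictionary holding a waveform and a sample rate. The sketch below assumes that layout, with the waveform indexed as batch, channel, sample, and uses nested lists in place of a tensor so that it runs without PyTorch. The 22050 Hz rate and the audio_duration_seconds helper are illustrative assumptions, not values confirmed for this node.

```python
def audio_duration_seconds(audio: dict) -> float:
    """Length of the first clip in an AUDIO-style dictionary, in seconds.

    Assumed layout: audio["waveform"][batch][channel][sample] and an
    integer audio["sample_rate"] in Hz.
    """
    first_channel = audio["waveform"][0][0]
    return len(first_channel) / audio["sample_rate"]


# Illustrative stand-in: one clip, one channel, 44100 silent samples.
example_audio = {"waveform": [[[0.0] * 44100]], "sample_rate": 22050}
duration = audio_duration_seconds(example_audio)  # 44100 / 22050 = 2.0
```

A check like this is a quick way to confirm that the node produced a clip of plausible length before the audio is passed further down a workflow.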

CosyVoice 预训练音色 Usage Tips:

  • Ensure that the tts_text is clear and grammatically correct to achieve the best audio quality.
  • Experiment with different speaker_name options to find the most suitable voice for your project.
  • Adjust the speed parameter to match the desired pacing of your audio output, especially if the content requires a specific delivery speed.
  • Keep the seed parameter fixed when you need repeatable results, so that the same input produces the same output across runs.
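
The input parameters described above can also be collected into a workflow entry for scripted use. This is a minimal sketch assuming ComfyUI's API-format workflow, in which each node is keyed by an id and described by class_type and inputs. The node id "1" is arbitrary, build_prompt is a hypothetical helper, and the input keys are taken from the parameter names in this document rather than from the node's source. A complete workflow would also need a node that consumes the AUDIO output.

```python
import json


def build_prompt(
    tts_text: str,
    speaker_name: str = "中文女",
    speed: float = 1.0,
    seed: int = 42,
    use_25hz: bool = False,
) -> dict:
    """Build a one-node, API-format workflow fragment (assumed layout)."""
    return {
        "1": {  # arbitrary node id chosen for this sketch
            "class_type": "CosyVoiceSFTNode",
            "inputs": {
                "tts_text": tts_text,
                "speaker_name": speaker_name,
                "speed": speed,
                "seed": seed,
                "use_25hz": use_25hz,
            },
        }
    }


# ensure_ascii=False keeps the Chinese speaker name readable in the JSON.
payload = json.dumps(
    {"prompt": build_prompt("Hello from CosyVoice.")}, ensure_ascii=False
)
```

The defaults in build_prompt mirror the defaults listed for each parameter, so only tts_text has to be supplied.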

CosyVoice 预训练音色 Common Errors and Solutions:

"prompt文本为空,您是否忘记输入prompt文本?"

  • Explanation: The message translates to "The prompt text is empty. Did you forget to enter the prompt text?" It occurs when the required text input for the speech synthesis is missing.
  • Solution: Ensure that you provide a valid tts_text input before running the node.
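
An empty text input can be caught before the node runs. The following is a minimal sketch with a hypothetical helper, require_text; it assumes that text consisting only of whitespace should be treated as empty, which the document does not state.

```python
def require_text(tts_text: str) -> str:
    """Return the text with surrounding whitespace removed, or raise if empty."""
    cleaned = tts_text.strip()
    if not cleaned:
        raise ValueError(
            "tts_text is empty; enter the text to synthesize before running the node"
        )
    return cleaned
```

Running this check on the text first turns a failure inside the node into an immediate, readable error in the calling script.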

"Invalid speaker name"

  • Explanation: This error indicates that the specified speaker_name does not match any of the available pre-trained models.
  • Solution: Verify that the speaker_name is correctly spelled and matches one of the available options: 中文女, 中文男, 日语男, 粤语女, 英文女, 英文男, 韩语女.

"Audio output is not generated"

  • Explanation: This issue may arise if there is a problem with the input parameters or the synthesis process.
  • Solution: Double-check all input parameters for correctness and ensure that the node is properly configured. If the problem persists, try using different parameter values or consult the documentation for further troubleshooting steps.

CosyVoice 预训练音色 Related Nodes

Go back to the extension to check out more related nodes.
ComfyUI-FunAudioLLM

© Copyright 2024 RunComfy. All Rights Reserved.
