ComfyUI Node: CosyVoice 自然语言控制 (CosyVoice Natural Language Control)

Class Name: CosyVoiceInstructNode
Category: FunAudioLLM - CosyVoice
Author: SpenserCai (Account age: 2873 days)
Extension: ComfyUI-FunAudioLLM
Last Updated: 2024-11-27
GitHub Stars: 0.05K

How to Install ComfyUI-FunAudioLLM

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Click the Custom Nodes Manager button.
  3. Enter ComfyUI-FunAudioLLM in the search bar and install the extension from the results.
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

CosyVoice 自然语言控制 Description

Synthesizes speech from text, steered by a natural-language instruction and a preset speaker voice.

CosyVoice 自然语言控制:

The CosyVoiceInstructNode converts written text into speech and lets you steer the delivery with a natural-language instruction. You supply the text to speak (tts_text), a preset speaker voice (speaker_name), and an instruction describing the desired tone, style, or emphasis (instruct_text); the node returns the synthesized speech as AUDIO. The speaker presets cover Chinese, Cantonese, English, Japanese, and Korean voices, and the speed and seed inputs control pacing and reproducibility. This makes the node a good fit for projects that need expressive or multilingual voice output.

CosyVoice 自然语言控制 Input Parameters:

tts_text

The tts_text parameter is a string containing the text to convert into speech. It is the content the synthesized voice will speak. There are no explicit minimum or maximum length constraints, but concise, well-punctuated text tends to produce clearer results.

speaker_name

The speaker_name parameter selects a speaker profile from a predefined list: '中文女' (Chinese female), '中文男' (Chinese male), '日语男' (Japanese male), '粤语女' (Cantonese female), '英文女' (English female), '英文男' (English male), and '韩语女' (Korean female). The selection determines the vocal characteristics of the synthesized speech, such as gender and language. The default value is '中文女'. Enter the value exactly as listed, since the English glosses here are explanations only and not valid inputs.

instruct_text

The instruct_text parameter is a string describing how the text should be spoken. It can include directives on tone, style, or emphasis. Keep the instruction short, specific, and relevant to the desired delivery so the model can interpret it reliably.

speed

The speed parameter is a float that controls the rate of speech delivery. The default value is 1.0. Adjust it to match the pace of the intended use, for example a faster rate for brisk narration or a slower rate for deliberate delivery.
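As a rough illustration of how speed might affect clip length, the sketch below assumes speed acts as a simple rate multiplier, so that doubling it halves the duration. The source does not state the exact relationship, and the helper name estimated_duration is hypothetical.

```python
def estimated_duration(base_seconds: float, speed: float = 1.0) -> float:
    """Estimate clip length, ASSUMING speed is a plain rate multiplier.

    base_seconds is the clip length at speed 1.0 (the documented default).
    """
    if speed <= 0:
        raise ValueError("speed must be positive")
    return base_seconds / speed


# Compare a 10-second baseline at a few speed settings.
for speed in (0.5, 1.0, 2.0):
    print(f"speed={speed}: about {estimated_duration(10.0, speed):.1f} s")
```

Treat the numbers as a planning estimate only; the actual length of the generated audio depends on the model.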

seed

The seed parameter is an integer used to initialize the random number generator. The default value is 42. Reusing the same seed with otherwise identical inputs gives consistent results across runs, which is useful for testing and iterative refinement; changing the seed produces a different variation.
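The general idea behind seeding can be sketched with Python's standard random module. This is only an analogy: the source does not show how the node seeds its generator, and sample_noise is a hypothetical stand-in for the random draws made during synthesis.

```python
import random
from typing import List


def sample_noise(seed: int, n: int = 3) -> List[float]:
    """Draw n pseudo-random values from a generator initialized with seed."""
    rng = random.Random(seed)  # independent generator, does not touch global state
    return [rng.random() for _ in range(n)]


same_a = sample_noise(42)
same_b = sample_noise(42)
other = sample_noise(43)

print(same_a == same_b)  # identical seed, identical sequence
print(same_a == other)   # different seed, different sequence
```

The same reasoning is why keeping the seed fixed while changing one other input makes it easier to attribute a change in the audio to that input.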

CosyVoice 自然语言控制 Output Parameters:

AUDIO

The node produces a single AUDIO output containing the synthesized speech. The audio reflects the selected speaker, the instruction text, and the speech speed, and it can be connected to any downstream node that accepts AUDIO, for example to preview or save the result.
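To show how the inputs and the output fit together, here is a minimal sketch of a workflow description in the dictionary layout used by ComfyUI's API-format workflows. The class name and input names come from this page; the node ids, the downstream SaveAudio node with its audio and filename_prefix inputs, and the helper name build_workflow are assumptions for illustration and should be checked against your installation.

```python
import json


def build_workflow(tts_text, instruct_text, speaker_name="中文女", speed=1.0, seed=42):
    """Return an API-format workflow dict (layout assumed, not taken from the source)."""
    return {
        "1": {
            "class_type": "CosyVoiceInstructNode",
            "inputs": {
                "tts_text": tts_text,
                "speaker_name": speaker_name,
                "instruct_text": instruct_text,
                "speed": speed,
                "seed": seed,
            },
        },
        "2": {
            # Assumed downstream node; ["1", 0] means "output 0 of node 1".
            "class_type": "SaveAudio",
            "inputs": {"audio": ["1", 0], "filename_prefix": "cosyvoice_instruct"},
        },
    }


workflow = build_workflow(
    tts_text="Welcome to the show.",
    instruct_text="Speak warmly, at a relaxed pace.",
    speaker_name="英文女",
)
print(json.dumps(workflow, ensure_ascii=False, indent=2))
```

Building the dictionary in code makes it easy to vary one input at a time, such as the seed or the instruction, while keeping everything else fixed.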

CosyVoice 自然语言控制 Usage Tips:

  • Keep tts_text clear and well-structured; punctuation and sentence boundaries help the synthesized speech sound natural.
  • Try several speaker_name options, since the voice strongly affects the emotional tone and authenticity of the output.
  • Use instruct_text to describe the desired style or tone in a short, specific phrase.
  • Adjust speed to fit the context, whether that is a fast-paced commercial or a slow narrative piece.
  • Fix the seed while you tune other inputs so that changes in the audio come from your edits and not from random variation.

CosyVoice 自然语言控制 Common Errors and Solutions:

"Model directory not found"

  • Explanation: This error occurs when the specified model directory does not exist or is inaccessible.
  • Solution: Verify that the model directory path is correct and that the necessary files are present. Ensure that you have the required permissions to access the directory.
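The checks in the solution above can be scripted. The following is a minimal sketch that uses only the standard library; the helper name diagnose_model_dir and the example path are hypothetical, and the source does not say where the extension expects its models.

```python
import os
from pathlib import Path


def diagnose_model_dir(model_dir: str) -> str:
    """Return a short status string describing the state of model_dir."""
    path = Path(model_dir)
    if not path.exists():
        return "missing"
    if not path.is_dir():
        return "not a directory"
    if not os.access(path, os.R_OK):
        return "not readable"
    if not any(path.iterdir()):
        return "empty"
    return "ok"


# Hypothetical location; replace with the directory your installation uses.
print(diagnose_model_dir("models/hypothetical-cosyvoice-model-dir"))
```

A result of "ok" only confirms that the directory exists, is readable, and contains something; it does not verify that the model files themselves are complete.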

"Invalid speaker name"

  • Explanation: This error indicates that the provided speaker name does not match any of the available options.
  • Solution: Check the list of valid speaker names and ensure that the input matches one of the predefined options exactly.
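When speaker names are supplied by a script, it can help to check them before the workflow is queued. The sketch below uses the speaker list given on this page; the helper name check_speaker and the close-match suggestion are illustrative additions and not part of the extension.

```python
import difflib

# Speaker names as listed in the node's input description.
VALID_SPEAKERS = ["中文女", "中文男", "日语男", "粤语女", "英文女", "英文男", "韩语女"]


def check_speaker(name: str) -> str:
    """Return the speaker name with surrounding whitespace removed, or raise ValueError."""
    candidate = name.strip()
    if candidate in VALID_SPEAKERS:
        return candidate
    # Suggest near misses, e.g. a name that differs by one character.
    close = difflib.get_close_matches(candidate, VALID_SPEAKERS, n=3, cutoff=0.5)
    hint = f" Did you mean: {', '.join(close)}?" if close else ""
    raise ValueError(f"Invalid speaker name: {name!r}.{hint}")


print(check_speaker(" 中文女 "))
```

Stray whitespace is a common cause of mismatches, which is why the helper strips it before comparing.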

"Instruction text not recognized"

  • Explanation: The instructions provided in the instruct_text parameter may be unclear or unsupported by the model.
  • Solution: Simplify or rephrase the instructions to ensure they are clear and relevant to the desired speech output. Consider using more straightforward directives that the model can easily interpret.

© Copyright 2024 RunComfy. All Rights Reserved.
