ComfyUI Node: CosyVoice 自然语言控制 (Natural Language Control)

Class Name: CosyVoiceInstructNode
Category: FunAudioLLM - CosyVoice
Author: SpenserCai (Account age: 3000 days)
Extension: ComfyUI-FunAudioLLM
Latest Updated: 2024-11-27
Github Stars: 0.08K

Table of Contents

Description
CosyVoiceInstructNode:
CosyVoiceInstructNode Input Parameters:
CosyVoiceInstructNode Output Parameters:
CosyVoiceInstructNode Usage Tips:
CosyVoiceInstructNode Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-FunAudioLLM

Install this extension via the ComfyUI Manager by searching for ComfyUI-FunAudioLLM:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-FunAudioLLM in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

CosyVoice 自然语言控制 Description

Facilitates natural language control for text-to-speech synthesis with customizable speaker characteristics.

CosyVoiceInstructNode:

The CosyVoiceInstructNode provides natural language control over text-to-speech (TTS) synthesis. It converts written text into speech using a selected speaker profile, guided by a free-text instruction that describes how the speech should sound. This makes it suited to audio projects that need nuanced vocal expression or multilingual output.

CosyVoiceInstructNode Input Parameters:

tts_text

The tts_text parameter is a string containing the text you want to convert into speech. There are no explicit minimum or maximum length constraints, but concise, well-structured text yields clearer and more accurate speech.

speaker_name

The speaker_name parameter selects the speaker profile from a predefined list: '中文女' (Chinese, female), '中文男' (Chinese, male), '日语男' (Japanese, male), '粤语女' (Cantonese, female), '英文女' (English, female), '英文男' (English, male), and '韩语女' (Korean, female). The selection determines the vocal characteristics of the synthesized speech, such as gender and language. The default value is '中文女'. The option strings must be passed in their original form; the English glosses here are for reference only.
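
As a minimal sketch of how a script might check a speaker choice before queuing a workflow, the lookup below mirrors the seven options listed above. The helper name `resolve_speaker` and the English glosses are assumptions added for illustration; they are not part of the extension.

```python
from typing import Optional

# Speaker options as listed on this page. The English glosses are
# illustrative only; the node expects the original strings (the keys).
SPEAKERS = {
    "中文女": "Chinese, female",
    "中文男": "Chinese, male",
    "日语男": "Japanese, male",
    "粤语女": "Cantonese, female",
    "英文女": "English, female",
    "英文男": "English, male",
    "韩语女": "Korean, female",
}
DEFAULT_SPEAKER = "中文女"


def resolve_speaker(name: Optional[str]) -> str:
    """Hypothetical helper: return a valid speaker_name or raise ValueError."""
    if not name:
        # Nothing supplied: fall back to the documented default.
        return DEFAULT_SPEAKER
    if name not in SPEAKERS:
        valid = ", ".join(SPEAKERS)
        raise ValueError(f"Unknown speaker {name!r}; expected one of: {valid}")
    return name


print(resolve_speaker(None))
print(resolve_speaker("英文男"))
```

Matching on the exact strings is deliberate: a gloss such as "English female" would be rejected rather than silently mapped to a voice.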

instruct_text

The instruct_text parameter is a string that tells the model how the speech should be delivered, for example its tone, style, or emphasis. Keep the instructions clear and relevant to the desired outcome so that the model can interpret them.

speed

The speed parameter is a float that controls the rate of speech delivery, with a default value of 1.0. Adjust it to match the pace of the intended use case, whether that is fast-paced narration or a slow, deliberate delivery.

seed

The seed parameter is an integer used to initialize the random number generator. The default value is 42. Reusing the same seed gives consistent results across runs, which is particularly useful for testing and iterative development.
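
The effect of a fixed seed can be illustrated with Python's standard `random` module. This is only an analogy: this page does not state which generators the node seeds internally, and `sample_noise` is a hypothetical stand-in for the stochastic part of synthesis.

```python
import random
from typing import List


def sample_noise(seed: int, n: int = 5) -> List[float]:
    """Hypothetical stand-in for a stochastic step driven by a seed."""
    rng = random.Random(seed)  # independent generator initialized with the seed
    return [rng.random() for _ in range(n)]


run_a = sample_noise(42)
run_b = sample_noise(42)  # same seed as run_a
run_c = sample_noise(43)  # different seed

print("same seed, same values:", run_a == run_b)
print("different seed, same values:", run_a == run_c)
```

The same idea applies when comparing two instruct_text variants: holding the seed fixed means differences in the output come from the changed input rather than from random variation.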

CosyVoiceInstructNode Output Parameters:

AUDIO

The output of the CosyVoiceInstructNode is an AUDIO value containing the speech synthesized from the provided text and instructions. It reflects the selected speaker characteristics, the instruction text, and the speech speed.
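
For downstream scripting, the sketch below assumes the AUDIO value follows the common ComfyUI convention of a dictionary with a `waveform` indexed as [batch][channel][sample] and an integer `sample_rate`. That layout is an assumption, not something confirmed on this page for this node, and `audio_duration_seconds` and the sample values are hypothetical.

```python
def audio_duration_seconds(audio: dict) -> float:
    """Duration in seconds, assuming waveform is indexed [batch][channel][sample]."""
    num_samples = len(audio["waveform"][0][0])
    return num_samples / audio["sample_rate"]


# Stand-in for a node output: 1 batch, 1 channel, 44100 silent samples
# at an assumed 22050 Hz sample rate. Nested lists replace a tensor here
# so the example runs without extra dependencies.
fake_audio = {"waveform": [[[0.0] * 44100]], "sample_rate": 22050}

print(audio_duration_seconds(fake_audio))  # 2.0
```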

CosyVoiceInstructNode Usage Tips:

Ensure that the tts_text is clear and well-structured to achieve the best speech synthesis results.
Experiment with different speaker_name options to find the most suitable voice for your project, as this can greatly affect the emotional tone and authenticity of the output.
Use the instruct_text parameter to provide specific guidance on the desired speech style or tone, enhancing the expressiveness of the generated audio.
Adjust the speed parameter to match the pace of the speech with the intended context, whether it's for a fast-paced commercial or a slow, narrative piece.
Set a specific seed value to ensure consistent results across multiple synthesis attempts, which is useful for testing and refining your audio outputs.
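
Putting the parameters together, the sketch below builds one node entry in the shape used by ComfyUI's API-format workflow JSON (a mapping from node id to `class_type` and `inputs`). The input names and defaults are those described above; the node id "1", the sample texts, and the helper `build_instruct_node` are hypothetical.

```python
import json


def build_instruct_node(tts_text: str, instruct_text: str,
                        speaker_name: str = "中文女",
                        speed: float = 1.0, seed: int = 42) -> dict:
    """Hypothetical helper: one API-format node entry for CosyVoiceInstructNode."""
    return {
        "class_type": "CosyVoiceInstructNode",
        "inputs": {
            "tts_text": tts_text,            # text to synthesize
            "speaker_name": speaker_name,    # one of the predefined voices
            "instruct_text": instruct_text,  # how the speech should sound
            "speed": speed,                  # default 1.0
            "seed": seed,                    # default 42
        },
    }


# Illustrative fragment only: node id "1" and both texts are made up.
workflow = {
    "1": build_instruct_node(
        tts_text="Welcome back. Today we look at three new features.",
        instruct_text="Speak calmly, with a warm and friendly tone.",
        speaker_name="英文女",
    )
}

# ensure_ascii=False keeps the speaker name readable in the printed JSON.
print(json.dumps(workflow, ensure_ascii=False, indent=2))
```

This fragment is not a complete workflow: a runnable graph would also need a node that consumes the AUDIO output, such as one that previews or saves it.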

CosyVoiceInstructNode Common Errors and Solutions:

"Model directory not found"

Explanation: This error occurs when the specified model directory does not exist or is inaccessible.
Solution: Verify that the model directory path is correct and that the necessary files are present. Ensure that you have the required permissions to access the directory.

"Invalid speaker name"

Explanation: This error indicates that the provided speaker name does not match any of the available options.
Solution: Check the list of valid speaker names and ensure that the input matches one of the predefined options exactly.

"Instruction text not recognized"

Explanation: The instructions provided in the instruct_text parameter may be unclear or unsupported by the model.
Solution: Simplify or rephrase the instructions to ensure they are clear and relevant to the desired speech output. Consider using more straightforward directives that the model can easily interpret.

CosyVoice 自然语言控制 Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-FunAudioLLM
