Convert text into synchronized audio-visual content for lip-synced videos with the advanced AI technology of the KLingAI suite.
The Lip Sync Text Input node facilitates the creation of lip-synced videos by converting text into synchronized audio-visual content. It is part of the KLingAI suite, which uses advanced AI technology to generate videos in which the spoken text is aligned with the lip movements of the video subject. The node provides an efficient way to produce high-quality lip-synced videos, making it a useful tool for content creators, educators, and marketers who want to enhance their video content with synchronized speech. You input text and configure the voice parameters to achieve the desired lip-sync effect, improving the engagement and professionalism of your video projects.
The text parameter holds the text you want to convert into a lip-synced video. It forms the basis of the audio that will be synchronized with the video, and it supports multiline input, allowing you to provide extensive scripts or dialogues. The default value is an empty string, and there are no specific minimum or maximum length restrictions, but it is advisable to keep the text concise for better synchronization results.
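To illustrate that advice, here is a small hypothetical Python helper (not part of the node) that splits a long multiline script into shorter segments on sentence boundaries; the function name and the default 200-character threshold are assumptions made for this example:

    import re

    # Hypothetical helper (not part of the KLingAI node) that breaks a long
    # script into shorter segments, following the advice above to keep the
    # text concise for better synchronization.
    def split_script(text: str, max_chars: int = 200) -> list[str]:
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        segments: list[str] = []
        current = ""
        for sentence in sentences:
            # Start a new segment once adding the next sentence would overflow.
            if current and len(current) + 1 + len(sentence) > max_chars:
                segments.append(current)
                current = sentence
            else:
                current = f"{current} {sentence}".strip()
        if current:
            segments.append(current)
        return segments

    script = "Welcome to the course. Today we cover lip syncing. Let's begin."
    for segment in split_script(script, max_chars=40):
        print(segment)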
The voice_id parameter selects the specific voice used to generate the audio for lip-syncing. It is essential for customizing the audio output to match the desired tone and style of your video. The available options are predefined and include various character voices such as "Bud," "Sprite," and "Candy," among others. Selecting the appropriate voice_id can significantly affect the overall feel and authenticity of the lip-synced video.
The voice_language parameter specifies the language of the voice used in the lip-syncing process, ensuring that the audio output matches the linguistic characteristics of the text. The available options are "zh" for Chinese and "en" for English, with the default set to "zh." Choosing the correct language is crucial for accurate pronunciation and synchronization.
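A minimal sketch of the kind of validation these two parameters imply, assuming the option lists shown below; only "Bud," "Sprite," and "Candy" are named on this page, so the actual node may accept additional voices:

    # Minimal validation sketch. The voice list is an assumption: only "Bud",
    # "Sprite", and "Candy" are named on this page, and the real node may
    # expose more voices.
    VOICE_IDS = {"Bud", "Sprite", "Candy"}
    VOICE_LANGUAGES = {"zh", "en"}  # per the docs: Chinese and English

    def validate_voice(voice_id: str, voice_language: str = "zh") -> None:
        if voice_id not in VOICE_IDS:
            raise ValueError(
                f"unknown voice_id {voice_id!r}; choose one of {sorted(VOICE_IDS)}"
            )
        if voice_language not in VOICE_LANGUAGES:
            raise ValueError(
                f"voice_language must be 'zh' or 'en', got {voice_language!r}"
            )

    validate_voice("Sprite", "en")  # passes silently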
The voice_speed parameter controls the speed at which the text is spoken in the generated audio. It is a floating-point value that lets you adjust the tempo of the speech to suit your video's pacing. The default value is 1.0, with a minimum of 0.8 and a maximum of 2.0. Adjusting the voice_speed helps achieve the desired timing and rhythm in the lip-synced video.
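The documented bounds translate directly into a simple clamp; this is a sketch of one way a client could enforce them, not the node's actual implementation:

    # Sketch of how the documented voice_speed bounds could be enforced before
    # a request is sent; the node itself presumably performs a similar check.
    VOICE_SPEED_MIN, VOICE_SPEED_MAX = 0.8, 2.0

    def clamp_voice_speed(speed: float) -> float:
        """Keep the speech tempo within the supported 0.8-2.0 range."""
        return max(VOICE_SPEED_MIN, min(VOICE_SPEED_MAX, speed))

    print(clamp_voice_speed(2.5))  # 2.0 -- capped at the documented maximum
    print(clamp_voice_speed(1.0))  # 1.0 -- the default tempo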
The output parameter input is of type KLING_AI_API_LIPSYNC_INPUT and encapsulates all the input configurations required for the lip-syncing process, including the text, the voice settings, and any other parameters that have been set. It serves as the comprehensive input package passed to the lip-syncing engine, ensuring that all specified settings are applied when generating the final lip-synced video.
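As a rough sketch, that bundle could be modeled as a plain Python dataclass; the field names mirror this page's parameters, but the real KLING_AI_API_LIPSYNC_INPUT structure may differ:

    from dataclasses import dataclass, asdict

    # Rough model of what the KLING_AI_API_LIPSYNC_INPUT bundle might contain.
    # Field names mirror this page's parameters; the actual structure used by
    # the KLingAI suite may differ.
    @dataclass
    class LipSyncTextInput:
        text: str
        voice_id: str
        voice_language: str = "zh"  # documented default
        voice_speed: float = 1.0    # documented default; valid range 0.8-2.0

    payload = LipSyncTextInput(
        text="Hello and welcome to the demo.",
        voice_id="Bud",
        voice_language="en",
        voice_speed=1.2,
    )
    print(asdict(payload))  # the package handed to the lip-syncing engine

Collecting everything in one object mirrors how the node hands a single package to the downstream lip-syncing engine.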
Experiment with different voice_id options to find the voice that best matches the tone and style of your video content; this can greatly enhance the viewer's experience. Adjust the voice_speed parameter to match the pacing of your video: a slower speed may be more suitable for educational content, while a faster speed might be better for dynamic marketing videos.
A common error occurs when an invalid voice_id is provided that is not part of the predefined options. Ensure that the voice_id is one of the available options listed in the node's documentation, and use the correct identifier for the desired voice. Problems can also arise when the text does not match the selected voice_language. Ensure that the text input matches the selected language for optimal results.
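Putting the documented failure modes together, a hedged pre-flight check along these lines can catch them before a request is made (the voice list is again only illustrative):

    # Pre-flight check that reports the failure modes described above before
    # any request is made; the voice list is an assumption for this sketch.
    def preflight(text: str, voice_id: str, voice_language: str) -> list[str]:
        problems = []
        if voice_id not in {"Bud", "Sprite", "Candy"}:
            problems.append(f"invalid voice_id: {voice_id!r} is not a predefined option")
        if voice_language not in {"zh", "en"}:
            problems.append(f"unsupported voice_language: {voice_language!r}")
        if not text.strip():
            problems.append("text is empty; there is nothing to synchronize")
        return problems

    for issue in preflight("", "Buddy", "fr"):
        print("error:", issue)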