Transform textual prompts into conditioning embeddings for video generation using a CLIP model, enhancing creative possibilities.
The CogVideoTextEncode node transforms textual prompts into conditioning embeddings for video generation models. It uses a CLIP model to tokenize and encode text prompts, producing embeddings that guide the video generation process. By adjusting the strength of the embeddings and optionally offloading the model to manage memory usage, the node offers a flexible way to incorporate textual descriptions into video creation workflows. Its goal is to let AI artists infuse their video projects with detailed, nuanced textual guidance, so that the generated content aligns closely with the provided prompts.
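For intuition, here is a minimal sketch of what the encode step of such a node typically looks like in ComfyUI-style Python. The function name encode_text is hypothetical, but clip.tokenize and clip.encode_from_tokens follow ComfyUI's standard CLIP interface; the actual node implementation may differ.

```python
# Illustrative sketch only -- not the node's actual source code.
def encode_text(clip, prompt, strength=1.0):
    # Tokenize the prompt with the CLIP tokenizer.
    tokens = clip.tokenize(prompt)
    # Encode tokens into per-token embeddings plus a pooled summary vector.
    cond, pooled = clip.encode_from_tokens(tokens, return_pooled=True)
    # Scale the embeddings so the prompt steers generation more or less strongly.
    cond = cond * strength
    # Common ComfyUI conditioning format: a list of [embedding, extras] pairs.
    return [[cond, {"pooled_output": pooled}]]
```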
The clip parameter expects a CLIP model instance, which is used to tokenize and encode the provided text prompt. The CLIP model is essential for converting textual descriptions into embeddings that condition the video generation process.
The prompt parameter is a string input where you provide the textual description you want to encode. This text will be tokenized and transformed into embeddings by the CLIP model. The default value is an empty string, and the field supports multiline input, allowing for detailed and complex descriptions.
The strength parameter is a float value that determines the intensity of the generated embeddings. It scales the embeddings, making them more or less influential in the video generation process. The default value is 1.0, with a minimum of 0.0 and a maximum of 10.0, adjustable in steps of 0.01. Adjusting this parameter helps fine-tune the impact of the text prompt on the final video output.
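Concretely, the scaling is an element-wise multiply on the embedding tensor: a strength of 0.0 removes the prompt's influence entirely, while values above 1.0 amplify it. A quick illustration with a stand-in tensor:

```python
import torch

cond = torch.randn(1, 77, 768)         # stand-in for a CLIP text embedding
weak, strong = cond * 0.25, cond * 4.0
# Same direction in embedding space, different magnitude of influence.
print(weak.abs().mean().item(), strong.abs().mean().item())
```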
The force_offload parameter is a boolean that, when set to true, offloads the model to a secondary device after processing to manage memory usage efficiently. The default value is true. This can be particularly useful when working with large models or limited hardware resources, ensuring that the system remains responsive and capable of handling additional tasks.
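As a rough sketch of what offloading involves (the helper below is hypothetical, not the node's actual code): the model is moved to a secondary device, typically the CPU, and cached GPU memory is released for downstream nodes.

```python
import torch

def maybe_offload(model, force_offload=True, offload_device="cpu"):
    # Move the model off the GPU after encoding to free VRAM for later nodes.
    if force_offload:
        model.to(offload_device)
        if torch.cuda.is_available():
            torch.cuda.empty_cache()  # return cached GPU memory to the driver
    return model
```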
The conditioning output is the resulting embedding generated from the provided text prompt. This embedding conditions the video generation process, guiding the model to produce content that aligns with the textual description. The conditioning embedding is crucial in ensuring that the generated video reflects the nuances and details specified in the prompt.
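In the common ComfyUI conditioning format, this output is a list of [embedding, extras] pairs. Assuming the illustrative encode_text sketch above and a loaded CLIP model, a quick inspection might look like this (exact shapes depend on the CLIP variant):

```python
conditioning = encode_text(clip, "a timelapse of storm clouds over a mountain ridge")
embedding, extras = conditioning[0]
print(embedding.shape)   # per-token embeddings, e.g. torch.Size([1, 77, 768])
print(extras.keys())     # e.g. dict_keys(['pooled_output'])
```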
Experiment with different strength values to see how they affect the influence of your text prompt on the generated video. Higher values make the text prompt more dominant, while lower values allow for more subtle guidance. Use the force_offload option to manage resources more effectively, especially when working with large models or limited hardware.
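A simple way to act on this tip is to sweep a few strength values and compare the resulting videos side by side (illustrative loop, reusing the hypothetical encode_text sketch from above):

```python
for strength in (0.5, 1.0, 2.0, 4.0):
    conditioning = encode_text(clip, prompt, strength=strength)
    # Feed `conditioning` into the video sampler and compare how closely
    # each result follows the prompt versus drifting toward generic motion.
```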