Generate detailed image descriptions using advanced machine learning models, helping AI artists add metadata and narrative content to their work.
The MCLLaVAModel node generates detailed descriptions of images based on a given prompt. It leverages advanced machine learning models to analyze the visual content of an image and produce a coherent, contextually relevant textual description. Its main strength is interpreting complex visual scenes, which makes it a valuable tool for AI artists who want to add descriptive metadata to their images or create narrative content from visual inputs. The node integrates seamlessly into workflows, providing a straightforward way to enrich images with descriptive text.
The image parameter expects an image input in the form of a tensor. This image serves as the primary visual content that the model will analyze to generate a description. The quality and content of the image directly impact the accuracy and relevance of the generated description.
The prompt parameter is a string input that allows you to provide additional context or guidance for the description generation. This can be a simple phrase or a detailed instruction, and it supports multiline input. The default value is an empty string, meaning no additional context is provided by default.
The temperature parameter is a float value that controls the randomness of the description generation. A lower value (closer to 0.0) makes the output more deterministic and focused, while a higher value (up to 1.0) introduces more variability and creativity. The default value is 0.1, with a minimum of 0.0 and a maximum of 1.0, adjustable in steps of 0.01.
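The effect of temperature can be sketched with a minimal softmax example. This is a generic illustration of temperature scaling, not the node's internal code; `apply_temperature` is a hypothetical helper name.

```python
import math

def apply_temperature(logits, temperature):
    # Scale logits by temperature, then softmax into probabilities.
    # Low temperature sharpens the distribution (near-deterministic);
    # high temperature flattens it (more varied word choices).
    if temperature <= 0.0:
        # Degenerate case: act greedily, put all mass on the best token.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]
sharp = apply_temperature(logits, 0.1)  # close to one-hot
soft = apply_temperature(logits, 1.0)   # probability spread more evenly
```

At the node's default of 0.1, the top-scoring token dominates almost completely, which is why low settings feel deterministic.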
The top_p parameter is a float value that applies nucleus sampling to the description generation process. It determines the cumulative probability threshold for selecting the next word in the sequence. A value of 0.9 means that only the top 90% of probable words are considered, promoting diversity while maintaining coherence. The default value is 0.9, with a range from 0.0 to 1.0, adjustable in steps of 0.01.
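Nucleus sampling itself can be sketched as follows. This is a standard illustration of the technique, assuming a toy probability table; it is not the node's actual implementation.

```python
def top_p_filter(token_probs, top_p):
    # Nucleus sampling: keep the smallest set of highest-probability
    # tokens whose cumulative probability reaches top_p, then
    # renormalize the kept probabilities so they sum to 1.
    ranked = sorted(token_probs, key=lambda tp: tp[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return [(token, p / total) for token, p in kept]

dist = [("cat", 0.5), ("dog", 0.3), ("bird", 0.15), ("fish", 0.05)]
nucleus = top_p_filter(dist, 0.9)  # "fish" falls outside the nucleus
```

With top_p = 0.9, the low-probability tail is cut off before sampling, which keeps the output diverse without letting unlikely words slip in.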
The max_crops parameter is an integer that specifies the maximum number of image crops to consider during analysis. This helps the model focus on different parts of the image to generate a more comprehensive description. The default value is 100, with a minimum of 1 and a maximum of 300, adjustable in steps of 1.
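One way to picture the crop budget is a tiling pass capped at max_crops. The tiling scheme and crop size below are assumptions for illustration only; the model's actual cropping strategy may differ.

```python
def plan_crops(width, height, crop_size, max_crops):
    # Tile the image into crop_size x crop_size windows, scanning
    # left-to-right and top-to-bottom, stopping once max_crops
    # boxes have been produced.
    boxes = []
    for top in range(0, height, crop_size):
        for left in range(0, width, crop_size):
            if len(boxes) >= max_crops:
                return boxes
            boxes.append((left, top,
                          min(left + crop_size, width),
                          min(top + crop_size, height)))
    return boxes

full = plan_crops(1024, 1024, 336, 100)  # 4 x 4 grid = 16 crops
capped = plan_crops(1024, 1024, 336, 5)  # budget stops it at 5
```

A higher max_crops lets the analysis cover more regions of a large image, at the cost of more compute and memory per description.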
The num_tokens parameter is an integer that defines the maximum number of tokens (words or subwords) in the generated description. This controls the length of the output text. The default value is 728, with a minimum of 1 and a maximum of 2048, adjustable in steps of 1.
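Conceptually, num_tokens caps a generation loop like the one below. This is a generic sketch of autoregressive decoding, with a stand-in model function; the stop-token name is an assumption.

```python
def generate(next_token, num_tokens, stop_token="<end>"):
    # Generation loop: request tokens one at a time until the model
    # emits the stop token or the num_tokens cap is reached.
    tokens = []
    for _ in range(num_tokens):
        tok = next_token(tokens)
        if tok == stop_token:
            break
        tokens.append(tok)
    return tokens

# A stand-in "model" that always emits the same token, to show the cap:
capped = generate(lambda history: "word", num_tokens=5)
```

Descriptions shorter than the cap end naturally at the stop token, so raising num_tokens only matters when the model would otherwise be cut off mid-sentence.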
The output parameter is a string that contains the generated description of the input image. This description is crafted based on the visual content of the image and any additional context provided through the prompt. The output is designed to be coherent, contextually relevant, and useful for various applications such as metadata generation, storytelling, or enhancing visual content with descriptive text.
Experiment with the temperature parameter to balance focus and creativity in the generated descriptions.
Use the prompt parameter to guide the model towards specific themes or details you want to emphasize in the description.
Increase the max_crops parameter to ensure the model analyzes different parts of the image, which can lead to more comprehensive descriptions.
If you need longer or shorter descriptions, adjust the num_tokens parameter accordingly.
If you run out of memory, reduce the max_crops and num_tokens parameters to decrease memory usage.
If the model files fail to load or appear corrupted, set force_download to True to re-download the model files.
© Copyright 2024 RunComfy. All Rights Reserved.