Versatile node for visual question answering with AI models, integrating text and visual inputs for interactive art creation.
Qwen2_VQA is a versatile node designed to facilitate visual question answering (VQA) by leveraging advanced AI models. This node integrates text and visual inputs to generate contextually relevant answers, making it an invaluable tool for AI artists who need to create interactive and intelligent visual content. By processing both text and images or videos, Qwen2_VQA can understand and respond to complex queries about visual data, enhancing the interactivity and depth of your AI-generated art. The node supports various model configurations and quantization options, allowing you to balance performance and resource usage according to your needs.
This parameter accepts a string input that represents the question or prompt you want the model to answer. The text should be clear and concise to ensure accurate responses. The default value is an empty string, and it supports multiline input for more complex queries.
This parameter allows you to select the specific model variant to use for inference. Available options include Qwen2-VL-2B-Instruct-GPTQ-Int4, Qwen2-VL-2B-Instruct-GPTQ-Int8, Qwen2-VL-2B-Instruct, Qwen2-VL-7B-Instruct-GPTQ-Int4, Qwen2-VL-7B-Instruct-GPTQ-Int8, and Qwen2-VL-7B-Instruct. The default model is Qwen2-VL-2B-Instruct.
This parameter specifies the quantization type to be used, which can help reduce the model's memory footprint. Options include none, 4bit, and 8bit, with the default being none.
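As a rough sketch of how the quantization option might translate into model-loading arguments — the mapping below, including the bitsandbytes-style flag names, is an assumption for illustration, not the node's actual implementation:

```python
def quantization_kwargs(quantization: str) -> dict:
    """Map the node's quantization option to hypothetical loader kwargs.

    The flag names mirror common bitsandbytes-style loading options;
    the real node may wire this differently.
    """
    options = {
        "none": {},
        "4bit": {"load_in_4bit": True},
        "8bit": {"load_in_8bit": True},
    }
    if quantization not in options:
        raise ValueError(f"unknown quantization: {quantization!r}")
    return options[quantization]
```

With none, weights load at full precision; 4-bit quantization roughly quarters the weight memory relative to fp16/bf16, at some cost in output quality.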
A boolean parameter that determines whether the model should remain loaded in memory after execution. This can be useful for repeated inferences to save loading time. The default value is False.
This float parameter controls the randomness of the model's output. A higher value (closer to 1) makes the output more random, while a lower value (closer to 0) makes it more deterministic. The default value is 0.7, with a minimum of 0 and a maximum of 1.
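The effect of temperature can be seen in a small softmax example: dividing the logits by the temperature before normalizing sharpens or flattens the distribution the model samples from.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.2)  # top option dominates
soft = softmax_with_temperature(logits, 1.0)   # probabilities spread out
```

At low temperature the top logit dominates, so output is near-deterministic; at high temperature the probabilities spread out, which is why higher values read as more "creative". (Real samplers typically special-case temperature 0 as greedy decoding, since dividing by zero is undefined.)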
This integer parameter sets the maximum number of new tokens to generate in the response. The default value is 2048, with a minimum of 128 and a maximum of 2048.
This integer parameter defines the minimum number of visual tokens per image, which affects the model's processing speed and memory usage. The default value is 256 * 28 * 28, with a minimum of 4 * 28 * 28 and a maximum of 16384 * 28 * 28.
This integer parameter sets the maximum number of visual tokens per image, balancing the detail and performance of the model. The default value is 16384 * 28 * 28.
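These pixel budgets are written as multiples of 28 * 28 because Qwen2-VL groups each 28 × 28 pixel area into one visual token, so dividing a budget by 28 * 28 gives the corresponding token count. The helper below is illustrative, not the node's code:

```python
PIXELS_PER_TOKEN = 28 * 28  # one Qwen2-VL visual token covers a 28x28 pixel area

def tokens_for_budget(pixels: int) -> int:
    """Number of visual tokens a pixel budget corresponds to."""
    return pixels // PIXELS_PER_TOKEN

floor = tokens_for_budget(4 * 28 * 28)          # minimum: 4 tokens
default = tokens_for_budget(256 * 28 * 28)      # default min_pixels: 256 tokens
ceiling = tokens_for_budget(16384 * 28 * 28)    # maximum: 16384 tokens
```

Raising the budget lets the model see images in more detail but increases memory use and slows inference; lowering it does the reverse.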
An integer parameter used to set the random seed for reproducibility. If set to -1, the seed is not fixed. The default value is -1.
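The usual convention behind this parameter can be sketched as follows, using Python's random module as a stand-in for the node's actual RNGs (which likely include torch):

```python
import random

def apply_seed(seed: int) -> None:
    """Fix the RNG state when seed != -1; leave it unseeded otherwise."""
    if seed == -1:
        return  # -1 means "do not fix the seed"
    random.seed(seed)

apply_seed(42)
first = random.random()
apply_seed(42)
assert random.random() == first  # the same seed reproduces the same draw
```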
This optional parameter accepts a string that specifies the path to the image or video file to be processed. If not provided, the node will only process the text input.
This output parameter provides the generated token IDs after trimming the input token IDs. It represents the model's response to the input query, which can be further processed or converted to text.
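The trimming step can be sketched on plain lists: for each sequence, the prompt tokens that were fed in are sliced off the front of the model's output, leaving only the newly generated IDs. This mirrors the common Qwen2-VL post-processing pattern, though the node's exact code may differ:

```python
def trim_generated(input_ids: list[list[int]], generated_ids: list[list[int]]) -> list[list[int]]:
    """Drop the prompt tokens from each generated sequence.

    Causal LMs echo the prompt at the start of the generate() output,
    so the response is everything after the original input length.
    """
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]

prompt = [[101, 102, 103]]
output = [[101, 102, 103, 7, 8, 9]]
trim_generated(prompt, output)  # [[7, 8, 9]]
```

The trimmed IDs can then be passed through the tokenizer's batch decoding to obtain the answer text.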
Use the keep_model_loaded parameter to save time on repeated inferences, especially when working on large projects. Adjust the temperature parameter to control the creativity of the model's responses; higher values can produce more varied and creative answers. Experiment with different model and quantization options to find the best balance between performance and resource usage for your specific needs. If an image or video fails to load, verify the path supplied in the source_path parameter. If the file cannot be processed, confirm that its format is supported and re-check the source_path parameter. If you run out of memory, reduce the max_new_tokens or max_pixels parameters, or use a model with lower memory requirements.
© Copyright 2024 RunComfy. All Rights Reserved.