Versatile node for visual question answering with AI models, integrating text and visual inputs for interactive art creation.
Qwen2_VQA is a versatile node designed to facilitate visual question answering (VQA) by leveraging advanced AI models. This node integrates text and visual inputs to generate contextually relevant answers, making it an invaluable tool for AI artists who need to create interactive and intelligent visual content. By processing both text and images or videos, Qwen2_VQA can understand and respond to complex queries about visual data, enhancing the interactivity and depth of your AI-generated art. The node supports various model configurations and quantization options, allowing you to balance performance and resource usage according to your needs.
This parameter accepts a string input that represents the question or prompt you want the model to answer. The text should be clear and concise to ensure accurate responses. The default value is an empty string, and it supports multiline input for more complex queries.
This parameter allows you to select the specific model variant to use for inference. Available options include Qwen2-VL-2B-Instruct-GPTQ-Int4, Qwen2-VL-2B-Instruct-GPTQ-Int8, Qwen2-VL-2B-Instruct, Qwen2-VL-7B-Instruct-GPTQ-Int4, Qwen2-VL-7B-Instruct-GPTQ-Int8, and Qwen2-VL-7B-Instruct. The default model is Qwen2-VL-2B-Instruct.
This parameter specifies the quantization type to be used, which can help reduce the model's memory footprint. Options include none, 4bit, and 8bit, with the default being none.
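As a rough sketch of how the quantization option might translate into model-loading arguments — the mapping below, including the bitsandbytes-style flag names, is an assumption for illustration, not the node's actual implementation:

```python
def quantization_kwargs(quantization: str) -> dict:
    """Map the node's quantization option to hypothetical loader kwargs.

    The flag names mirror common bitsandbytes-style loading options;
    the real node may wire this differently.
    """
    options = {
        "none": {},
        "4bit": {"load_in_4bit": True},
        "8bit": {"load_in_8bit": True},
    }
    if quantization not in options:
        raise ValueError(f"unknown quantization: {quantization!r}")
    return options[quantization]
```

With none, weights load at full precision; 4-bit quantization roughly quarters the weight memory relative to fp16/bf16, at some cost in output quality.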
A boolean parameter that determines whether the model should remain loaded in memory after execution. This can be useful for repeated inferences to save loading time. The default value is False.
This float parameter controls the randomness of the model's output. A higher value (closer to 1) makes the output more random, while a lower value (closer to 0) makes it more deterministic. The default value is 0.7, with a minimum of 0 and a maximum of 1.
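The effect of temperature can be seen in a small softmax example: dividing the logits by the temperature before normalizing sharpens or flattens the distribution the model samples from.

```python
import math

def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities, scaled by temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.2)  # top option dominates
soft = softmax_with_temperature(logits, 1.0)   # probabilities spread out
```

At low temperature the top logit dominates, so output is near-deterministic; at high temperature the probabilities spread out, which is why higher values read as more "creative". (Real samplers typically special-case temperature 0 as greedy decoding, since dividing by zero is undefined.)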
This integer parameter sets the maximum number of new tokens to generate in the response. The default value is 2048, with a minimum of 128 and a maximum of 2048.
This integer parameter defines the minimum number of visual tokens per image, which affects the model's processing speed and memory usage. The default value is 256 * 28 * 28, with a minimum of 4 * 28 * 28 and a maximum of 16384 * 28 * 28.
This integer parameter sets the maximum number of visual tokens per image, balancing the detail and performance of the model. The default value is 16384 * 28 * 28.
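These pixel budgets are written as multiples of 28 * 28 because Qwen2-VL groups each 28 × 28 pixel area into one visual token, so dividing a budget by 28 * 28 gives the corresponding token count. The helper below is illustrative, not the node's code:

```python
PIXELS_PER_TOKEN = 28 * 28  # one Qwen2-VL visual token covers a 28x28 pixel area

def tokens_for_budget(pixels: int) -> int:
    """Number of visual tokens a pixel budget corresponds to."""
    return pixels // PIXELS_PER_TOKEN

floor = tokens_for_budget(4 * 28 * 28)          # minimum: 4 tokens
default = tokens_for_budget(256 * 28 * 28)      # default min_pixels: 256 tokens
ceiling = tokens_for_budget(16384 * 28 * 28)    # maximum: 16384 tokens
```

Raising the budget lets the model see images in more detail but increases memory use and slows inference; lowering it does the reverse.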
An integer parameter used to set the random seed for reproducibility. If set to -1, the seed is not fixed. The default value is -1.
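The usual convention behind this parameter can be sketched as follows, using Python's random module as a stand-in for the node's actual RNGs (which likely include torch):

```python
import random

def apply_seed(seed: int) -> None:
    """Fix the RNG state when seed != -1; leave it unseeded otherwise."""
    if seed == -1:
        return  # -1 means "do not fix the seed"
    random.seed(seed)

apply_seed(42)
first = random.random()
apply_seed(42)
assert random.random() == first  # the same seed reproduces the same draw
```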
This optional parameter accepts a string that specifies the path to the image or video file to be processed. If not provided, the node will only process the text input.
This output parameter provides the generated token IDs after trimming the input token IDs. It represents the model's response to the input query, which can be further processed or converted to text.
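The trimming step can be sketched on plain lists: for each sequence, the prompt tokens that were fed in are sliced off the front of the model's output, leaving only the newly generated IDs. This mirrors the common Qwen2-VL post-processing pattern, though the node's exact code may differ:

```python
def trim_generated(input_ids: list[list[int]], generated_ids: list[list[int]]) -> list[list[int]]:
    """Drop the prompt tokens from each generated sequence.

    Causal LMs echo the prompt at the start of the generate() output,
    so the response is everything after the original input length.
    """
    return [out[len(inp):] for inp, out in zip(input_ids, generated_ids)]

prompt = [[101, 102, 103]]
output = [[101, 102, 103, 7, 8, 9]]
trim_generated(prompt, output)  # [[7, 8, 9]]
```

The trimmed IDs can then be passed through the tokenizer's batch decoding to obtain the answer text.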
Use the keep_model_loaded parameter to save time on repeated inferences, especially when working on large projects. Adjust the temperature parameter to control the creativity of the model's responses; higher values can produce more varied and creative answers. Experiment with different model and quantization options to find the best balance between performance and resource usage for your specific needs. If an image or video fails to load, verify the path supplied in the source_path parameter. If the file cannot be processed, confirm that its format is supported and re-check the source_path parameter. If you run out of memory, reduce the max_new_tokens or max_pixels parameters, or use a model with lower memory requirements.
© Copyright 2024 RunComfy. All Rights Reserved.