Sonic delivers advanced audio-driven lip-sync for portraits with high-quality animation.

LivePortrait | Animate Portraits | Vid2Vid

Transfer facial expressions and movements from a driving video onto a source video

Insert Anything | Reference-Based Image Editing

Insert any subject into images with mask or text guidance.

ICEdit | Fast AI Image Editing with Nunchaku

ICEdit+Nunchaku: A solution for ultra-fast, precise AI image editing.

ComfyUI > Nodes > ComfyUI_Qwen2-VL-Instruct

ComfyUI Extension: ComfyUI_Qwen2-VL-Instruct

Repo Name

ComfyUI_Qwen2-VL-Instruct

Author
IuvenisSapiens (Account age: 695 days) Nodes
View all nodes(2) Latest Updated
2025-04-02 Github Stars
0.09K

Github Ask IuvenisSapiens Current Questions Past Questions

Table of Content

Description
How ComfyUI_Qwen2-VL-Instruct Works
ComfyUI_Qwen2-VL-Instruct Features
ComfyUI_Qwen2-VL-Instruct Models
What's New with ComfyUI_Qwen2-VL-Instruct
Troubleshooting ComfyUI_Qwen2-VL-Instruct
Learn More about ComfyUI_Qwen2-VL-Instruct
Related Nodes

How to Install ComfyUI_Qwen2-VL-Instruct

Install this extension via the ComfyUI Manager by searching for ComfyUI_Qwen2-VL-Instruct

1. Click the Manager button in the main menu
2. Select Custom Nodes Manager button
3. Enter ComfyUI_Qwen2-VL-Instruct in the search bar

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Visit ComfyUI Online for ready-to-use ComfyUI environment

Free trial available
16GB VRAM to 80GB VRAM GPU machines
400+ preloaded models/nodes
Freedom to upload custom models/nodes
200+ ready-to-run workflows
100% private workspace with up to 200GB storage
Dedicated Support

Run ComfyUI Online

ComfyUI_Qwen2-VL-Instruct Description

ComfyUI_Qwen2-VL-Instruct enables text, video, single-image, and multi-image queries to generate captions or responses, integrating Qwen2-VL-Instruct with ComfyUI for versatile query support.

ComfyUI_Qwen2-VL-Instruct Introduction

ComfyUI_Qwen2-VL-Instruct is an extension for the ComfyUI platform that integrates the powerful Qwen2-VL-Instruct model. This extension allows you to perform a variety of tasks such as generating captions or responses based on text, video, single-image, or multi-image queries. It is designed to help AI artists and other users easily generate descriptive content and analyze visual data without needing extensive technical knowledge.

Key Features:

Text-based Queries: Generate descriptions or responses from textual inputs.
Video Queries: Analyze video content to generate detailed captions or summaries.
Single-Image Queries: Create captions for individual images.
Multi-Image Queries: Generate collective descriptions or narratives from multiple images.

How ComfyUI_Qwen2-VL-Instruct Works

ComfyUI_Qwen2-VL-Instruct leverages the Qwen2-VL-Instruct model to process various types of input data and generate meaningful outputs. Here’s a simplified explanation of how it works:

Input Processing: The extension accepts different types of inputs such as text, images, and videos.
Model Analysis: The Qwen2-VL-Instruct model analyzes the input data. For text, it processes the query to understand the context. For images and videos, it extracts visual features and interprets them.
Output Generation: Based on the analysis, the model generates appropriate captions, descriptions, or responses. For example, if you upload a video and ask for a summary, the model will analyze each frame and provide a detailed summary of the video content.

ComfyUI_Qwen2-VL-Instruct Features

Text-based Query

Submit textual queries to request information or generate descriptions. For instance, you might input a query like "What is the meaning of life?" and receive a thoughtful response.

Chat_with_text_workflow preview

Video Query

Upload a video to generate detailed captions for each frame or a summary of the entire video. For example, you can ask, "Generate a caption for the given video."

Chat_with_video_workflow preview

Single-Image Query

Upload a single image to generate a caption. For instance, you could upload a photo and ask, "What does this image show?" resulting in a caption like "A majestic lion pride relaxing on the savannah."

Chat_with_single_image_workflow preview

Multi-Image Query

Upload multiple images to generate a collective description or a narrative that ties the images together. For example, you might ask, "Create a story from the following series of images: one of a couple at a beach, another at a wedding ceremony, and the last one at a baby's christening."

Chat_with_multiple_images_workflow preview

ComfyUI_Qwen2-VL-Instruct Models

The extension uses the Qwen2-VL-Instruct model, which is available in different sizes to suit various needs:

Qwen2-VL-2B: Suitable for smaller tasks and quicker responses.
Qwen2-VL-7B: A balanced model for general use.
Qwen2-VL-72B: The most powerful model, ideal for complex tasks and detailed analysis. Each model can be selected based on the complexity of the task and the desired level of detail in the output.

What's New with ComfyUI_Qwen2-VL-Instruct

Recent Updates:

2024.09.19: Released the instruction-tuned Qwen2-VL-72B model and its quantized versions (AWQ, GPTQ-Int4, GPTQ-Int8).
2024.08.30: Launched the Qwen2-VL series, including the 2B and 7B models. These updates bring enhanced performance and new capabilities, making the extension more powerful and versatile for AI artists.

Troubleshooting ComfyUI_Qwen2-VL-Instruct

Common Issues and Solutions:

Model Not Loading:

Ensure that the model files are in the correct directory (ComfyUI\models\prompt_generator\).
Check your internet connection if the models need to be downloaded automatically.

Incorrect Outputs:

Verify that the input data is clear and correctly formatted.
Try using a different model size if the current one does not meet your needs.

Performance Issues:

Ensure your system meets the minimum requirements for running the models.
Close other applications to free up system resources.

Frequently Asked Questions:

Q: Can I use this extension without a GPU?
A: Yes, but performance will be significantly slower. It is recommended to use a GPU for optimal performance.
Q: How do I update the models?
A: Models are updated automatically when running the workflow if they are not found in the specified directory.

Learn More about ComfyUI_Qwen2-VL-Instruct

For additional resources, tutorials, and community support, visit the following links:

ComfyUI Examples
ComfyUI GitHub Repository
Qwen2-VL-Instruct GitHub Repository These resources provide comprehensive guides and examples to help you get the most out of ComfyUI_Qwen2-VL-Instruct.

ComfyUI_Qwen2-VL-Instruct Related Nodes

Multiple Paths Input

Qwen2 VQA

Table of Content

Description
How ComfyUI_Qwen2-VL-Instruct Works
ComfyUI_Qwen2-VL-Instruct Features
ComfyUI_Qwen2-VL-Instruct Models
What's New with ComfyUI_Qwen2-VL-Instruct
Troubleshooting ComfyUI_Qwen2-VL-Instruct
Learn More about ComfyUI_Qwen2-VL-Instruct
Related Nodes

ReActor | Fast Face Swap

With ComfyUI ReActor, you can easily swap the faces of one or more characters in images or videos.

Self Forcing | Autoregressive Keyframe-to-Video Generation

SUPER FAST! 5-second video in 45 seconds!

Consistent Character Creator

Create consistent, high-resolution character designs from multiple angles with full control over emotions, lighting, and environments.

Flux Upscaler - Ultimate 32k | Image Upscaler

Flux Upscaler – Achieve 4k, 8k, 16k, and Ultimate 32k Resolution!

Support

Resources

Legal

RunComfy

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, enabling artists to harness the latest AI tools to create incredible art.