Transform videos with a reference style image using VACE Wan2.1.

Flux Upscaler - Ultimate 32k | Image Upscaler

Flux Upscaler – Achieve 4k, 8k, 16k, and Ultimate 32k Resolution!

ReActor | Fast Face Swap

Professional face swapping toolkit for ComfyUI that enables natural face replacement and enhancement.

Flux Consistent Characters | Input Image

Create consistent characters and ensure they look uniform using your images.

ComfyUI > Nodes > ComfyUI-Qwen-VL-API

ComfyUI Extension: ComfyUI-Qwen-VL-API

Repo Name

ComfyUI-Qwen-VL-API

Author
ZHO-ZHO-ZHO (Account age: 624 days) Nodes
View all nodes(2) Latest Updated
2024-05-22 Github Stars
0.2K

Github Ask ZHO-ZHO-ZHO Current Questions Past Questions

Table of Content

Description
How ComfyUI-Qwen-VL-API Works
ComfyUI-Qwen-VL-API Features
ComfyUI-Qwen-VL-API Models
Troubleshooting ComfyUI-Qwen-VL-API
Learn More about ComfyUI-Qwen-VL-API
Related Nodes

How to Install ComfyUI-Qwen-VL-API

Install this extension via the ComfyUI Manager by searching for ComfyUI-Qwen-VL-API

1. Click the Manager button in the main menu
2. Select Custom Nodes Manager button
3. Enter ComfyUI-Qwen-VL-API in the search bar

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Visit ComfyUI Online for ready-to-use ComfyUI environment

Free trial available
16GB VRAM to 80GB VRAM GPU machines
400+ preloaded models/nodes
Freedom to upload custom models/nodes
200+ ready-to-run workflows
100% private workspace with up to 200GB storage
Dedicated Support

Run ComfyUI Online

ComfyUI-Qwen-VL-API Description

ComfyUI-Qwen-VL-API integrates QWen-VL-Plus and QWen-VL-Max into ComfyUI, enhancing its visual language processing capabilities. This extension optimizes image and text analysis within the ComfyUI framework.

ComfyUI-Qwen-VL-API Introduction

ComfyUI-Qwen-VL-API is an extension that integrates the powerful Qwen-VL models into ComfyUI, a user-friendly interface for AI artists. Developed by Alibaba, Qwen-VL models are among the best open-source visual language models available. This extension allows you to leverage these models through an API, enabling advanced image and text processing capabilities directly within ComfyUI.

With ComfyUI-Qwen-VL-API, you can:

Perform detailed image analysis and text recognition.
Engage in multi-round dialogues with the AI, enhancing interactive experiences.
Utilize high-resolution images and various aspect ratios for superior performance in visual tasks.

How ComfyUI-Qwen-VL-API Works

ComfyUI-Qwen-VL-API works by connecting ComfyUI to the Qwen-VL models via an API. Think of it as a bridge that allows ComfyUI to send images and text to the Qwen-VL models, which then process this data and return detailed responses. This process involves:

Input: You provide an image and/or text prompt.
Processing: The API sends this input to the Qwen-VL models.
Output: The models analyze the input and return a response, which can include text descriptions, recognized text from images, or answers to questions. For example, you can upload an image of a document, and the model will return the text content of the document, or you can ask the model to describe the contents of an image.

ComfyUI-Qwen-VL-API Features

Model Integration

Qwen-VL-Plus: Enhanced version of the Qwen-VL model, offering improved detail recognition and text recognition capabilities. It supports high-resolution images and various aspect ratios.
Qwen-VL-Max: A larger-scale model that further enhances visual reasoning and instruction-following capabilities, providing the highest level of visual perception and cognition.

Nodes

QWenVL_Zho: Supports both Qwen-VL-Plus and Qwen-VL-Max models. Accepts local images as input, which are temporarily stored and automatically deleted after use.
QWenVL_Chat_Zho: Also supports both models and includes a context window for multi-round dialogues. Images are stored in a specific folder and can be manually cleared.

Multi-Round Dialogue

This feature allows for more interactive and context-aware conversations with the AI. You can ask follow-up questions and the model will remember the context of the previous interactions.

Image and Text Processing

The extension can read local images and process them to extract text or provide detailed descriptions. This is particularly useful for tasks like document analysis or detailed image descriptions.

ComfyUI-Qwen-VL-API Models

Qwen-VL-Plus

Description: Enhanced visual language model with improved detail and text recognition.
Use Case: Ideal for tasks requiring high-resolution image analysis and detailed text extraction.

Qwen-VL-Max

Description: Larger-scale model with superior visual reasoning and instruction-following capabilities.
Use Case: Best for complex visual tasks and scenarios requiring high cognitive understanding.

Troubleshooting ComfyUI-Qwen-VL-API

Common Issues and Solutions

API Key Issues:

Problem: API key not working.
Solution: Ensure you have applied for an API key from QWen-VL API Application and added it to the config.json file.

Image Not Loading:

Problem: Local images not being processed.
Solution: Check that the image path is correct and that the image format is supported.

Model Selection:

Problem: Incorrect model being used.
Solution: Ensure the model_name parameter is set correctly to either Qwen-VL-Plus or Qwen-VL-Max.

Frequently Asked Questions

How do I switch between models? Set the model_name parameter in the node settings to either Qwen-VL-Plus or Qwen-VL-Max.
Where are the images stored? Images are temporarily stored and automatically deleted after processing. For QWenVL_Chat_Zho, images are stored in the /custom nodes/ComfyUI-Qwen-VL-API/qw folder.

Learn More about ComfyUI-Qwen-VL-API

For additional resources, tutorials, and community support, you can explore the following:

Qwen-VL GitHub Repository
ComfyUI-Gemini for related extensions and tools.
QWen-VL API Application to get your API key. These resources will help you get the most out of ComfyUI-Qwen-VL-API and enhance your AI art projects.

ComfyUI-Qwen-VL-API Related Nodes

㊙️QWenVL_Chat_Zho

㊙️QWenVL_Zho

Table of Content

Description
How ComfyUI-Qwen-VL-API Works
ComfyUI-Qwen-VL-API Features
ComfyUI-Qwen-VL-API Models
Troubleshooting ComfyUI-Qwen-VL-API
Learn More about ComfyUI-Qwen-VL-API
Related Nodes

Flux Redux | Variation and Restyling

Official Flux Tools - Flux Redux for Image Variation and Restyling

Wan 2.1 | Revolutionary Video Generation

Create incredible videos from text or images with breakthrough AI running on everyday CPUs.

OmniGen | Image-To-Image

OmniGen: Modify Images Based on Reference Images and Prompts

SUPIR + Foolhardy Remacri | 8K Image/Video Upscaler

Upscale images to 8K with SUPIR and 4x Foolhardy Remacri model.

Support

Resources

Legal

RunComfy

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals. RunComfy also provides AI Playground, enabling artists to harness the latest AI tools to create incredible art.