Install this extension via the ComfyUI Manager by searching
for ComfyUI-Qwen-VL-API
1. Click the Manager button in the main menu
2. Select Custom Nodes Manager button
3. Enter ComfyUI-Qwen-VL-API in the search bar
After installation, click the Restart button to
restart ComfyUI. Then, manually
refresh your browser to clear the cache and access
the updated list of nodes.
Visit
ComfyUI Online
for ready-to-use ComfyUI environment
ComfyUI-Qwen-VL-API integrates QWen-VL-Plus and QWen-VL-Max into ComfyUI, enhancing its visual language processing capabilities. This extension optimizes image and text analysis within the ComfyUI framework.
ComfyUI-Qwen-VL-API Introduction
ComfyUI-Qwen-VL-API is an extension that integrates the powerful Qwen-VL models into ComfyUI, a user-friendly interface for AI artists. Developed by Alibaba, Qwen-VL models are among the best open-source visual language models available. This extension allows you to leverage these models through an API, enabling advanced image and text processing capabilities directly within ComfyUI.
With ComfyUI-Qwen-VL-API, you can:
Perform detailed image analysis and text recognition.
Engage in multi-round dialogues with the AI, enhancing interactive experiences.
Utilize high-resolution images and various aspect ratios for superior performance in visual tasks.
How ComfyUI-Qwen-VL-API Works
ComfyUI-Qwen-VL-API works by connecting ComfyUI to the Qwen-VL models via an API. Think of it as a bridge that allows ComfyUI to send images and text to the Qwen-VL models, which then process this data and return detailed responses. This process involves:
Input: You provide an image and/or text prompt.
Processing: The API sends this input to the Qwen-VL models.
Output: The models analyze the input and return a response, which can include text descriptions, recognized text from images, or answers to questions.
For example, you can upload an image of a document, and the model will return the text content of the document, or you can ask the model to describe the contents of an image.
ComfyUI-Qwen-VL-API Features
Model Integration
Qwen-VL-Plus: Enhanced version of the Qwen-VL model, offering improved detail recognition and text recognition capabilities. It supports high-resolution images and various aspect ratios.
Qwen-VL-Max: A larger-scale model that further enhances visual reasoning and instruction-following capabilities, providing the highest level of visual perception and cognition.
Nodes
QWenVL_Zho: Supports both Qwen-VL-Plus and Qwen-VL-Max models. Accepts local images as input, which are temporarily stored and automatically deleted after use.
QWenVL_Chat_Zho: Also supports both models and includes a context window for multi-round dialogues. Images are stored in a specific folder and can be manually cleared.
Multi-Round Dialogue
This feature allows for more interactive and context-aware conversations with the AI. You can ask follow-up questions and the model will remember the context of the previous interactions.
Image and Text Processing
The extension can read local images and process them to extract text or provide detailed descriptions. This is particularly useful for tasks like document analysis or detailed image descriptions.
ComfyUI-Qwen-VL-API Models
Qwen-VL-Plus
Description: Enhanced visual language model with improved detail and text recognition.
Use Case: Ideal for tasks requiring high-resolution image analysis and detailed text extraction.
Qwen-VL-Max
Description: Larger-scale model with superior visual reasoning and instruction-following capabilities.
Use Case: Best for complex visual tasks and scenarios requiring high cognitive understanding.
Troubleshooting ComfyUI-Qwen-VL-API
Common Issues and Solutions
API Key Issues:
Problem: API key not working.
Solution: Ensure you have applied for an API key from QWen-VL API Application and added it to the config.json file.
Image Not Loading:
Problem: Local images not being processed.
Solution: Check that the image path is correct and that the image format is supported.
Model Selection:
Problem: Incorrect model being used.
Solution: Ensure the model_name parameter is set correctly to either Qwen-VL-Plus or Qwen-VL-Max.
Frequently Asked Questions
How do I switch between models?
Set the model_name parameter in the node settings to either Qwen-VL-Plus or Qwen-VL-Max.
Where are the images stored?
Images are temporarily stored and automatically deleted after processing. For QWenVL_Chat_Zho, images are stored in the /custom nodes/ComfyUI-Qwen-VL-API/qw folder.
Learn More about ComfyUI-Qwen-VL-API
For additional resources, tutorials, and community support, you can explore the following: