
ComfyUI Extension: ComfyUI-Qwen-VL-API

Repo Name: ComfyUI-Qwen-VL-API
Author: ZHO-ZHO-ZHO (Account age: 340 days)
Nodes: 2
Last Updated: 2024-05-22
GitHub Stars: 0.19K

How to Install ComfyUI-Qwen-VL-API

Install this extension via the ComfyUI Manager by searching for ComfyUI-Qwen-VL-API:
  1. Click the Manager button in the main menu.
  2. Click the Custom Nodes Manager button.
  3. Enter ComfyUI-Qwen-VL-API in the search bar and install it from the results.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.


ComfyUI-Qwen-VL-API Description

ComfyUI-Qwen-VL-API integrates the Qwen-VL-Plus and Qwen-VL-Max models into ComfyUI via Alibaba's API, adding visual language capabilities such as image description, text recognition, and multi-round dialogue.

ComfyUI-Qwen-VL-API Introduction

ComfyUI-Qwen-VL-API is an extension that integrates the powerful Qwen-VL models into ComfyUI, a user-friendly interface for AI artists. Developed by Alibaba, the Qwen-VL family includes some of the strongest visual language models available; this extension accesses the Plus and Max variants through an API, enabling advanced image and text processing directly within ComfyUI.

With ComfyUI-Qwen-VL-API, you can:

  • Perform detailed image analysis and text recognition.
  • Engage in multi-round dialogues with the AI, enhancing interactive experiences.
  • Utilize high-resolution images and various aspect ratios for superior performance in visual tasks.

How ComfyUI-Qwen-VL-API Works

ComfyUI-Qwen-VL-API works by connecting ComfyUI to the Qwen-VL models via an API. Think of it as a bridge that allows ComfyUI to send images and text to the Qwen-VL models, which then process this data and return detailed responses. This process involves:

  1. Input: You provide an image and/or text prompt.
  2. Processing: The API sends this input to the Qwen-VL models.
  3. Output: The models analyze the input and return a response, which can include text descriptions, recognized text from images, or answers to questions. For example, you can upload an image of a document, and the model will return the text content of the document, or you can ask the model to describe the contents of an image.
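As a concrete illustration, the three steps above can be sketched in Python. The message layout below mimics Alibaba's DashScope-style multimodal format; the helper name `build_request` and the exact payload the extension assembles are assumptions for illustration, not the extension's actual code:

```python
# Sketch of the input -> processing -> output flow. The message
# layout follows a DashScope-style multimodal format (an assumption;
# the extension may assemble its payload differently).

def build_request(image_path: str, prompt: str, model_name: str = "qwen-vl-plus") -> dict:
    """Assemble one user turn holding an image reference and a text prompt."""
    return {
        "model": model_name,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"image": f"file://{image_path}"},  # local image, passed by URI
                    {"text": prompt},                   # accompanying text prompt
                ],
            }
        ],
    }

# Step 1 builds the input; steps 2-3 (sending the request and reading
# the model's reply) would go through the API with a valid key.
req = build_request("/tmp/document.png", "Transcribe the text in this image.")
print(req["model"])  # qwen-vl-plus
```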

ComfyUI-Qwen-VL-API Features

Model Integration

  • Qwen-VL-Plus: Enhanced version of the Qwen-VL model, offering improved detail recognition and text recognition capabilities. It supports high-resolution images and various aspect ratios.
  • Qwen-VL-Max: A larger-scale model that further enhances visual reasoning and instruction-following capabilities, providing the highest level of visual perception and cognition.

Nodes

  • QWenVL_Zho: Supports both Qwen-VL-Plus and Qwen-VL-Max models. Accepts local images as input, which are temporarily stored and automatically deleted after use.
  • QWenVL_Chat_Zho: Also supports both models and includes a context window for multi-round dialogues. Images are stored in a specific folder and can be manually cleared.

Multi-Round Dialogue

This feature allows for more interactive and context-aware conversations with the AI. You can ask follow-up questions and the model will remember the context of the previous interactions.
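One way such a context window can be kept is sketched below, assuming a simple alternating user/assistant history; the `ChatSession` class is illustrative, not the extension's actual implementation:

```python
# Minimal sketch of multi-round dialogue state: each turn appends
# the user message and the model's reply, so later questions carry
# the earlier context (illustrative, not the extension's own code).

class ChatSession:
    def __init__(self):
        self.history = []  # alternating user/assistant messages

    def ask(self, text, reply_fn):
        self.history.append({"role": "user", "content": [{"text": text}]})
        reply = reply_fn(self.history)  # the model sees the full history
        self.history.append({"role": "assistant", "content": [{"text": reply}]})
        return reply

# Stub "model" that counts turns, standing in for the real API call.
session = ChatSession()
session.ask("What is in the image?", lambda h: f"reply #{(len(h) + 1) // 2}")
session.ask("And what color is it?", lambda h: f"reply #{(len(h) + 1) // 2}")
print(len(session.history))  # 4 messages: two user turns, two replies
```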

Image and Text Processing

The extension can read local images and process them to extract text or provide detailed descriptions. This is particularly useful for tasks like document analysis or detailed image descriptions.
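Reading a local image for the API might look like the sketch below; whether the extension sends a file:// URI or inlines the bytes as base64 is an assumption here, shown only to make the "read local images" step concrete:

```python
# Two common ways to hand a local image to a vision API: a file://
# URI, or base64-inlined bytes. Which one the extension actually
# uses is an assumption for illustration.
import base64
import os
import tempfile

def to_file_uri(path):
    """Turn a local path into a file:// URI."""
    return "file://" + os.path.abspath(path)

def to_base64(path):
    """Inline the image bytes as base64, an alternative transport."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")

# Demo with a tiny throwaway file standing in for a real image:
with tempfile.NamedTemporaryFile(suffix=".png", delete=False) as f:
    f.write(b"\x89PNG")
    path = f.name
print(to_file_uri(path).startswith("file://"))  # True
os.remove(path)
```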

ComfyUI-Qwen-VL-API Models

Qwen-VL-Plus

  • Description: Enhanced visual language model with improved detail and text recognition.
  • Use Case: Ideal for tasks requiring high-resolution image analysis and detailed text extraction.

Qwen-VL-Max

  • Description: Larger-scale model with superior visual reasoning and instruction-following capabilities.
  • Use Case: Best for complex visual tasks and scenarios requiring high cognitive understanding.

Troubleshooting ComfyUI-Qwen-VL-API

Common Issues and Solutions

  1. API Key Issues:
  • Problem: API key not working.
  • Solution: Ensure you have applied for an API key from QWen-VL API Application and added it to the config.json file.
  2. Image Not Loading:
  • Problem: Local images not being processed.
  • Solution: Check that the image path is correct and that the image format is supported.
  3. Model Selection:
  • Problem: Incorrect model being used.
  • Solution: Ensure the model_name parameter is set correctly to either Qwen-VL-Plus or Qwen-VL-Max.
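The API-key fix above can be sketched as follows; the field name `api_key` and the config layout are assumptions for illustration, and the extension's actual config.json may use different names:

```python
# Hedged sketch of reading an API key from config.json. The field
# name "api_key" is an assumption; check the extension's own
# config.json for the exact layout.
import json
import os
import tempfile

def load_api_key(path):
    """Return the API key stored in a JSON config file, or '' if absent."""
    with open(path, encoding="utf-8") as f:
        return json.load(f).get("api_key", "")

# Demo with a throwaway config file:
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"api_key": "sk-example"}, f)
    cfg_path = f.name
print(load_api_key(cfg_path))  # sk-example
os.remove(cfg_path)
```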

Frequently Asked Questions

  • How do I switch between models? Set the model_name parameter in the node settings to either Qwen-VL-Plus or Qwen-VL-Max.

  • Where are the images stored? For QWenVL_Zho, images are temporarily stored and automatically deleted after processing. For QWenVL_Chat_Zho, images are stored in the /custom_nodes/ComfyUI-Qwen-VL-API/qw folder and can be cleared manually.


© Copyright 2024 RunComfy. All Rights Reserved.
