Visit ComfyUI Online for ready-to-use ComfyUI environment
VLM_nodes offers custom nodes for Vision Language Models (VLM) and Large Language Models (LLM), enabling image captioning, automatic prompt generation, creative and consistent prompt suggestions, and keyword extraction.
ComfyUI_VLM_nodes is an extension designed to enhance the capabilities of AI artists by integrating Vision Language Models (VLMs) into the ComfyUI framework. This extension allows you to load and use various VLMs, enabling advanced functionalities such as structured output generation, image-to-music conversion, and automatic prompt generation. By leveraging models like LLaVa, ChatMusician, and InternLM-XComposer2-VL, ComfyUI_VLM_nodes provides a powerful toolset for creating and manipulating AI-generated content, making it easier for artists to achieve their creative goals.
ComfyUI_VLM_nodes operates by integrating VLMs into the ComfyUI environment using the llama-cpp-python
library. This integration allows the extension to load and utilize models in GGUF format, which are specifically designed for vision-language tasks. The extension works by downloading the necessary model files and clip projectors, placing them in the appropriate directories, and then using these models to process and generate content based on user inputs. The structured output node, for example, can extract entities, numbers, and classify prompts, while the image-to-music feature uses VLMs and LLMs to create music from images.
The Structured Output node simplifies the process of obtaining reliable answers from VLMs. It can extract entities, numbers, classify prompts, and generate specific prompts. You can customize the output by adding descriptions to fields and selecting the attributes you want to return.
This feature uses VLMs, LLMs, and AudioLDM-2 to create music from images. The SaveAudioNode allows you to save the generated music in the output
folder. The necessary files are automatically downloaded into the models/LLavacheckpoints/files_for_audioldm2
directory.
Utilizes Chat Musician, an open-source LLM with intrinsic musical abilities, to generate music from text prompts. You can try prompts from the ChatMusician Demo Page. Recommended GGUF files are ChatMusician.Q5_K_M.gguf
or ChatMusician.Q5_K_S.gguf
.
This node integrates the InternLM-XComposer2-VL Model using AutoGPTQ
. It automatically downloads the necessary files into the models/LLavacheckpoints/files_for_internlm
directory. This model is known for its excellent visual perception capabilities.
models/LLavacheckpoints
directory.models/LLavacheckpoints/files_for_audioldm2
directory.temperature
setting in the prompt generation nodes. Higher temperatures result in more creative outputs.For additional resources, tutorials, and community support, you can visit the following links:
© Copyright 2024 RunComfy. All Rights Reserved.