Generate text predictions from images and prompts using an advanced vision-language model, useful for AI artists creating interactive art.
The Moondream2model node generates text predictions from an input image and a textual prompt. It leverages an advanced vision-language model to interpret the content of an image and return a relevant textual response, making it a powerful tool for AI artists who want to create interactive and context-aware art pieces. By combining image analysis with natural language processing, the node can produce insightful and creative text outputs that enhance the storytelling and descriptive aspects of visual art. It is particularly useful for generating captions, descriptions, or narrative elements that are directly influenced by the visual content provided.
The image parameter expects an image input that the model will analyze to generate text predictions. The image is typically supplied as a tensor and converted to a PIL Image internally before the model processes it. The quality and content of the image significantly affect the relevance and accuracy of the generated text. There are no specific minimum or maximum values for this parameter, but the image should be clear and relevant to the desired output.
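As an illustration of what that internal conversion typically looks like, the sketch below turns a ComfyUI-style image tensor (batch, height, width, channels, float values in 0..1) into a PIL Image. The helper name and the assumed tensor layout follow common ComfyUI conventions and are not taken from this node's actual code.

```python
import numpy as np
import torch
from PIL import Image

def tensor_to_pil(image: torch.Tensor) -> Image.Image:
    """Convert a ComfyUI image tensor (B, H, W, C, floats in [0, 1]) to a PIL Image.

    Hypothetical helper: the node performs a conversion like this internally,
    but its exact implementation may differ.
    """
    # Take the first image in the batch and move it to the CPU.
    array = image[0].detach().cpu().numpy()
    # Scale from [0, 1] floats to 8-bit pixel values.
    array = np.clip(array * 255.0, 0, 255).astype(np.uint8)
    return Image.fromarray(array)
```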
The text_input parameter is a string that serves as a prompt or question for the model to answer based on the provided image. The input can span multiple lines and is used to guide the model toward contextually appropriate text. The default value is an empty string; if no text is provided, the model may generate a more generic response based on the image alone. The quality and specificity of the text input greatly influence the detail and relevance of the generated predictions.
The output parameter is a STRING that contains the text generated by the model from the input image and text prompt. This output is the result of the model's analysis and interpretation of the visual and textual inputs, providing a coherent and contextually relevant text response. The generated text can be used for various purposes, such as captions, descriptions, or narrative elements in AI art projects.
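To make the relationship between the two inputs and the STRING output concrete, here is a minimal sketch of how an image and a prompt could be passed through the moondream2 model to produce the returned text. The model identifier, the fallback prompt, and the encode_image/answer_question calls follow the model's published usage examples and are assumptions about this node's internals, not its confirmed implementation.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from PIL import Image

# Hypothetical sketch: identifiers and method names follow the public
# moondream2 examples and may differ from the node's actual code.
MODEL_ID = "vikhyatk/moondream2"
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

def predict(image: Image.Image, text_input: str = "") -> str:
    """Return a text prediction for the image, guided by the optional prompt."""
    # An empty text_input falls back to a generic description request.
    prompt = text_input or "Describe this image."
    image_embeds = model.encode_image(image)
    return model.answer_question(image_embeds, prompt, tokenizer)

# Example usage:
# caption = predict(Image.open("artwork.png"), "What mood does this scene convey?")
```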