ComfyUI Node: Generate Text with Molmo

Class Name: MolmoGenerateText
Category: PixtralLlamaVision/Molmo
Author: SeanScripts (Account age: 1805 days)
Extension: ComfyUI-PixtralLlamaMolmoVision
Last Updated: 2025-01-31
GitHub Stars: 0.07K


Table of Contents

Description
MolmoGenerateText:
MolmoGenerateText Input Parameters:
Related Nodes

How to Install ComfyUI-PixtralLlamaMolmoVision

Install this extension via the ComfyUI Manager by searching for ComfyUI-PixtralLlamaMolmoVision:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-PixtralLlamaMolmoVision in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Generate Text with Molmo Description

Generates text with the Molmo model, using the supplied images as visual context for the prompt.

MolmoGenerateText:

MolmoGenerateText generates text with the Molmo model, which processes both visual and textual inputs. You supply a series of images together with a text prompt, and the model produces text grounded in the visual content. This makes the node a good fit for tasks that depend on both image and text understanding, such as describing images or writing narratives informed by visual elements.

MolmoGenerateText Input Parameters:

molmo_model

The Molmo vision model to use for text generation. The model you choose determines how well the images and prompt are interpreted.

images

A list of images that the model will use as input. The number of images should match the number of [IMG] tokens in the prompt. These images provide the visual context necessary for generating relevant text.

system_prompt

A string that serves as an initial prompt or context for the model. It can be multiline and is used to set the stage for the text generation process. The default value is an empty string.

prompt

This is the main textual input that guides the text generation. It should include [IMG] tokens corresponding to the images provided. The default prompt is "Describe this image."
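The rule that the number of images should match the number of [IMG] tokens can be checked before running a workflow. The following is a minimal sketch, not part of the extension: the helper name `image_token_mismatch` is hypothetical, and whether the node itself enforces this rule is not stated here.

```python
IMG_TOKEN = "[IMG]"  # placeholder token named in the parameter descriptions


def image_token_mismatch(prompt: str, num_images: int) -> int:
    """Return (number of [IMG] tokens in prompt) - (number of images).

    0 means the prompt and the image list agree; a positive value means
    the prompt has extra tokens, a negative value means extra images.
    """
    return prompt.count(IMG_TOKEN) - num_images
```

A result other than 0 suggests the prompt or the image list needs adjusting before the node is run.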

max_new_tokens

An integer that sets the maximum number of new tokens the model can generate. This controls the length of the generated text, with a default of 256 and a range from 1 to 4096.

do_sample

A boolean that determines whether sampling is used during text generation. When set to true, the model will generate more diverse outputs. The default value is true.
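To illustrate what the flag changes, here is a small sketch of greedy selection versus sampling over a toy next-token distribution. It is an assumption-labeled illustration of the general technique, not the node's code: `pick_token` and the toy probabilities are hypothetical.

```python
import random


def pick_token(probs: dict[str, float], do_sample: bool, rng: random.Random) -> str:
    """Choose the next token from a probability table.

    With do_sample=False the most probable token is always returned (greedy).
    With do_sample=True a token is drawn at random, weighted by probability.
    """
    if not do_sample:
        return max(probs, key=probs.get)
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]
```

With sampling disabled, the same input always yields the same token; with sampling enabled, repeated runs can differ, which is where the extra diversity comes from.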

temperature

A float that influences the randomness of the text generation. Higher values result in more random outputs, while lower values make the output more deterministic. The default is 0.3, with a minimum of 0.
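The effect of temperature can be seen by dividing the model's raw scores (logits) by the temperature before applying softmax. The sketch below shows that standard technique on toy numbers. The function name is hypothetical, and treating a temperature of 0 as "always pick the top token" is an assumption made here to avoid division by zero, not documented behavior of the node.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature < 0:
        raise ValueError("temperature must be >= 0")
    if temperature == 0:
        # Assumption: treat 0 as the greedy limit (all mass on the top logit).
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]
```

At a low temperature such as the default 0.3, most of the probability mass concentrates on the highest-scoring token; at higher temperatures the distribution flattens.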

top_p

A float that sets the cumulative probability threshold for token selection. It helps in controlling the diversity of the generated text. The default value is 0.9, with a range from 0.0 to 1.0.

top_k

An integer that limits the number of highest probability tokens to consider during generation. This parameter helps in focusing the output. The default is 40, with a minimum of 1.
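top_k and top_p both narrow the set of candidate tokens before one is drawn. The sketch below shows one common way to combine them (keep the top_k most probable tokens, renormalize, then keep the smallest prefix whose cumulative probability reaches top_p). The function name and toy values are hypothetical, and the exact order the underlying library applies these filters in is an assumption.

```python
def filter_top_k_top_p(probs: dict[str, float], top_k: int, top_p: float) -> dict[str, float]:
    """Return a renormalized distribution restricted by top_k, then top_p."""
    # Keep only the top_k most probable tokens.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    ranked = [(token, p / total) for token, p in ranked]

    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break

    kept_total = sum(p for _, p in kept)
    return {token: p / kept_total for token, p in kept}
```

Lower values of either parameter shrink the candidate set and make the output more focused; higher values allow less likely tokens through.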

stop_strings

A string that specifies the stopping criteria for text generation. The model will stop generating text when it encounters this string. The default is `
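As a rough illustration of a stop string, the sketch below cuts generated text at the first occurrence of any stop string. The helper `truncate_at_stop` and the example marker `<END>` are hypothetical. How the node parses this field, and whether it keeps or strips the stop string in its output, are not stated here.

```python
def truncate_at_stop(text: str, stop_strings: list[str]) -> str:
    """Return text up to (not including) the earliest stop string found."""
    cut = len(text)
    for stop in stop_strings:
        if not stop:
            continue  # ignore empty entries, which would match everywhere
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]
```

If none of the stop strings appears, the text is returned unchanged and generation would instead end at the max_new_tokens limit.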

Generate Text with Molmo Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-PixtralLlamaMolmoVision
