
ComfyUI Node: Florence2Run

Class Name: Florence2Run
Category: Florence2
Author: kijai (Account age: 2467 days)
Extension: ComfyUI-Florence2
Latest Updated: 2025-03-23
Github Stars: 1.11K


Table of Contents

Description
Florence2Run:
Florence2Run Input Parameters:
Florence2Run Output Parameters:
Florence2Run Usage Tips:
Florence2Run Common Errors and Solutions:
Related Nodes

How to Install ComfyUI-Florence2

Install this extension via the ComfyUI Manager by searching for ComfyUI-Florence2:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-Florence2 in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.


Florence2Run Description

A node that runs the Florence2 model on image and text inputs for object detection, captioning, region proposal, and OCR, with optional offloading of the model after each run.

Florence2Run:

Florence2Run processes image and text inputs with the Florence2 model. It supports tasks such as object detection, dense region captioning, region proposal, captioning, segmentation from referring expressions, and OCR (Optical Character Recognition). It is particularly useful for AI artists who want to generate detailed captions, segment images based on referring expressions, or extract text from images. The node loads the model when it runs and can offload it afterward to free memory, which helps when the same workflow also runs other large models.

Florence2Run Input Parameters:

image

The image parameter is the input image that you want to process using the Florence2 model. This image serves as the primary data on which various tasks like object detection, captioning, and segmentation will be performed. The quality and content of the image can significantly impact the results of the processing tasks.

text_input

The text_input parameter is used to provide additional textual information or prompts for specific tasks such as referring_expression_segmentation and caption_to_phrase_grounding. This input helps the model to focus on particular aspects of the image based on the provided text. If the task does not require text input, this parameter can be left empty.

florence2_model

The florence2_model parameter is the pre-loaded Florence2 model that will be used for processing the image and text inputs. This model includes both the neural network and the processor required for the tasks. It is essential to ensure that the correct model is loaded to achieve accurate results.

task

The task parameter specifies the type of processing you want to perform on the image. Options include region_caption, dense_region_caption, region_proposal, caption, detailed_caption, more_detailed_caption, caption_to_phrase_grounding, referring_expression_segmentation, ocr, and ocr_with_region. Each task has a specific purpose and will generate different types of outputs based on the input image and text.
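As a rough illustration of how a task name might translate into a Florence-2 prompt, the sketch below maps each option listed above to a task token. The tokens follow the prompt strings documented for Florence-2. The mapping of region_caption to `<OD>`, the way text is appended to the token, and the helper name `build_prompt` are assumptions for illustration, not the node's confirmed implementation.

```python
# Hypothetical mapping from the node's task names to Florence-2 task tokens.
TASK_PROMPTS = {
    "region_caption": "<OD>",  # assumption: handled as object detection
    "dense_region_caption": "<DENSE_REGION_CAPTION>",
    "region_proposal": "<REGION_PROPOSAL>",
    "caption": "<CAPTION>",
    "detailed_caption": "<DETAILED_CAPTION>",
    "more_detailed_caption": "<MORE_DETAILED_CAPTION>",
    "caption_to_phrase_grounding": "<CAPTION_TO_PHRASE_GROUNDING>",
    "referring_expression_segmentation": "<REFERRING_EXPRESSION_SEGMENTATION>",
    "ocr": "<OCR>",
    "ocr_with_region": "<OCR_WITH_REGION>",
}


def build_prompt(task: str, text_input: str = "") -> str:
    """Return the task token, followed by the user text when one is given."""
    if task not in TASK_PROMPTS:
        raise KeyError(f"Unknown task: {task}")
    token = TASK_PROMPTS[task]
    # Assumption: the text is appended directly after the token.
    return f"{token}{text_input}" if text_input else token


print(build_prompt("caption"))
print(build_prompt("caption_to_phrase_grounding", "a red car"))
```

Keeping the mapping in a single dictionary makes it easy to see which task names are valid and to reject anything else early.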

fill_mask

The fill_mask parameter is a boolean flag that indicates whether to fill the mask for segmentation tasks. When set to True, the model will generate a filled mask for the segmented regions in the image. This is particularly useful for tasks like referring_expression_segmentation.
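The difference between a filled and an unfilled mask can be pictured with a small sketch. It uses a rectangular region for simplicity, whereas segmentation output would normally be an arbitrary polygon. The function name `box_mask` and the grid representation are hypothetical and are not taken from the node's code.

```python
def box_mask(width, height, box, fill=True):
    """Return a height x width grid of 0/1 with `box` drawn filled or outlined.

    `box` is (x1, y1, x2, y2) in inclusive pixel coordinates.
    """
    x1, y1, x2, y2 = box
    mask = [[0] * width for _ in range(height)]
    # Clamp the loop bounds so boxes that overrun the image are cropped.
    for y in range(max(y1, 0), min(y2, height - 1) + 1):
        for x in range(max(x1, 0), min(x2, width - 1) + 1):
            on_edge = x in (x1, x2) or y in (y1, y2)
            if fill or on_edge:
                mask[y][x] = 1
    return mask


filled = box_mask(5, 5, (1, 1, 3, 3), fill=True)
outline = box_mask(5, 5, (1, 1, 3, 3), fill=False)
for row in filled:
    print(row)
```

A filled mask marks every pixel inside the region, which is usually what downstream inpainting or compositing nodes expect. An outline marks only the boundary.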

keep_model_loaded

The keep_model_loaded parameter is a boolean flag that determines whether the Florence2 model should remain loaded in memory after processing. Setting this to True can save time if you plan to run multiple tasks sequentially, but it will consume more memory. The default value is False.
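The trade-off behind this flag can be sketched as a load, run, offload pattern. The example below uses a stand-in `DummyModel` so it runs without a GPU. The names `run_task`, `device`, and `offload_device` are hypothetical, and the node's actual device handling may differ.

```python
class DummyModel:
    """Stand-in for a real model; it only tracks which device it is on."""

    def __init__(self) -> None:
        self.device = "cpu"

    def to(self, device: str) -> "DummyModel":
        self.device = device
        return self


def run_task(model, task_fn, keep_model_loaded=False,
             device="cuda", offload_device="cpu"):
    """Move the model to `device`, run `task_fn`, then optionally offload."""
    model.to(device)
    try:
        return task_fn(model)
    finally:
        # Offload even if the task raised, unless the caller wants it kept.
        if not keep_model_loaded:
            print("Offloading model...")
            model.to(offload_device)


model = DummyModel()
run_task(model, lambda m: m.device)  # model ends up back on "cpu"
run_task(model, lambda m: m.device, keep_model_loaded=True)  # stays on "cuda"
```

With the flag left at False, each run pays the cost of moving the model again. With it set to True, that cost is paid once and the memory stays occupied between runs.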

num_beams

The num_beams parameter controls the number of beams used in beam search when generating text. A higher number of beams explores more candidate sequences and can improve caption quality, but it increases computation time and memory use. The default value is 3.

max_new_tokens

The max_new_tokens parameter specifies the maximum number of new tokens to generate for tasks involving text generation, such as captioning. This limits the length of the generated text and helps control the output size. The default value is 1024.

do_sample

The do_sample parameter is a boolean flag that determines whether to use sampling for text generation. When set to True, the model will generate text by sampling from the probability distribution, leading to more diverse outputs. The default value is True.
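The three generation parameters above share their names with keyword arguments of the Hugging Face `generate` method. A minimal sketch of collecting them, using the defaults stated in this section, might look like the following. The helper name `generation_kwargs` is hypothetical, and whether the node forwards these values unchanged is an assumption.

```python
def generation_kwargs(num_beams=3, max_new_tokens=1024, do_sample=True):
    """Validate and bundle the text-generation settings into one dict."""
    if num_beams < 1:
        raise ValueError("num_beams must be at least 1")
    if max_new_tokens < 1:
        raise ValueError("max_new_tokens must be at least 1")
    return {
        "max_new_tokens": max_new_tokens,
        "num_beams": num_beams,
        "do_sample": bool(do_sample),
    }


kwargs = generation_kwargs(num_beams=5, do_sample=False)
print(kwargs)
# Assumed usage with a loaded model: model.generate(**inputs, **kwargs)
```

Setting do_sample to False with several beams gives repeatable output for the same image, which can help when comparing settings.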

Florence2Run Output Parameters:

out_tensor

The out_tensor parameter is the primary output tensor containing the processed image data. This tensor includes the results of the specified task, such as detected objects, generated captions, or segmented regions. It is essential for further analysis or visualization of the results.

out_mask_tensor

The out_mask_tensor parameter is the output tensor containing the mask data for segmentation tasks. This tensor provides the segmented regions of the image, which can be used for detailed analysis or further processing. If no masks are generated, this tensor will contain a default value.

out_results

The out_results parameter is a list of additional results generated by the Florence2 model. This can include various metadata, such as confidence scores, bounding boxes, or textual descriptions, depending on the specified task. These results provide valuable insights into the processed image and text data.
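For detection-style tasks, Florence-2's post-processing returns a dictionary keyed by the task token, with parallel `bboxes` and `labels` lists. The sketch below pairs them up for easier downstream use. The sample data is invented for illustration, the helper name `pair_detections` is hypothetical, and it is an assumption that out_results carries this exact structure.

```python
def pair_detections(result: dict, task_token: str = "<OD>") -> list:
    """Pair each label with its bounding box (x1, y1, x2, y2)."""
    entry = result.get(task_token, {})
    bboxes = entry.get("bboxes", [])
    labels = entry.get("labels", [])
    return [
        {"label": label, "bbox": tuple(bbox)}
        for label, bbox in zip(labels, bboxes)
    ]


# Invented sample in the assumed layout, not real model output.
sample = {
    "<OD>": {
        "bboxes": [[10.0, 20.0, 110.0, 220.0], [50.0, 60.0, 90.0, 120.0]],
        "labels": ["car", "wheel"],
    }
}
for detection in pair_detections(sample):
    print(detection)
```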

Florence2Run Usage Tips:

Ensure that the input image is of high quality and relevant to the task to achieve the best results.
Use the text_input parameter effectively for tasks like referring_expression_segmentation to guide the model's focus.
Experiment with the num_beams and max_new_tokens parameters to balance computational load against output quality.
Set keep_model_loaded to True if you plan to run multiple tasks sequentially to save loading time.

Florence2Run Common Errors and Solutions:

ValueError: Text input (prompt) is only supported for 'referring_expression_segmentation' and 'caption_to_phrase_grounding'

Explanation: This error occurs when text_input is provided for a task that does not support it.
Solution: Ensure that text_input is only provided for the supported tasks or leave it empty for other tasks.
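A check of this kind can be reproduced before queuing a workflow. The sketch below raises the same error message for tasks that do not accept text. The set name `TEXT_TASKS` and the function name `check_text_input` are hypothetical, and the node's own check may be written differently.

```python
# Tasks that accept a text prompt, according to the error message above.
TEXT_TASKS = {"referring_expression_segmentation", "caption_to_phrase_grounding"}


def check_text_input(task: str, text_input: str) -> None:
    """Raise ValueError when text is supplied for a task that does not use it."""
    # Whitespace-only input is treated as empty (an assumption of this sketch).
    if text_input.strip() and task not in TEXT_TASKS:
        raise ValueError(
            "Text input (prompt) is only supported for "
            "'referring_expression_segmentation' and 'caption_to_phrase_grounding'"
        )


check_text_input("caption", "")  # passes: no text supplied
check_text_input("caption_to_phrase_grounding", "a dog")  # passes: text allowed
```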

Offloading model...

Explanation: This message indicates that the model is being offloaded from memory to manage resources.
Solution: This is an informational message and does not require any action. If you want to keep the model loaded, set keep_model_loaded to True.

Downloading Lumina model to: `<model_path>`

Explanation: This message indicates that the model is being downloaded because it is not found locally.
Solution: Wait for the download to complete. Ensure you have a stable internet connection and sufficient disk space.

using `<attention>` for attention

Explanation: This message indicates the type of attention mechanism being used by the model.
Solution: This is an informational message and does not require any action.

Florence2Run Related Nodes

Go back to the extension to check out more related nodes.

ComfyUI-Florence2
