ComfyUI Node: OneVision Caption Folder

Class Name: OneVisionCaptionFolder
Category: LLaVA-OneVision
Author: kijai (Account age: 2297 days)
Extension: ComfyUI Llava-OneVision
Last Updated: 2024-08-25
GitHub Stars: 0.08K

How to Install ComfyUI Llava-OneVision

Install this extension through the ComfyUI Manager by searching for ComfyUI Llava-OneVision:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI Llava-OneVision in the search bar.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

OneVision Caption Folder Description

Automates detailed image caption generation with the LLaVA-OneVision model, so AI artists can annotate large image batches efficiently.

OneVision Caption Folder:

The OneVisionCaptionFolder node generates detailed captions for the images stored in a specified folder. Using the LLaVA-OneVision model, it processes each image, applies the necessary transformations, and produces a descriptive caption suitable for image model training. The captions focus on the composition, style, actions, and background of the image, without making assumptions or telling a story. The node is particularly useful for AI artists who need to annotate large batches of images efficiently with consistent, high-quality descriptions.

OneVision Caption Folder Input Parameters:

llava_model

This parameter specifies the LLaVA-OneVision model to be used for generating captions. The model is responsible for interpreting the images and creating detailed descriptions.

folder_path

This is the path to the folder containing the images you want to caption. The node will process all images within this folder that have the extensions .png, .jpg, or .jpeg.
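The file selection described above can be sketched in a few lines. This is an illustration only, not the node's actual code: `list_caption_candidates` is a hypothetical helper, and the case-insensitive extension match and non-recursive scan are assumptions.

```python
from pathlib import Path

# Extensions the parameter description says are processed.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def list_caption_candidates(folder_path):
    """Return sorted image paths in folder_path (hypothetical helper).

    Assumptions: the scan is non-recursive and extensions are matched
    case-insensitively; the node itself may behave differently.
    """
    folder = Path(folder_path)
    if not folder.is_dir():
        raise NotADirectoryError(f"Not a folder: {folder_path}")
    return sorted(
        p for p in folder.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )
```

Running a check like this before queueing the workflow shows which files would be picked up and which (for example .webp or .txt files) would be ignored.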

prompt

The prompt is a string that guides the captioning process. It provides context to the model on how to generate the captions. The default prompt is designed to produce detailed and useful descriptions for image model training.

max_tokens

This integer parameter sets the maximum number of tokens (words or word pieces) in each generated caption. The default value is 512, with a minimum of 1 and a maximum of 8192. The value is an upper bound: lower values cut captions shorter, while higher values leave room for more detail.

keep_model_loaded

A boolean parameter that determines whether the LLaVA-OneVision model should remain loaded in memory after processing. The default value is True, which can speed up subsequent operations but uses more memory.

temperature

This float parameter controls the randomness of the caption generation. A lower value (closer to 0) makes the output more deterministic, while a higher value (up to 1.0) introduces more variability. The default value is 0.2.
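To see why a low temperature makes output more deterministic, the sketch below applies temperature scaling to a made-up set of token scores. It shows the general mechanism used in sampling-based text generation, not this node's internal code; the function name and the example scores are assumptions.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive in this sketch")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
low = softmax_with_temperature(logits, 0.2)   # the node's default temperature
high = softmax_with_temperature(logits, 1.0)
print(low)
print(high)
```

At 0.2 nearly all probability lands on the top-scoring token, while at 1.0 the alternatives keep a meaningful share, which is where the extra variability comes from.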

seed

An integer used to seed the random number generator for reproducibility. The default value is 1, with a minimum of 1 and a maximum of 0xffffffffffffffff.
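The effect of a fixed seed can be illustrated with Python's standard `random` module. How the node applies the seed internally is not documented here, so treat this as an analogy; `sample_indices` and the weights are hypothetical.

```python
import random


def sample_indices(weights, seed, count=5):
    """Draw `count` weighted picks using a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, independent of global state
    return rng.choices(range(len(weights)), weights=weights, k=count)


weights = [0.7, 0.2, 0.1]  # made-up probabilities for three options
first = sample_indices(weights, seed=1)
second = sample_indices(weights, seed=1)
print(first == second)  # the same seed reproduces the same sequence of picks
```

Keeping the seed fixed while changing one other setting at a time makes it easier to attribute a change in the captions to that setting.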

max_image_size

This integer parameter sets the maximum size (in pixels) for the longest dimension of the images. Images larger than this size will be resized. The default value is 1024, with a minimum of 256 and a maximum of 8192.
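The resize rule can be sketched as a small calculation. The parameter description does not say how the resize is done, so preserving the aspect ratio and rounding to whole pixels are assumptions, and `fit_longest_side` is a hypothetical helper.

```python
def fit_longest_side(width, height, max_image_size=1024):
    """Return (width, height) scaled so the longest side is at most max_image_size.

    Assumption: the aspect ratio is preserved and smaller images are left as-is.
    """
    longest = max(width, height)
    if longest <= max_image_size:
        return width, height  # already within the limit
    scale = max_image_size / longest
    # Round to whole pixels and never drop below 1 px.
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_longest_side(4000, 3000))       # landscape image above the limit
print(fit_longest_side(800, 600))         # image below the limit
print(fit_longest_side(1000, 2000, 512))  # portrait image, custom limit
```

Lower limits reduce memory use and processing time per image at the cost of fine detail the model can see.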

prefix

A string that will be added at the beginning of each generated caption. This can be used to add consistent introductory text to all captions.

suffix

A string that will be added at the end of each generated caption. This can be used to add consistent concluding text to all captions.
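A minimal sketch of how prefix and suffix might wrap each caption follows. Plain concatenation and trimming of surrounding whitespace are assumptions, and `wrap_caption` is a hypothetical helper rather than part of the extension.

```python
def wrap_caption(caption, prefix="", suffix=""):
    """Attach prefix and suffix to one caption.

    Assumption: the pieces are joined directly, so any separating
    space or comma must be part of the prefix or suffix itself.
    """
    return f"{prefix}{caption.strip()}{suffix}"


captions = ["A red fox standing in snow. ", "A sailboat at dusk."]
wrapped = [wrap_caption(c, prefix="photo, ", suffix=" high quality") for c in captions]
for line in wrapped:
    print(line)
```

A common use is a trigger word as the prefix, for example a style or subject token that every training caption should start with.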

OneVision Caption Folder Output Parameters:

STRING

The output is a list of strings, where each string is a caption generated for an image in the specified folder. These captions provide detailed descriptions of the images, focusing on composition, style, actions, and background elements.

OneVision Caption Folder Usage Tips:

  • Ensure that the folder path is correctly specified and contains only the images you want to caption.
  • Adjust the max_tokens parameter to control the length and detail of the captions based on your needs.
  • Use the prefix and suffix parameters to add consistent text to all captions, which can be useful for specific annotation requirements.
  • Set keep_model_loaded to True if you plan to caption multiple folders in succession to save time on model loading.

OneVision Caption Folder Common Errors and Solutions:

Cannot open image: <image_path>

  • Explanation: This error occurs when the node is unable to open an image file, possibly due to file corruption or unsupported format.
  • Solution: Verify that the image files in the folder are not corrupted and are in supported formats (.png, .jpg, .jpeg).
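One way to find problem files before running the node is to compare each file's leading bytes with the PNG and JPEG signatures. This is a standalone sketch using only the standard library; the helper names are hypothetical, and passing this check does not guarantee that a file decodes fully.

```python
from pathlib import Path

# Leading bytes ("magic numbers") of the two accepted formats.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
JPEG_SIGNATURE = b"\xff\xd8\xff"


def looks_like_supported_image(path):
    """Cheap pre-check: does the file start with a PNG or JPEG signature?"""
    with open(path, "rb") as f:
        header = f.read(8)
    return header.startswith(PNG_SIGNATURE) or header.startswith(JPEG_SIGNATURE)


def find_suspect_files(folder_path):
    """Return files with an image extension whose header does not match."""
    suspects = []
    for p in sorted(Path(folder_path).iterdir()):
        if p.is_file() and p.suffix.lower() in {".png", ".jpg", ".jpeg"}:
            if not looks_like_supported_image(p):
                suspects.append(p)
    return suspects
```

Files reported by `find_suspect_files` are often renamed files of another format (such as .webp saved as .jpg) or truncated downloads; re-exporting or removing them should clear the error.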

Model loading failed

  • Explanation: This error indicates that the LLaVA-OneVision model could not be loaded, which might be due to an incorrect model path or insufficient system resources such as memory.
  • Solution: Ensure that the model path is correct and that your system has enough resources to load the model.

Image size exceeds maximum allowed

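  • Explanation: This error occurs when an image exceeds the specified max_image_size. Because the max_image_size description states that larger images are resized, this error should be uncommon.
  • Solution: Increase the max_image_size parameter or resize the images manually to fit within the specified dimensions.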

Insufficient tokens for caption generation

  • Explanation: This error indicates that the max_tokens parameter is set too low to generate a meaningful caption.
  • Solution: Increase the max_tokens parameter to allow for more detailed captions.

© Copyright 2024 RunComfy. All Rights Reserved.