Automates detailed image caption generation using the LLaVA-OneVision model, helping AI artists annotate large image batches efficiently.
The OneVisionCaptionFolder node automates the process of generating detailed captions for images stored in a specified folder. Using the LLaVA-OneVision model, it processes each image, applies the necessary transformations, and generates descriptive captions suited to image model training. The captions describe the composition, style, and actions within the image, as well as the background, without making assumptions or telling a story. This node is particularly useful for AI artists who need to annotate large batches of images efficiently, ensuring high-quality and consistent descriptions.
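At a high level, the node's behavior resembles the sketch below: iterate over the supported image files in the folder, load each one, and ask the vision-language model for a caption. This is an illustrative outline only; `model.caption()` is a hypothetical stand-in for the actual LLaVA-OneVision inference call, not the node's real code.

```python
import os

from PIL import Image

IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")

def caption_folder(model, folder_path, prompt, max_tokens=512):
    """Caption every supported image in folder_path (illustrative sketch)."""
    captions = []
    for name in sorted(os.listdir(folder_path)):
        if not name.lower().endswith(IMAGE_EXTENSIONS):
            continue  # skip non-image files
        image = Image.open(os.path.join(folder_path, name)).convert("RGB")
        # Hypothetical call standing in for LLaVA-OneVision inference.
        captions.append(model.caption(image, prompt=prompt, max_tokens=max_tokens))
    return captions
```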
This parameter specifies the LLaVA-OneVision model to be used for generating captions. The model is responsible for interpreting the images and creating detailed descriptions.
This is the path to the folder containing the images you want to caption. The node will process all images within this folder that have the extensions .png, .jpg, or .jpeg.
The prompt is a string that guides the captioning process. It provides context to the model on how to generate the captions. The default prompt is designed to produce detailed and useful descriptions for image model training.
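The exact default prompt is not reproduced here, but a prompt in the same spirit, matching the focus described above, might look like this (illustrative only, not the node's actual default text):

```python
# Illustrative prompt in the spirit of the default; not the node's exact text.
prompt = (
    "Describe this image in detail for training an image generation model. "
    "Focus on the composition, style, the subjects and their actions, and "
    "the background. Do not make assumptions or tell a story."
)
```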
This integer parameter sets the maximum number of tokens (words or word pieces) for each generated caption. The default value is 512, with a minimum of 1 and a maximum of 8192. Adjusting this value controls the length and detail of the captions.
A boolean parameter that determines whether the LLaVA-OneVision model should remain loaded in memory after processing. The default value is True, which can speed up subsequent operations but uses more memory.
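To illustrate why this saves time, a node can cache the loaded model in a module-level variable between runs. The sketch below shows that common pattern; it is not necessarily this node's exact implementation.

```python
_cached_model = None  # module-level cache shared across node executions

def get_model(load_fn, keep_model_loaded=True):
    """Return the captioning model, reusing a cached copy when allowed."""
    global _cached_model
    if _cached_model is None:
        _cached_model = load_fn()  # expensive: loads weights into memory
    model = _cached_model
    if not keep_model_loaded:
        _cached_model = None  # drop the reference so memory can be reclaimed later
    return model
```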
This float parameter controls the randomness of the caption generation. A lower value (closer to 0) makes the output more deterministic, while a higher value (up to 1.0) introduces more variability. The default value is 0.2.
An integer used to seed the random number generator for reproducibility. The default value is 1, with a minimum of 1 and a maximum of 0xffffffffffffffff (the largest 64-bit unsigned integer).
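For intuition about how temperature and seed interact, the sketch below wires both into a Hugging Face-style generate() call; the processor and model objects are assumptions used for illustration, not a description of the node's internals.

```python
import torch

def generate_caption(model, processor, image, prompt,
                     max_tokens=512, temperature=0.2, seed=1):
    """Hugging Face-style generation showing where temperature and seed apply."""
    torch.manual_seed(seed)  # fix the sampling RNG for reproducible captions
    inputs = processor(images=image, text=prompt, return_tensors="pt")
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_tokens,
        do_sample=temperature > 0,   # sampling introduces controlled randomness
        temperature=temperature,     # lower values -> more deterministic output
    )
    return processor.batch_decode(output_ids, skip_special_tokens=True)[0]
```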
This integer parameter sets the maximum size (in pixels) for the longest dimension of the images. Images larger than this size will be resized. The default value is 1024, with a minimum of 256 and a maximum of 8192.
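Resizing the longest dimension while preserving aspect ratio can be done as follows; this is a sketch of the kind of transformation implied here, not the node's exact code.

```python
from PIL import Image

def limit_longest_side(image: Image.Image, max_image_size: int = 1024) -> Image.Image:
    """Downscale so the longest dimension is at most max_image_size pixels."""
    longest = max(image.size)
    if longest <= max_image_size:
        return image  # already small enough; no resize needed
    scale = max_image_size / longest
    new_size = (round(image.width * scale), round(image.height * scale))
    return image.resize(new_size, Image.Resampling.LANCZOS)
```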
A string that will be added at the beginning of each generated caption. This can be used to add consistent introductory text to all captions.
A string that will be added at the end of each generated caption. This can be used to add consistent concluding text to all captions.
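The effect of prefix and suffix is straightforward concatenation around each caption, roughly as in this illustrative snippet (whether the node inserts extra spacing is not specified here):

```python
def apply_affixes(caption: str, prefix: str = "", suffix: str = "") -> str:
    """Attach the configured prefix and suffix around a generated caption."""
    return f"{prefix}{caption}{suffix}"

# Example result: "style: anime, a fox sitting in a snowy forest, masterpiece"
apply_affixes("a fox sitting in a snowy forest",
              prefix="style: anime, ", suffix=", masterpiece")
```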
The output is a list of strings, where each string is a caption generated for an image in the specified folder. These captions provide detailed descriptions of the images, focusing on composition, style, actions, and background elements.
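One common way to use this output downstream is to write each caption next to its image as a .txt sidecar file for dataset tooling. The sketch below assumes the captions are returned in the same order as the alphabetically sorted image files, which is an assumption, not a documented guarantee.

```python
import os

def write_caption_sidecars(folder_path, captions):
    """Pair each caption with its image and save it as a .txt sidecar file."""
    images = sorted(f for f in os.listdir(folder_path)
                    if f.lower().endswith((".png", ".jpg", ".jpeg")))
    for image_name, caption in zip(images, captions):
        txt_path = os.path.join(folder_path, os.path.splitext(image_name)[0] + ".txt")
        with open(txt_path, "w", encoding="utf-8") as fh:
            fh.write(caption)
```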
Usage Tips:
- Adjust the max_tokens parameter to control the length and detail of the captions based on your needs.
- Use the prefix and suffix parameters to add consistent text to all captions, which can be useful for specific annotation requirements.
- Set keep_model_loaded to True if you plan to caption multiple folders in succession to save time on model loading.

Common Errors and Solutions:
- Error processing <image_path>: the image could not be processed, typically because its dimensions exceed the configured max_image_size. Increase the max_image_size parameter or resize the images manually to fit within the specified dimensions.
- Empty or truncated captions: the max_tokens parameter is set too low to generate a meaningful caption. Increase the max_tokens parameter to allow for more detailed captions.