Generate detailed image captions using the GPT-4-vision-preview model.
The KepOpenAI_ImageWithPrompt node generates high-quality textual descriptions or captions for images using OpenAI's GPT-4-vision-preview model. It analyzes an image and produces a detailed, contextually relevant caption guided by a provided prompt. The node's primary benefit is its ability to create rich, descriptive text that highlights the most important aspects of an image, making it a valuable tool for AI artists who need captions, descriptions, or other textual content related to visual media. By combining image analysis with natural language processing, it streamlines the creative process and improves the quality of the generated content.
This parameter expects an image in the form of a tensor. The image serves as the primary visual input that the node will analyze to generate a caption. The quality and content of the image directly impact the relevance and accuracy of the generated text.
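As a sketch of how such a tensor input could be prepared for the OpenAI API: the helper below assumes the image arrives as a (batch, height, width, channels) float array with values in [0, 1], which is the common ComfyUI IMAGE layout. The function name and exact conversion steps are illustrative, not the node's actual source.

```python
import base64
import io

import numpy as np
from PIL import Image


def image_to_base64(image) -> str:
    """Encode the first image of a (batch, H, W, C) float array in [0, 1]
    as a base64 PNG string suitable for sending to a vision model."""
    # Take the first image in the batch and scale to 8-bit RGB.
    array = (np.asarray(image[0]) * 255.0).clip(0, 255).astype(np.uint8)
    pil_image = Image.fromarray(array)
    # Serialize to PNG in memory, then base64-encode the bytes.
    buffer = io.BytesIO()
    pil_image.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")
```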
This is a string parameter that allows you to provide a textual prompt to guide the caption generation process. The prompt can be multiline and should describe the desired focus of the caption. For example, you can instruct the model to emphasize certain aspects of the image or apply weights to specific words or phrases using the format (word or phrase:weight). The default prompt is: "Generate a high quality caption for the image. The most important aspects of the image should be described first. If needed, weights can be applied to the caption in the following format: '(word or phrase:weight)', where the weight should be a float less than 2."
This integer parameter specifies the maximum number of tokens (roughly, word fragments and punctuation) that the generated caption can contain. It ranges from 1 to 2048, with a default value of 77. Adjusting this value lets you control the length and detail of the generated text.
The output of this node is a string that contains the generated caption or description for the provided image. This text is crafted based on the input image and the provided prompt, aiming to deliver a high-quality and contextually relevant description that captures the essence of the image.
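To make the request shape concrete, the sketch below builds the keyword arguments such a node might pass to OpenAI's chat-completions endpoint, pairing the text prompt with the base64-encoded image and the max_tokens limit. The helper name and structure are assumptions for illustration; only the message format follows the documented OpenAI vision API.

```python
def build_caption_request(prompt: str, image_b64: str, max_tokens: int = 77) -> dict:
    """Build keyword arguments for a chat-completions call that pairs a
    text prompt with a base64-encoded PNG image (data-URL form)."""
    return {
        "model": "gpt-4-vision-preview",
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{image_b64}"},
                    },
                ],
            }
        ],
    }


# With the official openai client (>= 1.0), the caption would be retrieved as:
#   client = openai.OpenAI()
#   response = client.chat.completions.create(**build_caption_request(prompt, b64))
#   caption = response.choices[0].message.content
```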
Adjust the max_tokens parameter based on the level of detail you need in the caption. For shorter, more concise descriptions, use a lower value; for more detailed captions, increase it.