Install this extension via the ComfyUI Manager by searching
for ComfyUI-Gemini
1. Click the Manager button in the main menu
2. Select Custom Nodes Manager button
3. Enter ComfyUI-Gemini in the search bar
After installation, click the Restart button to
restart ComfyUI. Then, manually
refresh your browser to clear the cache and access
the updated list of nodes.
Visit
ComfyUI Online
for ready-to-use ComfyUI environment
ComfyUI-Gemini integrates Gemini-pro and Gemini-pro-vision into ComfyUI, enhancing its functionality with advanced features and improved user experience.
ComfyUI-Gemini Introduction
ComfyUI-Gemini is an extension that integrates Google Gemini models into ComfyUI, a user interface for AI-based applications. This extension allows you to generate prompts, describe images, and engage in conversations with the AI. It supports various types of media inputs, including text, images, audio, and video files. By using ComfyUI-Gemini, AI artists can enhance their creative workflows, automate repetitive tasks, and explore new artistic possibilities with the help of advanced AI models.
How ComfyUI-Gemini Works
ComfyUI-Gemini leverages the power of Google Gemini models to provide a seamless experience for AI artists. The extension works by connecting to the Gemini API, which processes the input data (text, images, audio, or video) and generates the desired output. For example, you can input a text prompt, and the model will generate a corresponding image or description. The extension supports multimodal interactions, meaning it can handle multiple types of media simultaneously, making it a versatile tool for various creative projects.
ComfyUI-Gemini Features
Main Features
System Instruction Support: Allows you to set specific instructions for the AI to follow, enhancing control over the generated content.
Multimodal and Multi-turn Dialogues: Supports conversations that involve multiple types of media and can continue over several turns, making interactions more natural and dynamic.
File Reading Capability: Can read and process various file types, including video and audio files up to 20GB.
High Token Limit: Supports input tokens up to 1,048,576, allowing for more complex and detailed prompts.
Rate Limiting: Currently, the API usage is limited to 2 requests per minute and 1000 requests per day.
Customization
Each feature can be customized to suit your specific needs. For instance, you can adjust the system instructions to guide the AI's behavior or choose different models based on the type of media you are working with. By experimenting with these settings, you can achieve different artistic effects and streamline your creative process.
ComfyUI-Gemini Models
ComfyUI-Gemini offers three main models, each designed for different types of tasks:
Gemini-pro: A text-based model ideal for generating text prompts and descriptions.
Genimi-pro-vision: A model that combines text and image processing, suitable for tasks that require both text and visual inputs.
Gemini 1.5 Pro: The most advanced model, supporting text, image, and various file types (audio, video, etc.). This model is perfect for complex, multimodal projects.
When to Use Each Model
Gemini-pro: Use this model when your project is primarily text-based, such as generating prompts or writing descriptions.
Genimi-pro-vision: Ideal for projects that require both text and images, such as creating visual art based on textual descriptions.
Gemini 1.5 Pro: Best for comprehensive projects that involve multiple types of media, offering the most flexibility and capability.
What's New with ComfyUI-Gemini
Version 3.0
New Gemini 1.5 Pro Model: Includes support for system instructions, multimodal interactions, and file uploads.
File Upload Feature: Now supports uploading single files (images, text, PDFs, audio), with future plans to support multiple file uploads.
Enhanced Workflow: New workflows that combine Gemini 1.5 Pro with Stable Diffusion and ComfyUI, providing an alternative to DALL·E 3.
Previous Updates
Version 2.1: Fixed a bug related to the deadline of 60.0s.
Version 2.0: Introduced context-aware chat nodes, effectively turning the AI into a chatbot.
Version 1.1: Improved API key handling by automatically adding it to the config.json file.
Troubleshooting ComfyUI-Gemini
Common Issues and Solutions
API Key Issues: Ensure your API key is correctly added to the config.json file or directly input into the node if using explicit nodes.
Connection Problems: Verify that you have a stable internet connection and can access Google Gemini services. Using platforms like Colab or Kaggle can help avoid connectivity issues.
Rate Limiting: Be mindful of the API rate limits (2 requests per minute, 1000 per day). Plan your usage accordingly to avoid hitting these limits.
Frequently Asked Questions
How do I get an API key?
You can apply for an API key here.
What types of files can I upload?
Currently, you can upload images, text files, PDFs, and audio files. Video support is planned for future updates.
Can I share my workflows?
Yes, but avoid sharing workflows that contain your API key to prevent unauthorized usage.
Learn More about ComfyUI-Gemini
For additional resources, tutorials, and community support, check out the following links:
Google AI Studio
These resources provide comprehensive guides and examples to help you get the most out of ComfyUI-Gemini. Whether you're a beginner or an experienced AI artist, you'll find valuable information to enhance your creative projects.