ComfyUI Extension: ComfyUI-Gemini

Repo Name: ComfyUI-Gemini
Author: ZHO-ZHO-ZHO (Account age: 624 days)
Nodes: 12
Latest Updated: 2024-05-22
GitHub Stars: 0.74K

Table of Contents

Description
How ComfyUI-Gemini Works
ComfyUI-Gemini Features
ComfyUI-Gemini Models
What's New with ComfyUI-Gemini
Troubleshooting ComfyUI-Gemini
Learn More about ComfyUI-Gemini
Related Nodes

How to Install ComfyUI-Gemini

Install this extension through the ComfyUI Manager by searching for ComfyUI-Gemini:

1. Click the Manager button in the main menu.
2. Select the Custom Nodes Manager button.
3. Enter ComfyUI-Gemini in the search bar.

After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

ComfyUI-Gemini Description

ComfyUI-Gemini integrates Gemini-pro and Gemini-pro-vision into ComfyUI, adding nodes for prompt generation, image description, and chat.

ComfyUI-Gemini Introduction

ComfyUI-Gemini is an extension that integrates Google Gemini models into ComfyUI, a node-based interface for building generative AI workflows. With it you can generate prompts, describe images, and hold conversations with the model. It accepts several kinds of input, including text, images, audio, and video files. AI artists can use it to draft and refine prompts, automate repetitive captioning or description tasks, and explore new ideas inside their existing workflows.

How ComfyUI-Gemini Works

ComfyUI-Gemini connects to the Gemini API, which processes the input data (text, images, audio, or video) and returns the model's response. For example, you can supply a text prompt or an image, and the model returns text such as an expanded prompt or an image description, which you can then pass to other nodes. The extension supports multimodal interactions, meaning a single request can combine several types of media.
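
The request flow can be sketched in Python. This is a minimal sketch under stated assumptions, not the extension's actual code: it assumes the google-generativeai package is installed, and build_parts and ask_gemini are hypothetical helper names introduced here for illustration.

```python
from typing import Any, List, Optional


def build_parts(prompt: str, image: Optional[Any] = None) -> List[Any]:
    """Assemble the request payload: the text prompt, then an optional image."""
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    parts: List[Any] = [prompt]
    if image is not None:
        parts.append(image)  # for example a PIL.Image.Image
    return parts


def ask_gemini(api_key: str, model_name: str, prompt: str,
               image: Optional[Any] = None) -> str:
    """Send the parts to the Gemini API and return the reply text."""
    # Imported lazily so build_parts can be used without the SDK installed.
    import google.generativeai as genai

    genai.configure(api_key=api_key)
    model = genai.GenerativeModel(model_name)
    response = model.generate_content(build_parts(prompt, image))
    return response.text
```

A call such as ask_gemini(key, "gemini-pro-vision", "Describe this image", image=img) would send one text part and one image part in a single request and return the reply text.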

ComfyUI-Gemini Features

Main Features

System Instruction Support: Allows you to set specific instructions for the AI to follow, enhancing control over the generated content.
Multimodal and Multi-turn Dialogues: Supports conversations that involve multiple types of media and can continue over several turns, making interactions more natural and dynamic.
File Reading Capability: Can read and process various file types, including video and audio files up to 20GB.
High Token Limit: Supports input tokens up to 1,048,576, allowing for more complex and detailed prompts.
Rate Limiting: Currently, the API usage is limited to 2 requests per minute and 1000 requests per day.
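
Because of the request limits listed above, a workflow that calls the API in a loop may want to throttle itself on the client side. The following is a minimal sketch of a sliding-window limiter using the 2 per minute and 1000 per day figures as defaults; SlidingWindowLimiter is a hypothetical helper and is not part of the extension.

```python
import time
from collections import deque
from typing import Callable, Deque

MINUTE = 60.0
DAY = 86400.0


class SlidingWindowLimiter:
    """Allow at most per_minute calls per 60 s and per_day calls per 24 h."""

    def __init__(self, per_minute: int = 2, per_day: int = 1000,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.per_minute = per_minute
        self.per_day = per_day
        self.clock = clock
        self.calls: Deque[float] = deque()  # timestamps from the last 24 h

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Drop timestamps that have left the 24-hour window.
        while self.calls and now - self.calls[0] >= DAY:
            self.calls.popleft()
        wait = 0.0
        if len(self.calls) >= self.per_day:
            wait = max(wait, self.calls[-self.per_day] + DAY - now)
        recent = [t for t in self.calls if now - t < MINUTE]
        if len(recent) >= self.per_minute:
            wait = max(wait, recent[-self.per_minute] + MINUTE - now)
        return wait

    def try_acquire(self) -> bool:
        """Record a call and return True only if it is allowed right now."""
        if self.wait_time() > 0:
            return False
        self.calls.append(self.clock())
        return True
```

Before each request, a caller could sleep for limiter.wait_time() seconds and then call limiter.try_acquire(). The clock parameter exists only so the limiter can be tested without real waiting.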

Customization

Each feature can be customized to suit your specific needs. For instance, you can adjust the system instructions to guide the AI's behavior or choose different models based on the type of media you are working with. By experimenting with these settings, you can achieve different artistic effects and streamline your creative process.

ComfyUI-Gemini Models

ComfyUI-Gemini offers three main models, each designed for different types of tasks:

Gemini-pro: A text-based model ideal for generating text prompts and descriptions.
Gemini-pro-vision: A model that combines text and image processing, suitable for tasks that require both text and visual inputs.
Gemini 1.5 Pro: The most advanced model, supporting text, image, and various file types (audio, video, etc.). This model is perfect for complex, multimodal projects.

When to Use Each Model

Gemini-pro: Use this model when your project is primarily text-based, such as generating prompts or writing descriptions.
Gemini-pro-vision: Ideal for projects that take both text and images as input, such as describing an image or deriving a prompt from a reference picture.
Gemini 1.5 Pro: Best for comprehensive projects that involve multiple types of media, offering the most flexibility and capability.
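
The selection guidance above can be written as a small lookup. This is a sketch only: pick_model is a hypothetical helper, the media labels are invented for illustration, and the model strings follow the names used in this section, so the exact identifiers the API expects may differ.

```python
from typing import Iterable

# Names as written in this section; the API's exact identifiers are an assumption.
TEXT_MODEL = "gemini-pro"
VISION_MODEL = "gemini-pro-vision"
MULTIMODAL_MODEL = "gemini-1.5-pro"

KNOWN_MEDIA = {"text", "image", "audio", "video", "file"}


def pick_model(media: Iterable[str]) -> str:
    """Return the least capable model that can handle every listed media type."""
    kinds = {m.lower() for m in media}
    unknown = kinds - KNOWN_MEDIA
    if unknown:
        raise ValueError(f"unknown media types: {sorted(unknown)}")
    if kinds & {"audio", "video", "file"}:
        return MULTIMODAL_MODEL  # only Gemini 1.5 Pro handles uploaded files
    if "image" in kinds:
        return VISION_MODEL
    return TEXT_MODEL
```

For example, pick_model(["text", "image"]) selects the vision model, while adding "audio" to the list selects Gemini 1.5 Pro.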

What's New with ComfyUI-Gemini

Version 3.0

New Gemini 1.5 Pro Model: Includes support for system instructions, multimodal interactions, and file uploads.
File Upload Feature: Now supports uploading single files (images, text, PDFs, audio), with future plans to support multiple file uploads.
Enhanced Workflow: New workflows that combine Gemini 1.5 Pro with Stable Diffusion and ComfyUI, providing an alternative to DALL·E 3.

Previous Updates

Version 2.1: Fixed a timeout bug related to the 60.0s request deadline.
Version 2.0: Introduced context-aware chat nodes, effectively turning the AI into a chatbot.
Version 1.1: Improved API key handling by automatically adding it to the config.json file.

Troubleshooting ComfyUI-Gemini

Common Issues and Solutions

API Key Issues: Ensure your API key is correctly added to the config.json file or directly input into the node if using explicit nodes.
Connection Problems: Verify that you have a stable internet connection and can access Google Gemini services. Using platforms like Colab or Kaggle can help avoid connectivity issues.
Rate Limiting: Be mindful of the API rate limits (2 requests per minute, 1000 per day). Plan your usage accordingly to avoid hitting these limits.
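
To rule out API key problems, it can help to check how the key is being resolved. The sketch below prefers a key typed directly into a node and falls back to config.json. It is an illustration, not the extension's code: load_api_key is a hypothetical helper, and the GEMINI_API_KEY field name is an assumption, so check your own config.json for the actual field.

```python
import json
from pathlib import Path
from typing import Optional


def load_api_key(config_path: str, explicit_key: Optional[str] = None,
                 field: str = "GEMINI_API_KEY") -> str:
    """Return the explicit key if given, otherwise read it from the config file."""
    if explicit_key and explicit_key.strip():
        return explicit_key.strip()
    path = Path(config_path)
    if not path.is_file():
        raise FileNotFoundError(f"config file not found: {path}")
    with path.open(encoding="utf-8") as fh:
        config = json.load(fh)
    # The field name is an assumption; a missing or blank value is an error.
    key = str(config.get(field, "")).strip()
    if not key:
        raise ValueError(f"'{field}' is missing or empty in {path}")
    return key
```

Failing early with a clear message separates a missing key from a connection or rate-limit problem.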

Frequently Asked Questions

How do I get an API key? You can apply for an API key through Google AI Studio.
What types of files can I upload? Currently, you can upload images, text files, PDFs, and audio files. Video support is planned for future updates.
Can I share my workflows? Yes, but avoid sharing workflows that contain your API key to prevent unauthorized usage.
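
One way to follow the advice on sharing is to scrub the key from an exported workflow file before publishing it. The sketch below walks any JSON structure and replaces every occurrence of the key with a placeholder. redact_secret and redact_workflow_file are hypothetical helpers, and the code makes no assumption about where in the workflow the key is stored.

```python
import json
from typing import Any


def redact_secret(node: Any, secret: str, placeholder: str = "YOUR_API_KEY") -> Any:
    """Return a copy of a JSON-like structure with `secret` replaced everywhere."""
    if not secret:
        raise ValueError("secret must not be empty")
    if isinstance(node, dict):
        return {k: redact_secret(v, secret, placeholder) for k, v in node.items()}
    if isinstance(node, list):
        return [redact_secret(v, secret, placeholder) for v in node]
    if isinstance(node, str):
        return node.replace(secret, placeholder)
    return node  # numbers, booleans, and None pass through unchanged


def redact_workflow_file(src: str, dst: str, secret: str) -> None:
    """Read a workflow JSON file and write a redacted copy to dst."""
    with open(src, encoding="utf-8") as fh:
        workflow = json.load(fh)
    with open(dst, "w", encoding="utf-8") as fh:
        json.dump(redact_secret(workflow, secret), fh, indent=2)
```

The original structure is left untouched, so the local workflow keeps working while the redacted copy is the one that gets shared.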

Learn More about ComfyUI-Gemini

For additional resources, tutorials, and community support, check out the following links:

Gemini API Documentation
Gemini API Cookbook
ComfyUI GitHub Repository
Google AI Studio

These resources provide comprehensive guides and examples to help you get the most out of ComfyUI-Gemini. Whether you're a beginner or an experienced AI artist, you'll find valuable information to enhance your creative projects.

ComfyUI-Gemini Related Nodes

✨ConcatText_Zho

✨DisplayText_Zho

🆕Gemini_15P_Advance_Zho

🆕Gemini_15P_Chat_Advance_Zho

✨Gemini_API_Chat_Zho

㊙️Gemini_Chat_Zho

㊙️Gemini_ImgURL_Zho

㊙️Gemini_Zho

✨Gemini_API_Vsion_ImgURL_Zho

✨Gemini_API_Zho

📄Gemini_FileUpload_Zho

📄Gemini_File_Zho
