Updated: 5/11/2024
Hello, fellow AI artists! 👋 Welcome to our beginner-friendly tutorial on ComfyUI, an incredibly powerful and flexible tool for creating stunning AI-generated artwork. 🎨 In this guide, we'll walk you through the basics of ComfyUI, explore its features, and help you unlock its potential to take your AI art to the next level. 🚀
We will cover: what ComfyUI is and how it compares to AUTOMATIC1111; the basic text-to-image workflow and the key nodes behind it (Load Checkpoint, CLIP Text Encode, the latent image, the VAE, and the KSampler); and hands-on examples of image-to-image, SDXL, inpainting, outpainting, upscaling, ControlNet, the ComfyUI Manager, embeddings, and LoRA.
ComfyUI is like having a magic wand 🪄 for creating stunning, AI-generated artwork with ease. At its core, ComfyUI is a node-based graphical user interface (GUI) built on top of Stable Diffusion, a state-of-the-art deep learning model that generates images from text descriptions. 🌟 But what makes ComfyUI truly special is how it empowers artists like you to unleash your creativity and bring your wildest ideas to life.
Imagine a digital canvas where you can construct your own unique image generation workflows by connecting different nodes, each representing a specific function or operation. 🧩 It's like building a visual recipe for your AI-generated masterpieces!
Want to generate an image from scratch using a text prompt? There's a node for that! Need to apply a specific sampler or fine-tune the noise level? Simply add the corresponding nodes and watch the magic happen. ✨
But here's the best part: ComfyUI breaks down the workflow into rearrangeable elements, giving you the freedom to create your own custom workflows tailored to your artistic vision. 🖼️ It's like having a personalized toolkit that adapts to your creative process.
AUTOMATIC1111 is the most widely used GUI for Stable Diffusion. So, should you use ComfyUI instead? Let's compare:
✅ Benefits of using ComfyUI: you can see and configure every step of the generation pipeline; workflows can be saved, shared, and reproduced as files (or even recovered from the PNGs they generate); only the nodes a workflow actually needs are executed, which tends to make it lighter and faster; and its modular design means new models and techniques are often supported early.
❌ Drawbacks of using ComfyUI: the node graph has a steeper learning curve than a form-style UI like AUTOMATIC1111; building workflows from scratch takes more effort; and workflows shared by others often depend on custom nodes you'll need to install first.
We believe that the best way to learn ComfyUI is by diving into examples and experiencing it firsthand. 🙌 That's why we've created this unique tutorial that sets itself apart from others. In this tutorial, you'll find a detailed, step-by-step guide that you can follow along with.
But here's the best part: 🌟 We've integrated ComfyUI directly into this webpage! You'll be able to interact with ComfyUI examples in real time as you progress through the guide. 🌟 Let's dive in!
Let's begin with the simplest case: generating an image from text. Click Queue Prompt to run the workflow. After a short wait, you should see your first generated image! To check your queue, just click View Queue.
Here's a default text-to-image workflow for you to try:
The ComfyUI workflow consists of two basic building blocks: Nodes and Edges. Nodes are the blocks that each perform a single operation (Load Checkpoint, CLIP Text Encode, KSampler, and so on), and edges are the wires that carry data from one node's output to another node's input.
First, select a Stable Diffusion Checkpoint model in the Load Checkpoint node. Click on the model name to view available models. If clicking the model name does nothing, you may need to upload a custom model.
You'll see two nodes labeled CLIP Text Encode (Prompt). The top prompt is connected to the positive input of the KSampler node, while the bottom prompt is connected to the negative input. So enter your positive prompt in the top one and your negative prompt in the bottom one.
The CLIP Text Encode node converts the prompt into tokens and encodes them into embeddings using the text encoder.
💡 Tip: Use (keyword:weight) syntax to control the weight of a keyword, e.g., (keyword:1.2) to increase its effect or (keyword:0.8) to decrease it.
Click Queue Prompt to run the workflow. After a short wait, your first image will be generated!
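By the way, the Queue Prompt button isn't the only way to run a workflow: ComfyUI also exposes a small HTTP API on the machine it runs on. Here's a minimal sketch in Python, assuming ComfyUI is running locally on its default port (8188) and that workflow_api.json is a workflow you exported with "Save (API Format)" (visible after enabling dev mode in the settings):

```python
import json
import urllib.request

# Assumes a local ComfyUI instance on the default port.
COMFYUI_URL = "http://127.0.0.1:8188/prompt"

# A workflow previously exported from ComfyUI via "Save (API Format)".
with open("workflow_api.json", "r") as f:
    workflow = json.load(f)

# Queue the workflow -- the programmatic equivalent of clicking Queue Prompt.
payload = json.dumps({"prompt": workflow}).encode("utf-8")
request = urllib.request.Request(
    COMFYUI_URL,
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))  # contains the prompt_id of the queued job
```

The generated images land in ComfyUI's output folder, just as they do when you queue from the UI.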
The power of ComfyUI lies in its configurability. Understanding what each node does allows you to tailor them to your needs. But before diving into the details, let's take a look at the Stable Diffusion process to better understand how ComfyUI works.
The Stable Diffusion process can be summarized in three main steps:
1. Text encoding: the CLIP text encoder converts your prompt into embeddings that will guide the generation.
2. Sampling: starting from random noise in the latent space, the model iteratively denoises the latent image, steered by those embeddings.
3. Decoding: the VAE decoder converts the finished latent image back into a full-size pixel image.
Now that we have a high-level understanding of the Stable Diffusion process, let's dive into the key components and nodes in ComfyUI that make this process possible.
The Load Checkpoint node in ComfyUI is crucial for selecting a Stable Diffusion model. A Stable Diffusion model consists of three main components: MODEL, CLIP, and VAE. Let's explore each component and its relationship with the corresponding nodes in ComfyUI.
It's important to note that the VAE is a separate component from the CLIP language model. While CLIP focuses on processing text prompts, the VAE deals with the conversion between pixel and latent spaces.
The CLIP Text Encode node in ComfyUI is responsible for taking the user-provided prompts and feeding them into the CLIP language model. CLIP is a powerful language model that understands the semantic meaning of words and can associate them with visual concepts. When a prompt is entered into the CLIP Text Encode node, it undergoes a transformation process where each word is converted into embeddings. These embeddings are high-dimensional vectors that capture the semantic information of the words. By transforming the prompts into embeddings, CLIP enables the MODEL to generate images that accurately reflect the meaning and intent of the given prompts.
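If you're curious what these embeddings actually look like, here's an illustrative sketch that runs the same kind of CLIP text encoder outside ComfyUI using the Hugging Face transformers library. The model name below is the standard CLIP checkpoint whose text encoder SD v1.5 uses; treat the exact numbers as an illustration rather than something you need to run:

```python
import torch
from transformers import CLIPTokenizer, CLIPTextModel

# The CLIP model whose text encoder is used by SD v1.5 (chosen here for illustration).
MODEL_NAME = "openai/clip-vit-large-patch14"

tokenizer = CLIPTokenizer.from_pretrained(MODEL_NAME)
text_encoder = CLIPTextModel.from_pretrained(MODEL_NAME)

prompt = "a castle on a hill at sunset, highly detailed"

# Step 1: the prompt is split into tokens (padded/truncated to CLIP's 77-token window).
tokens = tokenizer(
    prompt, padding="max_length", max_length=77, truncation=True, return_tensors="pt"
)

# Step 2: the text encoder turns the tokens into embedding vectors.
with torch.no_grad():
    embeddings = text_encoder(**tokens).last_hidden_state

print(embeddings.shape)  # torch.Size([1, 77, 768]) -- one 768-dimensional vector per token
```

Those per-token vectors are essentially what flows into the KSampler through its positive and negative inputs.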
In the text-to-image process, the generation starts with a random image in the latent space. This random image serves as the initial state for the MODEL to work with. The size of the latent image is proportional to the actual image size in the pixel space. In ComfyUI, you can adjust the height and width of the latent image to control the size of the generated image. Additionally, you can set the batch size to determine the number of images generated in each run.
The optimal sizes for latent images depend on the specific Stable Diffusion model being used. For SD v1.5 models, the recommended sizes are 512x512 or 768x768, while for SDXL models, the optimal size is 1024x1024. ComfyUI provides a range of common aspect ratios to choose from, such as 1:1 (square), 3:2 (landscape), 2:3 (portrait), 4:3 (landscape), 3:4 (portrait), 16:9 (widescreen), and 9:16 (vertical). It's important to note that the width and height of the latent image must be divisible by 8 to ensure compatibility with the model's architecture.
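To make the relationship between the two spaces concrete, here's a tiny hypothetical helper (not a ComfyUI function) that maps a pixel-space size to its latent-space size and enforces the divisible-by-8 rule:

```python
def latent_size(width: int, height: int) -> tuple[int, int]:
    """Map a pixel-space size to its latent-space size.

    Stable Diffusion's VAE downsamples by a factor of 8 in each dimension,
    which is why the width and height must be divisible by 8.
    """
    if width % 8 or height % 8:
        raise ValueError("Width and height must both be divisible by 8.")
    return width // 8, height // 8

print(latent_size(512, 512))    # (64, 64)   -- typical SD v1.5 size
print(latent_size(1024, 1024))  # (128, 128) -- typical SDXL size
```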
The VAE (Variational AutoEncoder) is a crucial component in the Stable Diffusion model that handles the conversion of images between the pixel space and the latent space. It consists of two main parts: an Image Encoder and an Image Decoder.
The Image Encoder takes an image in the pixel space and compresses it into a lower-dimensional latent representation. This compression process significantly reduces the data size, allowing for more efficient processing and storage. For example, an image of size 512x512 pixels can be compressed down to a latent representation of size 64x64.
On the other hand, the Image Decoder, also known as the VAE Decoder, is responsible for reconstructing the image from the latent representation back into the pixel space. It takes the compressed latent representation and expands it to generate the final image.
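For a concrete feel of this round trip, here's an illustrative sketch using a standalone SD VAE from the diffusers library; the checkpoint name and the dummy image are assumptions made for the example, not something ComfyUI asks you to do:

```python
import torch
from diffusers import AutoencoderKL

# A commonly used standalone Stable Diffusion VAE (chosen here just for illustration).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# A dummy 512x512 RGB image in pixel space, with values scaled to [-1, 1].
image = torch.rand(1, 3, 512, 512) * 2 - 1

with torch.no_grad():
    # Image Encoder: pixel space -> latent space.
    latents = vae.encode(image).latent_dist.sample()
    print(latents.shape)  # torch.Size([1, 4, 64, 64])

    # Image Decoder: latent space -> pixel space.
    reconstruction = vae.decode(latents).sample
    print(reconstruction.shape)  # torch.Size([1, 3, 512, 512])
```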
Using a VAE offers several advantages: because the model works in the compressed latent space rather than on full-resolution pixels, generation is much faster and requires far less memory, which is what makes Stable Diffusion practical on consumer GPUs.
However, there are also some disadvantages to consider: the compression is lossy, so very fine details such as small text or intricate textures can be lost or subtly altered when an image passes through the encoder and decoder.
Despite these limitations, the VAE plays a vital role in the Stable Diffusion model by enabling efficient conversion between the pixel space and the latent space, facilitating faster generation and more precise control over the generated images.
The KSampler node in ComfyUI is the heart of the image generation process in Stable Diffusion. It is responsible for denoising the random image in the latent space to match the user-provided prompt. The KSampler employs a technique called reverse diffusion, where it iteratively refines the latent representation by removing noise and adding meaningful details based on the guidance from the CLIP embeddings.
The KSampler node offers several parameters that allow users to fine-tune the image generation process:
Seed: The seed value controls the initial noise and composition of the final image. By setting a specific seed, users can achieve reproducible results and maintain consistency across multiple generations.
Control_after_generate: This parameter determines how the seed value changes after each generation. It can be set to randomize (generate a new random seed for each run), increment (increase the seed value by 1), decrement (decrease the seed value by 1), or fixed (keep the seed value constant).
Steps: The number of sampling steps determines how many denoising iterations the KSampler performs. Higher values generally produce cleaner, more detailed images with fewer artifacts, but they also increase the generation time.
Sampler_name: This parameter allows users to choose the specific sampling algorithm used by the KSampler. Different sampling algorithms may yield slightly different results and have varying generation speeds.
Scheduler: The scheduler controls how the noise level changes at each step of the denoising process. It determines the rate at which noise is removed from the latent representation.
Denoise: The denoise parameter sets how much of the initial noise is removed by the denoising process. A value of 1 applies the full schedule and removes all of it, which is what you want for text-to-image; lower values preserve part of the starting latent image, which is useful for image-to-image and inpainting.
By adjusting these parameters, you can fine-tune the image generation process to achieve the desired results.
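For reference, this is roughly what a KSampler node looks like in a workflow exported in API format. The node IDs and connection indices below are made up for illustration, and note that the real node also takes a cfg value (the prompt guidance strength), which isn't covered in the list above:

```python
# A KSampler node as it might appear in an API-format workflow (IDs are made up).
ksampler_node = {
    "3": {
        "class_type": "KSampler",
        "inputs": {
            "seed": 42,                # a fixed seed makes results reproducible
            "steps": 20,               # number of denoising steps
            "cfg": 7.0,                # prompt guidance strength (not covered above)
            "sampler_name": "euler",   # the sampling algorithm
            "scheduler": "normal",     # how the noise level decreases over the steps
            "denoise": 1.0,            # 1.0 = start from pure noise (text-to-image)
            "model": ["4", 0],         # MODEL from Load Checkpoint
            "positive": ["6", 0],      # top CLIP Text Encode (Prompt)
            "negative": ["7", 0],      # bottom CLIP Text Encode (Prompt)
            "latent_image": ["5", 0],  # Empty Latent Image
        },
    }
}
# The control_after_generate option is handled by the UI, so it typically
# doesn't appear among the serialized inputs in this format.
```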
At RunComfy, we've created the ultimate ComfyUI online experience just for you. Say goodbye to complicated installations! 🎉 Try ComfyUI Online now and unleash your artistic potential like never before! 🎉
The Image-to-Image workflow generates an image based on a prompt and an input image. Try it yourself!
To use the Image-to-Image workflow: select a checkpoint model, upload the picture you want to transform in the Load Image node, write a prompt describing the result you're after, and lower the denoise value in the KSampler (the lower it is, the closer the output stays to your input image). Then click Queue Prompt.
For more premium ComfyUI workflows, visit our 🌟ComfyUI Workflow List🌟
Thanks to its extreme configurability, ComfyUI is one of the first GUIs to support the Stable Diffusion XL model. Let's give it a try!
To use the ComfyUI SDXL workflow: select an SDXL checkpoint in the Load Checkpoint node (and the matching refiner model, if the workflow uses one), set the latent image size to 1024x1024, enter your positive and negative prompts, and click Queue Prompt.
Let's dive into something more complex: inpainting! When you have a great image but want to modify specific parts, inpainting is the best method. Try it here!
To use the inpainting workflow: upload your image in the Load Image node, right-click it and choose Open in MaskEditor to paint over the area you want to regenerate, describe the replacement in the prompt, and click Queue Prompt. A lower denoise value keeps the result closer to the original image.
Outpainting is another exciting technique that allows you to expand your images beyond their original boundaries. 🌆 It's like having an infinite canvas to work with!
To use the ComfyUI Outpainting workflow: upload your image in the Load Image node, use the Pad Image for Outpainting node to choose how far to extend the canvas on each side, write a prompt describing what should fill the new areas, and click Queue Prompt.
For more premium inpainting/outpainting workflows, visit our 🌟ComfyUI Workflow List🌟
Next, let's explore ComfyUI upscale. We'll introduce three fundamental workflows to help you upscale efficiently.
There are two main methods for upscaling: upscaling in pixel space, which works directly on the decoded image, and upscaling in latent space, which works on the latent representation before it is decoded.
There are two ways to upscale in pixel space: with an algorithm (the Upscale Image node, using methods such as bilinear or lanczos) or with a dedicated upscale model (the Load Upscale Model and Upscale Image (using Model) nodes, using models such as ESRGAN).
Another upscaling method is Upscale Latent, also known as Hi-res Latent Fix Upscale, which directly upscales in the latent space.
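As a rough sketch, here's how the model-based pixel upscaling mentioned above might look in an API-format workflow; the node IDs, the source-image connection, and the upscale model filename are all made up for illustration:

```python
# Pixel-space upscaling with an upscale model, in API format (IDs and names made up).
upscale_nodes = {
    "10": {
        "class_type": "UpscaleModelLoader",     # the "Load Upscale Model" node
        "inputs": {"model_name": "RealESRGAN_x4plus.pth"},  # example upscale model file
    },
    "11": {
        "class_type": "ImageUpscaleWithModel",  # the "Upscale Image (using Model)" node
        "inputs": {
            "upscale_model": ["10", 0],         # from Load Upscale Model
            "image": ["8", 0],                  # e.g. the image coming out of VAE Decode
        },
    },
}
```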
For more premium restore/upscale workflows, visit our 🌟ComfyUI Workflow List🌟
Get ready to take your AI art to the next level with ControlNet, a game-changing technology that revolutionizes image generation!
ControlNet is like a magic wand 🪄 that grants you unprecedented control over your AI-generated images. It works hand in hand with powerful models like Stable Diffusion, enhancing their capabilities and allowing you to guide the image creation process like never before!
Imagine being able to specify the edges, human poses, depth, or even segmentation maps of your desired image. 🌠 With ControlNet, you can do just that!
If you're eager to dive deeper into the world of ControlNet and unleash its full potential, we've got you covered. Check out our detailed tutorial on mastering ControlNet in ComfyUI! 📚 It's packed with step-by-step guides, and inspiring examples to help you become a ControlNet pro. 🏆
ComfyUI Manager is a custom node that allows you to install and update other custom nodes through the ComfyUI interface. You'll find the Manager button on the Queue Prompt menu.
If a workflow requires custom nodes that you haven't installed, follow these steps: open the ComfyUI Manager, click Install Missing Custom Nodes, install the nodes it lists, then restart ComfyUI and refresh your browser.
Double-click any empty area to bring up a menu to search for nodes.
Embeddings, also known as textual inversion, are a powerful feature in ComfyUI that allows you to inject custom concepts or styles into your AI-generated images. 💡 It's like teaching the AI a new word or phrase and associating it with specific visual characteristics.
To use embeddings in ComfyUI, simply type "embedding:" followed by the name of your embedding in the positive or negative prompt box. For example:
embedding:BadDream
When you use this prompt, ComfyUI will search for an embedding file named "BadDream" in the ComfyUI > models > embeddings folder. 📂 If it finds a match, it will apply the corresponding visual characteristics to your generated image.
Embeddings are a great way to personalize your AI art and achieve specific styles or aesthetics. 🎨 You can create your own embeddings by training them on a set of images that represent the desired concept or style.
Remembering the exact names of your embeddings can be a hassle, especially if you have a large collection. 😅 That's where the ComfyUI-Custom-Scripts custom node comes to the rescue!
To enable embedding name autocomplete: install the ComfyUI-Custom-Scripts custom node (for example via the ComfyUI Manager's Install Custom Nodes search), then restart ComfyUI and refresh your browser.
Once you have the ComfyUI-Custom-Scripts node installed, you'll experience a more user-friendly way of using embeddings. 😊 Simply start typing "embedding:" in a prompt box, and a list of available embeddings will appear. You can then select the desired embedding from the list, saving you time and effort!
Did you know that you can control the strength of your embeddings? 💪 Since embeddings are essentially keywords, you can apply weights to them just like you would with regular keywords in your prompts.
To adjust the weight of an embedding, use the following syntax:
(embedding:BadDream:1.2)
In this example, the weight of the "BadDream" embedding is increased by 20%. Higher weights (e.g., 1.2) make the embedding more prominent, while lower weights (e.g., 0.8) reduce its influence. 🎚️ This gives you even more control over the final result!
LoRA, short for Low-rank Adaptation, is another exciting feature in ComfyUI that allows you to modify and fine-tune your checkpoint models. 🎨 It's like adding a small, specialized model on top of your base model to achieve specific styles or incorporate custom elements.
LoRA models are compact and efficient, making them easy to use and share. They are commonly used for tasks such as modifying the artistic style of an image or injecting a specific person or object into the generated result.
When you apply a LoRA model to a checkpoint model, it modifies the MODEL and CLIP components while leaving the VAE (Variational Autoencoder) untouched. This means that the LoRA focuses on adjusting the content and style of the image without altering its overall structure.
Using LoRA in ComfyUI is straightforward. The simplest method is to add a Load LoRA node between the Load Checkpoint node and the rest of the workflow: connect the checkpoint's MODEL and CLIP outputs to the Load LoRA node, connect its MODEL and CLIP outputs onward in their place, then pick a LoRA file and set its strength.
ComfyUI will then combine the checkpoint model and the LoRA model to create an image that reflects the specified prompts and incorporates the modifications introduced by the LoRA.
But what if you want to apply multiple LoRAs to a single image? No problem! ComfyUI allows you to use two or more LoRAs in the same text-to-image workflow.
The process is similar to using a single LoRA, but you'll need to select multiple LoRA models instead of just one. ComfyUI will apply the LoRAs sequentially, meaning that each LoRA will build upon the modifications introduced by the previous one.
This opens up a world of possibilities for combining different styles, elements, and modifications in your AI-generated images. 🌍💡 Experiment with different LoRA combinations to achieve unique and creative results!
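As a sketch of how sequential LoRAs might look in an API-format workflow, here are two Load LoRA nodes chained together; the node IDs, filenames, and strengths are made up for illustration:

```python
# Two Load LoRA nodes chained one after the other (IDs and filenames are made up).
lora_nodes = {
    "20": {
        "class_type": "LoraLoader",
        "inputs": {
            "lora_name": "style_A.safetensors",
            "strength_model": 0.8,   # how strongly this LoRA modifies the MODEL
            "strength_clip": 0.8,    # how strongly it modifies CLIP
            "model": ["4", 0],       # MODEL from Load Checkpoint
            "clip": ["4", 1],        # CLIP from Load Checkpoint
        },
    },
    "21": {
        "class_type": "LoraLoader",
        "inputs": {
            "lora_name": "character_B.safetensors",
            "strength_model": 1.0,
            "strength_clip": 1.0,
            "model": ["20", 0],      # builds on the first LoRA's modified MODEL
            "clip": ["20", 1],       # and on its modified CLIP
        },
    },
}
```

The downstream CLIP Text Encode and KSampler nodes then connect to the last LoRA in the chain instead of directly to the Load Checkpoint node.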
Congratulations on completing this beginner's guide to ComfyUI! 🙌 You're now ready to dive into the exciting world of AI art creation. But why hassle with installation when you can start creating right away? 🤔
At RunComfy, we've made it simple for you to use ComfyUI online without any setup. Our ComfyUI Online service comes preloaded with over 200 popular nodes and models, along with 50+ stunning workflows to inspire your creations.
🌟 Whether you're a beginner or an experienced AI artist, RunComfy has everything you need to bring your artistic visions to life. 💡 Don't wait any longer – try ComfyUI Online now and experience the power of AI art creation at your fingertips! 🚀