
IDM-VTON | Virtual Try-on

IDM-VTON, or Improving Diffusion Models for Authentic Virtual Try-on in the Wild, is a groundbreaking diffusion model that allows for realistic virtual garment try-on. By preserving the unique details and identity of garments, IDM-VTON generates incredibly authentic results. The model utilizes an image prompt adapter (IP-Adapter) to extract high-level garment semantics and a parallel UNet (GarmentNet) to encode low-level features. In ComfyUI, the IDM-VTON node powers the virtual try-on process, requiring inputs such as a human image, pose representation, clothing mask, and garment image.

ComfyUI IDM-VTON Workflow

ComfyUI Workflow: IDM-VTON for Virtual Clothing Try-on

ComfyUI IDM-VTON Examples

(Example output: realistic virtual clothing try-on with IDM-VTON in ComfyUI)

ComfyUI IDM-VTON Description

IDM-VTON, short for "Improving Diffusion Models for Authentic Virtual Try-on in the Wild," is an innovative diffusion model that allows you to realistically try on garments virtually using just a few inputs. What sets IDM-VTON apart is its ability to preserve the unique details and identity of the garments while generating virtual try-on results that look incredibly authentic.

1. Understanding IDM-VTON

At its core, IDM-VTON is a diffusion model that's been specifically engineered for virtual try-on. To use it, you simply need a representation of a person and a garment you want to try on. IDM-VTON then works its magic, rendering a result that looks like the person is actually wearing the garment. It achieves a level of garment fidelity and authenticity that surpasses previous diffusion-based virtual try-on methods.

2. The Inner Workings of IDM-VTON

So, how does IDM-VTON pull off such realistic virtual try-on? The secret lies in its two main modules that work together to encode the semantics of the garment input:

  1. The first is an image prompt adapter, or IP-Adapter for short. This clever component extracts the high-level semantics of the garment - essentially, the key characteristics that define its appearance. It then fuses this information into the cross-attention layer of the main UNet diffusion model.
  2. The second module is a parallel UNet called GarmentNet. Its job is to encode the low-level features of the garment - the nitty-gritty details that make it unique. These features are then fused into the self-attention layer of the main UNet.

But that's not all! IDM-VTON also makes use of detailed textual prompts for both the garment and the person inputs. These prompts provide additional context that enhances the authenticity of the final virtual try-on result.
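
To make these two fusion paths concrete, here is a minimal PyTorch-style sketch of a single attention block in the main UNet. Everything here is an illustrative assumption (the module names, the projections, the `garment_feats` and `ip_tokens` inputs), not the actual IDM-VTON source; it only shows where GarmentNet features and IP-Adapter tokens enter the attention computation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedAttentionBlock(nn.Module):
    """One attention block of the main try-on UNet (illustrative only).

    Self-attention:  GarmentNet features at the same layer extend the
                     keys/values, fusing low-level garment detail.
    Cross-attention: IP-Adapter image tokens get their own key/value
                     projections and are attended to alongside the text
                     tokens, fusing high-level garment semantics.
    """

    def __init__(self, dim: int, ctx_dim: int, heads: int = 8):
        super().__init__()
        self.heads = heads
        self.to_qkv = nn.Linear(dim, dim * 3)
        self.garment_kv = nn.Linear(dim, dim * 2)   # projects GarmentNet features
        self.text_kv = nn.Linear(ctx_dim, dim * 2)  # projects text embeddings
        self.ip_kv = nn.Linear(ctx_dim, dim * 2)    # projects IP-Adapter tokens

    def attn(self, q, k, v):
        # Standard multi-head scaled dot-product attention.
        b, n, d = q.shape
        h = self.heads
        q, k, v = (t.reshape(b, -1, h, d // h).transpose(1, 2) for t in (q, k, v))
        out = F.scaled_dot_product_attention(q, k, v)
        return out.transpose(1, 2).reshape(b, n, d)

    def forward(self, x, garment_feats, text_tokens, ip_tokens):
        # (1) Self-attention with GarmentNet fusion: concatenate the garment
        #     keys/values onto the UNet's own before attending.
        q, k, v = self.to_qkv(x).chunk(3, dim=-1)
        gk, gv = self.garment_kv(garment_feats).chunk(2, dim=-1)
        x = x + self.attn(q, torch.cat([k, gk], dim=1), torch.cat([v, gv], dim=1))

        # (2) Cross-attention with IP-Adapter fusion: a separate attention over
        #     the image-prompt tokens is added to the usual text attention.
        q = self.to_qkv(x).chunk(3, dim=-1)[0]
        tk, tv = self.text_kv(text_tokens).chunk(2, dim=-1)
        ik, iv = self.ip_kv(ip_tokens).chunk(2, dim=-1)
        x = x + self.attn(q, tk, tv) + self.attn(q, ik, iv)
        return x
```

The key design choice mirrored here is the decoupling: low-level garment detail enters by extending the self-attention keys/values, while high-level semantics enter through an image-prompt cross-attention added alongside the text cross-attention.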

3. Putting IDM-VTON to Work in ComfyUI

3.1 The Star of the Show: The IDM-VTON Node

In ComfyUI, the "IDM-VTON" node is the powerhouse that runs the IDM-VTON diffusion model and generates the virtual try-on output.

For the IDM-VTON node to work its magic, it needs a few key inputs:

  1. Pipeline: This is the loaded IDM-VTON diffusion pipeline that powers the whole virtual try-on process.
  2. Human Input: An image of the person who will be virtually trying on the garment.
  3. Pose Input: A preprocessed DensePose representation of the human input, which helps IDM-VTON understand the person's pose and body shape.
  4. Mask Input: A binary mask that indicates which parts of the human input are clothing. This mask must be converted to an image before it can be connected (see section 3.2).
  5. Garment Input: An image of the garment to be virtually tried on.
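
For orientation, a node exposing these inputs could be declared along the following lines using ComfyUI's custom-node Python API. This is a hedged sketch: the class name, the "PIPELINE" type string, and the pipeline call signature are assumptions for illustration, not the actual node source.

```python
class IDMVTON:
    """Illustrative declaration of the IDM-VTON node's interface."""

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "pipeline": ("PIPELINE",),    # loaded IDM-VTON diffusion pipeline
                "human_image": ("IMAGE",),    # person to dress
                "pose_image": ("IMAGE",),     # DensePose representation
                "mask_image": ("IMAGE",),     # clothing mask, converted to an image
                "garment_image": ("IMAGE",),  # garment to try on
            }
        }

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "try_on"
    CATEGORY = "IDM-VTON"

    def try_on(self, pipeline, human_image, pose_image, mask_image, garment_image):
        # Inpaint the masked clothing region of the human image, conditioned
        # on the pose and the garment image. (Hypothetical call signature.)
        result = pipeline(
            human=human_image,
            pose=pose_image,
            mask=mask_image,
            garment=garment_image,
        )
        return (result,)
```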

3.2 Getting Everything Ready

To get the IDM-VTON node up and running, there are a few preparation steps:

  1. Loading the Human Image: A LoadImage node is used to load the image of the person.
  2. Generating the Pose Image: The human image is passed through a DensePosePreprocessor node, which computes the DensePose representation that IDM-VTON needs.
  3. Obtaining the Mask Image: There are two ways to get the clothing mask:

a. Manual Masking (Recommended)

  • Right-click on the loaded human image and choose "Open in Mask Editor."
  • In the mask editor UI, manually mask the clothing regions.

b. Automatic Masking

  • Use a GroundingDinoSAMSegment node to automatically segment the clothing.
  • Prompt the node with a text description of the garment (like "t-shirt").

Whichever method you choose, the obtained mask needs to be converted to an image using a MaskToImage node, which is then connected to the "Mask Image" input of the IDM-VTON node (a sketch of this conversion follows the list below).

  4. Loading the Garment Image: Another LoadImage node is used to load the image of the garment.
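
As mentioned above, the clothing mask must pass through a MaskToImage node before it reaches the IDM-VTON node. Conceptually, that conversion just broadcasts the single-channel mask across three color channels; here is a minimal sketch of the equivalent tensor operation, assuming ComfyUI's usual [batch, height, width] MASK and [batch, height, width, 3] IMAGE layouts:

```python
import torch

def mask_to_image(mask: torch.Tensor) -> torch.Tensor:
    """Broadcast a MASK tensor ([B, H, W], floats in 0..1) into an
    IMAGE tensor ([B, H, W, 3]) by repeating it across color channels."""
    return mask.unsqueeze(-1).expand(-1, -1, -1, 3)

# Example: a 512x512 mask marking the top half of the frame as clothing
mask = torch.zeros(1, 512, 512)
mask[:, :256, :] = 1.0
print(mask_to_image(mask).shape)  # torch.Size([1, 512, 512, 3])
```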

For a deeper dive into the IDM-VTON model, don't miss the original paper, "Improving Diffusion Models for Authentic Virtual Try-on in the Wild." And if you're interested in using IDM-VTON in ComfyUI, be sure to check out the dedicated custom nodes. Huge thanks to the researchers and developers behind these incredible resources.
