ComfyUI > Workflows > IDM-VTON | Virtual Try-on

IDM-VTON | Virtual Try-on

IDM-VTON, or Improving Diffusion Models for Authentic Virtual Try-on in the Wild, is a groundbreaking diffusion model that allows for realistic virtual garment try-on. By preserving the unique details and identity of garments, IDM-VTON generates incredibly authentic results. The model utilizes an image prompt adapter (IP-Adapter) to extract high-level garment semantics and a parallel UNet (GarmentNet) to encode low-level features. In ComfyUI, the IDM-VTON node powers the virtual try-on process, requiring inputs such as a human image, pose representation, clothing mask, and garment image.

ComfyUI IDM-VTON Workflow

ComfyUI Workflow: IDM-VTON for Virtual Clothing Try-on

Want to run this workflow?

Fully operational workflows
No missing nodes or models
No manual setups required
Features stunning visuals

ComfyUI IDM-VTON Examples

idm-vton-on-comfyui-realistic-virtual-clothing-try-on-1135

ComfyUI IDM-VTON Description

IDM-VTON, short for "Improving Diffusion Models for Authentic Virtual Try-on in the Wild," is an innovative diffusion model that allows you to realistically try on garments virtually using just a few inputs. What sets IDM-VTON apart is its ability to preserve the unique details and identity of the garments while generating virtual try-on results that look incredibly authentic.

1. Understanding IDM-VTON

At its core, IDM-VTON is a diffusion model that's been specifically engineered for virtual try-on. To use it, you simply need a representation of a person and a garment you want to try on. IDM-VTON then works its magic, rendering a result that looks like the person is actually wearing the garment. It achieves a level of garment fidelity and authenticity that surpasses previous diffusion-based virtual try-on methods.

2. The Inner Workings of IDM-VTON

So, how does IDM-VTON pull off such realistic virtual try-on? The secret lies in its two main modules that work together to encode the semantics of the garment input:

The first is an image prompt adapter, or IP-Adapter for short. This clever component extracts the high-level semantics of the garment - essentially, the key characteristics that define its appearance. It then fuses this information into the cross-attention layer of the main UNet diffusion model.
The second module is a parallel UNet called GarmentNet. Its job is to encode the low-level features of the garment - the nitty-gritty details that make it unique. These features are then fused into the self-attention layer of the main UNet.

But that's not all! IDM-VTON also makes use of detailed textual prompts for both the garment and the person inputs. These prompts provide additional context that enhances the authenticity of the final virtual try-on result.

3. Putting IDM-VTON to Work in ComfyUI

3.1 The Star of the Show: The IDM-VTON Node

In ComfyUI, the "IDM-VTON" node is the powerhouse that runs the IDM-VTON diffusion model and generates the virtual try-on output.

For the IDM-VTON node to work its magic, it needs a few key inputs:

Pipeline: This is the loaded IDM-VTON diffusion pipeline that powers the whole virtual try-on process.
Human Input: An image of the person who will be virtually trying on the garment.
Pose Input: A preprocessed DensePose representation of the human input, which helps IDM-VTON understand the person's pose and body shape.
Mask Input: A binary mask that indicates which parts of the human input are clothing. This mask needs to be converted into an appropriate format.
Garment Input: An image of the garment to be virtually tried on.

3.2 Getting Everything Ready

To get the IDM-VTON node up and running, there are a few preparation steps:

Loading the Human Image: A LoadImage node is used to load the image of the person.
Generating the Pose Image: The human image is passed through a DensePosePreprocessor node, which computes the DensePose representation that IDM-VTON needs.
Obtaining the Mask Image: There are two ways to get the clothing mask:

a. Manual Masking (Recommended)

Right-click on the loaded human image and choose "Open in Mask Editor."
In the mask editor UI, manually mask the clothing regions.

b. Automatic Masking

Use a GroundingDinoSAMSegment node to automatically segment the clothing.
Prompt the node with a text description of the garment (like "t-shirt").

Whichever method you choose, the obtained mask needs to be converted to an image using a MaskToImage node, which is then connected to the "Mask Image" input of the IDM-VTON node.

Loading the Garment Image: It is used to load the image of the garment.

For a deeper dive into the IDM-VTON model, don't miss the original paper, "". And if you're interested in using IDM-VTON in ComfyUI, be sure to check out the dedicated nodes . Huge thanks to the researchers and developers behind these incredible resources.

Want More ComfyUI Workflows?

InstantID | Portraits to Art

InstantID accurately enhances and transforms portraits with style and aesthetic appeal.

Product Relighting | Magnific.AI Relight Alternative

Elevate your product photography effortlessly, a top alternative to Magnific.AI Relight.

SVD + IPAdapter V1 | Image to Video

Utilize IPAdapters for static image generation and Stable Video Diffusion for dynamic video generation.

AnimateDiff + ControlNet | Marble Sculpture Style

Transform your videos into timeless marble sculptures, capturing the essence of classic art.

ComfyUI Vid2Vid Dance Transfer

Transfers the motion and style from a source video onto a target image or object.

MV-Adapter | High-Resolution Multi-view Generator

Generate 360-degree views of anything from a single image or description.

CogVideoX Tora | Image-to-Video Model

Subject Trajectory Video Demo for CogVideoX

ToonCrafter | Generative Cartoon Interpolation

ToonCrafter can generate cartoon interpolations between two cartoon images.

Support

Resources

Legal

RunComfy

RunComfy is the premier ComfyUI platform, offering ComfyUI online environment and services, along with ComfyUI workflows featuring stunning visuals.