ComfyUI ExLlamaV2 Nodes is a local text generator for ComfyUI, leveraging the ExLlamaV2 inference library. It requires manual package installation and provides efficient text generation capabilities within the ComfyUI framework.
ComfyUI-ExLlama-Nodes is an extension designed to enhance the capabilities of ComfyUI by integrating it with ExLlamaV2, a powerful local text generation library. This extension allows AI artists to generate high-quality text locally on their machines, leveraging the advanced features of ExLlamaV2. Whether you're creating stories, dialogues, or any other text-based content, ComfyUI-ExLlama-Nodes provides a seamless and efficient way to produce text with minimal setup.
At its core, ComfyUI-ExLlama-Nodes works by connecting ComfyUI with ExLlamaV2, enabling local text generation on modern consumer GPUs. ExLlamaV2 is an inference library that supports various models and quantization techniques, making it versatile and efficient. The extension provides nodes that load models, generate text based on prompts, and display the generated text within the ComfyUI interface.
The Loader node is responsible for loading models from the models/llm directory. It offers several customization options:

- cache_bits: the precision of the cache; lower values reduce VRAM usage.
- fast_tensors: speeds up model loading.
- flash_attention: reduces VRAM usage on GPUs that support it.
- max_seq_len: the maximum context length; 0 defaults to the model's configuration.

The Generator node generates text based on a given prompt. Key parameters include:

- stop_conditions: a list of strings that end generation; ["\n"] stops generation on a newline.
- max_tokens: the maximum number of tokens to generate; 0 uses the available context.

The Previewer node displays the generated text within the ComfyUI interface, allowing users to review and interact with the output.
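Under the hood, these nodes drive the ExLlamaV2 Python library. The following is a minimal sketch of an equivalent load-and-generate flow, assuming the exllamav2 package's documented example API; the model path is a placeholder, and the extension's actual node code may differ:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

# Load a model from the llm directory, roughly what the Loader node does.
config = ExLlamaV2Config()
config.model_dir = "models/llm/my-model-exl2"  # placeholder path
config.prepare()
# config.max_seq_len = 4096  # override the context length; omit to keep the model's default

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # split layers across available GPUs
tokenizer = ExLlamaV2Tokenizer(config)

# Generate from a prompt, roughly what the Generator node does.
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)
settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8

print(generator.generate_simple("Once upon a time", settings, num_tokens=200))
```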
The Replacer node replaces variable names in brackets (e.g., [a]) with their corresponding values, making it easier to manage dynamic content within the generated text.
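As a rough illustration of the idea (not the node's actual implementation), the substitution behaves like a simple regex replacement:

```python
import re

def replace_vars(text: str, values: dict[str, str]) -> str:
    """Replace [name] placeholders with entries from values; unknown names are left intact."""
    return re.sub(r"\[(\w+)\]", lambda m: values.get(m.group(1), m.group(0)), text)

prompt = "Write a story about [a] who lives in [b]."
print(replace_vars(prompt, {"a": "a dragon", "b": "the mountains"}))
# -> Write a story about a dragon who lives in the mountains.
```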
ComfyUI-ExLlama-Nodes supports various models, including EXL2, 4-bit GPTQ, and unquantized models. These models can be found on Hugging Face. To use a model with the extension, download it to the models/llm directory.
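One way to fetch a model is with the huggingface_hub package; in this sketch the repository ID and revision are placeholders, since EXL2 repositories often keep each quantization level on its own branch:

```python
from huggingface_hub import snapshot_download

# Download an EXL2-quantized model into ComfyUI's models/llm directory.
snapshot_download(
    repo_id="user/some-model-exl2",        # placeholder repository ID
    revision="4.0bpw",                     # placeholder branch for the desired quant level
    local_dir="models/llm/some-model-exl2",
)
```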
If you run into out-of-memory errors or slow model loading, a few Loader settings can help:

- Lower the cache_bits value in the Loader node settings to reduce VRAM usage.
- Enable flash_attention if your GPU supports it.
- Enable fast_tensors in the Loader node settings to speed up model loading.
- Lower the max_seq_len value to decrease the context length.
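For reference, these switches map onto ExLlamaV2's own configuration; a hedged sketch assuming recent versions of the exllamav2 API (the node exposes these as widgets, so this is illustrative rather than the extension's code):

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4

config = ExLlamaV2Config()
config.model_dir = "models/llm/my-model-exl2"  # placeholder path
config.prepare()
config.max_seq_len = 2048      # shorter context -> smaller cache -> less VRAM
config.no_flash_attn = True    # opt out of FlashAttention on unsupported GPUs

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # quantized 4-bit cache, akin to a low cache_bits setting
model.load_autosplit(cache)
```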
Can I use my own models? Yes, you can add your own models by placing them in the models/llm directory and updating the extra_model_paths.yaml file.
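A minimal sketch of such an entry, assuming the extension resolves an llm folder type through ComfyUI's standard extra_model_paths.yaml format; the section name and paths below are placeholders:

```yaml
# extra_model_paths.yaml (placeholders throughout)
my_models:
    base_path: /path/to/my/models
    llm: llm    # subfolder holding EXL2 / GPTQ / FP16 model directories
```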
What GPUs are supported? The extension runs on modern consumer GPUs; FlashAttention additionally requires compute capability 8.0 or higher (NVIDIA Ampere or newer).
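A quick way to check the compute capability of your card is through PyTorch, which ComfyUI already depends on:

```python
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"Compute capability: {major}.{minor}")
    print("FlashAttention supported." if (major, minor) >= (8, 0) else "FlashAttention unsupported.")
else:
    print("No CUDA device detected.")
```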