Install this extension via the ComfyUI Manager by searching for ComfyUI ExLlamaV2 Nodes:
1. Click the Manager button in the main menu
2. Select the Custom Nodes Manager button
3. Enter ComfyUI ExLlamaV2 Nodes in the search bar
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.
ComfyUI ExLlamaV2 Nodes is a local text generator for ComfyUI built on the ExLlamaV2 inference library. It requires a few manually installed packages and provides efficient local text generation within the ComfyUI framework.
ComfyUI ExLlamaV2 Nodes Introduction
ComfyUI-ExLlama-Nodes is an extension designed to enhance the capabilities of ComfyUI by integrating it with ExLlamaV2, a powerful local text generation library. This extension allows AI artists to generate high-quality text locally on their machines, leveraging the advanced features of ExLlamaV2. Whether you're creating stories, dialogues, or any other text-based content, ComfyUI-ExLlama-Nodes provides a seamless and efficient way to produce text with minimal setup.
How ComfyUI ExLlamaV2 Nodes Works
At its core, ComfyUI-ExLlama-Nodes works by connecting ComfyUI with ExLlamaV2, enabling local text generation on modern consumer GPUs. ExLlamaV2 is an inference library that supports various models and quantization techniques, making it versatile and efficient. The extension provides nodes that load models, generate text based on prompts, and display the generated text within the ComfyUI interface.
Basic Principles
Model Loading: The extension loads pre-trained language models from a specified directory. These models can be in different quantization formats, such as 4-bit GPTQ or unquantized.
Text Generation: Using the loaded models, the extension generates text based on user-provided prompts. The generation process can be customized with various parameters to control the output.
Display and Interaction: The generated text is displayed within the ComfyUI interface, allowing users to interact with and refine the output as needed (the full load-generate-display flow is sketched below).
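To make the pipeline concrete, here is a minimal sketch of that load-generate-display flow using the ExLlamaV2 library directly, outside ComfyUI. The model path and prompt are placeholders; the calls follow ExLlamaV2's documented Python API.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache, ExLlamaV2Config, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Model loading: read a (quantized or unquantized) model from a local directory.
config = ExLlamaV2Config("models/llm/my-model-exl2")  # placeholder path
model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # split weights across the available GPU(s)
tokenizer = ExLlamaV2Tokenizer(config)

# Text generation: produce a completion for a prompt.
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)
output = generator.generate(prompt="Once upon a time,", max_new_tokens=128)

# Display: inside ComfyUI the Previewer node shows the result; here we just print it.
print(output)
```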
ComfyUI ExLlamaV2 Nodes Features
Loader Node
The Loader node is responsible for loading models from the models/llm directory. It offers several customization options (see the sketch after this list):
cache_bits: Determines the number of bits used for the key/value cache. Lower values reduce VRAM usage but may affect generation speed and quality.
fast_tensors: When enabled, this option reduces RAM usage and speeds up model loading.
flash_attention: Reduces VRAM usage by enabling FlashAttention, which is not supported on GPUs with compute capability below 8.0.
max_seq_len: Sets the maximum context length. Higher values increase VRAM usage. A value of 0 defaults to the model's configuration.
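As a rough illustration, here is how these options plausibly map onto ExLlamaV2 settings. The attribute names come from the ExLlamaV2 library; the mapping itself is an assumption about the node's internals, not a copy of its code.

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Cache_Q4, ExLlamaV2Config

config = ExLlamaV2Config("models/llm/my-model-exl2")  # placeholder path
config.max_seq_len = 4096      # max_seq_len (0 keeps the model's default)
config.fasttensors = True      # fast_tensors: faster, leaner safetensors loading
config.no_flash_attn = False   # flash_attention on (needs compute capability >= 8.0)

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # cache_bits=4 ~ a 4-bit quantized cache
model.load_autosplit(cache)
```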
Generator Node
The Generator node generates text based on a given prompt. Key parameters include (see the sketch after this list):
unload: Unloads the model after each generation to reduce VRAM usage.
stop_conditions: A list of strings that, when encountered, stop the text generation. For example, ["\n"] stops generation on a newline.
max_tokens: Sets the maximum number of new tokens to generate. A value of 0 uses the available context.
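Continuing from the loading sketch above, a call with these parameters might look like the following; the prompt and stop string are examples.

```python
# Generate until 100 new tokens or a blank line, whichever comes first.
output = generator.generate(
    prompt="List three uses for a red scarf:",
    max_new_tokens=100,        # max_tokens: cap on new tokens (0 = use available context)
    stop_conditions=["\n\n"],  # stop_conditions: strings that end generation early
)
print(output)

# unload: freeing the model afterwards to reclaim VRAM could look like this.
model.unload()
```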
Previewer Node
The Previewer node displays the generated text within the ComfyUI interface, allowing users to review and interact with the output.
Replacer Node
The Replacer node replaces variable names in brackets (e.g., [a]) with their corresponding values, making it easier to manage dynamic content within the generated text.
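The node's source isn't reproduced here, but its behavior amounts to a simple substitution, as in this illustrative sketch (the function name and example values are made up):

```python
import re

def replace_bracket_vars(text: str, values: dict[str, str]) -> str:
    """Replace [name] placeholders with their values, leaving unknown names intact."""
    return re.sub(r"\[(\w+)\]", lambda m: values.get(m.group(1), m.group(0)), text)

print(replace_bracket_vars(
    "A story about [a] set in [b].",
    {"a": "a lighthouse keeper", "b": "1920s Lisbon"},
))
# -> A story about a lighthouse keeper set in 1920s Lisbon.
```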
ComfyUI ExLlamaV2 Nodes Models
ComfyUI-ExLlama-Nodes supports various models, including EXL2, 4-bit GPTQ, and unquantized models. These models can be found on Hugging Face. Here are some examples:
Llama-3-8B-Instruct: A 6-bit EXL2 quantization of Meta's instruction-tuned 8B model, well suited to instruction-following text generation.
Llama 2 70B: A large model that can run on a single 24 GB GPU with a 2048-token context while producing coherent and stable output.
To use a model, clone its repository or manually download the files and place them in the models/llm directory, for example as sketched below.
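One way to fetch the files is with the huggingface_hub client; the repository ID and revision below name an example EXL2 quant, so substitute the model you actually want.

```python
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="turboderp/Llama-3-8B-Instruct-exl2",    # example repository
    revision="6.0bpw",                               # example quant branch
    local_dir="models/llm/Llama-3-8B-Instruct-exl2", # target inside ComfyUI
)
```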
What's New with ComfyUI ExLlamaV2 Nodes
Version 0.1.0+
Paged Attention Support: Integration with FlashAttention 2.5.7+ for improved performance.
Dynamic Generator: A new generator with dynamic batching, smart prompt caching, and K/V cache deduplication.
These updates enhance the efficiency and flexibility of text generation, making it easier for AI artists to produce high-quality content; the dynamic generator's batched interface is sketched below.
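As an illustration, here is a hedged sketch of batched generation with the dynamic generator, reusing the model, cache, and tokenizer from the loading sketch earlier; the prompts are examples.

```python
from exllamav2.generator import ExLlamaV2DynamicGenerator

# Paged attention requires flash-attn >= 2.5.7; without it, the dynamic
# generator runs in a slower unpaged compatibility mode.
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# A list of prompts is batched dynamically; overlapping prompt prefixes
# can be served from the deduplicated K/V cache instead of being recomputed.
outputs = generator.generate(
    prompt=["Write a haiku about rain.", "Write a haiku about snow."],
    max_new_tokens=64,
)
for text in outputs:
    print(text)
```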
Troubleshooting ComfyUI ExLlamaV2 Nodes
Common Issues and Solutions
Model Loading Errors:
Ensure that the model files are correctly placed in the models/llm directory.
Verify that the model format is supported (EXL2, 4-bit GPTQ, or unquantized).
High VRAM Usage:
Lower the cache_bits value in the Loader node settings.
Enable flash_attention if your GPU supports it.
Slow Text Generation:
Enable fast_tensors in the Loader node settings.
Reduce the max_seq_len value to decrease the context length.
Frequently Asked Questions
Can I use my own models?
Yes, you can add your own models by placing them in the models/llm directory and updating the extra_model_paths.yaml file.
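A hypothetical extra_model_paths.yaml entry might look like this; the llm key is an assumption about how the extension resolves its model folder, and the paths are placeholders:

```yaml
my_models:                    # arbitrary section name
    base_path: /data/ai/
    llm: language_models/     # resolved as /data/ai/language_models/
```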
What GPUs are supported?
The extension runs on modern consumer NVIDIA GPUs; the flash_attention option additionally requires compute capability 8.0 or higher (Ampere or newer).
Learn More about ComfyUI ExLlamaV2 Nodes
For additional resources, tutorials, and community support, see the extension's GitHub repository and the ComfyUI community.