Customizable configuration options for LLaMA model optimization and performance enhancement.
The MZ_LLamaCPPOptions node is designed to provide a comprehensive set of configuration options for the LLaMA (Large Language Model Meta AI) model, enabling you to fine-tune various parameters to optimize the model's performance for your specific needs. This node allows you to adjust settings such as context length, batch size, GPU layers, and various penalties and probabilities that influence the model's output. By offering a wide range of customizable options, MZ_LLamaCPPOptions ensures that you can tailor the model's behavior to suit different tasks, whether it's generating text, answering questions, or performing other AI-driven functions. This flexibility makes it a powerful tool for AI artists looking to leverage advanced language models in their creative workflows.
n_ctx specifies the context length, which is the number of tokens the model can consider at once. A higher value allows the model to take more context into account, potentially improving the quality of the output. The default value is 2048, and it can be adjusted based on your specific requirements.
n_batch determines the batch size, which is the number of prompt tokens processed together in a single batch during evaluation. A larger batch size can speed up prompt processing but requires more memory. The default value is 2048.
n_threads sets the number of CPU threads to use. More threads can speed up processing but may also increase CPU usage. The default value is 0, which means the model will automatically determine the optimal number of threads.
n_threads_batch specifies the number of threads to use for batch processing. Similar to n_threads, this can affect processing speed and CPU usage. The default value is 0.
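As a rough illustration of how these loading options are typically used, here is a minimal sketch assuming the options are passed through to llama-cpp-python's Llama constructor (the model path is a hypothetical placeholder, and this is not the node's internal code):

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Sketch: load a GGUF model with the context, batch, and thread options above.
llm = Llama(
    model_path="model.gguf",  # hypothetical placeholder path
    n_ctx=2048,               # context length in tokens
    n_batch=2048,             # prompt tokens processed per batch
    n_threads=None,           # None = choose automatically (the node's 0 behaves similarly)
    n_threads_batch=None,     # threads used for batch processing
)
```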
split_mode defines how the model's layers are split across multiple GPUs. Options include LLAMA_SPLIT_MODE_NONE, LLAMA_SPLIT_MODE_LAYER, and LLAMA_SPLIT_MODE_ROW. This setting can help optimize GPU memory usage and processing speed.
main_gpu indicates the primary GPU to use for processing. The default value is 0, which typically refers to the first GPU in your system.
n_gpu_layers specifies the number of layers to offload to the GPU. A value of -1 means all layers will be processed on the GPU. Adjusting this can help balance GPU and CPU usage.
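The split-mode names above correspond to module-level constants in llama-cpp-python. A hedged sketch of a multi-GPU configuration (again, an assumption about typical usage rather than the node's own code):

```python
import llama_cpp
from llama_cpp import Llama

# Sketch: offload all layers to GPU and split them layer-by-layer across devices.
llm = Llama(
    model_path="model.gguf",                      # hypothetical placeholder path
    n_gpu_layers=-1,                              # -1 = offload every layer
    main_gpu=0,                                   # primary GPU index
    split_mode=llama_cpp.LLAMA_SPLIT_MODE_LAYER,  # or LLAMA_SPLIT_MODE_NONE / LLAMA_SPLIT_MODE_ROW
)
```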
max_tokens sets the maximum number of tokens the model can generate in a single output. The default value is 4096, which can be increased or decreased based on your needs.
temperature controls the randomness of the model's output. A higher value (e.g., 1.6) makes the output more random, while a lower value makes it more deterministic. The default value is 1.6.
top_p is used for nucleus sampling, where the model considers only the top p probability mass. The default value is 0.95, which helps balance diversity and coherence in the output.
min_p sets the minimum probability threshold for tokens to be considered in the output. The default value is 0.05, which can help filter out less likely tokens.
typical_p controls locally typical sampling, another way of shaping the diversity of the output; a value of 1.0 leaves it disabled. The default value is 1.0.
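A short generation sketch combining these length and sampling controls, reusing the llm instance from the loading sketch above (the prompt and values are illustrative, and max_tokens must still fit within n_ctx):

```python
# Sketch: a completion call with the length and sampling controls above.
output = llm(
    "Describe a surreal landscape in two sentences.",  # illustrative prompt
    max_tokens=256,    # cap on generated tokens (keep within the context window)
    temperature=1.6,   # higher = more random
    top_p=0.95,        # nucleus sampling threshold
    min_p=0.05,        # drop tokens below this relative probability
    typical_p=1.0,     # 1.0 disables locally typical sampling
)
print(output["choices"][0]["text"])
```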
stop specifies a string or list of strings that will stop the generation when encountered. This can be useful for controlling the length and content of the output.
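For example, a minimal sketch of passing stop strings (same assumed llama-cpp-python call as above):

```python
# Sketch: halt generation at a blank line or an explicit END marker.
output = llm(
    "Write a short haiku about rivers.\n",  # illustrative prompt
    max_tokens=128,
    stop=["\n\n", "END"],  # generation stops at the first match
)
```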
frequency_penalty penalizes tokens that appear frequently in the output, encouraging the model to use a more diverse vocabulary. The default value is 0.0.
presence_penalty penalizes tokens that have already appeared in the context, further promoting diversity. The default value is 0.0.
repeat_penalty applies a penalty to repeated tokens, helping to reduce redundancy in the output. The default value is 1.1.
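The three penalties can be combined in a single call; a sketch under the same llama-cpp-python assumption:

```python
# Sketch: discourage repetition with the penalty controls above.
output = llm(
    "List five unusual color names.",  # illustrative prompt
    max_tokens=128,
    frequency_penalty=0.0,  # penalize tokens in proportion to how often they appear
    presence_penalty=0.0,   # penalize tokens that have appeared at all
    repeat_penalty=1.1,     # multiplicative penalty on repeated tokens
)
```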
top_k limits the model to considering only the top k tokens by probability. The default value is 50, which can help focus the output on the most likely tokens.
tfs_z controls tail-free sampling, which trims unlikely tokens from the tail of the probability distribution; a value of 1.0 disables it. The default value is 1.0.
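A brief sketch of these two filters, with illustrative values and the same assumed API:

```python
# Sketch: restrict sampling to the 50 most likely tokens; tfs_z=1.0 keeps
# tail-free sampling disabled.
output = llm(
    "Suggest a title for a sci-fi short story.",  # illustrative prompt
    max_tokens=32,
    top_k=50,
    tfs_z=1.0,
)
```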
mirostat_mode sets the mode for the Mirostat algorithm, which aims to control the perplexity of the output. Options include none, mirostat, and mirostat_v2.
mirostat_tau is a parameter for the Mirostat algorithm, controlling the target perplexity. The default value is 5.0.
mirostat_eta is another parameter for the Mirostat algorithm, controlling the learning rate for perplexity adjustment. The default value is 0.1.
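In llama-cpp-python the Mirostat modes are passed as integers (0 = none, 1 = mirostat, 2 = mirostat_v2); a hedged sketch enabling Mirostat v2:

```python
# Sketch: enable Mirostat v2 adaptive sampling.
output = llm(
    "Continue this story: The lighthouse keeper heard a knock.",  # illustrative prompt
    max_tokens=256,
    mirostat_mode=2,    # 0 = none, 1 = mirostat, 2 = mirostat_v2
    mirostat_tau=5.0,   # target surprise/perplexity
    mirostat_eta=0.1,   # learning rate for the adjustment
)
```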
The text output parameter provides the generated text based on the input parameters and context. This is the primary output of the node, containing the model's response or generated content.
The conditioning output parameter contains the conditioning information used by the model to generate the text. This can include context, prompts, and other relevant data that influenced the output.
Adjust the temperature parameter to control the randomness of the output. Higher values can make the text more creative, while lower values make it more focused.
Use the stop parameter to control where the model should stop generating text, which can help in creating more concise outputs.
Experiment with top_p and top_k to balance the diversity and coherence of the generated text, especially for creative writing tasks.
If you run into GPU memory problems, reduce n_gpu_layers or use a model with fewer parameters.
If the split_mode parameter has an invalid value, ensure split_mode is set to one of the following: LLAMA_SPLIT_MODE_NONE, LLAMA_SPLIT_MODE_LAYER, or LLAMA_SPLIT_MODE_ROW.
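To validate a split-mode name before loading the model, one hypothetical helper (the constant names are module-level values in llama-cpp-python):

```python
import llama_cpp

# Valid split modes and their integer constants in llama-cpp-python.
SPLIT_MODES = {
    "LLAMA_SPLIT_MODE_NONE": llama_cpp.LLAMA_SPLIT_MODE_NONE,
    "LLAMA_SPLIT_MODE_LAYER": llama_cpp.LLAMA_SPLIT_MODE_LAYER,
    "LLAMA_SPLIT_MODE_ROW": llama_cpp.LLAMA_SPLIT_MODE_ROW,
}

def resolve_split_mode(name: str) -> int:
    """Map a split-mode name to its constant, raising a clear error for typos."""
    try:
        return SPLIT_MODES[name]
    except KeyError:
        raise ValueError(f"Invalid split_mode '{name}'; expected one of {list(SPLIT_MODES)}")
```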