ComfyUI-DataSet Introduction
ComfyUI-DataSet is an extension designed to assist AI artists and model trainers in managing and manipulating datasets. This extension provides a variety of nodes that help you visualize, organize, and process your data efficiently. Whether you are preparing data for training models or analyzing existing datasets, ComfyUI-DataSet offers tools to streamline these tasks, making it easier to handle large volumes of data and extract meaningful insights.
How ComfyUI-DataSet Works
ComfyUI-DataSet operates through a series of nodes that you can integrate into your workflow. Each node performs a specific function, such as visualizing data, copying files, or extracting specific information from text files. By connecting these nodes, you can create complex data processing pipelines tailored to your needs. Think of it as building blocks that you can combine in various ways to achieve your desired outcome.
For example, you might use a node to load text files, another to analyze the frequency of words, and a third to visualize this data in a graph. This modular approach allows you to customize your data processing workflow without needing to write any code.
ComfyUI-DataSet Features
DataSet_Visualizer
The DataSet_Visualizer
node helps you visualize dataset captions by generating graphs. It includes:
- Word Cloud: Shows token frequency with different font sizes.
- Network Graph: Illustrates relationships between tokens.
- Frequency Graph: Displays how often each token appears.
- TextFileContents: The text to be processed.
- Seperator: Delimiter used to separate tokens (comma, colon, space, pipe).
- WordCloudTop: Number of top tokens for the Word Cloud.
- NetworkGraphTop: Number of top tokens for the Network Graph.
- FrequencyGraphTop: Number of top tokens for the Frequency Graph.
Outputs
- GraphsPaths: Paths to the generated visualizations.
- GraphsImages: The generated images for the visualizations.
DataSet_CopyFiles
The DataSet_CopyFiles
node copies files from a source folder to a destination folder using different modes:
- BlindCopy: Copies all files.
- CopyByDestinationFiles: Copies files only if a matching file exists in the destination.
- source_folder: Path to the source folder.
- destination_folder: Path to the destination folder.
- copy_mode: Mode of copying (BlindCopy, CopyByDestinationFiles).
DataSet_TriggerWords
The DataSet_TriggerWords
node extracts trigger words from captions, identifying tokens that contain both letters and numbers.
- TextFileContents: The text to be processed.
- search: Mode of extraction (trigger_word_only, trigger_word_phrase).
Outputs
- Words: The extracted trigger words or phrases.
DataSet_TextFilesLoadFromList
This node processes basic attributes of text files, such as filenames and contents, from a list of file paths.
- TextFilePathsList: List of file paths to the text files.
Outputs
- TextFileNames: Names of the text files.
- TextFileNamesWithoutExtension: Names without extensions.
- TextFilePaths: File paths.
- TextFileContents: Contents of the text files.
DataSet_TextFilesLoad
Similar to the above, but uses a directory path to load text files.
- directory: Directory path where the text files are located.
Outputs
- TextFileNames: Names of the text files.
- TextFileNamesWithoutExtension: Names without extensions.
- TextFilePaths: File paths.
- TextFileContents: Contents of the text files.
DataSet_TextFilesSave
This node saves text file contents to a specified directory with various modes like overwriting, merging, and creating new files.
- TextFileNames: Names of the text files.
- TextFileContents: Contents of the text files.
- destination: Directory path for saving.
- save_mode: Mode of saving (Overwrite, Merge, SaveNew, MergeAndSaveNew).
DataSet_FindAndReplace
The DataSet_FindAndReplace
node finds and replaces text patterns within caption text files.
- TextFileContents: The text to be processed.
- SearchFor: The text pattern to search for.
- ReplaceWith: The replacement text.
Outputs
- TextFileContents: The modified text contents.
DataSet_PathSelector
This node identifies images in a sub-dataset that are missing caption text files from a larger repository.
- search_in_directory: Directory with missing pairings.
- search_for_extensions: Extensions of the orphaned files.
- select_from_directory: Repository directory with complete pairings.
- select_extensions: Extensions of the required files.
Outputs
- SelectedNamesWithExtension: Names with extensions.
- SelectedNamesWithoutExtension: Names without extensions.
- SelectedPaths: Full paths of the required files.
DataSet_ConceptManager
The DataSet_ConceptManager
node adds or removes tokens within caption files and places them at designated positions.
- TextFileContents: The text to be processed.
- Mode: Mode of operation (add, remove).
- Concepts: Concepts to add or remove.
Outputs
- TextFileContents: The modified text contents.
DataSet_OpenAIChat
This node uses the OpenAI GPT chat to help generate prompts.
- model: OpenAI model to use.
- api_url: API URL.
- api_key: API key.
- prompt: The query chat.
- token_length: Maximum number of tokens.
Outputs
- STRING: The generated prompt.
DataSet_LoadImage
Provides essential image file attributes for captioning with the DataSet_OpenAIChat
node.
- image: Name of the image file.
Outputs
- IMAGE: The image file.
- MASK: The mask associated with the image.
- STRING: Name of the image file.
- STRING: Name without extension.
- STRING: Full path of the image file.
- STRING: Directory path of the image file.
DataSet_SaveImage
Batch saves images to a specified directory with optional PNG metadata.
- Images: List of images to save.
- ImageFilePrefix: Prefix for the saved image filenames.
- destination: Directory path for saving.
DataSet_OpenAIChatImage
Uses the OpenAI GPTo multi-modal vision API to caption images.
- image: Image to be processed.
- image_detail: Detail level of the image.
- prompt: Text prompt for the AI model.
- model: OpenAI model to use.
- api_url: API URL.
- api_key: API key.
- token_length: Maximum token length.
Outputs
- STRING: Generated captions.
DataSet_OpenAIChatImageBatch
Extends the functionality of DataSet_OpenAIChatImage
to process batches of images.
- images: List of images to be processed.
- image_detail: Detail level of the images.
- prompt: Text prompt for the AI model.
- model: OpenAI model to use.
- api_url: API URL.
- api_key: API key.
- token_length: Maximum token length.
Outputs
- STRING: List of generated captions.
Troubleshooting ComfyUI-DataSet
Common Issues and Solutions
- Node Not Working as Expected:
- Ensure all required inputs are provided.
- Check for any error messages in the console.
- Restart ComfyUI and try again.
- File Not Found Errors:
- Verify the file paths are correct.
- Ensure the files exist in the specified directories.
- API Key Issues:
- Double-check the API key for OpenAI nodes.
- Ensure the API key has the necessary permissions.
Frequently Asked Questions
Q: How do I update ComfyUI-DataSet?
A: Follow the installation instructions to update the extension. Restart ComfyUI after updating.
Q: Can I use ComfyUI-DataSet with other extensions?
A: Yes, ComfyUI-DataSet is designed to work alongside other extensions. Ensure there are no conflicts between nodes.
Learn More about ComfyUI-DataSet
For additional resources, tutorials, and community support, visit the following links: