ComfyUI Node: Text Chunker

Class Name: TextChunker
Category: document_processing
Author: Indra's Mirror (Account age: 37 days)
Extension: ComfyUI-Documents
Last Updated: 2024-07-11
GitHub Stars: 0.02K

How to Install ComfyUI-Documents

Install this extension through the ComfyUI Manager:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-Documents in the search bar and install the matching result.
After installation, click the Restart button to restart ComfyUI. Then, manually refresh your browser to clear the cache and access the updated list of nodes.

Text Chunker Description

Breaks text into manageable chunks for efficient processing, offering flexibility in chunking methods and word boundary preservation.

Text Chunker:

The TextChunker node splits a large block of text into smaller chunks, which helps when a document is too long to process in a single pass. You can chunk by words or by characters, and when chunking by characters you can ask the node to respect word boundaries so that no word is split across two chunks. Typical uses include natural language processing, document analysis, and preparing data for machine learning models.

Text Chunker Input Parameters:

text

This parameter accepts the text that you want to chunk. It should be a string and can be multiline. The text is the primary input that will be processed and divided into smaller segments.

chunk_size

This parameter determines the size of each chunk. When chunking by words, it specifies the number of words per chunk. When chunking by characters, it specifies the number of characters per chunk. The default value is 1000, with a minimum value of 1 and a maximum value of 10000. Adjusting this parameter allows you to control the granularity of the chunks.

chunk_method

This parameter allows you to choose the method of chunking. It can be set to either "words" or "characters". Selecting "words" will divide the text based on word count, while "characters" will divide the text based on character count. This choice affects how the text is segmented and can be tailored to your specific needs.
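
The difference between the two methods can be illustrated with a minimal sketch. The helper name chunk_text is hypothetical and this is not the node's actual source. It assumes that words are separated by whitespace and that word chunks are rejoined with single spaces.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_method: str = "words") -> list[str]:
    """Hypothetical helper illustrating the two chunking methods.

    Assumption: this is NOT the TextChunker source, only a sketch of the
    behaviour described for chunk_size and chunk_method.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    if chunk_method == "words":
        # Split on any whitespace, then group chunk_size words per chunk.
        words = text.split()
        return [" ".join(words[i:i + chunk_size])
                for i in range(0, len(words), chunk_size)]

    if chunk_method == "characters":
        # Fixed-width slices; a word may be cut in the middle.
        return [text[i:i + chunk_size]
                for i in range(0, len(text), chunk_size)]

    raise ValueError(f"unknown chunk_method: {chunk_method!r}")


sample = "the quick brown fox jumps over the lazy dog"
print(chunk_text(sample, 4, "words"))
# ['the quick brown fox', 'jumps over the lazy', 'dog']
print(chunk_text(sample, 10, "characters"))
# ['the quick ', 'brown fox ', 'jumps over', ' the lazy ', 'dog']
```

Note how the character method keeps the original spacing and can produce chunks that start or end with a space, while the word method normalizes whitespace in this sketch.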

respect_word_boundaries

This boolean parameter determines whether to respect word boundaries when chunking by characters. If set to true, the node will ensure that chunks do not split words, providing cleaner and more readable segments. The default value is true. This parameter is particularly useful when you want to maintain the integrity of words within each chunk.
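
One way to respect word boundaries is to pack whole words greedily into each chunk until the next word would exceed the character limit. The sketch below is an assumption about how such behaviour could be implemented, not the node's confirmed algorithm. The function name chunk_characters is hypothetical, and the choice to let a single word longer than chunk_size form its own oversized chunk is also an assumption.

```python
def chunk_characters(text: str, chunk_size: int,
                     respect_word_boundaries: bool = True) -> list[str]:
    """Hypothetical sketch of character chunking with optional word safety."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")

    if not respect_word_boundaries:
        # Plain fixed-width slicing; words may be split.
        return [text[i:i + chunk_size]
                for i in range(0, len(text), chunk_size)]

    chunks: list[str] = []
    current = ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= chunk_size or not current:
            # The word fits, or the chunk is empty and must take the word
            # even if that word alone exceeds chunk_size (assumed policy).
            current = candidate
        else:
            # The word does not fit: close this chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks


sample = "the quick brown fox jumps over the lazy dog"
print(chunk_characters(sample, 10))
# ['the quick', 'brown fox', 'jumps over', 'the lazy', 'dog']
```

With respect_word_boundaries set to False the same call would cut the text every 10 characters regardless of where words begin or end.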

Text Chunker Output Parameters:

text_chunks

This output parameter returns the resulting chunks of text as a list of strings. Each string in the list represents a chunk of the original text, divided according to the specified chunk size and method. This output allows you to easily access and process each segment individually.

Text Chunker Usage Tips:

  • To ensure that chunks are readable and do not split words, set the respect_word_boundaries parameter to true when chunking by characters.
  • Adjust the chunk_size parameter based on the length of your text and the desired granularity of the chunks. For shorter texts, a smaller chunk size may be more appropriate.
  • Use the chunk_method parameter to choose the most suitable chunking method for your task. Chunking by words is often more natural for text analysis, while chunking by characters can be useful for specific technical applications.

Text Chunker Common Errors and Solutions:

No text provided

  • Explanation: This error occurs when the input text is empty or not provided.
  • Solution: Ensure that you provide a valid text input for the text parameter.

Selected index is out of range. Available chunks: <number_of_chunks>

  • Explanation: This error occurs when the index selected for chunk routing falls outside the range of available chunks. Note that selected_index is not among the Text Chunker inputs listed above, so the error likely comes from a downstream node that picks one chunk from text_chunks.
  • Solution: Set the selected_index parameter to a value between 0 and the number of available chunks minus one.
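
The valid index range can be made concrete with a short sketch. The function select_chunk is hypothetical and only mirrors the error message quoted above. It assumes zero-based indexing, as the solution text implies.

```python
def select_chunk(text_chunks: list[str], selected_index: int) -> str:
    """Hypothetical helper: return one chunk or raise the documented error."""
    # Valid indices run from 0 to len(text_chunks) - 1 inclusive.
    if not 0 <= selected_index < len(text_chunks):
        raise IndexError(
            "Selected index is out of range. "
            f"Available chunks: {len(text_chunks)}"
        )
    return text_chunks[selected_index]


chunks = ["first chunk", "second chunk", "third chunk"]
print(select_chunk(chunks, 2))  # last valid index for three chunks
try:
    select_chunk(chunks, 3)
except IndexError as err:
    print(err)
```

With three chunks, index 2 is the last valid value and index 3 triggers the error.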

© Copyright 2024 RunComfy. All Rights Reserved.