Specialized node for processing SRT subtitles for AI artists, enabling semantic analysis and audio synthesis.
FishSpeech_INFER_SRT is a specialized node designed to process subtitle files (SRT) and convert them into a format suitable for semantic analysis and audio synthesis. This node is particularly useful for AI artists who work with audio-visual content and need to generate or manipulate speech based on textual subtitles. By leveraging advanced models and tokenizers, FishSpeech_INFER_SRT ensures that the text from subtitle files is accurately parsed and prepared for further processing, enabling seamless integration with other nodes in the FishSpeech suite. The node automates the downloading and loading of necessary model weights and tokenizers, making it user-friendly and efficient for creative projects.
This parameter expects an SRT file containing the text that needs to be processed. The text from this file will be read and used for semantic analysis and audio synthesis. The input should be a valid SRT file path.
This parameter requires an audio file that serves as a reference for the speech synthesis process. The audio file should be in a format compatible with the node, such as WAV or MP3. The reference audio supplies the voice characteristics that the synthesized speech follows, keeping the output consistent in tone and quality.
This parameter expects an SRT file containing the prompt text that will guide the speech synthesis. The text from this file will be read and used to generate the semantic representation needed for audio synthesis. The input should be a valid SRT file path.
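Before synthesis, the dialogue text has to be pulled out of the SRT entries. The sketch below shows one way that extraction could look; the helper name and the parsing rules are assumptions for illustration, not the node's actual implementation.

```python
import re

def read_srt_text(srt_path: str) -> list[str]:
    """Return the dialogue lines from an SRT file, dropping indices and timestamps."""
    with open(srt_path, "r", encoding="utf-8") as f:
        content = f.read()

    lines = []
    # SRT blocks are separated by blank lines: index, timestamp, then one or more text lines.
    for block in re.split(r"\n\s*\n", content.strip()):
        # Skip the numeric index and the "00:00:01,000 --> 00:00:04,000" timestamp line.
        text_lines = [
            ln.strip() for ln in block.splitlines()
            if ln.strip() and not ln.strip().isdigit() and "-->" not in ln
        ]
        if text_lines:
            lines.append(" ".join(text_lines))
    return lines

# Example: texts = read_srt_text("subtitles.srt")
```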
This boolean parameter indicates whether the input text involves multiple speakers. If set to True, the node processes the text so that the different speakers are distinguished during synthesis. The default value is False.
This parameter specifies the type of text-to-semantic model to be used. The available options are "medium" and "large," with "medium" being the default. This choice affects the complexity and accuracy of the semantic analysis.
This string parameter is your Hugging Face access token, used to download the necessary model weights from the Hugging Face Hub. Ensure you provide a valid token so the node can fetch the required resources.
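For reference, downloading weights with a Hugging Face token typically goes through the huggingface_hub library, as in the hedged sketch below; the repository id is a placeholder, not necessarily the one this node fetches.

```python
from huggingface_hub import snapshot_download

# Placeholder repository id; the node resolves the actual repo for the chosen model size.
local_dir = snapshot_download(
    repo_id="fishaudio/fish-speech-1",  # assumed/example repo id
    token="<hf_token>",                 # the Hugging Face access token parameter
)
print("weights cached at", local_dir)
```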
This integer parameter defines the number of samples to generate. The default value is 1. Adjusting this value allows you to control the number of output variations.
This integer parameter sets the maximum number of new tokens to generate. The default value is 0, which is typically treated as "no explicit cap", so generation is limited only by the model's maximum sequence length rather than a fixed number of new tokens.
This float parameter controls nucleus (top-p) sampling, which affects the diversity of the generated text. The default value is 0.7. Adjusting this value can help balance creativity against coherence.
This float parameter applies a penalty to repeated tokens, helping to reduce redundancy in the generated text. The default value is 1.5.
This float parameter controls the randomness of the text generation process. A higher value results in more random outputs, while a lower value makes the output more deterministic. The default value is 0.7.
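The last three parameters interact at every decoding step. The sketch below is a generic illustration of repetition penalty, temperature, and nucleus (top-p) sampling, not the node's exact decoding loop.

```python
import torch

def sample_next_token(logits: torch.Tensor, previous_tokens: torch.Tensor,
                      temperature: float = 0.7, top_p: float = 0.7,
                      repetition_penalty: float = 1.5) -> int:
    """Pick one token id from a [vocab_size] logits vector."""
    logits = logits.clone()
    # Repetition penalty: push already-emitted tokens away from being re-selected.
    prev = logits[previous_tokens]
    logits[previous_tokens] = torch.where(prev < 0, prev * repetition_penalty, prev / repetition_penalty)
    # Temperature: values below 1 sharpen the distribution, values above 1 flatten it.
    probs = torch.softmax(logits / temperature, dim=-1)
    # Nucleus (top-p) sampling: keep the smallest set of tokens whose mass reaches top_p.
    sorted_probs, sorted_ids = torch.sort(probs, descending=True)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    keep = cumulative - sorted_probs < top_p  # always keeps at least the top token
    sorted_probs[~keep] = 0.0
    sorted_probs /= sorted_probs.sum()
    choice = torch.multinomial(sorted_probs, num_samples=1)
    return int(sorted_ids[choice])
```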
This boolean parameter indicates whether to compile the model for optimized performance. The default value is False.
This integer parameter sets the random seed for reproducibility. The default value is 42.
This boolean parameter indicates whether to use half-precision for the model, which can reduce memory usage and speed up processing. The default value is False.
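The compile, seed, and half-precision switches correspond to standard PyTorch calls, roughly as sketched below; this is an illustrative mapping, not the node's internal setup code.

```python
import torch

def configure_model(model: torch.nn.Module, use_half: bool, use_compile: bool, seed: int):
    """Apply the reproducibility and performance switches to a loaded model."""
    torch.manual_seed(seed)           # fixed seed -> reproducible sampling
    if use_half:
        model = model.half()          # fp16 weights: less memory, often faster on GPU
    if use_compile:
        model = torch.compile(model)  # requires PyTorch 2.x; optimizes the forward pass
    return model
```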
This boolean parameter indicates whether to use an iterative prompting strategy. The default value is True.
This integer parameter sets the maximum length of the input text. The default value is 2048.
This integer parameter defines the length of text chunks to process at a time. The default value is 30.
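Long inputs are processed in pieces. The sketch below shows one plausible way the chunk-length and maximum-length settings could be applied; the word-boundary splitting rule is an assumption for illustration.

```python
def split_into_chunks(text: str, chunk_length: int = 30, max_length: int = 2048) -> list[str]:
    """Split text into word-boundary chunks of roughly chunk_length characters,
    truncating the overall input at max_length characters."""
    text = text[:max_length]
    chunks, current = [], ""
    for word in text.split():
        if current and len(current) + 1 + len(word) > chunk_length:
            chunks.append(current)
            current = word
        else:
            current = f"{current} {word}".strip()
    if current:
        chunks.append(current)
    return chunks

# Example: split_into_chunks("one two three four five six", chunk_length=10)
# -> ['one two', 'three four', 'five six']
```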
The output is an audio file generated based on the input text and reference audio. This audio file represents the synthesized speech, which can be used in various creative projects. The output format is typically WAV or MP3, depending on the configuration.
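Downstream of the node, the synthesized waveform can be saved with a standard audio library. The minimal sketch below uses soundfile; the variable names and the 44.1 kHz sample rate are assumptions.

```python
import numpy as np
import soundfile as sf

# `waveform` stands in for the audio returned by the node (mono, float32 samples).
waveform = np.zeros(44100, dtype=np.float32)  # placeholder: one second of silence
sf.write("synthesized_speech.wav", waveform, samplerate=44100)
```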
Experiment with the text2semantic_type parameter to find the right balance between model complexity and performance for your specific project.
Adjust the top_p and temperature parameters to fine-tune the creativity and coherence of the generated text.
Common issues relate to an invalid <file_path> for the SRT inputs, a missing or invalid "<hf_token>" for downloading model weights, and input text that exceeds the max_length parameter; in the last case, shorten the input or adjust the max_length parameter value.