ComfyUI Node: FishSpeech Voice Clone

Class Name: FishSpeech_INFER_SRT
Category: AIFSH_FishSpeech
Author: AIFSH (Account age: 261 days)
Extension: ComfyUI-FishSpeech
Last Updated: 2024-05-23
GitHub Stars: 0.01K

How to Install ComfyUI-FishSpeech

Install this extension through the ComfyUI Manager by searching for ComfyUI-FishSpeech:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter ComfyUI-FishSpeech in the search bar and install the extension.
After installation, click the Restart button to restart ComfyUI. Then manually refresh your browser to clear the cache and load the updated list of nodes.

FishSpeech Voice Clone Description

Node that processes SRT subtitle text and synthesizes speech from it, using a reference audio clip to guide the voice.

FishSpeech Voice Clone:

FishSpeech_INFER_SRT reads the text of an SRT subtitle file and synthesizes speech for it, using a reference audio clip and its prompt text to guide the voice. It is aimed at AI artists who work with audio-visual content and need to generate or manipulate speech from subtitles. The node parses the subtitle text, converts it to a semantic representation with a text-to-semantic model, and then produces audio from that representation. It also downloads and loads the required model weights and tokenizers automatically, and it integrates with the other nodes in the FishSpeech suite.

FishSpeech Voice Clone Input Parameters:

text

This parameter expects an SRT file containing the text that needs to be processed. The text from this file will be read and used for semantic analysis and audio synthesis. The input should be a valid SRT file path.
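
To make the expected input concrete, the following is a minimal sketch of how SRT content can be parsed into cues with the Python standard library. It is not the node's own parser: the names `Cue`, `parse_srt`, and `SAMPLE` are hypothetical, and the sketch assumes the common SRT layout of an index line, a timestamp line, and one or more text lines per cue.

```python
import re
from dataclasses import dataclass


@dataclass
class Cue:
    index: int
    start: float  # seconds
    end: float    # seconds
    text: str


# Matches SRT timestamps such as 00:01:02,345 (a dot is also accepted).
_TIME = re.compile(r"(\d+):(\d{2}):(\d{2})[,.](\d{3})")


def _to_seconds(stamp: str) -> float:
    match = _TIME.fullmatch(stamp.strip())
    if match is None:
        raise ValueError(f"bad SRT timestamp: {stamp!r}")
    hours, minutes, seconds, millis = (int(g) for g in match.groups())
    return hours * 3600 + minutes * 60 + seconds + millis / 1000


def parse_srt(content: str) -> list[Cue]:
    """Parse SRT text into cues (hypothetical helper, not the node's code)."""
    cleaned = content.lstrip("\ufeff").replace("\r\n", "\n").strip()
    cues = []
    # Cues are separated by one or more blank lines.
    for block in re.split(r"\n\s*\n", cleaned):
        lines = block.split("\n")
        if len(lines) < 3:
            continue  # skip cues that carry no text
        start, _, end = lines[1].partition("-->")
        text = " ".join(line.strip() for line in lines[2:])
        cues.append(Cue(int(lines[0]), _to_seconds(start), _to_seconds(end), text))
    return cues


SAMPLE = """1
00:00:00,000 --> 00:00:02,500
Hello there.

2
00:00:03,000 --> 00:00:05,250
This line will be spoken
in the cloned voice.
"""

cues = parse_srt(SAMPLE)
for cue in cues:
    print(cue.index, cue.start, cue.end, cue.text)
```

Multi-line cue text is joined with spaces here so that each cue yields one string to synthesize; a real pipeline might prefer to keep the line breaks.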

prompt_audio

This parameter requires an audio file that serves as the voice reference for speech synthesis. The file should be in a format the node supports, such as WAV or MP3. The synthesized speech is modeled on this reference, so a clean recording helps keep the output voice consistent.

prompt_text

This parameter expects an SRT file containing the prompt text that guides speech synthesis. The text is read from this file and used to build the semantic representation needed for audio synthesis. In Fish Speech, the prompt text is normally the transcript of the reference audio, so this file should match what is spoken in prompt_audio. The input should be a valid SRT file path.

if_mutiple_speaker

This boolean parameter indicates whether the input text involves multiple speakers. When set to True, the node distinguishes between speakers while processing the text. The default value is False. Note that the parameter name is spelled if_mutiple_speaker in the node, so use it exactly as written.

text2semantic_type

This parameter specifies the type of text-to-semantic model to be used. The available options are "medium" and "large," with "medium" being the default. This choice affects the complexity and accuracy of the semantic analysis.

hf_token

This string parameter is your token for downloading the necessary model weights from the Hugging Face hub. Ensure you provide a valid token to enable the node to fetch the required resources.

num_samples

This integer parameter defines the number of samples to generate. The default value is 1. Adjusting this value allows you to control the number of output variations.

max_new_tokens

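This integer parameter sets the maximum number of new tokens to generate. The default value is 0. In the upstream Fish Speech generation script, 0 means that no explicit cap is applied, so generation runs until the model emits an end token or reaches its maximum sequence length; this node appears to follow the same convention.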

top_p

This float parameter controls nucleus sampling: at each step, sampling is restricted to the smallest set of most likely tokens whose cumulative probability reaches top_p. The default value is 0.7. Lower values make the output more conservative, and higher values make it more varied.
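
As an illustration of nucleus sampling, here is a small sketch of the filtering step in plain Python. The function name `top_p_filter` and the toy distribution are hypothetical, and implementations differ in exactly where they cut off the tail, so treat this as one reasonable variant rather than the node's actual code.

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the most likely tokens until their total mass reaches top_p,
    then renormalise the kept probabilities (illustrative variant)."""
    kept: dict[str, float] = {}
    total = 0.0
    for token, p in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = p
        total += p
        if total >= top_p:
            break
    return {token: p / total for token, p in kept.items()}


# Toy distribution over four hypothetical tokens.
probs = {"a": 0.5, "b": 0.3, "c": 0.15, "d": 0.05}
filtered = top_p_filter(probs, 0.7)
print(filtered)
```

With top_p set to 0.7, only the two most likely tokens survive in this example, and the sampler then chooses between them.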

repetition_penalty

This float parameter applies a penalty to tokens that have already been generated, which reduces repetition in the generated sequence. Values above 1.0 discourage repetition, and 1.0 applies no penalty. The default value is 1.5.
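
One widely used way to apply such a penalty is to shrink the logit of every token that has already appeared: positive logits are divided by the penalty and negative logits are multiplied by it. The sketch below shows that scheme in plain Python. The function name and the example values are hypothetical, and whether this node uses exactly this rule is an assumption.

```python
def apply_repetition_penalty(
    logits: list[float], previous_tokens: list[int], penalty: float
) -> list[float]:
    """Make already-seen tokens less likely (illustrative scheme)."""
    adjusted = list(logits)
    for token_id in set(previous_tokens):
        score = adjusted[token_id]
        # Dividing a positive score or multiplying a negative score by a
        # penalty above 1.0 both push the score down.
        adjusted[token_id] = score * penalty if score < 0 else score / penalty
    return adjusted


logits = [3.0, -1.0, 0.5]
penalised = apply_repetition_penalty(logits, previous_tokens=[0, 1, 0], penalty=1.5)
print(penalised)
```

Token 2 has not been generated before, so its logit is left unchanged.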

temperature

This float parameter controls the randomness of sampling during generation. A higher value produces more random output, while a lower value makes the output more deterministic. The default value is 0.7.
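
The effect of temperature can be seen by dividing the logits by the temperature before the softmax. The following sketch uses only the standard library; the function name and the example logits are hypothetical.

```python
import math


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Convert logits to probabilities after scaling by 1 / temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]
sharp = softmax_with_temperature(logits, 0.7)  # the node's default value
flat = softmax_with_temperature(logits, 1.5)
print(sharp)
print(flat)
```

At the lower temperature the most likely token receives a larger share of the probability mass, which is why lower values give more deterministic output.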

compile

This boolean parameter indicates whether to compile the model for optimized performance. The default value is False.

seed

This integer parameter sets the random seed for reproducibility. The default value is 42.
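
The principle behind a fixed seed can be sketched with Python's standard random module: the same seed yields the same sequence of draws. The node most likely seeds its deep learning framework instead, so this is an analogy under that assumption, and `draw_tokens` is a hypothetical helper.

```python
import random


def draw_tokens(seed: int, count: int = 5, vocab_size: int = 1000) -> list[int]:
    """Draw pseudo-random token ids from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(vocab_size) for _ in range(count)]


first = draw_tokens(42)
second = draw_tokens(42)
other = draw_tokens(43)
print(first == second)  # same seed, same sequence
print(first, other)     # a different seed will in general give different draws
```

Keeping the seed and all other parameters fixed is what makes a run repeatable; changing the seed is a simple way to get a different variation.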

half

This boolean parameter indicates whether to use half-precision for the model, which can reduce memory usage and speed up processing. The default value is False.

iterative_prompt

This boolean parameter indicates whether to use an iterative prompting strategy. The default value is True.

max_length

This integer parameter sets the maximum length of the input text. The default value is 2048.

chunk_length

This integer parameter defines the length of text chunks to process at a time. The default value is 30.
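
The idea of processing text in chunks can be sketched as follows. The page does not state the unit of chunk_length, so this example assumes whole words purely for illustration; the node may count characters, bytes, or tokens instead. The function name `chunk_words` is hypothetical.

```python
def chunk_words(text: str, chunk_length: int) -> list[str]:
    """Split text into pieces of at most `chunk_length` words (assumed unit)."""
    if chunk_length < 1:
        raise ValueError("chunk_length must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_length])
        for i in range(0, len(words), chunk_length)
    ]


chunks = chunk_words("one two three four five", 2)
print(chunks)
```

Smaller chunks keep each synthesis step short, at the cost of more steps for the same text.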

FishSpeech Voice Clone Output Parameters:

AUDIO

The output is the synthesized speech, generated from the input text and guided by the reference audio. It can be passed to downstream nodes or saved for use in your project. The output format is typically WAV or MP3, depending on the configuration.

FishSpeech Voice Clone Usage Tips:

  • Ensure that your SRT files are properly formatted and contain accurate timestamps to achieve the best results.
  • Use high-quality reference audio to maintain the consistency and quality of the synthesized speech.
  • Experiment with the text2semantic_type parameter to find the right balance between model complexity and performance for your specific project.
  • Adjust the top_p and temperature parameters to balance variety against stability in the generated speech.
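
A quick timing check can catch malformed subtitles before they reach the node. The sketch below takes cue times as (start, end) pairs in seconds and reports cues that end before they start or that overlap the previous cue. Treating overlap as a problem is an assumption made for this example, and `find_timing_problems` is a hypothetical helper.

```python
def find_timing_problems(cues: list[tuple[float, float]]) -> list[str]:
    """Return human-readable descriptions of suspicious cue timings."""
    problems = []
    previous_end = 0.0
    for number, (start, end) in enumerate(cues, start=1):
        if end <= start:
            problems.append(f"cue {number}: end {end} is not after start {start}")
        if start < previous_end:
            problems.append(
                f"cue {number}: starts at {start}, before the previous cue ends at {previous_end}"
            )
        previous_end = max(previous_end, end)
    return problems


good = find_timing_problems([(0.0, 2.5), (3.0, 5.0)])
bad = find_timing_problems([(0.0, 2.5), (2.0, 1.0)])
print(good)
print(bad)
```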

FishSpeech Voice Clone Common Errors and Solutions:

"File not found: <file_path>"

  • Explanation: The specified file path for the SRT or audio file is incorrect or the file does not exist.
  • Solution: Verify that the file path is correct and that the file exists at the specified location.
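
Checking the paths before running the workflow is one way to avoid this error. The following sketch uses pathlib from the standard library; `missing_inputs` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path


def missing_inputs(*paths: str) -> list[str]:
    """Return the given paths that do not point to an existing file."""
    return [p for p in paths if not Path(p).is_file()]


# Demonstration: one temporary file that exists and one path that does not.
with tempfile.NamedTemporaryFile(suffix=".srt") as handle:
    missing = handle.name + ".missing.wav"
    report = missing_inputs(handle.name, missing)

print(report)
```

An empty list means every input file was found.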

"Invalid token: <hf_token>"

  • Explanation: The provided Hugging Face token is invalid or expired.
  • Solution: Ensure that you provide a valid and active Hugging Face token.

"Model weights not found"

  • Explanation: The necessary model weights could not be downloaded or loaded.
  • Solution: Check your internet connection and ensure that the Hugging Face token is valid. Verify that the specified model weights are available in the repository.

"Unsupported audio format"

  • Explanation: The provided reference audio file is in an unsupported format.
  • Solution: Convert the audio file to a supported format such as WAV or MP3 and try again.

"Text length exceeds maximum limit"

  • Explanation: The input text exceeds the maximum length specified by the max_length parameter.
  • Solution: Reduce the length of the input text or increase the max_length parameter value.

© Copyright 2024 RunComfy. All Rights Reserved.
