Dissect audio files into elements for reactive visualizations and manipulation, leveraging advanced separation models.
The Audio Analysis node separates an audio file into its constituent elements, such as drums, vocals, bass, and others, and generates reactive weights and a visual graph from the selected element. It is aimed at AI artists who want to build audio-reactive visualizations or work with a specific part of a track. The node relies on an audio separation model to isolate each component, so you can focus the analysis on one element or on the full mix, and it exposes threshold and multiply controls so you can adjust the resulting weights manually.
The audio separation model parameter takes a pre-loaded audio separation model, which the node requires in order to run. The model isolates the individual components of the input audio, such as drums, vocals, and bass. The quality and accuracy of the separation depend on the model used.
The batch size is the number of frames that the audio weights are associated with during processing. It must be an integer. It affects the granularity of the analysis: larger batch sizes can lead to a less detailed weight distribution, so choose a value that balances the level of detail you need against processing cost.
Frames per second (fps) sets the rate at which audio weights are produced. It is a float and controls the temporal resolution of the analysis: higher values update the weights more often. Choose it according to how smooth and responsive the audio-reactive elements need to be.
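To illustrate how batch size and fps interact, the following is a minimal sketch that slices a mono waveform into one window per frame and uses the RMS level of each window as a raw weight. It is an assumption about how per-frame weights can be derived, not the node's actual implementation; `frame_weights` and its arguments are hypothetical names.

```python
import math

def frame_weights(samples, sample_rate, batch_size, fps):
    """Return one RMS value per frame for a mono list of samples.

    Hypothetical helper: each frame covers sample_rate / fps samples.
    """
    samples_per_frame = sample_rate / fps
    weights = []
    for i in range(batch_size):
        start = int(round(i * samples_per_frame))
        end = int(round((i + 1) * samples_per_frame))
        window = samples[start:end]
        if not window:
            # The audio is shorter than batch_size / fps seconds.
            weights.append(0.0)
            continue
        rms = math.sqrt(sum(s * s for s in window) / len(window))
        weights.append(rms)
    return weights

# One second of a 440 Hz tone at 8 kHz, followed by one second of silence.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
audio = tone + [0.0] * sample_rate

weights = frame_weights(audio, sample_rate, batch_size=6, fps=2.0)
```

In this sketch the six frames span three seconds at 2 fps, so the first two weights reflect the tone, the next two reflect silence, and the last two fall past the end of the two-second clip and default to zero.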
The audio parameter is the input audio file that you wish to analyze. It must contain a waveform and a sample rate, as these are necessary for the node to process the audio correctly. The quality and format of the input audio can influence the results, so it is important to ensure that the audio is properly prepared before analysis.
The analysis mode parameter selects which audio component to analyze: "Drums Only," "Vocals Only," "Bass Only," "Others Audio," or "Full Audio." The chosen mode determines which elements of the audio are isolated and processed, and therefore what the weights react to.
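The mode selection can be pictured as a lookup from the mode label to the stems that are kept. The sketch below assumes a four-stem layout (drums, vocals, bass, other) and models "Full Audio" as the sum of all stems; the stem names, `MODE_TO_STEMS`, and `select_stems` are hypothetical, and the node itself may handle the full mix differently, for example by passing the input through unseparated.

```python
# Assumed stem names for a four-stem separation model.
STEMS = ("drums", "vocals", "bass", "other")

# Hypothetical mapping from the analysis mode label to the stems that are kept.
MODE_TO_STEMS = {
    "Drums Only": ("drums",),
    "Vocals Only": ("vocals",),
    "Bass Only": ("bass",),
    "Others Audio": ("other",),
    "Full Audio": STEMS,
}

def select_stems(separated, mode):
    """Mix the stems selected by `mode` into one list of samples."""
    if mode not in MODE_TO_STEMS:
        raise ValueError(f"unknown analysis mode: {mode!r}")
    chosen = [separated[name] for name in MODE_TO_STEMS[mode]]
    # Sum the chosen stems sample by sample.
    return [sum(values) for values in zip(*chosen)]

# Toy stems with three samples each.
separated = {
    "drums": [0.5, 0.0, 0.5],
    "vocals": [0.0, 0.25, 0.0],
    "bass": [0.25, 0.25, 0.25],
    "other": [0.0, 0.0, 0.0],
}
drums_only = select_stems(separated, "Drums Only")
full_mix = select_stems(separated, "Full Audio")
```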
The threshold parameter sets the minimum weight value that must be exceeded for an audio component to be considered significant. It is a float with a default value of 0.5, and it can range from 0.0 to 1.0. Adjusting the threshold allows you to filter out less prominent audio elements, ensuring that only the most impactful components are highlighted.
The multiply parameter is an amplification factor applied to the audio weights before normalization. It is a float with a default value of 1.0 and a range of 0.0 to 5.0. Values above 1.0 strengthen the influence of the audio weights and values below 1.0 weaken it, giving you additional control over the final output.
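As a rough illustration of how the threshold and the multiplier could combine, the sketch below scales each raw weight, clamps the result to the 0.0 to 1.0 range as a stand-in for the normalization step, and zeroes any value that does not exceed the threshold. The order of operations, the clamping, and the name `shape_weights` are assumptions made for this example; only the defaults and ranges come from the parameter descriptions.

```python
def shape_weights(raw, threshold=0.5, multiply=1.0):
    """Scale, clamp, and gate a list of raw weights (hypothetical helper)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if not 0.0 <= multiply <= 5.0:
        raise ValueError("multiply must be between 0.0 and 5.0")
    shaped = []
    for w in raw:
        # Amplify first, then clamp into [0, 1] (assumed normalization).
        scaled = min(max(w * multiply, 0.0), 1.0)
        # Keep only weights that exceed the threshold.
        shaped.append(scaled if scaled > threshold else 0.0)
    return shaped

raw = [0.125, 0.25, 0.5, 0.75]
default = shape_weights(raw)
boosted = shape_weights(raw, threshold=0.5, multiply=2.0)
```

Under these assumptions, raising the multiplier pushes more frames over the threshold, while raising the threshold silences quieter frames.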
The processed audio output is the result of the audio separation process, containing only the isolated components specified by the analysis mode. This output allows you to work with specific elements of the audio, such as drums or vocals, independently from the rest of the track.
The original audio output passes through the unmodified audio input, so you can reference it alongside the processed result. It is useful for comparison and for routing the untouched track to later steps in the workflow.
Audio weights are a list of values that represent the reactive weights based on the processed audio. These weights can be used to create audio-reactive visualizations or to inform other creative processes. They provide a quantitative measure of the audio's dynamic elements.
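One common way to use such a list is to map each weight onto a per-frame value for some other setting. The sketch below assumes the weights lie between 0.0 and 1.0 and maps them linearly onto a chosen range; `weights_to_schedule`, `low`, and `high` are hypothetical names and are not part of the node.

```python
def weights_to_schedule(weights, low, high):
    """Linearly map weights in [0, 1] to per-frame values in [low, high]."""
    return [low + w * (high - low) for w in weights]

# A silent frame maps to `low`, a peak frame maps to `high`.
schedule = weights_to_schedule([0.0, 0.5, 1.0], low=0.5, high=1.5)
```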
The graph audio output is an image that visualizes the audio weights over time, providing a graphical representation of the audio's dynamic characteristics. This visualization can be used to better understand the distribution and impact of the audio weights across the frames.