Generate visual masks from audio amplitude data for creative visualizations.
The CreateAudioMask node generates visual masks from the amplitude of an audio file. It processes the file into a series of images, one per frame of the audio's spectrogram, and draws in each image a circular mask whose size is proportional to the audio's intensity at that frame. The resulting image sequence can drive audio-reactive visualizations and other creative applications.
The invert parameter determines whether the generated masks are inverted. When set to True, areas that would normally be white become black, and vice versa, which can be useful for creating different visual effects. The default value is False.
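As a minimal sketch of what inversion means for a mask whose values lie between 0.0 and 1.0, each pixel value v becomes 1.0 - v. The helper name invert_mask is hypothetical and is not part of the node.

```python
def invert_mask(mask):
    """Return a copy of a 2D mask with every value v replaced by 1.0 - v."""
    return [[1.0 - value for value in row] for row in mask]


# A tiny 3x3 mask: white (1.0) centre pixel on a black (0.0) background.
mask = [
    [0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
]

inverted = invert_mask(mask)
# The centre pixel is now black and the background is white.
```

Inverting twice returns the original mask, so the setting can be toggled freely while experimenting.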
The frames parameter specifies the number of frames to generate from the audio file. Each frame corresponds to a segment of the audio, and the node creates one mask per frame. The minimum value is 1, the maximum value is 255, and the default value is 16. Adjust it to control the granularity of the audio analysis.
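One way to picture the role of the frame count: the sketch below splits a list of samples into equal segments and takes the mean absolute amplitude of each. This is an illustrative assumption, since the node's own analysis (which the description ties to a spectrogram) may segment the audio differently. The function name frame_amplitudes and the synthetic signal are hypothetical.

```python
import math


def frame_amplitudes(samples, frames):
    """Split samples into `frames` equal segments; return each segment's mean |amplitude|."""
    if frames < 1:
        raise ValueError("frames must be at least 1")
    segment_length = len(samples) // frames
    if segment_length == 0:
        raise ValueError("not enough samples for the requested number of frames")
    amplitudes = []
    for i in range(frames):
        segment = samples[i * segment_length:(i + 1) * segment_length]
        amplitudes.append(sum(abs(s) for s in segment) / len(segment))
    return amplitudes


# Synthetic test signal: a 440 Hz tone whose loudness ramps up linearly over one second.
sample_rate = 8000
samples = [
    (n / sample_rate) * math.sin(2 * math.pi * 440 * n / sample_rate)
    for n in range(sample_rate)
]

amplitudes = frame_amplitudes(samples, frames=16)  # 16 is the documented default
```

Because the test tone gets louder over time, the later entries in amplitudes are larger than the earlier ones, which is the kind of per-frame value a mask size could follow.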
The scale parameter controls the scaling factor for the size of the circles in the masks. Higher values produce larger circles and lower values produce smaller ones. The minimum value is 0.0, the maximum value is 2.0, and the default value is 0.5. Use it to fine-tune how strongly the audio's amplitude shows up in the masks.
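A minimal sketch of how a scaling factor could turn an amplitude into a circle size, assuming the radius equals amplitude times image height times scale and that the circle is centred in the image. Both the formula and the names circle_radius, circle_mask, and white_area are illustrative assumptions, not the node's confirmed implementation.

```python
def circle_radius(amplitude, height, scale=0.5):
    """Radius in pixels, proportional to the amplitude and to the scale factor."""
    return amplitude * height * scale


def circle_mask(width, height, radius):
    """Return a height x width mask: 1.0 inside a centred circle, 0.0 outside."""
    cx, cy = width // 2, height // 2
    return [
        [1.0 if (x - cx) ** 2 + (y - cy) ** 2 <= radius ** 2 else 0.0
         for x in range(width)]
        for y in range(height)
    ]


def white_area(mask):
    """Count the white pixels in a mask."""
    return sum(value == 1.0 for row in mask for value in row)


# Same amplitude, two scale values: the larger scale gives the larger circle.
small = circle_mask(64, 64, circle_radius(0.5, 64, scale=0.5))  # radius 16
large = circle_mask(64, 64, circle_radius(0.5, 64, scale=1.0))  # radius 32
```

Under this assumed formula, doubling the scale doubles the radius, so the white area grows roughly fourfold.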
The audio_path parameter specifies the path to the audio file to process. The default value is "audio.wav". Make sure the file exists at that path and is readable, since the node cannot run without it.
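To catch a bad path before running a workflow, a check along these lines can help. It is a sketch using only the Python standard library. The helper name check_audio_path is hypothetical, and the demonstration uses a throwaway temporary file rather than a real recording.

```python
import tempfile
from pathlib import Path


def check_audio_path(audio_path="audio.wav"):
    """Return the resolved path if it points to an existing file, else raise FileNotFoundError."""
    path = Path(audio_path).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"Audio file not found: {path.resolve()}")
    return path.resolve()


# Demonstration with a throwaway file (a real workflow would pass its own path).
with tempfile.TemporaryDirectory() as tmp:
    demo_file = Path(tmp) / "audio.wav"
    demo_file.touch()
    found_name = check_audio_path(demo_file).name

    try:
        check_audio_path(Path(tmp) / "missing.wav")
        missing_detected = False
    except FileNotFoundError:
        missing_detected = True
```

The check only confirms that the file exists. Whether its contents can be decoded as audio is a separate question.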
The width parameter sets the width of the generated images in pixels. The minimum value is 16, the maximum value is 4096, and the default value is 256.
The height parameter sets the height of the generated images in pixels. The minimum value is 16, the maximum value is 4096, and the default value is 256. Together with the width, it determines the resolution of the output images.
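The documented defaults and ranges can be collected in one place and validated before use. The sketch below does this with a dataclass. The class name AudioMaskSettings and the LIMITS table are hypothetical helpers for illustration, and only the numeric values come from the descriptions above.

```python
from dataclasses import dataclass

# Documented (minimum, maximum) for each numeric parameter.
LIMITS = {
    "frames": (1, 255),
    "scale": (0.0, 2.0),
    "width": (16, 4096),
    "height": (16, 4096),
}


@dataclass
class AudioMaskSettings:
    """Hypothetical container holding the documented default values."""
    invert: bool = False
    frames: int = 16
    scale: float = 0.5
    audio_path: str = "audio.wav"
    width: int = 256
    height: int = 256

    def __post_init__(self):
        # Reject any numeric value outside its documented range.
        for name, (low, high) in LIMITS.items():
            value = getattr(self, name)
            if not low <= value <= high:
                raise ValueError(
                    f"{name}={value} is outside the documented range {low} to {high}"
                )


defaults = AudioMaskSettings()

# A frame count above the documented maximum of 255 is rejected.
try:
    AudioMaskSettings(frames=300)
    rejected = False
except ValueError:
    rejected = True
```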
The output is a tensor containing the generated images. Each image represents a frame of the audio's spectrogram, with circular masks indicating the amplitude of the audio at that frame. The images are normalized to a range of 0.0 to 1.0, making them suitable for further processing or visualization.
Ensure that the file set in the audio_path parameter is accessible and correctly formatted to avoid errors during processing.
Experiment with the frames parameter to find the optimal number of frames for your specific application. More frames provide finer detail but require more processing power.
Use the scale parameter to adjust the size of the circles in the masks to match the visual style you are aiming for.
Toggle the invert parameter to see how the inverted masks look.
The node fails when the librosa library is not installed in your Python environment. Install the librosa library by running the command pip install librosa in your terminal or command prompt.
If the audio file cannot be found, ensure that the audio_path parameter is correctly set to the location of your audio file and that the file exists.
... librosa library.
If the node runs out of memory, reduce the frames, width, or height parameters to decrease the memory usage, or run the node on a machine with more GPU memory.
© Copyright 2024 RunComfy. All Rights Reserved.