
ComfyUI Node: OpenAI语音识别(openai_whisper)

Class Name: openai_whisper
Category: 大模型派对(llm_party)/函数(function)
Author: heshengtao (Account age: 2893 days)
Extension: comfyui_LLM_party
Last Updated: 6/22/2024
GitHub Stars: 0.1K

How to Install comfyui_LLM_party

Install this extension via the ComfyUI Manager by searching for comfyui_LLM_party:
  1. Click the Manager button in the main menu.
  2. Select the Custom Nodes Manager button.
  3. Enter comfyui_LLM_party in the search bar.
After installation, click the Restart button to restart ComfyUI, then manually refresh your browser to clear the cache and load the updated list of nodes.

OpenAI语音识别(openai_whisper) Description

Transcribe audio to text with high accuracy using OpenAI's Whisper model for subtitles, interviews, and voice notes.

OpenAI语音识别(openai_whisper):

The openai_whisper node transcribes audio files into text using OpenAI's Whisper model, accessed through the OpenAI API. It is useful wherever spoken language needs to be turned into written text, such as creating subtitles, transcribing interviews, or capturing voice notes. The node returns the transcription as a plain string, so it plugs directly into other LLM Party nodes and lets you automate transcription steps within your ComfyUI workflows.

OpenAI语音识别(openai_whisper) Input Parameters:

is_enable

This parameter is a boolean that determines whether the node is active or not. When set to True, the node will process the audio input and generate a transcription. If set to False, the node will not perform any action. The default value is True.

audio

This parameter accepts an audio file that you want to transcribe. The audio file should be in a format that is compatible with the Whisper model, such as WAV or MP3. The node will read this file and use it as the input for the transcription process.

base_url

This optional parameter allows you to specify the base URL for the OpenAI API. If not provided, the node will use the default URL https://api.openai.com/v1/. This parameter is useful if you need to point to a different API endpoint.

api_key

This optional parameter is used to provide your OpenAI API key. If not provided, the node will attempt to load the API key from the environment variables. This key is essential for authenticating your requests to the OpenAI API.
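
For orientation, the following is a minimal sketch of the kind of call this node wraps, written with the official openai Python client. The file name, model name, and client setup are illustrative assumptions, not the node's exact implementation.

    # Minimal sketch (assumption: the node issues a call similar to this via the openai SDK).
    import os
    from openai import OpenAI

    client = OpenAI(
        api_key=os.environ.get("OPENAI_API_KEY"),   # corresponds to the api_key input
        base_url="https://api.openai.com/v1/",      # corresponds to the base_url input
    )

    # "speech.mp3" is a placeholder for the audio file passed to the node.
    with open("speech.mp3", "rb") as audio_file:
        transcription = client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
        )

    print(transcription.text)  # corresponds to the node's text output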

OpenAI语音识别(openai_whisper) Output Parameters:

text

The output parameter text contains the transcribed text from the provided audio file. This text is generated by the Whisper model and represents the spoken content of the audio input in written form. The output is a string that you can use for further processing or display.

OpenAI语音识别(openai_whisper) Usage Tips:

  • Ensure that your audio files are clear and free from excessive background noise to improve transcription accuracy.
  • Use the base_url parameter if you need to connect to a custom OpenAI API endpoint.
  • Always provide a valid api_key to authenticate your requests and avoid errors related to missing or invalid API keys.
  • Test with different audio formats to find the one that works best with the Whisper model for your specific use case; a format-conversion sketch follows this list.
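
If you need to normalize inputs before transcription, a small helper like the one below can convert arbitrary audio into 16 kHz mono WAV. This is a hedged sketch, not part of the node: it assumes the pydub package and ffmpeg are installed, and the function name and paths are hypothetical.

    # Hypothetical pre-processing helper (assumes pydub + ffmpeg are available).
    from pydub import AudioSegment

    def to_whisper_wav(src_path: str, dst_path: str = "converted.wav") -> str:
        audio = AudioSegment.from_file(src_path)             # input format is auto-detected
        audio = audio.set_frame_rate(16000).set_channels(1)  # 16 kHz mono is a safe target
        audio.export(dst_path, format="wav")
        return dst_path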

OpenAI语音识别(openai_whisper) Common Errors and Solutions:

请输入API_KEY ("Please enter the API_KEY")

  • Explanation: This error occurs when the API key is not provided or is empty.
  • Solution: Ensure that you have provided a valid API key either through the api_key parameter or by setting the appropriate environment variable.

Invalid audio file

  • Explanation: This error occurs when the provided audio file is not in a supported format or is corrupted.
  • Solution: Check the audio file format and ensure it is compatible with the Whisper model. Supported formats typically include WAV and MP3.

API request failed

  • Explanation: This error occurs when the request to the OpenAI API fails, possibly due to network issues or an incorrect API endpoint.
  • Solution: Verify your network connection and ensure that the base_url parameter is correctly set to a valid OpenAI API endpoint.

Transcription failed

  • Explanation: This error occurs when the Whisper model is unable to transcribe the provided audio file.
  • Solution: Ensure that the audio file is clear and of good quality. Try using a different audio file to see if the issue persists.
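
When debugging these failures outside ComfyUI, it can help to separate authentication, connection, and request errors. The sketch below uses exception classes from the openai Python SDK (version 1.x); the file name and printed messages are illustrative assumptions.

    # Illustrative error handling around a Whisper transcription call (openai>=1.0).
    import openai
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    try:
        with open("speech.mp3", "rb") as f:
            text = client.audio.transcriptions.create(model="whisper-1", file=f).text
    except openai.AuthenticationError:
        print("Missing or invalid API key (matches the 请输入API_KEY case).")
    except openai.APIConnectionError:
        print("Could not reach the API endpoint; check the network and base_url.")
    except openai.BadRequestError:
        print("Request rejected, often due to an unsupported or corrupted audio file.")
    except openai.APIStatusError as e:
        print(f"API request failed with status {e.status_code}.")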

OpenAI语音识别(openai_whisper) Related Nodes

Go back to the extension to check out more related nodes.
comfyui_LLM_party