Speech Recognition RESTful API with Whisper Model

This RESTful API is designed for speech recognition using the Whisper ASR model. It leverages the Flask framework to create an interface for transcribing audio files.

Features

ASR Model: The API employs the Whisper ASR model for accurate and efficient speech recognition.

RESTful Interface: Built on Flask, the API exposes an HTTP interface that accepts JSON requests, so it can be called from any client that speaks HTTP.

Multiple Decoding Strategies: The API supports greedy decoding, with beam search decoding under development.

Pretrained Models: ONNX Runtime is utilized for efficient model inference, and the pretrained models can be found in the Pretrained directory.
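To make the difference between the two decoding strategies concrete, here is a minimal, self-contained sketch. It is not taken from this repository: `TOY_TABLE` and `toy_next_token_logprobs` are hypothetical stand-ins for a real decoder step, and the tiny vocabulary is invented for illustration.

```python
import math

# Hypothetical toy "model": next-token probabilities keyed by the last token.
# A real decoder would run a neural network here instead of a lookup table.
TOY_TABLE = {
    "<s>": {"hello": 0.6, "hi": 0.4},
    "hello": {"world": 0.5, "there": 0.3, "</s>": 0.2},
    "hi": {"there": 0.9, "</s>": 0.1},
    "world": {"</s>": 1.0},
    "there": {"</s>": 1.0},
}


def toy_next_token_logprobs(prefix):
    """Return log-probabilities of the next token given the prefix so far."""
    probs = TOY_TABLE[prefix[-1]]
    return {tok: math.log(p) for tok, p in probs.items()}


def greedy_decode(max_len=10):
    """Pick the single most likely token at every step."""
    seq = ["<s>"]
    while seq[-1] != "</s>" and len(seq) < max_len:
        logprobs = toy_next_token_logprobs(seq)
        seq.append(max(logprobs, key=logprobs.get))
    return seq


def beam_search_decode(beam_width=2, max_len=10):
    """Keep the beam_width best partial sequences by total log-probability."""
    beams = [(0.0, ["<s>"])]
    for _ in range(max_len):
        candidates = []
        for score, seq in beams:
            if seq[-1] == "</s>":
                # Finished sequences are carried over unchanged.
                candidates.append((score, seq))
                continue
            for tok, lp in toy_next_token_logprobs(seq).items():
                candidates.append((score + lp, seq + [tok]))
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(seq[-1] == "</s>" for _, seq in beams):
            break
    return beams[0][1]
```

On this toy table, greedy decoding commits to "hello" at the first step and ends with a sequence of probability 0.30, while beam search with a width of 2 keeps "hi" alive and finds "hi there" at 0.36. With a width of 1, beam search reduces to greedy decoding.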

Getting Started

Installation:

Clone this repository to your local machine.

Install the required dependencies using pip:

pip install -r requirements.txt

Run the API:

Execute python app.py to start the Flask server and download (if needed) the optimized models.

python app.py
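Because the server may need some time to start (for example, while models are downloaded), a client script can wait until the port accepts connections before sending requests. The helper below is a small sketch, not part of this repository; `wait_for_server` is a hypothetical name, and the default host and port are taken from the example URL used in this README.

```python
import socket
import time


def wait_for_server(host="127.0.0.1", port=5000, timeout=30.0, interval=0.5):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect only shows that something is listening;
            # it does not prove the models have finished loading.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

Calling `wait_for_server()` before the first request avoids a connection error when the client starts faster than the server.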

API Endpoints:

The main endpoint for transcribing audio is /transcribe.

Sending Requests:

Send a POST request to the /transcribe endpoint with a JSON body containing your base64-encoded audio file and optional generation parameters. Example using requests in Python:

import base64

import requests

url = "http://127.0.0.1:5000/transcribe"

# Replace 'audio_file_path' with the actual path to your audio file
audio_file_path = './test-wavs/recording.wav'

# Read the audio file content as binary data
with open(audio_file_path, 'rb') as audio_file:
    audio_content = audio_file.read()

# Encode the binary data as base64
encoded_audio_content = base64.b64encode(audio_content).decode('utf-8')

# Create the data dictionary with both file content and additional parameters
data = {
    # audio file kwargs
    'audio_file': {
        'filename': 'audio.wav',
        'content': encoded_audio_content,
        'content_type': 'audio/wav'
    },
    # Optional generation kwargs
    'generation_kwargs': {
        'decoding': 'greedy',
        'return_multiple': True
    }
}

# Send the data and wait for the response.
response = requests.post(url, json=data)

# Show the HTTP status code and the raw response body.
print(response.status_code, response.text)
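The request body from the example can be wrapped in a reusable helper, and the base64 step can be checked with a round trip. The sketch below is an illustration only: `build_transcribe_payload` and `decode_audio_field` are hypothetical names, and the decoding function shows how a server could recover the audio bytes, not how app.py actually does it.

```python
import base64


def build_transcribe_payload(audio_bytes, filename="audio.wav",
                             content_type="audio/wav", **generation_kwargs):
    """Build a JSON body with the same layout as the example request (client side)."""
    payload = {
        "audio_file": {
            "filename": filename,
            "content": base64.b64encode(audio_bytes).decode("utf-8"),
            "content_type": content_type,
        }
    }
    # generation_kwargs is optional, so it is only added when provided.
    if generation_kwargs:
        payload["generation_kwargs"] = generation_kwargs
    return payload


def decode_audio_field(payload):
    """Recover the raw audio bytes from such a body (hypothetical server side)."""
    audio = payload.get("audio_file")
    if not isinstance(audio, dict) or "content" not in audio:
        raise ValueError("payload is missing 'audio_file.content'")
    # validate=True rejects characters outside the base64 alphabet.
    return base64.b64decode(audio["content"], validate=True)
```

With this helper, the request becomes `requests.post(url, json=build_transcribe_payload(audio_content, decoding='greedy', return_multiple=True))`.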

Additional Information

ONNX Runtime:

The API utilizes ONNX Runtime for efficient and optimized model inference.

Pretrained Models:

Pretrained Whisper models can be found in the Pretrained directory.
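To check what a pretrained ONNX file expects before wiring it into a request handler, its input and output signatures can be inspected with ONNX Runtime. This is a generic sketch rather than code from this repository: `describe_onnx_model` is a hypothetical helper, and the model path is whatever file you have in the Pretrained directory (no file names are assumed here).

```python
def describe_onnx_model(model_path):
    """Load an ONNX model on CPU and return its input/output signatures."""
    # Imported lazily so this helper can be defined without onnxruntime installed.
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    return {
        "inputs": [(i.name, i.shape, i.type) for i in session.get_inputs()],
        "outputs": [(o.name, o.shape, o.type) for o in session.get_outputs()],
    }
```

The returned names are the keys that `session.run` expects in its input dictionary, which makes this a quick way to confirm tensor names and shapes.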

mohame54/Speech-Transcriber-App
