Speech Transcriber server that provides an easy-to-use speech transcription service (only English is supported for now).

mohame54/Speech-Transcriber-App

Speech Recognition RESTful API with Whisper Model

This RESTful API provides speech recognition with the Whisper ASR model. It is built on the Flask framework and exposes an HTTP interface for transcribing audio files.

Features

ASR Model: The API uses OpenAI's Whisper automatic speech recognition (ASR) model to transcribe speech.

RESTful Interface: Built on Flask, the API exposes a RESTful HTTP interface, so any HTTP client can submit audio and receive a transcription.

Multiple Decoding Strategies: The API supports greedy decoding; beam search decoding is under development.

Pretrained Models: Model inference runs on ONNX Runtime, and the pretrained models are stored in the Pretrained directory.
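Greedy decoding picks the single highest-scoring token at every step until an end-of-sequence (EOS) token appears or a length limit is reached. The sketch below illustrates that loop in plain Python. It is not the repository's decoder: `next_token_logits`, the toy vocabulary, and the token IDs are hypothetical stand-ins for the scores a Whisper decoder would produce.

```python
from typing import Callable, List, Sequence


def greedy_decode(
    next_token_logits: Callable[[Sequence[int]], Sequence[float]],
    start_tokens: Sequence[int],
    eos_token: int,
    max_len: int = 50,
) -> List[int]:
    """Append the argmax token until EOS or until the sequence has max_len tokens
    (start tokens included)."""
    tokens = list(start_tokens)
    while len(tokens) < max_len:
        logits = next_token_logits(tokens)
        # Greedy step: keep only the index of the largest score.
        best = max(range(len(logits)), key=lambda i: logits[i])
        tokens.append(best)
        if best == eos_token:
            break
    return tokens


# Hypothetical toy "model": prefers token id len(prefix) until the prefix
# holds 4 tokens, then prefers the EOS token (id 9).
def toy_logits(prefix: Sequence[int]) -> List[float]:
    vocab_size = 10
    target = 9 if len(prefix) >= 4 else len(prefix)
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


if __name__ == "__main__":
    print(greedy_decode(toy_logits, start_tokens=[0], eos_token=9))  # [0, 1, 2, 3, 9]
```

Beam search, by contrast, keeps the k best partial sequences at each step instead of a single one, which costs more computation but can recover from an early low-scoring choice.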

Getting Started

Installation:

Clone this repository to your local machine, then install the required dependencies with pip:

pip install -r requirements.txt

Run the API:

Run python app.py to start the Flask server. This also downloads the optimized models if they are not already present.

python app.py
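If the first start spends time downloading models before the server accepts requests, a client script can poll the port until it is reachable. A minimal standard-library sketch, assuming the server listens on Flask's default 127.0.0.1:5000 (the same address used in the request example); `wait_for_server` is a hypothetical helper name, not part of this repository.

```python
import socket
import time


def wait_for_server(host: str = "127.0.0.1", port: int = 5000,
                    timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the server is not accepting connections yet.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, call `wait_for_server()` at the top of a request script and send the POST request only once it returns True. Note that an open port only shows the server is listening, not that the models are loaded.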

API Endpoints:

The main endpoint for transcribing audio is /transcribe.

Sending Requests:

Send a POST request to the /transcribe endpoint with a JSON body containing your base64-encoded audio file and optional generation parameters. Example using the requests library in Python:

import requests
import base64
url = "http://127.0.0.1:5000/transcribe"

# Replace 'audio_file_path' with the actual path to your audio file
audio_file_path = './test-wavs/recording.wav'

# Read the audio file content as binary data
with open(audio_file_path, 'rb') as audio_file:
    audio_content = audio_file.read()

# Encode the binary data as base64
encoded_audio_content = base64.b64encode(audio_content).decode('utf-8')

# Create the data dictionary with both file content and additional parameters
data = {
    # audio file kwargs
    'audio_file': {
        'filename': 'audio.wav',
        'content': encoded_audio_content,
        'content_type': 'audio/wav'
    },
    # Optional generation kwargs
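    'generation_kwargs': {
        'decoding': 'greedy',
        'return_multiple': True
    }
}

# Send the data and wait for the response.
response = requests.post(url, json=data)
response.raise_for_status()  # Raise an HTTPError on 4xx/5xx responses
print(response.text)  # Raw response body (its format is defined by the server)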
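If you would rather not depend on requests, the same JSON body can be built and sent with the standard library. In this sketch, `build_transcribe_payload` and `post_transcription` are hypothetical helper names; the payload keys mirror the example above, the content type defaults to the same 'audio/wav', and the server is assumed to reply with JSON.

```python
import base64
import json
import os
import urllib.request


def build_transcribe_payload(path: str, decoding: str = "greedy",
                             return_multiple: bool = True,
                             content_type: str = "audio/wav") -> dict:
    """Build the JSON body for /transcribe, mirroring the requests example."""
    with open(path, "rb") as f:
        # Base64-encode the raw audio bytes so they can travel inside JSON.
        encoded = base64.b64encode(f.read()).decode("utf-8")
    return {
        "audio_file": {
            "filename": os.path.basename(path),
            "content": encoded,
            "content_type": content_type,
        },
        "generation_kwargs": {
            "decoding": decoding,
            "return_multiple": return_multiple,
        },
    }


def post_transcription(payload: dict,
                       url: str = "http://127.0.0.1:5000/transcribe") -> dict:
    """POST the payload as JSON and parse the (assumed JSON) response body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(request, timeout=120) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: `post_transcription(build_transcribe_payload('./test-wavs/recording.wav'))`.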

Additional Information

ONNX Runtime:

The API uses ONNX Runtime for optimized model inference.

Pretrained Models:

Pretrained Whisper models can be found in the Pretrained directory.
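Whisper models are trained on 16 kHz audio. This README does not say whether the server resamples or downmixes uploads, so it can be worth inspecting a WAV file before sending it. A minimal sketch using Python's built-in `wave` module, which reads uncompressed PCM WAV files only; `describe_wav` and `looks_whisper_ready` are hypothetical helper names, and the mono/16 kHz check is a heuristic, not a documented requirement of this API.

```python
import wave


def describe_wav(path: str) -> dict:
    """Return channel count, sample rate, sample width and duration of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate": rate,
            "sample_width_bytes": wav.getsampwidth(),
            "duration_s": frames / rate,
        }


def looks_whisper_ready(info: dict) -> bool:
    """Heuristic: mono audio at 16 kHz, the sample rate Whisper models are trained on."""
    return info["channels"] == 1 and info["sample_rate"] == 16000
```

For example, `describe_wav('./test-wavs/recording.wav')` reports the format of the sample file used in the request example.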
