Adding Whisper support
toshiakit committed May 15, 2024
1 parent 9054461 commit 471dc4e
Showing 3 changed files with 283 additions and 3 deletions.
33 changes: 30 additions & 3 deletions README.md

This repository contains example code to demonstrate how to connect MATLAB to the OpenAI™ Chat Completions API (which powers ChatGPT™) as well as OpenAI Images API (which powers DALL·E™). This allows you to leverage the natural language processing capabilities of large language models directly within your MATLAB environment.

The functionality shown here serves as an interface to the Chat Completion, Images and Audio APIs. To start using the OpenAI APIs, you first need to obtain OpenAI API keys. You are responsible for any fees OpenAI may charge for the use of their APIs. You should be familiar with the limitations and risks associated with using this technology, and you agree that you shall be solely responsible for full compliance with any terms that may apply to your use of the OpenAI APIs.

Some of the current LLMs supported are:
- gpt-3.5-turbo, gpt-3.5-turbo-1106, gpt-3.5-turbo-0125
- gpt-4-turbo, gpt-4-turbo-2024-04-09 (GPT-4 Turbo with Vision)
- gpt-4, gpt-4-0613
- dall-e-2, dall-e-3 (Images)
- Whisper (Audio)

For details on the specification of each model, check the official [OpenAI documentation](https://platform.openai.com/docs/models).


## Getting Started with Audio API
Generate speech from your text with OpenAI using the function `openAIAudio.speech` as follows:
```matlab
exampleText = "Here is an example!";
[y,Fs] = openAIAudio.speech(exampleText);
sound(y,Fs)
audiowrite("example.wav",y,Fs) % save the audio to a WAV file
```
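The `speech` function also accepts name-value options for the model, voice, and speed (documented in `openAIAudio.m`). A short sketch:

```matlab
% Higher-quality model, a different voice, and slightly faster delivery
[y,Fs] = openAIAudio.speech("Here is an example!", ...
    ModelName="tts-1-hd", Voice="nova", Speed=1.2);
sound(y,Fs)
```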

Transcribe speech from the audio file using the function `openAIAudio.transcriptions` as follows:
```matlab
output = openAIAudio.transcriptions("example.wav");
delete("example.wav")
output.text
```

This returns the transcribed text, which closely matches the original input.
```shell
>> output.text

ans =

'Here is an example.'
```
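The commit also adds `openAIAudio.translations`, which transcribes foreign-language audio directly into English. A minimal sketch, assuming `french.mp3` is a French-language recording on disk (the file name is hypothetical):

```matlab
% Translate foreign-language speech into English text
output = openAIAudio.translations("french.mp3", ResponseFormat="text");
disp(output) % English translation of the audio
```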

## Examples
To learn how to use this in your workflows, see [Examples](/examples/).

- [DescribeImagesUsingChatGPT.mlx](/examples/DescribeImagesUsingChatGPT.mlx): Learn how to use GPT-4 Turbo with Vision to understand the content of an image.
- [AnalyzeSentimentinTextUsingChatGPTinJSONMode.mlx](/examples/AnalyzeSentimentinTextUsingChatGPTinJSONMode.mlx): Learn how to use JSON mode in chat completions.
- [UsingDALLEToEditImages.mlx](/examples/UsingDALLEToEditImages.mlx): Create variations of images and edit images.
- [UsingDALLEToGenerateImages.mlx](/examples/UsingDALLEToGenerateImages.mlx): Learn how to generate images.
- [UsingWhisperToTranscribeSpeech.mlx](/examples/UsingWhisperToTranscribeSpeech.mlx): Transcribe speech and have it read aloud.

## License

Binary file added examples/UsingWhisperToTranscribeSpeech.mlx
Binary file not shown.
253 changes: 253 additions & 0 deletions openAIAudio.m
classdef openAIAudio
%openAIAudio Collection of static methods to connect to Audio API from OpenAI.
%
% openAIAudio Functions:
% speech - Text to Speech API from OpenAI that generates
% audio from text.
% transcriptions - Speech to Text API from OpenAI that generates
% text transcription from an audio file.
% translations - Speech to Text API from OpenAI that generates
% English translation from a foreign language audio file.

methods (Access=public,Static)

function [y,Fs,response] = speech(text,nvp)
% SPEECH Generate speech using the OpenAI API
%
% [y,Fs,response] = OPENAIAUDIO.SPEECH(text) generates audio
% from the input TEXT using the OpenAI API, and returns
% sampled data, y, and a sample rate for that data, Fs.
% Use `audiowrite(filename,y,Fs)` to save the audio to a file.
%
% [y,Fs,response] = OPENAIAUDIO.SPEECH(__, Name=Value) specifies additional options
% using one or more name-value arguments:
%
% ModelName - Name of the model to use for speech generation.
% "tts-1" (default) or "tts-1-hd"
% Voice - The voice to use in generated audio. Options are:
% "alloy" (default), "echo", "fable", "onyx",
% "nova", and "shimmer". The preview is available here
% https://platform.openai.com/docs/guides/text-to-speech/voice-options
% Speed - The speed of the generated audio, from 0.25 to 4. Default is 1.
% TimeOut - Connection Timeout in seconds (default: 10 secs)
%

arguments
text (1,1) {mustBeTextScalar}
nvp.ModelName (1,1) {mustBeMember(nvp.ModelName,["tts-1","tts-1-hd"])} = "tts-1"
nvp.Voice (1,1) {mustBeMember(nvp.Voice,["alloy","echo","fable","onyx","nova","shimmer"])} = "alloy"
nvp.Speed (1,1) {mustBeNumeric,mustBeInRange(nvp.Speed,0.25,4)} = 1
nvp.TimeOut (1,1) {mustBeReal,mustBePositive} = 10
nvp.ApiKey {mustBeNonzeroLengthTextScalar}
end

endpoint = "https://api.openai.com/v1/audio/speech";
if isfield(nvp,"ApiKey")
    apikey = nvp.ApiKey;
else
    apikey = getenv("OPENAI_API_KEY");
end
timeout = nvp.TimeOut;
params = struct("model",nvp.ModelName,"input",text,"voice",nvp.Voice);
if nvp.Speed ~= 1
params.speed = nvp.Speed;
end

% Send the HTTP Request
response = sendRequest(apikey, endpoint, params, timeout);
if isfield(response.Body.Data,"error")
y = [];
Fs = [];
else
y = response.Body.Data{1};
Fs = response.Body.Data{2};
end

end

function [output,response] = transcriptions(filepath,nvp)
% TRANSCRIPTIONS Transcribe audio using the OpenAI API
%
% [output, response] = OPENAIAUDIO.TRANSCRIPTIONS(filepath) generates
% text transcription from the input audio file FILEPATH using the
% OpenAI API.
%
% [output, response] = OPENAIAUDIO.TRANSCRIPTIONS(__, Name=Value)
% specifies additional options using one or more name-value arguments:
%
% ModelName - Name of the model to use for transcription.
% Only "whisper-1" is currently available.
% Language - The language of the input audio. This
% improves the accuracy and latency. Use
% the ISO-639-1 format to specify the language
% https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes
% Prompt - An optional text to guide the model's style
% or continue a previous audio segment.
% ResponseFormat - The format of the transcript output:
% "json" (default), "text", "srt", "vtt",
% or "verbose_json"
% Temperature - The sampling temperature between 0 and 1.
% Higher values like 0.8 will make the output
% more random. Default is 0.
% TimestampGranularities - The timestamp granularity used when
% ResponseFormat is set to "verbose_json":
% "segment" (default), and/or "word". Choosing
% "word" will add latency.
% TimeOut - Connection Timeout in seconds (default: 10 secs)
%

arguments
filepath {mustBeValidFileType(filepath)}
nvp.ModelName {mustBeMember(nvp.ModelName,"whisper-1")}= "whisper-1"
nvp.Language {mustBeValidLanCode(nvp.Language)}
nvp.Prompt {mustBeTextScalar}
nvp.ResponseFormat {mustBeMember(nvp.ResponseFormat, ...
["json","text","srt","vtt","verbose_json"])} = "json"
nvp.Temperature (1,1) {mustBeInRange(nvp.Temperature,0,1)} = 0
nvp.TimestampGranularities (1,:) {mustBeText,mustBeMember(nvp.TimestampGranularities, ...
["segment","word"])}
nvp.TimeOut (1,1) {mustBeReal,mustBePositive} = 10
nvp.ApiKey {mustBeNonzeroLengthTextScalar}
end

endpoint = "https://api.openai.com/v1/audio/transcriptions";
if isfield(nvp,"ApiKey")
    apikey = nvp.ApiKey;
else
    apikey = getenv("OPENAI_API_KEY");
end
timeout = nvp.TimeOut;

import matlab.net.http.io.*
params = struct('model', nvp.ModelName, 'file', FileProvider(filepath));
if isfield(nvp,"Language")
params.language = nvp.Language;
end
if isfield(nvp,"Prompt")
params.prompt = nvp.Prompt;
end
if nvp.ResponseFormat ~= "json"
params.response_format = nvp.ResponseFormat;
end
if nvp.Temperature > 0
params.temperature = nvp.Temperature;
end
if isfield(nvp,"TimestampGranularities")
if nvp.ResponseFormat == "verbose_json"
if isscalar(nvp.TimestampGranularities) && nvp.TimestampGranularities ~= "segment"
params.timestamp_granularities = {nvp.TimestampGranularities};
else
params.timestamp_granularities = nvp.TimestampGranularities;
end
else
warning("Set ResponseFormat to 'verbose_json' to enable TimestampGranularities.")
end
end
keyval = [fieldnames(params) struct2cell(params)].';
body = MultipartFormProvider(keyval{:});

% Send the HTTP Request
response = sendRequest(apikey, endpoint, body, timeout);
if isfield(response.Body.Data,"error")
output = "";
else
output = response.Body.Data;
end
end

function [output,response] = translations(filepath,nvp)
% TRANSLATIONS Translate audio into English text using the OpenAI API
%
% [output, response] = OPENAIAUDIO.TRANSLATIONS(filepath) generates
% English translation from the input audio file FILEPATH using the
% OpenAI API.
%
% [output, response] = OPENAIAUDIO.TRANSLATIONS(__, Name=Value)
% specifies additional options using one or more name-value arguments:
%
% ModelName - Name of the model to use for transcription.
% Only "whisper-1" is currently available.
% Prompt - An optional text to guide the model's style
% or continue a previous audio segment.The
% prompt must be in English.
% ResponseFormat - The format of the transcript output:
% "json" (default), "text", "srt", "vtt",
% or "verbose_json"
% Temperature - The sampling temperature between 0 and 1.
% Higher values like 0.8 will make the output
% more random. Default is 0.
%

arguments
filepath {mustBeValidFileType(filepath)}
nvp.ModelName {mustBeMember(nvp.ModelName,"whisper-1")}= "whisper-1"
nvp.Prompt {mustBeTextScalar}
nvp.ResponseFormat {mustBeMember(nvp.ResponseFormat, ...
["json","text","srt","vtt","verbose_json"])} = "json"
nvp.Temperature (1,1) {mustBeInRange(nvp.Temperature,0,1)} = 0
nvp.TimeOut (1,1) {mustBeReal,mustBePositive} = 10
nvp.ApiKey {mustBeNonzeroLengthTextScalar}
end

endpoint = "https://api.openai.com/v1/audio/translations";
if isfield(nvp,"ApiKey")
    apikey = nvp.ApiKey;
else
    apikey = getenv("OPENAI_API_KEY");
end
timeout = nvp.TimeOut;

import matlab.net.http.io.*
params = struct('model', nvp.ModelName, 'file', FileProvider(filepath));
if isfield(nvp,"Prompt")
params.prompt = nvp.Prompt;
end
if nvp.ResponseFormat ~= "json"
params.response_format = nvp.ResponseFormat;
end
if nvp.Temperature > 0
params.temperature = nvp.Temperature;
end
keyval = [fieldnames(params) struct2cell(params)].';
body = MultipartFormProvider(keyval{:});


% Send the HTTP Request
response = sendRequest(apikey, endpoint, body, timeout);
if isfield(response.Body.Data,"error")
output = "";
else
output = response.Body.Data;
end

end

end

end

function response = sendRequest(apikey,endpoint,body,timeout)
% sendRequest send request to the given endpoint, return response
headers = matlab.net.http.HeaderField('Authorization', "Bearer " + apikey);
if isa(body,'struct')
headers(2) = matlab.net.http.HeaderField('Content-Type', 'application/json');
end
request = matlab.net.http.RequestMessage('post', headers, body);
httpOpts = matlab.net.http.HTTPOptions;
httpOpts.ConnectTimeout = timeout;
response = send(request, matlab.net.URI(endpoint), httpOpts);
end

function mustBeValidFileType(filePath)
mustBeFile(filePath);
s = dir(filePath);
if ~endsWith(s.name, [".flac",".mp3",".mp4",".mpeg",".mpga",".m4a",".ogg",".wav",".webm"])
    error("Not a valid file type")
end
mustBeLessThan(s.bytes,25e+6) % OpenAI audio endpoints accept files up to 25 MB
end

function mustBeNonzeroLengthTextScalar(content)
mustBeNonzeroLengthText(content)
mustBeTextScalar(content)
end

function mustBeValidLanCode(code)
mustBeTextScalar(code)
if strlength(code) ~= 2
error("Use 2-letter ISO-639-1 language code.")
end

end
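The `TimestampGranularities` handling above only takes effect with `ResponseFormat="verbose_json"`. A usage sketch (the audio file name is hypothetical):

```matlab
% Request per-word timestamps; requires the verbose_json response format
out = openAIAudio.transcriptions("example.wav", ...
    ResponseFormat="verbose_json", TimestampGranularities="word");
% On success, the verbose_json response carries word-level start/end times
```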
