REST API

Endpoints

List models available
Request a model
Load a model
Check a loaded model's status
Run a loaded model
List all running models
Shut down a loaded model

Auth

Send your API key in the Authorization header of each request, e.g. Authorization: Key API_KEY
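As a rough illustration of the header format, here is a minimal Python sketch that builds (but does not send) an authenticated request using only the standard library. The helper name build_request and the BASE_URL constant are hypothetical; the host and header values are copied from the curl examples in this document.

```python
import json
import urllib.request

BASE_URL = "https://api.bytez.com"  # host as it appears in the curl examples


def build_request(path, api_key, body=None):
    """Build, without sending, a request carrying the Authorization header.

    build_request is a hypothetical helper, not part of any Bytez client.
    """
    headers = {
        "Authorization": f"Key {api_key}",
        "Content-Type": "application/json",
    }
    # Like curl with --data, urllib switches from GET to POST once a body is attached.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return urllib.request.Request(BASE_URL + path, data=data, headers=headers)
```

Passing the result to urllib.request.urlopen would send it; replace API_KEY with your own key first.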

Parameters

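model: (required) the model name, e.g. openai-community/gpt2
concurrency: (optional, default = 1) used by the /model/load endpoint
prompt: the input text the model generates a response from, used by the /model/run endpoint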

Commands

List models available

Request

curl --location 'https://api.bytez.com/model/list' \
--header 'Authorization: Key API_KEY' \
--header 'Content-Type: application/json'

Response

[
  {"name": "EleutherAI/gpt-neo-2.7B", "requiredRAM": 2.232933128273094, "benchmarked": true},
  {"name": "Gustavosta/MagicPrompt-Stable-Diffusion", "requiredRAM": 0.9401917929177755, "benchmarked": true},
  {"name": "Gustavosta/MagicPrompt-Stable-Diffusion.onnx.8-bit", "requiredRAM": null, "benchmarked": false},
  ...
]
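Once the list is decoded, a client will probably want to filter it. The sketch below keeps only benchmarked models whose requiredRAM is known and fits a budget. The function name runnable_models and the max_ram parameter are hypothetical, and the budget is assumed to be in whatever unit the API uses for requiredRAM, which the sample response does not state.

```python
import json

# First three entries of the sample response above.
SAMPLE = json.loads("""
[
  {"name": "EleutherAI/gpt-neo-2.7B", "requiredRAM": 2.232933128273094, "benchmarked": true},
  {"name": "Gustavosta/MagicPrompt-Stable-Diffusion", "requiredRAM": 0.9401917929177755, "benchmarked": true},
  {"name": "Gustavosta/MagicPrompt-Stable-Diffusion.onnx.8-bit", "requiredRAM": null, "benchmarked": false}
]
""")


def runnable_models(models, max_ram):
    """Return names of benchmarked models whose requiredRAM is known and <= max_ram."""
    return [
        m["name"]
        for m in models
        if m.get("benchmarked")
        and m.get("requiredRAM") is not None
        and m["requiredRAM"] <= max_ram
    ]
```

Entries with a null requiredRAM are skipped rather than treated as zero, since the sample suggests they have not been benchmarked yet.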

Request a model

Note: this is an automated process. The request queues a job that makes the model available. Behind the scenes, we need to compute things such as the amount of VRAM the model takes up when running.

curl --location 'https://api.bytez.com/model/job' \
--header 'Authorization: Key API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai-community/gpt2"
}'

Response

{
  "model": "openai-community/gpt2",
  "success": true,
  "modified": "2024-06-07T22:28:40.122Z"
}

Note: you can check the status of the model by repeating the same call and reading the message property in the response.

curl --location 'https://api.bytez.com/model/job' \
--header 'Authorization: Key API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai-community/gpt2"
}'

Response

{
  "model": "openai-community/gpt2",
  "success": true,
  "message": "Model is already queued",
  "startTime": null,
  "modified": "2024-06-07T22:29:22.333Z"
}

When the model is ready, the "message" property in the response reads "Model available", and "startTime" and "endTime" are populated. (The sample below shows a different model, chavinlo/alpaca-native.)

Response

{
  "model": "chavinlo/alpaca-native",
  "success": true,
  "message": "Model available",
  "startTime": "2024-05-30T01:20:37.644Z",
  "endTime": "2024-05-30T01:22:07.804Z",
  "modified": "2024-05-30T01:22:07.804Z"
}
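The repeat-the-call pattern described above can be sketched as a small polling loop. This is an illustration, not an official client: wait_until_available, check_job, interval_seconds, and max_attempts are hypothetical names, and check_job stands for any callable that posts {"model": ...} to /model/job and returns the decoded JSON. Only the "Model available" message is taken from the sample responses; how a failed job is reported is not shown in this document, so the loop simply gives up after a fixed number of checks.

```python
import time


def wait_until_available(check_job, model, interval_seconds=10.0,
                         max_attempts=30, sleep=time.sleep):
    """Call check_job(model) repeatedly until its message is "Model available".

    check_job is a caller-supplied function (hypothetical) that performs the
    /model/job request and returns the response as a dict.
    """
    for attempt in range(max_attempts):
        response = check_job(model)
        if response.get("message") == "Model available":
            return response
        # Wait between checks, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError(f"{model} was not available after {max_attempts} checks")
```

Injecting check_job and sleep keeps the loop independent of any particular HTTP library and makes it easy to exercise without network access.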

Load a model

Request

curl --location 'https://api.bytez.com/model/load' \
--header 'Authorization: Key API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai-community/gpt2",
    "concurrency": 1
}'

Response

{ "model": "openai-community/gpt2", "status": "started", "concurrency": 1 }

Note: this endpoint also accepts an expirationPeriodSeconds parameter. The instance expires within 2 minutes after that period has elapsed.

Any time a request is sent to run the model, the expiration period resets, so the instance keeps running as long as you make requests to it within the specified expiration period.

To make an instance expire 5 minutes after the last request it receives, you would do this:

Request

curl --location 'https://api.bytez.com/model/load' \
--header 'Authorization: Key API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai-community/gpt2",
    "concurrency": 1,
    "expirationPeriodSeconds": 300
}'
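To make the timing rule concrete, the following sketch computes the window in which an instance would expire, given the time of the last run request. It only restates the rule described above (the period counts from the last request, and expiry follows within 2 minutes of the period ending); the function name expiry_window and the GRACE constant are hypothetical, and the server's real scheduling may differ.

```python
from datetime import datetime, timedelta, timezone

# "within 2 minutes after" the period ends, per the note above.
GRACE = timedelta(minutes=2)


def expiry_window(last_request_at, expiration_period_seconds):
    """Return the (earliest, latest) expiry times implied by the documented rule."""
    earliest = last_request_at + timedelta(seconds=expiration_period_seconds)
    return earliest, earliest + GRACE


# With expirationPeriodSeconds = 300 and a last request at 22:30:00 UTC,
# the instance would be expected to expire between 22:35:00 and 22:37:00 UTC.
last = datetime(2024, 6, 7, 22, 30, tzinfo=timezone.utc)
earliest, latest = expiry_window(last, 300)
```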

Check a loaded model's status

Request

curl --location 'https://api.bytez.com/model/status' \
--header 'Authorization: Key API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai-community/gpt2"
}'

Response

{
  "model": "openai-community/gpt2",
  "status": "RUNNING",
  "concurrency": 1,
  "inferences": 0,
  "expirationPeriodSeconds": 1800,
  "expirationPeriodMinutes": 30,
  "expiresAt": "2024-05-28T00:12:18.738Z",
  "created": "2024-05-27T23:35:35.863Z",
  "modified": "2024-05-27T23:42:19.239Z"
}
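A client can use the status response to decide whether an instance is still usable. The sketch below parses the timestamps shown above and computes the time left before expiresAt. The helper names parse_timestamp and seconds_until_expiry are hypothetical; the field names and sample values come from the response above.

```python
import json
from datetime import datetime, timezone

# Sample status response from above.
STATUS = json.loads("""
{
  "model": "openai-community/gpt2",
  "status": "RUNNING",
  "concurrency": 1,
  "inferences": 0,
  "expirationPeriodSeconds": 1800,
  "expirationPeriodMinutes": 30,
  "expiresAt": "2024-05-28T00:12:18.738Z",
  "created": "2024-05-27T23:35:35.863Z",
  "modified": "2024-05-27T23:42:19.239Z"
}
""")


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp with a trailing Z into an aware datetime."""
    # Replacing Z keeps this working on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_until_expiry(status, now=None):
    """Return seconds left before expiresAt (negative if already past)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (parse_timestamp(status["expiresAt"]) - now).total_seconds()
```

In this sample, expiresAt falls roughly expirationPeriodSeconds (1800 s) after modified.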

Run a model

Request

curl --location 'https://api.bytez.com/model/run' \
--header 'Authorization: Key API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai-community/gpt2",
    "prompt": "Once upon a time there was a",
    "params": {
        "min_length": 30,
        "max_length": 256
    },
    "stream": true
}'

Response

Once upon a time there was a man upon the throne...But now it is him who must stand up! [...]
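The request above sets "stream": true. This document does not describe the wire format of the streamed response, so the sketch below assumes it is plain UTF-8 text delivered in pieces and shows one way to decode it chunk by chunk. The function name iter_text_chunks and the chunk_size parameter are hypothetical; stream can be any binary file-like object, such as the response returned by urllib.request.urlopen.

```python
import codecs
import io


def iter_text_chunks(stream, chunk_size=1024):
    """Yield decoded text from a binary file-like object, one chunk at a time.

    Assumes the body is UTF-8 text (an assumption, not documented behavior).
    """
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        text = decoder.decode(chunk)
        if text:
            yield text
    # Flush any bytes still buffered in the decoder.
    tail = decoder.decode(b"", final=True)
    if tail:
        yield tail


# Stand-in for a network response, so the example runs offline.
demo = io.BytesIO("Once upon a time there was a café".encode("utf-8"))
pieces = list(iter_text_chunks(demo, chunk_size=5))
```

With a real response, printing each piece as it is yielded shows the generation as it progresses; a smaller chunk_size trades throughput for lower latency.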

List all running models

Request

curl --location 'https://api.bytez.com/model/instances' \
--header 'Authorization: Key API_KEY' \
--header 'Content-Type: application/json'

Shut down a loaded model

Request

curl --location 'https://api.bytez.com/model/delete' \
--header 'Authorization: Key API_KEY' \
--header 'Content-Type: application/json' \
--data '{
    "model": "openai-community/gpt2"
}'
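Because a loaded instance keeps running until it expires or is shut down, it can help to pair the load and delete calls. The sketch below does this with a context manager. It is an illustration only: loaded_model is a hypothetical name, and post stands for any caller-supplied function that sends a JSON body to the given path with the Authorization header set.

```python
from contextlib import contextmanager


@contextmanager
def loaded_model(post, model, concurrency=1):
    """Load a model on entry and request its shutdown on exit, even after errors.

    post(path, body) is a hypothetical caller-supplied function that performs
    the HTTP request.
    """
    post("/model/load", {"model": model, "concurrency": concurrency})
    try:
        yield model
    finally:
        post("/model/delete", {"model": model})
```

Used as `with loaded_model(post, "openai-community/gpt2") as name:`, the body of the block can call /model/run, and the delete request is sent when the block exits.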

Help us make this better

At Bytez, we want to build the best DX for AI builders. We value your feedback! If you have suggestions for improving our docs, please let us know on Discord or via team@bytez.com.
