llm-as-a-service

A simple FastAPI service for the LLAMA-2 7B chat model.

The current version supports only the 7B-chat variant.

Tested on a single Nvidia L4 GPU (24 GB) on GCP (machine type g2-standard-8).

How to run

Install all dependencies

Run:

poetry install
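
Note that poetry install creates a dedicated virtual environment; if you are not working inside it (for example via poetry shell), prefix the commands below with poetry run, e.g. poetry run python laas/main.py.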

Download llama-2 model

Download the llama-2-7b-chat model according to the instructions in the llama repository.
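
With Meta's download script the checkpoint typically ends up in a layout like the one below; exact file names can vary between releases, so treat this as an illustration only:

llama-2-7b-chat/
    consolidated.00.pth
    params.json
tokenizer.model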

Setup environment variables

These configure torch.distributed, which the llama reference code initializes even for a single-process, single-GPU run:

export RANK="0"               # rank of this process (single process, so 0)
export WORLD_SIZE="1"         # total number of processes (single GPU, so 1)
export MASTER_ADDR="0.0.0.0"  # rendezvous address for torch.distributed
export MASTER_PORT="2137"     # rendezvous port for torch.distributed
export NCCL_P2P_DISABLE="1"   # disable NCCL peer-to-peer GPU transfers
export OMP_NUM_THREADS=4      # optional: cap OpenMP CPU threads
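
For context, here is a minimal sketch of the initialization these variables feed; the llama loader does the equivalent internally, so you do not run this yourself:

import torch.distributed as dist
# init_process_group defaults to the "env://" method, which reads RANK,
# WORLD_SIZE, MASTER_ADDR and MASTER_PORT from the environment set above.
dist.init_process_group(backend="nccl")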

Start the server

Run the following command:

python laas/main.py

Use server

The interactive API documentation lists all endpoints; once the server is running, open http://0.0.0.0:8080/docs
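
A minimal client sketch using requests is shown below; the route name (/chat) and payload shape (a messages field) are illustrative assumptions, not this service's confirmed contract, so check /docs for the real schema:

import requests
# Hypothetical route and payload: verify both against http://0.0.0.0:8080/docs
resp = requests.post(
    "http://0.0.0.0:8080/chat",
    json={"messages": [{"role": "user", "content": "Hello!"}]},
    timeout=120,  # generation on a single L4 GPU can take tens of seconds
)
resp.raise_for_status()
print(resp.json())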

Tests

The test suite is run with pytest.

Without loading LLM

To run the fast tests (no LLM loaded), run:

pytest

With loading LLM

To also run the slow tests that load the model, run:

pytest --runslow
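
The --runslow flag is a custom pytest option; a common way to wire it up (shown in the pytest documentation) is via conftest.py. The sketch below is that standard pattern, not necessarily this repository's exact implementation:

import pytest

def pytest_addoption(parser):
    parser.addoption("--runslow", action="store_true", default=False,
                     help="run slow tests that load the LLM")

def pytest_collection_modifyitems(config, items):
    if config.getoption("--runslow"):
        return  # --runslow given: run everything, including slow tests
    skip_slow = pytest.mark.skip(reason="need --runslow option to run")
    for item in items:
        if "slow" in item.keywords:
            item.add_marker(skip_slow)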
