Commit

Update scripts/inference/benchmarking/README.md
Co-authored-by: Vitaliy Chiley <[email protected]>
sashaDoubov and vchiley committed Jun 29, 2023
1 parent 5ffaa4a commit 21e7b47
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion scripts/inference/benchmarking/README.md
@@ -125,7 +125,7 @@ Benchmark Setup:

![assets](assets/Latency-vs.-Throughput-(n_input_tok=512,-n_output_tok=64).svg)

- Here, we perform a similar benchmark to the previous section, but compare different open-source models amongst each other in doing inference.
+ Here, we perform a similar benchmark to the previous section, but compare inference performance for different open-source models.
The benchmark script supports calling models directly from huggingface (using `hf.generate`), which is done to keep the comparison amongst the models fair.
The analysis is done on a single A100 80GB GPU, with input length 512, and output length 64, while varying the batch size. As in previous sections, the batch sizes swept are 1, 2, 4, 8, 16, 32, 64, unless the GPU ran out of memory, in which case that point is not shown.
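The sweep described above can be illustrated with a minimal sketch. This is not the repository's benchmark script: `sweep_batch_sizes`, `generate_fn`, and `oom_errors` are hypothetical names introduced here, and throughput is assumed to mean generated tokens per second (`batch_size * n_output_tok / latency`). With PyTorch, one would typically pass `torch.cuda.OutOfMemoryError` as the out-of-memory exception type.

```python
import time
from typing import Callable, Dict, List, Tuple, Type


def sweep_batch_sizes(
    generate_fn: Callable[[int, int, int], None],
    batch_sizes: Tuple[int, ...] = (1, 2, 4, 8, 16, 32, 64),
    n_input_tok: int = 512,
    n_output_tok: int = 64,
    oom_errors: Tuple[Type[BaseException], ...] = (MemoryError,),
) -> List[Dict[str, float]]:
    """Time one generation call per batch size, skipping sizes that run out of memory.

    `generate_fn(batch_size, n_input_tok, n_output_tok)` is a hypothetical
    callable standing in for whatever actually runs the model.
    """
    results: List[Dict[str, float]] = []
    for batch_size in batch_sizes:
        start = time.perf_counter()
        try:
            generate_fn(batch_size, n_input_tok, n_output_tok)
        except oom_errors:
            # Out of memory: this point is simply left out of the results.
            continue
        latency_s = time.perf_counter() - start
        # Assumed definition: generated tokens per second across the batch.
        tokens = batch_size * n_output_tok
        throughput = tokens / latency_s if latency_s > 0 else float("inf")
        results.append(
            {
                "batch_size": batch_size,
                "latency_s": latency_s,
                "throughput_tok_per_s": throughput,
            }
        )
    return results
```

Each returned entry corresponds to one point on a latency-versus-throughput plot; batch sizes that raised an out-of-memory error produce no entry, matching the "point is not shown" behavior described above.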
