From 32a2a92f67ea1844f6fe83cdfa53a098bfb61c1f Mon Sep 17 00:00:00 2001
From: Guillaume Klein
Date: Fri, 23 Jun 2023 17:11:25 +0200
Subject: [PATCH] Update README

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 7909c2edd..d0c2804af 100644
--- a/README.md
+++ b/README.md
@@ -24,7 +24,7 @@ The project is production-oriented and comes with [backward compatibility guaran
 ## Key features
 
 * **Fast and efficient execution on CPU and GPU**
The execution [is significantly faster and requires less resources](#benchmarks) than general-purpose deep learning frameworks on supported models and tasks thanks to many advanced optimizations: layer fusion, padding removal, batch reordering, in-place operations, caching mechanism, etc.
-* **Quantization and reduced precision**
The model serialization and computation support weights with [reduced precision](https://opennmt.net/CTranslate2/quantization.html): 16-bit floating points (FP16), 16-bit integers (INT16), and 8-bit integers (INT8).
+* **Quantization and reduced precision**
The model serialization and computation support weights with [reduced precision](https://opennmt.net/CTranslate2/quantization.html): 16-bit floating points (FP16), 16-bit brain floating points (BF16), 16-bit integers (INT16), and 8-bit integers (INT8).
 * **Multiple CPU architectures support**
The project supports x86-64 and AArch64/ARM64 processors and integrates multiple backends that are optimized for these platforms: [Intel MKL](https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/onemkl.html), [oneDNN](https://github.com/oneapi-src/oneDNN), [OpenBLAS](https://www.openblas.net/), [Ruy](https://github.com/google/ruy), and [Apple Accelerate](https://developer.apple.com/documentation/accelerate).
 * **Automatic CPU detection and code dispatch**
One binary can include multiple backends (e.g. Intel MKL and oneDNN) and instruction set architectures (e.g. AVX, AVX2) that are automatically selected at runtime based on the CPU information.
 * **Parallel and asynchronous execution**
Multiple batches can be processed in parallel and asynchronously using multiple GPUs or CPU cores.