Integrating review feedback from the ARC
nibrunie committed Sep 20, 2024
1 parent c4f3549 commit 9b12009
Showing 1 changed file with 15 additions and 3 deletions.
18 changes: 15 additions & 3 deletions src/vector-crypto-additional.adoc
@@ -53,10 +53,17 @@ and hashing (e.g., Elliptic curve cryptography, GHASH, CRC).
These instructions are only defined for `SEW`=32.
Zvbc32e can be supported when `ELEN >=32`.

This extension covers two gaps in `Zvbc`:

- allowing vector implementations with a smaller `ELEN=32` (e.g. implementations selecting `Zve32*`) to provide some support for vector carry-less multiplication (this is not allowed by `Zvbc`, which requires `ELEN >= 64`)
- for implementations which have `ELEN >= 64`: allowing more efficient implementations of algorithms relying on 32-bit carry-less multiplication. The list of such algorithms includes the folding algorithm used to compute the widespread 32-bit CRCs (e.g. Ethernet CRC). This technique can already be implemented with `Zvbc`, but only half of the 64-bit multiplication is exploited.
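
The second point can be illustrated with a minimal Python sketch of carry-less multiplication on a single element. The helper names (`clmul`, `clmul32_lo`, `clmul32_hi`, `clmul64_lo`, `clmul64_hi`) are hypothetical and only model the low and high halves of a carry-less product; they are not the specification's pseudocode and ignore masking, `vl` and `vstart`.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def clmul(a: int, b: int) -> int:
    """Full carry-less (polynomial over GF(2)) product of two non-negative integers."""
    result = 0
    shift = 0
    while b:
        if b & 1:
            result ^= a << shift  # XOR instead of add: no carries propagate
        b >>= 1
        shift += 1
    return result


def clmul32_lo(a: int, b: int) -> int:
    """Low 32 bits of the 32x32 carry-less product (models a SEW=32 low-half multiply)."""
    return clmul(a & MASK32, b & MASK32) & MASK32


def clmul32_hi(a: int, b: int) -> int:
    """High 32 bits of the 32x32 carry-less product (models a SEW=32 high-half multiply)."""
    return (clmul(a & MASK32, b & MASK32) >> 32) & MASK32


def clmul64_lo(a: int, b: int) -> int:
    """Low 64 bits of the 64x64 carry-less product (models a SEW=64 low-half multiply)."""
    return clmul(a & MASK64, b & MASK64) & MASK64


def clmul64_hi(a: int, b: int) -> int:
    """High 64 bits of the 64x64 carry-less product (models a SEW=64 high-half multiply)."""
    return (clmul(a & MASK64, b & MASK64) >> 64) & MASK64


# Arbitrary 32-bit operands, zero-extended when fed to the 64-bit helpers.
a, b = 0x87654321, 0xDEADBEEF
full = clmul(a, b)  # degree <= 62, so it fits in 63 bits

# Two SEW=32 halves reconstruct the full product.
assert (clmul32_hi(a, b) << 32) | clmul32_lo(a, b) == full
# A SEW=64 low-half multiply on zero-extended operands yields the same product,
# and the high half of the 64-bit product is always zero.
assert clmul64_lo(a, b) == full
assert clmul64_hi(a, b) == 0
```

With zero-extended 32-bit operands, each 64-bit element carries a single 32-bit multiplication and the upper half of the 64-bit product is never used, which is the sense in which only half of the 64-bit multiplication is exploited.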


Note:: The extension `Zvbc32e` is independent from `Zvbc` which defines the same instructions for `SEW=64`.
When `ELEN>=64` both extensions can be combined to have `vclmul.v[vx]` and `vclmulh.v[vx]` defined for both `SEW=32` and `SEW=64`.

Note:: The extra cost of supporting `Zvbc32e` on top of `Zvbc` should be minimal, as the hardware required to implement the instructions in `Zvbc32e` is a subset of the hardware required to implement `Zvbc`'s instructions.

[%autowidth]
[%header,cols="^2,4"]
|===
@@ -90,6 +97,11 @@ The number of element groups to be processed is `vl`/`EGS`.
therefore must be a multiple of `EGS=4`. +
Likewise, `vstart` must be a multiple of `EGS=4`.

One of the key use cases for the vector instructions `vghsh.vv` and `vgmul.vv` is to speed up the GCM cipher mode for a single stream by computing the GHASH algorithm for multiple blocks of the same message in parallel.
This factorization multiplies multiple blocks of the message by the same power of H (the encryption of `0` by the cipher key), the power being equal to the number of blocks processed in parallel.
With `Zvkg` only, a full vector register was required to hold the multiple copies of the power of H. `Zvkgs` reduces this requirement: only a smaller vector register group, able to contain at least one 128-bit wide element group, is required, which frees some vector registers.
This exploits the same scalar element group mechanism as other instructions defined in the vector crypto extensions (e.g. `vaesm.vs` from **Zvkned**).
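
The factorization described above can be checked with a small Python model. This is only a sketch of the algebra under stated assumptions: `gf128_mul`, `ghash_serial` and `ghash_parallel` are hypothetical helper names, the field is GF(2^128) with the GCM polynomial x^128 + x^7 + x^2 + x + 1 in plain (non-reflected) bit order, and the bit ordering used by GCM and by the actual instructions is deliberately ignored.

```python
import random

# Reduction polynomial x^128 + x^7 + x^2 + x + 1 (plain bit order, an assumption
# of this sketch; GCM itself uses a bit-reflected representation).
R = (1 << 128) | 0x87


def gf128_mul(a: int, b: int) -> int:
    """Multiply two elements of GF(2^128), both given as integers below 2**128."""
    prod = 0
    while b:
        if b & 1:
            prod ^= a
        b >>= 1
        a <<= 1
        if a >> 128:  # degree reached 128: reduce modulo R
            a ^= R
    return prod


def ghash_serial(h: int, blocks: list) -> int:
    """Reference recurrence: Y = (Y ^ X_i) * H for each block in order."""
    y = 0
    for x in blocks:
        y = gf128_mul(y ^ x, h)
    return y


def ghash_parallel(h: int, blocks: list, lanes: int) -> int:
    """Same result using `lanes` independent accumulators.

    Every step except the last multiplies all lanes by the same value H^lanes,
    so a single shared copy of that power is enough for the main loop.
    """
    assert blocks and len(blocks) % lanes == 0
    powers = [1]  # powers[p] == H^p
    for _ in range(lanes):
        powers.append(gf128_mul(powers[-1], h))
    acc = [0] * lanes
    groups = len(blocks) // lanes
    for g in range(groups):
        last = g == groups - 1
        for j in range(lanes):
            x = blocks[g * lanes + j]
            # Final group: lane j uses H^(lanes - j) so the lanes line up when combined.
            p = powers[lanes - j] if last else powers[lanes]
            acc[j] = gf128_mul(acc[j] ^ x, p)
    result = 0
    for y in acc:
        result ^= y
    return result


rng = random.Random(0)
h = rng.getrandbits(128)
blocks = [rng.getrandbits(128) for _ in range(16)]
assert ghash_parallel(h, blocks, 4) == ghash_serial(h, blocks)
```

In the main loop every lane is multiplied by the same value `powers[lanes]`. That shared operand is what a single 128-bit scalar element group can hold, instead of one copy per lane in a full vector register.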

[%autowidth]
[%header,cols="^2,4,4,4"]
|===
@@ -334,7 +346,7 @@ Encoding (Vector-Scalar)::
[wavedrom, , svg]
....
{reg:[
{bits: 7, name: 'OP-P'},
{bits: 7, name: 'OP-VE'},
{bits: 5, name: 'vd'},
{bits: 3, name: 'OPMVV'},
{bits: 5, name: 'vs1'},
@@ -473,7 +485,7 @@ Encoding (Vector-Scalar)::
[wavedrom, , svg]
....
{reg:[
{bits: 7, name: 'OP-P'},
{bits: 7, name: 'OP-VE'},
{bits: 5, name: 'vd'},
{bits: 3, name: 'OPMVV'},
{bits: 5, name: '10001'},
@@ -601,7 +613,7 @@ Included in::
[[crypto_vector_instructions_Zvkgs]]
==== Additional Vector Cryptographic Instructions

OP-P (0x77)
OP-VE (0x77)
Vector Crypto instructions, including `Zvkgs`, except `Zvbb` and `Zvbc`.
The new/modified encodings are in bold.
