Completing vghsh.vs/vgmul.vs descriptions

nibrunieAtSi5 · Aug 14, 2023 · bc7f527
1 parent a1bfcfc
commit bc7f527

Showing 2 changed files with 48 additions and 26 deletions.
diff --git a/doc/vector/insns/vghsh.adoc b/doc/vector/insns/vghsh.adoc
@@ -6,7 +6,7 @@ Vector Add-Multiply over GHASH Galois-Field
 
 Mnemonic::
 vghsh.vv vd, vs2, vs1 +
-vghsh.vs vd, vs2, vs1
+vghsh.vs vd, rs2, vs1
 
 Encoding (Vector-Vector)::
 [wavedrom, , svg]
@@ -40,6 +40,7 @@ Encoding (Vector-Scalar)::
 
 Reserved Encodings::
 * `SEW` is any value other than 32
+* `vghsh.vs` encoding (except if `Zvkgb` is enabled)
 
 Arguments::
 
@@ -62,7 +63,15 @@ Arguments::
 Description::
 A single "iteration" of the GHASH~H~ algorithm is performed.
 
-This instruction treats all of the inputs and outputs as 128-bit polynomials and
+
+The previous partial hashes are read as 4-element groups from `vd`,
+the cipher texts are read as 4-element groups from `vs1`,
+and the hash subkeys are read from either the corresponding 4-element group
+in `vs2` (vector-vector form) or the scalar element group in `vs2`
+(vector-scalar form, `Zvkgb` only). The resulting partial hashes are written as 4-element groups into `vd`.
+
+
+This instruction treats all of the input and output element groups as 128-bit polynomials and
 performs operations over GF[2].
 It produces the next partial hash (Y~i+1~) by adding the current partial
 hash (Y~i~) to the cipher text block (X~i~) and then multiplying (over GF(2^128^))
@@ -92,17 +101,11 @@ with the NIST specification. These reversals are inexpensive to implement as the
 swap bit positions and therefore do not require any logic.
 ====
 
-[NOTE]
-====
-Since the same hash subkey `H` will typically be used repeatedly on a given message,
-a future extension might define a vector-scalar version of this instruction where
-`vs2` is the scalar element group. This would help reduce register pressure when `LMUL` > 1.
-====
 
 Operation::
 [source,pseudocode]
 --
-function clause execute (VGHSH(vs2, vs1, vd)) = {
+function clause execute (VGHSH(vs2, vs1, vd, suffix)) = {
   // operands are input with bits reversed in each byte
   if(LMUL*VLEN < EGW)  then {
     handle_illegal();  // illegal instruction exception

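The add-multiply step described in the `vghsh` hunks above, including the choice between the vector-vector and vector-scalar subkey, can be sketched in Python. This is an illustrative model only, not the specification's pseudocode: element groups are modelled as 128-bit Python integers in plain lists, and the bit loop (which this diff does not show) is assumed to be a standard shift-and-reduce multiply with the GHASH polynomial x^128 + x^7 + x^2 + x + 1. The names `brev8`, `gf128_mul` and `vghsh` are used here as ordinary Python helpers.

```python
MASK128 = (1 << 128) - 1

def brev8(x):
    """Reverse the bit order inside each byte of a 128-bit integer."""
    out = 0
    for k in range(16):
        byte = (x >> (8 * k)) & 0xFF
        out |= int(f"{byte:08b}"[::-1], 2) << (8 * k)
    return out

def gf128_mul(y, h):
    """Shift-and-reduce multiply over GF(2^128).

    Assumption: reduction polynomial x^128 + x^7 + x^2 + x + 1 (0x87).
    """
    z = 0
    for bit in range(128):
        if (y >> bit) & 1:
            z ^= h                    # add (XOR) the current multiple of h
        carry = h >> 127              # top bit decides whether to reduce
        h = (h << 1) & MASK128
        if carry:
            h ^= 0x87
    return z

def vghsh(vd, vs2, vs1, suffix):
    """Model one GHASH iteration per element group.

    vd, vs1, vs2 are lists of 128-bit integers (one per element group).
    suffix is "vv" or "vs"; in the "vs" form group 0 of vs2 is reused.
    Returns the new contents of vd as a list.
    """
    out = []
    for i in range(len(vd)):
        helem = i if suffix == "vv" else 0
        s = brev8(vd[i] ^ vs1[i])     # partial hash plus cipher text
        h = brev8(vs2[helem])         # hash subkey
        out.append(brev8(gf128_mul(s, h)))
    return out
```

In the `"vs"` form every element group is multiplied by the same subkey, so `vs2` only needs to hold one valid element group in this model.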
diff --git a/doc/vector/insns/vgmul.adoc b/doc/vector/insns/vgmul.adoc
@@ -7,7 +7,7 @@ Vector Multiply over GHASH Galois-Field
 Mnemonic::
 vgmul.vv vd, vs2
 
-Encoding::
+Encoding (Vector-Vector)::
 [wavedrom, , svg]
 ....
 {reg:[
@@ -20,8 +20,25 @@ Encoding::
 {bits: 6, name: '101000'},
 ]}
 ....
+
+
+Encoding (Vector-Scalar)::
+[wavedrom, , svg]
+....
+{reg:[
+{bits: 7, name: 'OP-P'},
+{bits: 5, name: 'vd'},
+{bits: 3, name: 'OPMVV'},
+{bits: 5, name: '10001'},
+{bits: 5, name: 'vs2'},
+{bits: 1, name: '1'},
+{bits: 6, name: '101001'},
+]}
+....
+
 Reserved Encodings::
-* `SEW` is any value other than 32 
+* `SEW` is any value other than 32
+* `vgmul.vs` encoding (except if `Zvkgb` is enabled)
 
 Arguments::
 
@@ -40,9 +57,14 @@ Arguments::
 | Vd  | output | 128  | 4 | 32 | Product
 |===
 
-Description:: 
+Description::
 A GHASH~H~ multiply is performed.
 
+The multipliers are read as 4-element groups from `vd`,
+the multiplicands are read from either the corresponding 4-element group
+in `vs2` (vector-vector form) or the scalar element group in `vs2`
+(vector-scalar form, `Zvkgb` only). The resulting products are written as 4-element groups into `vd`.
+
 This instruction treats all of the inputs and outputs as 128-bit polynomials and 
 performs operations over GF[2].
 It produces the product over GF(2^128^) of the two 128-bit inputs.
@@ -67,27 +89,23 @@ with the NIST specification. These reversals are inexpensive to implement as the
 swap bit positions and therefore do not require any logic.
 ====
 
-[NOTE]
-====
-Since the same multiplicand will typically be used repeatedly on a given message,
-a future extension might define a vector-scalar version of this instruction where
-`vs2` is the scalar element group. This would help reduce register pressure when `LMUL` > 1. 
-====
 
 [NOTE]
 ====
-This instruction is identical to `vghsh.vv` with vs1=0.
+The instruction `vgmul.vv` is identical to `vghsh.vv` with vs1=0.
 This instruction is often used in GHASH code. In some cases it is followed
 by an XOR to perform a multiply-add. Implementations may choose to fuse these
-two instructions to improve performance on GHASH code that 
-doesn't use the add-multiply form of the `vghsh.vv` instruction. 
+two instructions to improve performance on GHASH code that
+doesn't use the add-multiply form of the `vghsh.vv` instruction.
+
+Similarly, the instruction `vgmul.vs` is identical to `vghsh.vs` with vs1=0.
 ====
 
 
 Operation::
 [source,pseudocode]
 --
-function clause execute (VGMUL(vs2, vs1, vd)) = {
+function clause execute (VGMUL(vs2, vs1, vd, suffix)) = {
   // operands are input with bits reversed in each byte
   if(LMUL*VLEN < EGW)  then {
     handle_illegal();  // illegal instruction exception
@@ -96,10 +114,11 @@ function clause execute (VGMUL(vs2, vs1, vd)) = {
 
   eg_len = (vl/EGS)
   eg_start = (vstart/EGS)
-  
+
   foreach (i from eg_start to eg_len-1) {
+    let helem = if suffix == "vv" then i else 0;
     let Y = brev8(get_velem(vd,EGW=128,i));  // Multiplier
-    let H = brev8(get_velem(vs2,EGW=128,i)); // Multiplicand
+    let H = brev8(get_velem(vs2,EGW=128, helem)); // Multiplicand
     let Z : bits(128) = 0;
 
     for (int bit = 0; bit < 128; bit++) {
@@ -113,7 +132,7 @@ function clause execute (VGMUL(vs2, vs1, vd)) = {
     }
 
 
-    let result = brev8(Z); 
+    let result = brev8(Z);
     set_velem(vd, EGW=128, i, result);
   }
   RETIRE_SUCCESS
@@ -122,4 +141,4 @@ function clause execute (VGMUL(vs2, vs1, vd)) = {
 --
 
 Included in::
-<<zvkg>>, <<zvkng>>, <<zvksg>>
+<<zvkg>>, <<zvkgb>>, <<zvkng>>, <<zvksg>>
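
As a rough check of the vector-scalar `vgmul` encoding added in this commit, the wavedrom fields can be packed into a 32-bit word. The sketch below assumes the fields are listed least-significant first, as wavedrom register diagrams normally are. The numeric values of `OP-P` (`0b1110111`) and `OPMVV` (`0b010`) do not appear in the diff and are assumptions here, and `encode_vgmul_vs` is a hypothetical helper name.

```python
OP_P = 0b1110111   # assumption: value of the OP-P major opcode
OPMVV = 0b010      # assumption: value of the OPMVV funct3 field

def encode_vgmul_vs(vd, vs2):
    """Pack the vector-scalar vgmul fields into a 32-bit instruction word."""
    if not (0 <= vd < 32 and 0 <= vs2 < 32):
        raise ValueError("register numbers must be in the range 0..31")
    fields = [            # (width, value), least-significant field first
        (7, OP_P),
        (5, vd),
        (3, OPMVV),
        (5, 0b10001),     # fixed bits in the vs1 position
        (5, vs2),
        (1, 1),           # fixed '1' bit
        (6, 0b101001),    # funct6 for the vector-scalar form
    ]
    word, shift = 0, 0
    for width, value in fields:
        word |= value << shift
        shift += width
    return word

print(hex(encode_vgmul_vs(vd=1, vs2=2)))
```

The vector-vector form in the same file differs only in the top six bits (`101000` instead of `101001`).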