[extension fast track] extra vector crypto instructions, Zvbc32e/Zvkgs #362

nibrunieAtSi5 · 2023-08-24T13:41:11Z

/!\ This pull request has been moved to the main riscv-isa-manual repository: riscv/riscv-isa-manual#1306

This pull request drafts the changes associated with two fast-track extensions for vector crypto.

During the specification process for vector crypto 1.0.0 a few items had to be discarded because they appeared too late in the process. This fast track extension tries to address some of them.

The official request that will be discussed in the Task Group and submitted to the Unpriv Committee is being drafted here: https://docs.google.com/document/d/1zpYhnZi2NxhjfcBGvPOy0oDhx6lTXchscG17Qcl6wv8/edit?usp=sharing

New features:

Zvbc32e: Extending the vclmul[h].v[vx] instructions to support SEW=32-bit values
- should be available standalone (ELEN >= 32) or in addition to Zvbc (ELEN >= 64)
- no new encoding
Zvkgs: Adding .vs variants to vghsh and vgmul
- should depend on Zvkg
- new encodings
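
As a rough illustration of what the SEW=32 variants would compute per element, here is a minimal Python reference model. The helper names (`clmul`, `vclmul_elem`, `vclmulh_elem`) are hypothetical and chosen for this sketch only; they do not come from the spec or from Spike. The sketch assumes that, as with the SEW=64 instructions in Zvbc, vclmul returns the low SEW bits of the carry-less product and vclmulh the high SEW bits.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers (polynomials over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # XOR instead of add: no carries propagate
        a <<= 1
        b >>= 1
    return result


def vclmul_elem(a: int, b: int, sew: int = 32) -> int:
    """Low SEW bits of the carry-less product (assumed vclmul element behaviour)."""
    mask = (1 << sew) - 1
    return clmul(a & mask, b & mask) & mask


def vclmulh_elem(a: int, b: int, sew: int = 32) -> int:
    """High SEW bits of the carry-less product (assumed vclmulh element behaviour)."""
    mask = (1 << sew) - 1
    return (clmul(a & mask, b & mask) >> sew) & mask
```

For example, multiplying x^31 by x gives x^32, so `vclmul_elem(0x80000000, 2)` is 0 and `vclmulh_elem(0x80000000, 2)` is 1.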

Open questions:

Should Zvbc32e be allowed when ELEN >= 32 without depending on Zvbc ? (Answer: YES)
Should Zvbc32e support SEW=16 ? (SEW=8 ?)
Find encodings
~~How to name the two new extensions~~
Do we need to define a Zvkt(bc/bc32e) extension so that Zvkt also covers the vclmul[h] variants defined in Zvbc32e ?

Related changes:

spike-isa-sim modifications
- add support for Zvbc32e: [vector-crypto] add support for zvbc32e nibrunieAtSi5/riscv-isa-sim#3
encoding proposal in riscv-opcodes repo: Vector crypto fast track nibrunieAtSi5/riscv-opcodes#1
adapt vector crypto code samples for Zvbc32e https://github.com/riscv/riscv-crypto/blob/main/doc/vector/code-samples/zvbc-test.c
adapt vector crypto code samples for Zvkgs https://github.com/riscv/riscv-crypto/blob/main/doc/vector/code-samples/zvkg-test.c
Sail models
- Zvkgs
- Zvbc32e

Draft versions:

Version	pdf
v0.0.1 (August 31st 2023)	https://github.com/riscv/riscv-crypto/files/12487628/riscv-crypto-spec-vector-extra.pdf
v0.0.2 (January 17th 2024)	https://github.com/riscv/riscv-crypto/files/13970691/riscv-crypto-spec-vector-extra.pdf
v0.0.3 (February 1st 2024)	https://github.com/riscv/riscv-crypto/files/14146438/riscv-crypto-spec-vector-extra_v0.0.3.pdf
v0.0.4 (February 6th 2024)	riscv-crypto-spec-vector-extra_v0.0.4.pdf
v0.0.5 (March 7th 2024)	riscv-crypto-spec-vector-extra_v0.0.5.pdf


Original Plan for the fast track schedule

References

project announcement on the unpriv mailing list (https://lists.riscv.org/g/tech-unprivileged/message/568) and the crypto TG mailing list (https://lists.riscv.org/g/tech-crypto-ext/message/944)

kdockser · 2023-08-25T19:15:19Z

Hi Nicolas.
Are you intending to merge these new instructions into the existing Vector Crypto specification? That cannot be done as Vector Crypto is frozen. You will need to create a different specification for these new extensions.

nibrunieAtSi5 · 2023-08-25T19:40:36Z

I was wondering what the best way to eventually specify the fast track would be. I think @jjscheel mentioned that some fast tracks were specified as a patch / diff over existing specifications.
At this point, this patch should not be seen as something that I intend to merge as is, but as a basis for the proposal (describing its content using a diff against the frozen spec under ratification). I will adapt it into a proper patch (certainly in a different directory) if the crypto group agrees with the proposal submission, the committee authorizes the presentation to the ARC, and the ARC gives its approval to move forward.

kdockser · 2023-08-25T21:30:22Z

My suggestion (which could very well be a bad idea) would be to create a new directory under /riscv-crypto/doc for this extension. That would match what we have done for scalar, vector, and vector-all rounds.

The Zvbc32e extension should be able to stand on its own. Thus, for example, it could be implemented to just add 32-bit vclmul* vector instructions without the need for any of the Vector Crypto Extensions. Or it could be implemented to augment Zvbc. It might only find use in Zve32 implementations, or it might be used to add 32-bit clmul to, for example, V implementations.

The Zvkgs extension, on the other hand, only makes sense as an optional augmentation of the Zvkg extension. One option would be to define this as a standalone extension that could optionally be combined with Zvkg as yet another extension. Or, perhaps Zvkgs is the superset. In such a case, I would think that it would reference the Zvkg specification and then define the new variants. I don't think it would be wise to have this extension redefine the existing instructions, even if the exact same words are used - it is best to define each instruction only once.

I hope that helps.

nibrunieAtSi5 · 2023-08-26T05:45:56Z

Thank you @kdockser that helps a lot. I will perform the changes before the next TG meeting.

This reverts commit bc7f527.

This reverts commit a1bfcfc.

This reverts commit 4a59d4b.

…ra/insns/

doc/vector-extra/insns/vclmulh-32e.adoc

Signed-off-by: Nicolas Brunie <[email protected]>

doc/vector-extra/riscv-crypto-spec-vector-extra.adoc

Signed-off-by: Nicolas Brunie <[email protected]>

nibrunie · 2023-08-31T15:24:25Z

Draft generated August 31st 2023 (version 0.0.1):
riscv-crypto-spec-vector-extra.pdf

lianakoleva

Great work! Definitely the next logical step, especially as explicitly outlined as a future direction in the ratified v1.0 spec. Just in case you missed it, I think it would be good to change 128-bit to 64-bit in vclmul.adoc and vclmulh.adoc.

nibrunie · 2024-01-18T00:55:00Z

Draft generated Jan 17th 2024 (thank you to @lianakoleva for spotting typos)
riscv-crypto-spec-vector-extra.pdf

nibrunie · 2024-02-02T23:24:00Z

Draft generated Feb 1st 2024 (more cleanups and self review): riscv-crypto-spec-vector-extra_v0.0.3.pdf

ebiggers · 2024-02-02T23:48:00Z

doc/vector-extra/riscv-crypto-vector-extra-introduction.adoc

+This document describes the proposed _vector_ _extra_ cryptography
+extensions for RISC-V.
+Those extensions extend the _vector_ cryptography extensions for RISC-V,
+providing extra features not mandatory for a high performace implementation but which


What does "not mandatory for a high performance implementation" mean? RISC-V extensions do not specify whether they are mandatory or not. That is the job of separate documents.

Note: performace is misspelled.

This was formulated to distinguish the extra vector crypto extension from the vector crypto extension itself.
This set of instructions is considered to be an addition to the current vector crypto spec.
The vector crypto spec is the main extension implementers and users may want to adopt to get good performance for the supported cryptographic primitives.
These new extensions are additional improvements to the base vector crypto extensions.

I will rephrase this and fix the typo.

ebiggers · 2024-02-02T23:51:19Z

doc/vector-extra/riscv-crypto-vector-extra-introduction.adoc

+extensions for RISC-V.
+Those extensions extend the _vector_ cryptography extensions for RISC-V,
+providing extra features not mandatory for a high performace implementation but which
+can help further improve the efficiency of the algorithms that use them.


What is the "them" in "algorithms that use them"? The new extensions or the existing ones?

It was supposed to be "extra features", so "new extensions".

Well, without the extensions there are no algorithms that use them, so that phrasing is a bit strange. Maybe list some of the specific algorithms that are intended to be improved?

ebiggers · 2024-02-02T23:53:05Z

doc/vector-extra/riscv-crypto-vector-extra-zvkgs.adoc

+[[zvkgs,Zvkgs]]
+=== `Zvkgs` - Vector-Scalar GCM/GMAC
+
+`Zvkgs` depends on `Zvkg`, it extends the existing `vghsh.vv` and `vgmul.vv` instructions with new vector-scalar variants: `vghsh.vs` and `vgmul.vs`.


", it" => ". It "

Suggested change

`Zvkgs` depends on `Zvkg`, it extends the existing `vghsh.vv` and `vgmul.vv` instructions with new vector-scalar variants: `vghsh.vs` and `vgmul.vs`.

`Zvkgs` depends on `Zvkg`. It extends the existing `vghsh.vv` and `vgmul.vv` instructions with new vector-scalar variants: `vghsh.vs` and `vgmul.vs`.

ebiggers · 2024-02-02T23:57:32Z

doc/vector-extra/riscv-crypto-spec-vector-extra.adoc

+[colophon]
+= Colophon
+
+This document describes the Vector Cryptography Extra extensions to the


Is there a better name for this? What name will be used for the next set of "extra" extensions? What is the difference between an extra extension and a non-extra extension?

Zfa is using "additional" which may be better here (https://github.com/riscv/riscv-isa-manual/blob/main/src/zfa.adoc).

That document is titled ""Zfa" Standard Extension for Additional Floating-Point Instructions, Version 1.0". That's less ambiguous because it includes the actual extension code Zfa, which won't be reused by any future extension. This document just refers to "Vector Cryptography Extra extensions" as if there will never be any more. Maybe include the actual extension codes in the title?

Good idea, I will update the title to leave room for future "extra"/additional extensions.

ebiggers · 2024-02-03T00:00:31Z

doc/vector-extra/riscv-crypto-spec-vector-extra.adoc

+It is important to note that the Vector Crypto instructions are independent of the
+implementation of the `Zkt` extension and do not require that `Zkt` is implemented.
+
+//This specification includes a <<Zvkt>> extension that, when implemented, requires certain vector instructions


Why doesn't Zvkt require that these new instructions be constant-time?

Zvkt will require that these new extensions be constant time; I need to clarify this.
More precisely, similarly to Zvkg, which mandates:

To help avoid side-channel timing attacks, these instructions shall be implemented with data-independent timing.

Zvkgs should inherit the same mandate and require constant-time even when Zvkt is not implemented.

Constant time for Zvbc32e will depend on Zvkt.

ebiggers · 2024-02-03T00:02:18Z

doc/vector-extra/riscv-crypto-vector-extra-zvbc32e.adoc

@@ -0,0 +1,23 @@
+[[zvbc32e,Zvbc32e]]
+=== `Zvbc32e` - Vector Carryless Multiplication


Maybe "32-bit Vector Carryless Multiplication"?

Also, are there specific use cases in mind where this feature would be useful? #309 mentioned CRC-32, but I'm not sure why that would be true. CRC-32 implementations that use "folding" with carryless multiplication multiply segments of the data by a 32-bit multiplicand, but those segments of data can be any length. Other CPUs (and the currently ratified Zvbc extension) provide 64-bit carryless multiplication, so CRC-32 implementations typically multiply 64 bits of data by a 32-bit multiplier to get a 95-bit result. You could multiply 32 bits of data by a 32-bit multiplier to get a 63-bit result, but that would result in twice as many multiplications being needed to compute the CRC-32, as each one would process half as much data. That probably wouldn't be faster.

A similar observation applies to e.g. CRC-16 where the multiplications are typically 64-bit by 16-bit.

BTW, if adding 32-bit, why not also add 16-bit and 8-bit? What is special about 32?
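
The operand and result widths discussed above can be checked with a short sketch. This is only an illustration in Python: `clmul` and `clmul64x32_from_32bit` are hypothetical helper names, and the constant used in the usage note is an arbitrary 32-bit value, not a real CRC folding constant. It shows that a 64-bit by 32-bit carry-less product fits in 95 bits, that a 32-bit by 32-bit product fits in 63 bits, and that one 64x32 product can be rebuilt from two 32x32 products, which is where the "twice as many multiplications" comes from.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers (polynomials over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul64x32_from_32bit(data64: int, k32: int) -> int:
    """Rebuild a 64x32 carry-less product from two 32x32 products."""
    lo = data64 & 0xFFFFFFFF
    hi = data64 >> 32
    # Carry-less multiplication distributes over XOR, so the halves can be
    # multiplied separately and recombined with a shift.
    return clmul(lo, k32) ^ (clmul(hi, k32) << 32)


ALL64 = (1 << 64) - 1
ALL32 = (1 << 32) - 1
widest_64x32 = clmul(ALL64, ALL32).bit_length()  # widest possible 64x32 result
widest_32x32 = clmul(ALL32, ALL32).bit_length()  # widest possible 32x32 result
```

With an arbitrary pair such as `0x0123456789ABCDEF` and `0xDEADBEEF`, `clmul64x32_from_32bit` returns the same value as a direct `clmul` call.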

Those are very good points.
There are multiple considerations at play here.

32-bit CLM allows vector implementations with ELEN=32 to support it (which was not possible with Zvbc, since it requires ELEN >= 64)

64-bit or 32-bit CLM for folding CRC does not make a performance difference as far as I can tell but should make a power difference. It is true that in folding CRC the data segment block can be of arbitrary length but the constant multiplicand is of the size of the CRC so in the case of a CRC-32 at least half the 64-bit CLM would be unused (which might not be an issue on all micro-arch, but 32-bit CRC should allow easier determination of expected multiplier activity).

And finally there is nothing special about 32-bit. It was initially introduced because of the ELEN=32 case, but we listed in the questions above

Should Zvbc32e support SEW=16 ? (SEW=8 ?)

I think there is a good case for supporting other SEW:

no new encoding required

should allow other use cases

If we want to go that route the question is:

Should we split support for the different SEW values into separate sub-extensions (Zvbc16e, Zvbc8e) so people can pick and choose ? I think this might not be such a good idea, because I don't expect supporting smaller element widths to have a big cost (although I have not measured it yet) and the added complexity of numerous extensions does not seem worth it.

(One of the options that was discarded for the fast track was a widening CLM merging vclmul and vclmulh together, but it seems too big of a change to fit into a fast track, and the potential impact of having to switch SEW back and forth has not yet been properly evaluated on the target use cases.)

ebiggers · 2024-02-03T00:03:00Z

doc/vector-extra/riscv-crypto-vector-extra-zvkgs.adoc

+
+`Zvkgs` depends on `Zvkg`, it extends the existing `vghsh.vv` and `vgmul.vv` instructions with new vector-scalar variants: `vghsh.vs` and `vgmul.vs`.
+
+Instructions to enable the efficient implementation of parallel versions of GHASH~H~ which is used in Galois/Counter Mode (GCM) and


This is not a complete sentence.

More importantly, it's not clear to me how these vs instructions for GHASH are helpful, because parallelized GHASH requires multiplying by powers of the key (like H, H^2, H^3, H^4, ...). Does that not preclude the use of these vs instructions?

There are multiple different implementations of parallel GHASH.
One implementation that processes 4 blocks in parallel uses a vector of 4 constants H^4 in the loop body, which could leverage the .vs variants. It still needs a reduction with H, H^2, H^3 in the loop epilogue.
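
To make the 4-way scheme above concrete, here is a minimal Python sketch under stated assumptions. The function names (`gf128_mul`, `ghash_serial`, `ghash_4way`) are hypothetical. The field multiplication uses plain (non-reflected) bit order with the polynomial x^128 + x^7 + x^2 + x + 1, which is a simplification: real GHASH uses a reflected bit ordering, but the algebraic identity being illustrated does not depend on it. The loop body multiplies all four lanes by the same constant H^4 (the step a .vs variant could serve), and the epilogue uses H^4, H^3, H^2 and H before XOR-ing the lanes together.

```python
POLY = (1 << 128) | 0x87  # x^128 + x^7 + x^2 + x + 1


def gf128_mul(a: int, b: int) -> int:
    """Multiply in GF(2^128); plain bit order, assumed for illustration only."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> 128:
            a ^= POLY  # reduce as soon as the degree reaches 128
    return r


def ghash_serial(h: int, blocks: list) -> int:
    """Serial form: Y = (Y xor X_i) * H for each block."""
    y = 0
    for x in blocks:
        y = gf128_mul(y ^ x, h)
    return y


def ghash_4way(h: int, blocks: list) -> int:
    """4-lane form: shared H^4 in the loop body, mixed powers in the epilogue."""
    assert blocks and len(blocks) % 4 == 0
    h2 = gf128_mul(h, h)
    h3 = gf128_mul(h2, h)
    h4 = gf128_mul(h3, h)
    groups = [blocks[i:i + 4] for i in range(0, len(blocks), 4)]
    acc = [0, 0, 0, 0]
    # Loop body: every lane multiplies by the same scalar constant H^4.
    for group in groups[:-1]:
        acc = [gf128_mul(a ^ x, h4) for a, x in zip(acc, group)]
    # Epilogue: the last group needs a different power of H per lane.
    tail_powers = [h4, h3, h2, h]
    acc = [gf128_mul(a ^ x, p) for a, x, p in zip(acc, groups[-1], tail_powers)]
    return acc[0] ^ acc[1] ^ acc[2] ^ acc[3]
```

For any key and any block count that is a multiple of 4, both functions should return the same value; handling of a partial final group is left out of this sketch.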

I rephrased things a bit (in the upcoming v0.0.4). The introduction sentence is re-used from https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-vector-zvkg.adoc.

I need to work on an example of use of these instructions.

Signed-off-by: Nicolas Brunie <[email protected]>

nibrunie · 2024-02-07T04:29:08Z

Draft generated Feb 6th 2024:
riscv-crypto-spec-vector-extra_v0.0.4.pdf

Changelog:

Integrating @ebiggers' feedback

nibrunieAtSi5 · 2024-03-07T23:21:39Z

I did some experiments around a multi-width vector carry-less multiply unit.
This unit has a 128-bit datapath and supports high or low operations.
I tested several variants, with one or more element widths supported. The unit provides 128 / SEW results (e.g. 2 results for 64-bit vclmul or vclmulh, 4 results for 32-bit, ...).
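
A behavioural sketch of such a multi-width unit, written in Python for illustration only: `clmul_unit` and its parameters are hypothetical names, and the model says nothing about the actual hardware structure or the synthesized design. It slices two 128-bit operands into 128 / SEW lanes and returns either the low or the high half of each per-lane carry-less product, packed back into 128 bits.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two non-negative integers (polynomials over GF(2))."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        b >>= 1
    return result


def clmul_unit(op_a: int, op_b: int, sew: int, high: bool = False,
               width: int = 128) -> int:
    """Model of a width-bit datapath producing width // sew results per pass."""
    assert sew in (8, 16, 32, 64)
    mask = (1 << sew) - 1
    out = 0
    for lane in range(width // sew):
        a = (op_a >> (lane * sew)) & mask
        b = (op_b >> (lane * sew)) & mask
        prod = clmul(a, b)
        # high=True models the vclmulh-style result, high=False the vclmul one.
        res = (prod >> sew) if high else (prod & mask)
        out |= res << (lane * sew)
    return out
```

With SEW=8 and every byte of both operands set to 3, each of the 16 lanes yields 5 in low mode and 0 in high mode.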

All synthesis runs were done at the same frequency / corner / tech node.
The max number of logic levels is 17 for the 8/16/32/64 variant.

Here are the normalized results:

adding 32-bit support costs about 14% extra area for the vector carry-less multiplier (combinational area increase)
adding 16 and 32-bit support costs about 25% extra area (+9 points from adding 32-bit only)
adding 8, 16 and 32-bit support costs about 32% extra area

nibrunie · 2024-03-08T04:27:51Z

Draft generated March 7th 2024 (v0.0.5)
riscv-crypto-spec-vector-extra_v0.0.5.pdf

Changelog:

Numerous typo fixes and corrections
Adding RVV encoding array (with Zvbc32e in bold)

wmat · 2024-03-22T17:11:22Z

Note that this PR will need to be applied against the integrated chapter in the riscv-isa-manual repository if it is still relevant. This repository will be made read only and archived.

nibrunieAtSi5 · 2024-03-22T17:12:47Z

Note that this PR will need to be applied against the integrated chapter in the riscv-isa-manual repository if it is still relevant. This repository will be made read only and archived.

Ok, I will transition to a PR on the main repo.

nibrunieAtSi5 · 2024-03-28T14:38:38Z

Moving to the main riscv-isa-manual repository: riscv/riscv-isa-manual#1306

nibrunieAtSi5 added 3 commits August 14, 2023 02:39

[Zv fast track] prototyping vclmul* changes

4a59d4b

[Zv fast track] prototyping vg* changes

a1bfcfc

Completing vghsh.vs/vgmul.vs descriptions

bc7f527

nibrunieAtSi5 and others added 12 commits August 27, 2023 10:35

adding directory with vector-crypto extra skeleton

6b8eadb

Revert "Completing vghsh.vs/vgmul.vs descriptions"

c10f745

This reverts commit bc7f527.

Revert "[Zv fast track] prototyping vg* changes"

4e70f70

This reverts commit a1bfcfc.

Revert "[Zv fast track] prototyping vclmul* changes"

b0af277

This reverts commit 4a59d4b.

refactoring Zvkgs and vghsh.vs specifications

72084cd

fixing vghsh.vs/vgmul.vs descriptions

8c5a9f2

adding vclmul/vclmulh instruction specification for Zve32e

056dd04

moving vghsh.vs/vgmul.vs spec from doc/vector-extra to doc/vector-ext…

11bd8af

…ra/insns/

adding instruction table

5e836da

main document for vector extra

c986a6f

renaming vclmul/vclmulh 32e spec files

4ae2021

fixing vector-extra build issues

0083833