A plan for developing the RISC-V architectural tests for the Scalar Crypto Extension.
The point of this test plan is to:
- Explain what the RISC-V architectural tests try to achieve, both generally and for the scalar crypto instructions in particular.
- List the kinds of coverage that the architectural tests try to meet, and explain which coverage cases are more or less important for different kinds of instruction.
- Act as a starting point for verification engineers writing verification plans. It describes real-world usage patterns of the instructions, which constrained random stimulus generation flows can focus on.
Some simple stimulus patterns are described here and referred to later when talking about individual instructions.
- single-bit-1 - Each source register input has a single bit set. Test for all bits 0<=i<XLEN. Likewise, have single-bit-0. These are sometimes also referred to as walking ones or walking zeros.
- uniform-random - Each source register input is a uniform random number, XLEN bits long.
- byte-count - Each source register input is divided into bytes, and each byte is incremented individually, starting at zero. Hence, for RV32, the first two input patterns would be 0x03020100 and 0x07060504.
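The three stimulus patterns above can be sketched as small generators. This is a minimal illustration, not part of the test suite: the function names and the fixed random seed are assumptions made for the example.

```python
import random


def single_bit_1(xlen):
    """Walking ones: one value per bit position 0 <= i < xlen."""
    return [1 << i for i in range(xlen)]


def single_bit_0(xlen):
    """Walking zeros: every bit set except bit i."""
    mask = (1 << xlen) - 1
    return [mask ^ (1 << i) for i in range(xlen)]


def uniform_random(xlen, n, seed=0):
    """n uniform random xlen-bit values (seeded so runs are repeatable)."""
    rng = random.Random(seed)
    return [rng.getrandbits(xlen) for _ in range(n)]


def byte_count(xlen, n):
    """n words whose bytes count up from zero, least-significant byte first."""
    nbytes = xlen // 8
    words = []
    for w in range(n):
        value = 0
        for b in range(nbytes):
            value |= ((w * nbytes + b) & 0xFF) << (8 * b)
        words.append(value)
    return words
```

For example, `byte_count(32, 2)` reproduces the two RV32 words quoted above, `0x03020100` and `0x07060504`.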
Note: Where some unknown number of test vectors will be needed to hit coverage, this is usually left as N, which can be tuned later. E.g. "generate N uniform random numbers…".
These are coverage points relevant for single instructions, and act as a bare minimum standard to hit for every instruction.
- Have all values of rd, rs1 and rs2 been covered where applicable?
- Have we seen:
  - rd==rs1, rd!=rs1
  - rd==rs2, rd!=rs2
  - rs1==rs2, rs1!=rs2
- The immediates for all of the scalar crypto instructions are either 2 or 4 bits, so we should aim for complete coverage of these.
- Have we seen every bit set for each register input?
- Have we seen every bit clear for each register input?
- For instructions with an SBox (AES, SM4), do we have complete input coverage for each input to the SBox? For all instructions, this is just 0..255 for each input byte.
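One way to track the register-input coverage points above is a small collector that is sampled once per executed instruction. This is only a sketch: the class and method names are hypothetical, and a real flow would more likely use the coverage features of its own verification environment.

```python
class InputCoverage:
    """Tracks bit-set, bit-clear and per-byte (SBox input) coverage for one operand."""

    def __init__(self, xlen):
        self.mask = (1 << xlen) - 1
        self.seen_set = 0      # bit i becomes 1 once input bit i was seen as 1
        self.seen_clear = 0    # bit i becomes 1 once input bit i was seen as 0
        self.seen_bytes = [set() for _ in range(xlen // 8)]

    def sample(self, value):
        value &= self.mask
        self.seen_set |= value
        self.seen_clear |= ~value & self.mask
        for lane, seen in enumerate(self.seen_bytes):
            seen.add((value >> (8 * lane)) & 0xFF)

    def every_bit_set_seen(self):
        return self.seen_set == self.mask

    def every_bit_clear_seen(self):
        return self.seen_clear == self.mask

    def sbox_inputs_complete(self):
        # Complete when every byte lane has seen all values 0..255.
        return all(len(seen) == 256 for seen in self.seen_bytes)


def register_relation_bins(rd, rs1, rs2):
    """Which of the register-index equality bins a single instruction hits."""
    return {"rd==rs1": rd == rs1, "rd==rs2": rd == rs2, "rs1==rs2": rs1 == rs2}
```

A false entry in `register_relation_bins` corresponds to the matching `!=` bin, so both halves of each pair can be accumulated from the same call.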
The RV32 instructions have been put into groups of instructions which are similar from a coverage and stimulus perspective.
    aes32dsi  rt, rs2, bs
    aes32dsmi rt, rs2, bs
    aes32esi  rt, rs2, bs
    aes32esmi rt, rs2, bs
    sm4ed     rt, rs2, bs
    sm4ks     rt, rs2, bs
All of these instructions have the same basic input patterns, and apply an SBox to a single byte of rs2.
The bs immediate is 2 bits, and is used to select a byte of rs2 for further processing.
Note: The aes32* and sm4* instructions read and write the rt register. It can be thought of as both rs1 and rd.
- Test pattern 1: SBox Testing
  - This uses the byte-count pattern described above.
  - Generate a 256-byte sequence 0..255 and pack the sequence into 32-bit words.
  - Each word in the sequence is the rs2 input. The rs1 input is set to zero so we do not alter the SBox output value.
  - For each input word, generate 4 instructions, with bs=0..3. This will mean that every possible SBox input pattern is tested.
- Test pattern 2: Uniform Random
  - Generate uniform random values for rs1, rs2 and bs.
  - Let register indices be unconstrained: 0..31.
  - Repeat N times for each instruction until sufficient coverage is reached.
- Test pattern 3: Real-world patterns
  - Execute 4 of each instruction adjacently. Each instruction has the same rd and rs1 value, a different rs2 and a different bs value. This mimics how the instructions will appear in real-world code, and tests things like pipeline forwarding.

        li a0, <random>
        li a1, <random>
        li a2, <random>
        li a3, <random>
        li a4, <random>
        aes32* a4, a0, 0   // This is the expected use-case sequence
        aes32* a4, a1, 1   // for these instructions.
        aes32* a4, a2, 2
        aes32* a4, a3, 3
Note: These instructions are unlikely to ever appear interleaved with one another, so this pattern is left out for now. Forwarding between like-instructions is much more common.
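Test pattern 1 for this group can be sketched as a small generator that emits assembly text. The register choices (a0 as rt, a1 as rs2) and the helper names are assumptions made for illustration. Because rt is both read and written, the sketch reloads it with zero before every instruction.

```python
def rv32_sbox_inputs():
    """The 256-byte sequence 0..255 packed into 64 little-endian 32-bit words."""
    data = bytes(range(256))
    return [int.from_bytes(data[i:i + 4], "little") for i in range(0, 256, 4)]


def rv32_sbox_test(mnemonic):
    """Emit one instruction per (input word, bs) combination for `mnemonic`."""
    lines = []
    for word in rv32_sbox_inputs():
        for bs in range(4):
            lines.append(f"li a1, 0x{word:08x}")       # rs2: four SBox input bytes
            lines.append("li a0, 0")                   # rt (rs1/rd) cleared each time
            lines.append(f"{mnemonic} a0, a1, {bs}")   # bs selects one byte of rs2
    return lines
```

Calling `rv32_sbox_test("aes32esi")` yields 256 `aes32esi` instructions, one for each possible SBox input byte.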
    sha256sig0 rd, rs1
    sha256sig1 rd, rs1
    sha256sum0 rd, rs1
    sha256sum1 rd, rs1
    sm3p0      rd, rs1
    sm3p1      rd, rs1
These instructions are all designed to accelerate hash functions, and essentially perform rotations and/or shifts of rs1 by several different constants, before xor'ing the results together.
- Test pattern 1: Single bit testing
  - For each instruction, generate XLEN inputs with a single bit set.
  - For each instruction, generate XLEN inputs with a single bit clear.
- Test pattern 2: Uniform random.
  - For each instruction, generate N XLEN-bit uniform random inputs.
- Test pattern 3: Real-world usage.
  - Check forwarding result of add/xor/not/andn/add instruction into these instructions.
  - Check forwarding result of these instructions into add/xor/not/andn/add instructions.
  - Check load-to-use hazard into these instructions.
  - Check forwarding of these instructions into rs1 of sw instruction.
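A reference model makes it easier to see why single-bit testing is effective for this group. The sketch below assumes the four SHA-256 instructions compute the standard sigma functions; the rotation and shift constants are taken from FIPS 180-4, not from this plan, and the Python function names simply mirror the mnemonics.

```python
MASK32 = 0xFFFFFFFF


def ror32(x, n):
    """Rotate a 32-bit value right by n bits (0 < n < 32)."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def sha256sig0(x):
    x &= MASK32
    return ror32(x, 7) ^ ror32(x, 18) ^ (x >> 3)


def sha256sig1(x):
    x &= MASK32
    return ror32(x, 17) ^ ror32(x, 19) ^ (x >> 10)


def sha256sum0(x):
    x &= MASK32
    return ror32(x, 2) ^ ror32(x, 13) ^ ror32(x, 22)


def sha256sum1(x):
    x &= MASK32
    return ror32(x, 6) ^ ror32(x, 11) ^ ror32(x, 25)
```

Each function is linear over xor, so `f(a ^ b) == f(a) ^ f(b)`. The single-bit-set inputs therefore exercise every bit path through the rotations and shifts individually, and the uniform random inputs then check that the paths combine correctly.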
    sha512sig0h rd, rs1, rs2
    sha512sig0l rd, rs1, rs2
    sha512sig1h rd, rs1, rs2
    sha512sig1l rd, rs1, rs2
    sha512sum0r rd, rs1, rs2
    sha512sum1r rd, rs1, rs2
These instructions are similar to the SHA2-256 and SM3 instructions.
The rs1 and rs2 operands are shifted left/right by several constants, then xor'd together.
Note: The plan for these instructions is identical to the one for SHA2-256 and SM3, but with an additional register input to cover.
- Test pattern 1: Single bit testing
  - For each instruction, generate XLEN inputs with a single bit set. Do this for each of rs1 and rs2.
  - For each instruction, generate XLEN inputs with a single bit clear. Do this for each of rs1 and rs2.
- Test pattern 2: Uniform random.
  - For each instruction, generate N XLEN-bit uniform random inputs for rs1 and rs2.
- Test pattern 3: Real-world usage.
  - Check forwarding result of add/xor/not/andn/add instruction into these instructions.
  - Check forwarding result of these instructions into add/xor/not/andn/add instructions.
  - Check load-to-use hazard into these instructions.
  - Check forwarding of these instructions into rs1 of sw instruction.
The RV64 instructions have been put into groups of instructions which are similar from a coverage and stimulus perspective.
    aes64ds  rd, rs1, rs2
    aes64dsm rd, rs1, rs2
    aes64es  rd, rs1, rs2
    aes64esm rd, rs1, rs2
- Test pattern 1: SBox Testing
  - This uses the byte-count pattern described above.
  - Generate a 256-byte sequence 0..255 and pack the sequence into 64-bit words.
  - For each pair of 64-bit words i and j, where j=i+1:
    - Execute two of each instruction. One where rs1=i, rs2=j, and one where rs1=j and rs2=i. Store the results of each instruction to the signature.
- Test pattern 2: Uniform Random Testing
  - For rs1 and rs2, generate uniform random values and store the results to the signature.
- Test pattern 3: Real-world usage
  - Execute two adjacent instructions of the same type, with:
    - Different destination registers.
    - The first instruction has rs1=x, rs2=y, and the second instruction has rs1=y, rs2=x.
    - This is the most common usage pattern for the instructions.
  - Forward the result of an xor instruction into the instructions and vice-versa.
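The operand pairs for test pattern 1 of this group can be sketched as follows. The plan does not say whether the word pairs overlap; this sketch assumes consecutive, overlapping pairs (word 0 with word 1, word 1 with word 2, and so on), and the function names are hypothetical.

```python
def rv64_sbox_words():
    """The 256-byte sequence 0..255 packed into 32 little-endian 64-bit words."""
    data = bytes(range(256))
    return [int.from_bytes(data[i:i + 8], "little") for i in range(0, 256, 8)]


def rv64_sbox_operand_pairs():
    """(rs1, rs2) values: each adjacent word pair, in both operand orders."""
    words = rv64_sbox_words()
    pairs = []
    for i in range(len(words) - 1):
        first, second = words[i], words[i + 1]
        pairs.append((first, second))   # rs1 = word i,   rs2 = word i+1
        pairs.append((second, first))   # rs1 = word i+1, rs2 = word i
    return pairs
```

Under that assumption each of the four instructions is executed 62 times, and every byte value 0..255 appears in both the rs1 and the rs2 operand.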
aes64ks1i rd, rs1, rcon
This instruction applies the AES Forward SBox to the low 32-bits of rs1, with an optional rotation and xor depending on rcon.
rcon is 4-bits wide, with only values 0<=rcon<=0xA permitted.
- Test pattern 1: SBox coverage
  - Uses the byte-count pattern described above.
  - Generate 64 double-word inputs, such that the low 4 bytes of each double-word completely cover the 0..255 SBox input space.
  - Execute one instruction per double-word input to get complete SBox input coverage.
  - The rcon immediate should be set to 0xA for this, to avoid it altering the SBox output value and make debugging easier.
- Test pattern 2: Uniform Random testing
  - Generate random 64-bit values for rs1 and random 4-bit values for rcon, where 0<=rcon<=0xA. Record each result to the signature.
aes64ks2 rd, rs1, rs2
This instruction simply performs xor operations between high and low words of rs1 and rs2 to produce a result.
- Test pattern 1: Single bit testing
  - Generate XLEN inputs with a single bit set.
  - Generate XLEN inputs with a single bit clear.
- Test pattern 2: Uniform random.
  - Generate N XLEN-bit uniform random inputs.
    sha256sig0 rd, rs1
    sha256sig1 rd, rs1
    sha256sum0 rd, rs1
    sha256sum1 rd, rs1
    sha512sig0 rd, rs1   (RV64 Only)
    sha512sig1 rd, rs1   (RV64 Only)
    sha512sum0 rd, rs1   (RV64 Only)
    sha512sum1 rd, rs1   (RV64 Only)
    sm3p0      rd, rs1
    sm3p1      rd, rs1
    aes64im    rd, rs1   (RV64 Only)
The SHA256 and SM3 instructions listed here are very similar to the RV32 SHA256 and SM3 instructions listed above, but they zero-extend their 32-bit outputs and ignore the high 32 bits of their inputs.
The SHA512 instructions are similar to the SHA256 instructions, but work across the entire 64-bits of the input.
The aes64im instruction implements the AES Inverse MixColumn transform on each 32-bit word of rs1.
- Test pattern 1: Single bit testing
  - Generate XLEN inputs with a single bit set.
  - Generate XLEN inputs with a single bit clear.
- Test pattern 2: Uniform random.
  - Generate N XLEN-bit uniform random inputs.
- Test pattern 3: Real-world usage - SHA and SM3
  - Check forwarding result of add/xor/not/andn/add instruction into these instructions.
  - Check forwarding result of these instructions into add/xor/not/andn/add instructions.
  - Check load-to-use hazard into these instructions.
  - Check forwarding of these instructions into rs1 of sw instruction.
    sm4ed rt, rs2, bs
    sm4ks rt, rs2, bs
Note: These instructions are identical to the RV32 versions, but are also available on RV64. On RV64, they ignore the high 32-bits of their register inputs, and zero extend the low 32-bits of their outputs. The same test plan may be used, accounting for the wider registers on RV64.
Note: It is worth having a copy of the specification ready for this.
The Entropy Source Extension consists of two machine-mode CSRs, and two pseudo-instructions to access them:
- pollentropy rd: An alias for csrrs rd, mentropy, x0.
- getnoise rd: An alias for csrrs rd, mnoise, x0.
- It must be possible to read and write mnoise in machine mode.
  - If mnoise is not implemented, it must always return zeros.
  - A test can check whether mnoise is implemented by checking whether it can set and clear bit 31 (NOISE_TEST). This is the only architecturally defined bit.
  - Tests must determine if mnoise is implemented first, before checking any other behaviour, and accommodate this case in the test signature.
- Accesses to mnoise in any privilege mode other than machine mode must raise an Illegal Instruction Exception.
Note: mnoise may behave differently before tape-out or validation than it does after post-silicon validation. This is because it is designed as a validation / certification interface to check that the noise source is functioning correctly. Once the noise source is validated, the interface may be disabled permanently. Tests must account for this in their signature generation.
The following tests must be written specifically for the mentropy CSR related behaviour.
- This is a machine-mode, read-only CSR. Tests should check that it is accessible only in machine mode.
- Per section 2.1 of the privileged architecture specification: any write to mentropy must raise an Illegal Instruction Exception. Tests must check this for all variants of CSR write instructions.
The following tests must be written to check for behaviour related to values read from the mentropy CSR.
- If the returned OPST field is not ES16, then the SEED field must be zero. A test may check this by reading pollentropy many times, and setting a bit iff OPST!=ES16 && SEED!=0 is ever seen. Coverage bins should be used to check that pollentropy returned different values of OPST.
- On RV64, the upper 32-bits of the return value must be zero.
- When mnoise.NOISE_TEST=1, then pollentropy must always return with OPST=BIST.
- The wfi instruction must be implemented, and not raise an Illegal Instruction Exception unless the mstatus.TW bit is set. The wfi instruction may be implemented as a nop. It is sufficient to check that wfi executes without raising an Illegal Instruction Exception when mstatus.TW=0 using something like a contrived timer interrupt.
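The OPST/SEED check on values returned by pollentropy can be modelled offline as below. The field positions used as defaults (OPST in bits 31:30, SEED in bits 15:0) are assumptions that must be confirmed against the specification, and the ES16 encoding is deliberately left as a caller-supplied argument because this plan does not state it.

```python
def check_mentropy_samples(samples, es16, opst_shift=30, opst_mask=0x3,
                           seed_mask=0xFFFF):
    """Return (violation_seen, upper_bits_seen, opst_values_seen).

    violation_seen   -- True iff some sample had OPST != ES16 and SEED != 0
    upper_bits_seen  -- True iff some sample had non-zero bits above bit 31
    opst_values_seen -- set of OPST values observed (feeds the coverage bins)
    """
    violation_seen = False
    upper_bits_seen = False
    opst_values_seen = set()
    for value in samples:
        opst = (value >> opst_shift) & opst_mask
        seed = value & seed_mask
        opst_values_seen.add(opst)
        if opst != es16 and seed != 0:
            violation_seen = True
        if value >> 32:
            upper_bits_seen = True
    return violation_seen, upper_bits_seen, opst_values_seen
```

In an actual test the same comparison would be done in assembly and only the resulting flag bits written to the signature, since the sampled values themselves are not reproducible.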
The scalar crypto ISE places additional constraints on instructions which are present in the base ISA, or Bitmanip standard extension.
    mul    rd, rs1, rs2
    mulh   rd, rs1, rs2
    mulhu  rd, rs1, rs2
    mulhsu rd, rs1, rs2
    mulw   rd, rs1, rs2
    clmul  rd, rs1, rs2
    clmulh rd, rs1, rs2
Per section 3.6 of the scalar crypto extension draft specification, all of these instructions must execute in constant time with respect to their inputs when rs1 <= rs2.
If they do not, they create a (remotely) exploitable timing channel and are insecure from a cryptographic perspective. Common micro-architectural performance optimisations for these instructions include early termination and macro-op fusion.
Note: Do we also need to consider operand memoisation for multiplication? Yes: It does introduce a timing channel. No: That timing channel is very hard to exploit.
- Test pattern 1: Leading Ones
  - For each rs register input, generate a random XLEN-bit input value, and set the most-significant i bits. For the other rs input, pick a random value.
  - Repeat for values 0<=i<=XLEN. The i value can be stepped by a value greater than 1 to manage the test size.
- Test pattern 2: Leading Zeros.
  - Repeat test pattern 1, but clear the top i bits instead.
- Test pattern 3: Trailing Zeros
  - Repeat test pattern 1, but clear the least-significant i bits instead.
- Test pattern 4: Trailing Ones
  - Repeat test pattern 1, but set the least-significant i bits instead.
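The four operand-shaping patterns above can be sketched as follows. The function names, the fixed seed and the `step` parameter are assumptions made for illustration; `step` corresponds to stepping i by more than 1 to manage the test size.

```python
import random


def leading_ones(value, i, xlen):
    """Set the i most-significant bits of an xlen-bit value."""
    mask = (1 << xlen) - 1
    top = (mask << (xlen - i)) & mask
    return (value | top) & mask


def leading_zeros(value, i, xlen):
    """Clear the i most-significant bits of an xlen-bit value."""
    mask = (1 << xlen) - 1
    top = (mask << (xlen - i)) & mask
    return value & ~top & mask


def trailing_zeros(value, i, xlen):
    """Clear the i least-significant bits of an xlen-bit value."""
    mask = (1 << xlen) - 1
    return value & ~((1 << i) - 1) & mask


def trailing_ones(value, i, xlen):
    """Set the i least-significant bits of an xlen-bit value."""
    mask = (1 << xlen) - 1
    return (value | ((1 << i) - 1)) & mask


def multiplier_operands(xlen, step=1, seed=0):
    """Yield (rs1, rs2) pairs: one operand shaped, the other uniform random."""
    rng = random.Random(seed)
    for pattern in (leading_ones, leading_zeros, trailing_zeros, trailing_ones):
        for i in range(0, xlen + 1, step):
            for shaped_operand in ("rs1", "rs2"):
                shaped = pattern(rng.getrandbits(xlen), i, xlen)
                other = rng.getrandbits(xlen)
                yield (shaped, other) if shaped_operand == "rs1" else (other, shaped)
```

With `xlen=32` and `step=8` this yields 40 operand pairs: four patterns, five values of i, and two choices of shaped operand.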
After executing each test input, the rdcycle instruction is used to record the amount of time taken to execute the relevant multiply instruction.
Each execution time is recorded and compared to the previous measurement.
If the two are not identical, a fail code is recorded to the test signature, along with the inputs which caused the failure.
It may be more accurate to run several multiplication instructions in sequence, so as to amortise any overhead introduced by rdcycle.
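The comparison rule described above, where each measurement is checked against the one before it, can be modelled as below. This is a sketch of the checking logic only; the tuple layout and function name are assumptions, and the cycle counts would come from rdcycle readings taken on the device under test.

```python
def find_timing_failures(measurements):
    """measurements: iterable of (rs1, rs2, cycles) in execution order.

    Returns the entries whose cycle count differs from the previous
    measurement, i.e. the inputs to record alongside a fail code.
    """
    failures = []
    previous_cycles = None
    for rs1, rs2, cycles in measurements:
        if previous_cycles is not None and cycles != previous_cycles:
            failures.append((rs1, rs2, cycles))
        previous_cycles = cycles
    return failures
```

Note that comparing only against the previous measurement flags each change in latency once: a run of 5, 5, 3, 3 cycles reports only the first 3-cycle input.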
Caution: Will this give consistent results on modern micro-architectures? Can we expect rdcycle ordering with respect to the multiplies to be respected?
Chapter 10 of the user-level ISA spec has a long discussion on how defining a cycle is hard, and offers no guarantees of portability.
Hence, it becomes much easier to identify when multiplication is not constant time (and so insecure), but very hard to portably show that multiplication is constant time.
We do not want to artificially limit the range of possible implementations due to unnecessarily restrictive compliance tests.
As well as individual instructions, recommended fusion pairs must also be tested. These are:
    mulhu  ra, rs1, rs2   // ra != rs1, rs2
    mul    rb, rs1, rs2   // rb != ra, rs1, rs2
and
    clmulh ra, rs1, rs2   // ra != rs1, rs2
    clmul  rb, rs1, rs2   // rb != ra, rs1, rs2
The same set of test patterns can be used, treating rs1, rs2 as a single 2*XLEN-bit input.
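Treating the two source operands as one wide value can be sketched as below. The plan does not say which register holds the upper half; this sketch assumes rs1 is the high XLEN bits and rs2 the low XLEN bits, and the register names in the emitted text are placeholders.

```python
def split_wide_input(wide, xlen):
    """Split a 2*xlen-bit pattern into (rs1, rs2); rs1 is assumed to be the high half."""
    mask = (1 << xlen) - 1
    return (wide >> xlen) & mask, wide & mask


def fusion_pair(high_op, low_op, ra="a0", rb="a1", rs1="a2", rs2="a3"):
    """Emit the two adjacent instructions of a fusion pair with shared sources."""
    assert len({ra, rb, rs1, rs2}) == 4, "destinations must differ from each other and from the sources"
    return [f"{high_op} {ra}, {rs1}, {rs2}", f"{low_op} {rb}, {rs1}, {rs2}"]
```

For example, a leading-ones pattern with i=36 on RV32 becomes a 64-bit value with the top 36 bits set, which splits into rs1 = 0xFFFFFFFF and rs2 = 0xF0000000 plus whatever random low bits were chosen.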