Zcb: Code Size Reduction Extension #3362

Abdulwadoodd · 2023-05-15T03:02:55Z

Type of change: other enhancement

Impact: API addition (no impact on existing code)

Development Phase: implementation

Background
RISC-V code size reduction extension supersedes the existing standard C (compressed) extension. It divides instructions in the C extension into three sub-extensions (Zca, Zcf, and Zcd) and exploits the reserved encodings in C to add more instructions in three sub-extensions (Zcb, Zcmp and Zcmt).
RocketCore already supports C-extension i.e. Zca, Zcf, and Zcd. The un-privilege spec classifies compressed instructions into three different parts: Quadrant 0, Quadrant 1, and Quadrant 2. This classification is done based on the Opcode of compressed instruction: bits[1:0].

Implementation
Zcb instructions fall into Quadrant 0 and Quadrant 1 having opcodes 2'b00 and 2'b01 respectively. Hence, Implementation is done in two parts:

Part 1:
In part 1, the Opcodes of the instructions belonging to Quadrant 0 are added, and the corresponding 16-bit->32-bit decode logic is hooked up with Q0 logic for the existing instructions. The instructions in this part are the compressed version of I-type load-store instructions: lbu, lh, lhu, sb, sh.

Part 2:
In part 2, the Opcodes of the instructions belonging to Quadrant 1 are added, and the corresponding 16-bit->32-bit decode logic is hooked up with Q1 logic for the existing instructions. The instructions in this part are the has dependency on I-type, M-type, and Bit-manip (Zba, Zbb) instructions: c.zext.b, c.sext.b, c.zext.h, c.sext.h, c.zext.w, c.mul, c.not.

Verification
A simple assembly program is written including all the Zcb instructions to test their functionality. It is available here. It is a bit crude way of testing in the early stage. A plan to develop more mature tests covering the corner cases is in the pipeline which will come as part of riscv-arch tests.

Release Notes
This is one of three PR as part of the GSoC'23 project under the mentorship Hongren Zheng, to add Zc* support in Rocket. This PR adds hardware implementation to support the Zcb extension which is one of the variants of the recently ratified code size reduction extension. RocketCore has the infrastructure for the uncompressed version of Zcb instructions. So, mainly there are changes implemented in RVCDecoder to decode the compressed Zcb instruction into its 32-bit equivalent instructions. The implementation works for both 32-bit and 64-bit architectures.

Acknowledgement
Thanks to @ZenithalHourlyRate for his guidance and help.

linux-foundation-easycla · 2023-05-15T03:02:59Z

The committers listed above are authorized under a signed CLA.

✅ login: Abdulwadoodd / name: Wadood (e160e98, b943cfb, face7de, 323d57d, 4a67839, 98064d1, 0ebed64, 05ee78a)

ZenithalHourlyRate

Generally LGTM. Have not checked the detail of the RTL implementation against the spec. Some other small changes are needed.

First, this should be made against the dev branch, it will then be backported the to master branch. @sequencer should we add a contribution guide in the PR template as there are multiple contributors that have fallen into this pitfall?

As for the misc changes, they should be introduced in another PR.

You should also add zcb to the dts. zca should also be included. In the future when you have implemented all Zc*, a zce should be added.

ZenithalHourlyRate · 2023-05-15T06:08:48Z

src/main/scala/rocket/RVC.scala

    def sd = inst(Cat(ldImm >> 5, rs2p, rs1p, 3.U(3.W), ldImm(4,0), 0x23.U(7.W)), rs2p, rs1p, rs2p)
    def sw = inst(Cat(lwImm >> 5, rs2p, rs1p, 2.U(3.W), lwImm(4,0), 0x23.U(7.W)), rs2p, rs1p, rs2p)
    def fsd = inst(Cat(ldImm >> 5, rs2p, rs1p, 3.U(3.W), ldImm(4,0), 0x27.U(7.W)), rs2p, rs1p, rs2p)
    def fsw = {
      if (xLen == 32) inst(Cat(lwImm >> 5, rs2p, rs1p, 2.U(3.W), lwImm(4,0), 0x27.U(7.W)), rs2p, rs1p, rs2p)
      else sd
    }
-    Seq(addi4spn, fld, lw, flw, unimp, fsd, sw, fsw)
+    Seq(addi4spn, fld, lw, flw, zcb_q0, fsd, sw, fsw)


Zcb should be populated with conditions

For example, c.zext.b should be added only when usingBitManip, and c.mul should be populated only when usingMulDiv, zext.w should be added only when RV64. This could save a tiny bit of area.

Other insts like c.lbu and c.not will always be added as I or E are always present. This default behavior will add area overhead for end users as they may not use it. I think we provide an option like usingCompressedBitManip to control this behavior. Asking @jerryz123 for advice.

A mistake. Should @jerryz123.

Parameters to the RVCDecode object should be added that disable the Zcb/Zcmp/Zcmt extensions if not needed.

Hi @jerryz123 , @ZenithalHourlyRate
In order to enable/disable Zcb, usingCompressedSuiteB parameter is added and routed (along with usingMulDiv and usingBitManip) to RVCDecoder to apply the condition on the corresponding instructions.

src/main/scala/rocket/Instructions.scala

src/main/scala/rocket/RVC.scala

- Zcb Instructions with `OP[1:0] = 2'b0` added. - These are Compressed version of following I-type instructins. - lbu - lhu - lh - sb - sh - These instructions lies in Quadrant 0 based on compressed Opcode. Signed-off-by: Abdul Wadood <[email protected]>

- Implement the 16 to 32 bit Decoder for `Zcb` Instructions having OP[1:0] == 00 - This decodes the following compressed instructions into its 32-bit equivalent instructions. c.lbu, c.lhu, c.lh, c.cb, c.sh Signed-off-by: Abdul Wadood <[email protected]>

- Zcb Instructions with `OP[1:0] = 2'b01` added. - These instructions lies in Q1 of Compressed Instructions. Signed-off-by: Abdul Wadood <[email protected]>

- Implement the 16 to 32 bit Decoder for `Zcb` Instructions having OP[1:0] = 2'b01 - The 16 bit -> 32 bit instruction mapping is given as follows: c.zext.b -> andi c.sext.b -> sext.b (require Zbb) c.zext.h -> zext.h (require Zbb) c.sext.h -> sext.h (require Zbb) c.zext.w -> add.uw (require Zba, only for RV64) c.not -> xori c.mul -> mul (Require M standard extension) Signed-off-by: Abdul Wadood <[email protected]>

Abdulwadoodd · 2023-05-21T17:53:04Z

Hi @jerryz123 @ZenithalHourlyRate

Comments are added in RVC.scala describing the purpose of added parameters in RVCDecoder.
zca, zcb, zcd, zcf are updated in DTS.
After rebase, updates the base branch from master to dev.

Please let me know if additional changes are required.

ZenithalHourlyRate

I'm happy with the non-circuit part now. Need more time to check the circuit part.

src/main/scala/rocket/RVC.scala

ZenithalHourlyRate · 2023-05-22T18:25:39Z

src/main/scala/rocket/RVC.scala

+          if(usingCompressedSuiteB){
+            def zextb = inst(Cat(0xff.U, rs1p, 7.U(3.W), rs1p, 0x13.U(7.W)), rs1p, rs1p, rs2p)
+            def not   = inst(Cat(0xFFF.U, rs1p, 4.U(3.W), rs1p, 0x13.U(7.W)), rs1p, rs1p, rs2p)
+            def zexth64 = inst(Cat(0x80.U, rs1p, 4.U(3.W), rs1p, 0x3B.U(7.W)), rs1p, rs1p, rs2p)


style: I believe this should defined in zexth?

I followed the existing convention as a selection between fs and sd based on the xlen.
Anyways, it is updated as you suggested

ZenithalHourlyRate · 2023-05-22T18:50:06Z

src/main/scala/tile/BaseTile.scala

+      Option.when(tileParams.core.useCompressed)(Seq("zca")) ++
+      Option.when(tileParams.core.useCompressedSuiteB)(Seq("zcb")) ++
+      Option.when(tileParams.core.useCompressed && tileParams.core.fpu.nonEmpty && tileParams.core.fpu.get.fLen > 32)(Seq("zcd")) ++
+      Option.when(tileParams.core.useCompressed && tileParams.core.fpu.nonEmpty)(Seq("zcf")) ++


Yes currently for rocket when fpu.nonEmpty F-ext will always be enabled

rocket-chip/src/main/scala/tile/FPU.scala

Lines 162 to 166 in 47f7b71

val insns = (minFLen, fLen) match {

case (32, 32) => f

case (16, 32) => h ++ f

case (32, 64) => f ++ d

case (16, 64) => h ++ f ++ d ++ fcvt_hd

But the consistency is not kept everywhere, e.g. misa has a different way

rocket-chip/src/main/scala/tile/BaseTile.scala

Line 103 in bd3276d

val f = if (tileParams.core.fpu.nonEmpty) "f" else ""

rocket-chip/src/main/scala/rocket/CSR.scala

Line 597 in bd3276d

(if (fLen >= 32) "F" else "") +

To reduce such inconsistency, I believe we should finish this TODO? Are you interested in this TODO? If so, things should be done in another PR.

rocket-chip/src/main/scala/tile/BaseTile.scala

Line 98 in bd3276d

// TODO merge with isaString in CSR.scala

src/main/scala/rocket/RVC.scala

ZenithalHourlyRate · 2023-05-22T18:59:53Z

src/main/scala/rocket/RVC.scala

      }
-      inst(Seq(srli, srai, andi, rtype)(x(11,10)), rs1p, rs1p, rs2p)
+      Seq(srli, srai, andi, rtype)(x(11,10))


Such change is costly. You now have a much wider Mux and the user will have a larger area even the user does not enable usingCompressedSuiteB

Is there any special reason for it? The x0 in zextw?

Is there any special reason for it? The x0 in zextw?

Yes, this is the reason.

the user will have a larger area even the user does not enable usingCompressedSuiteB

I'm a little skeptical about it, and I have following two reasons to back my argument:

As per my (vague) understanding of hardware (Verilog) generation using CHISEL, computation is done in software by keeping into account the parameters, and then the Verilog is generated including (only) that hardware part that was allowed by the parameter.

The above can be verified by observing the generated Verilog (by looking at the reference at the end of each line of generated Verilog that points to the corresponding scala file and its line number). For example, when I build Rocket with usingCompressedSuiteB = false, the generated Verilog simply excludes the corresponding definition of 16->32 instructions. But when I build using usingCompressedSuiteB = true, generated Verilog does include the references of the definition of those instructions which are defined under the if clause of this parameter.

So, this should not increase the area if usingCompressedSuiteB is disabled, unless I'm missing something.

Further details should be checked. There are two steps of selection, static selection like if/elses in scala and dynamic selection like x(11,10) in the circuit.

The DefaultConfig in dev branch

// the final mux assign io_out_bits = _GEN_19[_io_out_T_2]; assign io_out_rd = _GEN_20[_io_out_T_2]; assign io_out_rs1 = _GEN_21[_io_out_T_2]; assign io_out_rs2 = _GEN_22[_io_out_T_2]; assign io_out_rs3 = _GEN_23[_io_out_T_2]; // lets check rs1 wire [31:0][4:0] _GEN_21 = ... {io_out_s_12_rs1}, // this coresponds to arith in scala ... assign io_out_s_12_rs1 = {2'h1, io_in[9:7]};

DefaultConfig in this branch

// the final mux assign io_out_bits = _GEN_23[_io_out_T_2]; assign io_out_rd = _GEN_24[_io_out_T_2]; assign io_out_rs1 = _GEN_25[_io_out_T_2]; assign io_out_rs2 = _GEN_26[_io_out_T_2]; assign io_out_rs3 = _GEN_27[_io_out_T_2]; // lets check rs1 wire [31:0][4:0] _GEN_25 = ... {io_out_s_12_rd}, // this coresponds to arith in scala ... wire [4:0] io_out_s_12_rs1 = _GEN_8[io_in[11:10]]; // extra mux wire [3:0][4:0] _GEN_8 = {{(&_io_out_s_T_235) ? io_out_s_res_4_rs1 : io_out_s_res_5_rs1}, {io_out_s_res_3_rs1}, {io_out_s_res_2_rs1}, {io_out_s_res_1_rs1}}; assign io_out_s_res_4_rs1 = {2'h1, io_in[9:7]}; assign io_out_s_res_5_rs1 = {2'h1, io_in[9:7]}; assign io_out_s_res_3_rs1 = {2'h1, io_in[9:7]}; assign io_out_s_res_2_rs1 = {2'h1, io_in[9:7]}; assign io_out_s_res_1_rs1 = {2'h1, io_in[9:7]};

Chisel is faithful of generating what you write. It might be argued that the synthesizer will optimize this out, but I think avoiding this from source is a better option.

Thanks for the detailed explanation, I understand now. Pushed fixes in the latest commit.

ZenithalHourlyRate · 2023-05-22T19:00:59Z

src/main/scala/rocket/RVC.scala

+              }
+              else unimp
+            }
+            Seq(zextb, sextb, zexth, sexth, zextw, not)(x(4,2))


When usingCompressedSuiteB but not usingBitManip, this will become Mux between a lot of unimp. Is there better way to do it?

Actually, not all of these (zextb, sextb, zexth, sexth, zextw, not) become Mux. The problem is that only sextb, zexth, sexth, zextw depends upon the usingBitManip parameter and zextb and not are decoded in I-type instructions. So, if we keep the Seq statement constant in the end, then EITHER sextb, zexth, sexth, zextw needs to be decoded if usingBitManip is enabled OR each one of these must corresponds tounimp if usingBitManip is disabled. Hence, comes up these bad if/else statements.

What I meant is something like this

if(usingCompressedSuiteB){ def sexth = inst(Cat(0x605.U, rs1p, 1.U(3.W), rs1p, 0x13.U(7.W)), rs1p, rs1p, rs2p) } def zextw = inst(Cat(4.U, x0, rs1p, 0.U(3.W), rs1p, 0x3B.U(7.W)), rs1p, rs1p, x0) } def zexth = { val zexth_common = Cat(0x80.U, rs1p, 4.U(3.W), rs1p) if (xLen == 32) inst(Cat(zexth_common, 0x33.U(7.W)), rs1p, rs1p, rs2p) else inst(Cat(zexth_common, 0x3B.U(7.W)), rs1p, rs1p, rs2p) } } def zextb = inst(Cat(0xFF.U, rs1p, 7.U(3.W), rs1p, 0x13.U(7.W)), rs1p, rs1p, rs2p) def not = inst(Cat(0xFFF.U, rs1p, 4.U(3.W), rs1p, 0x13.U(7.W)), rs1p, rs1p, rs2p) def sextb = inst(Cat(0x604.U, rs1p, 1.U(3.W), rs1p, 0x13.U(7.W)), rs1p, rs1p, rs2p) if (usingBitManip) { Seq(zextb, sextb, zexth, sexth, zextw, not)(x(4,2)) } else { Mux(x(4,2) === 5.U, not, Mux(x(4,2) === 0.U, zextb, unimp)) } }

And like the comment in arith, try to avoid selecting between a lot of rs1p/rs2p

Updated the changes as suggested. Thanks

- Added `usingCompressedSuiteB` to enable/disable `Zcb` extension. - This parameter depends upon existing `usingCompressed` parameter. - By default `Zcb` is disable in RocketCore. - Passed this parameter along with `usingBitManip` and `usingMulDiv` to RVCDecoder enable/disable the decoding of corresponding `Zcb` instructions. Signed-off-by: Abdul Wadood <[email protected]>

Signed-off-by: Abdul Wadood <[email protected]>

ZenithalHourlyRate requested changes May 15, 2023

View reviewed changes

Abdulwadoodd force-pushed the Zcb_dev branch from dadae60 to bd09848 Compare May 17, 2023 22:14

Abdulwadoodd changed the base branch from master to dev May 17, 2023 22:14

Abdulwadoodd changed the base branch from dev to master May 17, 2023 22:15

jerryz123 reviewed May 18, 2023

View reviewed changes

src/main/scala/rocket/RVC.scala Show resolved Hide resolved

Abdulwadoodd force-pushed the Zcb_dev branch 2 times, most recently from 4fd9882 to 245c770 Compare May 21, 2023 03:11

Abdulwadoodd changed the base branch from master to dev May 21, 2023 03:11

Abdulwadoodd added 4 commits May 21, 2023 08:27

Added Zcb Opcodes for Quadrant 1

face7de

- Zcb Instructions with `OP[1:0] = 2'b01` added. - These instructions lies in Q1 of Compressed Instructions. Signed-off-by: Abdul Wadood <[email protected]>

Abdulwadoodd force-pushed the Zcb_dev branch from 245c770 to bb50f29 Compare May 21, 2023 03:29

Abdulwadoodd requested review from ZenithalHourlyRate and jerryz123 May 21, 2023 18:00

ZenithalHourlyRate reviewed May 22, 2023

View reviewed changes

Abdulwadoodd added 3 commits May 29, 2023 09:33

Added Decription for RVCDecoder Parameters

98064d1

Signed-off-by: Abdul Wadood <[email protected]>

Added zca and zcb in Device Tree Source

0ebed64

Signed-off-by: Abdul Wadood <[email protected]>

Abdulwadoodd force-pushed the Zcb_dev branch from 1a8f229 to 0ebed64 Compare May 29, 2023 05:03

Optimization: Update RVC.scala for zcb instructions

05ee78a

Signed-off-by: Abdul Wadood <[email protected]>

Abdulwadoodd force-pushed the Zcb_dev branch from c945483 to 05ee78a Compare July 5, 2023 21:23

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Zcb: Code Size Reduction Extension #3362

Zcb: Code Size Reduction Extension #3362

Abdulwadoodd commented May 15, 2023 •

edited

Loading

linux-foundation-easycla bot commented May 15, 2023 •

edited

Loading

ZenithalHourlyRate left a comment

ZenithalHourlyRate May 15, 2023 •

edited

Loading

ZenithalHourlyRate May 15, 2023

jerryz123 May 15, 2023

Abdulwadoodd May 17, 2023 •

edited

Loading

Abdulwadoodd commented May 21, 2023

ZenithalHourlyRate left a comment

ZenithalHourlyRate May 22, 2023

Abdulwadoodd May 29, 2023

ZenithalHourlyRate May 22, 2023

ZenithalHourlyRate May 22, 2023

Abdulwadoodd May 29, 2023

ZenithalHourlyRate Jun 1, 2023

Abdulwadoodd Jul 5, 2023

ZenithalHourlyRate May 22, 2023

Abdulwadoodd May 29, 2023 •

edited

Loading

ZenithalHourlyRate Jun 1, 2023

Abdulwadoodd Jul 5, 2023

	val insns = (minFLen, fLen) match {
	case (32, 32) => f
	case (16, 32) => h ++ f
	case (32, 64) => f ++ d
	case (16, 64) => h ++ f ++ d ++ fcvt_hd

Zcb: Code Size Reduction Extension #3362

Are you sure you want to change the base?

Zcb: Code Size Reduction Extension #3362

Conversation

Abdulwadoodd commented May 15, 2023 • edited Loading

linux-foundation-easycla bot commented May 15, 2023 • edited Loading

ZenithalHourlyRate left a comment

Choose a reason for hiding this comment

ZenithalHourlyRate May 15, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Abdulwadoodd May 17, 2023 • edited Loading

Choose a reason for hiding this comment

Abdulwadoodd commented May 21, 2023

ZenithalHourlyRate left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Abdulwadoodd May 29, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Abdulwadoodd commented May 15, 2023 •

edited

Loading

linux-foundation-easycla bot commented May 15, 2023 •

edited

Loading

ZenithalHourlyRate May 15, 2023 •

edited

Loading

Abdulwadoodd May 17, 2023 •

edited

Loading

Abdulwadoodd May 29, 2023 •

edited

Loading