-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Zcb: Code Size Reduction Extension #3362
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally LGTM. Have not checked the detail of the RTL implementation against the spec. Some other small changes are needed.
First, this should be made against the dev branch, it will then be backported the to master branch. @sequencer should we add a contribution guide in the PR template as there are multiple contributors that have fallen into this pitfall?
As for the misc changes, they should be introduced in another PR.
You should also add zcb
to the dts. zca
should also be included. In the future when you have implemented all Zc*
, a zce
should be added.
def sd = inst(Cat(ldImm >> 5, rs2p, rs1p, 3.U(3.W), ldImm(4,0), 0x23.U(7.W)), rs2p, rs1p, rs2p) | ||
def sw = inst(Cat(lwImm >> 5, rs2p, rs1p, 2.U(3.W), lwImm(4,0), 0x23.U(7.W)), rs2p, rs1p, rs2p) | ||
def fsd = inst(Cat(ldImm >> 5, rs2p, rs1p, 3.U(3.W), ldImm(4,0), 0x27.U(7.W)), rs2p, rs1p, rs2p) | ||
def fsw = { | ||
if (xLen == 32) inst(Cat(lwImm >> 5, rs2p, rs1p, 2.U(3.W), lwImm(4,0), 0x27.U(7.W)), rs2p, rs1p, rs2p) | ||
else sd | ||
} | ||
Seq(addi4spn, fld, lw, flw, unimp, fsd, sw, fsw) | ||
Seq(addi4spn, fld, lw, flw, zcb_q0, fsd, sw, fsw) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Zcb should be populated with conditions
For example, c.zext.b
should be added only when usingBitManip
, and c.mul
should be populated only when usingMulDiv
, zext.w
should be added only when RV64. This could save a tiny bit of area.
Other insts like c.lbu
and c.not
will always be added as I
or E
are always present. This default behavior will add area overhead for end users as they may not use it. I think we provide an option like usingCompressedBitManip
to control this behavior. Asking @jerryz123 for advice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A mistake. Should @jerryz123.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parameters to the RVCDecode object should be added that disable the Zcb/Zcmp/Zcmt extensions if not needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @jerryz123 , @ZenithalHourlyRate
In order to enable/disable Zcb, usingCompressedSuiteB
parameter is added and routed (along with usingMulDiv
and usingBitManip
) to RVCDecoder to apply the condition on the corresponding instructions.
4fd9882
to
245c770
Compare
- Zcb Instructions with `OP[1:0] = 2'b0` added. - These are Compressed version of following I-type instructins. - lbu - lhu - lh - sb - sh - These instructions lies in Quadrant 0 based on compressed Opcode. Signed-off-by: Abdul Wadood <[email protected]>
- Implement the 16 to 32 bit Decoder for `Zcb` Instructions having OP[1:0] == 00 - This decodes the following compressed instructions into its 32-bit equivalent instructions. c.lbu, c.lhu, c.lh, c.cb, c.sh Signed-off-by: Abdul Wadood <[email protected]>
- Zcb Instructions with `OP[1:0] = 2'b01` added. - These instructions lies in Q1 of Compressed Instructions. Signed-off-by: Abdul Wadood <[email protected]>
- Implement the 16 to 32 bit Decoder for `Zcb` Instructions having OP[1:0] = 2'b01 - The 16 bit -> 32 bit instruction mapping is given as follows: c.zext.b -> andi c.sext.b -> sext.b (require Zbb) c.zext.h -> zext.h (require Zbb) c.sext.h -> sext.h (require Zbb) c.zext.w -> add.uw (require Zba, only for RV64) c.not -> xori c.mul -> mul (Require M standard extension) Signed-off-by: Abdul Wadood <[email protected]>
Hi @jerryz123 @ZenithalHourlyRate
Please let me know if additional changes are required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the non-circuit part now. Need more time to check the circuit part.
src/main/scala/rocket/RVC.scala
Outdated
if(usingCompressedSuiteB){ | ||
def zextb = inst(Cat(0xff.U, rs1p, 7.U(3.W), rs1p, 0x13.U(7.W)), rs1p, rs1p, rs2p) | ||
def not = inst(Cat(0xFFF.U, rs1p, 4.U(3.W), rs1p, 0x13.U(7.W)), rs1p, rs1p, rs2p) | ||
def zexth64 = inst(Cat(0x80.U, rs1p, 4.U(3.W), rs1p, 0x3B.U(7.W)), rs1p, rs1p, rs2p) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
style: I believe this should defined in zexth?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the existing convention as a selection between fs
and sd
based on the xlen
.
Anyways, it is updated as you suggested
Option.when(tileParams.core.useCompressed)(Seq("zca")) ++ | ||
Option.when(tileParams.core.useCompressedSuiteB)(Seq("zcb")) ++ | ||
Option.when(tileParams.core.useCompressed && tileParams.core.fpu.nonEmpty && tileParams.core.fpu.get.fLen > 32)(Seq("zcd")) ++ | ||
Option.when(tileParams.core.useCompressed && tileParams.core.fpu.nonEmpty)(Seq("zcf")) ++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes currently for rocket when fpu.nonEmpty F-ext will always be enabled
rocket-chip/src/main/scala/tile/FPU.scala
Lines 162 to 166 in 47f7b71
val insns = (minFLen, fLen) match { | |
case (32, 32) => f | |
case (16, 32) => h ++ f | |
case (32, 64) => f ++ d | |
case (16, 64) => h ++ f ++ d ++ fcvt_hd |
But the consistency is not kept everywhere, e.g. misa has a different way
val f = if (tileParams.core.fpu.nonEmpty) "f" else "" |
rocket-chip/src/main/scala/rocket/CSR.scala
Line 597 in bd3276d
(if (fLen >= 32) "F" else "") + |
To reduce such inconsistency, I believe we should finish this TODO? Are you interested in this TODO? If so, things should be done in another PR.
// TODO merge with isaString in CSR.scala |
src/main/scala/rocket/RVC.scala
Outdated
} | ||
inst(Seq(srli, srai, andi, rtype)(x(11,10)), rs1p, rs1p, rs2p) | ||
Seq(srli, srai, andi, rtype)(x(11,10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Such change is costly. You now have a much wider Mux and the user will have a larger area even the user does not enable usingCompressedSuiteB
Is there any special reason for it? The x0
in zextw?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any special reason for it? The x0 in zextw?
Yes, this is the reason.
the user will have a larger area even the user does not enable usingCompressedSuiteB
I'm a little skeptical about it, and I have following two reasons to back my argument:
- As per my (vague) understanding of hardware (Verilog) generation using CHISEL, computation is done in software by keeping into account the parameters, and then the Verilog is generated including (only) that hardware part that was allowed by the parameter.
- The above can be verified by observing the generated Verilog (by looking at the reference at the end of each line of generated Verilog that points to the corresponding scala file and its line number). For example, when I build Rocket with
usingCompressedSuiteB = false
, the generated Verilog simply excludes the corresponding definition of 16->32 instructions. But when I build usingusingCompressedSuiteB = true
, generated Verilog does include the references of the definition of those instructions which are defined under the if clause of this parameter.
So, this should not increase the area if usingCompressedSuiteB
is disabled, unless I'm missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further details should be checked. There are two steps of selection, static selection like if/elses in scala and dynamic selection like x(11,10)
in the circuit.
The DefaultConfig in dev branch
// the final mux
assign io_out_bits = _GEN_19[_io_out_T_2];
assign io_out_rd = _GEN_20[_io_out_T_2];
assign io_out_rs1 = _GEN_21[_io_out_T_2];
assign io_out_rs2 = _GEN_22[_io_out_T_2];
assign io_out_rs3 = _GEN_23[_io_out_T_2];
// lets check rs1
wire [31:0][4:0] _GEN_21 =
...
{io_out_s_12_rs1}, // this coresponds to arith in scala
...
assign io_out_s_12_rs1 = {2'h1, io_in[9:7]};
DefaultConfig in this branch
// the final mux
assign io_out_bits = _GEN_23[_io_out_T_2];
assign io_out_rd = _GEN_24[_io_out_T_2];
assign io_out_rs1 = _GEN_25[_io_out_T_2];
assign io_out_rs2 = _GEN_26[_io_out_T_2];
assign io_out_rs3 = _GEN_27[_io_out_T_2];
// lets check rs1
wire [31:0][4:0] _GEN_25 =
...
{io_out_s_12_rd}, // this coresponds to arith in scala
...
wire [4:0] io_out_s_12_rs1 = _GEN_8[io_in[11:10]]; // extra mux
wire [3:0][4:0] _GEN_8 =
{{(&_io_out_s_T_235) ? io_out_s_res_4_rs1 : io_out_s_res_5_rs1},
{io_out_s_res_3_rs1},
{io_out_s_res_2_rs1},
{io_out_s_res_1_rs1}};
assign io_out_s_res_4_rs1 = {2'h1, io_in[9:7]};
assign io_out_s_res_5_rs1 = {2'h1, io_in[9:7]};
assign io_out_s_res_3_rs1 = {2'h1, io_in[9:7]};
assign io_out_s_res_2_rs1 = {2'h1, io_in[9:7]};
assign io_out_s_res_1_rs1 = {2'h1, io_in[9:7]};
Chisel is faithful of generating what you write. It might be argued that the synthesizer will optimize this out, but I think avoiding this from source is a better option.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed explanation, I understand now. Pushed fixes in the latest commit.
src/main/scala/rocket/RVC.scala
Outdated
} | ||
else unimp | ||
} | ||
Seq(zextb, sextb, zexth, sexth, zextw, not)(x(4,2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When usingCompressedSuiteB
but not usingBitManip
, this will become Mux between a lot of unimp. Is there better way to do it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, not all of these (zextb, sextb, zexth, sexth, zextw, not
) become Mux. The problem is that only sextb, zexth, sexth, zextw
depends upon the usingBitManip
parameter and zextb
and not
are decoded in I-type instructions. So, if we keep the Seq
statement constant in the end, then EITHER sextb, zexth, sexth, zextw
needs to be decoded if usingBitManip
is enabled OR each one of these must corresponds tounimp
if usingBitManip
is disabled. Hence, comes up these bad if/else statements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What I meant is something like this
if(usingCompressedSuiteB){
def sexth = inst(Cat(0x605.U, rs1p, 1.U(3.W), rs1p, 0x13.U(7.W)), rs1p, rs1p, rs2p)
}
def zextw = inst(Cat(4.U, x0, rs1p, 0.U(3.W), rs1p, 0x3B.U(7.W)), rs1p, rs1p, x0)
}
def zexth = {
val zexth_common = Cat(0x80.U, rs1p, 4.U(3.W), rs1p)
if (xLen == 32) inst(Cat(zexth_common, 0x33.U(7.W)), rs1p, rs1p, rs2p)
else inst(Cat(zexth_common, 0x3B.U(7.W)), rs1p, rs1p, rs2p)
}
}
def zextb = inst(Cat(0xFF.U, rs1p, 7.U(3.W), rs1p, 0x13.U(7.W)), rs1p, rs1p, rs2p)
def not = inst(Cat(0xFFF.U, rs1p, 4.U(3.W), rs1p, 0x13.U(7.W)), rs1p, rs1p, rs2p)
def sextb = inst(Cat(0x604.U, rs1p, 1.U(3.W), rs1p, 0x13.U(7.W)), rs1p, rs1p, rs2p)
if (usingBitManip) {
Seq(zextb, sextb, zexth, sexth, zextw, not)(x(4,2))
} else {
Mux(x(4,2) === 5.U, not, Mux(x(4,2) === 0.U, zextb, unimp))
}
}
And like the comment in arith
, try to avoid selecting between a lot of rs1p/rs2p
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated the changes as suggested. Thanks
- Added `usingCompressedSuiteB` to enable/disable `Zcb` extension. - This parameter depends upon existing `usingCompressed` parameter. - By default `Zcb` is disable in RocketCore. - Passed this parameter along with `usingBitManip` and `usingMulDiv` to RVCDecoder enable/disable the decoding of corresponding `Zcb` instructions. Signed-off-by: Abdul Wadood <[email protected]>
Signed-off-by: Abdul Wadood <[email protected]>
Signed-off-by: Abdul Wadood <[email protected]>
Signed-off-by: Abdul Wadood <[email protected]>
Type of change: other enhancement
Impact: API addition (no impact on existing code)
Development Phase: implementation
Background
RISC-V code size reduction extension supersedes the existing standard
C
(compressed) extension. It divides instructions in theC
extension into three sub-extensions (Zca
,Zcf
, andZcd
) and exploits the reserved encodings inC
to add more instructions in three sub-extensions (Zcb
,Zcmp
andZcmt
).RocketCore already supports C-extension i.e. Zca, Zcf, and Zcd. The un-privilege spec classifies compressed instructions into three different parts: Quadrant 0, Quadrant 1, and Quadrant 2. This classification is done based on the Opcode of compressed instruction: bits[1:0].
Implementation
Zcb instructions fall into Quadrant 0 and Quadrant 1 having opcodes 2'b00 and 2'b01 respectively. Hence, Implementation is done in two parts:
Part 1:
In part 1, the Opcodes of the instructions belonging to Quadrant 0 are added, and the corresponding 16-bit->32-bit decode logic is hooked up with Q0 logic for the existing instructions. The instructions in this part are the compressed version of I-type load-store instructions:
lbu
,lh
,lhu
,sb
,sh
.Part 2:
In part 2, the Opcodes of the instructions belonging to Quadrant 1 are added, and the corresponding 16-bit->32-bit decode logic is hooked up with Q1 logic for the existing instructions. The instructions in this part are the has dependency on I-type, M-type, and Bit-manip (
Zba
,Zbb
) instructions:c.zext.b
,c.sext.b
,c.zext.h
,c.sext.h
,c.zext.w
,c.mul
,c.not
.Verification
A simple assembly program is written including all the
Zcb
instructions to test their functionality. It is available here. It is a bit crude way of testing in the early stage. A plan to develop more mature tests covering the corner cases is in the pipeline which will come as part of riscv-arch tests.Release Notes
This is one of three PR as part of the GSoC'23 project under the mentorship Hongren Zheng, to add Zc* support in Rocket. This PR adds hardware implementation to support the
Zcb
extension which is one of the variants of the recently ratifiedcode size reduction
extension. RocketCore has the infrastructure for the uncompressed version ofZcb
instructions. So, mainly there are changes implemented in RVCDecoder to decode the compressedZcb
instruction into its 32-bit equivalent instructions. The implementation works for both 32-bit and 64-bit architectures.Acknowledgement
Thanks to @ZenithalHourlyRate for his guidance and help.