Code Anatomy on CSR module #3028

DecodeTheEncoded · 2022-08-18T04:32:30Z

The CSR module mainly handles accesses to the control and status registers. It also contains the logic that makes the functionality tied to specific CSRs actually work. The CSR module is tightly coupled with RocketCore. RocketCore feeds the interrupt requests collected from elsewhere into the CSR module (csr.io.interrupts := io.interrupts). The CSR module arbitrates among them (it picks the highest-priority interrupt among those firing simultaneously) and asserts the result (csr.io.interrupt), which kills the instruction at the ID stage (ctrl_killd := !ibuf.io.inst(0).valid || ibuf.io.inst(0).bits.replay || take_pc_mem_wb || ctrl_stalld || csr.io.interrupt). All instructions already in the pipeline ahead of the one at ID are guaranteed to complete, and since architectural state is only modified at WB and later (scoreboard), RocketCore supports precise interrupts. The arbitrated interrupt and all synchronous exceptions raised in the pipeline flow down to the WB stage, where csr.io.exception := wb_xcpt is sent to the CSR module together with auxiliary signals such as csr.io.cause := wb_cause, csr.io.tval := Mux(tval_valid, encodeVirtualAddress(wb_reg_wdata, wb_reg_wdata), 0.U), and csr.io.pc := wb_reg_pc for further handling. CSR access requests are also presented to the CSR module at WB (csr.io.rw.addr := wb_reg_inst(31,20), csr.io.rw.cmd := CSR.maskCmd(wb_reg_valid, wb_ctrl.csr), csr.io.rw.wdata := wb_reg_wdata).
A CSR access request gets its response in the same cycle, at WB. The response is the old value stored in the CSR, and it is written to the rd register (wb_waddr):

val wb_valid = wb_reg_valid && !replay_wb && !wb_xcpt
val wb_wen = wb_valid && wb_ctrl.wxd
val rf_wen = wb_wen || ll_wen
val rf_waddr = Mux(ll_wen, ll_waddr, wb_waddr)
val rf_wdata = Mux(dmem_resp_valid && dmem_resp_xpu, io.dmem.resp.bits.data(xLen-1, 0),
                Mux(ll_wen, ll_wdata,
                Mux(wb_ctrl.csr =/= CSR.N, csr.io.rw.rdata,
                Mux(wb_ctrl.mul, mul.map(_.io.resp.bits.data).getOrElse(wb_reg_wdata),
                wb_reg_wdata))))
when (rf_wen) { rf.write(rf_waddr, rf_wdata) }

Also note that the instruction at the ID stage is sent to the CSR module for CSR-related decoding: csr.io.decode(0).inst := id_inst(0). The CSR module decides whether the instruction is illegal or needs to be stalled under the current CSR settings:

  for (io_dec <- io.decode) {
    val addr = io_dec.inst(31, 20)

    def decodeAny(m: LinkedHashMap[Int,Bits]): Bool = m.map { case(k: Int, _: Bits) => addr === k }.reduce(_||_)
    def decodeFast(s: Seq[Int]): Bool = DecodeLogic(addr, s.map(_.U), (read_mapping -- s).keys.toList.map(_.U))

    val _ :: is_break :: is_ret :: _ :: is_wfi :: is_sfence :: is_hfence_vvma :: is_hfence_gvma :: is_hlsv :: Nil =
      DecodeLogic(io_dec.inst, decode_table(0)._2.map(x=>X), decode_table).map(_.asBool)
    val is_counter = (addr.inRange(CSR.firstCtr, CSR.firstCtr + CSR.nCtr) || addr.inRange(CSR.firstCtrH, CSR.firstCtrH + CSR.nCtr))
    /*
    * hjr
    * WFI is available in all privileged modes, and optionally available to U-mode.
    * This instruction may raise an illegal instruction exception when TW=1 in mstatus
    *
    * The TW (Timeout Wait) bit supports intercepting the WFI instruction (see Section 3.2.3). When
    * TW=0, the WFI instruction may execute in lower privilege modes when not prevented for some
    * other reason. When TW=1, then if WFI is executed in any less-privileged mode, and it does not
    * complete within an implementation-specific, bounded time limit, the WFI instruction causes an
    * illegal instruction exception. The time limit may always be 0, in which case WFI always causes an
    * illegal instruction exception in less-privileged modes when TW=1. TW is hard-wired to 0 when
    * there are no modes less privileged than M.
    *
    * In the RC implementation below, WFI is always ok in M-mode, and is ok in S or U mode only when mstatus.TW is clear (and, when V=1, hstatus.VTW is clear as well).
    * */
    val allow_wfi = Bool(!usingSupervisor) || reg_mstatus.prv > PRV.S || !reg_mstatus.tw && (!reg_mstatus.v || !reg_hstatus.vtw)
    /*
    * hjr
    * The TVM (Trap Virtual Memory) bit supports intercepting supervisor virtual-memory management
    * operations. When TVM=1, attempts to read or write the satp CSR or execute the
    * SFENCE.VMA instruction while executing in S-mode will raise an illegal instruction exception.
    * When TVM=0, these operations are permitted in S-mode. TVM is hard-wired to 0 when S-mode
    * is not supported.
    *
    * When the system has no VM support (!usingVM), SFENCE.VMA is ok to go.
    * When there is usingVM (generally, this implies S-mode) in the system, SFENCE.VMA is ok to go when the current mode is M-mode.
    * When in S-mode, SFENCE.VMA is only available when reg_mstatus.tvm is clear (reg_hstatus.vtvm is checked instead when V=1).
    * */
    val allow_sfence_vma = Bool(!usingVM) || reg_mstatus.prv > PRV.S || !Mux(reg_mstatus.v, reg_hstatus.vtvm, reg_mstatus.tvm)
    val allow_hfence_vvma = Bool(!usingHypervisor) || !reg_mstatus.v && (reg_mstatus.prv >= PRV.S)
    val allow_hlsv = Bool(!usingHypervisor) || !reg_mstatus.v && (reg_mstatus.prv >= PRV.S || reg_hstatus.hu)
    //hjr this follows the same pattern as allow_sfence_vma, with TSR/VTSR in place of TVM/VTVM (ignoring H mode for now)
    val allow_sret = Bool(!usingSupervisor) || reg_mstatus.prv > PRV.S || !Mux(reg_mstatus.v, reg_hstatus.vtsr, reg_mstatus.tsr)
    val counter_addr = addr(log2Ceil(read_mcounteren.getWidth)-1, 0)
    /*
    * hjr
    * The counter-enable registers mcounteren and scounteren are 32-bit registers that control the
    * availability of the hardware performance-monitoring counters to the next-lowest privileged mode.
    *
    * If S-mode is implemented, the same bit positions in the scounteren register analogously control
    * access to these registers while executing in U-mode. If S-mode is permitted to access a counter
    * register and the corresponding bit is set in scounteren, then U-mode is also permitted to access
    * that register.
    *
    * (reg_mstatus.prv > PRV.S || read_mcounteren(counter_addr)):
    * The priv mode is M-mode(ignoring H for now), or access to these counters at the next-lowest privileged mode other than M is allowed.
    *
    * (!usingSupervisor || reg_mstatus.prv >= PRV.S || read_scounteren(counter_addr)):
    * !usingSupervisor-> If there is no S-mode, then the enabling of mcounteren is already enough to make sure U-mode has access to these counter.
    * If the priv mode now is S-mode, the enabling of mcounteren is already enough to make sure U-mode has access to these counter.
    * If the priv mode now is U, then we need to make sure the U mode has access to these counters by checking the scounteren.
    * */
    val allow_counter = (reg_mstatus.prv > PRV.S || read_mcounteren(counter_addr)) &&
      (!usingSupervisor || reg_mstatus.prv >= PRV.S || read_scounteren(counter_addr)) &&
      (!usingHypervisor || !reg_mstatus.v || read_hcounteren(counter_addr))
    io_dec.fp_illegal := io.status.fs === 0 || reg_mstatus.v && reg_vsstatus.fs === 0 || !reg_misa('f'-'a')
    io_dec.vector_illegal := io.status.vs === 0 || reg_mstatus.v && reg_vsstatus.vs === 0 || !reg_misa('v'-'a')
    io_dec.fp_csr := decodeFast(fp_csrs.keys.toList)
    io_dec.rocc_illegal := io.status.xs === 0 || reg_mstatus.v && reg_vsstatus.vs === 0 || !reg_misa('x'-'a')
    val csr_addr_legal = reg_mstatus.prv >= CSR.mode(addr) ||
      usingHypervisor && !reg_mstatus.v && reg_mstatus.prv === PRV.S && CSR.mode(addr) === PRV.H
    val csr_exists = decodeAny(read_mapping)
    io_dec.read_illegal := !csr_addr_legal ||//hjr lower mode can never access higher-priv level csrs.
      !csr_exists ||//hjr accessing non-existent csrs
      ((addr === CSRs.satp || addr === CSRs.hgatp) && !allow_sfence_vma) ||
      is_counter && !allow_counter ||
      //hjr accessing debug related csrs when not in the debug mode.
      decodeFast(debug_csrs.keys.toList) && !reg_debug ||
      decodeFast(vector_csrs.keys.toList) && io_dec.vector_illegal ||
      io_dec.fp_csr && io_dec.fp_illegal
    io_dec.write_illegal := addr(11,10).andR//hjr can not write read-only registers.
    io_dec.write_flush := {
      /*
      * hjr m(s)scratch, m(s)epc, m(s)cause, m(s)tval are the csrs with addr_m >= CSRs.mscratch && addr_m <= CSRs.mtval
      * Writes to these registers do NOT flush the pipeline (write_flush is deasserted for them);
      * a write to any other csr does.
      * */

      val addr_m = addr | (PRV.M << CSR.modeLSB)
      !(addr_m >= CSRs.mscratch && addr_m <= CSRs.mtval)
    }
    io_dec.system_illegal := !csr_addr_legal && !is_hlsv ||
      is_wfi && !allow_wfi ||
      is_ret && !allow_sret ||
      is_ret && addr(10) && addr(7) && !reg_debug ||//hjr executing dret when not in the debug mode
      (is_sfence || is_hfence_gvma) && !allow_sfence_vma ||
      is_hfence_vvma && !allow_hfence_vvma ||
      is_hlsv && !allow_hlsv

    io_dec.virtual_access_illegal := reg_mstatus.v && csr_exists && (
      CSR.mode(addr) === PRV.H ||
      is_counter && read_mcounteren(counter_addr) && (!read_hcounteren(counter_addr) || !reg_mstatus.prv(0) && !read_scounteren(counter_addr)) ||
      CSR.mode(addr) === PRV.S && !reg_mstatus.prv(0) ||
      addr === CSRs.satp && reg_mstatus.prv(0) && reg_hstatus.vtvm)

    io_dec.virtual_system_illegal := reg_mstatus.v && (
      is_hfence_vvma ||
      is_hfence_gvma ||
      is_hlsv ||
      is_wfi && (!reg_mstatus.prv(0) || !reg_mstatus.tw && reg_hstatus.vtw) ||
      is_ret && CSR.mode(addr) === PRV.S && (!reg_mstatus.prv(0) || reg_hstatus.vtsr) ||
      is_sfence && (!reg_mstatus.prv(0) || reg_hstatus.vtvm))
  }
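
The address-based checks in the decode loop above can be tried out in isolation. The following is a minimal plain-Scala sketch (no Chisel), not the RocketCore code: the object and method names are hypothetical, the hypervisor term of csr_addr_legal is left out, and CSR.modeLSB is assumed to be 8 so that addr(9,8) is the privilege field.

```scala
// Plain-Scala model of the address-only checks; names are illustrative, not from rocket-chip.
object CsrAddrModel {
  val PrvU = 0; val PrvS = 1; val PrvM = 3
  val ModeLsb = 8 // assumption: CSR.modeLSB == 8, so addr(9,8) holds the privilege field

  // Standard CSR addresses from the privileged spec
  val Mscratch = 0x340; val Mtval = 0x343

  // addr(9,8): lowest privilege level that may access the CSR
  def mode(addr: Int): Int = (addr >> ModeLsb) & 0x3

  // addr(11,10) == 0b11 marks a read-only CSR (mirrors io_dec.write_illegal)
  def writeIllegal(addr: Int): Boolean = ((addr >> 10) & 0x3) == 0x3

  // Mirrors csr_addr_legal without the hypervisor term
  def addrLegal(prv: Int, addr: Int): Boolean = prv >= mode(addr)

  // Mirrors io_dec.write_flush: force the privilege field to M, then test the
  // mscratch..mtval range. Writes inside that range do not flush.
  def writeFlush(addr: Int): Boolean = {
    val addrM = addr | (PrvM << ModeLsb)
    !(addrM >= Mscratch && addrM <= Mtval)
  }
}
```

For example, sscratch (0x140) maps onto mscratch (0x340) after the privilege field is forced to M, so writing it does not request a flush, while writing satp (0x180) does.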

If the corresponding logic in the CSR module decides that the instruction at ID is illegal (for example, a write to a read-only CSR, or a floating-point instruction when floating point is disabled or not supported in the system), id_illegal_insn is asserted, indicating that an exception happens at the ID stage (id_xcpt):

  val id_illegal_insn = !id_ctrl.legal ||
    (id_ctrl.mul || id_ctrl.div) && !csr.io.status.isa('m'-'a') ||
    id_ctrl.amo && !csr.io.status.isa('a'-'a') ||
    id_ctrl.fp && (csr.io.decode(0).fp_illegal || io.fpu.illegal_rm) ||
    id_ctrl.dp && !csr.io.status.isa('d'-'a') ||
    ibuf.io.inst(0).bits.rvc && !csr.io.status.isa('c'-'a') ||
    id_raddr2_illegal && !id_ctrl.scie && id_ctrl.rxs2 ||
    id_raddr1_illegal && !id_ctrl.scie && id_ctrl.rxs1 ||
    id_waddr_illegal && !id_ctrl.scie && id_ctrl.wxd ||
    id_ctrl.rocc && csr.io.decode(0).rocc_illegal ||
    id_ctrl.scie && !(id_scie_decoder.unpipelined || id_scie_decoder.pipelined) ||
    id_csr_en && (csr.io.decode(0).read_illegal || !id_csr_ren && csr.io.decode(0).write_illegal) ||
    !ibuf.io.inst(0).bits.rvc && (id_system_insn && csr.io.decode(0).system_illegal)

  val (id_xcpt, id_cause) = checkExceptions(List(
    (csr.io.interrupt, csr.io.interrupt_cause),
    (bpu.io.debug_if,  UInt(CSR.debugTriggerCause)),
    (bpu.io.xcpt_if,   UInt(Causes.breakpoint)),
    (id_xcpt0.pf.inst, UInt(Causes.fetch_page_fault)),
    (id_xcpt0.gf.inst, UInt(Causes.fetch_guest_page_fault)),
    (id_xcpt0.ae.inst, UInt(Causes.fetch_access)),
    (id_xcpt1.pf.inst, UInt(Causes.fetch_page_fault)),
    (id_xcpt1.gf.inst, UInt(Causes.fetch_guest_page_fault)),
    (id_xcpt1.ae.inst, UInt(Causes.fetch_access)),
    (id_virtual_insn,  UInt(Causes.virtual_instruction)),
    (id_illegal_insn,  UInt(Causes.illegal_instruction))))
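
The list order above matters. Below is a small plain-Scala sketch of the selection, under the assumption that checkExceptions ORs all valid bits and picks the cause of the first asserted entry (priority-mux behavior). The object name is hypothetical, and returning None when nothing is asserted is a modeling choice, since the hardware always drives some cause value.

```scala
// Plain-Scala model of a "first asserted entry wins" exception check.
object ExceptionPriorityModel {
  // Standard RISC-V cause codes used in the example
  val IllegalInstruction = 2
  val Breakpoint = 3
  val FetchPageFault = 12

  // Returns (anyException, causeOfFirstAssertedEntry)
  def checkExceptions(x: Seq[(Boolean, Int)]): (Boolean, Option[Int]) =
    (x.exists(_._1), x.find(_._1).map(_._2))
}
```

With entries (false, Breakpoint), (true, FetchPageFault), (true, IllegalInstruction), the model reports the fetch page fault, because it appears earlier in the list than the illegal-instruction entry.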

As mentioned above, an exception raised at any stage of the pipeline flows down to WB, and so does id_xcpt: it flows to WB and asserts the exception request to the CSR module there. If an exception happens at one specific stage, RocketCore has logic to prevent that instruction (or any instruction after it in program order) from modifying the architectural state of the core. I actually feel lost in the jungle of replay, kill, take_pc, etc.
In the CSR module, read_mapping maps each CSR address to the register (or wire) that holds the value of that CSR, for all supported CSRs. Part of read_mapping is as follows:

  val read_mapping = LinkedHashMap[Int,Bits](
    CSRs.tselect -> reg_tselect,
    CSRs.tdata1 -> reg_bp(reg_tselect).control.asUInt,
    CSRs.tdata2 -> reg_bp(reg_tselect).address.sextTo(xLen),
    CSRs.tdata3 -> reg_bp(reg_tselect).textra.asUInt,
    CSRs.misa -> reg_misa,
    CSRs.mstatus -> read_mstatus,
    CSRs.mtvec -> read_mtvec,
    CSRs.mip -> read_mip,
    CSRs.mie -> reg_mie,
    CSRs.mscratch -> reg_mscratch,
    CSRs.mepc -> readEPC(reg_mepc).sextTo(xLen),
    CSRs.mtval -> reg_mtval.sextTo(xLen),
    CSRs.mcause -> reg_mcause,
    CSRs.mhartid -> io.hartid)

decoded_addr maps each CSR address to a Bool indicating whether that address is being accessed by the core. The previous implementation of decoded_addr is quite straightforward: val decoded_addr = read_mapping map { case (k, v) => k -> (io.rw.addr === k) }. The read half of a CSR access is easy: io.rw.rdata := Mux1H(for ((k, v) <- read_mapping) yield decoded_addr(k) -> v). For the write half, note that csrrs/csrrc do not write the CSR when rs1 is x0, and csrrsi/csrrci do not write it when uimm is 0. Since rs1 and uimm occupy the same instruction bits, RocketCore handles both cases with one check: val id_csr_ren = id_ctrl.csr.isOneOf(CSR.S, CSR.C) && id_expanded_inst(0).rs1 === UInt(0) means the access only reads the old value, so the request flowing down to WB is CSR.R instead of CSR.C or CSR.S: val id_csr = Mux(id_system_insn && id_ctrl.mem, CSR.N, Mux(id_csr_ren, CSR.R, id_ctrl.csr)). Consequently, val csr_wen = io.rw.cmd.isOneOf(CSR.S, CSR.C, CSR.W) in the CSR module correctly indicates a CSR write. val wdata = readModifyWriteCSR(io.rw.cmd, io.rw.rdata, io.rw.wdata) prepares the data to be written, see the code below:

  //hjr io.rw.rdata is the existing csr value:io.rw.rdata := Mux1H(for ((k, v) <- read_mapping) yield decoded_addr(k) -> v)
  val wdata = readModifyWriteCSR(io.rw.cmd, io.rw.rdata, io.rw.wdata)
  /*
  * hjr
  * for R(010)->(rdata|wdata)---> this is just meaningless.
  * for W(101)->(wdata)
  * for S(110)-> (rdata|wdata)
  * for C(111)->(rdata|wdata)&(~wdata)
  * */
  def readModifyWriteCSR(cmd: UInt, rdata: UInt, wdata: UInt) = {
    (Mux(cmd(1), rdata, UInt(0)) | wdata) & ~Mux(cmd(1,0).andR, wdata, UInt(0))
  }
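
To see the bit trick at work, here is a plain-Scala model of the same expression on Long values. The command encodings are the ones listed in the comment above (R=010, W=101, S=110, C=111); the object name is hypothetical.

```scala
// Plain-Scala model of readModifyWriteCSR, using the encodings from the comment above.
object CsrRmwModel {
  val R = 2; val W = 5; val S = 6; val C = 7

  def readModifyWriteCSR(cmd: Int, rdata: Long, wdata: Long): Long = {
    // cmd(1) set (S, C, and also R): start from the old CSR value
    val base = if ((cmd & 0x2) != 0) rdata else 0L
    // cmd(1,0) == 0b11 (C only): the bits in wdata are cleared at the end
    val clear = if ((cmd & 0x3) == 0x3) wdata else 0L
    (base | wdata) & ~clear
  }
}
```

With rdata = 0b1100 and wdata = 0b1010, W yields 0b1010, S yields 0b1110, and C yields 0b0100.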

Note the comment in the code section above: this is a sleek, generic way of generating the data to be written, no matter whether the command is W, S, or C.
With that in place, the CSR write logic is straightforward. There are many writable CSRs in the RC system, so the result is a long and tedious when(csr_wen){when(isSpecificCSR){/*conducting write*/}}. Below are the code sections for writes to some of the CSRs:

when (csr_wen) {
    val scause_mask = ((BigInt(1) << (xLen-1)) + 31).U /* only implement 5 LSBs and MSB */
    val satp_valid_modes = 0 +: (minPgLevels to pgLevels).map(new PTBR().pgLevelsToMode(_))

    when (decoded_addr(CSRs.mstatus)) {
      val new_mstatus = new MStatus().fromBits(wdata)
      reg_mstatus.mie := new_mstatus.mie
      reg_mstatus.mpie := new_mstatus.mpie

      if (usingUser) {
        reg_mstatus.mprv := new_mstatus.mprv
        reg_mstatus.mpp := legalizePrivilege(new_mstatus.mpp)
        if (usingSupervisor) {
          reg_mstatus.spp := new_mstatus.spp
          reg_mstatus.spie := new_mstatus.spie
          reg_mstatus.sie := new_mstatus.sie
          reg_mstatus.tw := new_mstatus.tw
          reg_mstatus.tsr := new_mstatus.tsr
        }
        if (usingVM) {
          reg_mstatus.mxr := new_mstatus.mxr
          reg_mstatus.sum := new_mstatus.sum
          reg_mstatus.tvm := new_mstatus.tvm
        }
        if (usingHypervisor) {
          reg_mstatus.mpv := new_mstatus.mpv
          reg_mstatus.gva := new_mstatus.gva
        }
      }

      if (usingSupervisor || usingFPU) reg_mstatus.fs := formFS(new_mstatus.fs)
      reg_mstatus.vs := formVS(new_mstatus.vs)
    }
    when (decoded_addr(CSRs.misa)) {
      val mask = UInt(isaStringToMask(isaMaskString), xLen)
      val f = wdata('f' - 'a')
      // suppress write if it would cause the next fetch to be misaligned
      when (!usingCompressed || !io.pc(1) || wdata('c' - 'a')) {
        if (coreParams.misaWritable)
          reg_misa := ~(~wdata | (!f << ('d' - 'a'))) & mask | reg_misa & ~mask
      }
    }
    when (decoded_addr(CSRs.mip)) {
      // MIP should be modified based on the value in reg_mip, not the value
      // in read_mip, since read_mip.seip is the OR of reg_mip.seip and
      // io.interrupts.seip.  We don't want the value on the PLIC line to
      // inadvertently be OR'd into read_mip.seip.
      val new_mip = readModifyWriteCSR(io.rw.cmd, reg_mip.asUInt, io.rw.wdata).asTypeOf(new MIP)
      if (usingSupervisor) {
        reg_mip.ssip := new_mip.ssip
        reg_mip.stip := new_mip.stip
        reg_mip.seip := new_mip.seip
      }
      if (usingHypervisor) {
        reg_mip.vssip := new_mip.vssip
      }
    }
}
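
The misa update expression above is dense, so here is a plain-Scala sketch of just that expression. It is a model under assumptions: the object and method names are hypothetical, the writable mask is passed in explicitly, and the guard against misaligning the next fetch is left out.

```scala
// Plain-Scala model of: ~(~wdata | (!f << ('d' - 'a'))) & mask | reg_misa & ~mask
object MisaWriteModel {
  def bit(c: Char): Int = c - 'a'

  def nextMisa(old: Long, wdata: Long, mask: Long): Long = {
    val f = (wdata >> bit('f')) & 1L
    // If F is being cleared, force D to be cleared too (D depends on F)
    val clearD = (1L - f) << bit('d')
    // Writable bits come from wdata, the remaining bits keep their old value
    (~(~wdata | clearD) & mask) | (old & ~mask)
  }
}
```

For instance, with C, D, and F writable (mask 0x2C) and an old value of 0x112D (I, M, A, F, D, C set), writing 0x110D (F cleared, D still set) gives 0x1105: D is dropped together with F, and the non-writable bits are unchanged.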

Note that there are register-specific rules for CSR writes. For instance, some fields of sip are only writable when the corresponding interrupts are delegated to S-mode. Refer to the privilege spec for the specifics.
It's worth noting that the privileged instructions are not handled in the RocketCore pipeline itself. They are sent to the CSR module because they only need to dance with some of the CSRs. They share the SYSTEM opcode with the CSR access instructions and use insn[31:20] (the field that encodes the CSR address in a normal CSR access) to encode which specific instruction it is. Therefore they can be sent to the CSR module through the same interface as a CSR access:

  csr.io.rw.addr := wb_reg_inst(31,20)
  csr.io.rw.cmd := CSR.maskCmd(wb_reg_valid, wb_ctrl.csr)
  csr.io.rw.wdata := wb_reg_wdata

There are 5 privileged instructions: insn_call :: insn_break :: insn_ret :: insn_cease :: insn_wfi. ecall, ebreak, and insn_ret are exception related and will be described below. The Wait for Interrupt instruction (WFI) provides a hint to the implementation that the current hart can be stalled until an interrupt might need servicing. Refer to 3.2.3 of the priv spec for detailed info.
In general, a wfi instruction stalls the pipeline (io.csr_stall := reg_wfi || io.status.cease) and notifies other parts of the SoC that this hart is waiting for an interrupt. Logic elsewhere may then direct an interrupt to this hart, as the spec says: Execution of the WFI instruction can also be used to inform the hardware platform that suitable interrupts should preferentially be routed to this hart.

//In CSR.scala
io.status.wfi := reg_wfi
//In RocketCore.scala
io.wfi := csr.io.status.wfi
//In RocketTile.scala
outer.reportWFI(Some(core.io.wfi))

// Report when the tile is waiting for an interrupt
val wfiNode = IntSourceNode(IntSourcePortSimple())

def reportWFI(could_wfi: Option[Bool]): Unit = {
  val (wfi, _) = wfiNode.out(0)
  wfi(0) := could_wfi.map(RegNext(_, init=false.B)).getOrElse(false.B)
}

Note that once a wfi executes, reg_wfi := true is asserted, indicating that this hart is waiting for an interrupt. In the RC impl, any sign of a forthcoming interrupt deasserts the wait-for-interrupt state: when (pending_interrupts.orR || io.interrupts.debug || exception) { reg_wfi := false } and io.interrupts.nmi.map(nmi => when (nmi.rnmi) { reg_wfi := false } ).
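
The set/clear behavior of reg_wfi can be modeled as a one-cycle step function. This is a sketch, not the RC code: the names are hypothetical, the NMI term is omitted, and it assumes the clearing condition wins when set and clear coincide in the same cycle.

```scala
// Plain-Scala model of one clock step of reg_wfi.
object WfiModel {
  def next(regWfi: Boolean, insnWfi: Boolean, pendingInterrupts: Long,
           debugInt: Boolean, exception: Boolean): Boolean = {
    // Any pending (mip & mie) interrupt, a debug interrupt, or an exception wakes the hart
    val wake = pendingInterrupts != 0L || debugInt || exception
    // Assumption: clear has priority over set
    if (wake) false else if (insnWfi) true else regWfi
  }
}
```

Note that the wake condition uses pending_interrupts, which is built from mip and mie only, so a pending and locally enabled interrupt ends the wait even when interrupts are globally disabled in mstatus.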
Another privileged instruction supported in the RC impl, though not described in the priv spec, is CEASE. It is a custom instruction that also uses the SYSTEM encoding space like wfi, and it is only available in M-mode. Refer to 7.3 of the U54_MC core complex manual for further detail. CEASE is mainly for instigating the power-down sequence: after retiring CEASE, the hart shall not retire another instruction until reset, and debug haltreq will not work after a CEASE instruction has retired. CEASE will eventually raise the cease_from_tile_N signal to the outside of the Core Complex, indicating that it is safe to power down. Refer to 14.4 and 14.9 for detailed descriptions of system power down. The corresponding code for CEASE is as follows:

//In CSR.scala
  io.csr_stall := reg_wfi || io.status.cease
  //hjr this RegEnable makes sure that `io.status.cease` will keep asserting until next reset
  io.status.cease := RegEnable(true.B, false.B, insn_cease)
  //hjr no interrupt is fired under debug mode.--EVEN THE DEBUG INTERRUPT ITSELF.
  //hjr Debug haltreq will not work after a CEASE instruction has retired.
  //hjr interrupt is masked if single-step is enabled.
  io.interrupt := (anyInterrupt && !io.singleStep || reg_singleStepped) && !(reg_debug || io.status.cease)
//In RocketCore.scala
  io.cease := csr.io.status.cease && !clock_en_reg


  // Report when the tile has ceased to retire instructions
  val ceaseNode = IntSourceNode(IntSourcePortSimple())

  def reportCease(could_cease: Option[Bool], quiescenceCycles: Int = 8): Unit = {
    def waitForQuiescence(cease: Bool): Bool = {
      // don't report cease until signal is stable for longer than any pipeline depth
      val count = RegInit(0.U(log2Ceil(quiescenceCycles + 1).W))
      val saturated = count >= quiescenceCycles.U
      when (!cease) { count := 0.U }
      when (cease && !saturated) { count := count + 1.U }
      saturated
    }
    val (cease, _) = ceaseNode.out(0)
    cease(0) := could_cease.map{ c => 
      val cease = (waitForQuiescence(c))
      // Test-Only Code --
      val prev_cease = RegNext(cease, false.B)
      assert(!(prev_cease & !cease), "CEASE line can not glitch once raised") 
      cease
    }.getOrElse(false.B)
  }
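
The saturating counter inside waitForQuiescence can be simulated cycle by cycle. The sketch below is a plain-Scala model with a hypothetical object name: the output of each cycle is computed from the counter value before that cycle's register update, which is how the Chisel register behaves.

```scala
// Plain-Scala cycle model of waitForQuiescence.
object QuiescenceModel {
  // Returns the value of `saturated` in every cycle for the given cease input trace
  def run(cease: Seq[Boolean], quiescenceCycles: Int = 8): Seq[Boolean] = {
    var count = 0 // models RegInit(0.U)
    cease.map { c =>
      val saturated = count >= quiescenceCycles // combinational output this cycle
      // register update, visible from the next cycle on
      if (!c) count = 0
      else if (!saturated) count += 1
      saturated
    }
  }
}
```

With the default of 8 cycles, a constantly asserted input is first reported in cycle 8 (counting from 0), and a single deasserted cycle restarts the count from zero.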

One thing to note is that a CEASE in WB stalls the pipeline, meaning that no new instructions will be fetched. My confusion, however, is that the instructions in EX and MEM may still retire successfully, which violates the statement in the U54 manual that after retiring CEASE, the hart shall not retire another instruction until reset. Maybe software guarantees that there are no further instructions after CEASE.
Besides the logic for CSR access, there is a huge amount of code for interrupt and exception handling in the CSR module. Handling of exceptions and interrupts is an essential part of a processor core, and the RISC-V privilege spec is quite confusing on this topic, so some of my clarifications may not be right.
As described before, the interrupt signals (io.interrupts) are collected from elsewhere and injected into the CSR module. Some form of arbitration is conducted in the CSR module, and the one with the highest priority is chosen (io.interrupt) as the effective interrupt. This chosen interrupt and the corresponding io.interrupt_cause are asserted to RocketCore, causing id_xcpt and id_cause to be asserted at the ID stage. All instructions before the one at ID will complete, while the one at ID asserts ctrl_killd, and a nop flows down the pipeline in its place, along with extra indicator signals for the exception (interrupt): ex_reg_xcpt, mem_reg_xcpt, wb_reg_xcpt. These indicator signals ensure that the instruction at the same stage, and subsequent instructions fed into the pipeline, do not change the architectural state. There are lots of complications around not changing architectural state. For example, RocketCore normally initiates the dmem (dcache) request at the EX stage, so there is logic to kill this request one cycle later (io.dmem.s1_kill) at MEM if the preceding instruction (now at WB) hit any unexpected situation (reflected in take_pc_wb), or if the MEM stage exception indicator mem_reg_xcpt is asserted. The WB stage version of the exception indicator (wb_xcpt) keeps wb_valid from being asserted, so no register is updated at WB (indicated by rf_wen). These are just very superficial descriptions of the RocketCore pipeline. Maybe I will post another code anatomy for RocketCore, and details will be clarified there.
Among many interrupts, we need to find the one with highest priority that can be taken. According to the spec:

An interrupt i will be taken if bit i is set in both mip and mie, and if interrupts are globally enabled. By default, M-mode interrupts are globally enabled if the hart’s current privilege mode is less than M, or if the current privilege mode is M and the MIE bit in the mstatus register is set. If bit i in mideleg is set, however, interrupts are considered to be globally enabled if the hart’s current privilege mode equals the delegated privilege mode (S or U) and that mode’s interrupt enable bit (SIE or UIE in mstatus) is set, or if the current privilege mode is less than the delegated privilege mode.

Also note that mip and mie have restricted views. According to section 3.1.9 of the priv spec:

Restricted views of the mip and mie registers appear as the sip/sie, and uip/uie registers in S-mode and U-mode respectively. If an interrupt is delegated to privilege mode x by setting a bit in the mideleg register, it becomes visible in the x ip register and is maskable using the x ie register. Otherwise, the corresponding bits in x ip and x ie appear to be hardwired to zero.

The clarification above just means that if an interrupt gets delegated to a lower privilege level, whether that interrupt is pending and enabled should be decided by the fields in xip and xie instead of the corresponding fields in mip and mie. However, according to 4.1.5 of the priv spec:

The sip and sie registers are subsets of the mip and mie registers. Reading any field, or writing any writable field, of sip/sie effects a read or write of the homonymous field of mip/mie.

That means the fields in xip/xie and mip/mie are basically the same whenever the corresponding bit exists in the lower-privilege xip/xie. Therefore, the RC impl just uses mip and mie to decide the pending interrupts.
Now, let's dive into the interrupt arbitration code:

  val mip = Wire(init=reg_mip)
  mip.lip := (io.interrupts.lip: Seq[Bool])
  mip.mtip := io.interrupts.mtip
  mip.msip := io.interrupts.msip
  mip.meip := io.interrupts.meip
  // seip is the OR of reg_mip.seip and the actual line from the PLIC
  io.interrupts.seip.foreach { mip.seip := reg_mip.seip || _ }
  // Similar sort of thing would apply if the PLIC had a VSEIP line:
  //io.interrupts.vseip.foreach { mip.vseip := reg_mip.vseip || _ }
  mip.rocc := io.rocc_interrupt
  val read_mip = mip.asUInt & supported_interrupts
  val read_hip = read_mip & hs_delegable_interrupts
  val high_interrupts = (if (usingNMI) 0.U else io.interrupts.buserror.map(_ << CSR.busErrorIntCause).getOrElse(0.U))

  val pending_interrupts = high_interrupts | (read_mip & reg_mie)

  val d_interrupts = io.interrupts.debug << CSR.debugIntCause
  val (nmi_interrupts, nmiFlag) = io.interrupts.nmi.map(nmi =>
    (((nmi.rnmi && reg_rnmie) << CSR.rnmiIntCause) |
    io.interrupts.buserror.map(_ << CSR.rnmiBEUCause).getOrElse(0.U),
      //hjr nmi interrupts are not available in debug mode.
    !io.interrupts.debug && nmi.rnmi && reg_rnmie)).getOrElse(0.U, false.B)

Some clarifications on the code above:

Among the pending bits of mip, io.interrupts only carries mtip, msip, meip, and seip. These interrupts come from the PLIC (meip and seip) and the CLINT (mtip and msip) and, apart from seip, are read-only through CSR accesses. This complies with the statement in 3.1.9 of the spec that Only the bits corresponding to lower-privilege software interrupts (USIP, SSIP), timer interrupts (UTIP, STIP), and external interrupts (UEIP, SEIP) in mip are writable through this CSR address; the remaining bits are read-only. Note that the supervisor external interrupt (seip) can be raised either by the signal coming from the PLIC or by writing the seip bit in the mip CSR. The rationale is given in the spec: The SEIP field behavior is designed to allow a higher privilege layer to mimic external interrupts cleanly, without losing any real external interrupts. The behavior of the CSR instructions is slightly modified from regular CSR accesses as a result.
The buserror (io.interrupts.buserror) is treated as an NMI if NMI is supported. Otherwise it is treated as a normal interrupt (as a member of pending_interrupts) with the highest priority.
NMI is short for non-maskable interrupt, meaning this kind of interrupt cannot be masked through mie. There are actually two forms of NMI: UNMI and RNMI. UNMI means unresumable non-maskable interrupt, where the NMI jumps to a handler in machine mode, overwriting the current mepc and mcause register values. If the hart had been executing machine-mode code in a trap handler, the previous values in mepc and mcause would not be recoverable and so execution is not generally resumable. That is to say, a UNMI is handled using the M-mode exception facility. The other type, RNMI (resumable NMI), has its own interrupt handling facility, and 4 extra CSRs (mnepc, mncause, mnstatus, and mnscratch) are added to the CSR space. Refer to 8.11 of the u54mc_core_complex manual and this rnmi proposal for further detail. It's worth noting that an internal micro-architectural state bit, rnmie, reflects whether an RNMI is ongoing: rnmie is cleared to indicate that the processor is in an RNMI handler and cannot take a new RNMI interrupt. When clear, all other interrupts are disabled except debug interrupts.
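
The SEIP point in the first clarification, and the reason the mip write path recomputes its data from reg_mip rather than from read_mip, can be illustrated with a tiny plain-Scala model. The names are hypothetical; the bit positions (SSIP = 1, SEIP = 9) are the standard mip layout.

```scala
// Plain-Scala model of the SEIP read view versus the stored register value.
object SeipModel {
  val SsipBit = 1
  val SeipBit = 9

  // The value software reads: stored seip ORed with the PLIC line
  def readMip(regMip: Long, plicSeip: Boolean): Long =
    regMip | (if (plicSeip) 1L << SeipBit else 0L)

  // csrrs semantics: new register value = chosen old value | wdata
  def csrrs(old: Long, wdata: Long): Long = old | wdata
}
```

Suppose reg_mip is 0, the PLIC line is high, and software executes a csrrs that sets only SSIP. Computing the new value from reg_mip stores 0x002. Computing it from the read view would store 0x202, latching SEIP in the register so that it stays pending after the PLIC line drops.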
Once all pending interrupts are determined, we need to choose one of them as the firing one. The overall criterion is simple: choose the pending interrupt (decided via mie and mip; note that the debug interrupt and RNMI have their own pending indicator signals, d_interrupts and nmi_interrupts) with the highest priority that is not globally disabled (decided by the corresponding fields in mstatus and the current priv level). See the code below:

  /* hjr
   * An interrupt i will be taken if bit i is set in both mip and mie, and if interrupts are globally enabled.
   * By default, M-mode interrupts are globally enabled if the hart’s current privilege mode is less than
   * M, or if the current privilege mode is M and the MIE bit in the mstatus register is set. If bit i
   * in mideleg is set, however, interrupts are considered to be globally enabled if the hart’s current
   * privilege mode equals the delegated privilege mode (S or U) and that mode’s interrupt enable bit
   * (SIE or UIE in mstatus) is set, or if the current privilege mode is less than the delegated privilege
   * mode.
   *
   * */
  /*
  * hjr m_interrupts here really means the interrupts that are not delegated to lower priv levels, not MEIP, MSIP, MTIP.
  * (reg_mstatus.prv <= PRV.S || reg_mstatus.mie) is actually the definition of "interrupts are globally enabled."
  *
  * If an interrupt is delegated to a lower priv level, that interrupt is masked at priv levels higher than the level it's delegated to.
  * If that interrupt fires while the hart is in a higher priv mode, it will not be taken. That is: this interrupt is globally disabled (3.1.6.1 Interrupts for lower-privilege
  * modes, w<x, are always globally disabled regardless of the setting of the lower-privilege mode’s global w IE bit.)
  *
  * If a lower priv level interrupt like SSIP is not delegated, then this SSIP will not be masked at the higher priv level (specifically M-mode) if it's globally enabled,
  * that is: if the hart’s current privilege mode is less than M, or if the current privilege mode is M and the MIE bit in the mstatus register is set.
  *
  * Surprisingly, chooseInterrupt can handle both the delegated and non-delegated cases.
  * */
  val m_interrupts = Mux(nmie && (reg_mstatus.prv <= PRV.S || reg_mstatus.mie), ~(~pending_interrupts | read_mideleg), UInt(0))
  val s_interrupts = Mux(nmie && (reg_mstatus.v || reg_mstatus.prv < PRV.S || (reg_mstatus.prv === PRV.S && reg_mstatus.sie)), pending_interrupts & read_mideleg & ~read_hideleg, UInt(0))
  val vs_interrupts = Mux(nmie && (reg_mstatus.v && (reg_mstatus.prv < PRV.S || reg_mstatus.prv === PRV.S && reg_vsstatus.sie)), pending_interrupts & read_hideleg, UInt(0))
  val (anyInterrupt, whichInterrupt) = chooseInterrupt(Seq(vs_interrupts, s_interrupts, m_interrupts, nmi_interrupts, d_interrupts))

  def chooseInterrupt(masksIn: Seq[UInt]): (Bool, UInt) = {
    //hjr "to" means inclusion, 12 is the least non-standard exception code for *interrupt*
    val nonstandard = supported_interrupts.getWidth-1 to 12 by -1
    // MEI, MSI, MTI,  SEI, SSI, STI, VSEI, VSSI, VSTI, UEI, USI, UTI
    val standard = Seq(11, 3, 7, 9, 1, 5, 10, 2, 6, 8, 0, 4)
    val priority = nonstandard ++ standard//hjr the lower the index, the higher the priority
    //hjr 0 -> d_interrupts, 1 -> nmi_interrupts 2 -> m_interrupts 3 -> s_interrupts
    val masks = masksIn.reverse
    val any = masks.flatMap(m => priority.filter(_ < m.getWidth).map(i => m(i))).reduce(_||_)
    //hjr the filter method preserves the order of elements. This is the core reason why PriorityMux works.
    val which = PriorityMux(masks.flatMap(m => priority.filter(_ < m.getWidth).map(i => (m(i), i.U))))
    (any, which)
  }

Here comes a very important concept: exception (interrupt) delegation. Though all exceptions and interrupts are handled in M-mode by default, RISC-V provides a way of delegating specific exceptions to lower priv levels, so that these exceptions are handled at a priv level where no context switching is needed. Refer to 3.1.8 of the priv spec for further details. In short, if the corresponding bit in mideleg or medeleg is set, the exception or interrupt that corresponds to that bit will be delegated to a lower priv level (S or U; note that if a system has M, S and U, setting the corresponding bit in mideleg or medeleg just delegates the exception to S-mode, and there are bits in sideleg and sedeleg that can be set to delegate it further to U-mode). In that case xepc, xtval, xcause, etc. are updated instead of mepc, mtval and mcause: the xPP field of mstatus is written with the active privilege mode at the time of the trap; the xPIE field of mstatus is written with the value of the xIE field at the time of the trap; and the xIE field of mstatus is cleared. The mcause and mepc registers and the MPP and MPIE fields of mstatus are not written. What's worth noting is that delegating an interrupt causes that interrupt to be masked at the delegator priv level. For example, if the supervisor timer interrupt (STI) is delegated to S-mode by setting mideleg[5], STIs will not be taken when executing in M-mode. Also, if a (sync) exception is delegated to a less-priv level (S-mode for example) and that exception happens in M-mode, it will still be handled in M-mode instead of S-mode. A more-priv interrupt shall never be delegated to less-priv levels; for instance, msip, meip and mtip shall never be delegated to S-mode. Some exceptions cannot happen at less-priv levels, therefore the corresponding bits in xedeleg should be hardwired to 0.
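To make the masking effect of delegation concrete, here is a minimal plain-Scala sketch of the m_interrupts and s_interrupts expressions shown earlier. It is only a model under simplifying assumptions: nmie is taken as true, the hypervisor terms (reg_mstatus.v, read_hideleg) are dropped, and the object and function names are made up for illustration.

```scala
object DelegationMasks {
  // Privilege encodings from the RISC-V privileged spec.
  val U = 0; val S = 1; val M = 3

  // Interrupts that stay in M-mode: pending and NOT delegated, visible only
  // when M-mode interrupts are globally enabled (prv below M, or MIE set).
  // pending & ~mideleg is the same value as ~(~pending | mideleg).
  def mInterrupts(pending: Int, mideleg: Int, prv: Int, mie: Boolean): Int =
    if (prv <= S || mie) pending & ~mideleg else 0

  // Interrupts delegated to S-mode: pending AND delegated, visible only
  // below S, or in S with SIE set. They are never visible in M-mode.
  def sInterrupts(pending: Int, mideleg: Int, prv: Int, sie: Boolean): Int =
    if (prv < S || (prv == S && sie)) pending & mideleg else 0
}
```

With STI (bit 5) pending and mideleg[5] set, both functions return 0 while the hart is in M-mode, which is the "delegated interrupts are masked at the delegator level" rule. With mideleg[5] clear, the same pending bit shows up in mInterrupts instead.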
When deciding which interrupt to fire, the RC impl first separates the ones that are delegated from the ones that are not. Because a more-priv interrupt shall never be delegated to less-priv levels, interrupts in m_interrupts (the ones that are not delegated) surely have higher priority than the ones in s_interrupts (the ones that are delegated to S-mode). Note that there is an overall order: debug interrupt (d_interrupts), rnmi interrupt (nmi_interrupts), m_interrupts, s_interrupts, vs_interrupts; that is to say, the debug interrupt has the highest priority, rnmi is second, etc. In terms of the specific implementation, chooseInterrupt takes masksIn in the reverse order and therefore reverses it (val masks = masksIn.reverse, 0 -> d_interrupts, 1 -> nmi_interrupts, 2 -> m_interrupts, 3 -> s_interrupts, 4 -> vs_interrupts), and within each item in masks there is also an implicit order (for example, MEI > MSI > MTI; SEI > SSI > STI; VSEI > VSSI > VSTI; UEI > USI > UTI). Consequently, val which = PriorityMux(masks.flatMap(m => priority.filter(_ < m.getWidth).map(i => (m(i), i.U)))) will choose the appropriate interrupt that fulfills the spec requirements. Note that chooseInterrupt still holds when no S-mode interrupts (SEI, SSI, STI) are delegated: these interrupts are then represented in m_interrupts, and there is still an order within m_interrupts: MEI > MSI > MTI > SEI > SSI > STI.
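The two-level ordering (group first, then cause code within the group) can be sketched in plain Scala without Chisel. This is an illustrative model, not the RC code: the masks are plain Int bitmasks, the width is a fixed assumed parameter instead of m.getWidth, and the names are hypothetical.

```scala
object ChooseInterruptModel {
  // Standard interrupt codes in priority order, copied from chooseInterrupt:
  // MEI, MSI, MTI, SEI, SSI, STI, VSEI, VSSI, VSTI, UEI, USI, UTI
  val standard: Seq[Int] = Seq(11, 3, 7, 9, 1, 5, 10, 2, 6, 8, 0, 4)

  // masksIn uses the call-site order (vs, s, m, nmi, d), lowest priority first.
  // Returns the winning cause code, or None if nothing is pending.
  def choose(masksIn: Seq[Int], width: Int = 16): Option[Int] = {
    val nonstandard = (width - 1) to 12 by -1
    val priority = nonstandard ++ standard
    val candidates = for {
      m <- masksIn.reverse // highest-priority group first
      i <- priority        // then priority order within the group
      if ((m >> i) & 1) == 1
    } yield i
    candidates.headOption  // plays the role of PriorityMux
  }
}
```

For example, with SEI (bit 9) pending in the s group and MTI (bit 7) pending in the m group, the model picks 7: the group order dominates the order of the individual cause codes.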
After one interrupt is chosen, io.interrupt and the corresponding cause io.interrupt_cause are asserted to RocketCore:

  val (anyInterrupt, whichInterrupt) = chooseInterrupt(Seq(vs_interrupts, s_interrupts, m_interrupts, nmi_interrupts, d_interrupts))
  val interruptMSB = BigInt(1) << (xLen-1)
  val interruptCause = UInt(interruptMSB) + (nmiFlag << (xLen-2)) + whichInterrupt// nmi interrupt will have the second msb to be asserted.
  //hjr no interrupt is fired under debug mode.--EVEN THE DEBUG INTERRUPT ITSELF.
  //hjr Debug haltreq will not work after a CEASE instruction has retired.
  //hjr interrupt is masked if singlestep is enabled.
  io.interrupt := (anyInterrupt && !io.singleStep || reg_singleStepped) && !(reg_debug || io.status.cease)
  io.interrupt_cause := interruptCause
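
As a quick worked example of the cause encoding above, the following plain-Scala sketch rebuilds interruptCause with BigInt arithmetic. The function name and parameters are made up for illustration; nmi stands in for nmiFlag.

```scala
object InterruptCauseModel {
  // Interrupt flag in the MSB, NMI flag in the second MSB,
  // and the chosen interrupt number in the low bits.
  def interruptCause(xLen: Int, nmi: Boolean, which: Int): BigInt = {
    val interruptMSB = BigInt(1) << (xLen - 1)
    val nmiBit = if (nmi) BigInt(1) << (xLen - 2) else BigInt(0)
    interruptMSB + nmiBit + which
  }
}
```

For xLen = 32, an NMI with which = 2 gives 0xc0000002, while a regular machine timer interrupt (which = 7) on xLen = 64 gives 0x8000000000000007.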

Note that the RC impl also supports the single-step debugging facility; detailed info can be found in section 4.4 of the debug spec. The step field in dcsr can be modified in debug mode. After it is set and the hart leaves debug mode (io.singleStep := reg_dcsr.step && !reg_debug), the hart executes exactly one instruction (io.retire(0)) and traps back into debug mode. The specific mechanism that makes this work is as follows:
dret at WB asserts take_pc_wb and therefore flushes (kills) the in-flight instructions in the pipeline (including the ones in ibuf). io.singleStep is asserted once the dret finishes executing. New instructions starting at csr.io.evec enter the pipeline; note that once one instruction flows past ID successfully, the ID stage is stalled (val ctrl_stalld = ...csr.io.singleStep && (ex_reg_valid || mem_reg_valid || wb_reg_valid) ), meaning that no further instruction will enter the ID stage. Once that instruction retires at WB or takes an exception (io.retire(0) || exception), reg_singleStepped is asserted in the next cycle. The assertion of reg_singleStepped asserts io.interrupt to RocketCore, indicating that a (debug) interrupt needs to be handled. io.interrupt redirects the execution flow to the debug entry, which enters debug mode and asserts reg_debug; the assertion of reg_debug makes io.singleStep false, so the code sequence in the debug rom is not stalled by ctrl_stalld. Note that io.singleStep stays asserted from the moment the hart leaves the debug mode in which dcsr.step was configured until the moment debug mode is re-entered. De-assertion of io.singleStep deasserts reg_singleStepped, which marks the end of one round of single stepping. The corresponding code is as follows:

  /*
  * hjr
  * the debug specs says(4.4.1):
  * If control is transferred to a trap handler while executing the instruction, then Debug Mode is
  * re-entered immediately after the PC is changed to the trap handler, and the appropriate tval and
  * cause registers are updated. In this case none of the trap handler is executed, and if the cause was
  * a pending interrupt no instructions might be executed at all.
  *
  * When an exception comes, reg_singleStepped will be true.B next cycle(io.retire(0) || exception). During this cycle (before the next), the standard exception handling process takes effect,
  * like updating MIE and MPP, setting the pc to the handler, etc. When the next cycle comes, this exception related info has been updated. But in this "next cycle", a debug
  * interrupt will happen:
  * io.interrupt := (anyInterrupt && !io.singleStep || reg_singleStepped) && !(reg_debug || io.status.cease)
  * val trapToDebug = Bool(usingDebug) && (reg_singleStepped || causeIsDebugInt || causeIsDebugTrigger || causeIsDebugBreak || reg_debug)
  * SO, this complies with the spec:
  * If control is transferred to a trap handler while executing the instruction, then Debug Mode is
  * re-entered immediately after the PC is changed to the trap handler, and the appropriate tval and
  * cause registers are updated.
  *
  * Also, See spec description below:
  * If executing or fetching the instruction causes a trigger to fire with action=1, Debug Mode is reentered
  * immediately after that trigger has fired. In that case cause is set to 2 (trigger) instead of 4
  * (single step). Whether the instruction is executed or not depends on the specific configuration of
  * the trigger.
  * In terms of impl in RC, a trigger is detected at ID stage: val (id_xcpt, id_cause) = (bpu.io.debug_if,  UInt(CSR.debugTriggerCause))
  * When a trigger fires, that insn is marked with xcpt, and when it flows to the WB stage the retire indicator is not asserted, therefore
  * debug mode is not entered because of the singlestep request; instead the exception is handled (action=1 makes the hart enter debug mode,
  * but not because of the step. So, in that case cause is set to 2 (trigger) instead of 4 (single step)). Note that once debug mode is entered, the singlestep
  * request is deasserted.
  * What's worth noting here is that the last connection wins principle: Normally, io.singleStep will not be asserted, and pipeline will keep retiring instructions.
  * Under this scenario,  reg_singleStepped := true and reg_singleStepped := false will take effect at the same time, and the last connection wins says that
  * reg_singleStepped := false
  * */
  when (io.retire(0) || exception) { reg_singleStepped := true }
  when (!io.singleStep) { reg_singleStepped := false }//hjr once entering debug mode(io.singleStep being false.B), reg_singleStepped is deasserted.
  assert(!io.singleStep || io.retire <= UInt(1))//hjr when singleStep is asserted, only one insn should be retired(not a problem for RC)
  assert(!reg_singleStepped || io.retire === UInt(0))//hjr no further insn should retire once reg_singleStepped := true

  io.interrupt := (anyInterrupt && !io.singleStep || reg_singleStepped) && !(reg_debug || io.status.cease)

From io.interrupt := (anyInterrupt && !io.singleStep || reg_singleStepped) && !(reg_debug || io.status.cease) we can also see that: 1, no interrupt is fired in debug mode, EVEN THE DEBUG INTERRUPT ITSELF; 2, all interrupts are masked (including the debug request) after a CEASE instruction has retired; 3, interrupts are masked while single step is enabled.
io.interrupt goes into RocketCore, kills the instruction at ID (ctrl_killd), and flows down the pipeline. Also note that exceptions raised at any pipeline stage flow downward as well. The exception or interrupt finally asserts wb_xcpt at WB, notifying the CSR module that an exception needs to be handled. Besides wb_xcpt, these signals are sent to the csr module: csr.io.cause := wb_cause; csr.io.pc := wb_reg_pc; csr.io.tval := Mux(tval_valid, encodeVirtualAddress(wb_reg_wdata, wb_reg_wdata), 0.U), etc. Once csr.io.exception := wb_xcpt is asserted, the csr module has to decide where the exception handler is: io.evec := tvec is asserted in the same cycle as wb_xcpt. Logic in RocketCore then directs the FrontEnd to begin fetching instructions from there (evec):
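The interplay of the two when blocks and the io.interrupt expression can be checked with a small plain-Scala model. It is a sketch with hypothetical names: nextSingleStepped mimics the last-connect-wins behavior by applying the assignments in source order, and interrupt mirrors the boolean expression.

```scala
object SingleStepModel {
  // Next value of reg_singleStepped. The two `when` blocks are applied in
  // order, so the later one (!singleStep) wins when both conditions hold.
  def nextSingleStepped(cur: Boolean, retireOrXcpt: Boolean, singleStep: Boolean): Boolean = {
    var next = cur
    if (retireOrXcpt) next = true
    if (!singleStep) next = false
    next
  }

  // Mirrors the io.interrupt expression (&& binds tighter than ||).
  def interrupt(any: Boolean, singleStep: Boolean, singleStepped: Boolean,
                debug: Boolean, cease: Boolean): Boolean =
    (any && !singleStep || singleStepped) && !(debug || cease)
}
```

In normal execution (singleStep false) every retire hits both when blocks and the register stays false. While stepping, the first retire sets it, and it then holds until singleStep drops on debug entry.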

  io.imem.req.valid := take_pc
  io.imem.req.bits.speculative := !take_pc_wb
  io.imem.req.bits.pc :=
    Mux(wb_xcpt || csr.io.eret, csr.io.evec, // exception or [m|s]ret
    Mux(replay_wb,              wb_reg_pc,   // replay
                                mem_npc))    // flush or branch misprediction

Specifically, below are the key signals that determine what kind of trap the incoming exception is:

trapToDebug

val reg_singleStepped = Reg(Bool())
when (io.retire(0) || exception) { reg_singleStepped := true }
when (!io.singleStep) { reg_singleStepped := false }//hjr once entering debug mode(io.singleStep being false.B), reg_singleStepped is deasserted.
assert(!io.singleStep || io.retire <= UInt(1))//hjr when singleStep is asserted, only one insn should be retired(not a problem for RC)
assert(!reg_singleStepped || io.retire === UInt(0))//hjr no further insn should retire once reg_singleStepped := true

val cause =
  //hjr this is slick, 0x8(user) + 0x0(user) = Environment call from U-mode
  //0x8(user) + 0x1(supervisor) = Environment call from S-mode
  //0x8(user) + 0x2(machine) = Environment call from M-mode
  Mux(insn_call, Causes.user_ecall + Mux(reg_mstatus.prv(0) && reg_mstatus.v, PRV.H: UInt, reg_mstatus.prv),
  Mux[UInt](insn_break, Causes.breakpoint, io.cause))
val cause_lsbs = cause(log2Ceil(1 + CSR.busErrorIntCause)-1, 0)
val causeIsDebugInt = cause(xLen-1) && cause_lsbs === CSR.debugIntCause
val causeIsDebugTrigger = !cause(xLen-1) && cause_lsbs === CSR.debugTriggerCause
val causeIsDebugBreak = !cause(xLen-1) && insn_break && Cat(reg_dcsr.ebreakm, reg_dcsr.ebreakh, reg_dcsr.ebreaks, reg_dcsr.ebreaku)(reg_mstatus.prv)


val trapToDebug = Bool(usingDebug) && (reg_singleStepped || causeIsDebugInt || causeIsDebugTrigger || causeIsDebugBreak || reg_debug)

val debugEntry = p(DebugModuleKey).map(_.debugEntry).getOrElse(BigInt(0x800))
val debugException = p(DebugModuleKey).map(_.debugException).getOrElse(BigInt(0x808))

The very first scenario that causes trap-to-debug is single stepping, which has been described above. Normally, there are 3 cases that cause a debug interrupt (causeIsDebugInt): 1, an explicit haltreq request from dmcontrol; 2, a hart with the halt-on-reset option configured has just come out of reset; 3, one of the harts or external triggers in the same halt group as the specified hart halts:


  io.hgDebugInt := hgDebugInt | hrDebugInt

    for (component <- 0 until nComponents) {
    /*
    * hjr debug interrupt may happen by
    * 1,debugger writing to haltreq
    * 2,io.hgDebugInt(component)
    *   2.1: reset of a 'halt-on-reset' hart(this dmi register lives inside DMInner)
    *   2.2: a hart or ext trigger fires inside a haltgroup.(DMI register dmcs2 is for handling haltgroup, it lives inside the DMInner )
    *
    * */
    intnode_out(component)(0) := debugIntRegs(component) | io.hgDebugInt(component)
  }

Normally, an ebreak causes a breakpoint exception (0x03) and traps to BASE of xtvec. But if one of ebreakm, ebreaks or ebreaku is set, the ebreak traps to debug when the hart runs in the corresponding priv level:

val causeIsDebugBreak = !cause(xLen-1) && insn_break && Cat(reg_dcsr.ebreakm, reg_dcsr.ebreakh, reg_dcsr.ebreaks, reg_dcsr.ebreaku)(reg_mstatus.prv)
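
The indexing trick in causeIsDebugBreak is easy to miss, so here is a plain-Scala sketch of it. Cat puts its first argument in the most significant position, so the concatenation can be indexed directly by the privilege encoding. The function name is hypothetical, and ebreakh is simply assumed to be false in this model.

```scala
object DebugBreakModel {
  // Cat(ebreakm, ebreakh, ebreaks, ebreaku) places ebreakm in bit 3 and
  // ebreaku in bit 0, so indexing by prv (M=3, S=1, U=0) picks the matching bit.
  def ebreakToDebug(ebreakm: Boolean, ebreaks: Boolean, ebreaku: Boolean, prv: Int): Boolean = {
    // Sequence index equals bit position; index 2 is ebreakh (assumed false here).
    val bits = Seq(ebreaku, ebreaks, false, ebreakm)
    bits(prv)
  }
}
```

So with only ebreakm set, an ebreak executed in S-mode (prv = 1) still takes the ordinary breakpoint exception, while the same instruction in M-mode (prv = 3) traps to debug.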

The trigger facility (internal debugging) may also trap to debug if a configured trigger fires; further details will be given later when describing the trigger mechanism. The last scenario that may re-trap to debug is the debug exception. Sync exceptions may happen in debug mode (note that all interrupts are masked in debug mode: io.interrupt := (anyInterrupt && !io.singleStep || reg_singleStepped) && !(reg_debug || io.status.cease)). In this case reg_singleStepped, causeIsDebugInt, causeIsDebugTrigger and causeIsDebugBreak are all deasserted, and the final reg_debug term inside val trapToDebug = Bool(usingDebug) && (reg_singleStepped || causeIsDebugInt || causeIsDebugTrigger || causeIsDebugBreak || reg_debug) takes effect, meaning that an exception happened during the execution of debug rom code or of code constructed in ABSTRACTS and the program buffer.
With depictions above in terms of debug interrupt, where to jump is pretty straightforward:

  /*
* hjr
* reg_debug below is for handling the debug exception
* */
val trapToDebug = Bool(usingDebug) && (reg_singleStepped || causeIsDebugInt || causeIsDebugTrigger || causeIsDebugBreak || reg_debug)
val debugEntry = p(DebugModuleKey).map(_.debugEntry).getOrElse(BigInt(0x800))
val debugException = p(DebugModuleKey).map(_.debugException).getOrElse(BigInt(0x808))
/*
* hjr I read somewhere that when an exception happens in Debug Mode, the execution flow should go to a specific location
* that is just for handling debug mode exception, maybe it is debugException here.
* */
/*
* hjr--This is very cool.
* 1, an ebreak in the debug mode(reg_debug being high) will jump to the beginning of the ROM code
* 2, If reg_debug is true.B and there is an exception happening, will jump to the debug exception handler:debugException
* 3, When reg_debug is false.B, this means there is a haltreq(or ebreak for DebugMode, or debugTrigger), just jump to debugEntry
* */
val debugTVec = Mux(reg_debug, Mux(insn_break, debugEntry.U, debugException.U), debugEntry.U)

Note that an ebreak in debug mode is used by ABSTRACT or program buffer to move pc to beginning of debug rom.
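
The three cases listed in the comment above reduce to a small truth table. A plain-Scala sketch, using the default addresses from the getOrElse fallbacks (a configured DebugModuleKey may supply different ones) and a hypothetical function name:

```scala
object DebugTVecModel {
  val debugEntry = 0x800     // fallback when no DebugModuleKey is configured
  val debugException = 0x808

  // Same structure as the nested Mux in debugTVec.
  def debugTVec(regDebug: Boolean, insnBreak: Boolean): Int =
    if (regDebug) { if (insnBreak) debugEntry else debugException }
    else debugEntry
}
```

Only an exception other than ebreak that occurs while already in debug mode lands on debugException; every other combination goes to debugEntry.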

trapToNmi
The corresponding logic for the nmi trap is as follows:

val causeIsRnmiInt = cause(xLen-1) && cause(xLen-2) && (cause_lsbs === CSR.rnmiIntCause || cause_lsbs === CSR.rnmiBEUCause)
val causeIsRnmiBEU = cause(xLen-1) && cause(xLen-2) && cause_lsbs === CSR.rnmiBEUCause
val causeIsNmi = causeIsRnmiInt
val nmiTVecInt = io.interrupts.nmi.map(nmi => nmi.rnmi_interrupt_vector).getOrElse(0.U)
val nmiTVecXcpt = io.interrupts.nmi.map(nmi => nmi.rnmi_exception_vector).getOrElse(0.U)
val trapToNmiInt = usingNMI.B && causeIsNmi
val trapToNmiXcpt = usingNMI.B && !nmie
val trapToNmi = trapToNmiInt || trapToNmiXcpt
val nmiTVec = (Mux(causeIsNmi, nmiTVecInt, nmiTVecXcpt)>>1)<<1

val tvec = Mux(trapToDebug, debugTVec, Mux(trapToNmi, nmiTVec, notDebugTVec))

When an exception happens in the rnmi handler, trapToNmiXcpt is asserted (val trapToNmiXcpt = usingNMI.B && !nmie). Note that the nmi interrupt and exception handler entry points are passed into the RocketCore through rnmi_interrupt_vector and rnmi_exception_vector instead of what's normally indicated in the xtvec csr. Also note that there is no delegation for non-maskable interrupts and exceptions: rnmi interrupts and exceptions all trap to M-mode.

notDebugTVec
Except for trapToDebug and trapToNmi, the interrupt or exception handler normally lives at the address specified by mtvec, or by stvec if the corresponding exception or interrupt is delegated (note that the RC impl does not implement user mode interrupts, that is, the N extension is not supported). The related code is as follows:

val delegate = Bool(usingSupervisor) && reg_mstatus.prv <= PRV.S && Mux(cause(xLen-1), read_mideleg(cause_lsbs), read_medeleg(cause_lsbs))
val delegateVS = reg_mstatus.v && delegate && Mux(cause(xLen-1), read_hideleg(cause_lsbs), read_hedeleg(cause_lsbs))
def mtvecBaseAlign = 2
def mtvecInterruptAlign = {
  require(reg_mip.getWidth <= xLen)
  log2Ceil(xLen)
}
val notDebugTVec = {
  val base = Mux(delegate, Mux(delegateVS, read_vstvec, read_stvec), read_mtvec)
  val interruptOffset = cause(mtvecInterruptAlign-1, 0) << mtvecBaseAlign
  val interruptVec = Cat(base >> (mtvecInterruptAlign + mtvecBaseAlign), interruptOffset)
  val doVector = base(0) && cause(cause.getWidth-1) && (cause_lsbs >> mtvecInterruptAlign) === 0
  Mux(doVector, interruptVec, base >> mtvecBaseAlign << mtvecBaseAlign)
}
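
To see what notDebugTVec computes, here is a plain-Scala worked model for xLen = 64 (so mtvecInterruptAlign = 6). It is a sketch under assumptions: base is the raw xtvec value as a Long, code is the cause number without the interrupt flag, and the names are made up for illustration.

```scala
object TVecModel {
  val baseAlign = 2      // mtvecBaseAlign
  val interruptAlign = 6 // log2Ceil(xLen) for xLen = 64

  // base: raw xtvec value (bit 0 set means vectored mode).
  // code: cause number without the interrupt flag.
  def trapVector(base: Long, isInterrupt: Boolean, code: Int): Long = {
    val vectored = (base & 1L) == 1L && isInterrupt && (code >> interruptAlign) == 0
    if (vectored) {
      // Cat(base >> 8, cause(5,0) << 2): the low 8 bits of base are replaced
      // by 4 * code, rather than 4 * code being added to base.
      val offset = (code & ((1 << interruptAlign) - 1)).toLong << baseAlign
      ((base >> (interruptAlign + baseAlign)) << (interruptAlign + baseAlign)) | offset
    } else (base >> baseAlign) << baseAlign // direct mode: just clear the mode bits
  }
}
```

With mtvec = 0x80000001 and a machine timer interrupt (code 7) the model yields 0x8000001C, while a sync exception with the same mtvec goes to 0x80000000. Because the offset is concatenated rather than added, this formulation effectively expects BASE to be aligned to 256 bytes in vectored mode.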

Besides notifying the FrontEnd to fetch instructions from the handler entry point, the exception context needs to be saved so that execution can be resumed once the exception handler finishes (marked by an xret instruction). See code below:

  val epc = formEPC(io.pc)

  when (exception) {//hjr exception and interrupt are regarded as "exception"
    when (trapToDebug) {
      //hjr when trapToDebug and reg_debug are both asserted, this happens at the moment when debug ROM code has an exception or meets an ebreak
      //hjr under this situation, just jump to the debug exception handler(0x808) or debug entry, do nothing here, debug mode register is not destroyed.
      when (!reg_debug) {
        reg_mstatus.v := false//hjr todo what's this for? H extension stuff.
        reg_debug := true
        reg_dpc := epc
        reg_dcsr.cause := Mux(reg_singleStepped, 4, Mux(causeIsDebugInt, 3, Mux[UInt](causeIsDebugTrigger, 2, 1)))
        reg_dcsr.prv := trimPrivilege(reg_mstatus.prv)
        reg_dcsr.v := reg_mstatus.v
        new_prv := PRV.M//hjr debug mode and m-mode are basically the same.
      }
    }.elsewhen (trapToNmiInt) {
      when (reg_rnmie) {
        reg_mstatus.v := false
        reg_mnstatus.mpv := reg_mstatus.v
        reg_rnmie := false.B
        reg_mnepc := epc
        reg_mncause := (BigInt(1) << (xLen-1)).U | Mux(causeIsRnmiBEU, 3.U, 2.U)
        reg_mnstatus.mpp := trimPrivilege(reg_mstatus.prv)
        new_prv := PRV.M
      }
    }.elsewhen (delegateVS && nmie) {
      reg_mstatus.v := true
      reg_vsstatus.spp := reg_mstatus.prv
      reg_vsepc := epc
      reg_vscause := Mux(cause(xLen-1), Cat(cause(xLen-1, 2), 1.U(2.W)), cause)
      reg_vstval := io.tval
      reg_vsstatus.spie := reg_vsstatus.sie
      reg_vsstatus.sie := false
      new_prv := PRV.S
    }.elsewhen (delegate && nmie) {
      reg_mstatus.v := false
      reg_hstatus.spvp := Mux(reg_mstatus.v, reg_mstatus.prv(0),reg_hstatus.spvp)
      reg_hstatus.gva := io.gva
      reg_hstatus.spv := reg_mstatus.v
      reg_sepc := epc
      reg_scause := cause
      reg_stval := io.tval
      reg_htval := io.htval
      reg_mstatus.spie := reg_mstatus.sie
      reg_mstatus.spp := reg_mstatus.prv
      reg_mstatus.sie := false
      new_prv := PRV.S
    }.otherwise {
      reg_mstatus.v := false
      reg_mstatus.mpv := reg_mstatus.v
      reg_mstatus.gva := io.gva
      reg_mepc := epc
      reg_mcause := cause
      reg_mtval := io.tval
      reg_mtval2 := io.htval
      reg_mstatus.mpie := reg_mstatus.mie
      reg_mstatus.mpp := trimPrivilege(reg_mstatus.prv)
      reg_mstatus.mie := false
      new_prv := PRV.M
    }
  }

In general, once an exception or interrupt is taken, the execution flow transfers to the exception (interrupt) handler, which generally runs at a more-priv level than the interrupted context (unless that specific interrupt or exception is delegated). Therefore we need places to hold the previous priv level, the pc and the exception cause so that the execution flow can be restored after the handler finishes; that's what the fields mpp, epc and cause are for. Also, according to the spec (3.1.6.1):

to support nested traps, each privilege mode x has a two-level stack of interrupt-enable bits and privilege modes. xPIE holds the value of the interrupt-enable bit active prior to the trap, and xPP holds the previous privilege mode. When a trap is taken from privilege mode y into privilege mode x, xPIE is set to the value of xIE; xIE is set to 0; and xPP is set to y. When executing an xRET instruction, supposing xPP holds the value y, xIE is set to xPIE; the privilege mode is changed to y; xPIE is set to 1; and xPP is set to U (or M if user-mode is not supported).
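
The push/pop behavior in the quote can be modeled in a few lines of plain Scala. This is only a sketch of the M-mode fields (the Hart case class and function names are hypothetical, and user mode is assumed to exist so that MPP falls back to U):

```scala
object TrapStackModel {
  // Only the M-mode fields of mstatus, plus the current privilege level.
  case class Hart(prv: Int, mie: Boolean, mpie: Boolean, mpp: Int)

  val U = 0; val M = 3

  // Trap into M-mode: push the current enable bit and privilege level.
  def trapToM(h: Hart): Hart =
    h.copy(prv = M, mpie = h.mie, mie = false, mpp = h.prv)

  // mret: pop them again; MPIE becomes 1 and MPP falls back to U.
  def mret(h: Hart): Hart =
    h.copy(prv = h.mpp, mie = h.mpie, mpie = true, mpp = U)
}
```

Starting from U-mode with MIE set, trapToM gives prv = M, MIE = 0, MPIE = 1, MPP = U, and mret brings back prv = U with MIE = 1. Note that the stack is only one entry deep: a second trapToM before the mret overwrites MPIE and MPP.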

I still feel confused about the rationale of the global interrupt-enable bit MIE. The spec says that these bits (MIE, SIE and UIE) are primarily used to guarantee atomicity with respect to interrupt handlers in the current privilege mode. This just doesn't clarify it well for me. Also, the xPIE field is used to restore xIE once xret is executed. What's the rationale for this two-level xPIE and xIE stack?
The debug interrupt and the rnmi interrupt have their own stack (dcsr and dpc for debug; mnstatus, mnepc and mncause for rnmi) to hold the previous priv level, epc and cause, instead of the corresponding fields in mstatus or sstatus. Note that a debug exception has trapToDebug asserted while reg_debug is already true, so the when (!reg_debug) block is skipped and the debug interrupt stack is not modified again; it just notifies the FrontEnd to begin fetching instructions at debugException (0x808). An rnmi exception (with nmie being false), however, always traps to M-mode (somewhere the rnmi spec asks for this) and modifies the corresponding mstatus fields like MPIE, MPP and MIE. So when an rnmi happens while an M-mode interrupt (an interrupt that's not delegated) is being processed, an exception inside the rnmi handler will smash the mstatus stack, and there is then no way to return from the previous M-mode interrupt.
I have some instinct that the reg_debug and reg_rnmie may partially act like the xIE fields for specific priv level in terms of enabling corresponding interrupt, but this is fuzzy.

The logic for xret (mret, sret, dret, mnret) is very straightforward. An xret instruction marks the end of handler execution. It restores the interrupted execution flow and priv level. For mret and sret, the global interrupt enable bit of the corresponding mode is also restored (MPIE->MIE, true->MPIE):

  when (insn_ret) {
    val ret_prv = WireInit(UInt(), DontCare)
    when (Bool(usingSupervisor) && !io.rw.addr(9)) {//hjr sret
      when (!reg_mstatus.v) {
        reg_mstatus.sie := reg_mstatus.spie
        reg_mstatus.spie := true
        reg_mstatus.spp := PRV.U
        ret_prv := reg_mstatus.spp
        reg_mstatus.v := usingHypervisor && reg_hstatus.spv
        io.evec := readEPC(reg_sepc)
        reg_hstatus.spv := false
      }.otherwise {
        reg_vsstatus.sie := reg_vsstatus.spie
        reg_vsstatus.spie := true
        reg_vsstatus.spp := PRV.U
        ret_prv := reg_vsstatus.spp
        reg_mstatus.v := usingHypervisor
        io.evec := readEPC(reg_vsepc)
      }
    }.elsewhen (Bool(usingDebug) && io.rw.addr(10) && io.rw.addr(7)) {//hjr dret
      ret_prv := reg_dcsr.prv
      reg_mstatus.v := usingHypervisor && reg_dcsr.v && reg_dcsr.prv <= PRV.S
      reg_debug := false//hjr  from haltreq to dret
      io.evec := readEPC(reg_dpc)
    }.elsewhen (Bool(usingNMI) && io.rw.addr(10) && !io.rw.addr(7)) {//hjr mnret
      ret_prv := reg_mnstatus.mpp
      reg_mstatus.v := usingHypervisor && reg_mnstatus.mpv && reg_mnstatus.mpp <= PRV.S
      reg_rnmie := true.B
      io.evec := readEPC(reg_mnepc)
    }.otherwise {//hjr mret
      reg_mstatus.mie := reg_mstatus.mpie
      reg_mstatus.mpie := true
      reg_mstatus.mpp := legalizePrivilege(PRV.U)
      reg_mstatus.mpv := false
      ret_prv := reg_mstatus.mpp
      reg_mstatus.v := usingHypervisor && reg_mstatus.mpv && reg_mstatus.mpp <= PRV.S
      io.evec := readEPC(reg_mepc)
    }

    new_prv := ret_prv
    when (usingUser && ret_prv <= PRV.S) {
      reg_mstatus.mprv := false
    }
  }

Note that io.evec is overloaded: it indicates the address of the exception handler (when wb_xcpt is asserted) and the address of the interrupted instruction (when csr.io.eret is true).
If an S-mode interrupt is being processed, SIE in mstatus is cleared, so no S-mode interrupt will be taken in this situation. However, according to the spec, an M-mode interrupt is able to preempt that S-mode interrupt regardless of whether the MIE bit is set, because M-mode interrupts are globally enabled when the hart is executing at a priv level lower than M-mode. What's cool is that the M-mode interrupt will not smash the S-mode stack (SPIE, SIE, SPP, etc.), therefore the interrupted S-mode interrupt handler can be resumed. In general, an x-mode exception (interrupt) can preempt a y-mode exception if and only if x is more priv than y. Note that Waterman confirmed in this issue that "interrupts for lower-privilege modes" in 3.1.6.1 means interrupts that are delegated to lower privilege levels.

Besides csr access and exception configuration & handling logic, the CSR module also has logic for the hardware performance-monitoring facility. RISC-V includes a basic hardware performance-monitoring scheme; read 3.1.11, 3.1.12 and 3.1.13 of the RISC-V privilege spec for detailed info. There are two categories of monitoring counters: fixed-function counters and event-programmable counters.
There are 2 fixed-function counters, mcycle and minstret. mcycle holds the number of clock cycles the hart has executed since some arbitrary time in the past, while minstret holds the number of instructions the hart has retired since some arbitrary time in the past. The counter registers have an arbitrary value after system reset, and can be written with a given value.
The other category is the event-programmable counters: there are at most 29 additional 64-bit event-programmable counters, mhpmcounter3–mhpmcounter31. The event selector CSRs, mhpmevent3–mhpmevent31, are MXLEN-bit WARL registers that control which event causes the corresponding counter to increment. The meaning of these events is defined by the platform, but event 0 is defined to mean “no event.” All counters should be implemented, but a legal implementation is to hard-wire both the counter and its corresponding event selector to 0.
It's worth noting that these counters are 64 bits wide on both RV32 and RV64 systems. On RV32 only, reads of the mcycle, minstret, and mhpmcountern CSRs return the low 32 bits, while reads of the mcycleh, minstreth, and mhpmcounternh CSRs return bits 63–32 of the corresponding counter.
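As a small illustration of the RV32 split, the following plain-Scala sketch shows which part of a 64-bit counter each CSR read returns. The function names are hypothetical; the counter is modeled as a Long.

```scala
object CounterHalves {
  // Low half, as returned by mcycle / minstret / mhpmcounterN on RV32.
  def low(counter: Long): Long = counter & 0xFFFFFFFFL

  // High half (bits 63..32), as returned by mcycleh / minstreth / mhpmcounterNh.
  def high(counter: Long): Long = counter >>> 32

  // Reassemble the 64-bit value from the two 32-bit reads.
  def combine(hi: Long, lo: Long): Long = (hi << 32) | lo
}
```

For a counter value of 0x123456789, the low read is 0x23456789 and the high read is 0x1. This matches the (c >> 32) expressions used for the h-suffixed CSRs in read_mapping.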
The corresponding code for accessing these csrs is as follows:

  val reg_mcountinhibit = RegInit(0.U((CSR.firstHPM + nPerfCounters).W))
  io.inhibit_cycle := reg_mcountinhibit(0)
  val reg_instret = WideCounter(64, io.retire, inhibit = reg_mcountinhibit(2))
  val reg_cycle = if (enableCommitLog) WideCounter(64, io.retire,     inhibit = reg_mcountinhibit(0))
    else withClock(io.ungated_clock) { WideCounter(64, !io.csr_stall, inhibit = reg_mcountinhibit(0)) }
  val reg_hpmevent = io.counters.map(c => Reg(init = UInt(0, xLen)))
    (io.counters zip reg_hpmevent) foreach { case (c, e) => c.eventSel := e }
  val reg_hpmcounter = io.counters.zipWithIndex.map { case (c, i) =>
    WideCounter(CSR.hpmWidth, c.inc, reset = false, inhibit = reg_mcountinhibit(CSR.firstHPM+i)) }


  //hjr csr read
  if (coreParams.haveBasicCounters) {
    read_mapping += CSRs.mcountinhibit -> reg_mcountinhibit
    read_mapping += CSRs.mcycle -> reg_cycle
    read_mapping += CSRs.minstret -> reg_instret

    for (((e, c), i) <- (reg_hpmevent.padTo(CSR.nHPM, UInt(0))// hjr todo conducting padding with UInt or Reg
                         zip reg_hpmcounter.map(x => x: UInt).padTo(CSR.nHPM, UInt(0))) zipWithIndex) {
      read_mapping += (i + CSR.firstHPE) -> e // mhpmeventN
      read_mapping += (i + CSR.firstMHPC) -> c // mhpmcounterN
      //hjr hpmcounter is the read-only shadows of mhpmcounter in S or U mode
      if (usingUser) read_mapping += (i + CSR.firstHPC) -> c // hpmcounterN
      if (xLen == 32) {
        //hjr the performance counters are all 64bits regardless of the XLEN
        read_mapping += (i + CSR.firstMHPCH) -> (c >> 32) // mhpmcounterNh
        if (usingUser) read_mapping += (i + CSR.firstHPCH) -> (c >> 32) // hpmcounterNh
      }
    }

    if (usingUser) {
      read_mapping += CSRs.mcounteren -> read_mcounteren//hjr if there is lower pri level, mcounteren must exist
      read_mapping += CSRs.cycle -> reg_cycle
      read_mapping += CSRs.instret -> reg_instret
    }

    if (xLen == 32) {
      read_mapping += CSRs.mcycleh -> (reg_cycle >> 32)
      read_mapping += CSRs.minstreth -> (reg_instret >> 32)
      if (usingUser) {
        read_mapping += CSRs.cycleh -> (reg_cycle >> 32)
        read_mapping += CSRs.instreth -> (reg_instret >> 32)
      }
    }
  }
  //hjr write 
  //hjr write init value to the event counter and event selector
  for (((e, c), i) <- (reg_hpmevent zip reg_hpmcounter) zipWithIndex) {
    writeCounter(i + CSR.firstMHPC, c, wdata)
    //hjr normalize the event selector register
    when (decoded_addr(i + CSR.firstHPE)) { e := perfEventSets.maskEventSelector(wdata) }
  }
  if (coreParams.haveBasicCounters) {
    //hjr mcountinhibit bit [1] is tied to zero: that position corresponds to time (mtime), which can not be inhibited
    when (decoded_addr(CSRs.mcountinhibit)) { reg_mcountinhibit := wdata & ~2.U(xLen.W) }  // mcountinhibit bit [1] is tied zero
    writeCounter(CSRs.mcycle, reg_cycle, wdata)
    writeCounter(CSRs.minstret, reg_instret, wdata)
  }

  //In class EventSets
  def maskEventSelector(eventSel: UInt): UInt = {
  // allow full associativity between counters and event sets (for now?)--hjr todo why?
  /*
  * hjr
  * setMask is for masking the event class(7:0) at max
  * maskMask is for masking the specific event
  * */
  val setMask = (BigInt(1) << eventSetIdBits) - 1
  val maskMask = ((BigInt(1) << eventSets.map(_.size).max) - 1) << maxEventSetIdBits
  eventSel & (setMask | maskMask).U
  }

The spec also defines mcountinhibit: the counter-inhibit register is a 32-bit WARL register that controls which of the hardware performance-monitoring counters increment. When the CY, IR, or HPMn bit in mcountinhibit is clear, the cycle, instret, or hpmcountern register increments as usual; when the bit is set, the corresponding counter does not increment. Also note that mcountinhibit[1] sits in the position of the time counter and is hardwired to zero. mtime is not a CSR; it is exposed as a memory-mapped machine-mode read-write register in the CLINT (refer to 3.1.10 of the priv spec for detailed info). Since mtime is shared among the cores in the system, the RISC-V spec decides that it cannot be inhibited through the mcountinhibit mechanism.
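To make the inhibit behaviour concrete, here is a minimal software sketch in plain Scala (no Chisel). The object and method names are hypothetical and only model the two rules described above: bit 1 is dropped on writes, and a counter advances only while its inhibit bit is clear. It is not the RC code itself.

```scala
object CounterInhibitSketch {
  // Bit 1 of mcountinhibit is the slot of the time counter and reads as zero.
  val TimeBit: Long = 1L << 1

  // Software analogue of `wdata & ~2.U(xLen.W)`: every bit is writable except bit 1.
  def legalizeWrite(wdata: Long): Long = wdata & ~TimeBit

  // A counter adds `inc` only while its bit in the inhibit register is clear.
  // bit = 0 for cycle (CY), 2 for instret (IR), 3 and up for the HPM counters.
  def step(counter: Long, inc: Long, inhibit: Long, bit: Int): Long =
    if (((inhibit >> bit) & 1L) == 1L) counter else counter + inc
}
```

For example, writing 0x7 leaves 0x5 in the register, which stops cycle and instret while the HPM counters keep counting.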

Now, let's delve into the programmable event counter mess. Refer to section 4.9 for a more detailed depiction.
Specifically, the RC impl defines EventSets and EventSet. An EventSets holds a series of EventSet objects, and each EventSet corresponds to a set of related events. There are actually 3 EventSet instances in RC: instruction commit events, microarchitectural events and memory system events. RC therefore interprets the event encoded in mhpmeventX hierarchically: the low bits (mhpmeventX[7:0]) select the EventSet the event belongs to (mhpmeventX[7:0] = 0x0 for instruction commit events, 0x1 for microarchitectural events and 0x2 for memory system events), and the remaining upper bits form the event mask. One or more events can be programmed by setting the respective event mask bits for a given EventSet. RC treats the bits above mhpmeventX[7:0] as a bit mask with one bit per event, so when several bits are set the counter increments whenever any of the selected events occurs. The specific event selector mask encoding can be found in Table 18 of the U54_MC core complex manual.
Below is the implementation of EventSet and EventSets:

/*
* hjr representation of a set of events
* gate: an event gate which decides whether the selected events (the first parameter of the gate function, mask) happen among a
* series of events (the second parameter of the gate function), some of which are firing.
* events: the event set and the specific firing condition for each event
* */
class EventSet(val gate: (UInt, UInt) => Bool, val events: Seq[(String, () => Bool)]) {
  def size = events.size
  val hits = Wire(Vec(size, Bool()))

  def check(mask: UInt) = {
    hits := events.map(_._2())
    gate(mask, hits.asUInt)
  }
  def dump(): Unit = {
    for (((name, _), i) <- events.zipWithIndex)
      when (check(1.U << i)) { printf(s"Event $name\n") }
  }
  def withCovers: Unit = {
    events.zipWithIndex.foreach {
      case ((name, func), i) => property.cover(gate((1.U << i), (func() << i)), name)
    }
  }
}

class EventSets(val eventSets: Seq[EventSet]) {
  def maskEventSelector(eventSel: UInt): UInt = {
    // allow full associativity between counters and event sets (for now?)--hjr todo why?
    /*
    * hjr
    * setMask is for masking the event class(7:0) at max
    * maskMask is for masking the specific event
    * */
    val setMask = (BigInt(1) << eventSetIdBits) - 1
    val maskMask = ((BigInt(1) << eventSets.map(_.size).max) - 1) << maxEventSetIdBits
    eventSel & (setMask | maskMask).U
  }
  //hjr decode the eventSel value,
  //hjr 1-->the event class
  //hjr 2-->the specific event in that class
  private def decode(counter: UInt): (UInt, UInt) = {
    require(eventSets.size <= (1 << maxEventSetIdBits))
    require(eventSetIdBits > 0)
    (counter(eventSetIdBits-1, 0), counter >> maxEventSetIdBits)
  }
  /*
  * hjr evaluate whether a specific event indicated by eventSel happens in one specific eventset
  * the event selector has two fields, the lower 8(max) bits indicates the class which this event belongs.
  * the upper bits indicates the specific event
  * */
  def evaluate(eventSel: UInt): Bool = {
    val (set, mask) = decode(eventSel)
    val sets = for (e <- eventSets) yield {
      require(e.hits.getWidth <= mask.getWidth, s"too many events ${e.hits.getWidth} wider than mask ${mask.getWidth}")
      e check mask
    }
    sets(set)
  }

  def cover() = eventSets.foreach { _ withCovers }

  private def eventSetIdBits = log2Ceil(eventSets.size)
  private def maxEventSetIdBits = 8

  require(eventSetIdBits <= maxEventSetIdBits)
}

Note that method check in EventSet decides whether the events selected by mhpmeventX[31:8] of an event selector are happening, given the current state of the events in that EventSet. Note the first class constructor parameter of EventSet, val gate: (UInt, UInt) => Bool; I feel like this functional trick is overly complicated. gate directly decides whether the selected events (its first parameter, mask, which is just the event mask part of the whole event selector) happen among the events that are currently firing (its second parameter, generated from hits := events.map(_._2())). Also note that private def decode(counter: UInt): (UInt, UInt) in EventSets splits an event selector into two parts: the event class (counter(eventSetIdBits-1, 0)) and the event mask within that class (counter >> maxEventSetIdBits). With check of EventSet and decode of EventSets, method evaluate in EventSets decides whether the events indicated by the value in one mhpmeventX actually happen.
Since the event signals live in RocketCore while mhpmeventX and mhpmcounterX live in the CSR module, the two modules are interconnected for programmable event counting:
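The decode and evaluate steps can be modelled in plain Scala, without Chisel, roughly as follows. This is only a sketch: the object name, the Long-based types and the simple "any selected event is hit" gate are assumptions made for illustration, and returning false for an out-of-range set id is a choice of the sketch rather than something the RC code states.

```scala
object EventSelectorSketch {
  val MaxEventSetIdBits = 8 // same constant as maxEventSetIdBits in EventSets

  // Number of bits needed to index n sets (what log2Ceil yields for n >= 1).
  def log2Ceil(n: Int): Int = {
    require(n > 0)
    32 - Integer.numberOfLeadingZeros(n - 1)
  }

  // Split a selector into (set id, event mask), like EventSets.decode.
  def decode(eventSel: Long, numSets: Int): (Int, Long) = {
    val idBits = log2Ceil(numSets)
    ((eventSel & ((1L << idBits) - 1)).toInt, eventSel >>> MaxEventSetIdBits)
  }

  // hits(s) carries one bit per event of set s that is firing this cycle.
  // Assumed gate: the result is true when any selected event is hit.
  def evaluate(eventSel: Long, hits: Seq[Long]): Boolean = {
    val (set, mask) = decode(eventSel, hits.size)
    set < hits.size && (mask & hits(set)) != 0L
  }
}
```

With three sets, the selector 0x4001 decodes to set 1 with event mask 0x40, so it only counts when bit 6 of the second set's hit vector is high.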

val counters = Vec(nPerfCounters, new PerfCounterIO)
class PerfCounterIO(implicit p: Parameters) extends CoreBundle
    with HasCoreParameters {
  val eventSel = UInt(OUTPUT, xLen)
  val inc = UInt(INPUT, log2Ceil(1+retireWidth))
}

Note that for each pair of mhpmeventX and mhpmcounterX, the eventSel coming out of the CSR module is driven by the corresponding mhpmeventX register:

  val reg_hpmevent = io.counters.map(c => Reg(init = UInt(0, xLen)))
  (io.counters zip reg_hpmevent) foreach { case (c, e) => c.eventSel := e }

For each event selector in mhpmeventX, RocketCore has logic to evaluate whether the selected events happen and drives PerfCounterIO.inc back to the CSR module; the corresponding mhpmcounterX is then incremented according to that inc:

  csr.io.counters foreach { c => c.inc := RegNext(perfEvents.evaluate(c.eventSel)) }
  val reg_hpmcounter = io.counters.zipWithIndex.map { case (c, i) =>
    WideCounter(CSR.hpmWidth, c.inc, reset = false, inhibit = reg_mcountinhibit(CSR.firstHPM+i)) }

This post is becoming a tedious rigmarole, but there is a lot more to go:
RISC-V Trigger Implementation:
The RISC-V debug spec defines the Trigger Module (TM) in Chapter 5 (Sdtrig ISA Extension); also refer to section 15.3 of the U54 MC core complex manual for further details. In general, an SoC may implement some number of triggers. A trigger fires when an instruction at a specific location is executed, or when a load or store accesses a specific location (a trigger can also be configured to match on data values, that is the data loaded or stored or the instruction executed; the select field in mcontrol specifies this). Besides the address or data to compare against (tdata2), each trigger defines a series of triggering conditions, for example the priv levels in which the trigger takes effect (m, s, u in mcontrol) and the comparison operation (the match field in mcontrol). If one of the triggers fires, an action is taken according to the configuration of that trigger: 0 in the action field of mcontrol raises a breakpoint exception, while 1 traps execution into debug mode. Triggers in RISC-V can be configured from debug mode or m-mode (controlled by dmode in tdata1) through 4 CSRs: tselect, tdata1, tdata2 and tdata3. Writing an index to tselect selects the trigger with that index; being selected means that accesses to tdata1, tdata2 and tdata3 go to that trigger instead of the others. See the code below:

//representation of a trigger
class BP(implicit p: Parameters) extends CoreBundle()(p) {
  val control = new BPControl//hjr as tdata1
  val address = UInt(vaddrBits.W)//hjr as tdata2
  val textra  = new TExtra//hjr as tdata3

  def contextMatch(mcontext: UInt, scontext: UInt) =
    (if (coreParams.mcontextWidth > 0) (!textra.mselect || (mcontext(textra.mvalueBits-1,0) === textra.mvalue)) else true.B) &&
    (if (coreParams.scontextWidth > 0) (!textra.sselect || (scontext(textra.svalueBits-1,0) === textra.svalue)) else true.B)

  def mask(dummy: Int = 0) =
    (0 until control.maskMax-1).scanLeft(control.tmatch(0))((m, i) => m && address(i)).asUInt

  def pow2AddressMatch(x: UInt) =
    (~x | mask()) === (~address | mask())

  def rangeAddressMatch(x: UInt) =
    (x >= address) ^ control.tmatch(0)

  def addressMatch(x: UInt) =
    Mux(control.tmatch(1), rangeAddressMatch(x), pow2AddressMatch(x))
}


  val reg_mcontext = (coreParams.mcontextWidth > 0).option(RegInit(0.U(coreParams.mcontextWidth.W)))
  val reg_scontext = (coreParams.scontextWidth > 0).option(RegInit(0.U(coreParams.scontextWidth.W)))

  val reg_tselect = Reg(UInt(width = log2Up(nBreakpoints)))
  val reg_bp = Reg(Vec(1 << log2Up(nBreakpoints), new BP))//hjr extra waste if nBreakpoints = 1, 1 << log2Up(nBreakpoints) = 2
  val read_mapping = LinkedHashMap[Int,Bits](
    CSRs.tselect -> reg_tselect,
    CSRs.tdata1 -> reg_bp(reg_tselect).control.asUInt,
    CSRs.tdata2 -> reg_bp(reg_tselect).address.sextTo(xLen),
    CSRs.tdata3 -> reg_bp(reg_tselect).textra.asUInt)

    //write
    if (nBreakpoints > 0) {
      when (decoded_addr(CSRs.tselect)) { reg_tselect := wdata }

      for ((bp, i) <- reg_bp.zipWithIndex) {
        //hjr dmode is 1->only debug mode can write the tdata registers(tdata1, tdata2, tdata3) at the selected tselect
        //hjr dmode is 0->both debug mode and m-mode can write the tdata registers(tdata1, tdata2, tdata3) at the selected tselect
        //hjr in terms of RC implementation,M-mode and debug mode are basically the same, except that debug mode will assert reg_debug
        when (i === reg_tselect && (!bp.control.dmode || reg_debug)) {
          when (decoded_addr(CSRs.tdata2)) { bp.address := wdata }
          when (decoded_addr(CSRs.tdata3)) {
            if (coreParams.mcontextWidth > 0) {
              //hjr mselectPos(25->32, 50->64) is the MSB of mselect field:mselect->0 disabled mselect->4 mcontext
              bp.textra.mselect := wdata(bp.textra.mselectPos)
              bp.textra.mvalue  := wdata >> bp.textra.mvaluePos
            }
            if (coreParams.scontextWidth > 0) {
              //hjr sselectPos(0) is the LSB of sselect field:sselect->0 disabled sselect->1 scontext
              bp.textra.sselect := wdata(bp.textra.sselectPos)
              bp.textra.svalue  := wdata >> bp.textra.svaluePos
            }
          }
          when (decoded_addr(CSRs.tdata1)) {
            bp.control := wdata.asTypeOf(bp.control)

            val prevChain = if (i == 0) false.B else reg_bp(i-1).control.chain
            val prevDMode = if (i == 0) false.B else reg_bp(i-1).control.dmode
            val nextChain = if (i >= nBreakpoints-1) true.B else reg_bp(i+1).control.chain
            val nextDMode = if (i >= nBreakpoints-1) true.B else reg_bp(i+1).control.dmode
            val newBPC = readModifyWriteCSR(io.rw.cmd, bp.control.asUInt, io.rw.wdata).asTypeOf(bp.control)
            //hjr Don't allow dmode to be set if the previous trigger doesn't belong to D-mode and has chain set.
            //hjr todo When this trigger has chain = 0, does the setting of previous dmode have effect on dMode of this trigger?
            /*
            * hjr to answer the above todo
            * The spec says:
            * A trigger chain starts on the first trigger with
            * chain = 1 after a trigger with chain = 0, or simply
            * on the first trigger if that has chain = 1. It ends
            * on the first trigger after that which has chain = 0.
            * This final trigger is part of the chain
            * Which means that the first trigger with chain=0 after a continuous series of triggers that all have chain asserted is also part of
            * that trigger group;
            * */
            val dMode = newBPC.dmode && reg_debug && (prevDMode || !prevChain)
            bp.control.dmode := dMode
            /*hjr
            * This is slick
            * If dMode is true.B, the action can only be 1--entering the debug mode, newBPC.action holds 1 for sure(todo assertion elsewhere??)
            * If the newBPC.action > 1.U, this trigger is for trace
            * If none above is true, then this trigger will initiate a breakpoint exception: 0.U therefore.
            * */
            when (dMode || (newBPC.action > 1.U)) { bp.control.action := newBPC.action }.otherwise { bp.control.action := 0.U }
            //hjr Don't allow chain to be set if this trigger doesn't belong to D-mode but the next trigger does.
            //hjr Don't set the chain bit when both the previous or next breakpoint has its chain bit set. -Don't allow chains longer than 2.
            bp.control.chain := newBPC.chain && !(prevChain || nextChain) && (dMode || !nextDMode)
          }
        }
      }
    }

  if (nBreakpoints <= 1) reg_tselect := 0//hjr if there is only one bp, reg_tselect stays at 0.
  //hjr todo does this comply with the last connection wins; I think so.
  for (bpc <- reg_bp map {_.control}) {
    bpc.ttype := bpc.tType//hjr mcontrol
    bpc.maskmax := bpc.maskMax
    bpc.reserved := 0
    bpc.zero := 0
    bpc.h := false
    if (!usingSupervisor) bpc.s := false
    if (!usingUser) bpc.u := false
    if (!usingSupervisor && !usingUser) bpc.m := true//hjr trigger can only be used in m-mode if no other mode exists.
    when (reset) {
      bpc.action := 0.U
      bpc.dmode := false
      bpc.chain := false
      bpc.r := false
      bpc.w := false
      bpc.x := false
    }
  }

Note that the RC impl represents a trigger as a triple: (control, address, textra). control is just the aggregate of type, dmode and the fields of mcontrol (anyone who reads this should really read the debug spec first so that these fields make sense). Note that RC only supports triggers of type 2, that is mcontrol triggers. address is the predefined address or data value to compare against, and textra adds extra restrictions to the trigger (refer to textra32 or textra64 in the debug spec and the contextMatch method in class BP for further info). When a trigger is selected using tselect, accesses to tdata1, tdata2 and tdata3 are accesses to control, address and textra of the corresponding trigger. The description in the debug spec is holistic, while the RC impl hardwires some fields: sizehi, sizelo, hit, select (compare only the address, never data values) and timing (the action of the trigger happens before the instruction that triggered it is committed) are all fixed at 0. These hardwired fields greatly simplify the implementation.
Also note that the RISC-V debug spec defines a scheme for chaining triggers. A chain is a series of triggers with chain asserted followed by one trigger with chain being 0; that final trigger is part of the chain. In the RC impl, only two adjacent triggers (adjacent in terms of index) can be chained (!(prevChain || nextChain) in bp.control.chain := newBPC.chain && !(prevChain || nextChain) && (dMode || !nextDMode)). The action of the last trigger in a chain is taken if and only if all of the triggers in the chain match. A typical scenario for chained triggers is a breakpoint on a range, where two neighboring breakpoints are combined with the chain bit. The first breakpoint can be set to match on an address using match of 2 (greater than or equal). The second breakpoint can be set to match on an address using match of 3 (less than). Setting the chain bit on the first breakpoint prevents the second breakpoint from firing unless they both match.
I don't know the rationale, but below are the requirements of the debug spec on the coupling between chain and dmode:
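The range scenario can be checked with a small plain-Scala model of rangeAddressMatch from class BP. The object and the inRange helper are hypothetical names for illustration; the sketch uses signed Long comparison, so it only mirrors the unsigned hardware compare for non-negative addresses.

```scala
object AddressMatchSketch {
  // Mirrors BP.rangeAddressMatch: (x >= address) ^ tmatch(0).
  // match = 2 keeps ">=", match = 3 flips it into "<" because bit 0 is set.
  def rangeMatch(x: Long, address: Long, tmatch: Int): Boolean =
    (x >= address) ^ ((tmatch & 1) == 1)

  // A chained pair fires only when both halves match: lo <= x < hi.
  def inRange(x: Long, lo: Long, hi: Long): Boolean =
    rangeMatch(x, lo, 2) && rangeMatch(x, hi, 3)
}
```

With lo = 0x1000 and hi = 0x2000, the address 0x1000 is inside the range while 0x0fff and 0x2000 are not.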

hardware must zero chain in writes to mcontrol that set dmode to 0 if the next trigger has dmode of 1.
hardware should ignore writes to mcontrol that set dmode to 1 if the previous trigger has both dmode of 0 and chain of 1.
That is to say, triggers in a chain should have the same dmode configuration, which is reasonable. This is why the RC codebase has extra logic when writing the dmode and chain fields:

//hjr note that dMode field can only be configured at d-mode
val dMode = newBPC.dmode && reg_debug && (prevDMode || !prevChain)
bp.control.dmode := dMode
bp.control.chain := newBPC.chain && !(prevChain || nextChain) && (dMode || !nextDMode)
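The three lines above can be exercised with a plain-Scala model. This is a sketch under stated assumptions: Ctrl and legalize are hypothetical names, only the dmode and chain bits are modelled, and the neighbour values are taken from the current register contents as in the prevChain/nextChain definitions shown earlier.

```scala
object TriggerWriteSketch {
  final case class Ctrl(dmode: Boolean, chain: Boolean)

  // Legalize a tdata1 write `req` to trigger i, given all current triggers.
  def legalize(req: Ctrl, i: Int, bps: IndexedSeq[Ctrl], inDebug: Boolean): Ctrl = {
    val last = bps.size - 1
    val prevChain = i > 0 && bps(i - 1).chain
    val prevDMode = i > 0 && bps(i - 1).dmode
    // Past the last trigger both values read as true, so the last trigger can never chain.
    val nextChain = i >= last || bps(i + 1).chain
    val nextDMode = i >= last || bps(i + 1).dmode
    // dmode can only be set from debug mode, and not behind a chained non-dmode trigger.
    val dMode = req.dmode && inDebug && (prevDMode || !prevChain)
    // Chains are at most two triggers long and must not mix dmode settings.
    val chain = req.chain && !(prevChain || nextChain) && (dMode || !nextDMode)
    Ctrl(dMode, chain)
  }
}
```

For instance, with three idle triggers, a debug-mode write of dmode = 1 and chain = 1 to trigger 0 is accepted as is, while the same write to the last trigger has its chain bit cleared.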

The trigger logic in RC is located in the CSR module. Since the firing of a trigger is based on comparing the address in the trigger triple with the address of the instruction (or the address to load from or store to), the trigger-related signals are sent to RocketCore via the following interface:

  io.bp := reg_bp take nBreakpoints
  io.mcontext := reg_mcontext.getOrElse(0.U)
  io.scontext := reg_scontext.getOrElse(0.U)

These signals, along with auxiliary info such as csr.io.status, are wired up in RocketCore as follows:

  val bpu = Module(new BreakpointUnit(nBreakpoints))
  bpu.io.status := csr.io.status
  bpu.io.bp := csr.io.bp
    /*
    * hjr note time here:
    * ibuf.io.pc -> ID stage->may assert debug_if and xcpt_if, also the bpu.io.bpwatch.map { bpw => bpw.ivalid(0) } is also asserted at this stage.
    * bpu.io.ea:= mem_reg_wdata ->mem stage may assert xcpt_ld xcpt_st debug_st debug_ld, the watch point
    * bpu.io.bpwatch.map { bpw => (bpw.rvalid(0) && mem_reg_load) || (bpw.wvalid(0) && mem_reg_store) } is also asserted at this stage(mem)
    *
    * watchpoint hit info will flow to wb stage, and notify the corresponding trace module
    * wb_reg_wphit := mem_reg_wphit | bpu.io.bpwatch.map { bpw => (bpw.rvalid(0) && mem_reg_load) || (bpw.wvalid(0) && mem_reg_store) }
    * xcpt_if debug_if etc. will flow to wb stage, and notify the csr module that a breakpoint or halt req happens
    * */
  bpu.io.pc := ibuf.io.pc
  bpu.io.ea := mem_reg_wdata
  bpu.io.mcontext := csr.io.mcontext
  bpu.io.scontext := csr.io.scontext

Inside RocketCore there is a BreakpointUnit. This unit decides whether any trigger matches by comparing the triggering criteria (the address in the trigger triple) with the pc or the load/store address currently in the pipeline:

/*
* hjr
* this breakpoint unit uses signal coming from the pipeline (pc, ea) and CSR(bp, status,mcontext,scontext) to calculate
* 1,whether a trigger will fire a breakpoint exception or entering debug mode
* xcpt_if  xcpt_ld  xcpt_st for breakpoint  exception
* debug_if debug_ld debug_st for entering debug mode
* bpwatch for trace related firing
*
* a trigger with chain = 0  may be a normal trigger without being chained, or the last trigger of a chain group.
* */
class BreakpointUnit(n: Int)(implicit val p: Parameters) extends Module with HasCoreParameters {
  val io = IO(new Bundle {
    val status = Input(new MStatus())
    val bp = Input(Vec(n, new BP))
    val pc = Input(UInt(vaddrBits.W))
    val ea = Input(UInt(vaddrBits.W))
    val mcontext = Input(UInt(coreParams.mcontextWidth.W))
    val scontext = Input(UInt(coreParams.scontextWidth.W))
    val xcpt_if  = Output(Bool())//hjr instruction fetching
    val xcpt_ld  = Output(Bool())//hjr load
    val xcpt_st  = Output(Bool())//hjr store
    val debug_if = Output(Bool())//hjr instruction fetching entering debug mode
    val debug_ld = Output(Bool())//hjr load entering debug mode
    val debug_st = Output(Bool())//hjr store entering debug mode
    val bpwatch  = Output(Vec(n, new BPWatch(1)))
  })

  io.xcpt_if := false
  io.xcpt_ld := false
  io.xcpt_st := false
  io.debug_if := false
  io.debug_ld := false
  io.debug_st := false

  (io.bpwatch zip io.bp).foldLeft((true.B, true.B, true.B)) { case ((ri, wi, xi), (bpw, bp)) =>
    val en = bp.control.enabled(io.status)//hjr this trigger is enabled under io.status
    val cx = bp.contextMatch(io.mcontext, io.scontext)//hjr context match
    val r = en && bp.control.r && bp.addressMatch(io.ea) && cx
    val w = en && bp.control.w && bp.addressMatch(io.ea) && cx
    val x = en && bp.control.x && bp.addressMatch(io.pc) && cx
    val end = !bp.control.chain
    val action = bp.control.action

    bpw.action := action
    bpw.valid(0) := false.B
    bpw.rvalid(0) := false.B
    bpw.wvalid(0) := false.B
    bpw.ivalid(0) := false.B

    when (end && r && ri) { io.xcpt_ld := (action === 0.U); io.debug_ld := (action === 1.U); bpw.valid(0) := true.B; bpw.rvalid(0) := true.B }
    when (end && w && wi) { io.xcpt_st := (action === 0.U); io.debug_st := (action === 1.U); bpw.valid(0) := true.B; bpw.wvalid(0) := true.B }
    when (end && x && xi) { io.xcpt_if := (action === 0.U); io.debug_if := (action === 1.U); bpw.valid(0) := true.B; bpw.ivalid(0) := true.B }

    (end || r, end || w, end || x)
  }
}

io.pc is the instruction fetch address (bpu.io.pc := ibuf.io.pc) at the ID stage, while io.ea is the load/store address (mem_reg_wdata) at MEM (the load/store request is initiated at EX and the result pours in at WB, so it's reasonable to detect a trigger match in MEM). BreakpointUnit decides whether these addresses fire any of the triggers; each item in io.bpwatch describes the triggering situation of one specific trigger. bpw.valid(0) := true.B indicates that this trigger is firing regardless of the reason for the match, while bpw.w[r|i]valid(0) := true.B specifically means that this trigger is firing on a store address (load address | instruction address) match. The io.bpwatch signals are sent to the trace modules.
The BreakpointUnit also decides, for the current address in io.pc or io.ea, whether the trigger set as a whole fires (either a standalone trigger fires, or all triggers in a chain match) and which action is taken (RC only supports trapping to debug mode and raising a breakpoint exception). Specifically, io.xcpt[debug]_if[ld,st] indicates that a trigger is firing because of an if[ld|st] address match, and also specifies the subsequent action (xcpt -> breakpoint exception, debug -> debug interrupt). These xcpt[debug]_if[ld,st] signals flow down the pipeline and assert csr.io.exception := wb_xcpt into the CSR module for exception handling, as described in the earlier discussion of interrupt & exception handling.
I will seal this for now, but there is other content, like the impl of PMP, that is not covered in this post. The PMP stuff is straightforward once you read the spec.
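The foldLeft in BreakpointUnit is the part that implements chaining, and it is easier to follow as a plain-Scala model for a single access kind. Trig and fires are hypothetical names; `armed` plays the role of ri/wi/xi, and `matched` stands for the per-trigger r, w or x result. This is a sketch of the fold only, not of the full unit.

```scala
object ChainFoldSketch {
  // matched: this trigger's own comparison result; chain: its chain bit.
  final case class Trig(matched: Boolean, chain: Boolean)

  // Per trigger, report whether it fires. A trigger fires only if it ends a
  // chain (chain = 0), matches, and every earlier link of its chain matched.
  def fires(trigs: Seq[Trig]): Seq[Boolean] =
    trigs.foldLeft((true, Vector.empty[Boolean])) { case ((armed, out), t) =>
      val end = !t.chain
      // Same carry as (end || r, end || w, end || x) in the hardware fold.
      (end || t.matched, out :+ (end && t.matched && armed))
    }._2
}
```

A chained pair where both triggers match yields (false, true): only the final trigger of the chain fires. If the first one does not match, the carry turns false and the second trigger stays quiet even though it matches on its own.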
