Add optional support for hardware register save e.g. shadow register sets? #329

Open
brucehoult opened this issue Apr 23, 2023 · 3 comments
Labels
post-v1.0 To be handled after v1.0

Comments


brucehoult commented Apr 23, 2023

We currently support two methods of writing interrupt handlers in C (etc):

  • assembly-language handler that saves ra, a0-a7, t0-t6 then calls a standard ABI function that can freely use those registers (as usual) and saves s0-s11 if it uses them. When the C function returns all registers are restored before mret.

  • direct call of C __attribute__((interrupt)) function that saves only and exactly the registers it uses (if self-contained), and saves all volatile registers in the event that it must call a standard ABI function. The function restores all registers before using mret to return.

I propose that we consider adding a third option, which might or might not be implemented in any particular core but, if present, has a standardised interface and functionality.

  • some hardware mechanism exists that saves the volatile registers (ra, a0-a7, t0-t6), hopefully more quickly than software can, calls a standard ABI handler function, and when it returns the hardware quickly restores the saved registers.

In many cases the hardware save/restore might be an instantaneous (single cycle, maybe in parallel with fetching the interrupt vector) swapping of register sets. In other cases it might save registers to the stack and might not be any faster than normal store/load instructions, other than in not having to do instruction fetches (which might be cache misses, if there is an icache). Or it might be able to use a wider interface to dcache/RAM than normal stores and loads would.
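To make the intended behaviour concrete, here is a minimal behavioural sketch in C of the proposed third option. It is only a software model, not a description of any real core: `hart_model`, `take_interrupt` and `clobbering_handler` are hypothetical names, and the `shadow` array stands in for whatever the hardware actually uses (a shadow bank or a stack frame).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Volatile (caller-saved) integer registers in the standard psABI:
 * ra (x1), t0-t2 (x5-x7), a0-a7 (x10-x17), t3-t6 (x28-x31): 16 in total. */
static const uint8_t volatile_regs[16] = {
    1, 5, 6, 7, 10, 11, 12, 13, 14, 15, 16, 17, 28, 29, 30, 31
};

typedef struct {
    uint32_t x[32];       /* architectural integer registers */
    uint32_t shadow[16];  /* hypothetical save area (shadow bank or stack frame) */
} hart_model;

typedef void (*abi_handler)(hart_model *h);

/* Model of interrupt entry and exit with hardware save/restore. */
void take_interrupt(hart_model *h, abi_handler handler)
{
    assert(h != NULL && handler != NULL);

    for (int i = 0; i < 16; i++)            /* hardware "prologue" */
        h->shadow[i] = h->x[volatile_regs[i]];

    handler(h);                             /* standard ABI function */

    for (int i = 0; i < 16; i++)            /* hardware "epilogue" */
        h->x[volatile_regs[i]] = h->shadow[i];
}

/* Example handler that clobbers every volatile register, as the ABI allows. */
void clobbering_handler(hart_model *h)
{
    for (int i = 0; i < 16; i++)
        h->x[volatile_regs[i]] = 0xDEADBEEFu;
}
```

The interrupted code sees no change in any register: the volatile ones are restored by the modelled hardware, and the callee-saved ones are the handler's own responsibility under the standard ABI.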

As with Zcmp and Zcmt, this is more likely to be considered a desirable feature by the microcontroller community.

In particular, it matches the interrupt handling on ARMv6-M and ARMv7-M. Cortex-M{0+,3,4,7} save volatile registers to the stack and put a "fake" return address into LR ... 0xFFFFFFEn or 0xFFFFFFFn depending on whether FP registers have or have not been saved and depending on the exact interrupt return behaviour desired. The core then calls a standard ABI C function. A normal function return (bx lr, or any instruction that loads such a value into PC) then results in interrupt return.
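As a rough illustration of how such a "fake" return address is interpreted, the sketch below decodes an ARMv7-M style EXC_RETURN value in C. The function and enum names are made up for this example; the bit meanings follow my reading of the ARMv7-M rules (top four bits all ones marks an exception return, bit 4 clear means an extended frame with FP state was stacked) and should be checked against the architecture manual before being relied on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low nibble of EXC_RETURN selects where to return to (ARMv7-M). */
enum exc_return_mode {
    RETURN_HANDLER_MSP = 0x1,  /* back to handler mode, main stack    */
    RETURN_THREAD_MSP  = 0x9,  /* back to thread mode, main stack     */
    RETURN_THREAD_PSP  = 0xD   /* back to thread mode, process stack  */
};

/* A value loaded into PC in handler mode is treated as an exception
 * return if its top four bits are all ones. */
bool is_exc_return(uint32_t target)
{
    return (target & 0xF0000000u) == 0xF0000000u;
}

/* Bit 4 clear (0xFFFFFFEn): FP registers were stacked as well.
 * Bit 4 set   (0xFFFFFFFn): only the basic integer frame was stacked. */
bool exc_return_has_fp_frame(uint32_t exc_return)
{
    assert(is_exc_return(exc_return));
    return (exc_return & 0x10u) == 0;
}

/* Extract the return-mode nibble. */
unsigned exc_return_mode_bits(uint32_t exc_return)
{
    assert(is_exc_return(exc_return));
    return exc_return & 0xFu;
}
```

Because the distinguished values live in an address range that is never executable, an ordinary function return is enough to trigger the hardware epilogue, which is what allows the handler to be a plain C function.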

RISC-V vendor WCH already implement a similar feature in their increasingly popular cores, which they call HPE (Hardware Prologue/Epilogue). Their hardware saves the volatile registers then calls the handler, which is a normal ABI function except that it must end with mret not ret. They provide a special compiler that implements a special __attribute__((interrupt("WCH-Interrupt-fast"))).

This is a bit inconvenient, especially for people who want to use a standard toolchain. The correct work-around is to make a standard ABI function void my_handler() and also __attribute__((naked)) void my_handler_hpe(){asm("call my_handler; mret");}.
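Spelled out as a source file, the work-around might look like the sketch below. `tick_count` is a hypothetical piece of state added only so the handler does something; the wrapper is guarded with `__riscv` so that the plain handler can still be compiled and unit-tested on a host machine. It assumes a toolchain that supports `__attribute__((naked))` for RISC-V.

```c
#include <stdint.h>

volatile uint32_t tick_count;   /* hypothetical state touched by the handler */

/* Completely standard ABI function: no attributes, ends with ret. */
void my_handler(void)
{
    tick_count++;
}

#if defined(__riscv)
/* Entry point to place in the vector table. The HPE hardware has already
 * saved the volatile registers (including ra), so the wrapper only has to
 * call the real handler and then leave interrupt context with mret.
 * naked: the compiler emits no prologue or epilogue of its own. */
__attribute__((naked)) void my_handler_hpe(void)
{
    __asm__ volatile("call my_handler\n\t"
                     "mret");
}
#endif
```

The cost of the wrapper is one extra call and return per interrupt, which is the overhead a hardware mechanism accepting a plain `ret` would remove.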

I have suggested to WCH that they also allow a function ending with ret to work (i.e. a completely standard C function), and a possible mechanism to achieve this: if ret opcode is encountered AND in_interrupt_handler AND sp == sp_on_entry_to_handler THEN execute mret. They seem interested in this idea for future core revs.
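The suggested condition can be written down as a small behavioural model. This is only a sketch of the idea as stated above, not WCH's design: `hart_state` and its fields are hypothetical, and the two encodings checked are the standard `ret` (`jalr x0, 0(ra)`) and its compressed form `c.jr ra`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    bool     in_interrupt_handler;  /* set by hardware on interrupt entry */
    uint32_t sp;                    /* current value of x2 */
    uint32_t sp_on_entry;           /* x2 latched when the handler was entered */
} hart_state;

#define INSN_RET   0x00008067u      /* jalr x0, 0(ra) */
#define INSN_C_RET 0x8082u          /* c.jr ra */

/* True if the fetched instruction is ret in either encoding. */
static bool is_ret(uint32_t insn)
{
    if ((insn & 0x3u) == 0x3u)              /* 32-bit encoding */
        return insn == INSN_RET;
    return (insn & 0xFFFFu) == INSN_C_RET;  /* 16-bit encoding */
}

/* ret AND in_interrupt_handler AND sp == sp_on_entry  =>  behave as mret */
bool ret_executes_as_mret(const hart_state *h, uint32_t insn)
{
    assert(h != NULL);
    return is_ret(insn)
        && h->in_interrupt_handler
        && h->sp == h->sp_on_entry;
}
```

The sp comparison is what separates the handler's own final `ret` from a `ret` in a function it called: a handler that makes calls has to spill ra to the stack first, so sp is below its entry value for as long as any callee is running.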

The Cortex-M technique of putting a distinguished value in ra (different ones for mret, sret, uret) would be simpler, if it is not patented.

Anyway, I think it is essential that if a hardware save/restore feature is standardised then it should be designed to call a completely standard ABI function, no special attributes needed.

Note that the lowest-end WCH chip, the CH32V003 (the "10c RISC-V" chip) pushes the 10 saved registers (it is an RV32E core) to a 48 byte stack frame, while the higher end WCH cores have multiple sets of shadow registers in the core. The same program code works with either implementation.

Once again, this is a feature more likely to be desired by users of microcontrollers.

A sequencer to save registers is a little additional hardware (possibly already present for Zcmp) in a simple core, but a PITA in an OoO core. Multiple extra register sets is a significant hardware investment that might well be more profitably used for other purposes (or simply avoided).

So it is not for everyone. But some of the market wants it, at least one major vendor is already implementing it and others may well follow.

Therefore, it is best if it is standardized.


jnk0le commented May 7, 2023

Regarding the __attribute__((interrupt("WCH-Interrupt-fast"))) inconvenience:
it can be solved by a standardised annotation of the registers prestacked at function entry.

e.g. in case of ch32v003 it would be __attribute__((interrupt, prestacked("x1,x5-x7,x10-x15")))

a snippet from my Xteic spec:

==== prestacked annotation

Currently there is no universal solution to indicate which registers in interrupt handlers
can be freely used without stacking them.

- `\\__attribute__\((interrupt))` makes all registers callee saved and uses mret to return.
- `\\__attribute__\((interrupt("SiFive-CLIC-preemptible")))` extends regular interrupt by CLIC preemption
- `\\__attribute__\((interrupt("WCH-Interrupt-fast")))` requires custom build toolchain and is bound 
to selected ABI by `-mabi=` command line parameter, still uses mret
- Or just a plain C function that requires prestacking of all caller saved registers, reuses standard 
return mechanism to exit interrupt context

Even worse, there are already hardware stackers designed for ilp32e and ilp32. When a new and better 
ABI is introduced, it will be impossible to use with pre-existing HW stackers. The same applies 
to creating HW stackers that stack fewer registers to optimize interrupt latency.

Therefore we need a universal way to annotate which registers are available for use in a given function
as de facto caller saved ones (aka create a custom calling convention)

- `prestacked("")` attribute
- no whitespace in the string parameter
- a register range covers all registers between and including those specified (`x4-x6` is equivalent to `x4,x5,x6`)
- registers/ranges are separated by commas
- CSRs taking part in calling conventions are also subject to this mechanism
- must use raw names instead of ABI mnemonics so as to make it ABI agnostic (more portable)
- registers must be sorted (integer, floating point, vector, custom, then by lowest numbered)
- CSRs must be put after the architectural regfiles; those don't have to be sorted
- must not collide with `\\__attribute__\((interrupt))` so as to support "legacy" handler return mechanisms
- for interop with <<IPRA - Inter procedural register allocation, IPRA>>, unnamed custom CSRs 
also have to be covered. e.g. `csr:0x801` or `csr:0x803-0x811` for a range
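As an illustration of the rules above, the following C sketch expands a `prestacked` string into a bitmask of integer registers. It is not taken from any compiler: `prestacked_xmask` is a hypothetical helper, and it deliberately handles only the `xN` integer registers, rejecting floating point, vector and CSR entries that a full implementation would also need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parse one "xN" token at *p. Returns 0..31 and advances *p, or -1. */
static int parse_xreg(const char **p)
{
    const char *s = *p;
    if (*s != 'x')
        return -1;
    s++;
    if (*s < '0' || *s > '9')
        return -1;
    int n = 0;
    while (*s >= '0' && *s <= '9') {
        n = n * 10 + (*s - '0');
        if (n > 31)
            return -1;
        s++;
    }
    *p = s;
    return n;
}

/* Expand e.g. "x1,x5-x7" into a mask with bit N set for each xN.
 * Returns 0 on success, -1 if the string breaks the rules
 * (whitespace, unsorted entries, bad range, unsupported register). */
int prestacked_xmask(const char *spec, uint32_t *mask)
{
    assert(spec != NULL && mask != NULL);
    uint32_t m = 0;
    int prev = -1;                 /* highest register accepted so far */
    const char *p = spec;

    if (*p == '\0') {              /* prestacked("") : nothing prestacked */
        *mask = 0;
        return 0;
    }
    for (;;) {
        int lo = parse_xreg(&p);
        if (lo < 0)
            return -1;
        int hi = lo;
        if (*p == '-') {           /* inclusive range lo-hi */
            p++;
            hi = parse_xreg(&p);
            if (hi < lo)           /* also catches a parse failure (-1) */
                return -1;
        }
        if (lo <= prev)            /* entries must be sorted, no overlap */
            return -1;
        for (int r = lo; r <= hi; r++)
            m |= (uint32_t)1 << r;
        prev = hi;
        if (*p == '\0')
            break;
        if (*p != ',')             /* only commas separate entries */
            return -1;
        p++;
    }
    *mask = m;
    return 0;
}
```

A compiler or linter could use such a mask to decide which registers a handler may clobber without emitting save and restore code for them.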

psABI caller saved:

`\\__attribute__\((prestacked("x5-x7,x10-x17,x28-x31")))`

Simplified range (e.g. shadow register file):

`\\__attribute__\((prestacked("x8-x15")))`

psABI with floating point, caller saved:

`\\__attribute__\((prestacked("x5-x7,x10-x17,x28-x31,f0-f7,f10-f17,f28-f31,fcsr")))`

ch32v003 irq (ilp32e + PFIC HW stacker, assuming `ra` doesn't have some undocumented use)

`\\__attribute__\((interrupt, prestacked("x1,x5-x7,x10-x15")))`

NOTE: an unannotated `ra` is assumed to hold a valid return address; otherwise a special return mechanism must be
used
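Since the annotation uses raw register names, it can help to translate between them and the psABI mnemonics when reading the examples above. The lookup below is a small convenience sketch in C; the function names are hypothetical, while the table itself is the standard RISC-V integer register naming.

```c
#include <assert.h>
#include <string.h>

/* psABI mnemonic for each raw integer register x0..x31. */
static const char *const abi_names[32] = {
    "zero", "ra", "sp",  "gp",  "tp", "t0", "t1", "t2",
    "s0",   "s1", "a0",  "a1",  "a2", "a3", "a4", "a5",
    "a6",   "a7", "s2",  "s3",  "s4", "s5", "s6", "s7",
    "s8",   "s9", "s10", "s11", "t3", "t4", "t5", "t6"
};

/* Raw register number -> ABI mnemonic. */
const char *riscv_abi_name(unsigned x)
{
    assert(x < 32);
    return abi_names[x];
}

/* ABI mnemonic -> raw register number, or -1 if the name is unknown. */
int riscv_xreg_from_abi_name(const char *name)
{
    for (int i = 0; i < 32; i++)
        if (strcmp(name, abi_names[i]) == 0)
            return i;
    return -1;
}
```

With this table, `x5-x7,x10-x17,x28-x31` reads as t0-t2, a0-a7, t3-t6, and the ch32v003 list `x1,x5-x7,x10-x15` reads as ra, t0-t2, a0-a5.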

===== optimization for `noreturn` functions

gcc/llvm compilers can purge the epilogue (even down the call tree) by automatic 
detection of infinite loop or by using `\\__attribute__\((noreturn))` or `__builtin_unreachable()`.

This is not the case for prologues though, leading to wasted stack and code space in the most typical
embedded scenario of main or thread functions with infinite loops.

This missing optimization is intentional <<noreturnprologue>> to allow backtracing 
(`abort()` etc.) and throwing exceptions (of course under -fno-exceptions and exception less code)

By abusing the "prestacked annotation" we can get rid of this prologue 
by "prestacking" all of the available registers. +
e.g. `\\__attribute__\((noreturn, prestacked("x1,x4-x31,f0-f31,fcsr")))`

NOTE: addition of a `noreturn_nobacktrace_noexcept` attribute is very unlikely; optimizing 
the regular `noreturn` attribute is even less likely.

Collaborator

dansmathers commented May 10, 2023

It seems like this proposal could be standalone and apply to any interrupt controller (CLIC, CLINT, AIA)? Make a proposal to the SIG and see if they have it become a new task group? linking issue #108.

@kasanovic kasanovic added the post-v1.0 To be handled after v1.0 label May 23, 2023

jnk0le commented Apr 29, 2024

FYI, the official proposal for the prestacked annotation is here: riscv-non-isa/riscv-c-api-doc#53

It allows CLIC to be extended with a minimal set of shadow or stacked registers without the need for extra compiler attributes (except the prestacked one)
