Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refer mapping symbol into R_RISCV_RELAX for rvc/norvc relaxations. #393

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Nelson1225
Copy link
Collaborator

Hey guys,

Original discussion,
#116

The idea came from @jrtc27 one year ago, to resolve the problem that both rvc and norvc codes are in the same object. Not really sure how to write the psabi correctly, so needs your helps to review the changes. Thanks :-)

cc @kito-cheng @jrtc27 @jim-wilson @rui314 @MaskRay @asb

@Nelson1225
Copy link
Collaborator Author

The implementation of binutils for this proposal,

RISC-V: Refer mapping symbol to R_RISCV_RELAX for rvc relaxations, https://sourceware.org/pipermail/binutils/2022-September/123191.html

@MaskRay
Copy link
Collaborator

MaskRay commented Jul 24, 2023

RISC-V: Refer mapping symbol to R_RISCV_RELAX for rvc relaxations, https://sourceware.org/pipermail/binutils/2022-September/123191.html

Thank you for proposing this! Oh, I did not know the patch has been there for a long time.

I think using the symbol index part of r_info nicely describes the semantics of a R_RISCV_RELAX relocation.
Technically, we can make R_RISCV_RELAX relocations do more awfulmagical relaxation for various ISA extensions...
But more likely, R_RISCV_RELAX will remain used only to distinguish RVC and non-RVC, as new relaxations should use new relocation types, there is not much we need from a marker relocation type.

I wonder whether we should use the r_addend field:

  • 0: e_flags & EF_RISCV_RVC decides the RVC state
  • 1: non-RVC, i.e., c.j/c.jal cannot be used for call/tail relaxation

By utilizing r_addend, the linker doesn't have to scan the referenced symbol name.
As a minor issue, we won't be able to use REL, but we already cannot use REL due to R_RISCV_ALIGN.

(I'll be out of town for ~2 weeks and will be slow in responding.)

@jrtc27
Copy link
Collaborator

jrtc27 commented Jul 24, 2023

CHERI-RISC-V is an example of where you can do more things. Zce's existence shows that RVC alone isn't enough. Plus you have all the fun of things like 48-bit encodings. And then who knows what other extensions will exist. Let's not try and be clever here and cause even more issues in the future, let's just do the general thing that should always work and requires no thought. I.e. no, abusing the addend like that is too narrow.

@MaskRay
Copy link
Collaborator

MaskRay commented Jul 24, 2023

CHERI-RISC-V is an example of where you can do more things. Zce's existence shows that RVC alone isn't enough. Plus you have all the fun of things like 48-bit encodings. And then who knows what other extensions will exist. Let's not try and be clever here and cause even more issues in the future, let's just do the general thing that should always work and requires no thought. I.e. no, abusing the addend like that is too narrow.

Do you have a self-contained example that there are clever uses of R_RISCV_RELAX and introducing a new one is insufficient?

The r_addend suggestion is primarily about linker performance, but is also relevant to another merit I want to uphold: the linker behavior should not be changed by magical symbol names (_c in the mapping symbol referenced by the R_RISCV_RELAX relocation).

@jrtc27
Copy link
Collaborator

jrtc27 commented Jul 24, 2023

is also relevant to another merit I want to uphold: the linker behavior should not be changed by magical symbol names

Then come up with a viable alternative that has the same flexibility as using the ISA string without reinventing the wheel.

@kito-cheng
Copy link
Collaborator

I prefer pointer to dummy mapping symbol since that's more flexible to deal with different ISA extension including the vendor extensions.

Also r_addend is only 32 bits (due to RV32...), and we will consume at least 5 bits immediately for c, zcmt, zcf, zcd and zcb for this scheme, so it might...run out soon in next few years I suspect.

@MaskRay
Copy link
Collaborator

MaskRay commented Jul 24, 2023

Can someone describe how these extensions are going to use R_RISCV_RELAX in a way that is close to the spirit of the current R_RISCV_RELAX, so that we should consider extensibility of R_RISCV_RELAX, instead of creating new relocation types, even if parsing referenced symbol names is a very odd thing to do? This is why I asked for a self-contained example.

@kito-cheng
Copy link
Collaborator

Let me try to made one with Zcmt:

1. rv32imafdc_zcmt

foo:
  # R_RISCV_RELAX symbol($xrv32imafdc_zcmt)
  # R_RISCV_CALL
  call bar # This can be relax to jal, c.jal or cm.jalt

2. rv32imafdc

bar:
  # R_RISCV_RELAX symbol($xrv32imafdc)
  # R_RISCV_CALL symbol(zoo)
  call zoo # This can be relax to jal, c.jal

3. rv32imafd

zoo:
  # R_RISCV_RELAX symbol($xrv32imafd)
  # R_RISCV_CALL symbol(abc)
  call abc # This can be relax to jal

And adding new relocation like R_RISCV_RELAX_NORVC can't distinguish 1 and 2.

@kito-cheng
Copy link
Collaborator

@Nelson1225 could you also add some more comment to R_RISCV_RELAX relocation? maybe like this:

diff --git a/riscv-elf.adoc b/riscv-elf.adoc
index 7eeddc5..1ba7f40 100644
--- a/riscv-elf.adoc
+++ b/riscv-elf.adoc
@@ -430,7 +430,7 @@ Description:: Additional information about the relocation
                                             <| S + A
 .2+| 47-50   .2+| *Reserved*                          .2+| -       |                   .2+| Reserved for future standard use
                                             <|
-.2+| 51      .2+| RELAX         .2+| Static  |                   .2+| Instruction can be relaxed, paired with a normal relocation at the same address
+.2+| 51      .2+| RELAX         .2+| Static  |                   .2+| Instruction can be relaxed, paired with a normal relocation at the same address. This relocation might point to an optional dummy mapping symbol, indicating which extension is permitted for use during the linker relaxation process.
                                             <|
 .2+| 52      .2+| SUB6          .2+| Static  | _word6_           .2+| Local label subtraction
                                             <| V - S - A

@Nelson1225
Copy link
Collaborator Author

@Nelson1225 could you also add some more comment to R_RISCV_RELAX relocation? maybe like this:

diff --git a/riscv-elf.adoc b/riscv-elf.adoc
index 7eeddc5..1ba7f40 100644
--- a/riscv-elf.adoc
+++ b/riscv-elf.adoc
@@ -430,7 +430,7 @@ Description:: Additional information about the relocation
                                             <| S + A
 .2+| 47-50   .2+| *Reserved*                          .2+| -       |                   .2+| Reserved for future standard use
                                             <|
-.2+| 51      .2+| RELAX         .2+| Static  |                   .2+| Instruction can be relaxed, paired with a normal relocation at the same address
+.2+| 51      .2+| RELAX         .2+| Static  |                   .2+| Instruction can be relaxed, paired with a normal relocation at the same address. This relocation might point to an optional dummy mapping symbol, indicating which extension is permitted for use during the linker relaxation process.
                                             <|
 .2+| 52      .2+| SUB6          .2+| Static  | _word6_           .2+| Local label subtraction
                                             <| V - S - A

Sure, fixed :-)

@kito-cheng
Copy link
Collaborator

We found an interesting case during implementing zcmt linker optimization, should we do zcmt optimization IF attached mapping symbol didn't contained zcmt?

If yes: do we have way to disable that? It will hurt performance (a lots) in general, so we might want to disable that in some code region, e.g. performance critical function.

If no: that means we must build multi-lib with zcmt to enable that, which may increase lots of multi lib configuration in practice, otherwise libraries can't optimize with zcmt.

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 28, 2023

Mapping symbols are for disassemblers and have no other semantic meaning. Putting the ISA string information in the RELAX relocation is the way to solve these kinds of problems.

@kito-cheng
Copy link
Collaborator

Yeah, anyway, I am not super concern about the case I throw, I think that would be sort of implementation details.

@rui314
Copy link
Collaborator

rui314 commented Sep 29, 2023

If you want to identify the set of instruction extensions for a given piece of code, isn't that information already in the object file? For example, after the $xrv64imafdc symbol, the code is considered to be RV64GC until another mapping symbol appears. That information alone should be sufficient for the linker to learn what relaxation is allowed for that code region. It seems we don't need to repeat it for each RELAX relocation.

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 29, 2023

Mapping symbols don't convey semantic information, that's the point. You can legitimately strip them and get a working (if potentially hard to disassemble) result. Doing it based off the actual symbols is fragile and a bad idea.

@rui314
Copy link
Collaborator

rui314 commented Sep 29, 2023

Then I'd suggest we add a new relocation type or redefine RELAX to switch the current set of extensions instead of repeating the same thing for each RELAX relocation.

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 29, 2023

Then I'd suggest we add a new relocation type or redefine RELAX to switch the current set of extensions instead of repeating the same thing for each RELAX relocation.

Relocations should not be stateful. They should only affect relocations at the same address. Otherwise you have to process them in-order (or sort them and search them) and permuting sections changes the semantics.

@rui314
Copy link
Collaborator

rui314 commented Sep 29, 2023

Why shouldn't it be stateful? The same argument could have been applied to symbols, but with the introduction of mapping symbols, the symbols have become stateful in a sense. I don't see a strong reason to avoid it. The linker processes relocations in address order, so it is actually natural for the linker to handle them in-order. Permuting sections doesn't alter the semantics, as each"state change" relocation still points to the location where a new code region begins.

@rui314
Copy link
Collaborator

rui314 commented Sep 29, 2023

For RISC-V object files, we can't really handle relocations in an arbitrary order because of the relaxation that reduces section size. Unlike other psABIs, a relocation's r_offset is not sufficient to identify the location where the relocation is applied to. We need to know how many bytes have been removed by other relocations that appears before it to apply it to the correct location. So they are already stateful and needs to be processed in the address-increasing order.

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 29, 2023

Only ALIGN needs to be processed in-order today. Everything else can be done in an arbitrary address order.

@rui314
Copy link
Collaborator

rui314 commented Sep 29, 2023

I don't think that's true. For example, if the first relocation with RELAX shrinks the section by, say, 2 bytes, the r_offset of all relocations with an address greater than the first one must be decreased by 2. To do so, they need to be processed in the address-increasing order.

@jrtc27
Copy link
Collaborator

jrtc27 commented Sep 29, 2023

Depends how you track things. And if you've resolved the relocation then you don't need to care about its r_offset.

@sorear
Copy link
Collaborator

sorear commented Jan 28, 2024

Are ISA string parsers reliable and interoperable enough now to consider using them for something load-bearing? I remember a lot of issues a few years ago.

@kito-cheng
Copy link
Collaborator

Are ISA string parsers reliable and interoperable enough now to consider using them for something load-bearing? I remember a lot of issues a few years ago.

ISA string is kinda stable than before after the adding huge extensions last two years I think, the only unstable thing is RISC-V ISA seems intend to split extension into more fine grained, e.g. M become M + Zmmul, A become A + Zalrsc + Zaamo, but that is less concern since that is not really backward incompatible change, and ISA string canonicalation could resolve that.

Back to the problem I throw before (#393 (comment)): the zcmt issue, we found a (little complicated) solution for that, which is allow some extension can be ignore that limitation IF the target ISA string isn't using (encoding) conflicted extensions, but anyway that's implementation define behavior, we just need make sure the spec has leave flexibility here.

Other than that I think it's viable solution, not sure how linker other guys think? @MaskRay @rui314

@sorear
Copy link
Collaborator

sorear commented Feb 26, 2024

This is an incomplete solution, since not all forms of linker code generation or rewriting occur in the context of a relocation. The recently proposed instruction table standard extension can rewrite arbitrary 32-bit instructions, and the FDPIC proposal I'm working on benefits from RVC in the PLT. How do the existing Andes etc toolchains with instruction tables know what regions to enable them in?

@jrtc27
Copy link
Collaborator

jrtc27 commented Feb 26, 2024

I don't know how you can hope to rewrite instructions without a relocation. You don't even know where the start of each instruction is. And you need the ability to turn it off for some assembly so you get exactly what you wrote.

@kito-cheng
Copy link
Collaborator

This is an incomplete solution, since not all forms of linker code generation or rewriting occur in the context of a relocation. The recently proposed instruction table standard extension can rewrite arbitrary 32-bit instructions,

Are you refer to Zcmt (cm.jt and cm.jalt)? that will still require a relocation to relax.

and the FDPIC proposal I'm working on benefits from RVC in the PLT.

Do you mind give more context about that? and that's good to know you are working FDPIC :)

How do the existing Andes etc toolchains with instruction tables know what regions to enable them in?

My memory tell me they are using relocation pair to disable that within specific region (e.g. R_START_NO_TABLE_INST_REGION, R_END_NO_TABLE_INST_REGION).

@sorear
Copy link
Collaborator

sorear commented Feb 27, 2024

Are you refer to Zcmt (cm.jt and cm.jalt)? that will still require a relocation to relax.

No, I'm referring to the instruction table extension Zcmi.

Do you mind give more context about that? and that's good to know you are working FDPIC :)

I just created #429 so that there is a point of reference for people on GitHub. #343 (comment) has more information on the requirements and design process for the relocation scheme.

@kito-cheng
Copy link
Collaborator

No, I'm referring to the instruction table extension Zcmi.

Thanks for that info, found the spec in the thread, but just a bookmark here to let other easier access that: https://drive.google.com/file/d/1qCAjZs6pZWK8q2nPVSsuVqs4aSBqkRTw/view

And it seems very similar to Andes's extension which I am family with...so yeah, the comment in my last comment for that still valid and right: using relocation pair to mark a region can't be optimized with table jump instruction.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants