[llvm-mca] Add optional identifier field to mca::Instruction #97867

chinmaydd · 2024-07-05T23:15:52Z

While using llvm-mca as a library (in MCA Daemon), we've been having trouble uniquely identifying instructions that go through the pipeline. Inspired by the discussion here, I believe it may make sense to add an optional identifier to each mca::Instruction.

One possible use would be to have the InstrBuilder increment the Identifier by 1 after each instruction is created. I'm happy to add that too.

llvmbot · 2024-07-05T23:16:22Z

@llvm/pr-subscribers-tools-llvm-mca

Author: Chinmay Deshpande (chinmaydd)

Changes

While using llvm-mca as a library (in MCA Daemon), we've been having trouble uniquely identifying instructions that go through the pipeline. Inspired by the discussion here, I believe it may make sense to add an optional identifier to each mca::Instruction.

One possible use would be to have the InstrBuilder increment the Identifier by 1 after each instruction is created. I'm happy to add that too.

Full diff: https://github.com/llvm/llvm-project/pull/97867.diff

1 Files Affected:

(modified) llvm/include/llvm/MCA/Instruction.h (+5)

diff --git a/llvm/include/llvm/MCA/Instruction.h b/llvm/include/llvm/MCA/Instruction.h
index e48a70164bec6..09822e43a827d 100644
--- a/llvm/include/llvm/MCA/Instruction.h
+++ b/llvm/include/llvm/MCA/Instruction.h
@@ -643,6 +643,8 @@ class Instruction : public InstructionBase {
   // True if this instruction has been optimized at register renaming stage.
   bool IsEliminated;
 
+  std::optional<uint64_t> Identifier;
+
 public:
   Instruction(const InstrDesc &D, const unsigned Opcode)
       : InstructionBase(D, Opcode), Stage(IS_INVALID),
@@ -690,6 +692,9 @@ class Instruction : public InstructionBase {
   bool isRetired() const { return Stage == IS_RETIRED; }
   bool isEliminated() const { return IsEliminated; }
 
+  std::optional<uint64_t> getIdentifier() const { return Identifier; }
+  void setIdentifier(uint64_t Id) { Identifier = Id; }
+
   // Forces a transition from state IS_DISPATCHED to state IS_EXECUTED.
   void forceExecuted();
   void setEliminated() { IsEliminated = true; }
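The idea from the description, where the builder bumps the identifier by 1 for each new instruction, could look roughly like the sketch below. It is a minimal, self-contained illustration under stated assumptions: `Instruction` is a hypothetical stand-in that models only the two accessors added in the diff, and `IdentifyingBuilder` is a hypothetical wrapper, not `llvm::mca::InstrBuilder`.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <memory>
#include <optional>

// Hypothetical stand-in for mca::Instruction; only the accessors added by
// this patch are modeled.
class Instruction {
public:
  explicit Instruction(unsigned Opcode) : Opcode(Opcode) {}

  unsigned getOpcode() const { return Opcode; }
  std::optional<uint64_t> getIdentifier() const { return Identifier; }
  void setIdentifier(uint64_t Id) { Identifier = Id; }

private:
  unsigned Opcode;
  std::optional<uint64_t> Identifier; // Empty until an id is assigned.
};

// Hypothetical builder wrapper: tags each new instruction with the current
// counter value, then increments the counter by 1.
class IdentifyingBuilder {
public:
  std::unique_ptr<Instruction> create(unsigned Opcode) {
    assert(NextId != std::numeric_limits<uint64_t>::max() &&
           "identifier space exhausted");
    auto I = std::make_unique<Instruction>(Opcode);
    I->setIdentifier(NextId++);
    return I;
  }

private:
  uint64_t NextId = 0;
};
```

Instructions created outside such a builder simply report an empty `std::optional`, which is the case the optional return type in the diff leaves room for.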

boomanaiden154

Do you have more information on how you intend to use this? I'm struggling to see how this would be particularly useful, especially with instruction recycling where you would end up picking up the identifier of a previous instruction that has been recycled.

chinmaydd · 2024-07-09T01:14:45Z

Hi @boomanaiden154, thanks for your comment. We are using llvm-mca as a library in MCAD. MCAD tries to tweak the timings of individual instructions based on dynamic information from an emulator-like environment (if available). For instance, we have basic support for identifying aliasing loads and stores based on information retrieved from QEMU. We store metadata about the address and size of each access, keyed by the index of the instruction in the larger trace.
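The alias check described here essentially compares address ranges. The sketch below is a hypothetical illustration, not MCAD's code: `MemAccess`, `AccessMap` and `mayAlias` are invented names, and the map is keyed by the instruction's index in the trace as described above. Two accesses are treated as aliasing when their half-open byte ranges overlap.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical record of one memory access reported by an emulator.
struct MemAccess {
  uint64_t Addr; // First byte touched.
  uint64_t Size; // Number of bytes touched (must be > 0).
};

// Hypothetical metadata store keyed by the instruction's trace index.
using AccessMap = std::unordered_map<uint64_t, MemAccess>;

// True if [A.Addr, A.Addr + A.Size) and [B.Addr, B.Addr + B.Size) overlap.
// Ignores wrap-around at the very top of the address space.
bool mayAlias(const MemAccess &A, const MemAccess &B) {
  assert(A.Size > 0 && B.Size > 0 && "empty accesses never alias");
  return A.Addr < B.Addr + B.Size && B.Addr < A.Addr + A.Size;
}

// Looks up two trace indices; std::nullopt if either has no metadata.
std::optional<bool> mayAlias(const AccessMap &M, uint64_t IdxA, uint64_t IdxB) {
  auto ItA = M.find(IdxA);
  auto ItB = M.find(IdxB);
  if (ItA == M.end() || ItB == M.end())
    return std::nullopt;
  return mayAlias(ItA->second, ItB->second);
}
```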

As you can see, the same instruction might have different metadata at different points in the program. For recycled instructions, we make sure to tag them appropriately when they are instantiated. Having a unique identifier to refer to instructions in the pipeline seems useful to me.
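The recycling concern and the re-tagging answer can be made concrete with a small pool. This is a hypothetical sketch: `Instruction` and `RecyclingPool` are stand-ins, not llvm-mca's recycling machinery. A reused object still carries its previous identifier, so the pool overwrites it every time it hands an instruction out.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical stand-in for mca::Instruction with the proposed identifier.
struct Instruction {
  unsigned Opcode = 0;
  std::optional<uint64_t> Identifier;
};

// Hypothetical recycling pool that re-tags every instruction it hands out.
class RecyclingPool {
public:
  std::unique_ptr<Instruction> acquire(unsigned Opcode) {
    std::unique_ptr<Instruction> I;
    if (!Free.empty()) {
      I = std::move(Free.back()); // Reused object: identifier is stale here.
      Free.pop_back();
    } else {
      I = std::make_unique<Instruction>();
    }
    I->Opcode = Opcode;
    I->Identifier = NextId++; // Re-tag on every instantiation.
    return I;
  }

  void release(std::unique_ptr<Instruction> I) {
    assert(I && "releasing a null instruction");
    Free.push_back(std::move(I));
  }

private:
  std::vector<std::unique_ptr<Instruction>> Free;
  uint64_t NextId = 0;
};
```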

michaelmaitland · 2024-07-09T14:13:02Z

> As you can see, the same instruction might have different metadata at different points in the program. For recycled instructions, we make sure to tag them appropriately when they are instantiated. Having a unique identifier to refer to instructions in the pipeline seems useful to me.

Could you please help me understand what new functionality this would allow in MCAD? It looks like we have a way to identify them.

Edit: I didn't notice that the link above points to a different branch than the main branch.

I don't have any strong objections to this change. In our private MCAD repo, we just removed the Metadata side of things to get it working with upstream LLVM. This change would allow us to get closer to using upstream LLVM with all MCAD functionality, which is desirable to me. One alternative is that you could create a subclass of mca::Instruction in MCAD that adds the identifier. Another alternative is to hash the mca::Instruction in MCAD.

chinmaydd · 2024-07-11T18:17:10Z

Commenting for bump.

boomanaiden154 · 2024-07-11T18:26:25Z

> Commenting for bump.

Can you reply to Michael's questions above? I'd be interested to know why the listed alternatives don't work.

The change seems fine to me. It does increase the size of mca::Instruction, but given it's already heap allocated and large, that's probably not too big of an issue.

chinmaydd · 2024-07-11T19:19:49Z

Thanks for the response @boomanaiden154. I agree with @michaelmaitland that a central reason for this change is to be rid of the custom LLVM dependency that we have been dragging around with MCAD for a couple of years now.

Subclassing mca::Instruction is possible but it will require us to redefine even more llvm-mca functionality within MCAD. For instance, the entire Pipeline works with the InstRef class that holds a pointer to mca::Instruction. We are also looking to use hardware events generated by MCA (such as StallEvent, PressureEvent) to better inform the consumer. This requires that the unique identifier (and therefore access to the relevant metadata) persist through the various stages of the pipeline.
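The difficulty with subclassing can be sketched with simplified stand-ins. Assumptions: `Instruction`, `InstRef` and `TaggedInstruction` below are hypothetical miniatures that only mirror what is stated above (the pipeline passes around an `InstRef` holding an `mca::Instruction` pointer), and the virtual destructor exists only so that `dynamic_cast` works in the sketch. Any consumer that only sees an `InstRef`, such as an event listener, has to downcast to recover the identifier.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical, heavily simplified stand-in for mca::Instruction.
struct Instruction {
  unsigned Opcode;
  explicit Instruction(unsigned Op) : Opcode(Op) {}
  virtual ~Instruction() = default; // Only here to enable dynamic_cast.
};

// Hypothetical stand-in: like the InstRef described above, it only holds a
// pointer to the base class.
class InstRef {
public:
  InstRef(unsigned Idx, Instruction *I) : Index(Idx), Inst(I) {}
  unsigned getSourceIndex() const { return Index; }
  Instruction *getInstruction() const { return Inst; }

private:
  unsigned Index;
  Instruction *Inst;
};

// Hypothetical downstream subclass carrying the identifier.
struct TaggedInstruction : Instruction {
  uint64_t Id;
  TaggedInstruction(unsigned Op, uint64_t Id) : Instruction(Op), Id(Id) {}
};

// Every consumer holding only an InstRef must downcast; this is valid only
// if every instruction in the pipeline was created as a TaggedInstruction.
uint64_t idOf(const InstRef &IR) {
  auto *T = dynamic_cast<TaggedInstruction *>(IR.getInstruction());
  assert(T && "instruction was not created by the downstream builder");
  return T->Id;
}
```

The downcast only holds up if every instruction entering the pipeline is created by the downstream code, which is the kind of llvm-mca functionality the comment says MCAD would have to redefine.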

I'm not too confident that hashing will work reliably here. For instance, the CyclesLeft field is updated at different stages of the pipeline, which changes the hash. I could hash the opcode and the operands from InstructionBase, but that would not help in uniquely identifying instructions. I don't see any other combination of members that would yield a unique identifier for differentiating instructions A and B that have the same opcode and operands in the pipeline. Happy to hear suggestions.
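Both problems with hashing show up in a few lines. In this hypothetical sketch, `Instruction` is a stand-in with an opcode, operands and a mutable `CyclesLeft` counter, and the polynomial hash is an arbitrary illustrative choice, not LLVM's hashing. A hash over the static fields cannot tell two identical instructions apart, while a hash that includes pipeline state changes as the instruction moves through the stages.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in: Opcode and Operands are static, CyclesLeft is
// mutable pipeline state.
struct Instruction {
  unsigned Opcode;
  std::vector<uint64_t> Operands;
  int CyclesLeft;
};

// Illustrative polynomial hash over the static fields only.
uint64_t hashStatic(const Instruction &I) {
  uint64_t H = I.Opcode;
  for (uint64_t Op : I.Operands)
    H = H * 31 + Op;
  return H;
}

// Same hash, but also folding in the mutable CyclesLeft field.
uint64_t hashWithState(const Instruction &I) {
  return hashStatic(I) * 31 + static_cast<uint64_t>(I.CyclesLeft);
}
```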

boomanaiden154 · 2024-07-15T06:53:57Z

Adding an identifier seems like it might not be a bad option given the current design. If it helps multiple downstream consumers, then it seems reasonable enough to me. We don't need any of this sort of functionality in our downstream usage (so far), but I can see how it would potentially be useful for others.

I'll defer to the primary llvm-mca reviewers here as I'm not really qualified to review changes like this in this area of the project.

chinmaydd · 2024-07-16T21:24:32Z

@michaelmaitland sorry for the ping. Could you weigh in, please? Thanks!

michaelmaitland · 2024-07-17T00:12:04Z

@mshockwave added metadata to llvm-mcad but it did not have a significant impact on improving accuracy or precision. With that knowledge, I am not sure how important it is to modify upstream like this. SiFive has ripped out usage of metadata in our fork of llvm-mcad and integrated with no changes to upstream llvm.

I am curious whether @chinmaydd has a strong motivation to support metadata in llvm-mcad that would warrant taking this change here.

I am a supporter of making llvm-mcad work with upstream LLVM. The question remains what approach to take.

chinmaydd · 2024-07-17T00:58:56Z

@michaelmaitland thanks for your comments. I understand that getting MCAD to work with upstream LLVM without the metadata changes is possible. In fact, the version we have been maintaining has a branch that implements this.

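But we have been working on other use cases that may benefit from not providing, but receiving, metadata from MCAD. Consider the following disassembly sample:

[Screenshot: BinaryNinja disassembly annotated with llvm-mca cycle counts, with stalled instructions highlighted in red]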

This is part of the BinaryNinja (a reverse engineering framework) UI annotated with llvm-mca provided cycle counts. The red annotations highlight instructions that llvm-mca reports a StallEvent for. To facilitate this, we have the Broker register a HardwareEvent listener. In the listener, we get the identifier for the instruction through the InstRef and update local storage to reflect this information.
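A listener that keys hardware events by instruction identifier might look like the sketch below. Everything here is hypothetical: `EventKind` stands in for the stall and pressure events mentioned in this thread, `Instruction` only carries the proposed `std::optional<uint64_t>` identifier, and `EventRecorder::record` is where a real listener callback would forward events after reaching the instruction through its `InstRef`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <unordered_map>
#include <vector>

// Hypothetical stand-ins for the stall and pressure events.
enum class EventKind { Stall, Pressure };

// Hypothetical stand-in: only the proposed identifier is modeled.
struct Instruction {
  std::optional<uint64_t> Identifier;
};

// Builds a mapping from instruction identifier to the events it triggered.
class EventRecorder {
public:
  // Would be called from a listener callback. Untagged instructions have no
  // key and are skipped.
  void record(const Instruction &I, EventKind K) {
    if (I.Identifier)
      Events[*I.Identifier].push_back(K);
  }

  // Number of stalls seen for one identifier, e.g. to highlight a row in a
  // disassembler UI.
  std::size_t stallCount(uint64_t Id) const {
    auto It = Events.find(Id);
    if (It == Events.end())
      return 0;
    return static_cast<std::size_t>(
        std::count(It->second.begin(), It->second.end(), EventKind::Stall));
  }

private:
  std::unordered_map<uint64_t, std::vector<EventKind>> Events;
};
```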

Multiple people have expressed interest in using MCAD this way. Integrating timing information into a disassembler UI makes it easier to iterate quickly and to understand timing changes across different versions of the generated binary (e.g., after optimizations or patches).

chinmaydd · 2024-07-19T21:39:24Z

@michaelmaitland sorry for the ping. Any thoughts?

michaelmaitland · 2024-07-24T15:29:29Z

It sounds like the value here is for llvm-mcad to explore whether receiving metadata is valuable. IIRC, the original providing metadata did not turn out to be useful.

I have two ideas here:

1. We hold off on taking this change until there is proof that llvm-mcad benefits from receiving metadata. Otherwise, there are no known instances where this change is valuable. llvm-mcad can cherry-pick this change on its own LLVM branch during experiments.
2. We take this change, since it is a small addition to the size of the object and otherwise non-invasive. Perhaps we can mark it as experimental, in case llvm-mcad receiving metadata proves not to be useful (i.e., no known users of the field exist in the wild).

I lean towards (1), but I wouldn’t mind (2) if someone felt strongly towards it. What do others think?

chinmaydd · 2024-07-29T18:45:27Z

Hi @michaelmaitland, thank you for continuing to engage with this PR. I appreciate your time.

> It sounds like the value here is for llvm-mcad to explore whether receiving metadata is valuable

That is correct, but as my previous comment highlights, we are also using MCAD to "provide" metadata rather than receive it. This way, the client of the Broker can leverage hardware events, such as stalls and pipeline pressure events, which we store as a mapping from {InstructionIdentifier: vector<HardwareEvents>}.

michaelmaitland · 2024-07-29T19:08:20Z

> the value here is for llvm-mcad to explore whether receiving metadata is valuable. IIRC, the original providing metadata did not turn out to be useful

Sorry, I had it backwards with receiving and providing. My two "ideas" still hold if we swap receiving and providing around:

1. We hold off on taking this change until there is proof that llvm-mcad benefits from providing metadata. Otherwise, there are no known instances where this change is valuable. llvm-mcad can cherry-pick this change on its own LLVM branch during experiments.
2. We take this change, since it is a small addition to the size of the object and otherwise non-invasive. Perhaps we can mark it as experimental, in case llvm-mcad providing metadata proves not to be useful (i.e., no known users of the field exist in the wild).

chinmaydd · 2024-07-29T19:42:04Z

That's fair.

Apart from this one change, MCAD is now completely compatible with upstream. We are no longer maintaining a custom fork. We are currently shipping MCAD with a patch file that applies this change.
I'm okay with marking it as experimental for a while. If the community decides to remove it, I would appreciate a ping / mention so that we can update our build instructions.

Thanks!

michaelmaitland · 2024-07-29T19:43:32Z

@mshockwave it would be great if you could weigh in on the two options.

RKSimon · 2024-07-29T19:58:18Z

CC @adibiagio

adibiagio · 2024-07-30T08:25:50Z

> @mshockwave it would be great if you could weigh in on the two options.

I also would like to hear @mshockwave's opinions on this.

Is there a reason why an instruction identifier needs to be a std::optional?
Have you considered using uint32_t (instead of uint64_t) for the underlying type?

We could probably save some space by placing the new identifier field in a smart way. If we better pack fields, we may avoid excessive padding due to alignment requirements, and reduce the total sizeof.
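The padding point can be illustrated with a few hypothetical layouts; none of them is the real `mca::Instruction`, and the field names `IsEliminated`, `Payload` and `Identifier` are only placeholders. Appending a `std::optional<uint64_t>` adds a value plus an engaged flag, rounded up to 8-byte alignment, while a `uint32_t` with a sentinel value can sit in the padding after a small field. On a typical x86-64 build the four structs below come out to 16, 32, 24 and 16 bytes respectively, but the exact numbers depend on the ABI.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical sentinel meaning "no identifier", replacing std::optional.
constexpr uint32_t NoId = std::numeric_limits<uint32_t>::max();

// Baseline: a small flag followed by an 8-byte field (padding in between).
struct Base {
  bool IsEliminated;
  uint64_t Payload;
};

// Proposed approach: optional 64-bit id appended at the end.
struct AppendedOptional64 {
  bool IsEliminated;
  uint64_t Payload;
  std::optional<uint64_t> Identifier;
};

// 32-bit id appended at the end: tail padding is still added.
struct Appended32 {
  bool IsEliminated;
  uint64_t Payload;
  uint32_t Identifier = NoId;
};

// 32-bit id placed in the padding after the flag.
struct Packed32 {
  bool IsEliminated;
  uint32_t Identifier = NoId;
  uint64_t Payload;
};

// Relations that hold on common natural-alignment ABIs.
static_assert(sizeof(Packed32) <= sizeof(Appended32), "packing should help");
static_assert(sizeof(Packed32) < sizeof(AppendedOptional64),
              "sentinel id should be smaller than std::optional<uint64_t>");
```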

In principle, I don't particularly like the idea of adding fields to mca::Instruction that are only used by downstream implementations. However, I don't think this change is problematic, and I don't consider the size change a big issue in practice. I'll let @michaelmaitland and @mshockwave decide on this.

-Andrea

llvmbot added the tools:llvm-mca label Jul 5, 2024

[MCA] Add identifier field to mca::Instruction

e23140f

chinmaydd force-pushed the mca-instr-id branch from f44dd7f to e23140f Compare July 6, 2024 04:56

RKSimon requested review from adibiagio and boomanaiden154 July 6, 2024 11:20

boomanaiden154 requested review from mshockwave and michaelmaitland July 9, 2024 00:37

boomanaiden154 reviewed Jul 9, 2024
