-
Notifications
You must be signed in to change notification settings - Fork 11.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[llvm-mca] Add optional identifier field to mca::Instruction #97867
base: main
Are you sure you want to change the base?
Conversation
@llvm/pr-subscribers-tools-llvm-mca Author: Chinmay Deshpande (chinmaydd) ChangesWhile using llvm-mca as a library (in MCA Daemon), we've been having trouble uniquely identifying instructions that go through the pipeline. Inspired by the discussion here, I believe it may make sense to add an optional identifier to each A possible instance of use could be the Full diff: https://github.com/llvm/llvm-project/pull/97867.diff 1 Files Affected:
diff --git a/llvm/include/llvm/MCA/Instruction.h b/llvm/include/llvm/MCA/Instruction.h
index e48a70164bec6..09822e43a827d 100644
--- a/llvm/include/llvm/MCA/Instruction.h
+++ b/llvm/include/llvm/MCA/Instruction.h
@@ -643,6 +643,8 @@ class Instruction : public InstructionBase {
// True if this instruction has been optimized at register renaming stage.
bool IsEliminated;
+ std::optional<uint64_t> Identifier;
+
public:
Instruction(const InstrDesc &D, const unsigned Opcode)
: InstructionBase(D, Opcode), Stage(IS_INVALID),
@@ -690,6 +692,9 @@ class Instruction : public InstructionBase {
bool isRetired() const { return Stage == IS_RETIRED; }
bool isEliminated() const { return IsEliminated; }
+ std::optional<uint64_t> getIdentifier() const { return Identifier; }
+ void setIdentifier(uint64_t Id) { Identifier = Id; }
+
// Forces a transition from state IS_DISPATCHED to state IS_EXECUTED.
void forceExecuted();
void setEliminated() { IsEliminated = true; }
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you have more information on how you intend to use this? I'm struggling to see how this would be particularly useful, especially with instruction recycling where you would end up picking up the identifier of a previous instruction that has been recycled.
Hi @boomanaiden154, thanks for your comment. We are using llvm-mca as a library in MCAD. MCAD tries to tweak the timings of individual instructions based on dynamic information from an emulator-like environment (if available). For instance, we have basic support for identifying aliasing loads and stores based on information retrieved from QEMU. We store metadata relating to address, and size of access with the key as the index of the instruction in the larger trace. As you can see, the same instruction might have different metadata at different points in the program. For recycled instructions, we ensure to tag them appropriately when they are instantiated. Having a unique identifier to refer to instructions in the pipeline seems useful to me. |
Could you please help me understand what new functionality this would allow in MCAD? It looks like we have a way to identify them. Edit: I didn't see that the link above was a different branch from the main branch. I don't have any strong objections to this change. In our private MCAD repo, we just removed the Metadata side of things to get it working with upstream LLVM. This change would allow us to get closer to using upstream LLVM with all MCAD functionality, which is desirable to me. One alternative is that you could create a subclass of |
Commenting for bump. |
Can you reply to Michael's questions above? I'd be interest to know why the listed alternatives don't work. The change seems fine to me. It does increase the size of |
Thanks for the response @boomanaiden154. I agree with @michaelmaitland that a central reason for this change is to be rid of the custom LLVM dependency that we have been dragging around with MCAD for a couple of years now. Subclassing I'm not too confident that hashing will work reliably here. For instance the |
Adding an identifier seems like it might not be a bad option given the current design. If it helps multiple downstream consumers, then it seems reasonable enough to me. We don't need any of this sort of functionality in our downstream usage (so far), but I can see how it would potentially be useful for others. I'll defer to the primary |
@michaelmaitland sorry for the ping. Could you weigh in please ? Thanks! |
@mshockwave added metadata to llvm-mcad but it did not have a significant impact on improving accuracy or precision. With that knowledge, I am not sure how important it is to modify upstream like this. SiFive has ripped out usage of metadata in our fork of llvm-mcad and integrated with no changes to upstream llvm. I am curious whether @chinmaydd has a strong motivation to support metadata in llvm-mcad that would warrant us to take this change here. I am a supporter of making llvm-mcad work with upstream LLVM. The question remains what approach to take. |
@michaelmaitland thanks for your comments. I understand that getting MCAD to work with upstream LLVM without the metadata changes is possible. In fact, the version we have been maintaining has a branch that implements this. But, we have been working on other use cases that may benefit from not providing, but receiving, metadata from MCAD. Consider the following disassembly sample - This is part of the BinaryNinja (a reverse engineering framework) UI annotated with llvm-mca provided cycle counts. The red annotations highlight instructions that llvm-mca reports a We've had multiple folks be interested in using MCAD like this. Integrating timing information as part of a disassembler UI aids in quickly iterating over and understanding timing changes for different versions of the generated binary (maybe optimizations, patches, etc.) |
@michaelmaitland sorry for the ping. Any thoughts? |
It sounds like the value here is for llvm-mcad to explore whether receiving metadata is valuable. IIRC, the original providing metadata did not turn out to be useful. I have two ideas here:
I lean towards (1), but I wouldn’t mind (2) if someone felt strongly towards it. What do others think? |
Hi @michaelmaitland, thank you for continuing to engage with this PR. I appreciate your time.
That is correct, but like my previous comment highlights -- we are also using MCAD to "provide" metadata rather than receiving it. This way, the client of the Broker can leverage hardware events, such as stalls and pipeline pressure events, which we store as a mapping from |
Sorry, I had it backwards with receiving and providing. My two "ideas" still hold if we swap receiving and providing around:
|
Thats fair.
Thanks ! |
@mshockwave it would be great if you could weigh in on the two options. |
CC @adibiagio |
I also would like to hear @mshockwave's opinions on this. Is there a reason why an instruction identifier needs to be a std::optional? We could probably save some space by placing the new identifier field in a smart way. If we better pack fields, we may avoid excessive padding due to alignment requirements, and reduce the total sizeof. In principle, I don't particularly like the idea of adding fields to mca::instruction which are only used by downstream implementations. However, I don't think this change is problematic, and I don't consider the size change a big issue in practice. I'll let @michaelmaitland and @mshockwave decide on this. -Andrea |
While using llvm-mca as a library (in MCA Daemon), we've been having trouble uniquely identifying instructions that go through the pipeline. Inspired by the discussion here, I believe it may make sense to add an optional identifier to each
mca::Instruction
.A possible instance of use could be the
InstrBuilder
incrementing theIdentifier
by 1 after each instruction is created. I'm happy to add that too.