Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extract AotCompiler interface #565

Merged
merged 7 commits into from
Oct 12, 2024
Merged

Extract AotCompiler interface #565

merged 7 commits into from
Oct 12, 2024

Conversation

electrum
Copy link
Collaborator

@electrum electrum commented Oct 5, 2024

The new AotMachineFactory class allows reusing the same compiled module for multiple instances.

GitHub does a bad job rendering the diff, but it's easy to see the changes locally (might need to enable copy detection).

These commits are intended to be reviewed and merged separately.

Copy link
Collaborator

@andreaTP andreaTP left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow @electrum this is a huge step forward in emitting self-contained modules!

The JMH test shows a little worsening of the perf, but I believe that's in the same order of magnitude as the noise of the GH Actions machines.

LGTM, and sorry for the delay in reviewing.

@@ -710,7 +710,7 @@ private void compileFunction(
break;
}
// collect unique target labels
Map<Integer, Label> targets = new HashMap<>();
Map<Integer, Label> targets = new TreeMap<>();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick, we can follow up separately, is this a sparse map? Alternatively Map<Integer, Label> can be converted to Label[]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is sparse and we don't know how big it will be. For example:

br_table 1 (;@0;) 1 (;@0;) 1 (;@0;)
br_table 0 (;@2;) 1 (;@1;) 0 (;@2;) 1 (;@1;) 0 (;@2;)

In the first example, there is only one unique label, but it's not 0. In the second example, there are only two unique labels, but we don't know that up front.

This could also use LinkedHashMap. What matters is that we have a consistent iteration order, so that we generate code deterministically.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the detail 👍


final class AotMethodInliner {

private AotMethodInliner() {}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not impacting this PR, for the future, I have a few open questions in my mind:

  • would it make sense to inline the bodies of the functions?
  • is the JVM already doing the inlining for us?
  • is there any performance improvement/drawback?

I think we should do some measuring to understand the tradeoffs.

Copy link
Collaborator Author

@electrum electrum Oct 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The JVM should inline the methods. The purpose of these methods is to keep the bytecode smaller and simpler, and also keep the compiler simpler. Larger bytecode prevents the JVM from optimizing. Many of the methods are there to avoid stack shuffling in the generated code.

For example, when compiling the MEMORY_COPY instruction, at the point when we see the instruction, we already have the destination, offset, and size arguments on the stack. Calling the copy() method on the Memory class requires that the Memory instance reference is pushed first, as the receiver (this) is always the first parameter to an instance method. There's no way to simply insert it that far up the stack. You have to spill all the arguments to local slots, push the receiver, then restore the arguments from locals. This would make the bytecode far larger and require JVM optimizations to make it efficient. Instead, we simply call a static method that has the parameters in the desired order, which trivially delegates to the target method.

An alternative would be to have a far more complicated compiler that can push the receiver or other arguments at the appropriate place, but I have no idea how to do the analysis required to make that work.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking the time to do the write-up!
Your reasoning seems solid, let's move forward on this path and revisit only in presence of evidence.

@andreaTP andreaTP merged commit 407569d into dylibso:main Oct 12, 2024
13 checks passed
@electrum electrum deleted the aot branch October 13, 2024 01:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants