Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF #16234

Closed
wants to merge 13 commits into from

Conversation

XiaohongGong
Copy link

@XiaohongGong XiaohongGong commented Oct 18, 2023

Currently the vector floating-point math APIs like VectorOperators.SIN/COS/TAN... are not intrinsified on AArch64 platform, which causes large performance gap on AArch64. Note that those APIs are optimized by C2 compiler on X86 platforms by calling Intel's SVML code [1]. To close the gap, we would like to optimize these APIs for AArch64 by calling a third-party vector library called libsleef [2], which are available in mainstream Linux distros (e.g. [3] [4]).

SLEEF supports multiple accuracies. To match Vector API's requirement and implement the math ops on AArch64, we 1) call 1.0 ULP accuracy with FMA instructions used stubs in libsleef for most of the operations by default, and 2) add the vector calling convention to apply with the runtime calls to stub code in libsleef. Note that for those APIs that libsleef does not support 1.0 ULP, we choose 0.5 ULP instead.

To help loading the expected libsleef library, this patch also adds an experimental JVM option (i.e. -XX:UseSleefLib) for AArch64 platforms. People can use it to denote the libsleef path/name explicitly. By default, it points to the system installed library. If the library does not exist or the dynamic loading of it in runtime fails, the math vector ops will fall-back to use the default scalar version without error. But a warning is printed out if people specifies a nonexistent library explicitly.

Note that this is a part of the original proposed patch in panama-dev [5], just with some initial review comments addressed. And now we'd like to get some wider feedbacks from more hotspot experts.

[1] #3638
[2] https://sleef.org/
[3] https://packages.fedoraproject.org/pkgs/sleef/sleef/
[4] https://packages.debian.org/bookworm/libsleef3
[5] https://mail.openjdk.org/pipermail/panama-dev/2022-December/018172.html


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (3 reviews required, with at least 1 Reviewer, 2 Authors)

Issue

  • JDK-8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/16234/head:pull/16234
$ git checkout pull/16234

Update a local copy of the PR:
$ git checkout pull/16234
$ git pull https://git.openjdk.org/jdk.git pull/16234/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 16234

View PR using the GUI difftool:
$ git pr show -t 16234

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/16234.diff

Webrev

Link to Webrev Comment

@XiaohongGong
Copy link
Author

Here is the performance improvement for JMH benchmarks [1] [2] after enabling libsleef for AArch64 NEON and SVE:

NEON:

Benchmark               (size)  Mode  Cnt   Gain
DoubleMaxVector.ACOS     1024  thrpt   5   1.775
DoubleMaxVector.ASIN     1024  thrpt   5   2.134
DoubleMaxVector.ATAN     1024  thrpt   5   2.376
DoubleMaxVector.ATAN2    1024  thrpt   5   2.799
DoubleMaxVector.CBRT     1024  thrpt   5   1.588
DoubleMaxVector.COS      1024  thrpt   5   1.751
DoubleMaxVector.COSH     1024  thrpt   5   1.756
DoubleMaxVector.EXP      1024  thrpt   5   8.257
DoubleMaxVector.EXPM1    1024  thrpt   5   2.028
DoubleMaxVector.HYPOT    1024  thrpt   5   2.132
DoubleMaxVector.LOG      1024  thrpt   5   4.017
DoubleMaxVector.LOG10    1024  thrpt   5   5.693
DoubleMaxVector.LOG1P    1024  thrpt   5   2.788
DoubleMaxVector.POW      1024  thrpt   5   3.494
DoubleMaxVector.SIN      1024  thrpt   5   2.010
DoubleMaxVector.SINH     1024  thrpt   5   1.697
DoubleMaxVector.TAN      1024  thrpt   5   3.448
DoubleMaxVector.TANH     1024  thrpt   5   0.984
FloatMaxVector.ACOS      1024  thrpt   5   2.310
FloatMaxVector.ASIN      1024  thrpt   5   2.887
FloatMaxVector.ATAN      1024  thrpt   5   3.076
FloatMaxVector.ATAN2     1024  thrpt   5   4.162
FloatMaxVector.CBRT      1024  thrpt   5   2.941
FloatMaxVector.COS       1024  thrpt   5   1.832
FloatMaxVector.COSH      1024  thrpt   5   2.681
FloatMaxVector.EXP       1024  thrpt   5  15.758
FloatMaxVector.EXPM1     1024  thrpt   5   3.061
FloatMaxVector.HYPOT     1024  thrpt   5   3.428
FloatMaxVector.LOG       1024  thrpt   5  12.364
FloatMaxVector.LOG10     1024  thrpt   5  11.267
FloatMaxVector.LOG1P     1024  thrpt   5   5.819
FloatMaxVector.POW       1024  thrpt   5   6.710
FloatMaxVector.SIN       1024  thrpt   5   1.906
FloatMaxVector.SINH      1024  thrpt   5   2.505
FloatMaxVector.TAN       1024  thrpt   5   4.975
FloatMaxVector.TANH      1024  thrpt   5   1.157
Float64Vector.ACOS       1024  thrpt   5   1.855
Float64Vector.ASIN       1024  thrpt   5   2.294
Float64Vector.ATAN       1024  thrpt   5   2.082
Float64Vector.ATAN2      1024  thrpt   5   2.849
Float64Vector.CBRT       1024  thrpt   5   1.781
Float64Vector.COS        1024  thrpt   5   1.224
Float64Vector.COSH       1024  thrpt   5   1.793
Float64Vector.EXP        1024  thrpt   5   9.000
Float64Vector.EXPM1      1024  thrpt   5   2.096
Float64Vector.HYPOT      1024  thrpt   5   2.589
Float64Vector.LOG        1024  thrpt   5   5.582
Float64Vector.LOG10      1024  thrpt   5   5.495
Float64Vector.LOG1P      1024  thrpt   5   3.594
Float64Vector.POW        1024  thrpt   5   3.254
Float64Vector.SIN        1024  thrpt   5   1.254
Float64Vector.SINH       1024  thrpt   5   1.719
Float64Vector.TAN        1024  thrpt   5   2.670
Float64Vector.TANH       1024  thrpt   5   1.020

SVE 512-bit vector size:

Benchmark               (size)  Mode  Cnt   Gain
DoubleMaxVector.ACOS     1024  thrpt   5   1.731
DoubleMaxVector.ASIN     1024  thrpt   5   2.046
DoubleMaxVector.ATAN     1024  thrpt   5   4.932
DoubleMaxVector.ATAN2    1024  thrpt   5   6.032
DoubleMaxVector.CBRT     1024  thrpt   5   6.883
DoubleMaxVector.COS      1024  thrpt   5   5.512
DoubleMaxVector.COSH     1024  thrpt   5   2.796
DoubleMaxVector.EXP      1024  thrpt   5  42.490
DoubleMaxVector.EXPM1    1024  thrpt   5   6.188
DoubleMaxVector.HYPOT    1024  thrpt   5   2.195
DoubleMaxVector.LOG      1024  thrpt   5  19.532
DoubleMaxVector.LOG10    1024  thrpt   5  19.229
DoubleMaxVector.LOG1P    1024  thrpt   5  10.477
DoubleMaxVector.POW      1024  thrpt   5  11.887
DoubleMaxVector.SIN      1024  thrpt   5   6.073
DoubleMaxVector.SINH     1024  thrpt   5   2.994
DoubleMaxVector.TAN      1024  thrpt   5  15.417
FloatMaxVector.ACOS      1024  thrpt   5   3.867
FloatMaxVector.ASIN      1024  thrpt   5   4.291
FloatMaxVector.ATAN      1024  thrpt   5  11.786
FloatMaxVector.ATAN2     1024  thrpt   5  14.734
FloatMaxVector.CBRT      1024  thrpt   5  11.622
FloatMaxVector.COS       1024  thrpt   5   6.477
FloatMaxVector.COSH      1024  thrpt   5   3.571
FloatMaxVector.EXP       1024  thrpt   5  53.020
FloatMaxVector.EXPM1     1024  thrpt   5   6.348
FloatMaxVector.HYPOT     1024  thrpt   5   4.722
FloatMaxVector.LOG       1024  thrpt   5  41.263
FloatMaxVector.LOG10     1024  thrpt   5  47.685
FloatMaxVector.LOG1P     1024  thrpt   5  22.481
FloatMaxVector.POW       1024  thrpt   5  24.896
FloatMaxVector.SIN       1024  thrpt   5   6.768
FloatMaxVector.SINH      1024  thrpt   5   3.429

[1] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/FloatMaxVector.java#L1068
[2] https://github.com/openjdk/panama-vector/blob/vectorIntrinsics/test/micro/org/openjdk/bench/jdk/incubator/vector/operation/DoubleMaxVector.java#L1068

@bridgekeeper
Copy link

bridgekeeper bot commented Oct 18, 2023

👋 Welcome back xgong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk openjdk bot added the rfr Pull request is ready for review label Oct 18, 2023
@openjdk
Copy link

openjdk bot commented Oct 18, 2023

@XiaohongGong The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@mlbridge
Copy link

mlbridge bot commented Oct 18, 2023

src/hotspot/cpu/aarch64/stubGenerator_aarch64.cpp Outdated Show resolved Hide resolved
src/hotspot/cpu/aarch64/globals_aarch64.hpp Outdated Show resolved Hide resolved
@openjdk
Copy link

openjdk bot commented Oct 18, 2023

⚠️ @XiaohongGong The .jcheck/conf in the target branch of this pull request is invalid. Until that is resolved, this pull request cannot be processed. Please notify the repository owner.

Copy link
Contributor

@theRealAph theRealAph left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good. As far as I can tell the choice you've made of accuracy matches what we need to meet the spec.
I'm very nervous about binding ourselves to a specific version of the SLEEF ABI, because Java releases are maintained for decades, and we don't want to be dependent on other projects.

We'll have to make a plan for version evolution.

@XiaohongGong
Copy link
Author

This looks good. As far as I can tell the choice you've made of accuracy matches what we need to meet the spec. I'm very nervous about binding ourselves to a specific version of the SLEEF ABI, because Java releases are maintained for decades, and we don't want to be dependent on other projects.

We'll have to make a plan for version evolution.

Thanks! I agree that this is a somewhat short term close-the-gap solution, and bundling a good library makes more sense. Also, as John mentioned, implementing those math APIs using Vector API itself, sounds very desirable in the future.

@XiaohongGong
Copy link
Author

Hi @theRealAph ,

The latest commit created a native library as a bridge to the third-party sleef library. Could you please help check whether it's a better solution to fix the hard-coding sleef ABI version issue and the further evolution?

It re-defines all the vector math functions and implements them by calling the relative functions in libsleef. The library is bundled into the jdk image. With this way, we doesn't need to hard-code the libsleef ABI version into jdk. And the potential issue caused by the future ABI updating may be catched earlier. Meanwhile, the original added VM option (i.e. -XX:UseSleefLib) is not needed anymore. So we removed it then.

Thanks,
Xiaohong Gong

@theRealAph
Copy link
Contributor

Hi,

The latest commit created a native library as a bridge to the third-party sleef library. Could you please help check whether it's a better solution to fix the hard-coding sleef ABI version issue and the further evolution?

That looks rather nice.

It re-defines all the vector math functions and implements them by calling the relative functions in libsleef. The library is bundled into the jdk image. With this way, we doesn't need to hard-code the libsleef ABI version into jdk. And the potential issue caused by the future ABI updating may be catched earlier.

That sounds good. We're still rather vulnerable if sleef changes its ABI, but I don't suppose that will happen, and we can deal with it if it does.

@XiaohongGong
Copy link
Author

/label add build

@openjdk
Copy link

openjdk bot commented Nov 16, 2023

@XiaohongGong
The build label was successfully added.

@dholmes-ora
Copy link
Member

/label add panama-dev

@openjdk
Copy link

openjdk bot commented Nov 16, 2023

@dholmes-ora
The label panama is not a valid label.
These labels are valid:

  • graal
  • serviceability
  • hotspot
  • hotspot-compiler
  • ide-support
  • kulla
  • i18n
  • shenandoah
  • jdk
  • javadoc
  • security
  • hotspot-runtime
  • jmx
  • build
  • nio
  • client
  • core-libs
  • compiler
  • net
  • hotspot-gc
  • hotspot-jfr

@dholmes-ora
Copy link
Member

/label add core-libs

Really this should go to panama-dev but core-libs will have to do.

@openjdk
Copy link

openjdk bot commented Nov 16, 2023

@dholmes-ora
The core-libs label was successfully added.

@dholmes-ora
Copy link
Member

So to be clear, now you have to opt-in to using libsleef by building a binary that will use it. That binary will always use libsleef if found, so there is no way to opt-out at runtime. Is that the way it works on x86_64 too?

@XiaohongGong
Copy link
Author

Thanks for adding the labels @dholmes-ora !

So to be clear, now you have to opt-in to using libsleef by building a binary that will use it. That binary will always use libsleef if found, so there is no way to opt-out at runtime. Is that the way it works on x86_64 too?

Yes, libsleef is used to build the binary if found. And at runtime, if the libsleef with right version is not found, dlopen to the libvmath.so will fail. And then all the operations will be fall-back to the java default implementation. X86_64 has also bundled the Intel's SVML binary to jdk image at build time. And we use the same way loading/opening the library at runtime.

@magicus
Copy link
Member

magicus commented Dec 5, 2023

So you need to check both the flag and the header file? Oh well, then this is probably as good as it gets.

@XiaohongGong
Copy link
Author

So you need to check both the flag and the header file? Oh well, then this is probably as good as it gets.

Yes, we have to check both the flag and the header file.

@magicus
Copy link
Member

magicus commented Dec 6, 2023

All the makefile changes we've discussed previously now look good. However, I just noticed the additional -f flag. Why are you not exporting the functions from source code instead? That is the way we normally do it in JDK libraries. In your case, it seems like you only need to add the export to the macro.

Copy link
Member

@magicus magicus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build changes finally look good. Great, actually! Thanks for persisting, despite the many rounds of review.

You will still need the 2 hotspot reviews for the hotspot part of the patch.

/reviewers 3

@openjdk
Copy link

openjdk bot commented Dec 7, 2023

@XiaohongGong This change is no longer ready for integration - check the PR body for details.

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Dec 7, 2023
@openjdk
Copy link

openjdk bot commented Dec 7, 2023

@magicus
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 3 (with at least 1 Reviewer, 2 Authors).

@openjdk openjdk bot removed the ready Pull request is ready to be integrated label Dec 7, 2023
@XiaohongGong
Copy link
Author

Build changes finally look good. Great, actually! Thanks for persisting, despite the many rounds of review.

You will still need the 2 hotspot reviews for the hotspot part of the patch.

/reviewers 3

Thanks for the review and all the comments!

@luhenry
Copy link
Member

luhenry commented Dec 11, 2023

I can't say anything for sure, but I picked up some positive vibes from our internal chat. I think the idea was that libsleef could potentially cover up vector math for all platforms that the current Intel lib solution is missing (basically, everything but linux+windows x64). So I this can be seen as a bit of a trial balloon if it is worth a more complete integration of libsleef in the JDK.

I can add that we are interested to use that for Linux + RISC-V support given the RISC-V support was recently merged into sleef upstream. shibatch/sleef#477

@bridgekeeper
Copy link

bridgekeeper bot commented Jan 8, 2024

@XiaohongGong This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@bridgekeeper
Copy link

bridgekeeper bot commented Feb 5, 2024

@XiaohongGong This pull request has been inactive for more than 8 weeks and will now be automatically closed. If you would like to continue working on this pull request in the future, feel free to reopen it! This can be done using the /open pull request command.

@bridgekeeper bridgekeeper bot closed this Feb 5, 2024
@magicus
Copy link
Member

magicus commented Feb 6, 2024

@XiaohongGong What happened? Did you not intend to continue working to get this integrated?

@XiaohongGong
Copy link
Author

XiaohongGong commented Feb 6, 2024 via email

@magicus
Copy link
Member

magicus commented Feb 6, 2024

Ooookay. I hope someone picks it up. I've spent quite a lot of time trying to get this PR into shape.

@magicus
Copy link
Member

magicus commented Feb 6, 2024

@luhenry Maybe you are interested in helping with bringing this forward? I can assist with risc-v related fixes in the makefiles. I'd just hate to see all this work go to waste.

@nick-arm
Copy link
Contributor

nick-arm commented Feb 6, 2024

We are planning to pick this up again later in the year, thanks for your help so far!

@luhenry
Copy link
Member

luhenry commented Feb 6, 2024

I'd just hate to see all this work go to waste.

Same here!

@Hamlin-Li
Copy link

I could also take a look at it by the end of this month if no one are going to do it.

@Hamlin-Li
Copy link

Just a quick update, I've rebased the code, and will continue the work soon.

I've looked through the previous discussion, seems there is no blocking issues, if I misunderstood or miss some information, please feel free to let me know, also if you have any further comment. Thanks!

@magicus
Copy link
Member

magicus commented Mar 1, 2024

Iirc, your assessment is right; the code should be ready for integration; I just don't know about the state of testing.

@Hamlin-Li
Copy link

Hi @magicus @theRealAph @dholmes-ora,
I've created the new pr to continue the work at: #18294.
Please take another look.
Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

8 participants