-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF #18605
8312425: [vectorapi] AArch64: Optimize vector math operations with SLEEF #18605
Conversation
👋 Welcome back mli! A progress list of the required criteria for merging this PR into |
❗ This change is not yet ready to be integrated. |
@Hamlin-Li The following labels will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing lists. If you would like to change these labels, use the /label pull request command. |
Given the Sleef build system currently uses cmake, we would have two choices to build the header files as part of the OpenJDK build system:
Neither approach sound good to me as a mandatory option. However, if we are to allow the person building OpenJDK to optionally generate the headers from a Sleef source checkout (provided by the user with a |
If we want the traceability (which I agree is good) of the SLEEF source code but want to avoid having it in the jdk repo itself (adding unnecessary "bloat" for everybody), perhaps we can consider having it in a separate repository somewhere in/under It's not immediately clear to me that we need to have support in the JDK build system (configure/make) itself for building/updating the header files, as long as there's a simple, documented way of doing so. I like to think the |
Currently,
With these 2 points, seems the traceability is fine to me, please kindly point out if I missed some points. Maybe we can add some more clear and specific information in README or createSleef.sh in #19185 to indicate which version of sleef source we're using in jdk. I'm also fine with your suggestion to add whole sleef repo into jdk (maybe we can remove some of files, but we can ignore the difference temporarily in the dicussion here). To copy the sleef repo into jdk, we still need to pre-generate the inline header files, and check them in jdk along with the sleef repo, I think you also think so too (As without checking in these inline headers, we will have to bring some extra dependencies into jdk, and increase extra compilation time when building jdk). But from traceability point of view, seems to me it does not bring extra benefit than current #19185. For example, if someone want to verify the pre-generate inline headers in jdk, he need to first verify the sleef source in jdk, then the pre-generated sleef inline headers. How do you think about it? |
I think the key question is whether we're comfortable relying on/pointing at an external repository which may or may not be there tomorrow and/or where tags may change outside of our control. The SLEEF source code looks to be around 7.5MB, give or take. That's not enormous, but it's not exactly small when keeping in mind that if we Hence my suggestion to consider putting it under our control, but in a separate |
I don't think that anyone is proposing to do that, so we can discount it altogether.
Mmm, but we don't need to do that. |
Right. We should adopt best practice, both from an Open Source compliance point of view and (from a security, traceability, and binary reproduceability point of view) with regard to the xz backdoor hack.
That ticks many of the boxes, as long as we can be sure to tag everything. But from a space point of view I'm not sure it's compelling. After all, we've recently decided to use branches rather than separate repos for releases, which is a good idea because it keeps everything together, but it does increase the repo size for everyone. It would be very nice if Git allowed a subset of the repo to be checked out, but as far as I can see it doesn't. Before checkout, the OpenJDK repo is 1.4G. After checkout that's 2.1G. So, about 0.7G of that is the JDK source code, if you include the file system overhead. 7.5Mb doesn't sound excessive when you consider that SLEEF potentially provides vectorized routines for many OpenJDK targets. It's not just about AArch64. This is starting to sound like we need a policy decision, because we don't want to re-hash this discussion every time the question comes up, as it surely will. For me, that supplying preprocessed source code without real source is known bad practice, even to the extent of being expressly forbidden in the open source definition, is a slam-dunk argument. But clearly that argument doesn't work for everyone. Maybe something to be discussed at the workshop? |
Right, but think about package builders. This isn't about J Random Hacker doing it by hand. When a package gets built, the builder machine unpacks source code. If SLEEF is included as part of JDK source, all the builder has to do is run the script and overwrite whatever preprocessed source is in there. The alternative is packaging the SLEEF source code tarball separately in the OpenJDK source package. Sure, all of this can be done, but it's a question of whether we do it once, here, now, or all the downstream builders have to do it themselves.
That script must be checked in to the OpenJDK tree.
Yes.
You don't need to verify the pre-generated inline headers, just overwrite them. The point is that the sleef source is digitally signed, not just by the SLEEF maintainers, but by OpenJDK as well. This is not a small thing. |
@theRealAph Thanks for clarification. I think there are several different parts involved in the above discussion, please kindly correct me if I misunderstood.
For the package builders, original sleef source is necessary; for the jdk daily development, only pre-generated sleef inline headers are necessary. The script to pre-generate sleef inline headers is only triggerred by package builders (and I think it involves some scripts which are not part of jdk source ? e.g. the script to trigger pre-generating script), but for jdk daily development, we just need pre-generated sleef inline headers. |
may be
Yes, most of the time. Some devs will want to be more thorough.
No: all of the scripts to generate the preprocessed source from the SLEEF source must in the OpenJDK source.
Yes, most of the time. Bear in mind that convenient daily development of OpenJDK is important, because we don't want to discourage developers. But we've never treated the size of the repo as one of our primary considerations. |
Obviously we need to include pre-generated sources in the repo so that most people can just build the library using sanctioned code without needing to regenerate anything. I absolutely agree with @theRealAph that we need to have all relevant SLEEF header build scripts in the OpenJDK repo so that anyone who want to rebuild the headers can do so. I don't believe it is just packagers who will want to do that and it is good open source practice to allow and, where possible, make it easy for anyone to do so. Given the size of the original SLEEF sources I also agree with @theRealAph it is no great burden to include them in the jdk repo. However, I am not averse to @vidmik's alternative of putting the sources in an openjdk/sleef repo. That would be fine so long as the openjdk repo includes SLEEF build scripts that pull a determinate hash to generate the headers. Likewise I agree with @vidmik's suggestion of omitting the extra packages the SLEEF generate step requires from the standard configure/make scripts would be fine so long as the SLEEF build scripts prompt users on what to install. We don't want to force everyone to install packages that they don't need. But we do still need to make it straightforward for those who do want to regenerate the sources to achieve that goal. |
+1 to this if we don't already have one While I haven't read through every comment in this thread in this specific case I generally agree with what @theRealAph has said in some of his earlier comments. My primary concern is that the generated code in there is currently effectively unreviewable in terms of checking for potential vulnerabilities so I also feel it's best to check in the whole (reviewable) source if this PR is to be accepted. Much as I dislike repository bloat I think it's a fairly easy decision in this case IMHO with SLEEF being 7.5MB in size when the openjdk codebase is so large. An alternative "absolute minimum" would be to reference the GitHub SHA of the SLEEF source and include the process for regenerating it reproducibly so that this information is available to anyone who wanted to verify it. With my distributor (Temurin) hat on either of those solutions would mean we have the original source referenced for inclusion in the product SBOM to track the supply chain. I'll also note that I'm also making an assumption here that the generated code from SLEEF is reproducible and not sensitive to the build environment like the CDS archives - I have not tried building them myself to verify but I feel that is important to understand before merging the generated code. As a project should also consider whole issue of ensuring that we have sufficient trust from a supply-chain perspective on the SLEEF source ... I have no specific reason to distrust it but it might be good to understand how well reviewed it is before doing this as it's not a project I'm personally familiar with. On a slightly separate note (and I see @luhenry is in this comment thread too and has contributed to SLEEF) it will be good if this can be used to enhance the performance on RISC-V too in the future ;-) |
@theRealAph I see, I think now I understand the whole picture of your concerns. Thanks!
Based on @vidmik 's previous comments, I think we all agree original sleef source should be added into jdk, including pre-generated sleef inline headers, the only different opinions between us are about how to include sleef source into jdk, one is to just add it into jdk repo itself, another is to put it in another repo which is under control of jdk. Please kindly correct me if I misunderstood. I have not particular preference which options to take. My only concern is how long it will take to make that decision. If it could take rather long time, can we take several incremental steps to achieve the final goal? e.g.
I think we have plenty time to achieve the final goal in jdk-24. How do you think about it? @theRealAph @vidmik @luhenry @magicus @erikj79 |
We already had a prototype which depends on this pr, and the performance gain is promising. |
may be
Yes, most of the time. Some devs will want to be more thorough.
No: all of the scripts to generate the preprocessed source from the SLEEF source must in the OpenJDK source.
Yes, most of the time. Bear in mind that convenient daily development of OpenJDK is important, because we don't want to discourage developers. But we've never treated the size of the repo as one of our primary considerations.
We're only a couple of weeks away from the summit. What would be a long time? |
OK, then let's wait for it. |
It is possible to regenerate
I was able to extract the shell and C preprocessing steps from the upstream CMake-based build system (by adding This branch shows various approaches; ideas include:
Whenever the OpenJDK SLEEF source code copies were updated, one would also check for changes in the upstream CMake steps. |
Really nice work, Thanks!
Compared to current implementation in #19185, my bit concern about This branch is the future maintainence effort when we need to update the sleef source along with the cmake changes, also when new platforms support of sleef are added in jdk. In another hand, I'm not sure if This branch qualify the traceability requirement discussed above. |
That's a fair point. However, it's probably less work than any adequate alternative proposed thus far.
I'm sure it's fine: we have readable source code in the preferred form, along with a script that generates it from the corresponding SLEEF release. |
To check this, I added the I had intended to factor out After that, there was one build step divergence for The two |
Thanks for your effort, this is much better. Just one question in my mind. If there is no major refactoring in sleef in the future, I think we're fine. In case there is such refactoring in sleef's implementation, the maintanance will not be a minor work, as in This branch we need to migrate some process inside sleef into jdk? And I think we can move the discussion about This branch to #19185, as finally this part of code will be pushed into jdk via that pr (because of legal process reason), I hope persons involved in that pr do not miss the discussion and information here. |
@Hamlin-Li This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration! |
depending on #19185 which is still in progress..., so change this as a draft |
Now JDK-8329816 is finally integrated, which means the libsleef source code is present and available in the JDK repo. I would recommend that you close this PR, and start over once again (third time's a charm!), with just the changes needed to get the libsleef conection into Hotspot. This PR, just like the other ones, have been so cluttered that it is hard to understand what and how to review. |
@magicus Thank you for the effort! |
Hi,
Can you help to review the patch?
This pr is based on previous work and discussion in pr 16234, pr 18294.
Compared with previous prs, the major change in this pr is to integrate the source of sleef (for the steps, please check
src/jdk.incubator.vector/linux/native/libvectormath/README
), rather than depends on external sleef things (header or lib) at build or run time.Besides of this change, also modify the previous changes accordingly, e.g. remove some uncessary files or changes especially in make dir of jdk.
Besides of the code changes, one important task is to handle the legal process.
Thanks!
Test
tests:
options:
Performance
Options
Float
data
Double
data
Backup of previous test summary
NOTE:
Src
means implementation in this pr, i.e. without depenency on external sleef.Disabled
means disable intrinsics by-XX:-UseVectorStubs
system_sleef
means implementation in previous pr 18294, i.e. build and run jdk with depenency on external sleef.Basically, the perf data below shows that
Progress
Issue
Contributors
<[email protected]>
Reviewing
Using
git
Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/18605/head:pull/18605
$ git checkout pull/18605
Update a local copy of the PR:
$ git checkout pull/18605
$ git pull https://git.openjdk.org/jdk.git pull/18605/head
Using Skara CLI tools
Checkout this PR locally:
$ git pr checkout 18605
View PR using the GUI difftool:
$ git pr show -t 18605
Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/18605.diff
Webrev
Link to Webrev Comment