[hailctl] update to dataproc 2.2 and Spark 3.5.0 #14158
Conversation
  implicitly[BinaryRegistry[
    DenseMatrix[Double],
    Vector[Double],
    OpMulMatrix.type,
    DenseVector[Double],
  ]].register(
-   DenseMatrix.implOpMulMatrix_DMD_DVD_eq_DVD
+   HasOps.impl_OpMulMatrix_DMD_DVD_eq_DVD
see #13971 (comment)
My team is pretty excited about Hail being released with support for Spark 3.5. One thing I noticed is that it looks like the plan is to restrict to Spark 3.5.0 exactly - would it be possible to allow some wiggle room for patch releases? Spark has begun releasing upgrades much more often than in the past, so pinning to 3.5.0 will cut off access to bug fixes, feature enhancements, etc.
@zyd14 that's just a warning message printed during compilation. The requirements.txt file accepts any 3.5.x version.
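For the record, a version-range pin along these lines would permit any 3.5.x patch release while excluding 3.6 (the exact requirement string in Hail's requirements.txt may differ; this is an illustrative sketch):

```
pyspark>=3.5.0,<3.6
```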
I had to pin nbconvert<7.14 due to a new bug in nbconvert.
@@ -232,7 +232,7 @@ async def async_map(
 ) -> AsyncGenerator[int, None]:
     """Asyncio compatible version of :meth:`.map`."""
     if not iterables:
-        return (x for x in range(0))
+        return the_empty_async_generator()
the lint didn't like that I used an empty generator where an async generator was expected.
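To illustrate the lint's complaint: a plain generator expression is not an async generator, so returning one from a function annotated `-> AsyncGenerator[...]` is a type error. An async generator that yields nothing can be written with a bare `return` before an unreachable `yield`. This is a hypothetical standalone reconstruction; Hail's actual helper may be defined differently:

```python
import asyncio
from typing import AsyncGenerator, List


async def the_empty_async_generator() -> AsyncGenerator[int, None]:
    # The bare `return` ends iteration immediately; the unreachable
    # `yield` is what makes this an async generator function rather
    # than an ordinary coroutine.
    return
    yield


async def main() -> List[int]:
    # Iterating the empty async generator produces no items.
    return [x async for x in the_empty_async_generator()]


print(asyncio.run(main()))  # []
```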
Since the dataproc tests only run on main commits (not on every PR commit, due to cost), I submitted a dev deploy to test the latest commit to this branch against dataproc: https://ci.hail.is/batches/8119055
Previous dev deploy passed. Rebased on main and submitted a new dev deploy: https://ci.hail.is/batches/8120683
@patrick-schultz note changes to build.gradle - is this compatible with your work to use mill? |
  case javaVersion(major, _, _) =>
    if (major.toInt != 11)
-     fatal(s"Hail requires Java 8 or 11, found $versionString")
+     warn(s"Hail is tested against Java 11, found $versionString")
I don't fully grok what we were trying to accomplish here.
You can't possibly get to these lines if the JVM version is lower than the byte code, so JVMs <=7 would never execute this. AFAICT, Spark now supports all LTS versions of Java (currently: 8, 11, 17), so I see no strong reason to prohibit any version. Instead, I simply warn if you are using a version of the JVM other than the one with which we test.
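A standalone sketch of the relaxed check being described (hypothetical reconstruction: the object name and the simplified version regex are mine, and Hail's real code also handles legacy `1.8.x`-style version strings):

```scala
object JvmCheck {
  // Matches modern JVM version strings like "11.0.20" or "17.0.1".
  private val javaVersion = raw"(\d+)\.(\d+)\.(\d+).*".r

  // Return a warning instead of aborting: Spark supports Java 8/11/17,
  // so a supported-but-untested JVM should only produce a warning.
  def check(versionString: String): Option[String] =
    versionString match {
      case javaVersion(major, _, _) if major.toInt != 11 =>
        Some(s"Hail is tested against Java 11, found $versionString")
      case _ => None
    }
}
```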
In particular:
Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.8+, and R 3.5+. Java 8 prior to version 8u371 support is deprecated as of Spark 3.5.0.
I updated batch/Dockerfile.worker and docker/Dockerfile.base. The former controls which JVM is active at QoB execution time and the latter controls which JVM is active at compilation time. Note that in build.gradle we still target JVM 8 byte code, so Hail should work with other versions.
@ehigham ready for re-review now with Java 11 changes.
And here's a dev deploy that runs the dataproc tests. Don't approve until these tests pass! We don't run them on ordinary PRs because they're expensive and slow. We do run them on main commits. For this PR, the chance of having broken dataproc is high enough that we should ensure the tests pass before merging into main.
Delay merging until broadinstitute/install-gcs-connector#6 is merged. Without that PR, users will not have access to a version of the GCS Hadoop connector that does not use tons of memory in JVM 11.
Rebased and dev deploy kicked off: https://ci.hail.is/batches/8122588
broadinstitute/install-gcs-connector#6 is merged.
// WARNING WARNING WARNING
// Before changing the breeze version review:
//  - https://hail.zulipchat.com/#narrow/stream/123011-Hail-Query-Dev/topic/new.20spark.20ndarray.20failures/near/41645
//  - https://github.com/hail-is/hail/pull/11555
val core = ivy"org.scalanlp::breeze:1.1"
It appears we've actually been using breeze 1.2, because spark-mllib pulls it in:
❯ mill ivyDepsTree --withCompile --withRuntime --whatDependsOn org.scalanlp:breeze_2.12
[17/17] ivyDepsTree
└─ org.scalanlp:breeze_2.12:1.2
├─ org.apache.spark:spark-mllib-local_2.12:3.3.0
│ ├─ org.apache.spark:spark-graphx_2.12:3.3.0
│ │ └─ org.apache.spark:spark-mllib_2.12:3.3.0
│ └─ org.apache.spark:spark-mllib_2.12:3.3.0
├─ org.apache.spark:spark-mllib_2.12:3.3.0
└─ org.scalanlp:breeze-natives_2.12:1.1 org.scalanlp:breeze_2.12:1.1 -> 1.2
Does the compileIvyDeps line not exclude them? I want to let Spark use their thing and relocate breeze for our own internal purposes.
Ah, I missed that; that does let us use breeze 1.1. And I now realize even without that, because this is a compile only dependency, we would build with breeze 1.2 but run with 1.1.
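The arrangement under discussion can be sketched roughly as follows (a hypothetical Mill fragment, not the repository's actual build file): breeze 1.1 is placed on the compile-only classpath, while Spark's transitive breeze is excluded from it, so Spark still resolves its own breeze at runtime:

```scala
// Compile-only: Hail compiles against breeze 1.1.
def compileIvyDeps = Agg(
  ivy"org.scalanlp::breeze:1.1",
)

// Runtime: Spark is kept as-is, but its transitive breeze is excluded
// from the compile classpath so it cannot shadow the 1.1 pin above.
def ivyDeps = Agg(
  ivy"org.apache.spark::spark-mllib:3.5.0"
    .exclude("org.scalanlp" -> "breeze_2.12"),
)
```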
@patrick-schultz @ehigham I'm abdicating responsibility for this. I've too much to wrap up in the next five work days. It looks like there's a mill issue currently. Otherwise I think it should be ready to merge.
OK, I'm not sure how to fix this, but the work is to explain to the GCS Hadoop Connector which credentials we want it to use. See the failure here: https://batch.hail.is/batches/8136069/jobs/49. It uses CI's credentials instead of the test credentials. We use core-site.xml to do this in Spark <3.5, but the GCS connector is different in Spark 3.5 and it uses different configuration parameters. My most recent change did not successfully configure it. Daniel G can help you a bit with credentials in Batch if that's necessary, but the real work is to figure out how to tell the GCS Hadoop Connector to use the /gsa-key/key.json file.
Matt S fortuitously asked a question that led me to https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v3.0.0/gcs/INSTALL.md, so I'm trying that now. That might be the last necessary fix.
Just the local backend is broken now. I might've just fixed that. |
OK, I won't be able to fix this. @ehigham @patrick-schultz @daniel-goldstein some combo of you three can probably figure it out. The local backend tests that hit requester pays buckets are failing with new Spark. New Spark needs new GCS hadoop connector (see the Dockerfiles). New GCS hadoop connector has brand new configuration parameters. Somehow I managed to make the normal Spark backend work correctly but the Local backend (which still, afaik, uses Spark & Hadoop for filesystems) is still trying to pick up CI's credentials instead of the test account's credentials.
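For whoever picks this up: the v3.x connector documents a different auth scheme than the pre-3.0 `google.cloud.auth.*` keys. A core-site.xml fragment in the style of the hadoop-connectors v3.0.0 INSTALL.md would look roughly like this (key names should be verified against the connector version actually deployed):

```xml
<property>
  <name>fs.gs.auth.type</name>
  <value>SERVICE_ACCOUNT_JSON_KEYFILE</value>
</property>
<property>
  <name>fs.gs.auth.service.account.json.keyfile</name>
  <value>/gsa-key/key.json</value>
</property>
```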
<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>/gsa-key/key.json</value>
</property>
These properties look wrong based on this example
Is this still wrong?
I don't know what to say - seems to work in ci?
This is what INSTALL.md recommends.
shrug guess it's fine then
Force-pushed from b0b96c5 to b863f72.
Fixes hail-is#13971

CHANGELOG: Hail now supports and primarily tests against Dataproc 2.2.5, Spark 3.5.0, and Java 11. We strongly recommend updating to Spark 3.5.0 and Java 11. You should also update your GCS connector after installing Hail: curl https://broad.io/install-gcs-connector | python3. Do not try to update before installing Hail 0.2.128.

https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.2
I'm good with this, just one minor request, and a resolution on @daniel-goldstein's question above.
Also, need to update the hail version in the changelog text
Can you just change the hail version in the changelog text before we merge it?
Sorry, missed that bit about the changelog.
Fixes #13971
CHANGELOG: Hail now supports and primarily tests against Dataproc 2.2.5, Spark 3.5.0, and Java 11. We strongly recommend updating to Spark 3.5.0 and Java 11. You should also update your GCS connector after installing Hail:
curl https://broad.io/install-gcs-connector | python3

Do not try to update before installing Hail 0.2.131.

https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.2