
[hailctl] update to dataproc 2.2 and Spark 3.5.0 #14158

Merged
merged 5 commits into main from dataproc-2.2 on Apr 11, 2024

Conversation

danking
Contributor

@danking danking commented Jan 12, 2024

Fixes #13971

CHANGELOG: Hail now supports and primarily tests against Dataproc 2.2.5, Spark 3.5.0, and Java 11. We strongly recommend updating to Spark 3.5.0 and Java 11. You should also update your GCS connector after installing Hail: curl https://broad.io/install-gcs-connector | python3. Do not try to update before installing Hail 0.2.131.

https://cloud.google.com/dataproc/docs/concepts/versioning/dataproc-release-2.2

@danking danking added the WIP label Jan 12, 2024
@danking danking changed the title from "Dataproc 2.2" to "[hailctl] update to dataproc 2.2" Jan 12, 2024

implicitly[BinaryRegistry[
  DenseMatrix[Double],
  Vector[Double],
  OpMulMatrix.type,
  DenseVector[Double],
]].register(
-  DenseMatrix.implOpMulMatrix_DMD_DVD_eq_DVD
+  HasOps.impl_OpMulMatrix_DMD_DVD_eq_DVD
)

@danking danking removed the WIP label Jan 12, 2024
@zyd14

zyd14 commented Jan 19, 2024

My team is pretty excited about Hail being released with support for Spark 3.5. One thing I noticed is that the plan looks like it restricts to Spark 3.5.0 exactly - would it be possible to allow some wiggle room for patch releases? Spark has begun releasing upgrades much more often than in the past, so pinning to 3.5.0 would prevent access to bug fixes, feature enhancements, etc.

@danking
Contributor Author

danking commented Feb 1, 2024

@zyd14 that's just a warning message printed during compilation. The requirements.txt file accepts any 3.5.x version.
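In requirements.txt terms that wiggle room is a constraint along the lines of `pyspark>=3.5.0,<3.6`. As an illustration only (a hypothetical helper, not Hail's actual code), the accepted range can be checked like this:

```python
def is_supported_spark(version: str) -> bool:
    # Accept any Spark 3.5.x patch release, i.e. roughly the
    # constraint 'pyspark>=3.5.0,<3.6' in requirements.txt terms.
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) == (3, 5)

print(is_supported_spark("3.5.1"))  # True: patch releases are allowed
print(is_supported_spark("3.4.2"))  # False: older minor versions are not
```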

@danking
Contributor Author

danking commented Feb 2, 2024

I had to pin nbconvert<7.14 due to a new bug in nbconvert.

jupyter/nbconvert#2092

@@ -232,7 +232,7 @@ async def async_map(
 ) -> AsyncGenerator[int, None]:
     """Asyncio compatible version of :meth:`.map`."""
     if not iterables:
-        return (x for x in range(0))
+        return the_empty_async_generator()
Contributor Author

The linter didn't like that I used a plain (synchronous) empty generator where an async generator was expected.
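The helper's definition isn't shown in the diff; a minimal version of `the_empty_async_generator`, assuming it only needs to yield nothing (the `consume` wrapper below is hypothetical, for demonstration), could look like this:

```python
import asyncio
from typing import AsyncGenerator


async def the_empty_async_generator() -> AsyncGenerator[int, None]:
    # An `async def` containing a `yield` anywhere in its body is an
    # async generator function; the unreachable `yield` below is what
    # makes this an (empty) async generator rather than a coroutine.
    return
    yield  # unreachable, but required


async def consume() -> list:
    # `async for` works here; it would fail on `(x for x in range(0))`.
    return [x async for x in the_empty_async_generator()]


print(asyncio.run(consume()))  # -> []
```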

@danking
Contributor Author

danking commented Feb 2, 2024

Since the dataproc tests only run on main commits (not on every PR commit, due to cost), I submitted a dev deploy to test the latest commit to this branch against dataproc: https://ci.hail.is/batches/8119055

hailctl dev deploy -b danking/hail:dataproc-2.2 -s test_dataproc-37 -s test_dataproc-38

@danking danking changed the title from "[hailctl] update to dataproc 2.2" to "[hailctl] update to dataproc 2.2 and Spark 3.5.0" Feb 2, 2024
@danking
Contributor Author

danking commented Feb 5, 2024

Previous dev deploy passed. Rebased on main and submitted a new dev deploy: https://ci.hail.is/batches/8120683

@ehigham
Member

ehigham commented Feb 5, 2024

@patrick-schultz note changes to build.gradle - is this compatible with your work to use mill?

ehigham
ehigham previously approved these changes Feb 5, 2024
@danking danking added the WIP label Feb 5, 2024
case javaVersion(major, _, _) =>
  if (major.toInt != 11)
-    fatal(s"Hail requires Java 8 or 11, found $versionString")
+    warn(s"Hail is tested against Java 11, found $versionString")
Contributor Author

I don't fully grok what we were trying to accomplish here.

You can't possibly reach these lines if the JVM version is lower than the bytecode target, so JVMs <= 7 would never execute this. AFAICT, Spark now supports all LTS versions of Java (currently 8, 11, and 17), so I see no strong reason to prohibit any version. Instead, I simply warn if you are using a JVM version other than the one we test with.
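The relaxed check can be sketched in Python (a hypothetical mirror of the Scala match; `check_java_version` and the regex are illustrative, not Hail's actual code):

```python
import re
from typing import Optional

JAVA_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def check_java_version(version_string: str) -> Optional[str]:
    # Mirrors the Scala logic: only warn (never fail) when the major
    # version differs from the one Hail is tested against.
    m = JAVA_VERSION.match(version_string)
    if m is None:
        return f"Unable to parse Java version: {version_string}"
    if int(m.group(1)) != 11:
        return f"Hail is tested against Java 11, found {version_string}"
    return None


print(check_java_version("11.0.12"))  # None: no warning needed
print(check_java_version("17.0.2"))   # warning string, but not fatal
```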

Contributor Author

In particular:

Spark runs on Java 8/11/17, Scala 2.12/2.13, Python 3.8+, and R 3.5+. Java 8 prior to version 8u371 support is deprecated as of Spark 3.5.0.

https://spark.apache.org/docs/latest/#downloading

@danking danking dismissed ehigham’s stale review February 5, 2024 22:28

I updated batch/Dockerfile.worker and docker/Dockerfile.base. The former controls which JVM is active at QoB execution time and the latter controls which JVM is active at compilation time. Note that in build.gradle we still target JVM 8 byte code, so Hail should work with other versions.

@danking danking removed the WIP label Feb 6, 2024
@danking
Contributor Author

danking commented Feb 6, 2024

@ehigham ready for re-review now with Java 11 changes.

@danking
Contributor Author

danking commented Feb 6, 2024

And here's a dev deploy that runs the dataproc tests. Don't approve until these tests pass! We don't run them on ordinary PRs because they're expensive and slow. We do run them on main commits. For this PR, the chance of having broken dataproc is high enough that we should ensure the tests pass before merging into main.

https://ci.hail.is/batches/8121061

@danking
Contributor Author

danking commented Feb 7, 2024

Delay merging until broadinstitute/install-gcs-connector#6 is merged. Without that PR, users will not have access to a version of the GCS Hadoop connector that does not use tons of memory in JVM 11.

@danking
Contributor Author

danking commented Feb 7, 2024

Rebased and dev deploy kicked off: https://ci.hail.is/batches/8122588

@danking
Contributor Author

danking commented Feb 7, 2024

broadinstitute/install-gcs-connector#6 is merged.

Comment on lines +79 to 86
// WARNING WARNING WARNING
// Before changing the breeze version review:
// - https://hail.zulipchat.com/#narrow/stream/123011-Hail-Query-Dev/topic/new.20spark.20ndarray.20failures/near/41645
// - https://github.com/hail-is/hail/pull/11555
val core = ivy"org.scalanlp::breeze:1.1"
Collaborator

It appears we've actually been using breeze 1.2, because spark-mllib pulls it in:

❯ mill ivyDepsTree --withCompile --withRuntime --whatDependsOn org.scalanlp:breeze_2.12                                                        (base)
[17/17] ivyDepsTree
└─ org.scalanlp:breeze_2.12:1.2
   ├─ org.apache.spark:spark-mllib-local_2.12:3.3.0
   │  ├─ org.apache.spark:spark-graphx_2.12:3.3.0
   │  │  └─ org.apache.spark:spark-mllib_2.12:3.3.0
   │  └─ org.apache.spark:spark-mllib_2.12:3.3.0
   ├─ org.apache.spark:spark-mllib_2.12:3.3.0
   └─ org.scalanlp:breeze-natives_2.12:1.1 org.scalanlp:breeze_2.12:1.1 -> 1.2

Contributor Author

Does the compileIvyDeps line not exclude them? I want to let Spark use their thing and relocate breeze for our own internal purposes.

Collaborator

Ah, I missed that, that does let us use breeze 1.1. And I now realize even without that, because this is a compile only dependency, we would build with breeze 1.2 but run with 1.1.

@danking
Contributor Author

danking commented Feb 22, 2024

@patrick-schultz @ehigham I'm abdicating responsibility for this. I've too much to wrap up in the next five work days. It looks like there's a mill issue currently. Otherwise I think it should be ready to merge.

@danking
Contributor Author

danking commented Feb 23, 2024

OK, I'm not sure how to fix this but the work is to explain to the GCS Hadoop Connector which credentials we want it to use. See the failure here: https://batch.hail.is/batches/8136069/jobs/49 . It uses CI's credentials instead of the test credentials. We use core-site.xml to do this in Spark <3.5, but the GCS connector is different in Spark 3.5 and it uses different configuration parameters. My most recent change did not successfully configure it.

Daniel G can help you a bit with credentials in Batch if that's necessary but the real work is to figure out how to tell the GCS Hadoop Connector to use the /gsa-key/key.json file.
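For reference, pointing the connector at that keyfile programmatically might look like the sketch below. The `fs.gs.auth.*` property names are the ones the connector's 3.x documentation describes for JSON-keyfile auth (the older 2.x name, `google.cloud.auth.service.account.json.keyfile`, appears in the core-site.xml snippet elsewhere in this thread); the helper itself is hypothetical:

```python
def gcs_keyfile_hadoop_conf(keyfile: str = "/gsa-key/key.json") -> dict:
    # Hypothetical helper: build the Hadoop configuration entries that
    # tell the GCS connector 3.x to authenticate with a JSON keyfile.
    return {
        "fs.gs.auth.type": "SERVICE_ACCOUNT_JSON_KEYFILE",
        "fs.gs.auth.service.account.json.keyfile": keyfile,
    }


for name, value in gcs_keyfile_hadoop_conf().items():
    print(f"{name}={value}")
```

In Spark, each entry would be applied with a `spark.hadoop.` prefix on the config key.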

@danking
Contributor Author

danking commented Feb 23, 2024

Matt S fortuitously asked a question that led me to https://github.com/GoogleCloudDataproc/hadoop-connectors/blob/v3.0.0/gcs/INSTALL.md , so I'm trying that now. That might be the last necessary fix.

@danking danking removed the WIP label Feb 23, 2024
@danking
Contributor Author

danking commented Feb 27, 2024

Just the local backend is broken now. I might've just fixed that.

@danking
Contributor Author

danking commented Feb 28, 2024

OK, I won't be able to fix this. @ehigham @patrick-schultz @daniel-goldstein some combo of you three can probably figure it out. The local backend tests that hit requester-pays buckets are failing with the new Spark. New Spark needs a new GCS Hadoop connector (see the Dockerfiles). The new GCS Hadoop connector has brand-new configuration parameters. Somehow I managed to make the normal Spark backend work correctly, but the local backend (which still, afaik, uses Spark & Hadoop for filesystems) is still trying to pick up CI's credentials instead of the test account's credentials.

E           hail.utils.java.FatalError: GoogleJsonResponseException: 403 Forbidden
E           GET https://storage.googleapis.com/storage/v1/b/hail-test-requester-pays-fds32/o/zero-to-nine?fields=bucket,name,timeCreated,updated,generation,metageneration,size,contentType,contentEncoding,md5Hash,crc32c,metadata&userProject=hail-vdc
E           {
E             "code": 403,
E             "errors": [
E               {
E                 "domain": "global",
E                 "message": "[email protected] does not have serviceusage.services.use access to the Google Cloud project. Permission 'serviceusage.services.use' denied on resource (or it may not exist).",
E                 "reason": "forbidden"
E               }
E             ],
E             "message": "[email protected] does not have serviceusage.services.use access to the Google Cloud project. Permission 'serviceusage.services.use' denied on resource (or it may not exist)."
E           }
E           
E           Java stack trace:
E           java.io.IOException: Error accessing gs://hail-test-requester-pays-fds32/zero-to-nine
E           	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getObject(GoogleCloudStorageImpl.java:1986)
E           	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageImpl.getItemInfo(GoogleCloudStorageImpl.java:1882)
E           	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystemImpl.getFileInfoInternal(GoogleCloudStorageFileSystemImpl.java:861)
E           	at com.google.cloud.hadoop.repackaged.gcs.com.google.cloud.hadoop.gcsio.GoogleCloudStorageFileSystemImpl.getFileInfo(GoogleCloudStorageFileSystemImpl.java:833)
E           	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.getFileStatus(GoogleHadoopFileSystem.java:724)
E           	at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:115)
E           	at org.apache.hadoop.fs.Globber.doGlob(Globber.java:349)
E           	at org.apache.hadoop.fs.Globber.glob(Globber.java:202)
E           	at org.apache.hadoop.fs.FileSystem.globStatus(FileSystem.java:2142)
E           	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.globStatus(GoogleHadoopFileSystem.java:759)
E           	at com.google.cloud.hadoop.fs.gcs.GoogleHadoopFileSystem.globStatus(GoogleHadoopFileSystem.java:1277)
E           	at is.hail.io.fs.HadoopFS.glob(HadoopFS.scala:162)
E           	at is.hail.io.fs.HadoopFS.glob(HadoopFS.scala:85)
E           	at is.hail.io.fs.FS.glob(FS.scala:402)
E           	at is.hail.io.fs.FS.glob$(FS.scala:402)
E           	at is.hail.io.fs.HadoopFS.glob(HadoopFS.scala:85)
E           	at is.hail.io.fs.HadoopFS.$anonfun$globAll$1(HadoopFS.scala:154)
E           	at is.hail.io.fs.HadoopFS.$anonfun$globAll$1$adapted(HadoopFS.scala:153)

<property>
  <name>google.cloud.auth.service.account.json.keyfile</name>
  <value>/gsa-key/key.json</value>
</property>

Contributor

These properties look wrong based on this example

Collaborator

Is this still wrong?

Member

I don't know what to say - seems to work in ci?

Member

This is what INSTALL.md recommends

Contributor

shrug guess it's fine then

@ehigham ehigham force-pushed the dataproc-2.2 branch 2 times, most recently from b0b96c5 to b863f72 Compare March 20, 2024 01:57
Dan King and others added 4 commits April 11, 2024 11:22
Collaborator

@patrick-schultz patrick-schultz left a comment

I'm good with this, just one minor request, and a resolution on @daniel-goldstein 's question above.

Also, need to update the hail version in the changelog text

hail/Makefile (resolved)
Collaborator

@patrick-schultz patrick-schultz left a comment

Can you just change the hail version in the changelog text before we merge it?

@ehigham
Member

ehigham commented Apr 11, 2024

> I'm good with this, just one minor request, and a resolution on @daniel-goldstein 's question above.
>
> Also, need to update the hail version in the changelog text

Sorry, missed that bit about the changelog

@hail-ci-robot hail-ci-robot merged commit bd0156d into hail-is:main Apr 11, 2024
2 checks passed

Successfully merging this pull request may close these issues.

[query] Support Spark 3.5.x
6 participants