
SSL EOF occurred in violation of protocol when syncing dist folder to R2 bucket #3559

Closed
flakey5 opened this issue Nov 5, 2023 · 21 comments

@flakey5
Member

flakey5 commented Nov 5, 2023

Hey all! As part of #3461, the /home/dist/ folder started to be synced into an R2 bucket. The behavior was added in #3505.

It was working fine for nightly builds up until it came time to promote v18.18.1, when it threw a series of SSL errors, reported in #3508 (comment).

The error is EOF occurred in violation of protocol. I'm not sure what exactly the issue is, but I do have some theories:

  1. A bug or other issue with the openssl version installed on the DO server (though this doesn't explain why the nightly builds were syncing successfully)
  2. Different users on the DO server have different CA certs installed. The automated user that runs the uploads of the nightly builds has the correct CA certs installed while other users may not, leading to SSL errors for only certain users.

Unfortunately since the sync script was failing it had to be disabled, so the R2 bucket is becoming more and more stale. This issue also blocks #3461 entirely since we're no longer getting new builds added to the bucket.

Ideally we would be able to figure out what is causing the issue, fix it, and then re-enable the sync script for new releases, in addition to syncing the downloads that were added over the course of syncing being disabled.
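If the failures turn out to be transient, one client-side mitigation would be wrapping each upload in a retry. Below is a minimal sketch in Python (the aws CLI is itself a Python application with its own retry settings; `flaky_upload` here is a stand-in for an upload call, not the real uploader):

```python
import ssl
import time

def retry_on_ssl_eof(func, attempts=3, delay=1.0):
    # Retry func() when it raises ssl.SSLEOFError ("EOF occurred in
    # violation of protocol"), with simple linear backoff between tries.
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except ssl.SSLEOFError:
            if attempt == attempts:
                raise
            time.sleep(delay * attempt)

# Stand-in for an upload call: fails twice with the error seen in the
# logs, then succeeds on the third attempt.
calls = {"n": 0}

def flaky_upload():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ssl.SSLEOFError("EOF occurred in violation of protocol")
    return "uploaded"

result = retry_on_ssl_eof(flaky_upload, attempts=3, delay=0)
print(result)  # uploaded
```

This only helps if the EOFs are intermittent; if every attempt fails (as with the promotion of v18.18.1), retries just delay the failure.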

cc @ovflowd @MoLow

@MoLow
Member

MoLow commented Nov 6, 2023

I'll try to tackle this manually, following a flow similar to the one the releasers would use.

@targos
Member

targos commented Nov 7, 2023

I just tried to re-enable the CF upload and ran the nightly promoter manually. It uploaded a few files without any errors.

@targos
Member

targos commented Nov 7, 2023

I ran the aws s3 sync command manually (for the releases directory) with the dist user and it worked fine too.

@targos
Member

targos commented Nov 7, 2023

Then I ran it again for the entire downloads directory (as root this time) and after some time it failed:

upload: ../../../dist/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/webcrypto.json to s3://dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/webcrypto.json
upload: ../../../dist/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/apilinks.json to s3://dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/apilinks.json
upload: ../../../dist/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/webstreams.html to s3://dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/webstreams.html
upload failed: ../../../dist/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/all.html to s3://dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/all.html SSL validation failed for https://07be8d2fbc940503ca1be344714cb0d1.r2.cloudflarestorage.com/dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/all.html EOF occurred in violation of protocol (_ssl.c:2427)
upload: ../../../dist/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/url.md to s3://dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/url.md
upload: ../../../dist/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/test.html to s3://dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/test.html
upload: ../../../dist/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/tls.json to s3://dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/tls.json
upload: ../../../dist/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/stream.json to s3://dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/docs/api/stream.json
upload failed: ../../../dist/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/node-v21.0.0-v8-canary2023101280f36ed46e-darwin-arm64.tar.xz to s3://dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/node-v21.0.0-v8-canary2023101280f36ed46e-darwin-arm64.tar.xz SSL validation failed for https://07be8d2fbc940503ca1be344714cb0d1.r2.cloudflarestorage.com/dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/node-v21.0.0-v8-canary2023101280f36ed46e-darwin-arm64.tar.xz?uploadId=AIwJYC_cE_IgDxJdC0Ww7Ejd4r6aoFPyrlydoyaxcn-UslhX0_iMuKdJ11_Ltp0rQ7OPBXKdl4BN4arv3NxjWrtr8sTryyD5reXRqsLHZMmZoAYHYYa26wqABBmCAJ73m-FWO_QOkabxla51OqEdLej_cpSDzzH2qjrz2pVOwVbURPvsZy-RFUaHSb_qIJV_oYVFvLeyTN-QR_1lGFLd0mkStQjHNk8eQbCFVwzZOxhlAhmBuuxCPKIGavHRuKonmIQIsTZKCPLJ7pkw3EssFyVrfLn-NBF-RoDwYKVO4rLjzMkfJFjE46HsArluHcA_9Qnzw-Qp1CRuWRlVvri9JU4&partNumber=2 EOF occurred in violation of protocol (_ssl.c:2427)
upload failed: ../../../dist/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/node-v21.0.0-v8-canary2023101280f36ed46e-darwin-x64.tar.gz to s3://dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/node-v21.0.0-v8-canary2023101280f36ed46e-darwin-x64.tar.gz SSL validation failed for https://07be8d2fbc940503ca1be344714cb0d1.r2.cloudflarestorage.com/dist-prod/nodejs/v8-canary/v21.0.0-v8-canary2023101280f36ed46e/node-v21.0.0-v8-canary2023101280f36ed46e-darwin-x64.tar.gz?uploadId=ADa1-VvKGrLbgXAlHQdZW0uBHFV58cg3gI08PkOYp8d4u-nFIG17DHIl1Z42YAMQhHcIpLeoJuZXWT8KgWNYoPO6gNVHOEFYiE4i4oEMlvsWQoEC1EblIVeDr63e9eJI1bIe4JY3tP8Sqbg1bm716pshV25rQAnJV_lB_rqPOVyTcd99hRudy3XQW6iREjb2yFTAFASjekmOw0jhmCWZLwO_OuJNBG12zAQVfJ_ZRRH48K0ULaYfc41UIjXWYSAj2IjLGQdNULwffGYEiwcEOK_apHuAiNldQDjJsub0c4AybLDzHlDniWLYw79SZC4khTLyArCK1YXqCV_GNZ6BMo0&partNumber=2 EOF occurred in violation of protocol (_ssl.c:2427)

@targos
Member

targos commented Nov 7, 2023

I don't know how to verify it, but I think it's probably caused by the old version of OpenSSL that's available on the server.
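One way to narrow this down: since the aws CLI is a Python application, the OpenSSL build that matters is the one Python's ssl module was linked against, which may differ from the system's `openssl` binary. A quick check (a generic sketch, not a command anyone reported running in this thread):

```python
import ssl

# Version string of the OpenSSL library Python's ssl module was
# compiled against, e.g. "OpenSSL 3.0.2 15 Mar 2022" on Ubuntu 22.04.
print(ssl.OPENSSL_VERSION)

# The same information as a comparable 5-tuple, e.g. (3, 0, 0, 2, 0),
# which is easier to compare against a minimum required version.
print(ssl.OPENSSL_VERSION_INFO)
```

Comparing this output between the affected server and a machine where the sync works would confirm or rule out the OpenSSL-version theory.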

@flakey5
Member Author

flakey5 commented Nov 7, 2023

Did the entire sync that was started with the dist user succeed?

@targos
Member

targos commented Nov 7, 2023

Yes, it did.

@flakey5
Member Author

flakey5 commented Nov 7, 2023

Would it be possible to update the openssl version on the server without risking breaking anything?

@richardlau
Member

I suspect not -- usually the openssl version is tightly coupled with the Linux distribution version.

@flakey5
Member Author

flakey5 commented Nov 7, 2023

I suspect not -- usually the openssl version is tightly coupled with the Linux distribution version.

Ah, I see. Just in case, can we try syncing the entire downloads directory as the dist user, to see if that works for some reason?

@targos
Member

targos commented Nov 7, 2023

We can try, but the initial issue (#3508 (comment)) happened with the dist user.

@targos
Member

targos commented Nov 7, 2023

I just started a new sync with dist.

@targos
Member

targos commented Nov 7, 2023

Happens with dist too:

upload: nodejs/v8-canary/v21.0.0-v8-canary202310163027b0e12d/node-v21.0.0-v8-canary202310163027b0e12d-x86.msi to s3://dist-prod/nodejs/v8-canary/v21.0.0-v8-canary202310163027b0e12d/node-v21.0.0-v8-canary202310163027b0e12d-x86.msi
upload failed: nodejs/v8-canary/v21.0.0-v8-canary202310163027b0e12d/node-v21.0.0-v8-canary202310163027b0e12d.tar.xz to s3://dist-prod/nodejs/v8-canary/v21.0.0-v8-canary202310163027b0e12d/node-v21.0.0-v8-canary202310163027b0e12d.tar.xz SSL validation failed for https://07be8d2fbc940503ca1be344714cb0d1.r2.cloudflarestorage.com/dist-prod/nodejs/v8-canary/v21.0.0-v8-canary202310163027b0e12d/node-v21.0.0-v8-canary202310163027b0e12d.tar.xz?uploadId=ANJHPZbyrSJSgzJcpT71HTqrMtVDsnbdXEsyAa1TSwWUZ8Lf2c0bXdxI1a2reX6VrEM4Txu6laGIJnmmH-lQqqjLQzB2I1jdn0PVTJogHy2TSnQ9VRwcGH5JdIW9i6RhJ5Xk-2oYbGAaRy3H8cmhJrr4kCeSBFshcbCHaBChETr-dhS4xkO5BrcqLv1xMmF-vuvimjCdRATTU7AnHzbAdue07lQkJ7W900UP6RgY7cCsvAZFHX04_mNx8hgczt6qQNXa-ZT9ETHLQAO2qkQc2qSw06C0AZwsrXGOZVTxPXWjFlBA1DFzG97ioIawMobDpR4mqFOqhRL0EfwY36R9Y6w&partNumber=1 EOF occurred in violation of protocol (_ssl.c:2427)

@targos
Member

targos commented Nov 7, 2023

Note that there technically is an openssl update available, but we don't have access to it, as it requires Ubuntu Pro.

@flakey5
Member Author

flakey5 commented Nov 7, 2023

If it is due to the openssl version on the server, I wonder if we should explore adding a step in CI for the build servers to sync the new builds now rather than later, especially since these errors seem to happen at random rather than under specific circumstances.

I believe @MoLow was originally going to do that, but I don't remember if there were any problems he encountered or if he decided it was better for now to just do the sync from the server. I'm not talking about removing the upload to the DO server (we should definitely keep uploading there even after we switch to using the worker fully) but just as another step the build machines do during a release CI run.

@richardlau
Member

I believe @MoLow was originally going to do that, but I don't remember if there were any problems he encountered or if he decided it was better for now to just do the sync from the server.

That was most probably me raising an objection as it required installing the s3 client on all of the release machines, but maybe that is the way to go. One of my concerns for releases from #3508 (comment) is that, even if the errors didn't occur, the releaser is basically sitting there waiting for all the release assets to be uploaded from the DO server to R2 (with very little feedback) -- ideally the proposed release assets would already be staged in R2 and the manually triggered promotion step would mark/move them (as well as upload the signed shasums) to dist-prod.

I'm not talking about removing the upload to the DO server (we should definitely keep uploading there even after we switch to using the worker fully)

+1. For now we need to continue to upload to the DO server as the backup scripts pull from the DO server.

@targos
Member

targos commented Nov 8, 2023

Server was upgraded to Ubuntu 22.04 (#3564).
I did a full sync and didn't see any errors. Let's consider this fixed?

@ovflowd
Member

ovflowd commented Nov 8, 2023

We're running a few more runs, just to rule out the fluke scenario.

@targos
Member

targos commented Nov 8, 2023

In case it matters, every time we do a full sync, the SHASUMS256.txt files are all reuploaded:

...
upload: ../home/dist/nodejs/release/v9.4.0/SHASUMS256.txt to s3://dist-prod/nodejs/release/v9.4.0/SHASUMS256.txt
upload: ../home/dist/nodejs/release/v9.6.0/SHASUMS256.txt to s3://dist-prod/nodejs/release/v9.6.0/SHASUMS256.txt
upload: ../home/dist/nodejs/release/v9.5.0/SHASUMS256.txt to s3://dist-prod/nodejs/release/v9.5.0/SHASUMS256.txt
upload: ../home/dist/nodejs/release/v9.6.1/SHASUMS256.txt to s3://dist-prod/nodejs/release/v9.6.1/SHASUMS256.txt
upload: ../home/dist/nodejs/release/v9.7.0/SHASUMS256.txt to s3://dist-prod/nodejs/release/v9.7.0/SHASUMS256.txt
upload: ../home/dist/nodejs/release/v9.7.1/SHASUMS256.txt to s3://dist-prod/nodejs/release/v9.7.1/SHASUMS256.txt
upload: ../home/dist/nodejs/release/v9.8.0/SHASUMS256.txt to s3://dist-prod/nodejs/release/v9.8.0/SHASUMS256.txt
upload: ../home/dist/nodejs/release/v9.9.0/SHASUMS256.txt to s3://dist-prod/nodejs/release/v9.9.0/SHASUMS256.txt
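That behaviour would be consistent with `aws s3 sync`'s default comparison strategy, which re-uploads a file when the sizes differ or the local file's modification time is newer than the remote object's; if the SHASUMS256.txt files get touched after the bucket copy is written, every sync would re-upload them. A rough sketch of that decision rule (`needs_upload` is a hypothetical helper; the real logic lives in awscli's sync strategies):

```python
import os
import tempfile

def needs_upload(local_path, remote_size, remote_mtime):
    # Re-upload when sizes differ or the local file is newer than the
    # remote object -- a simplified model of `aws s3 sync`'s default
    # size + timestamp comparison.
    st = os.stat(local_path)
    return st.st_size != remote_size or st.st_mtime > remote_mtime

# Demonstration: same size, but a local mtime newer than the remote
# one triggers a re-upload, as repeatedly touched SHASUMS256.txt
# files would.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"checksums")
    path = f.name

st = os.stat(path)
remote_newer = needs_upload(path, remote_size=st.st_size,
                            remote_mtime=st.st_mtime + 60)
local_newer = needs_upload(path, remote_size=st.st_size,
                           remote_mtime=st.st_mtime - 60)
print(remote_newer, local_newer)  # False True
os.remove(path)
```

Checking whether the local SHASUMS256.txt mtimes actually change between syncs would confirm this explanation.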

@targos
Member

targos commented Nov 8, 2023

We're running a few more runs, just to rule out the fluke scenario.

Did two more runs. It copied a few V8 canary files. All good.

@MoLow
Member

MoLow commented Nov 8, 2023

Thanks so much for this. Closing.

@MoLow MoLow closed this as completed Nov 8, 2023