Try to restart failed download on installation #292

Open
iloveeclipse opened this issue Jun 20, 2023 · 6 comments

Comments

@iloveeclipse
Member

See eclipse-platform/eclipse.platform.releng.aggregator#1075, especially this comment, which indicates that our automated installation often fails because of instability of the download.eclipse.org server.

It would be nice if, when we fail to download an artifact during installation, the operation were retried a few times. This would give us stable SDK test publishing, without me having to check and re-trigger the collectResults job every morning.

@merks : is this an area of p2 you are familiar with?

@merks
Contributor

merks commented Jun 20, 2023

To me the problem

The artifact file for osgi.bundle,org.eclipse.e4.core.di,1.9.0.v20230429-1914 was not found.

isn't caused by a failure to download the artifact but a failure to find the artifact metadata for that artifact key.

One fundamental concern here is that these update sites are not necessarily "stable"

Especially the first one can be changing its contents on the fly at any point in time and I'm not sure how "atomic" those updates to the server actually are. Also, the server will often serve up cached content for a while; at least that was my past experience. So one might see a newer content.jar but an older artifacts.jar (or vice versa), causing exactly this type of problem.

I'm not sure what p2 can really retry here... It's not as if anything failed in the transport layers. It's just an inconsistency, either at the time of the requests or in what the server serves up for a short period of time.
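
To illustrate the kind of mismatch described above, here is a minimal Python sketch. It is not p2 code, and `ArtifactKey`, `parse_key`, and `missing_keys` are hypothetical names. It compares the artifact keys referenced by the metadata (content.jar) with the keys listed in the artifact index (artifacts.jar); a key that is referenced but not indexed would plausibly lead to an error like the one quoted above.

```python
from typing import Iterable, NamedTuple, Set


class ArtifactKey(NamedTuple):
    """Hypothetical stand-in for a p2 artifact key."""
    classifier: str
    id: str
    version: str


def parse_key(text: str) -> ArtifactKey:
    """Parse 'classifier,id,version' as printed in the error message."""
    classifier, artifact_id, version = text.split(",")
    return ArtifactKey(classifier, artifact_id, version)


def missing_keys(referenced: Iterable[ArtifactKey],
                 indexed: Iterable[ArtifactKey]) -> Set[ArtifactKey]:
    """Return keys the metadata refers to that the artifact index lacks."""
    return set(referenced) - set(indexed)


if __name__ == "__main__":
    # Key taken from the error message in this thread.
    referenced = [parse_key(
        "osgi.bundle,org.eclipse.e4.core.di,1.9.0.v20230429-1914")]
    # Assumption: a stale artifact index that does not list the key yet.
    stale_index = []
    for key in sorted(missing_keys(referenced, stale_index)):
        print("not in artifact index:", ",".join(key))
```

A check like this only detects the inconsistency; whether reloading the repository indexes after some delay would resolve it depends on how long the server keeps serving the stale file.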

@iloveeclipse
Member Author

> Especially the first one can be changing its contents on the fly at any point in time

The collectResults job runs hours after we've created a new SDK build. I assume publishing to https://download.eclipse.org/eclipse/updates/4.29-I-builds/ does not take hours, so by the time the collectResults job is executed everything should be "stable".

> Also, the server will often serve up cached content for a while; at least that was my past experience.

How can a "stale cache" be a problem here? What could change for artifacts that are published once? Or do you mean that some (older) artifacts are deleted while we install?

@sravanlakkimsetti, @akurtakov : how do we "maintain" https://download.eclipse.org/eclipse/updates/4.29-I-builds/ - do we have some script / job that deletes old artifacts after uploading new one?

@laeubi
Member

laeubi commented Jun 20, 2023

I also see similar issues when an update site is being updated while a build is running, as explained by @merks.
That is because it is nearly impossible to update a p2 site atomically "on the fly". What one can do is:

  1. the site itself must be a composite
  2. upload the new content
  3. add it to the composite
  4. after a while (e.g. one day) delete the old content from the composite

Even then, there is a small chance that compositeContent.xml is updated before compositeArtifacts.xml, so that a build which has already fetched one of the files sees stale content in the other, but the time window is very small.

Regarding caching: the Eclipse servers lately do not behave very well with regard to caching, but that is "intentional". There is also caching in p2 itself, and if Tycho is used there is yet another layer of caching...
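
The steps above can be sketched roughly as follows. This is a minimal Python illustration of the bookkeeping only, not an existing tool: `Child`, `add_child`, and `prune_children` are hypothetical names, the child locations are made up, and the one-day grace period is just the example value from step 4.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Child:
    location: str    # relative path of the child repository (hypothetical)
    added_at: float  # seconds since epoch when it joined the composite


def add_child(children: List[Child], location: str, now: float) -> List[Child]:
    """Step 3: add the freshly uploaded child, keeping the existing ones."""
    if any(c.location == location for c in children):
        return list(children)
    return list(children) + [Child(location, now)]


def prune_children(children: List[Child], now: float,
                   grace_seconds: float) -> List[Child]:
    """Step 4: drop older children once the newest one has been part of
    the composite for at least grace_seconds."""
    if not children:
        return []
    ordered = sorted(children, key=lambda c: c.added_at)
    newest = ordered[-1]
    if now - newest.added_at < grace_seconds:
        # Running builds may still reference the older children.
        return ordered
    return [newest]


if __name__ == "__main__":
    day = 24 * 60 * 60.0
    children = add_child([], "build-old", now=0.0)
    children = add_child(children, "build-new", now=7 * day)
    print([c.location for c in prune_children(children, 7 * day + 3600, day)])
    print([c.location for c in prune_children(children, 8 * day + 1, day)])
```

Both compositeContent.xml and compositeArtifacts.xml would be generated from the same child list, which keeps the window in which they disagree as short as the two file writes.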

@akurtakov
Member

Cleaning is done by https://ci.eclipse.org/releng/job/Cleanup/job/dailyCleanOldBuilds/ . I have never dug into the topic more so that's all the help I can provide here.

@sravanlakkimsetti
Member

sravanlakkimsetti commented Jun 20, 2023

> @sravanlakkimsetti, @akurtakov : how do we "maintain" https://download.eclipse.org/eclipse/updates/4.29-I-builds/ - do we have some script / job that deletes old artifacts after uploading new one?

The contents are overwritten when you run the collectResults job.

Regarding the old builds, we have a cleanup script that deletes old builds, leaving Monday's build in https://download.eclipse.org/eclipse/downloads/

In the case of https://download.eclipse.org/eclipse/updates/4.29-I-builds/ we keep the last two successful builds as part of the composite. The build adds the new build, and cleanup is done by https://github.com/eclipse-platform/eclipse.platform.releng.aggregator/blob/master/cje-production/cleaners/cleanupNightlyRepo.sh

@iloveeclipse
Member Author

> The contents are overwritten when you run the collectResults job.

You mean the local contents, not the I-build repo?

> Cleaning is done by https://ci.eclipse.org/releng/job/Cleanup/job/dailyCleanOldBuilds/

This runs at 4 am / pm, so it shouldn't be running in parallel at the time the collectResults job runs (or failed).

So if I see it right, the I-build repo is not "touched" during collectResults job execution, and the two other repos are "too old" to be updated by anyone in parallel. So the instability must be coming from the download.eclipse.org server.

With that, we are back to the question of whether we can do something in p2 land to handle instability of the metadata/artifact download server during installation.
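
For reference, the kind of retry being asked for could look roughly like the sketch below. This is plain Python rather than p2 code; `retry` and all of its parameters are hypothetical, and it assumes the wrapped operation signals a transient failure by raising `OSError`. Whether p2 offers a suitable place to hook in such a wrapper is not established in this thread.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T],
          attempts: int = 3,
          initial_delay: float = 1.0,
          backoff: float = 2.0,
          retry_on: Tuple[Type[BaseException], ...] = (OSError,),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call operation() up to `attempts` times, waiting longer after each
    failure, and re-raise the last error if every attempt fails."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the original failure
            sleep(delay)
            delay *= backoff
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = []

    def flaky_install() -> str:
        # Simulated installation step that fails twice before succeeding.
        calls.append(1)
        if len(calls) < 3:
            raise OSError("simulated download failure")
        return "installed"

    print(retry(flaky_install, attempts=3, initial_delay=0.01))
```

A wrapper like this only helps if each attempt actually re-reads the repository indexes from the server; if stale metadata is cached locally between attempts, retrying would fail the same way every time.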
