Try to restart failed download on installation #292
To me the problem isn't caused by a failure to download the artifact but by a failure to find the artifact metadata for that artifact key. One fundamental concern here is that these update sites are not necessarily "stable".
Especially the first one can change its contents on the fly at any point in time, and I'm not sure how "atomic" those updates to the server actually are. Also, the server will often serve up cached content for a while; at least that was my past experience. So one might see a newer content.jar but an older artifacts.jar (or vice versa), causing exactly this type of problem. I'm not sure what p2 can really retry here... It's not as if anything failed in the transport layers. It's just an inconsistency, either at the time of the requests or as served up by the server for a short period of time.
The collectResults job runs hours after we've created a new SDK build. I assume publishing to https://download.eclipse.org/eclipse/updates/4.29-I-builds/ does not take hours, so at the time the collectResults job is executed everything should be "stable".
How can a "stale cache" be a problem here? What could change for artifacts that are published only once? Or do you mean that some (older) artifacts are deleted while we install? @sravanlakkimsetti, @akurtakov: how do we "maintain" https://download.eclipse.org/eclipse/updates/4.29-I-builds/? Do we have some script or job that deletes old artifacts after uploading new ones?
I also see similar issues when an update site is being updated while a build is running, as explained by @merks.
Still, there is a small chance that... Regarding caching: last time, the Eclipse servers did not respond very well with regard to caching, but that is "intentional". There is also caching in p2, and if Tycho is used there is yet another caching layer...
Cleaning is done by the https://ci.eclipse.org/releng/job/Cleanup/job/dailyCleanOldBuilds/ job. I have never dug deeper into this topic, so that's all the help I can provide here.
The contents are overwritten when you run the collectResults job. Regarding the old builds, we have a cleanup script that deletes old builds, leaving only Monday's build in https://download.eclipse.org/eclipse/downloads/. In the case of https://download.eclipse.org/eclipse/updates/4.29-I-builds/, the composite contains the last two successful builds. Each build adds a new build to the composite, and cleanup is done by https://github.com/eclipse-platform/eclipse.platform.releng.aggregator/blob/master/cje-production/cleaners/cleanupNightlyRepo.sh
You mean local contents, not I-Build repo?
This runs at 4 am/pm and shouldn't overlap with the time the collectResults job ran and failed. So if I see it right, the I-build repo is not "touched" during collectResults job execution, and the two other repos are "too old" to be updated by anyone in parallel. So the instability must be coming from the download.eclipse.org server. With that, we are back to the question of whether we can do something in p2 land to handle the instability of the metadata/artifact download server during installation.
See eclipse-platform/eclipse.platform.releng.aggregator#1075, especially this comment, which indicates that our automated installation often fails because of the instability of the download.eclipse.org server.
If we fail to download some artifact during installation, it would be nice to retry the operation a few times. This would help us get stable SDK test publishing, without me having to check and re-trigger the collectResults job every morning.
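To illustrate the retry idea only, here is a minimal Java sketch of a generic retry-with-backoff wrapper. The class `RetrySketch`, its `retry` method, the attempt count and the delays are all hypothetical and are not p2 API; where such a wrapper would actually hook into p2's artifact download or metadata lookup path is not shown. As noted earlier in the thread, nothing necessarily fails in the transport layer, so a retry like this would only help if the content.jar/artifacts.jar mismatch is transient (for example, a stale server-side cache that clears after a short time).

```java
import java.util.concurrent.Callable;

/**
 * Hypothetical helper, NOT part of p2: retries an operation a few times
 * with exponential backoff before giving up with the last exception.
 */
public final class RetrySketch {

    private RetrySketch() {}

    public static <T> T retry(Callable<T> operation, int maxAttempts, long initialDelayMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delay = initialDelayMillis;
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.call();
            } catch (Exception e) {
                last = e;
                if (attempt == maxAttempts) {
                    break; // out of attempts, rethrow below
                }
                // Pause before retrying; doubling the delay gives a stale
                // cache or a half-finished server update time to settle.
                Thread.sleep(delay);
                delay *= 2;
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated download that fails twice, then succeeds.
        String result = retry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new java.io.IOException("transient failure " + calls[0]);
            }
            return "installed";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Running `main` prints `installed after 3 attempts`. Backing off exponentially rather than retrying immediately avoids hammering an already unstable server while still bounding the total wait.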
@merks : is this the area of p2 you may be familiar with?