Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve errors handling while downloading p2 artifacts from repository #4276

Merged
merged 1 commit into from
Sep 15, 2024

Conversation

ptziegler
Copy link
Contributor

In case an artifact couldn't be downloaded due to e.g. a bad mirror, Equinox returns an RETRY error code. This code is currently ignored and Tycho simply fails with an I/O exception.
Instead, Tycho attempts to download the artifact up to three times (assuming that this error code was returned), before failing.

@ptziegler
Copy link
Contributor Author

ptziegler commented Sep 14, 2024

Follow-up to the discussion in eclipse-windowbuilder/windowbuilder#882.

We still get build errors due to a bad mirror, which look something like:

Error:  Failed to execute goal org.eclipse.tycho:tycho-baseline-plugin:4.0.9-SNAPSHOT:verify (compare-version-with-baseline) on project org.eclipse.wb.runtime: Unknown error: Download failed: Status ERROR: org.eclipse.equinox.p2.artifact.repository code=13 Retry another mirror children=[Status ERROR: org.eclipse.tycho.p2maven.transport.TychoRepositoryTransport code=0 download from https://mirrors.jevincanders.net/eclipse/releases/2024-09/202409111000/plugins/org.eclipse.wb.runtime_1.9.3.202402012120.jar failed java.io.FileNotFoundException: https://mirrors.jevincanders.net/eclipse/releases/2024-09/202409111000/plugins/org.eclipse.wb.runtime_1.9.3.202402012120.jar] -> [Help 1]
Error:  Failed to execute goal org.eclipse.tycho:tycho-baseline-plugin:4.0.9-SNAPSHOT:verify (compare-version-with-baseline) on project org.eclipse.wb.core.databinding.xsd: Unknown error: Download failed: Status ERROR: org.eclipse.equinox.p2.artifact.repository code=13 Retry another mirror children=[Status ERROR: org.eclipse.tycho.p2maven.transport.TychoRepositoryTransport code=0 download from https://mirrors.jevincanders.net/eclipse/releases/2024-09/202409111000/plugins/org.eclipse.wb.core.databinding.xsd_1.0.100.202405231800.jar failed java.io.FileNotFoundException: https://mirrors.jevincanders.net/eclipse/releases/2024-09/202409111000/plugins/org.eclipse.wb.core.databinding.xsd_1.0.100.202405231800.jar] -> [Help 1]

I sadly don't have a good way to test this issue locally, as the mirror I usually get is up-to-date.

@@ -47,6 +47,8 @@
@Component(role = P2RepositoryManager.class)
public class P2RepositoryManager {

private static final int RETRY_ATTEMPTS = 3;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not overly happy with this magic number. Perhaps there's a better way to define the number of retries? Perhaps it should even be configurable?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can define a property and then use MavenPropertyHelper (don't forget to document the property here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@laeubi laeubi added the backport-to-tycho-4.0.x Can be added to a PR to trigger an automatic backport of the change label Sep 14, 2024
@laeubi
Copy link
Member

laeubi commented Sep 14, 2024

I sadly don't have a good way to test this issue locally, as the mirror I usually get is up-to-date.

Yes this is very hard (and annoying) to test...

Basically if one wants to go that way look at org.eclipse.tycho.test.target.OfflineModeTest where we use HttpServer util class to simulate some things, in this case it would need:

  1. A P2 site with a mirror list
  2. this mirror list should include two mirrors
  3. then download from that p2 repo
  4. Make the mirrors behave that the (global) first request fails and the second suceeds

That way P2 should see one failing and one succeeding request and in the end the download should work + maybe seeing a message in the logs.

@ptziegler ptziegler marked this pull request as draft September 14, 2024 07:41
Copy link

github-actions bot commented Sep 14, 2024

Test Results

  603 files  +3    603 suites  +3   4h 3m 12s ⏱️ - 24m 34s
  430 tests +1    423 ✅ +1   7 💤 ±0  0 ❌ ±0 
1 290 runs  +3  1 268 ✅ +3  22 💤 ±0  0 ❌ ±0 

Results for commit b615532. ± Comparison against base commit c7bd924.

♻️ This comment has been updated with latest results.

@ptziegler ptziegler marked this pull request as ready for review September 14, 2024 12:22
@ptziegler
Copy link
Contributor Author

ptziegler commented Sep 14, 2024

I sadly don't have a good way to test this issue locally, as the mirror I usually get is up-to-date.

Yes this is very hard (and annoying) to test...

Basically if one wants to go that way look at org.eclipse.tycho.test.target.OfflineModeTest where we use HttpServer util class to simulate some things, in this case it would need:

1. A P2 site with a mirror list

2. this mirror list should include two mirrors

3. then download from that p2 repo

4. Make the mirrors behave that the (global) first request fails and the second suceeds

That way P2 should see one failing and one succeeding request and in the end the download should work + maybe seeing a message in the logs.

I fear 4. is not really possible. I've created a test which -in principle- works, except that I can't reliably pick the mirror over the source repository.

The problem is this method here, which effectively picks a random mirror:

private MirrorInfo selectMirror(IProgressMonitor monitor) {
	initMirrors(monitor);
	final int mirrorCount;
	if (mirrors == null || (mirrorCount = mirrors.length) == 0)
		return null;


	MirrorInfo selected;
	if (mirrorCount == 1)
		selected = mirrors[0];
	else {
		Arrays.sort(mirrors, getComparator());
		for (;;) {
			//this is a function that randomly selects a mirror based on a logarithmic
			//distribution. Mirror 0 has a 1/2 chance of being selected, mirror 1 has a 1/4 chance,
			// mirror 2 has a 1/8 chance, etc. This introduces some variation in the mirror
			//selection, while still heavily favoring better mirrors
			//the algorithm computes the most significant digit in a binary number by computing the base 2 logarithm
			//if the first digit is most significant, mirror 0 is selected, if the second is most significant, mirror 1 is selected, etc
			int highestMirror = min(15, mirrorCount);
			int result = (int) (Math.log(random.nextInt(1 << highestMirror) + 1) / LOG2);
			if (result >= highestMirror || result < 0)
				result = highestMirror - 1;


			int mirrorIndex = highestMirror - 1 - result;
			selected = mirrors[mirrorIndex];


			// Only choose a mirror from the best 50% of the top 15 of all mirrors
			if (mirrorIndex <= (mirrorCount * 0.5d))
				// This is good enough
				break;
		}
	}


	//for now, don't tolerate mirrors with multiple failures
	if (selected.failureCount > 1)
		return null;
	return selected;
}

But if you run the test often enough, you eventually see the following warning, which is exactly what I wanted to add:

[INFO] Downloading from p2: http://localhost:39215/baseline/p2.index
[INFO] Adding repository http://localhost:39215/baseline
[INFO] Downloading from p2: http://localhost:39215/baseline/p2.index
[INFO] Adding repository http://localhost:39215/baseline
[INFO] Downloading from p2: http://localhost:39215/mirror/plugins/bundle1_1.0.0.jar
[WARNING] Artifact repository requested retry (attempt [1/3]): 'Status ERROR: org.eclipse.equinox.p2.artifact.repository code=13 Retry another mirror children=[Status ERROR: org.eclipse.osgi code=0 download from http://localhost:39215/mirror/plugins/bundle1_1.0.0.jar failed java.io.FileNotFoundException: http://localhost:39215/mirror/plugins/bundle1_1.0.0.jar]'
[INFO] Downloading from p2: http://localhost:39215/baseline/plugins/bundle1_1.0.0.jar
[INFO] Downloaded from p2: http://localhost:39215/baseline/plugins/bundle1_1.0.0.jar (417 bytes at 407 KB/s)
[INFO] No baseline problems found.

@laeubi
Copy link
Member

laeubi commented Sep 14, 2024

I fear 4. is not really possible. I've created a test which -in principle- works, except that I can't reliably pick the mirror over the source repository.

You can add a new method similar to org.eclipse.tycho.test.util.HttpServer.addServer(String, File) but allow to specify a servlet class. Then you can implement whatever behaviour you want (e.g. using a static global boolean flag).

Another option would be to allow registering a Filter, as filters can intercept the calls to servlets and e.g. return 404 for the first request.

In case an artifact couldn't be downloaded due to e.g. a bad mirror,
p2 returns an RETRY error code. This code is currently ignored and
Tycho simply fails with an I/O exception.
Instead, Tycho attempts to download the artifact up to three times
(assuming that this error code was returned), before failing. This value
can be configured using the 'eclipse.p2.maxDownloadAttempts' system
property.
@ptziegler
Copy link
Contributor Author

You can add a new method similar to org.eclipse.tycho.test.util.HttpServer.addServer(String, File) but allow to specify a servlet class. Then you can implement whatever behaviour you want (e.g. using a static global boolean flag).

Another option would be to allow registering a Filter, as filters can intercept the calls to servlets and e.g. return 404 for the first request.

In principle,yes. Though Tycho does a lot of caching, which quickly led to situations where only the initial request was sent. The only way to reliably get around this problem was by using -U, to always get the data from the remote repository.

Copy link
Member

@laeubi laeubi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks very good and thanks for the testcase great work!

@laeubi laeubi merged commit 8462a37 into eclipse-tycho:main Sep 15, 2024
10 of 11 checks passed
@eclipse-tycho-bot
Copy link

💚 All backports created successfully

Status Branch Result
tycho-4.0.x

Note: Successful backport PRs will be merged automatically after passing CI.

Questions ?

Please refer to the Backport tool documentation and see the Github Action logs for details

@ptziegler ptziegler deleted the retry-handling branch September 16, 2024 19:21
@ptziegler
Copy link
Contributor Author

I've just tried out the latest nightly build and this fix already prevented the first crash! 😄

2024-09-16T19:00:15.2506685Z [WARNING] Artifact repository requested retry (attempt [1/3]): 'Status ERROR: org.eclipse.equinox.p2.artifact.repository code=13 Retry another mirror children=[Status ERROR: org.eclipse.tycho.p2maven.transport.TychoRepositoryTransport code=0 download from https://mirrors.jevincanders.net/eclipse/releases/2024-09/202409111000/plugins/org.eclipse.wb.swing.jsr296_1.9.700.202407131015.jar failed java.io.FileNotFoundException: https://mirrors.jevincanders.net/eclipse/releases/2024-09/202409111000/plugins/org.eclipse.wb.swing.jsr296_1.9.700.202407131015.jar]'

@laeubi Thanks for pointing me in the right direction, especially with the unit test!

@merks
Copy link
Contributor

merks commented Sep 16, 2024

Thanks both of you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backport-to-tycho-4.0.x Can be added to a PR to trigger an automatic backport of the change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants