Multiple SWT/UI test failures on Windows since I20240923-0040 or I20240918-0950 #1486
Comments
From the recent results => https://download.eclipse.org/eclipse/downloads/drops4/I20240924-1810/testResults.php We don't see any of these failures. |
Because we don't see Windows test results yet |
in
The same occurred in https://download.eclipse.org/eclipse/downloads/drops4/I20240926-0020/testresults/consolelogs/ep434I-unit-win32-java17_win32.win32.x86_64_17_consolelog.txt Locally, NoFreezeWhileWaitingForRuleTest runs fine for me. |
I've reopened https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4410. |
In SWT & JFace tests I see following error reported multiple times:
See
Could it be that this is our (SDK) issue and not a Windows machine problem? |
Is there a full stack trace available? |
No |
That split package was recently moved from performance to eclipse.test @akurtakov |
There was only one o.e.test.AwtScreenshot (in o.e.test bundle) and it gets only additions after the split package removal eclipse-platform/eclipse.platform.releng.aggregator@6e6a136#diff-4234e8d1a72f1bd53d546bc92ce417bac8b765082c1728d2a9356a74ef507c7e . I fail to see a relation for now. |
org.eclipse.test.performance had bundle shape jar, while org.eclipse.test has |
for "Eclipse-BundleShape: dir" in MANIFEST.MF fix "Error: Could not find or load main class org.eclipse.test.AwtScreenshot" eclipse-platform/eclipse.platform.swt#1486
was unrelated to the many failures; now it's gone:
The screenshots of the failed tests are now available for download; however, they do not seem to be helpful (just a black screen). |
Black screen is a symptom that someone has logged in to the host via RDP and then disconnected normally. Ask to reboot the host and not to touch RDP. I don't have time to look for a better source, so here is relevant instructional article for a random product. https://docs.testarchitect.com/user-guide/support/frequently-asked-questions/disconnecting-from-remote-desktop-while-executing-automated-tests/ Note, that locked out GUI sessions skip (do not emit) a few types of system events like Paint, so tests done on "black screen" are invalid. |
That would mean that there was a causing change in the period of time in which the failures first occurred (that is, after 17th September). Some test failures already occur in SWT tests (i.e., at the beginning of the dependency chain). In SWT itself, there was only a single change in the period of time in which tests started to fail, and it seems unlikely that this change has caused the problems. I am not sure whether changes to the other projects/bundles may also affect the SWT tests in I-Builds, since even the SWT tests are run in an Equinox environment against all the SDK dependencies, aren't they?
While those errors appeared together with the discussed test failures, and thus there is probably some relation, they do not seem to be an indicator of a root cause. Those errors appear in the logs after several of the failing tests (in particular the SWT tests) have already been executed. The SWT tests run without any of those "no more handles" errors. With respect to a potential cause in the infrastructure, no further actions seem to have been taken according to the helpdesk issue: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4410, e.g., in response to the comments of Vasili (https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4410#note_2832799) or Jörg (https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4410#note_2829466). In order to isolate the root cause (in particular SDK vs. infrastructure): would it be possible (with low effort) to temporarily (for testing purposes) move the Windows job to a different node? E.g., there is still the Windows 10 node (https://ci.eclipse.org/releng/computer/rs68g%2Dwin10/). |
Hurra, we don't have Windows test failures anymore... because tests don't run anymore on Windows... |
Oh dear. That is one way to address the problem. |
I raised the problem of the failing UI tests with Frederic Gurr in person. They promised that the infrastructure team will help to identify the root cause if we can configure the failing job to fail much faster, for example by executing only the failing tests. However, I only know how to change the test suite, not how to disable the suites of the other bundles. Ideas needed. Is there a parameter to test only SWT or Platform UI during the job?
Jörg Kubitz
… On 21.10.2024 at 21:07, Andrey Loskutov ***@***.***> wrote:
Hurra, we don't have Windows test failures anymore... because tests don't run anymore on Windows...
see eclipse-platform/eclipse.platform.releng.aggregator#2468
|
I suggest creating a dedicated test that fails only if no PAINT events are happening, and running it in a loop. Verify that it fails on Windows when the console running the test is locked. checkswtpaintevent.zip contains the sources for a simple Eclipse application that opens and repeatedly resizes a window to receive paint events from the OS. It exits with a non-zero exit code when paint events are not received as expected. So far I was unable to reproduce the missing events using a Windows 11 VM on aarch64 with a locked screen, or on a Windows 11 VM on x86_64 by disconnecting RDP. Prebuilt: |
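The pass/fail bookkeeping of such a check can be sketched independently of SWT. This is only an illustration, not the code from checkswtpaintevent.zip: the class `PaintWatchdog`, its method names and the threshold parameter are all hypothetical, and the paint events in `main` are simulated rather than delivered by the OS.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: counts resize requests and the paint events that follow them. */
final class PaintWatchdog {
    private final AtomicInteger resizes = new AtomicInteger();
    private final AtomicInteger paints = new AtomicInteger();

    /** Call right after each programmatic resize of the test window. */
    void onResize() {
        resizes.incrementAndGet();
    }

    /** Call from the paint callback, e.g. an SWT listener registered for SWT.Paint. */
    void onPaint() {
        paints.incrementAndGet();
    }

    /**
     * Returns 0 when, on average, at least minPaintsPerResize paint events
     * arrived per resize, and 1 otherwise (also when no resize was issued).
     */
    int exitCode(double minPaintsPerResize) {
        int r = resizes.get();
        if (r == 0) {
            return 1;
        }
        return paints.get() >= r * minPaintsPerResize ? 0 : 1;
    }

    public static void main(String[] args) {
        PaintWatchdog healthy = new PaintWatchdog();
        PaintWatchdog locked = new PaintWatchdog();
        for (int i = 0; i < 10; i++) {
            healthy.onResize();
            healthy.onPaint();  // simulated: the OS delivered a paint event
            locked.onResize();  // simulated: no paint event arrived
        }
        System.out.println("healthy exit code: " + healthy.exitCode(1.0));
        System.out.println("locked exit code: " + locked.exitCode(1.0));
    }
}
```

In a real check the two callbacks would be wired to the window toolkit, and the result of `exitCode` would be passed to `System.exit` so that a Jenkins job running the check in a loop fails as soon as paint events stop arriving.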
Apart from the recent, consistent 38 failures on Windows, there is a new test failure for the test case
My understanding is that this should pass if rerun. Am I correct? |
One can replay the latest I-build test job run for Windows, i.e. https://ci.eclipse.org/releng/job/AutomatedTests/job/ep434I-unit-win32-java17/ and set the If one does that, ideally the publication of the test results is skipped so as not to remove all other results from the overview page of the previous build (i.e. delete the call of Releng/ep-collectResults). I did all this for https://ci.eclipse.org/releng/job/AutomatedTests/job/ep434I-unit-win32-java17/84/, let's see how long this runs. |
Michael Keppler @Bananeweizen would you mind helping? https://ci.eclipse.org/releng/job/AutomatedTests/job/ep434I-unit-win32-java17/84/ took 2 hours. |
As pointed out by @jukzi, the problem we observe on Windows is most likely related to https://bugs.openjdk.org/browse/JDK-8336862. So we should downgrade JDK on Windows to
Since we are somewhat stuck, it may be worth a try. I will create a gitlab
@iloveeclipse JDK-8336862 only takes place if the Jenkins agent is run as a system service and there is no active console present (read: auto login is not configured and/or RDP is used). Surely, this can't be the case on test machines? Because all kinds of problems will appear even on older Java if the test host is misconfigured. SWT tests have to run in an environment equivalent to that of a usual desktop user. No system services, no strange disconnected displays. Otherwise test results are not representative. |
I don't have the slightest idea how the Windows test machines are configured or how Jenkins is started, sorry. Better discuss that with our IT guys on the GitLab ticket. |
I created https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/5213 |
I left instructions in https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4410#note_2832799 (2 weeks ago). I'm open to chat. But if those instructions are already applied, I have no idea what's going on either. And the last time I configured a Jenkins agent personally was on Windows 7 (I've been blessed with nice IT support since), so this configuration would require a lot of searching. |
Please double-check it. https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/5214 |
Bug https://bugs.openjdk.org/browse/JDK-8336862 could be the reason for (all/some?) recent Windows test failures. Let's try the previous version... See eclipse-platform/eclipse.platform.swt#1486
I'm not sure if anyone involved in this lengthy thread dealt with this yet:
That's an error thrown by Windows if a process has acquired 10,000 resource handles. This can happen easily when creating icons, mouse cursors etc. and not freeing their related OS resource handles. When dealing with such an error locally, I typically run https://the-sz.com/products/bear/ in parallel to the test execution and check which kind of resource handle is growing and what the lost handles look like. That's often sufficient to identify some piece of code for debugging (e.g. in our company product we had hundreds of unreleased "close" icons when I last did this). I have never had to use this with an automated pipeline, so I'm not sure what would be a good setup for doing this in parallel to the automated Eclipse Windows tests. |
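The "check which kind of resource handle is growing" step can be sketched as a comparison of periodic per-kind handle counts. This is a hypothetical illustration only: the class `HandleGrowth`, the kind names and the sample numbers are invented, and how the counts would be collected in an automated pipeline is left open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical helper: flags handle kinds whose count only ever grows across samples. */
final class HandleGrowth {

    /**
     * Returns the kinds whose count never decreases from one sample to the next
     * and grows by at least minGrowth between the first and the last sample.
     */
    static List<String> suspects(List<Map<String, Integer>> samples, int minGrowth) {
        List<String> result = new ArrayList<>();
        if (samples.size() < 2) {
            return result;
        }
        TreeSet<String> kinds = new TreeSet<>();
        for (Map<String, Integer> sample : samples) {
            kinds.addAll(sample.keySet());
        }
        for (String kind : kinds) {
            int first = samples.get(0).getOrDefault(kind, 0);
            int previous = first;
            boolean neverShrinks = true;
            for (Map<String, Integer> sample : samples.subList(1, samples.size())) {
                int current = sample.getOrDefault(kind, 0);
                if (current < previous) {
                    neverShrinks = false;
                    break;
                }
                previous = current;
            }
            if (neverShrinks && previous - first >= minGrowth) {
                result.add(kind);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Invented sample values, for illustration only.
        List<Map<String, Integer>> samples = List.of(
                Map.of("Icon", 120, "Cursor", 8, "Font", 40),
                Map.of("Icon", 480, "Cursor", 8, "Font", 35),
                Map.of("Icon", 910, "Cursor", 9, "Font", 42));
        System.out.println(suspects(samples, 100)); // prints [Icon]
    }
}
```

A kind that keeps growing without ever shrinking is only a hint at a leak; the allocation site still has to be found by debugging the code that creates that kind of resource.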
see eclipse-platform/eclipse.platform.ui#2379 |
This is not visible in SWT tests (which started to fail at the same time too). In UI tests where this error appears, looking at the log files, it seems to be a side effect of previous UI/test errors. See
|
For completeness, same message is confusingly reported for most failures to allocate system resources, including invalid arguments and invalid system state. So while leaks are the most frequent reason, they are not the only one. |
Nope. Still the same failures in SWT tests :-( |
Tests fail, but we now have screenshots (which is what you changed); they are all blank black, though. |
:-) |
@eclipse-platform/eclipsefdn-releng : does anyone know what happened to the Windows test machine / tests? Since https://download.eclipse.org/eclipse/downloads/drops4/I20241028-1330/ we have only 8 jdt.ui failures and no other UI related Windows test failures anymore: |
No, see my comment here: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/5214#note_2847073 |
Any objections to closing this, since the tests are running fine again? |
Thanks to everyone involved, it's a big relief. |
Indeed. Thank you very much! |
All's well that ends well :) |
And here we go again. |
See https://download.eclipse.org/eclipse/downloads/drops4/I20240923-1800/testresults/html/org.eclipse.swt.tests_ep434I-unit-win32-java17_win32.win32.x86_64_17.html
Last known good state: I20240917-1800.
After that, no Windows tests were executed.
They started to run again with I20240923-0040, showing 38 test failures in SWT and many failures in JFace & Platform UI tests.
Restarting Windows test machine didn't help so far: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/issues/4410
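The window between the last known good build and the first failing one can be used to shortlist candidate changes. A minimal sketch, assuming that I-build identifiers follow the pattern `IyyyyMMdd-HHmm` and treating the timestamp as plain local time without a time zone; the class `BuildWindow` and the change timestamps in `main` are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: checks whether a change falls between two I-builds. */
final class BuildWindow {
    // Assumed identifier layout: literal 'I', date, dash, hour and minute.
    private static final DateTimeFormatter BUILD_ID =
            DateTimeFormatter.ofPattern("'I'yyyyMMdd-HHmm");

    static LocalDateTime parse(String buildId) {
        return LocalDateTime.parse(buildId, BUILD_ID);
    }

    /** True if 'when' is after the last good build and not after the first bad build. */
    static boolean inSuspectWindow(LocalDateTime when, String lastGood, String firstBad) {
        return when.isAfter(parse(lastGood)) && !when.isAfter(parse(firstBad));
    }

    public static void main(String[] args) {
        String lastGood = "I20240917-1800"; // last known good state
        String firstBad = "I20240923-0040"; // first build with Windows failures
        // Hypothetical change timestamps, for illustration only.
        LocalDateTime changeA = LocalDateTime.of(2024, 9, 16, 10, 0);
        LocalDateTime changeB = LocalDateTime.of(2024, 9, 20, 14, 30);
        System.out.println("changeA suspect: " + inSuspectWindow(changeA, lastGood, firstBad));
        System.out.println("changeB suspect: " + inSuspectWindow(changeB, lastGood, firstBad));
    }
}
```

Because no Windows tests ran between the two builds, every change in that window stays a candidate, and infrastructure changes made in the same period would not show up in such a list at all.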