-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
twister: DeviceHandler DUT selection improvements and fixes #75548
twister: DeviceHandler DUT selection improvements and fixes #75548
Conversation
3d8187f
to
f85f3dc
Compare
dear reviewers, up to your attention |
@@ -473,7 +473,7 @@ def device_is_available(self, instance): | |||
d.counter_increment() | |||
avail = True | |||
logger.debug(f"Retain DUT:{d.platform}, Id:{d.id}, " | |||
f"counter:{d.counter}") | |||
f"counter:{d.counter}, failures:{d.failures}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pre my experiences, failure count is not that matter, instead the error count matters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are necessary, as hardware failure oft shows as test timeouts, which are failed
statuses, not error
s.
At the same time, I'd rank them separately; a separate failures
and errors
counters, which would be used in sorting, with errors given a higher priority (sorted()
's key
accepts tuples)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed change counts both 'error' and 'failed' test instance statuses for each DUT as a failure situation happened there not going deeper into details of what was the origin. It is a single counter to avoid overcomplication, but to stimulate retry attempts on different DUTs choosing less 'suspicious' one as a candidate for the next round. This way it works for any root cause not known at the moment - either at a board, a harness, or a test, or any their combination.
@@ -484,8 +484,10 @@ def device_is_available(self, instance): | |||
return None | |||
|
|||
def make_dut_available(self, dut): | |||
if self.instance.status in ["error", "failed"]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to fix this problem twister needs to consider 'failed' test instances at a DUT:
.. for instance to resolve ploblems when some DUTs have connectivity or HW issues slowing down test plan execution, or even block the execution when only one test suite runs whereas the same first DUT candidate in the list is not working and others were not chosen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd approve, were it not for the hash.
if not dut_found: | ||
raise TwisterException(f"No device to serve as {device} platform.") | ||
|
||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A stray hash.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LukaszMrugala do you have any other objections ?
@@ -473,7 +473,7 @@ def device_is_available(self, instance): | |||
d.counter_increment() | |||
avail = True | |||
logger.debug(f"Retain DUT:{d.platform}, Id:{d.id}, " | |||
f"counter:{d.counter}") | |||
f"counter:{d.counter}, failures:{d.failures}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are necessary, as hardware failure oft shows as test timeouts, which are failed
statuses, not error
s.
At the same time, I'd rank them separately; a separate failures
and errors
counters, which would be used in sorting, with errors given a higher priority (sorted()
's key
accepts tuples)
f85f3dc
to
89f455e
Compare
89f455e
to
a10967a
Compare
rebased |
dear reviewers, up to your attention |
a10967a
to
890829b
Compare
just rebased again |
890829b
to
a518bd0
Compare
and rebased adjusting to #71401 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It works on CI. LGTM
Several improvements at Twister DeviceHandler when it releases current DUT (Device Under Test): - release the exact DUT which is used for the test instance instead of all configured DUTs which happened to have the same serial device configured. - Twister PyTtest harness plugin adjustment to the above. - additional debug logging to track DUT waiting/retain/release. Signed-off-by: Dmitrii Golovanov <[email protected]>
Fix Twister DeviceHandler exit on SerialException when it connects to the serial device in 'flash before' mode. Signed-off-by: Dmitrii Golovanov <[email protected]>
Twister DeviceHandler - make DUT use counter increment operation atomic. Signed-off-by: Dmitrii Golovanov <[email protected]>
Twister DeviceHandler - add test failure counter for how many test instances have been failed on each DUT (Device Under Test) when it executes the current test plan. Output DUT falure counter summary at the end of Twister run. Signed-off-by: Dmitrii Golovanov <[email protected]>
Change Twister PyTest plugin's test finalizing sequence to release the DUT it is used as the very last operation, after the Test Instance status becomes fully updated from the execution results. This also fix a race condition possible when pytest plugin releases the DUT and it becomes acquired by another test while the current test is not yet finalized completely. Signed-off-by: Dmitrii Golovanov <[email protected]>
Improve DUT selection at DeviceHandler: for each DUT it counts how many test instances have been failed on it during the current twister execution, so the next available DUT will be chosen ordering the eligible DUTs by less failures occured so far. The new selection mechanism should increase chances to retry failed tests on different DUTs, for instance to resolve ploblems when some DUTs have connectivity or HW issues slowing down test plan execution, or even block the execution when only one test suite runs whereas the same first DUT candidate in the list is not working and others were not chosen. Signed-off-by: Dmitrii Golovanov <[email protected]>
a518bd0
to
e40c7c7
Compare
Several improvements and fixes for how Twister DeviceHandler selects and releases a DUT having multiple candidates for a test instance execution and taking into account failures.
See individual commits for details.