twister: DeviceHandler DUT selection improvements and fixes #75548

golowanow · 2024-07-07T11:04:35Z

Several improvements and fixes to how the Twister DeviceHandler selects and releases a DUT when a test instance has multiple candidate devices, taking earlier failures on each DUT into account.
See individual commits for details.

golowanow · 2024-07-17T20:48:29Z

Dear reviewers, this is ready for your attention.

hakehuang · 2024-07-18T03:14:50Z

scripts/pylib/twister/twisterlib/handlers.py

@@ -473,7 +473,7 @@ def device_is_available(self, instance):
                d.counter_increment()
                avail = True
                logger.debug(f"Retain DUT:{d.platform}, Id:{d.id}, "
-                             f"counter:{d.counter}")
+                             f"counter:{d.counter}, failures:{d.failures}")


Per my experience, the failure count doesn't matter that much; the error count is what matters.

Both are necessary, as hardware failures often show up as test timeouts, which are 'failed' statuses, not errors.
At the same time, I'd rank them separately: separate failure and error counters, both used in sorting, with errors given higher priority (the key argument of sorted() accepts tuples).

The proposed change counts both 'error' and 'failed' test instance statuses for each DUT as a failure that happened there, without going deeper into what its origin was. It is a single counter, to avoid overcomplication while still encouraging retry attempts on different DUTs by choosing the less 'suspicious' one as the candidate for the next round. This way it works for any root cause not known at the moment: the board, the harness, the test, or any combination of them.
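
To make the two positions in this thread concrete, here is a minimal sketch comparing a single combined counter with a tuple sort key. The `Dut` class and its separate `errors` field are hypothetical and exist only for this comparison; the change under review keeps one `failures` counter.

```python
from dataclasses import dataclass


@dataclass
class Dut:
    # Hypothetical stand-in for a DUT record; not Twister's actual class.
    id: str
    failures: int = 0  # instances that ended as 'failed' (e.g. timeouts)
    errors: int = 0    # instances that ended as 'error'


duts = [
    Dut("a", failures=3, errors=0),
    Dut("b", failures=0, errors=1),
    Dut("c", failures=1, errors=0),
]

# Single combined counter: one 'suspicion' score per DUT.
by_combined = sorted(duts, key=lambda d: d.failures + d.errors)

# Tuple key: errors dominate, failures only break ties.
by_tuple = sorted(duts, key=lambda d: (d.errors, d.failures))

print([d.id for d in by_combined])
print([d.id for d in by_tuple])
```

With the combined score, "b" and "c" tie and keep their list order; with the tuple key, "b" drops to the end because its one error outranks any number of plain failures.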

hakehuang · 2024-07-18T03:15:44Z

scripts/pylib/twister/twisterlib/handlers.py

@@ -484,8 +484,10 @@ def device_is_available(self, instance):
        return None

    def make_dut_available(self, dut):
+        if self.instance.status in ["error", "failed"]:


same as above

To fix this problem Twister needs to consider 'failed' test instances on a DUT:

.. for instance to resolve problems when some DUTs have connectivity or HW issues that slow down test plan execution, or even block it when only one test suite runs and the same first DUT candidate in the list is not working while the others are never chosen.
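
A minimal sketch of the counting step discussed in this thread, assuming a simplified DUT record. The names `Dut`, `release_dut` and the `available` flag are illustrative, not the actual handlers.py code; only the status check mirrors the diff hunk above.

```python
class Dut:
    """Hypothetical minimal DUT record."""

    def __init__(self, dut_id):
        self.id = dut_id
        self.available = False
        self.failures = 0


def release_dut(dut, instance_status):
    # Count 'error' and 'failed' alike: the root cause (board, harness,
    # test) is not known at this point, only that something went wrong.
    if instance_status in ("error", "failed"):
        dut.failures += 1
    dut.available = True


dut = Dut("board-0")
for status in ("passed", "failed", "error"):
    release_dut(dut, status)
print(dut.failures)
```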

LukaszMrugala

I'd approve, were it not for the hash.

LukaszMrugala · 2024-07-25T08:42:15Z

scripts/pylib/twister/twisterlib/handlers.py

-        if not dut_found:
-            raise TwisterException(f"No device to serve as {device} platform.")
-
+        #


A stray hash.

@LukaszMrugala do you have any other objections?

LukaszMrugala · 2024-07-25T09:27:51Z

scripts/pylib/twister/twisterlib/handlers.py

@@ -473,7 +473,7 @@ def device_is_available(self, instance):
                d.counter_increment()
                avail = True
                logger.debug(f"Retain DUT:{d.platform}, Id:{d.id}, "
-                             f"counter:{d.counter}")
+                             f"counter:{d.counter}, failures:{d.failures}")


Both are necessary, as hardware failures often show up as test timeouts, which are 'failed' statuses, not errors.
At the same time, I'd rank them separately: separate failure and error counters, both used in sorting, with errors given higher priority (the key argument of sorted() accepts tuples).

golowanow · 2024-08-01T17:55:41Z

rebased

golowanow · 2024-08-06T08:14:36Z

Dear reviewers, this is ready for your attention.

golowanow · 2024-08-12T06:36:12Z

just rebased again

golowanow · 2024-08-13T07:12:50Z

and rebased adjusting to #71401

katgiadla

It works on CI. LGTM

Several improvements to Twister DeviceHandler when it releases the current DUT (Device Under Test):
- release the exact DUT used for the test instance instead of all configured DUTs that happen to have the same serial device configured.
- adjust the Twister pytest harness plugin to the above.
- additional debug logging to track DUT waiting/retain/release.
Signed-off-by: Dmitrii Golovanov <[email protected]>
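
The difference between the two release strategies named in this commit message can be sketched as follows. The `Dut` class, both function names and the serial path are hypothetical; this is an illustration of the idea, not the code from the commit.

```python
from dataclasses import dataclass


@dataclass
class Dut:
    # Hypothetical stand-in for a hardware map entry.
    id: str
    serial: str
    available: bool = False


def release_by_serial(duts, serial):
    # Behaviour the commit message describes as the problem:
    # every DUT configured with this serial device is freed.
    for d in duts:
        if d.serial == serial:
            d.available = True


def release_exact(dut):
    # Behaviour after the change: free only the DUT that ran the instance.
    dut.available = True


a = Dut("board-a", "/dev/ttyACM0")
b = Dut("board-b", "/dev/ttyACM0")  # same serial device configured
release_exact(a)
print(a.available, b.available)
```

Here "board-b" stays reserved after `release_exact(a)`, whereas `release_by_serial` would have freed both entries.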

Fix Twister DeviceHandler exit on SerialException when it connects to the serial device in 'flash before' mode. Signed-off-by: Dmitrii Golovanov <[email protected]>

Twister DeviceHandler - make DUT use counter increment operation atomic. Signed-off-by: Dmitrii Golovanov <[email protected]>
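
One way to make such an increment atomic is to guard the read-modify-write with the lock that `multiprocessing.Value` already carries. This is a sketch under that assumption; the class layout and the `hammer` helper are illustrative and may differ from the commit.

```python
import threading
from multiprocessing import Value


class Dut:
    """Hypothetical DUT record with a lock-protected use counter."""

    def __init__(self):
        self._counter = Value("i", 0)  # shared int with a built-in lock

    def counter_increment(self):
        # '+=' on .value is a read-modify-write, so hold the lock around it.
        with self._counter.get_lock():
            self._counter.value += 1

    @property
    def counter(self):
        with self._counter.get_lock():
            return self._counter.value


def hammer(dut, n):
    for _ in range(n):
        dut.counter_increment()


dut = Dut()
threads = [threading.Thread(target=hammer, args=(dut, 1000)) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(dut.counter)
```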

Twister DeviceHandler - add a test failure counter for how many test instances have failed on each DUT (Device Under Test) while it executes the current test plan. Output a DUT failure counter summary at the end of the Twister run. Signed-off-by: Dmitrii Golovanov <[email protected]>

Change the Twister pytest plugin's test finalizing sequence to release the DUT it used as the very last operation, after the Test Instance status has been fully updated from the execution results. This also fixes a possible race condition where the pytest plugin releases the DUT and it is acquired by another test while the current test is not yet completely finalized. Signed-off-by: Dmitrii Golovanov <[email protected]>
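
The ordering described here (finalize the status first, release the DUT last) can be sketched with a `try`/`finally`. All names below (`run_on_dut`, `execute`, `release`, the dict-based instance) are hypothetical and do not come from the pytest plugin.

```python
def run_on_dut(dut, instance, execute, release):
    """Run one test instance, releasing the DUT strictly last."""
    try:
        instance["status"] = execute(dut)
    except Exception as exc:
        instance["status"] = "error"
        instance["reason"] = str(exc)
    finally:
        # The status is final here, so the release step can safely read it
        # and no other test can grab the DUT before this test is finalized.
        release(dut, instance["status"])
    return instance


seen = []


def release(dut, status):
    seen.append((dut, status))


def boom(dut):
    raise RuntimeError("serial port vanished")


run_on_dut("dut0", {}, lambda d: "passed", release)
run_on_dut("dut1", {}, boom, release)
print(seen)
```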

Improve DUT selection at DeviceHandler: for each DUT it counts how many test instances have failed on it during the current Twister execution, so the next available DUT is chosen by ordering the eligible DUTs by the fewest failures that have occurred so far. The new selection mechanism should increase the chances of retrying failed tests on different DUTs, for instance to resolve problems when some DUTs have connectivity or HW issues that slow down test plan execution, or even block it when only one test suite runs and the same first DUT candidate in the list is not working while the others are never chosen. Signed-off-by: Dmitrii Golovanov <[email protected]>
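
A minimal sketch of the selection rule this commit message describes, assuming a simplified DUT record. `Dut` and `pick_dut` are illustrative names, not the DeviceHandler implementation.

```python
from dataclasses import dataclass


@dataclass
class Dut:
    # Hypothetical stand-in for a DUT record.
    id: str
    platform: str
    available: bool = True
    failures: int = 0


def pick_dut(duts, platform):
    """Reserve the available DUT of this platform with the fewest failures."""
    eligible = [d for d in duts if d.platform == platform and d.available]
    if not eligible:
        return None
    # min() returns the first minimal item, so ties keep the list order.
    chosen = min(eligible, key=lambda d: d.failures)
    chosen.available = False
    return chosen


duts = [
    Dut("first", "frdm_k64f", failures=2),  # first in the list, but flaky
    Dut("second", "frdm_k64f"),
    Dut("other", "nrf52840dk"),
]
picked = pick_dut(duts, "frdm_k64f")
print(picked.id)
```

Without the failure ordering, a retry would land on "first" again; with it, the retry goes to "second".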

golowanow · 2024-08-14T18:01:43Z

.. and rebased adjusting to #76962

golowanow force-pushed the twister-dut-select_20240706 branch from 3d8187f to f85f3dc Compare July 8, 2024 20:16

golowanow marked this pull request as ready for review July 8, 2024 21:30

zephyrbot added the area: Twister Twister label Jul 8, 2024

zephyrbot requested review from gchwier, hakehuang, KamilxPaszkiet, LukaszMrugala, nashif and PerMac July 8, 2024 21:30

zephyrbot assigned nashif Jul 8, 2024

hakehuang reviewed Jul 18, 2024

LukaszMrugala requested changes Jul 25, 2024

golowanow force-pushed the twister-dut-select_20240706 branch from f85f3dc to 89f455e Compare August 1, 2024 12:49

golowanow requested review from LukaszMrugala and hakehuang August 1, 2024 12:50

golowanow force-pushed the twister-dut-select_20240706 branch from 89f455e to a10967a Compare August 1, 2024 17:55

LukaszMrugala previously approved these changes Aug 9, 2024

golowanow force-pushed the twister-dut-select_20240706 branch from a10967a to 890829b Compare August 12, 2024 06:34

golowanow dismissed LukaszMrugala’s stale review via a518bd0 August 13, 2024 07:11

golowanow force-pushed the twister-dut-select_20240706 branch from 890829b to a518bd0 Compare August 13, 2024 07:11

golowanow requested a review from LukaszMrugala August 13, 2024 07:13

katgiadla previously approved these changes Aug 13, 2024

hakehuang previously approved these changes Aug 13, 2024

golowanow added 5 commits August 14, 2024 19:59

twister: Fix DeviceHandler exit on exception at 'flash before'

b0f8742

Fix Twister DeviceHandler exit on SerialException when it connects to the serial device in 'flash before' mode. Signed-off-by: Dmitrii Golovanov <[email protected]>

twister: DeviceHandler make DUT counter increment atomic

5fe8dd6

Twister DeviceHandler - make DUT use counter increment operation atomic. Signed-off-by: Dmitrii Golovanov <[email protected]>

golowanow dismissed stale reviews from hakehuang and katgiadla via e40c7c7 August 14, 2024 18:00

golowanow force-pushed the twister-dut-select_20240706 branch from a518bd0 to e40c7c7 Compare August 14, 2024 18:00

golowanow requested review from hakehuang and katgiadla August 14, 2024 18:01

nashif approved these changes Aug 15, 2024

hakehuang approved these changes Aug 15, 2024

fabiobaltieri merged commit a4cb802 into zephyrproject-rtos:main Aug 15, 2024
28 checks passed
