Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DFU mode: improve programming robustness. #98

Merged
merged 2 commits into from
Jan 20, 2021

Conversation

enrikb
Copy link
Contributor

@enrikb enrikb commented Sep 7, 2020

dfu.get_status() might return usb.core.USBError with errno == EPIPE,
when called in quick succession. This is the case when called from
dfu.block_on_state(). The next call usually works.

This change tries to catch this specific error and just re-tries in case
it happens.

Without this change, the error (and programming abort) might get
undetected, as click catches IOerror/EPIPE silently. This has
significant potential to brick a device.

This change addresses issue #97.

@nickray
Copy link
Member

nickray commented Sep 9, 2020

Thanks for thinking about this!

I did not understand the click comment, what exactly happens in case of EPIPE without your patch applied? I'm thinking here it might be good to have a maximum number of retries, but perhaps you're saying that click swallows EPIPE completely, so this patch is in any case strictly better?

@enrikb
Copy link
Contributor Author

enrikb commented Sep 9, 2020

I did not understand the click comment, what exactly happens in case of EPIPE without your patch applied?

As you can see here: https://github.com/pallets/click/blob/3984f9efce5a0d15f058e1abe1ea808c6abd243a/src/click/core.py#L944, click catches all OSError having errno == EPIPE. Effectively, it maps it to an exit code 1, but no exception output will be generated. (This makes sense for typical CLI commands, as an actual EPIPE will happen often.)

(In the version I have been using, it catches IOError, which now is an alias of OSError.)

When libusb1 returns a PIPE error, it raises USBError, which is derived from IOError (a.k.a OSError) also having errno == EPIPE.

So, when the error happens, without the patch, the programming stops immediately. However, there is no obvious feedback like a stacktrace, because this specific USBError is caught by click when handling the frequent CLI command event of writing to a closed pipe. Of course there will be no success message of the whole program/verify cycle. Unfortunately I did not know what to expect during my very first DFU programming try :(

Also, the error happens so often (4-5 times during one full flash programming cycle), that it was impossible for me to do a full flash program in DFU mode. I only managed to program the bootloader without the patch, after some retries.

I'm thinking here it might be good to have a maximum number of retries, but perhaps you're saying that click swallows EPIPE completely, so this patch is in any case strictly better?

You're right of course, some limit to avoid an endless loop would be good.

How would this be handled if s.state != state is never reached in dfu.block_on_state()? Is there a timeout or something?

@enrikb enrikb force-pushed the bugfix/click_epipe_vs_usberror_epipe branch from 2958b56 to 7350e45 Compare September 9, 2020 21:21
@enrikb
Copy link
Contributor Author

enrikb commented Sep 9, 2020

I have changed the approach to try the ctrl transfer inside get_status up to three times in case of USBError/EPIPE.

@nickray
Copy link
Member

nickray commented Sep 18, 2020

Thanks for the explanations! So this is about the something | head type of situation probably.

I'm happy to merge this as-is if you're happy :)

Regarding your original problem, I've never seen or heard of such frequent errors (we only had this kind of error in our deprecated web interface, which would interrupt flashing if users tabbed out). I wonder if it could be a bad connector, or what else could cause this.

@enrikb
Copy link
Contributor Author

enrikb commented Sep 18, 2020

Thanks for the explanations! So this is about the something | head type of situation probably.

Exactly.

I'm happy to merge this as-is if you're happy :)

I've just verified on Windows 10 that the change doesn't seem to do any harm, either. Even though the programming also worked reliably without the patch! Same hardware where it is unstable on Linux (Ubuntu 20.04, Kernel 5.4.something).

Regarding your original problem, I've never seen or heard of such frequent errors (we only had this kind of error in our deprecated web interface, which would interrupt flashing if users tabbed out). I wonder if it could be a bad connector, or what else could cause this.

I see the failures on two (somewhat different) Dell notebooks on Ubuntu 20.04 reliably. It's a Solo Hacker V1.2 having a USB-C plug + C-to-A adapter. I assume the adapter is passive.

From my side you can merge if you feel comfortable to do so. Just don't blame me if bunches of Solo Hackers will be bricked in the next weeks ;-)

enrikb and others added 2 commits January 20, 2021 19:12
The ctrl_transfer in dfu.get_status() might return usb.core.USBError
with errno == EPIPE, seen when called in quick succession. This is the
case when called from dfu.block_on_state(). The next call usually works.

This change tries to catch this specific error and just re-tries up to
two times in case it happens.

Without this change, the error (and programming abort) might get
undetected, as 'click' catches IOerror/EPIPE silently. This has
significant potential to brick a device if you don't check the output
and return code of 'solo' closely enough.
@nickray nickray force-pushed the bugfix/click_epipe_vs_usberror_epipe branch from 7350e45 to dff5668 Compare January 20, 2021 18:13
@nickray nickray merged commit 81dfc81 into solokeys:main Jan 20, 2021
@enrikb enrikb deleted the bugfix/click_epipe_vs_usberror_epipe branch January 30, 2021 06:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants