hashlib.sha1 giving incorrect results #194
Verified that it's still a problem on "MicroPython v1.9.2-275-gd0e11f76 on 2017-10-04; ESP32 module with ESP32".
Doh! I can't even go back and download the last known stable version at: http://micropython.org/resources/firmware/esp32-20170911-v1.9.2-270-g14fb53e0.bin The old versions have all been destroyed.
Thanks for letting us know. I'll go back and build some older versions and see where they diverged.
f0561ab changes CONFIG_MBEDTLS_HARDWARE_SHA in the process of updating to a newer esp-idf.
Two questions: what's it using for SHA if that is unset, and why is it wrong?
That's a bit worrying, that the software SHA is broken (or seems to be)... as I said in that commit, when updating the IDF I used the default mbedtls settings. If you look at @nickzoic can you check if adding
You need to call

This isn't a problem for the hardware accelerated version, as we ignore the software initial context state. (I have been bitten by this API limitation before, which is why I went and looked for it in moduhashlib.c when I saw this bug.)

Hardware SHA is faster in some circumstances (short lived hash contexts with minimal concurrency) and slower in others (long lived hash contexts with concurrency). This is because there's only one SHA engine per hash type, so we "read out" the hardware hash context to software if another context needs it. The situation with multiple long lived hash contexts tends to happen in TLS sessions, which is why we've turned hardware SHA off by default.
Hmmm, thanks for the info @projectgus, it sounds like a quick fix anyway.

PS: if the hardware engine switches out to the software engine when concurrency occurs, how come it is slower than just using the software engine? Does copying out the hash context take a long time?
Thanks @projectgus, that's very useful to know, not only for the esp32. @nickzoic please feel free to push a fix for this.
Actually, sorry, what I said before is wrong and based on a previous revision of the SHA concurrency model. Now a hardware SHA context just stays in hardware for as long as it's needed there, and any other contexts run in software.

What I'm basing that summary on is some fairly high level measurements (with the CPU at 240MHz) where TLS sessions were slower with hardware SHA but simple benchmarks were faster. I believe most of the overhead is copying data in/out of the accelerator registers, which are in DPORT memory. Reading DPORT is currently particularly limiting in dual core mode because of a hardware bug (we have to stall the other CPU). It may simply be that having multiple hardware accelerators enabled and in use at once (ie during a TLS handshake) is too limiting because of this, and it's better to let both cores continue working smoothly.

In single core mode, or at CPU speeds lower than 240MHz, the hardware accelerators are probably faster in all cases. I'd like to look into this more at some point in the future... :)
OK, so microseconds after I pushed a fix for this I accidentally did this and discovered something a little weird:
whereas in Python 3.5.3, the behaviour seems a bit less weird:
(same thing in sha256)
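For comparison, CPython finalizes non-destructively: digest() can be called any number of times and always returns the same value. A minimal check of that behaviour (standard hashlib, using the well-known SHA-1 test vector for "abc"):

```python
import hashlib

h = hashlib.sha1()
h.update(b"abc")
first = h.digest()
second = h.digest()  # CPython leaves the context intact, so this matches
assert first == second

# FIPS 180 known-answer vector for "abc"
print(h.hexdigest())  # a9993e364706816aba3e25717850c26c9cd0d89d
```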
Running digest() more than once is also not supported by esp8266 or unix uPy (see the tests/extmod/uhashlib_sha256.py test). So if it's not an easy fix then the behaviour can be left as-is for now.
(aha, it's doing the padding step as part of the _finish call, so that's getting done twice and scrambling the results. The only way around it would be to keep a copy of that vstr output buffer around, and then you'd only have to call _finish once ...)

For now I just pushed it in with the smallest possible change. I hadn't realized that was known behaviour, so I guess we can close this after all.
Can it always throw an exception rather than give a wrong answer? It's an amazingly difficult thing to debug when sha1() gives a wrong answer, because you only consider it after eliminating all other possibilities. I hit this one because I was writing a Python implementation of websockets which didn't work, and I was reduced to port-sniffing successful connections to find out what was going wrong beyond the HTTP headers.
Yeah, I'd rather it worked like the CPython version and just returned the same answer every time, but throwing an exception would be a lesser evil than returning wrong results. Thanks for bringing it to our attention; we've pushed a fix and it's now available for download as esp32-20171005-v1.9.2-276-ga9517c04.bin
I've checked, and it's giving the right answer on the ESP32 (and it even makes my websocket work). The second call to digest() does give a different answer, and update() doesn't work as expected (after digest() is called). Same goes for the ESP8266, but with a different spurious answer on the second call to digest().

https://docs.python.org/3/library/hashlib.html says of digest(): "Return the digest of the data passed to the update() method so far...", which suggests multiple calls are possible.

Definitely, if it is not otherwise possible, digest() should put the hash object in a state where any subsequent function call causes an exception.

I don't know of the exact policy, but after this bug should not a sanity check on the answer be included in the tests here?
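Either behaviour can be approximated today at the Python level; here is a hypothetical wrapper (not part of uhashlib), sketched as one option: cache the first digest() so repeated calls agree, and raise on update() after finalization rather than silently corrupting the state.

```python
import hashlib  # substitute uhashlib on MicroPython

class StrictHash:
    """Cache the digest after finalization and reject further update()s,
    instead of returning wrong results like a once-only finalizer would."""
    def __init__(self, algo=hashlib.sha1):
        self._h = algo()
        self._digest = None

    def update(self, data):
        if self._digest is not None:
            raise ValueError("update() after digest() is not supported")
        self._h.update(data)

    def digest(self):
        if self._digest is None:
            self._digest = self._h.digest()  # finalize exactly once
        return self._digest

h = StrictHash()
h.update(b"abc")
assert h.digest() == h.digest()  # repeated calls now agree
```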
Yeah, I'd like to push nicer behaviour upstream to main micropython.
> I don't know of the exact policy, but after this bug should not a sanity check on the answer be included in the tests here?
Yes. I'll put in a PR for that too ASAP.
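Such a sanity check could compare against the published FIPS 180 known-answer vectors; a sketch of what that test might look like (the actual PR may differ, and it uses binascii.hexlify since MicroPython's uhashlib objects may lack hexdigest()):

```python
import hashlib   # substitute uhashlib on MicroPython
import binascii  # substitute ubinascii on MicroPython

# Known-answer SHA-1 vectors from FIPS 180 / RFC 3174
VECTORS = [
    (b"", "da39a3ee5e6b4b0d3255bfef95601890afd80709"),
    (b"abc", "a9993e364706816aba3e25717850c26c9cd0d89d"),
]

for msg, expected in VECTORS:
    h = hashlib.sha1()
    h.update(msg)
    got = binascii.hexlify(h.digest()).decode()
    assert got == expected, (msg, got)
print("sha1 known-answer tests passed")
```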
…s often as you want. This same technique would work with extmod/moduhashlib.c easily enough
But on normal Python it's:
Oddly, the value on version "MicroPython v1.9.2-270-g14fb53e0 on 2017-09-11; ESP32 module with ESP32" as reported in issue #178 is correct, so there has been a regression.