USB ZLP exponential backoff kills performance of large responses #83

Open

fabled opened this issue Sep 10, 2024 · 3 comments

fabled commented Sep 10, 2024

It seems that the USB Zero Length Packet (ZLP) is not well understood: it is logged as an error and triggers an exponential back-off delay. Then again, the fact that these packets are seen so often suggests that the printer/scanner device developers have not really understood it well either.

The relevant code is at:
https://github.com/OpenPrinting/ipp-usb/blob/master/usbtransport.go#L797-L828

At the wire level, a USB ZLP is sent during a USB bulk transaction when:

  • the device wants to indicate that it has no data to send (and does not want to wait for any either; though arguably it should wait up to the USB timeout specified in the request)
  • some data has been sent, and the number of bytes sent is divisible by the USB maximum packet size. In this case, a ZLP is sent to terminate the transaction (otherwise, the last USB packet, being not full, implicitly terminates the transaction)

The latter case is handled internally by libusb. So the only case in which a libusb user sees a ZLP is the first one: the device terminated the transaction without sending data.
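
To make that visible case concrete, here is a minimal sketch, assuming the gousb API that ipp-usb builds on and that a zero-length transfer completes as n == 0 with a nil error (readLoop and process are hypothetical names, not ipp-usb code):

```go
package sketch

import "github.com/google/gousb"

// readLoop shows where a ZLP surfaces to a gousb/libusb user: a bulk IN
// transfer that completes successfully but carries zero bytes. Short,
// non-full final packets are consumed inside libusb and never appear
// here as a separate event.
func readLoop(ep *gousb.InEndpoint, process func([]byte)) error {
	buf := make([]byte, 4096)
	for {
		n, err := ep.Read(buf)
		if err != nil {
			return err
		}
		if n == 0 {
			// The device terminated the transaction without
			// sending data: the only ZLP case the application
			// ever observes.
			continue
		}
		process(buf[:n])
	}
}
```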

It seems that at least HP keeps sending ZLPs when the data is not yet available, and the exponential back-off kills performance on large responses. Specifically, this causes scanning to run below full speed with HP Color LaserJet Enterprise Flow MFP X677 printers (see also alexpevzner/sane-airscan#347).

Even a fixed 100ms delay was not enough, so for now I have simply changed the ZLP handling to a fixed 10ms delay to get full scanning speed.

Recommended changes:

  • downgrade the ZLP message from Error to Debug
  • change the exponential backoff to either a fixed 10ms (or configurable) delay, or to some non-exponential algorithm such as a linear or logarithmic increase (sketched below)
  • keep the maximum delay below one second, perhaps in the magnitude of 100-200ms; issuing a USB bulk transfer request is not CPU intensive
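
A rough sketch of such a schedule (the function name and all constants are illustrative, not values taken from ipp-usb):

```go
package sketch

import "time"

// zlpDelay returns the pause before the n-th consecutive retry after a
// ZLP: a fixed 10ms floor that grows linearly and is capped well below
// one second, since reissuing a bulk transfer request is cheap.
func zlpDelay(retry int) time.Duration {
	const (
		base     = 10 * time.Millisecond  // fast path: most devices recover immediately
		step     = 10 * time.Millisecond  // linear growth per consecutive ZLP
		maxDelay = 200 * time.Millisecond // cap in the recommended 100-200ms range
	)
	d := base + time.Duration(retry)*step
	if d > maxDelay {
		d = maxDelay
	}
	return d
}
```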

This likely also improves things in connection with #32.

Technically, a ZLP should be interpreted as "socket closed", terminating the wait for the HTTP response. This interpretation would allow fixing the problem in #64 of the GET /ipp/faxout response not containing the end-of-body newline. However, due to the aforementioned bug of manufacturers sending spurious ZLPs, this does not immediately work.

May I recommend that if an HTTP error code response has been received, and multiple ZLPs are seen while waiting for the body, these ZLPs be considered the end-of-body and the wait for the response be aborted.
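
A hypothetical rendering of that heuristic; the threshold of three consecutive ZLPs is an illustrative guess, not something from the code:

```go
package sketch

// bodyEnded reports whether a run of consecutive ZLPs may be treated as
// end-of-body: only when the response already carried an HTTP error
// status, so a mid-body ZLP on a healthy 200 OK stream is never
// misread as EOF. The threshold of 3 is illustrative.
func bodyEnded(httpStatus, consecutiveZLPs int) bool {
	return httpStatus >= 400 && consecutiveZLPs >= 3
}
```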

alexpevzner added a commit that referenced this issue Sep 10, 2024
Experience shows that some devices tend to send zero-length USB packets
even when they shouldn't, and this is a normal occurrence with these
devices.

The log level for these events has been reduced from Error to Debug in
order to minimize clutter in the log files.

alexpevzner (Member) commented

Hi Timo,

glad to hear from you after some years of silence! :-)

> downgrade the ZLP message from Error to Debug

Done.

> change the exponential backoff to either a fixed 10ms

I think that exponential backoff with an initial time of 10ms, which adds 25% at every roll, will give nearly the same performance as a linear algorithm while putting significantly less load on the CPU. What do you think? Let's try?
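
For reference, a sketch of that schedule; only the 10ms start and the 25% growth come from the proposal, while the 200ms cap and the function name are assumptions for illustration:

```go
package sketch

import "time"

// nextZLPDelay implements the proposed schedule: start at 10ms and add
// 25% on every roll. The 200ms cap is an assumed upper bound.
func nextZLPDelay(d time.Duration) time.Duration {
	if d == 0 {
		return 10 * time.Millisecond // initial wait after the first ZLP
	}
	d += d / 4 // +25% per roll
	if d > 200*time.Millisecond {
		d = 200 * time.Millisecond
	}
	return d
}
```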

> then these ZLPs be considered the end-of-body and the wait for the response be aborted

I think your hypothesis, that the device sends a ZLP when it currently has no data to send and doesn't want to wait for more data, is a very good one.

However, it seems to happen in two cases:

  • all scan data has been received and no more data is expected -- i.e., the end of the transaction
  • the scan unit is just too slow to provide more data in time, though the transaction has not ended yet

Some devices send a ZLP in the middle of a transaction, then continue to send more data.

So I think interpreting a ZLP as an end-of-transaction indication is not safe. We will lose synchronization with devices that insert a ZLP in the middle.

However, we can implement it as a quirk for those few devices that really need it.

What do you think?

P.S. The current implementation is sometimes way too fragile. Once synchronization is lost, it cannot be recovered, and the device fails forever (until reboot/reconnect).

I'm thinking about attempting to implement some synchronization recovery logic, which would include the following (a sketch follows the list):

  • at initialization, drain the USB connection for some time (500-1000ms): just read from USB and drop any received data
  • keep draining idle USB connections all the time, until they are reused
  • something more?
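
A rough sketch of the drain step, again assuming the gousb API (drainEndpoint, the window argument, and the buffer size are hypothetical):

```go
package sketch

import (
	"context"
	"time"

	"github.com/google/gousb"
)

// drainEndpoint reads and discards whatever is pending on a bulk IN
// endpoint for the given window, so stale data from a failed
// transaction cannot desynchronize the next one.
func drainEndpoint(ep *gousb.InEndpoint, window time.Duration) {
	ctx, cancel := context.WithTimeout(context.Background(), window)
	defer cancel()

	buf := make([]byte, 4096)
	for {
		if _, err := ep.ReadContext(ctx, buf); err != nil {
			return // deadline reached or transfer error: drained
		}
	}
}
```

For the initialization step above, this would be called with a window in the proposed 500-1000ms range.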

Unfortunately, I have no idea how to reliably distinguish between a failed HTTP transaction and an HTTP transaction that just takes a long time to proceed. For example, reading an image from a scanner may involve minutes of waiting (low-end scanners operating at high resolutions). If I could detect a failed transaction, I could implement some reset-and-recover logic...

alexpevzner added a commit that referenced this issue Sep 11, 2024
Some devices sometimes send 0-length USB packets, and can then do so
continuously. So, to avoid excessive CPU consumption, ipp-usb inserts
pauses when a zero-length USB packet is received, and uses an
exponential backoff algorithm to manage the wait interval.

The parameters of this algorithm were tuned to avoid limiting
performance when working with high-end devices that may send A LOT OF
data in response at very high speed.

See discussion under #83 for further details.

fabled commented Sep 11, 2024

> I think that exponential backoff with an initial time of 10ms, which adds 25% at every roll, will give nearly the same performance as a linear algorithm while putting significantly less load on the CPU. What do you think? Let's try?

Thanks. I'll test this on Friday.

> However, it seems to happen in two cases:
>
>   • all scan data has been received and no more data is expected -- i.e., the end of the transaction
>   • the scan unit is just too slow to provide more data in time, though the transaction has not ended yet

Yes, this is unfortunate.

> So I think interpreting a ZLP as an end-of-transaction indication is not safe. We will lose synchronization with devices that insert a ZLP in the middle.

Correct, just getting one ZLP alone is not enough. Sadly, this is because different vendors interpret this low-level bulk transfer detail differently (possibly due to not understanding what ZLPs are for).

> Unfortunately, I have no idea how to reliably distinguish between a failed HTTP transaction and an HTTP transaction that just takes a long time to proceed. For example, reading an image from a scanner may involve minutes of waiting (low-end scanners operating at high resolutions). If I could detect a failed transaction, I could implement some reset-and-recover logic...

Though, at least the Mopria Alliance eSCL specification implies that the server should return 503 if it cannot attend to the request in a timely manner, and that is what the HP printers do: if the HTTP-level request exceeds a timeout, the printer returns 503 and the HTTP request needs to be redone. I hope there are no devices that wait for long periods there.
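
A client-side sketch of that redo-on-503 behavior (getWithRetry, the attempt budget, and the one-second pause are placeholders, not part of any real client here):

```go
package sketch

import (
	"net/http"
	"time"
)

// getWithRetry redoes a GET request while the device answers 503, as
// the eSCL behavior described above requires. When the attempt budget
// is exhausted, the last response (even a 503) is returned as-is.
func getWithRetry(url string, attempts int) (*http.Response, error) {
	for {
		rsp, err := http.Get(url)
		if err != nil || rsp.StatusCode != http.StatusServiceUnavailable || attempts <= 1 {
			return rsp, err
		}
		rsp.Body.Close() // discard the 503 body and try again
		attempts--
		time.Sleep(time.Second)
	}
}
```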

I think we can do two things:

  • if an HTTP status != 200 (indicating an error) has been seen, we can assume the body is non-existent or small. In this case, a timeout of a few seconds while waiting for the end-of-body should be enough
  • if HTTP status == 200 has been seen, we could set a somewhat longer timeout (e.g. a minute or more); in these cases, if something fails, the device usually reboots or is disconnected

After the above timeouts, we can assume everything is drained and release the channel for future use. It might also be a good idea to keep draining idle channels.
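
A minimal sketch of those two status-dependent timeouts (the concrete values are illustrative, not tuned):

```go
package sketch

import (
	"net/http"
	"time"
)

// drainTimeout picks how long to wait for end-of-body before declaring
// the channel drained, based on the HTTP status of the response.
func drainTimeout(status int) time.Duration {
	if status != http.StatusOK {
		// Error response: body is non-existent or small,
		// a few seconds is plenty.
		return 3 * time.Second
	}
	// 200 OK: allow slow transactions (e.g. low-end scanners at
	// high resolutions) to finish before giving up.
	return 2 * time.Minute
}
```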

fabled commented Sep 14, 2024

> I think that exponential backoff with an initial time of 10ms, which adds 25% at every roll, will give nearly the same performance as a linear algorithm while putting significantly less load on the CPU. What do you think? Let's try?

This seems to work just fine. Thank you!

The actual main issue seems to have been the initial delay of 100ms; changing it to 10ms was the primary fix. The data stream will occasionally contain a ZLP, but the device is almost immediately able to provide additional data. Perhaps some internal device queue was drained, causing the ZLP, but the data is ready, so it is just a matter of some process getting scheduled before more data becomes available. I observed only a few occasions with two or perhaps three consecutive ZLPs; in those cases the updated backoff also helps a lot.

I think the performance issue can be considered fixed now. The other issue, resynchronization, is still open, but there are also other issues tracking it. This issue can be closed, unless you want to keep it open for the sync issue as well?
