-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
usbd: device_next: cdc: wait IN EP ready before enqueue data to EP #79104
base: main
Are you sure you want to change the base?
usbd: device_next: cdc: wait IN EP ready before enqueue data to EP #79104
Conversation
1fc6702
to
6ad3ad4
Compare
@@ -540,6 +544,11 @@ static void cdc_acm_tx_fifo_handler(struct k_work *work) | |||
return; | |||
} | |||
|
|||
if (atomic_test_and_set_bit(&data->state, CDC_ACM_TX_EP_BUSY)) { | |||
cdc_acm_work_submit(&data->tx_fifo_work); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work? Will it simply keep retriggering the TX fifo work until the request is completed? This sounds way too much CPU intensive to me that could starve system workqueue and other lower priority threads.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tmon-nordic, you are right. Indeed keep requeuing tx_fifo_work would cause the issue you mentioned, I have totally missed that. I have changed the handling to raised the tx_fifo altered flag instead. Please help and see whether it make sense for you.
18778a2
to
27486c5
Compare
The commit message should not be a concatenation of squashed commit messages like it is now. |
27486c5
to
bea015b
Compare
Hi @tmon-nordic, thanks again. Sorry that I am unaware for the rules and workflow. Have removed the "review history" from the commit message. May I check with you whether there is documentation regarding correct workflow for addressing review? From what I understand from what is happening so far is that, for addressing review, we wanted the amendment to be squashed with the original commit, without needing record of the review history. Thanks again @tmon-nordic. |
It can be derived from https://docs.zephyrproject.org/latest/contribute/contributor_expectations.html but only indirectly. The PR in Zephyr is essentially equivalent to what a patch series in Linux Kernel is.
The review history is not really useful in the "final patch set" (each subsequent force push should be possible to consider as "final patch set", there shouldn't really be "WIP" changes unless you mark the PR as draft). There are so many commits that if all review history was in the main history it would be orders of magnitude harder to navigate it. If someone is interested in review history then the GitHub is the place to look for it (Linux Kernel IMHO has it better with the lore.kernel.org archive, but that's related to tooling and not necessarily the development process). |
@tmon-nordic, undestood. Thanks for the sharing again 😄 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Under the current structure of usbd_cdc_acm, there is a possibility that
TX data is enqueued to UDC driver before the previous transaction is
completed, hence causing the dropped data.
Where is data dropped? Also please take a look at #75439
Hi @jfischer-no,
from my current understanding, because that
From my understanding of the code change in #75439, it seems that the By the way, it seems that the case where "IN endpoint is busy" is not included in #75439 yet. Please correct me if I understand that wrongly. |
I think blindly retriggering with some arbitrary delay is bad design. It should really be a flag that executes once the underlying condition has changed (which is the case here with The two issues seem independent to me. But it would be best @zeonchew could try to trigger the issue this PR is aiming to fix with #75439 changes applied (and without this PR). |
@tmon-nordic, sure. Let me have a try on it soonest possible and get back here to update the status. 😉 |
bea015b
to
87f93a9
Compare
Under the current structure of usbd_cdc_acm, there is a possibility that TX data is enqueued to UDC driver before the previous transaction is completed, hence causing the dropped data. This commit adds a mechanism for usbd_cdc_acm driver to wait for IN endpoint to complete existing transaction before subsequent transaction is enqueued. Signed-off-by: Chew Zeh Yang <[email protected]>
87f93a9
to
a7aac8d
Compare
@@ -114,6 +115,8 @@ struct cdc_acm_uart_data { | |||
struct k_work rx_fifo_work; | |||
atomic_t state; | |||
struct k_sem notif_sem; | |||
/* USBD CDC ACM TX transaction deferred due to BULK IN EP busy */ | |||
bool tx_deferred; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why new variable instead of just using data->tx_fifo.altered
like it was earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tmon-nordic,
I was just about to add my report on what I have done here.
Since that #75439 has been merged into zephyr main, I have rebased my works and retested with the changes of #75439 applied. However, I am still able to reproduce this issue.
I have also run more stress test on my fixed ported over to the latest main, and found out that raising data->tx_fifo.altered
wouldn't help in retrigger the TX transaction, as tx_fifo.altered
is cleared in cdc_acm_irq_cb_handler()
.
zephyr/subsys/usb/device_next/class/usbd_cdc_acm.c
Lines 836 to 837 in a7aac8d
data->tx_fifo.altered = false; | |
data->rx_fifo.altered = false; |
The previous testing wasn't rigorous enough, that the subsequent transaction was actually triggered by new data coming in from RX instead (and hence the stale data was flushed out together).
This being said, I have added another handling here to make sure that remaining data in tx_fifo
, which the BULK IN endpoint is unable to send out in current transaction, gets sent out immediately after current TX is completed as well.
zephyr/subsys/usb/device_next/class/usbd_cdc_acm.c
Lines 571 to 580 in a7aac8d
/* | |
* If ring buffer contains more data than what the endpoint could | |
* send in one go, raise deferred flag for remaining data to be | |
* sent out once current transaction is completed. | |
*/ | |
if (len == buf->size) { | |
if (!ring_buf_is_empty(data->tx_fifo.rb)) { | |
data->tx_deferred = true; | |
} | |
} |
Please help to review whether this make sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand cdc_acm_irq_cb_handler()
. Isn't the data->tx_fifo.altered = false;
going to make the if (data->tx_fifo.altered)
effectively a dead code? Shouldn't this be fixed instead of adding new boolean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand
cdc_acm_irq_cb_handler()
. Isn't thedata->tx_fifo.altered = false;
going to make theif (data->tx_fifo.altered)
effectively a dead code? Shouldn't this be fixed instead of adding new boolean?
I took me a while to understand this as well, from my understanding, in this handler, it will first clear all the altered flag, and then callback to the interrupt handler at application side: data->cb(...)
.
zephyr/subsys/usb/device_next/class/usbd_cdc_acm.c
Lines 841 to 844 in a7aac8d
if (atomic_test_bit(&data->state, CDC_ACM_IRQ_RX_ENABLED) || | |
atomic_test_bit(&data->state, CDC_ACM_IRQ_TX_ENABLED)) { | |
data->cb(usbd_class_get_private(c_data), data->cb_data); | |
} |
If there is more data to be transmitted by the application, the tx_fifo.altered
flag will be raised by application before the callback returns. Then here we go, continue to send data out again.
I see that this part is checked in by @jfischer-no. @jfischer-no, can you please help to confirm whether this understanding is correct?
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a problem when there is more data in the tx ringbuffer? The sequence I think about is:
cdc_acm_irq_cb_handler()
setsdata->tx_fifo.altered = false;
data->cb(usbd_class_get_private(c_data), data->cb_data);
is called, application writes to ring buffer more bytes than a CDC ACM TX buffer can fit- TX fifo work is triggered because
data->tx_fifo.altered
was set to true alongside ring buffer write cdc_acm_tx_fifo_handler()
callslen = ring_buf_get(data->tx_fifo.rb, buf->data, buf->size);
and submits the transfer. The ring buffer still has pending TX data.- host reads all the data,
usbd_cdc_acm_request()
submitscdc_acm_work_submit(&data->irq_cb_work);
cdc_acm_irq_cb_handler()
setsdata->tx_fifo.altered = false;
data->cb(usbd_class_get_private(c_data), data->cb_data);
is called, application doesn't have more data to write- TX fifo work is not triggered despite there being data in the ring buffer.
Is there anything preventing the above scenario?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @tmon-nordic,
I added this for your case you mentioned in the latest commit just now
zephyr/subsys/usb/device_next/class/usbd_cdc_acm.c
Lines 571 to 580 in a7aac8d
/* | |
* If ring buffer contains more data than what the endpoint could | |
* send in one go, raise deferred flag for remaining data to be | |
* sent out once current transaction is completed. | |
*/ | |
if (len == buf->size) { | |
if (!ring_buf_is_empty(data->tx_fifo.rb)) { | |
data->tx_deferred = true; | |
} | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why to have two booleans for roughly the same? Wouldn't it be better to change the data->tx_fifo.altered
logic completely? If we go with separate variable, I think bool tx_deferred;
should be moved inside tx_fifo
structure.
But is it really necessary to have tx_deferred
? I think that the CDC_ACM_TX_EP_BUSY
in combination with ring_buf_is_empty(data->tx_fifo.rb))
should be enough to determine whether the TX work has to be raised or not.
My idea would be to replace if (data->tx_fifo.altered)
with if (!ring_buf_is_empty(data->tx_fifo.rb)) && !atomic_test_and_set_bit(&data->state, CDC_ACM_TX_EP_BUSY))
. Then the CDC_ACM_TX_EP_BUSY
would be cleared in usbd_cdc_acm_request()
(and inside the TX handler if it fails to enqueue the buffer) when the IN request completes. This would essentially eliminate the need for data->tx_fifo.altered
altogether and potentially make the code simpler to follow and more robust. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Open Source development works on technical merit. I would say that there is no need to consult the original author when proposing a change that solves issues or otherwise enhances the code base.
While there may be times where you want to ask what was intended, there is also the possibility that the original author simply is no longer around.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, was going to replace my comment with another one just now but somehow I lost all of them.
(The comment on saying I am new to Open-Source and ZephyrOS, asking on how this community works, and whether it is okay to make such a change without consulting original author is correct way to do things.)
I am kind of agree that in this case, checking for ring_buf_is_empty()
at cdc_acm_irq_cb_handler()
plus checking for CDC_ACM_TX_EP_BUSY
flag at cdc_acm_tx_fifo_handler()
would be optimal as checking the ring buffer status directly gives us the most accurate status for actions to be taken.
It seems that the original intend of altered
flag is to provide a fast access to the status whether application layer has accessed the ring buffer. However, it seems to me that the bit might not be good enough for the scenario in tx_fifo here anymore.
@jfischer-no , what do you think? Is @tmon-nordic suggestion here to check whether to trigger tx_fifo_work
based on whether ring_buf_is_empty()
sounds good to you as well?
Under the current structure of usbd_cdc_acm, there is a possibility that TX data is enqueued to UDC driver before the previous transaction is completed when host is sending data to device faster than it receives data from device, causing dropped data.
Example of such flow is that:
tx_fifo
.tx_fifo
again.tx_fifo.altered
flag,tx_fifo_work
is queued again.cdc_acm_tx_fifo_handler()
tries to enqueue packet to UDC driver, it is rejected because the USB_IN endpoint is still busy, causing the packet to be dropped.This PR adds mechanism to wait for IN endpoint to complete existing transaction before allowing subsequent transaction to be enqueued to UDC drive.