buffer release when BT_CONN_DISCONNECT_COMPLETE #64114

watsug · 2023-10-19T09:04:30Z

Connection cleanup is performed not only when BT_CONN_DISCONNECTED, but also when BT_CONN_DISCONNECTING and BT_CONN_DISCONNECT_COMPLETE. In addition send_buf() function can return -ENOTCONN error code - the handling was added. The fix has been tested on our side and it fixes the issue.

Fixes #63217

watsug · 2023-10-19T09:08:01Z

I have created a PR, but frankly - most of the changes come from clang-format. Was the file formatted properly in the main branch?

hermabe · 2023-10-19T09:16:29Z

I have created a PR, but frankly - most of the changes come from clang-format. Was the file formatted properly in the main branch?

You should not format the whole file, only your changes and in general keep with the style of the file. clang-format supports formatting the diff only. The way it is now makes it very hard to review.

watsug · 2023-10-19T09:20:24Z

OK, I will roll back and format the diff only. But could you explain to me why the file is not formatted as a whole according to the clang-format definition? What sense is in diff formatting only, when the rest of the file is a 'jungle'?

hermabe · 2023-10-19T09:26:32Z

OK, I will roll back and format the diff only. But could you explain to me why the file is not formatted as a whole according to the clang-format definition? What sense is in diff formatting only, when the rest of the file is a 'jungle'?

The clang-format file is not enforced in CI, checkpatch is used instead. Different subsystems may have slightly different styles. Different versions of clang-format support different options. The column width is recommended 80 chars but max 100, which tools dont handle at all. Formatting the whole file will create a large diff making backporting of changes more difficult.

This reverts commit cf3e379.

watsug · 2023-10-19T09:31:18Z

OK, I have reverted the file and added only the fix with proper formatting.

Thalley · 2023-10-19T09:40:22Z

The column width is recommended 80 chars but max 100

The recommendation for 80 characters is no longer in effect in the Linux kernel (where we got this from), so the current "recommendation" is, AFAIK, actually 100.

Thalley · 2023-10-19T09:42:42Z

subsys/bluetooth/host/conn.c

+	bool disconnecting = conn->state == BT_CONN_DISCONNECTED ||
+			     conn->state == BT_CONN_DISCONNECTING ||
+			     conn->state == BT_CONN_DISCONNECT_COMPLETE;


I'm wondering if it would make more sense to check for conn->state != BT_CONN_CONNECTED.

Arguably we only want to process TX when in the connected state, right?

This was also my initial idea, but frankly, the whole state machine in conn.c looks a little overcomplicated and didn't have time to analyze it deeply. On the other hand, you are correct, inside the send_buf() function there is a check to validate the state is CONNECTED.

Is there any scenario where we are not in the BT_CONN_CONNECTED state and do not want to flush our TX queue?

I will take a look, but probably this weekend, and I can redo tests next week to see if '!= CONNECTED' is OK.

Hi @Thalley

I did some additional analysis and testing and this is what I found:

BT_CONN_CLEANUP flag is set by bt_conn_set_state() only when transition from BT_CONN_DISCONNECT_COMPLETE -> BT_CONN_DISCONNECTED. See this:

/* Notify disconnection and queue a dummy buffer to wake * up and stop the tx thread for states where it was * running. */ switch (old_state) { case BT_CONN_DISCONNECT_COMPLETE: tx_notify(conn); /* Cancel Connection Update if it is pending */ if ((conn->type == BT_CONN_TYPE_LE) && (k_work_delayable_busy_get(&conn->deferred_work) & (K_WORK_QUEUED | K_WORK_DELAYED))) { k_work_cancel_delayable(&conn->deferred_work); } atomic_set_bit(conn->flags, BT_CONN_CLEANUP); k_poll_signal_raise(&conn_change, 0); /* The last ref will be dropped during cleanup */ break;

so in bt_conn_process_tx(), the call to conn_cleanup() will be executed only when the state is BT_CONN_DISCONNECTED and BT_CONN_CLEANUP set. This means my change here is useless.

What fixes memory buffer release is the check for errors from send_buf(). If the connection state is different than BT_CONN_CONNECTED (tested in do_send_frag()) this function returns '-ENOTCONN'. And this is what should be handled.

What is still not clear to me is how other error codes should be treated. Like '-ENOBUFS' or '-ENOMEM'.

-ENOBUFS is only in the case of not enough "credits" for sending data to the controller. In that case, we still haven't popped buf from the tx queue, so no leaks.

-ENOMEM is in the case where we don't have a buffer for allocating a fragment for sending parts of buf. In that case we also haven't popped buf from TX queue, so no leaks again

Any other error codes from the controller are converted to -EIO to avoid a scenario when the above cases are not true.

Thalley · 2023-11-15T09:16:37Z

subsys/bluetooth/host/conn.c

@@ -926,7 +926,7 @@ void bt_conn_process_tx(struct bt_conn *conn)
 	err = send_buf(conn, buf);
 	net_buf_unref(buf);

-	if (err  == -EIO) {
+	if (err == -EIO || err == -ENOTCONN) {


Isn't this already being done in
https://github.com/zephyrproject-rtos/zephyr/blob/main/subsys/bluetooth/host/conn.c#L643-L656

Yeah that's the intention at least. But this PR proves that the flow is obfuscated enough that no one here is really sure of what's happening anymore.
I had a branch that massively simplifies all of this, but couldn't merge it as it exposed other issues. I'll see if we can split out some parts of it.

@watsug I took the liberty of editing your PR message to link the issue you're fixing, in order to give more context to the reviewers .

@jori-nordic What should we do with this PR?

It may sound like it's fixing an issue, but it also sounds like that the fix here already exist and has just been moved, so to me it doesn't seem to actually change anything.

I'll have another look at the explanation (next week) and report back.

Sometimes, fixes like these are impossible to test for in bsim upstream, as the ZLL is not compatible with BT_RECV_WORKQ_SYS and BT_RECV_WORKQ_BT. These two options can lead to a very different scheduling pattern, highlighting issues like this one.

Instead of singling out "problematic" error codes, we could only check for the cases where we want to keep the buffer in the queue, E.g. -EAGAIN.
TODO for myself: document what error codes should trigger a destroy.

jori-nordic

Could you instead move the connection test from here to here?

edit: That way, buf is never popped from the TX queue, and conn_cleanup will end up freeing it and its metadata later.

edit2: and also cleanup the commit history

r2r0 · 2024-01-16T15:02:48Z

@watsug Any chance to continue work on this PR?

jori-nordic · 2024-02-08T13:47:19Z

@r2r0 @watsug

I implemented what I suggested in #68748

Would one of you have the time to double-check that PR fixes the bug?

github-actions · 2024-04-09T00:30:04Z

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

buffer release when BT_CONN_DISCONNECT_COMPLETE

cf3e379

watsug requested review from alwa-nordic, jhedberg, Vudentz and sjanc as code owners October 19, 2023 09:04

zephyrbot added area: Bluetooth Host area: Bluetooth labels Oct 19, 2023

zephyrbot requested review from asbjornsabo, hermabe, jori-nordic, kruithofa, Thalley and theob-pro October 19, 2023 09:05

zephyrbot assigned jhedberg Oct 19, 2023

Adam Augustyn added 2 commits October 19, 2023 11:27

Revert "buffer release when BT_CONN_DISCONNECT_COMPLETE"

11463d7

This reverts commit cf3e379.

diff formatted only

06a8c05

Thalley reviewed Oct 19, 2023

View reviewed changes

extra states check removed

48f66e8

Thalley reviewed Nov 15, 2023

View reviewed changes

jori-nordic requested changes Nov 24, 2023

View reviewed changes

kruithofa removed their request for review January 11, 2024 10:14

github-actions bot added the Stale label Apr 9, 2024

github-actions bot closed this Apr 24, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

buffer release when BT_CONN_DISCONNECT_COMPLETE #64114

buffer release when BT_CONN_DISCONNECT_COMPLETE #64114

watsug commented Oct 19, 2023 •

edited by jori-nordic

Loading

watsug commented Oct 19, 2023

hermabe commented Oct 19, 2023

watsug commented Oct 19, 2023

hermabe commented Oct 19, 2023

watsug commented Oct 19, 2023

Thalley commented Oct 19, 2023

Thalley Oct 19, 2023

watsug Oct 19, 2023

Thalley Oct 19, 2023

watsug Oct 19, 2023

watsug Nov 13, 2023

jori-nordic Nov 24, 2023 •

edited

Loading

Thalley Nov 15, 2023

jori-nordic Nov 16, 2023

Thalley Nov 24, 2023

jori-nordic Nov 24, 2023 •

edited

Loading

jori-nordic Feb 8, 2024

jori-nordic left a comment •

edited

Loading

r2r0 commented Jan 16, 2024

jori-nordic commented Feb 8, 2024

github-actions bot commented Apr 9, 2024

buffer release when BT_CONN_DISCONNECT_COMPLETE #64114

buffer release when BT_CONN_DISCONNECT_COMPLETE #64114

Conversation

watsug commented Oct 19, 2023 • edited by jori-nordic Loading

watsug commented Oct 19, 2023

hermabe commented Oct 19, 2023

watsug commented Oct 19, 2023

hermabe commented Oct 19, 2023

watsug commented Oct 19, 2023

Thalley commented Oct 19, 2023

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jori-nordic Nov 24, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jori-nordic Nov 24, 2023 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jori-nordic left a comment • edited Loading

Choose a reason for hiding this comment

r2r0 commented Jan 16, 2024

jori-nordic commented Feb 8, 2024

github-actions bot commented Apr 9, 2024

watsug commented Oct 19, 2023 •

edited by jori-nordic

Loading

jori-nordic Nov 24, 2023 •

edited

Loading

jori-nordic Nov 24, 2023 •

edited

Loading

jori-nordic left a comment •

edited

Loading