-
Notifications
You must be signed in to change notification settings - Fork 6.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bluetooth: conn: move auto-init procedures to system workqueue #77703
Bluetooth: conn: move auto-init procedures to system workqueue #77703
Conversation
2c2a47f
to
8bbe833
Compare
8bbe833
to
958b700
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments, but nothing that actually needs to be fixed especially since this is from moved code.
Will make another PR right after this one to address, scout's honor. |
subsys/bluetooth/host/conn.c
Outdated
} | ||
|
||
exit: | ||
bt_conn_unref(conn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can potentially destroy the currently running k_work object. Is that ok?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good point. I have no idea if it's going to implode or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks in work.c like the work item is touched after the handler completes. Better safe than sorry. I suggest we make the work item global, we iterate over all connected conns in the handler, and have a flag that marks the work done on each conn.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking at what the workqueue implementation does after calling the callback this may actually be a problem (it's still accessing parts of the work struct, like work->flags
after it).
Lines 688 to 703 in 40414f7
handler(work); | |
/* Mark the work item as no longer running and deal | |
* with any cancellation and flushing issued while it | |
* was running. Clear the BUSY flag and optionally | |
* yield to prevent starving other threads. | |
*/ | |
key = k_spin_lock(&lock); | |
flag_clear(&work->flags, K_WORK_RUNNING_BIT); | |
if (flag_test(&work->flags, K_WORK_FLUSHING_BIT)) { | |
finalize_flush_locked(work); | |
} | |
if (flag_test(&work->flags, K_WORK_CANCELING_BIT)) { | |
finalize_cancel_locked(work); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this is already an issue with deferred_work
:
zephyr/subsys/bluetooth/host/conn.c
Lines 2069 to 2072 in 40414f7
/* Release the reference we took for the very first | |
* state transition. | |
*/ | |
bt_conn_unref(conn); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One solution could be to do "slow" freeing of connections, i.e. when the refcount drops to zero there's a separate (independent of any connection object)
k_work
that's responsible for really freeing the object.
Wouldn't that hinder advertisers to restart connectable advertising even further?
Today we suggest to use a k_work in the disconnected callback, but if the stack then also does a k_work before actually freeing the connection, then the application's k_work needs to be schedules after the stack's k_work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably need to address the rootcause, which is the work item belonging to the struct bt_conn
. I really don't like the deferred_work
anyways. So refactoring deferred_work
using @alwa-nordic 's proposition seems appropriate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably need to address the rootcause, which is the work item belonging to the
struct bt_conn
. I really don't like thedeferred_work
anyways. So refactoringdeferred_work
using @alwa-nordic 's proposition seems appropriate.
Not really opposed to that either. It does sound a bit like we are starting to implement a tiny garbage collector if we have a work item that occasionally goes through all connection objects to finalize the free'ing :D But you are correct that free'ing an object that contains the work items that triggers the free is an issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to name it ZNGC Zis is Not a Garbage Collector 🚮🙊
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ZNGC Zis is Not a Garbage Collector
Better make it "ZNGC: ZNGC is Not a Garbage Collector" for additional recursiveness
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like we need to fix further things in this PR
a96f483
to
a95c8bb
Compare
a95c8bb
to
7013f8f
Compare
subsys/bluetooth/host/conn.c
Outdated
if (conn->state != BT_CONN_CONNECTED) { | ||
goto exit; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With the if (conn->state != BT_CONN_CONNECTED) {
on line 1697, do we need this check for each procedure? Or can e.g. bt_hci_read_remote_version
actually disconnect the connection?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idk. I'm not taking any chances in case of pre-emption by e.g. the controller thread sending a disconnected event, which is high-prio.
err = bt_hci_le_read_max_data_len(&tx_octets, &tx_time); | ||
if (!err) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No log message if bt_hci_le_read_max_data_len
fails?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just copy-pasting stuff around
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning to have this PR be a single cherry-pickable commit. And have another one for logs and cosmetic changes, IS_ENABLED()
etc..
subsys/bluetooth/host/conn.c
Outdated
{ | ||
ARG_UNUSED(unused); | ||
|
||
bt_conn_foreach(BT_CONN_TYPE_ALL, perform_auto_initiated_procedures, NULL); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
bt_conn_foreach(BT_CONN_TYPE_ALL, perform_auto_initiated_procedures, NULL); | |
bt_conn_foreach(BT_CONN_TYPE_LE, perform_auto_initiated_procedures, NULL); |
I guess we don't want to do these for ISO or classic connections
subsys/bluetooth/host/conn.c
Outdated
|
||
LOG_DBG("[%p] Running auto-initiated procedures", conn); | ||
|
||
if (atomic_test_and_set_bit(conn->flags, BT_CONN_AUTO_INIT_PROCEDURES_DONE)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be atomic_test_and_clear_bit()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind, I think it might be correct. To me it'd just be more intuitive to have a flag set together with the reference, and cleared when you do unref()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
set
should be correct
@@ -62,6 +62,7 @@ enum { | |||
BT_CONN_BR_NOBOND, /* SSP no bond pairing tracker */ | |||
BT_CONN_BR_PAIRING_INITIATOR, /* local host starts authentication */ | |||
BT_CONN_CLEANUP, /* Disconnected, pending cleanup */ | |||
BT_CONN_AUTO_INIT_PROCEDURES_DONE, /* Auto-initiated procedures have been done */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be more intuitive if you reversed this, i.e. call it AUTO_INIT_PROCEDURES_PENDING
. That way the reference is tied to something more tangible (the flag being set).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did this to distinguish between a freshly-memset connection and a connection that has had this procedure performed. I can swap if you really want to.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think verifying the correctness of the reference counting becomes clearer if it was reversed. Normally reference counts are tied to actual pointer variables, but that's not the case here. If you can see a setting of the flag when you do ref()
and a test_and_clear()
when you do unref()
it's IMO more obvious to what the reference count is tied, i.e. something like:
bt_conn_ref(conn);
set_bit();
and:
if (test_and_clear_bit()) {
bt_conn_unref();
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aiight, i'll change it.
7013f8f
to
e0fa3af
Compare
subsys/bluetooth/host/conn.c
Outdated
LOG_DBG("[%p] Successfully ran auto-initiated procedures", conn); | ||
|
||
exit: | ||
CHECKIF(!atomic_test_and_clear_bit(conn->flags, BT_CONN_AUTO_INIT_PROCEDURES_PENDING)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the same rule applies for CHECKIF as for ASSERT, i.e. you should never put functionally significant actions inside it. E.g. if CONFIG_NO_RUNTIME_CHECKS
is set then the test_and_clear_bit()
would never get called, which is probably not what you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. forgot about that option.
e0fa3af
to
adf1895
Compare
subsys/bluetooth/host/conn.c
Outdated
* connection flag. The reference will be given back the moment that | ||
* flag is set. | ||
*/ | ||
atomic_set_bit(bt_conn_ref(conn)->flags, BT_CONN_AUTO_INIT_PROCEDURES_PENDING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking the ref here is unnecessary. bt_conn_foreach
iterates over live connection objects and lends a reference to the loop body function.
In general, if a function gets a bt_conn in a parameter, the caller shall guarantee it lives until the function returns.
subsys/bluetooth/host/conn.c
Outdated
* connection flag. The reference will be given back the moment that | ||
* flag is set. | ||
*/ | ||
atomic_set_bit(bt_conn_ref(conn)->flags, BT_CONN_AUTO_INIT_PROCEDURES_PENDING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This flag is arguably redundant. I think it's functionally equivalent to conn.state == CONNECTED && !procedures_done
.
adf1895
to
85b8ac3
Compare
static void schedule_auto_initiated_procedures(struct bt_conn *conn) | ||
{ | ||
LOG_DBG("[%p] Scheduling auto-init procedures", conn); | ||
k_work_submit(&procedures_on_connect); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
static void schedule_auto_initiated_procedures(struct bt_conn *conn) | |
{ | |
LOG_DBG("[%p] Scheduling auto-init procedures", conn); | |
k_work_submit(&procedures_on_connect); | |
} | |
static void schedule_auto_initiated_procedures(void) | |
{ | |
LOG_DBG("Scheduling auto-init procedures"); | |
k_work_submit(&procedures_on_connect); | |
} |
Since the k_work
is no longer in the conn object, suggest to remove it from the function. In the case of LOG_DBG not being enabled, it was a unused argument anyhow
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should still have it. As it is the start of an async operation and the log in perform_auto_initiated_procedures
only notify us of the execution of that async operation.
If you really don't want it, feel free to NAK and I'll remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a huge deal and it may be useful :)
`conn_auto_initiate()` starts a bunch of controller procedures (read: HCI commands) that are fired off right after connection establishment. Right now, it's called from the RX context, which is the same context where resources (cmd & acl buffers) are freed. This not ideal. But the procedures are all async, so it should be fine to schedule this function on the system workqueue, where we have less risk of deadlocks. Signed-off-by: Jonathan Rico <[email protected]>
85b8ac3
to
b975ee8
Compare
rebased to fix conflict in |
conn_auto_initiate()
starts a bunch of controller procedures (read: HCIcommands) that are fired off right after connection establishment.
Right now, it's called from the RX context, which is the same context where
resources (cmd & acl buffers) are freed. This not ideal.
But the procedures are all async, so it should be fine to schedule this
function on the system workqueue, where we have less risk of deadlocks.