samples: bluetooth: central_iso: fix synchronization of data submission #72622

fgrandel · 2024-05-11T15:36:02Z

This PR synchronizes preparation and scheduling of isochronous data to the Bluetooth clock.

It uses TX synchronization info to set and maintain a well-defined phase between the application timer and the bluetooth clock including ongoing drift compensation (see https://github.com/nrfconnect/sdk-nrf/blob/main/samples/bluetooth/iso_time_sync/src/iso_tx.c and the "LE Read ISO TX Sync" HCI command in the Bluetooth Core Spec 5.4, Vol 4 Part E, 7.8.96).

The PR introduces a synchronization approach that works on both, central and peripheral, and could therefore be ported in later PRs to showcase sending synchronized data from multiple peripherals to one central.

Fixes #72452

fgrandel · 2024-05-12T08:09:24Z

Update: This approach is too brittle. I tested it under different circumstances and it turns out that my initial assumption was wrong that the connected event would be a stable anchor for synchronization. I'll have to implement the timstamp-based approach right away. (Therefore converted back to Draft.)

fgrandel · 2024-05-12T15:48:47Z

Done.

Known issue (cf Discord), IMO to be fixed in a separate PR:

Just wondering why the peripheral's ACK does not trigger the sent-callback immediately. In the example almost a full ISO interval (~10 ms) elapses before the callback is called. This leaves very little time to prepare and send the next SDU. IMHO it would be nicer if one had time to prepare the next SDU triggered by the sent-callback. Is this hard to fix? I saw you already have a TODO for early ACK in the code.

If you don't like the surrounding changes (refactoring, reversal of payload length, etc.) just let me know and I'll remove this unrelated stuff.

rugeGerritsen · 2024-05-12T18:31:49Z

samples/bluetooth/central_iso/src/main.c

+	struct bt_iso_tx_info iso_tx_info = {0};
+	int err;
+
+	err = bt_iso_chan_get_tx_sync(chan, &iso_tx_info);


@fgrandel , thanks for spending time on improving this sample!
I would recommend to not use timestamps when providing data in this sample.
The increase in sequence number gives the controller enough information about when it is to be scheduled on air. See https://developer.nordicsemi.com/nRF_Connect_SDK/doc/latest/nrfxlib/softdevice_controller/doc/isochronous_channels.html#providing-data for more information on how the Nordic SoftDevice Controller processing incoming HCI ISO Data.

When using timestamps it is possible to provide the data to the controller right before it is going to be sent on air.
Unfortunately, this timing information cannot be obtained in a controller-agnostic way:
The timestamp returned from the HCI LE Read ISO TX Sync command is ambiguous, the returned value depends on how the controller implementation interprets the spec. Also, the command documentation refers to CIG/BIG reference anchor points which may not align with the SDU generation if ISO_Interval != SDU interval (up to the controller). These things make it hard to rely on this return value.
See the open spec errata https://bluetooth.atlassian.net/browse/ES-23138 for more details.

Another note: The timing of the iso_sent is also implementation dependent. It is raised when the controller raises the HCI Number of Completed Packets event. This event may be raised when the controller is ready to receive more data. This may be the case when the payload is scheduled for transmission. That is, even before it is sent.

TLDR; I see why the sent-callback + HCI LE Read ISO TX Sync + timestamp approach is lacking and depends on controller implementation details. But IMO it is still the only approach that works at all to reliably synchronize ISO channels for the moment being.

Arguments following:

Thanks for the link. Unfortunately I don't have permission to access it - guess that's only for SIG members.

The SDU vs. ISO interval argument is convincing - however as long as all our options are to some degree depending on the controller implementation, I'd still prefer the approach recommended by the core spec. By the time we'll implement more controllers, I guess the spec will have been updated and the errata removed in a somehow backwards compatible way, so we can move on from there.

In Zephyr we can and should decouple from implementation details by wrapping the controller specific parts behind more usable high-level APIs in the meantime. This would also improve developer experience by protecting them from the hacky and error-prone synchronization details. Plus it guarantees a migration path.

The dependence of iso_sent on the controller implementation is also a good argument. But we still need to provide developers with a working solution until the spec does so. That's AFAICS, why Nordic recommends the HCI LE Read ISO TX Sync + timestamp solution for the moment being and uses the send callback as a trigger. Also see the recommended approach in the link you provided:

This is the preferred way of providing data to the SoftDevice Controller and guarantees the highest degree of control.

Re sent-callback latency: Introducing a delay of one ISO interval w/o technical need is not a good way of interpreting the slack in the spec IMO. We should at least provide best-effort latency as far as technically possible - even more as it seems to be the only workable solution for deterministic synchronization for the time being. So the controller really should deliver some sufficient degree of latency guarantees beyond what is required by the spec for this specific callback/HCI command until the spec provides a workable solution itself.

The seq-number and "time-of-arrival" approaches don't work unless you admit at least two ISO intervals of extra latency to avoid systematic package loss (as demonstrated by the bug I'm fixing here). This I find inacceptable because it counters the concept of ISO channels with unframed PDUs and deterministic latency. Also see the latency promises of the CIG config API in Zephyr which will not be met. You need to add at least two sequence numbers all the time to ensure that:

the initially (but non-deterministically) skipped event gets accounted for.

source-to-CIG anchor point offset doesn't play a role even if you happen to synchronize too closely to the next anchor point so that the controller will pick up your SDUs too late.

In addition the "time-of-arrival" approach is not consistent with the spec AFAICS as flushing cannot be controlled deterministically by the application. It's the point of the ISO protocol that you can synchronize your source with the sink with deterministic latency and time-to-live of data packets.

On the peripheral both approaches don't work at all because neither the time-of-arrival nor the seq number approach are able to account for drift of the peripheral's clock. This cannot work by design if you want well-synchronized data across devices. This also explains why the (local) timer approach does systematically not work.

In our case for example we do decentralized measurements on peripherals in sync with the central's distributed clock across several devices and do explicitly want those measurements be sent out in the same CIG event with deterministic latency so we don't need timestamps and get cheap syntonization of all peripherals to the master clock w/o having to implement our own syntonization approach.

On the peripheral both approaches don't work at all because neither the time-of-arrival nor the seq number approach are able to account for drift of the peripheral's clock. This cannot work by design if you want well-synchronized data across devices. This also explains why the (local) timer approach does systematically not work.

Fun fact: An ISO peripheral from BT Core spec 5.4 is not aware of the SDU interval and can actually never transmit correctly (reliably) 8)

Another comment for the use of bt_iso_chan_get_tx_sync: When we use it like this here, we can use the current response to compare with the previous response. IIRC (and please correct me @rugeGerritsen) the return values of bt_iso_chan_get_tx_sync are only when an SDUs have been sent (or scheduled for transmission). In the case that the timestamp is too low or the sequence number is our of order, the return value of the current and previous should be the same.

e.g.

App send SDU_1 with TS_1 and PSN_1

App gets sent callback and gets TS_1 and PSN_1 from bt_iso_chan_get_tx_sync

App sends SDU_2 wit TS_1 and PSN_2 (or TS_2 and PSN_1)

Controller reject this as invalid (which not all controllers may do), but still triggers a Number of Completed Packets event

App gets sent callback and gets TS_1 and PSN_1 from bt_iso_chan_get_tx_sync (since it was not transmitted or scheduled for transmission)

App is now aware that the previous TX went wrong and could attempt to fix by increasing PSN or TS accordingly

In the case that the timestamp is too low or the sequence number is our of order, the return value of the current and previous should be the same.

Only when the Controller has received a SDU from Host, the returned Sequence Number and Timestamp correspond to the last transmitted or scheduled SDU are calculated.

If bt_iso_chan_get_tx_sync is called multiple times without a SDU enqueued, it will return same Sequence Number and Timestamp corresponding to the instant the previous bt_iso_chan_send was called.

Every bt_iso_chan_get_tx_sync after bt_iso_chan_send call will return the Sequence Number and Timestamp of the last transmitted or scheduled SDU. Hence, for the consecutive call to bt_iso_chan_send, the next Sequence Number (returned value plus one) and next Timestamp (returned value plus one SDU interval) be provided to bt_iso_chan_send_ts.

If bt_iso_chan_get_tx_sync followed by bt_iso_chan_send_ts is called in the sent-callback, then the returned Sequence Number and Timestamp could be stale depending on the SDU intervals elapsed from the last call of bt_iso_chan_send_ts to when the current sent-callback executes. If sufficient SDU buffers into the future has been already provided to the Controller, then bt_iso_chan_get_tx_sync will continue to return future Sequence Number and Timestamp ensuring new SDU enqueued to the Controller are not stale.

I've tried to implement a drift compensation algorithm now that:

does no longer rely on the sent callback

does not rely on the exact point in time that controllers store the TX sync information (even if the sync timestamp lies in the future)

should continue to work if SDU interval != ISO interval

uses an algorithm that is able to calculate the SDU interval even on a peripheral

works with BT > 1 and FT > 1

tolerates a certain amount of package loss

compensates for initial non-standard delay of the second SDU (split controller ISOAL bug?)

uses seq_num only when sending the sdu (no more timestamp dependency)

For obvious reasons I can not compensate for TX timestamp specification ambiguities but as I have to rely on hardware-specific timing anyway as long as the bluetooth clock is not exposed by the API in some hardware-independent way, this is not a problem in practice as the split controller defines the timestamp in the same way as the Nordic SD controller AFAICS. Let's hope that by the time more hardware is being supported by the sample, this ambiguity will have been fixed.

An ISO peripheral from BT Core spec 5.4 is not aware of the SDU interval

Hrm, I think I found a way to derive the SDU interval from channel info that is available both, on the central and the peripheral:

Transport_Latency = CIG_Sync_Delay + FT x ISO_Interval - SDU_Interval, see Bluetooth Core, Version 5.4, Vol 6, Part G, Section 3.2.2.

Therefore:
SDU_Interval = CIG_Sync_Delay + FT x ISO_Interval - Transport_Latency

These parameters are available from the HCI LE CIS Established event.

cvinayak · 2024-05-13T04:25:27Z

Done.

Known issue (cf Discord), IMO to be fixed in a separate PR:

Just wondering why the peripheral's ACK does not trigger the sent-callback immediately. In the example almost a full ISO interval (~10 ms) elapses before the callback is called. This leaves very little time to prepare and send the next SDU. IMHO it would be nicer if one had time to prepare the next SDU triggered by the sent-callback. Is this hard to fix? I saw you already have a TODO for early ACK in the code.

If you don't like the surrounding changes (refactoring, reversal of payload length, etc.) just let me know and I'll remove this unrelated stuff.

Zephyr Controller follows a similar design (implementation choice) to ACL Sending Data for Sending ISO data too, refer to Bluetooth Core Specification Version 5.4, Vol 6, Part D, Section 6.1 Sending Data. In this design, the Number of Completed Packets is generated when corresponding buffer space is freed when acknowledged and/or flushed (in the case of CIS in the Zephyr Controller). But in the implementation of ISO ack/flush, the Zephyr Controller implementation reuses the code (control path) that is determining stale PDUs hence the Number of Completed Packets is delayed by one ISO Interval. An optimization is good to have, but not necessary, as the Specification states (Bluetooth Core Specification Version 5.4, Vol 4, Part E, Section 7.7.19 Number Of Completed Packets event):
While the Controller has HCI Data packets or HCI ISO Data packets in its buffer, it shall keep sending the HCI_Number_Of_Completed_Packets event to the Host at least periodically, until it finally reports that all the pending packets have been completed. The rate with which this event is sent is manufacturer specific.

Hence, an application should not rely on the rate at which the sent callback is called. Application will need to ensure there is enough buffers available/configured in the Controller to be able to provide the ISO data with corresponding sequence numbers for the future SDU intervals.

Zephyr Controller also needs that flush timeout is considered when determining the buffer number, as the implementation does not generate Number of Complete Packets until they are flushed under bad transmissions and NACK conditions. i.e for FT=2, Zephyr Controller needs 5 HCI ISO Data buffers.

Thalley

As others have pointed out, proper synchronization of ISO data over HCI as a generic host (i.e. not knowing what controller we use) is pretty hard (actually impossible, but can be done "good enough").

The use and requirements of both sequence numbers and timestamps are pretty vague / implementation specific in the core spec as per 5.4 unfortunately. Hopefully this will be improved in the future.

Some comments that need to be addressed

Thalley · 2024-05-13T07:08:08Z

samples/bluetooth/central_iso/src/main.c

 */
-static void iso_timer_timeout(struct k_work *work)
+static void iso_send_sdu(uint32_t seq_num, int32_t next_sdu_ts)


Suggested change

static void iso_send_sdu(uint32_t seq_num, int32_t next_sdu_ts)

static void iso_send_sdu(uint32_t seq_num, uint32_t next_sdu_ts, bool is_first_sdu)

I think this will make more sense. There's not really any reason to encode the seq_num and "is_first" into the same variable, and since -1 was only called one place this seems like a better solution to me.

Furthermore, the timestamp is a uint32_t, so for this to have worked the variable should otherwise have been a int64_t.

This approach is now obsolete, so should be resolved. I returned to a timer-based approach as recommended.

Thalley · 2024-05-13T07:08:30Z

samples/bluetooth/central_iso/src/main.c

 	static uint8_t buf_data[CONFIG_BT_ISO_TX_MTU];
-	static bool data_initialized;
+	static size_t len_to_send = ARRAY_SIZE(buf_data);


Suggested change

static size_t len_to_send = ARRAY_SIZE(buf_data);

static const size_t len_to_send = ARRAY_SIZE(buf_data);

len_to_send is now updated again, so this should be obsolete, please check if I'm not being confused here. ;-)

samples/bluetooth/central_iso/src/main.c

Thalley · 2024-05-13T07:13:55Z

samples/bluetooth/central_iso/src/main.c

 	.connected	= iso_connected,
+	.sent = iso_sent,
 	.disconnected	= iso_disconnected,


Suggested change

.connected = iso_connected,

.sent = iso_sent,

.disconnected = iso_disconnected,

.connected = iso_connected,

.disconnected = iso_disconnected,

.sent = iso_sent,

or

Suggested change

.connected = iso_connected,

.sent = iso_sent,

.disconnected = iso_disconnected,

.connected = iso_connected,

.disconnected = iso_disconnected,

.sent = iso_sent,

Personal opinion, but it kind of stood out with sent in the middle of the connected and disconnected callbacks, and also not using the same indentation scheme :D

The sent callback is no longer needed, so I returned to the previous code. This should therefore be obsolete.

fgrandel · 2024-05-13T17:55:30Z

An optimization is good to have, but not necessary, as the Specification states (Bluetooth Core Specification Version 5.4, Vol 4, Part E, Section 7.7.19 Number Of Completed Packets event):

Agreed. But there are good arguments why it should be improved regardless of what the spec allows (see my response to @rugeGerritsen above). This is required to cover up for weaknesses in the spec in other places to achieve a workable solution at all. Guess you agree that it is also not against the spec to improve latency on the sent-callback. Gaining one ISO interval should provide us with enough leeway so that the sent-callback latency is deterministic enough under all practical circumstances to synchronize ISO channels even if HCI works over reasonable remote links. Can you agree with that statement?

samples/bluetooth/central_iso/src/main.c

Thalley · 2024-05-13T19:10:28Z

samples/bluetooth/central_iso/src/main.c

+	struct bt_iso_tx_info iso_tx_info = {0};
+	int err;
+
+	err = bt_iso_chan_get_tx_sync(chan, &iso_tx_info);


Another comment for the use of bt_iso_chan_get_tx_sync: When we use it like this here, we can use the current response to compare with the previous response. IIRC (and please correct me @rugeGerritsen) the return values of bt_iso_chan_get_tx_sync are only when an SDUs have been sent (or scheduled for transmission). In the case that the timestamp is too low or the sequence number is our of order, the return value of the current and previous should be the same.

e.g.

App send SDU_1 with TS_1 and PSN_1

App gets sent callback and gets TS_1 and PSN_1 from bt_iso_chan_get_tx_sync

App sends SDU_2 wit TS_1 and PSN_2 (or TS_2 and PSN_1)

Controller reject this as invalid (which not all controllers may do), but still triggers a Number of Completed Packets event

App gets sent callback and gets TS_1 and PSN_1 from bt_iso_chan_get_tx_sync (since it was not transmitted or scheduled for transmission)

App is now aware that the previous TX went wrong and could attempt to fix by increasing PSN or TS accordingly

samples/bluetooth/central_iso/src/main.c

cvinayak · 2024-05-14T03:00:20Z

samples/bluetooth/central_iso/src/main.c

+	struct bt_iso_tx_info iso_tx_info = {0};
+	int err;
+
+	err = bt_iso_chan_get_tx_sync(chan, &iso_tx_info);


In the case that the timestamp is too low or the sequence number is our of order, the return value of the current and previous should be the same.

Only when the Controller has received a SDU from Host, the returned Sequence Number and Timestamp correspond to the last transmitted or scheduled SDU are calculated.

If bt_iso_chan_get_tx_sync is called multiple times without a SDU enqueued, it will return same Sequence Number and Timestamp corresponding to the instant the previous bt_iso_chan_send was called.

Every bt_iso_chan_get_tx_sync after bt_iso_chan_send call will return the Sequence Number and Timestamp of the last transmitted or scheduled SDU. Hence, for the consecutive call to bt_iso_chan_send, the next Sequence Number (returned value plus one) and next Timestamp (returned value plus one SDU interval) be provided to bt_iso_chan_send_ts.

If bt_iso_chan_get_tx_sync followed by bt_iso_chan_send_ts is called in the sent-callback, then the returned Sequence Number and Timestamp could be stale depending on the SDU intervals elapsed from the last call of bt_iso_chan_send_ts to when the current sent-callback executes. If sufficient SDU buffers into the future has been already provided to the Controller, then bt_iso_chan_get_tx_sync will continue to return future Sequence Number and Timestamp ensuring new SDU enqueued to the Controller are not stale.

cvinayak · 2024-05-14T03:19:31Z

samples/bluetooth/central_iso/src/main.c

 	}
+
+	iso_send_sdu(iso_tx_info.seq_num + 1, iso_tx_info.ts + INTERVAL_US);


Suggested change

iso_send_sdu(iso_tx_info.seq_num + 1, iso_tx_info.ts + INTERVAL_US);

iso_send_sdu(iso_tx_info.seq_num + 1U, iso_tx_info.ts + INTERVAL_US);

This approach has been reverted, so should be resolved.

cvinayak · 2024-05-14T03:36:33Z

samples/bluetooth/central_iso/src/main.c

+	if (unlikely(is_first_sdu)) {
+		ret = bt_iso_chan_send(&iso_chan, buf, seq_num);


If is_first_sdu, call bt_iso_chan_send once then followed by call to bt_iso_chan_get_tx_sync and bt_iso_chan_send_ts consecutively ((iso_tx.rtn + 1U) * 2U) times, so as to ensure the Controller has sufficient SDUs buffered before sent-callback under the case of retransmissions.

Give this a try under noisy conditions or when devices are far away. Let me know if the Sequence Number and Timestamp get back in sync when the radio condition get better, when devices move back close.

I believe, I now implemented an approach that should tolerate package loss. If it doesn't 100% I hope that my new approach has enough merit that it can be merged anyway. Should I experience problems with synchronization under package loss in practical usage in the future I'll open up an other PR if you don't mind.

Previously the BT_CTLR_CONN_ISO_SDU_LEN_MAX value defaulted to the "magic" number 251 which actually is the default MTU. We make this dependency explicit (and thereby also transparently configurable) by setting the default to BT_ISO_TX_MTU explicitly. This also reduces the probability of inadvertent configuration errors or inconsistencies. Signed-off-by: Florian Grandel <[email protected]>

Fixes a minor typo in the Zephyr BT controller configuration overlay. Signed-off-by: Florian Grandel <[email protected]>

Introduces reversed xmas-tree format to local variables before making changes to them for better readability of subsequent commits. Signed-off-by: Florian Grandel <[email protected]>

The change replaces magic numbers in the ISO CIG configuration by preprocessor definitions. Signed-off-by: Florian Grandel <[email protected]>

Replaces static constant variables by preprocessor definitions for improved readability and (minor) RAM usage improvement. Signed-off-by: Florian Grandel <[email protected]>

Remove CONFIG_BT_CTRL_SCAN_DATA_LEN_MAX: The sample does not employ extended scanning. Remove CONFIG_BT_CTLR_CONN_ISO_SDU_LEN_MAX: This setting defaults to BT_ISO_TX_MTU which is set in the controller-independent prj.conf file and should default to that value. Signed-off-by: Florian Grandel <[email protected]>

The sample used deferred work to schedule ISO SDUs. The deferred work API does not guarantee deterministic timing, though. This commit switches to kernel timers which allow for deterministic periodic or one-off timing as long as timers are re-scheduled in the timer callback which runs directly in ISR context. Signed-off-by: Florian Grandel <[email protected]>

This change introduces a drift compensation algorithm that keeps SDU preparation synchronized with the bluetooth clock in a role, HCI transport and controller agnostic way based on standard HCI commands. Signed-off-by: Florian Grandel <[email protected]>

This change increases the max SDU size to twice the available max PDU size which results in a burst number (BN) larger than one. The sample thereby demonstrates that the drift compensation algorithm can deal with BN > 1. Signed-off-by: Florian Grandel <[email protected]>

Setting the SDU latency parameter to a value larger than the SDU interval results in a flush timeout (FT) larger than one if at the same time the ISO policy is switched from "low latency" to "reliability". This demonstrates that the synchronization algorithm can deal with FT > 1. Signed-off-by: Florian Grandel <[email protected]>

rugeGerritsen · 2024-05-31T07:05:54Z

samples/bluetooth/central_iso/src/bt_clock_hal_nrf5.h

+ *
+ * @brief Bluetooth clock hardware abstraction for nRF5x
+ *
+ * @details Currently, there is no way to access the Bluetooth clock


As you have seen earlier, the timestamps in the HCI packets are defined by the controller.
For the Zephyr open source link layer, the timestamp maps directly to RTC0 ticks. For the SoftDevice Controller the timestamps also takes wrapping into account. Therefore, the approach taken here will only work until the RTC wraps (8 minutes). Also, the approach taken here will only work when run on a single-core 52 series device. That is, it will not work on the nRF53 app core.

Looking at this it looks like the only feasible way forward:

Remove reading out the controller clock

Never issue SDUs with timestamps.

Not use timers to trigger submission of SDUs, just push them to controller as quickly as possible.

The drawback will be that the total end-to-end latency becomes slightly longer.
If we want to optimize for reduced latency, controller-specific assumptions are needed.

I agree with @rugeGerritsen here

As the current BT Core spec works w.r.t. ISO over HCI, we can't really make any (good) assumptions, and the best we really can do is just to always enqueue at least 2 SDUs and TX as fast as possible.

There are no guarantees we can make without knowing the specific controller.

I've recently refactored TX for a similar sample: #73406

Effectively moving a timer-based approach to just sending as fast and much as possible, based on the available buffers.

Keep in mind that the buffer handling will also change as part of #72090

Agreed, see my comment re next steps.

Thalley

There are a lot of good changes in this PR, but not sure if some of the platform specific ones are good to have.

This is, after all, a sample and not a full application.

Thalley · 2024-05-31T07:57:46Z

samples/bluetooth/central_iso/prj.conf

+# Max 2 PDUs per SDU (BN = 2)
+CONFIG_BT_ISO_TX_MTU=494


Not sure I understand this.

CONFIG_BT_ISO_TX_MTU effectively reflects the maximum size of the SDUs. The SDUs can be any value below this, and thus increasing this size does not necessarily affect the number of PDUs per SDU.

The sample now counts up to 494 bytes, so two PDUs will effectively be required to transfer that SDU (as the MAX PDU size payload is limited in the sample to exactly half of this value). This is required to prove that timing will work even if SDU interval != ISO interval.

Thalley · 2024-05-31T07:59:52Z

samples/bluetooth/central_iso/src/main.c

+/* Currently only nRF5 hardware is supported by this sample. */
+#include "bt_clock_hal_nrf5.h"


At least in upstream. I don't think we should make too many assumptions of how samples may be used by other projects, and samples should not (without proper guards at least) assume or require any specific hardware.

Agreed, see my comment re next steps.

Thalley · 2024-05-31T08:00:21Z

samples/bluetooth/central_iso/src/main.c

-NET_BUF_POOL_FIXED_DEFINE(tx_pool, 1, BT_ISO_SDU_BUF_SIZE(CONFIG_BT_ISO_TX_MTU),
+
+static struct bt_iso_tx_info last_iso_tx_sync_info;
+static inline bool is_synchronized(void)


Suggested change

static inline bool is_synchronized(void)

static bool is_synchronized(void)

inlines are rarely an improvement, and should generally be left to the compiler

Thalley · 2024-05-31T08:02:32Z

subsys/bluetooth/controller/Kconfig

@@ -945,7 +945,7 @@ config BT_CTLR_CONN_ISO_SDU_LEN_MAX
 	int "Maximum Connected Isochronous Channel SDU Length"
 	depends on BT_CTLR_CONN_ISO
 	range 1 4095
-	default 251
+	default BT_ISO_TX_MTU


@cvinayak is BT_CTLR_CONN_ISO_SDU_LEN_MAX including the size of the bt_hci_iso_sdu_hdr/bt_hci_iso_sdu_ts_hdr headers?

No, this is SDU length (no HCI layer)

fgrandel · 2024-06-02T20:23:53Z

@Thalley @rugeGerritsen First of all: Thx again for your extensive and fast review. Highly appreciated!!

Before I try to fix remaining test failures let's see how and whether I can progress in a way that fits both, your needs and mine.

I totally agree with your feedback re hardware-specific code in the sample (and more or less expected it). OTOH sending data as fast as possible is really a boring solution that wouldn't motivate me to further spend my free time (as I obviously scratch my own itch here and the existing HW specific solution is totally fine for what we need).

Plus sending data as fast as possible and relying on (variable) backpressure and non-deterministic latency from the stack defeats the very idea of isochronous data sources.

Therefore I'd like to propose something different:

I've been thinking about a generic clock syntonization (aka drift compensation) framework for a long time already. See the linked resource for a definition of clock syntonization (as opposed to synchronization).

I already verified most elements of this approach within my (currently stalled) 802.15.4 TSCH protocol project. And I still believe that a general clock source (counter management/syntonization/synchronization) framework would really be a great and long overdue addition to Zephyr. And I might be able to make it useful in this case.

Basically I propose to buffer local timestamps until I get corresponding TX sync info via HCI. Once I get that info I can syntonize the local clock to the remote clock via a configurable syntonization strategy within the clocksource framework. Initially I'd use Zephyr's existing clock syntonization algorithm internally. But as the syntonization approach will be pluggable, something like a PID controller could be added any time and re-used across the system (as is required for more precise clock syntonization and quicker convergence, e.g. in PTP).

Assuming arbitrary latency between data preparation and data sending but limited jitter (which I'd expect to be a reasonable assumption, even in the HCI over UART case) such an approach would make the solution completely hardware / HCI agnostic and still optimize latency and (more importantly) provide stable isochronous timing of measurements across an arbitrary number of peripherals which is the very idea of isochronous channels.

The syntonization framework implementation would be kept locally to the network subsystem for the time being and would leverage the existing k_time (ns precision timestamp) framework already introduced there for network timestamps. The sample would only use the API and could therefore be simplified. Would that be a compromise acceptable upstream?

Thalley · 2024-06-03T08:54:09Z

I totally agree with your feedback re hardware-specific code in the sample (and more or less expected it). OTOH sending data as fast as possible is really a boring solution that wouldn't motivate me to further spend my free time (as I obviously scratch my own itch here and the existing HW specific solution is totally fine for what we need).

I think board specific behavior is something that fits in an application, but not necessarily a sample. If in a somewhat generic way that does not make the sample depend on it and does not make the code hard to follow, I would be OK with having board (or platform) dependent code here (so it's not an automatic rejection).

Plus sending data as fast as possible and relying on (variable) backpressure and non-deterministic latency from the stack defeats the very idea of isochronous data sources.

Therefore I'd like to propose something different:

I've been thinking about a generic clock syntonization (aka drift compensation) framework for a long time already. See the linked resource for a definition of clock syntonization (as opposed to synchronization).

I already verified most elements of this approach within my (currently stalled) 802.15.4 TSCH protocol project. And I still believe that a general clock source (counter management/syntonization/synchronization) framework would really be a great and long overdue addition to Zephyr. And I might be able to make it useful in this case.

Basically I propose to buffer local timestamps until I get corresponding TX sync info via HCI. Once I get that info I can syntonize the local clock to the remote clock via a configurable syntonization strategy within the clocksource framework. Initially I'd use Zephyr's existing clock syntonization algorithm internally. But as the syntonization approach will be pluggable, something like a PID controller could be added any time and re-used across the system (as is required for more precise clock syntonization and quicker convergence, e.g. in PTP).

Assuming arbitrary latency between data preparation and data sending but limited jitter (which I'd expect to be a reasonable assumption, even in the HCI over UART case) such an approach would make the solution completely hardware / HCI agnostic and still optimize latency and (more importantly) provide stable isochronous timing of measurements across an arbitrary number of peripherals which is the very idea of isochronous channels.

The syntonization framework implementation would be kept locally to the network subsystem for the time being and would leverage the existing k_time (ns precision timestamp) framework already introduced there for network timestamps. The sample would only use the API and could therefore be simplified. Would that be a compromise acceptable upstream?

That sounds like a very interesting project, and could be a nice way of doing this.
However, the fact of the matter is that HCi does not provide any guarantees whatsoever with any timing.

The timestamps generated by the controller is basically "whatever it wants":

In my eyes, the main issue is that you cannot do anything "good" (e.g. proper use of timestamps) as a generic host. ISO over HCI is honestly just pretty terribly designed for generic hosts such as Zephyr. If* (and that's a big if) we want the sample here to assume anything about a controller, it should be based on the Zephyr controller.

If you want to go the road of improving and optimizing TX, then we should add this sample to a babblesim tests in CI, and run tests there that can verify that such optimizations work, and will also ensure that they keep working by adding some pass criteria

fgrandel · 2024-06-17T08:51:05Z

@Thalley @rugeGerritsen I have to apologize again for being inactive for such a long time. I've not given up on this ticket and will work on it again as soon as I've time.

github-actions · 2024-08-17T00:29:42Z

This pull request has been marked as stale because it has been open (more than) 60 days with no activity. Remove the stale label or add a comment saying that you would like to have the label removed otherwise this pull request will automatically be closed in 14 days. Note, that you can always re-open a closed pull request at any time.

fgrandel added platform: nRF Nordic nRFx area: Bluetooth ISO Bluetooth LE Isochronous Channels labels May 11, 2024

fgrandel requested review from Thalley and cvinayak May 11, 2024 15:36

fgrandel self-assigned this May 11, 2024

fgrandel force-pushed the fix/72452-ble-iso-sample-sync branch from fb95936 to 916630e Compare May 11, 2024 15:52

fgrandel added the area: Bluetooth label May 11, 2024

fgrandel force-pushed the fix/72452-ble-iso-sample-sync branch 2 times, most recently from d269363 to 84989c5 Compare May 11, 2024 16:35

fgrandel marked this pull request as ready for review May 11, 2024 16:36

zephyrbot added the area: Samples Samples label May 11, 2024

zephyrbot requested review from asbjornsabo, hermabe, jhedberg, kartben, nashif, sjanc and Vudentz May 11, 2024 16:37

fgrandel force-pushed the fix/72452-ble-iso-sample-sync branch from 84989c5 to 723196f Compare May 11, 2024 16:43

fgrandel marked this pull request as draft May 12, 2024 08:08

fgrandel force-pushed the fix/72452-ble-iso-sample-sync branch from 723196f to 12f1ef9 Compare May 12, 2024 15:43

fgrandel marked this pull request as ready for review May 12, 2024 15:49

rugeGerritsen reviewed May 12, 2024

View reviewed changes

Thalley requested changes May 13, 2024

View reviewed changes

Thalley reviewed May 13, 2024

View reviewed changes

cvinayak reviewed May 14, 2024

View reviewed changes

zephyrbot requested a review from thoh-ot May 30, 2024 15:59

fgrandel requested review from Thalley, rugeGerritsen and cvinayak May 30, 2024 16:29

fgrandel force-pushed the fix/72452-ble-iso-sample-sync branch 4 times, most recently from 74a1bce to eb8f0b6 Compare May 31, 2024 06:24

fgrandel and others added 10 commits May 31, 2024 08:29

samples: bluetooth: central_iso: fix minor typo

16cdf93

Fixes a minor typo in the Zephyr BT controller configuration overlay. Signed-off-by: Florian Grandel <[email protected]>

samples: bluetooth: central_iso: reversed xmas-tree

81b205f

Introduces reversed xmas-tree format to local variables before making changes to them for better readability of subsequent commits. Signed-off-by: Florian Grandel <[email protected]>

samples: bluetooth: central_iso: replace magic numbers

ebced06

The change replaces magic numbers in the ISO CIG configuration by preprocessor definitions. Signed-off-by: Florian Grandel <[email protected]>

samples: bluetooth: central_iso: remove unnecessary vars

c4eb456

Replaces static constant variables by preprocessor definitions for improved readability and (minor) RAM usage improvement. Signed-off-by: Florian Grandel <[email protected]>

fgrandel force-pushed the fix/72452-ble-iso-sample-sync branch from eb8f0b6 to a2f74de Compare May 31, 2024 06:29

rugeGerritsen reviewed May 31, 2024

View reviewed changes

Thalley requested changes May 31, 2024

View reviewed changes

github-actions bot added the Stale label Aug 17, 2024

github-actions bot closed this Aug 31, 2024

fgrandel reopened this Sep 1, 2024

fgrandel removed the Stale label Sep 1, 2024

kruithofa removed their request for review September 3, 2024 05:40

	static void iso_send_sdu(uint32_t seq_num, int32_t next_sdu_ts)
	static void iso_send_sdu(uint32_t seq_num, uint32_t next_sdu_ts, bool is_first_sdu)

	static size_t len_to_send = ARRAY_SIZE(buf_data);
	static const size_t len_to_send = ARRAY_SIZE(buf_data);

		}

		iso_send_sdu(iso_tx_info.seq_num + 1, iso_tx_info.ts + INTERVAL_US);

	iso_send_sdu(iso_tx_info.seq_num + 1, iso_tx_info.ts + INTERVAL_US);
	iso_send_sdu(iso_tx_info.seq_num + 1U, iso_tx_info.ts + INTERVAL_US);

		if (unlikely(is_first_sdu)) {
		ret = bt_iso_chan_send(&iso_chan, buf, seq_num);

		/* Currently only nRF5 hardware is supported by this sample. */
		#include "bt_clock_hal_nrf5.h"

	static inline bool is_synchronized(void)
	static bool is_synchronized(void)

samples: bluetooth: central_iso: fix synchronization of data submission #72622

Are you sure you want to change the base?

samples: bluetooth: central_iso: fix synchronization of data submission #72622

Conversation

fgrandel commented May 11, 2024 • edited Loading

fgrandel commented May 12, 2024

fgrandel commented May 12, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cvinayak May 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cvinayak commented May 13, 2024 • edited Loading

Thalley left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fgrandel commented May 13, 2024

Choose a reason for hiding this comment

cvinayak May 14, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Thalley left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

fgrandel commented Jun 2, 2024 • edited Loading

Thalley commented Jun 3, 2024

fgrandel commented Jun 17, 2024

github-actions bot commented Aug 17, 2024

fgrandel commented May 11, 2024 •

edited

Loading

fgrandel commented May 12, 2024 •

edited

Loading

cvinayak May 14, 2024 •

edited

Loading

cvinayak commented May 13, 2024 •

edited

Loading

cvinayak May 14, 2024 •

edited

Loading

fgrandel commented Jun 2, 2024 •

edited

Loading