[RFC] Proposal for a Precision Clock Subsystem Architecture #76335

fgrandel · 2024-07-26T14:31:57Z

This RFC consolidates ideas from prior RFCs, PRs, standards (PTP, NTP, BLE, IEEE 802.15.4, ...), Discord threads, source code (Zephyr, Linux Kernel, linuxptp, ...) and other sources maintained since 06/19.

TLDR;

This RFC is work-in-progress to specify concepts and APIs for an abstract precision clock subsystem architecture in Zephyr.

The following layers are specified:

uptime counter layer: tick-based overflow protected uptime counters with optional power optimization (wake/sleep counter integration) as thin API wrappers around existing counter/clock drivers like system uptime counter, timer peripherals, radio counters, etc.
high-precision syntonization layer: monotonic and continuous syntonized uptime references, divided into precision timestamp services (RTC, PPS, dedicated PTP ETH peripherals, radio timestamping, etc.) and syntonization algorithm components (e.g. algorithms for leap second/drift smearing or interpolation)
abstract timescale/clock layer: representing distributed timescales and offsets (epochs) like UTC, TAI, etc.
client adapters for POSIX, clib, specific applications or protocols, etc.

While consolidating input from a large array of existing sources, this RFC adapts them to RTOS-specific requirements like low-power, resource limitations and a special focus on synchronization via RF protocols like BLE or IEEE 802.15.4. Care has been taken that the RFC can be implemented in small individual steps that contribute useful features on their own.

Concepts Model: Integration with existing Zephyr concepts and newly proposed concepts

Note: The model is a conceptual overview and wip. Attribute-level API details, references to existing SoCs, peripherals, driver APIs and protocols are preliminary. Black color marks existing APIs, red color proposed new APIs.

All references to ktime_t correspond to the current concept of net_time_t which has been introduced into the net subsystem as a precursor of a generic nanosecond time representation throughout Zephyr.

Proposed Architecture

Uptime Counter Layer

Overflow-protected uptime counters, optionally implemented as hybrid low-power counters based on separate wake and sleep counter peripherals. They are thin wrappers around e.g.:

sysclock: timer.c/timeout.c - default OS uptime ticks or cycles with additional overflow protection in case of 32-bit implementations (as sleep/awake uptime counter source, optionally as low-level timer/alarm source)
any counter.h-compatible driver with additional overflow protection (as arbitrary precision sleep or awake uptime counter source independent from the kernel system clock, optionally as low-level alarm source)
any rtc.h-compatible driver, usually with a tick of 1 s (as sleep uptime counter source), not to be confused with RTCs to be used as timestamping source in the syntonization layer
legacy RTC drivers or arbitrary hardware counter peripherals directly (see drivers/timer/nrf_rtc_timer.h, drivers/rtc/*, as sleep uptime counter source), access to CPU instruction counters
not to be confused with Linux clock sources which are neither overflow protected nor hybrid counters optimized for low-power applications
similar to the combination of a suspend and wake time clock source in Linux clocksource.c or the combination of the TSC/ART timers in Intel architectures. We do NOT bind to a single system clock, though, nor do we bind to a specific vendor's hardware.
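
As an illustration of what "overflow protection" could mean for a narrow hardware counter, here is a minimal sketch. All names (`struct uptime_counter`, `uptime_counter_init`, `uptime_counter_update`) are hypothetical and not existing Zephyr APIs. It assumes the raw counter is read at least once per wrap period and that callers serialize access (e.g. from a single ISR or under a lock).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper state; not an existing Zephyr structure. */
struct uptime_counter {
    uint32_t last_raw; /* last raw value read from the hardware counter */
    uint64_t high;     /* accumulated wraps, already scaled by 2^width */
    uint8_t width;     /* hardware counter width in bits, 1..32 */
};

static void uptime_counter_init(struct uptime_counter *uc, uint8_t width,
                                uint32_t raw_now)
{
    assert(width >= 1 && width <= 32);
    uc->width = width;
    uc->last_raw = raw_now;
    uc->high = 0;
}

/*
 * Extend a raw reading to 64 bits. A raw value smaller than the previous
 * one is interpreted as exactly one wrap, which only holds if this
 * function is called at least once per wrap period.
 */
static uint64_t uptime_counter_update(struct uptime_counter *uc,
                                      uint32_t raw_now)
{
    if (raw_now < uc->last_raw) {
        uc->high += (uint64_t)1 << uc->width;
    }
    uc->last_raw = raw_now;
    return uc->high + raw_now;
}
```

A real wrapper would typically hook the counter's overflow interrupt instead of relying on polling; the polling variant is shown only because it is the shortest form of the idea.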

Syntonization Layer

Monotonic and continuous syntonized uptime references:

This layer consists of two separate components:

hardware-assisted timestamping services that assign (sub-)ns-precision timestamps to specific ticks of a (class of) specific layer-1 uptime counters based on timing event sources (e.g. ETH or transceiver timestamps, PPS, PTP, NTP, RTC, GPS, ...) in regular intervals, triggered by inbound timing events (pulses, net packets) or on demand.
syntonization algorithms that discipline (drift compensate) local uptime counters and interpolate between subsequent timestamped counter ticks to maintain monotonicity and continuity of uptime references.
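
The interplay of the two components can be sketched as follows: a timestamping service delivers pairs of (counter tick, reference time), and the simplest conceivable syntonization algorithm interpolates linearly between the two most recent pairs. The names `struct tick_timestamp` and `synt_tick_to_ns` are assumptions made for this example only. The sketch ignores monotonicity across updates and assumes intervals short enough that the 64-bit intermediate product does not overflow.

```c
#include <assert.h>
#include <stdint.h>

/* One timestamped counter tick as delivered by a timestamping service. */
struct tick_timestamp {
    uint64_t tick; /* overflow-protected layer-1 counter value */
    int64_t ns;    /* reference time assigned to that tick */
};

/*
 * Map an arbitrary tick to reference time by linear interpolation
 * (or extrapolation) over the interval spanned by a and b.
 */
static int64_t synt_tick_to_ns(const struct tick_timestamp *a,
                               const struct tick_timestamp *b,
                               uint64_t tick)
{
    assert(b->tick > a->tick);

    int64_t dticks = (int64_t)(b->tick - a->tick);
    int64_t dns = b->ns - a->ns;
    int64_t off = (int64_t)(tick - a->tick);

    return a->ns + (off * dns) / dticks;
}
```

With a 32768 Hz counter whose second timestamp arrives 10 µs later than nominal, every conversion is stretched proportionally, which is the drift compensation described above in its most basic form.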

Remarks:

One (class of) uptime counters can be timestamped by any number of compatible timing event services. As timestamping services require direct access to counter peripherals for precision timestamping, a single timing event service cannot be combined with several (classes of) uptime counters, though (i.e. this is a one-to-many relationship).
The monotonicity and/or continuity requirements can be relaxed for certain application areas to simplify implementation of uptime references. Sometimes it may be enough to just reset the reference on every timing event when continuity is not required.
some counter peripherals implement syntonization features in hardware (e.g. PTP-aware ethernet peripherals). In these cases, the uptime reference might be strongly coupled to a specific uptime counter.
separate system and network uptime references may be defined and chosen (via DT and/or Kconfig)
an optional low-latency timer multiplexing framework based on syntonized uptime (timer multiplexing similar to hrtimer) may be derived as a basis for high-level POSIX timers (see layers 3 and 4)
The system timeout.c/timer.c infrastructure could be made instantiable to derive multiplexed ns-precision uptime reference timers from uptime counters, see one attempt in [RFC] abstract clock subsystem: counter syntonization #60400

Examples:

The "syntonization part" of phc2sys from PTP
the Time Utilities API gives an example of a generic hardware-agnostic syntonization algorithm
Syntonization algorithms can be arbitrarily complex and must be pluggable as there is no "one best" syntonization algorithm, see D. Mills' "Kernel Model for Precision Timekeeping" (RFC 1589) as implemented in several *nix systems, a recent gPTP PR trying to introduce a PI synchronization controller or leap second smearing
the Zephyr Timing API could benefit from syntonized clocks for precise (e.g. PTP based) timings.
possible reference clocks in Zephyr are network reference clocks (gPTP, drift-compensated peripheral BLE ticker, TSCH TDMA, CSL) - similar to dynamic clock sources in Linux - or local PPS-type peripherals (rtc.h or a future pps.h for GPS, serial lines or even miniature local atomic clock peripherals)
synchronization of several counters will be required in the (g)PTP subsystem, when multiple PTP ports (different media, different peripherals, ...) of the same PTP instance need to share a common local clock with high precision, e.g. to implement PTP Relay Instances (IEEE 802.1AS-2020) or Boundary Clocks/Transparent Clocks (IEEE 1588).
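
To make the notion of a pluggable syntonization algorithm more tangible, here is a deliberately small integer PI controller of the kind mentioned above. It is a sketch, not the controller from the referenced gPTP PR or from linuxptp. The names `struct pi_servo` and `pi_servo_sample` are hypothetical. It assumes that offsets are measured once per second (so an offset in ns per interval can be read as ppb), that offset means "reference minus local", and that gains are negative powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical PI servo state; gains are 2^-kp_shift and 2^-ki_shift. */
struct pi_servo {
    int64_t drift_ppb; /* integral term: estimated frequency error */
    uint8_t kp_shift;
    uint8_t ki_shift;
};

/*
 * Feed one measured offset (reference minus local, in ns) and return the
 * frequency correction in ppb to apply to the local uptime reference.
 */
static int64_t pi_servo_sample(struct pi_servo *s, int64_t offset_ns)
{
    assert(s->kp_shift < 63 && s->ki_shift < 63);

    /* Division instead of >> keeps behavior defined for negative offsets. */
    s->drift_ppb += offset_ns / ((int64_t)1 << s->ki_shift);
    return offset_ns / ((int64_t)1 << s->kp_shift) + s->drift_ppb;
}
```

Because the algorithm only consumes offsets and produces a frequency correction, it could be swapped for a different strategy (e.g. smearing or plain interpolation) without touching counters or timestamping services.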

Abstract Timescale/clock Layer

Timescale wrappers for the layer 2 system uptime reference source implementing a common timescale API:

A timescale (or clock in the sense of this RFC) is a syntonized uptime reference additionally offset by a well-defined epoch, thereby defining a common "zero point" for time.
Offsetting and adjusting the chosen layer 2 syntonized system uptime reference to appropriate timescales (MONOTONIC, REAL/UTC, TAI, ...) may be implemented as a collection of stateless and stateful utility algorithms. This is called "synchronization". While syntonization only guarantees that clocks tick at the same frequency, synchronization ensures that the common epoch (time offset) is kept the same across different representations of the same clock, i.e. they "show the same date and time".
Offsetting is usually not persistent across power cycles.
Power-on offsets may come from an external network source (PTP, NTP, ...) or from a local battery driven clock.
adjustment may be complex, examples: NTP, the "initial offsetting" part of phc2sys from PTP, clock_adjtime(2)
Whether adjustments are to be implemented in the syntonization or synchronization layer depends on whether the offset is to be changed (synchronization) or whether the frequency needs to be controlled (syntonization).
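
The distinction between the two layers can be sketched in a few lines: a timescale only stores an epoch offset on top of a syntonized uptime, and "synchronization" only ever changes that offset. The type and function names (`struct timescale`, `timescale_from_uptime`, `timescale_sync`, `timescale_utc_from_tai`) are assumptions for illustration. The TAI-UTC difference is passed in by the caller, as it changes with each leap second.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stateless timescale: syntonized uptime plus epoch offset. */
struct timescale {
    int64_t epoch_offset_ns; /* timescale time at uptime 0 */
};

static int64_t timescale_from_uptime(const struct timescale *ts,
                                     int64_t uptime_ns)
{
    return uptime_ns + ts->epoch_offset_ns;
}

/* Synchronization: choose the offset so that uptime_ns maps to ref_ns. */
static void timescale_sync(struct timescale *ts, int64_t uptime_ns,
                           int64_t ref_ns)
{
    ts->epoch_offset_ns = ref_ns - uptime_ns;
}

/* Derive a UTC timescale from a TAI timescale on the same uptime reference. */
static struct timescale timescale_utc_from_tai(const struct timescale *tai,
                                               int32_t tai_minus_utc_s)
{
    assert(tai_minus_utc_s >= 0);

    struct timescale utc = {
        .epoch_offset_ns = tai->epoch_offset_ns -
                           (int64_t)tai_minus_utc_s * 1000000000LL,
    };
    return utc;
}
```

Frequency never appears in this sketch: it is handled entirely by the syntonized uptime that is passed in, which is the separation of concerns described above.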

Client Adapter Layer

Additional POSIX / clib / calendar / timezone clients that use timescaled clocks exposed by the clock subsystem to provide standard APIs for POSIX/libc clock access or implement high-level calendar and timezone support (e.g. CLOCK_REALTIME or CLOCK_MONOTONIC). The clock subsystem must provide a sufficiently capable default system uptime reference and a minimal choice of default timescales based on the kernel system clock that is guaranteed to be present on all Zephyr systems.

These architectural layers can be implemented one by one from low-level to high-level. Each layer will immediately provide value for specific applications without the higher levels being present. It is sufficient to provide basic algorithms for each layer initially as long as we ensure that the architecture is extensible enough to cater for more complex algorithms if needed later on.

Specifics of an "embedded" clock subsystem for Zephyr

Clock Diversity and Decentralization

Linux typically centers its notion of time around a single system clock (with only one or two underlying clock sources). Other notions of time are mostly derivatives that inherit basic properties of the system clock source (resolution, precision, energy consumption, continuity during low-power modes, etc). While independent ("dynamic") clocks and alternative synchronization approaches (PPS, PTP) may exist, they are rather hard to combine and synchronize unless referred to the common system clock.

Such a "centralized" approach does not seem right for an embedded real time OS where several clock stacks with diverse properties and distinct trade offs typically need to co-exist on an equal basis:

resolution/precision vs. low-power
"hard" realtime vs. "soft" scheduling
different levels of clock interrupt priority
clock peripherals distributed over distinct power domains or across network interfaces
arbitrary collections of on-soc, on-board and remote clock peripherals need to be kept in sync or isolated
precise distributed clocks independent from the OS clock are at the core of embedded time sensitive / low-power networking and control applications

We propose an architecture where any number of independent clock stacks can be assembled from basic building blocks and co-exist on an equal basis. Any of these can be chosen to provide features of a traditional OS system clock. But the clock behind POSIX/libc APIs should no longer necessarily be the same as the clock behind kernel scheduling. Both should be configurable independently of each other at build time.

All configured clock stacks remain independently accessible, configurable and "synchronizable". There can be any number of concurrent clocks for subsystems like slotted automation control networks, BLE, 5G, PTP and TSCH. Whether and which of these clocks are synchronized and which should remain isolated should be constrained by configuration, not by the architecture.

Optimized for precision, constrained power, computation and memory resources

On an embedded system, offloading of timing and scheduling to dedicated peripherals is important:

Traditional OS clocks require CPU intervention on timing critical paths. Nanosecond precision clock syntonization and scheduling can only be achieved through specialized hardware, drivers and fixed-latency ISRs.
CPU and OS involvement must be minimized to save power, memory/stack and computing resources
Easy and frequent switching between different power modes must be supported
Combining configurability with modularity allows us to create highly diverse and specialized minimal firmware runtime bundles. Offloading of configuration and compute tasks to the build process keeps the firmware small and fast.

Modularization into basic building blocks is key

The basic idea of the architecture is to specify re-usable building blocks that can be combined among each other to form independent clock stacks. "Pluggability" exists at all levels: uptime counters, timing event services, syntonization strategies, derived clocks and user or library clients:

partial clock stacks can be configured and used if the application does not require a full clock stack (applications may require only low-power, overflow protected uptime counters, others need syntonization but no synchronization, many will not need POSIX/libc)
the same clock source can be re-used in several uptime counters (e.g. a single always-on clock can be used to maintain sleeptime continuity of any number of independent high resolution network interface timers)
re-usable uptime counters can be based on any number and combination of pre-existing drivers (rtc, counter, system clock, etc.) or directly on hardware peripherals
low-power continuity strategies (based on low-level peripherals or pps-style drivers) can be re-used across uptime counters
a single physical uptime counter can be logically syntonized/synchronized to different timing event sources
syntonization algorithms can be re-used across counters
any syntonized clock can be used as a basis for any number of higher-level timescales
the same timescale can be computed on distinct syntonized clocks
Zephyr-specific configuration, power management and debugging approaches are natively supported by all components.
We prune from the build what is not required by the application. No common infrastructure is needed.
Several variants of the same application can be built with different clock stacks and timing/hardware profiles based on build-time configuration (*.prj, devicetree, Kconfig, run-time assembled stacks).
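
One possible way to express such building blocks in C is a set of small operation tables that are combined into a clock stack. The following sketch is purely illustrative: `struct uptime_counter_ops`, `struct synt_ops`, `struct clock_stack` and the two example implementations are hypothetical names, and a real design would more likely bind the pieces at build time via devicetree and Kconfig than via run-time pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layer 1: overflow-protected uptime counter. */
struct uptime_counter_ops {
    uint64_t (*get_ticks)(const void *ctx);
};

/* Layer 2: syntonization strategy converting ticks to syntonized ns. */
struct synt_ops {
    int64_t (*ticks_to_ns)(const void *ctx, uint64_t ticks);
};

/* A clock stack combines one counter, one strategy and an epoch offset. */
struct clock_stack {
    const struct uptime_counter_ops *counter;
    const void *counter_ctx;
    const struct synt_ops *synt;
    const void *synt_ctx;
    int64_t epoch_offset_ns; /* layer 3; 0 yields plain uptime */
};

static int64_t clock_stack_now(const struct clock_stack *cs)
{
    assert(cs->counter != NULL && cs->synt != NULL);

    uint64_t ticks = cs->counter->get_ticks(cs->counter_ctx);
    return cs->synt->ticks_to_ns(cs->synt_ctx, ticks) + cs->epoch_offset_ns;
}

/* Example blocks: a fake counter and an undisciplined fixed-rate strategy. */
static uint64_t fake_get_ticks(const void *ctx)
{
    return *(const uint64_t *)ctx;
}

static int64_t fixed_rate_to_ns(const void *ctx, uint64_t ticks)
{
    uint32_t hz = *(const uint32_t *)ctx;

    return (int64_t)((ticks / hz) * 1000000000ULL +
                     (ticks % hz) * 1000000000ULL / hz);
}
```

Two stacks can share the same counter and strategy and differ only in their epoch offset, which mirrors the statement that the same syntonized clock can serve as a basis for several timescales.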

Need for direct access to counter values underlying clock sources

For precise timing when dealing with low-energy peripherals, synchronized wireless protocols or synchronized real-time actors in distributed systems, it is often not acceptable to let the system CPU interfere with alarms or do the scheduling. That excludes ISR-based alarm/timer callbacks even if they are constant-latency. Pre-programming triggers with specific counter values in advance is usually required.

For the same reason it is usually not acceptable to work with approximate time values for hard realtime requirements. Time representations from all layers must be deterministically convertible to low-level counter values. All conversions must be implemented inside the clock subsystem to protect applications from conversion error or unintended abuse of "pseudo-precise" nanosecond timestamps.

The following use cases must be supported in the proposed architecture:

Applications requiring hard realtime or very high resolution timing must be able to deterministically pre-calculate precise (opaque) low-level counter values based on the well-defined nanosecond representation of time in syntonized or timescaled clocks and inject them into hardware in a driver-specific way for scheduled RX/TX. The radio timer (RAT) for example can then be programmed w/o IEEE 802.15.4/BLE L2 having to care about the details of conversion between external time sources, local low-energy counter and fast radio counter.
A cross-counter nano- or microsecond precision and overflow protected uptime abstraction above counter peripherals and convertible w/o loss to/from timescaled representations is required to convert between counters or relate low-level counter values to the reference clock w/o loss of precision (i.e. syntonization error). Nanosecond values encoded as int64_t (aka net_time_t) on level 2 (scalar syntonized uptime) or 64bit struct timespec on level 3 (timescaled values) are adequate for this purpose within the clock subsystem to avoid dependency on higher level concepts (e.g. POSIX timeval).
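
A minimal sketch of what "deterministically convertible" could look like for a fixed-frequency counter is given below. The helper names (`clk_ticks_to_ns`, `clk_ns_to_ticks_ceil`) and the macro `CLK_NSEC_PER_SEC` are made up for this example. The conversion is pure integer arithmetic, splits off whole seconds to avoid 64-bit overflow, and rounds in opposite directions so that a tick value survives a round trip unchanged as long as the counter frequency is below 1 GHz.

```c
#include <assert.h>
#include <stdint.h>

#define CLK_NSEC_PER_SEC 1000000000ULL /* hypothetical local macro */

/* Ticks to ns, rounding down; exact for whole seconds. */
static uint64_t clk_ticks_to_ns(uint64_t ticks, uint32_t freq_hz)
{
    assert(freq_hz > 0);

    uint64_t sec = ticks / freq_hz;
    uint64_t rem = ticks % freq_hz;

    return sec * CLK_NSEC_PER_SEC + (rem * CLK_NSEC_PER_SEC) / freq_hz;
}

/* ns to ticks, rounding up: the first tick at or after the given time. */
static uint64_t clk_ns_to_ticks_ceil(uint64_t ns, uint32_t freq_hz)
{
    assert(freq_hz > 0);

    uint64_t sec = ns / CLK_NSEC_PER_SEC;
    uint64_t rem = ns % CLK_NSEC_PER_SEC;

    return sec * freq_hz +
           (rem * freq_hz + CLK_NSEC_PER_SEC - 1) / CLK_NSEC_PER_SEC;
}
```

Keeping both directions inside the clock subsystem means the rounding policy is defined once, instead of being re-invented (and possibly inverted) by each driver or L2 implementation.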

Examples of where such hardware support is required are timed RX/TX as in CSL (as used in the Thread protocol) or #50336, see IEEE 802.15.4-2020, sections 6.12.2 and 6.5.4, plus other IEEE 802.15.4 features like synchronized PANs, RIT, and so on.

Case Study: TSCH TDMA protocol operation

TSCH is a TDMA protocol that defines cyclic timeslots at a fixed frequency (e.g. 10ms / 100Hz). Inside a timeslot high resolution timing is required to schedule TX packets and reception windows including ACKs at precise moments in time.

TSCH uses aspects of the slotted and time based architectures mentioned above.

The requirements are more specifically:

timeslot synchronization uptime counter:
- local clock syntonized to the global, distributed TSCH clock via one or more remote time synchronization neighbors
- (optionally) distinct from the system uptime clock to allow for concurrent NTP/PTP syntonization
- low frequency/low resolution
- always on
- low power
- must not wrap
- requires a guard period to protect against late alarms similar to counter.h
high-resolution intra-timeslot radio counter:
- PPS-style syntonization with timeslot uptime at the beginning of each timeslot
- autonomous/non-adjusted local high-frequency oscillator (typically >= 1MHz)
- high frequency/high resolution
- may be stopped (sleep) and started (awake)
- may wrap, therefore requires active overflow protection
- requires a guard period to protect against late alarms similar to counter.h
- provides pre-calculated counter values for hardware assisted scheduling of timed RX/TX
hybrid network subsystem uptime counter: The fast radio timer can be switched off to save power, so it must be resynchronized to the slower low-power clock when restarted. This requires hardware support: The timestamp of the fast radio timer must be captured at well defined edges of the low-power clock relating the two clocks with high precision and very low, deterministic jitter (see Linux PPS or BeagleBone GPIO PPS generator for comparison). The PPS pattern hides hardware-specific implementation details of synchronization, is vendor-agnostic and can therefore be re-used directly from L2 across L1 radio driver implementations. To target low-power devices, periodic PPS ticks are inadequate. A tickless (fetch-only) PPS is proposed for low-power systems. The two clocks together with access to a common PPS driver are the building blocks from which a high resolution, high precision, overflow protected hybrid radio uptime counter can be constructed.
syntonized TSCH uptime reference as network subsystem clock: The TSCH time synchronization protocol reports phase deviations of the hybrid radio counter from timesource neighbours. Precise hardware assisted timestamping of incoming and outgoing radio packets is required for syntonization. As these timestamps will be captured relative to the radio uptime counter, they may be used to discipline (syntonize) the radio uptime reference based on the radio uptime counter. Syntonization is based on algorithms that calculate a statistical approximation from each incoming/outgoing packet or keepalive message.
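
The hybrid counter and the guard period from the requirements above can be sketched together. The sketch assumes a capture pair latched by hardware at one edge of the low-power clock (the "tickless PPS" fetch) and then derives the fast radio counter value to pre-program for a target uptime. All names (`struct pps_capture`, `hybrid_radio_target`, `hybrid_alarm_is_safe`) are hypothetical, the frequencies used in the usage note are example values, and sub-nanosecond remainders are simply truncated.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define HYB_NSEC_PER_SEC 1000000000ULL /* hypothetical local macro */

/* Both counters latched by hardware at the same low-power clock edge. */
struct pps_capture {
    uint64_t slow_ticks; /* overflow-protected low-power counter value */
    uint64_t fast_ticks; /* overflow-protected radio counter value */
};

/* Radio counter value to program for an event at target_uptime_ns. */
static uint64_t hybrid_radio_target(const struct pps_capture *cap,
                                    uint32_t slow_hz, uint32_t fast_hz,
                                    uint64_t target_uptime_ns)
{
    uint64_t edge_ns =
        (cap->slow_ticks / slow_hz) * HYB_NSEC_PER_SEC +
        (cap->slow_ticks % slow_hz) * HYB_NSEC_PER_SEC / slow_hz;

    assert(target_uptime_ns >= edge_ns);

    uint64_t delta_ns = target_uptime_ns - edge_ns;
    uint64_t delta_fast =
        (delta_ns / HYB_NSEC_PER_SEC) * fast_hz +
        (delta_ns % HYB_NSEC_PER_SEC) * fast_hz / HYB_NSEC_PER_SEC;

    return cap->fast_ticks + delta_fast;
}

/* Guard period: reject alarms too close to "now" to be programmed in time. */
static bool hybrid_alarm_is_safe(uint64_t now_fast, uint64_t target_fast,
                                 uint64_t guard_ticks)
{
    return target_fast >= now_fast + guard_ticks;
}
```

For example, with a 32768 Hz sleep counter captured at exactly 10 s, a 4 MHz radio counter latched at 4000, and a TX scheduled 2120 µs into the timeslot, the radio counter would be programmed to 4000 + 8480 = 12480 ticks, provided the guard check passes.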

This same pattern of syntonized hybrid uptime counter and reference can be re-used with any timed token/cycled/slotted system like Profibus/Profinet or Sercos. These are often used in industrial real time environments (e.g. automation, robotics/motion control, etc.).

Potential for a system-wide "wall clock" in the presence of TDMA (slotted, cycled) protocols

Based on the proposed architecture the TSCH uptime reference (or Bluetooth uptime reference or that of any other TDMA/slotted protocol) could be made available as a "dynamic" distributed real-time reference w/o the need for higher level protocols (IP, NTP, PTP) or additional hardware (GPS modules or special ethernet cards).

The TDMA uptime reference can then be configured as the system-wide uptime reference with well defined precision and accuracy. To provide a distributed timescale an epoch plus other timescale parameters need to be agreed via an additional out-of-band channel (e.g. by exchanging a few proprietary network messages) between time neighbors.

The error (offset/jitter) of a TDMA uptime reference should be comparable to that of an NTP clock, AFAICS, certainly worse than PTP/GPS, but still good enough for many use cases. Some devices (especially those with ToF ranging capability) might be able to syntonize with much higher precision, maybe even in the range of (g)PTP which would enable PTP-style reference clock propagation across such wireless networks. In the TSCH case the clock syntonization hierarchy is similar to that of NTP strata. The closer a node is to the PAN coordinator, the "better" its accuracy (lower stratum).

Comparison with BLE air interface timing requirements

The concept of BLE's active clock (Vol 6, part B, section 4.2.1) is very similar to the requirements of intra-timeslot timing for TSCH.

The same similarity exists between BLE's sleep clock (ibid, section 4.2.2) and TSCH's timeslot synchronization clock.

Initial synchronization to a TSCH PAN is very similar to BLE's synchronization state and procedures (ibid, section 4.4.5).

The infrastructure to be developed SHOULD be fully compatible with Zephyr's existing BLE split controller's counter HAL and ticker.c including timer multiplexing and slot reservation, so that IEEE 802.15.4, gPTP and BLE subsystems can hopefully re-use the same basic precision timing framework for their respective scheduling purposes.

Potential Future Use Case: Channel Sounding / RTLS

Both FiRa/UWB and BLE channel sounding will require precision timing. It would be nice if we could expose these highly accurate timing sources to applications. It might also turn out that the framework proposed here is already a good base or can easily be extended for timing in real time locating use cases.

Originally posted by @fgrandel in #19030 (comment)


fgrandel self-assigned this Jul 26, 2024

This was referenced Jul 26, 2024

[RFC] Abstract Clock Subsystem Architecture #19030

Closed

roll-up issue for timer/clock enhancements #19282

Closed

fgrandel changed the title ~~[RFC] Proposed Abstract Precision Clock Subsystem Architecture (Syntonization, Timescale, Posix Clocks, etc.)~~ [RFC] Proposal for a Precision Clock Subsystem Architecture Jul 26, 2024

fgrandel mentioned this issue Sep 22, 2024

tests: kernel: timer: Fix failing tests for custom k_busy_wait() #73068

Open
