Add timestamp field to Update message #121

Open
earies wants to merge 1 commit into master

Conversation

@earies commented Mar 30, 2022

Currently, the specification attaches a timestamp field to the
Notification message, and a Notification message can contain one
or more Update messages. The timestamp field is defined as
follows:

The time at which the data was collected by the device from the
underlying source, or the time that the target generated the
Notification message (in the case that the data does not reflect an
underlying data source). This value is always represented according to
the definition in 2.2.1.

In various pipelines, many data sources can contribute to the data that
is packed into a single Notification message; however, the timestamp can
only represent the coalesced view. An example is data sourced from a
distributed system (e.g. linecards), where hardware timestamps can represent
the true source prior to aggregation and serialization towards the ultimate
TCP/gRPC session. The true "underlying source" timestamp is then lost, or is
representative only of the single packed Notification that aggregates these
Update messages.

This proposal is to add an optional timestamp field to the Update
message in order to be able to represent the true data-source
timestamp. The timestamp field on the Notification message is retained
for backwards compatibility. An implementation can choose to support
one or both timestamps; supporting both makes it possible to identify
issues within the system pipeline by calculating deltas from the
original data-source timestamps.

If the proposal is accepted, relevant gNMI specification documents will be
updated in a subsequent commit.
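
For illustration, a minimal Go sketch (not part of the PR) of how a producer might populate the proposed field, using the generated types from github.com/openconfig/gnmi/proto/gnmi. The per-Update timestamp is the hypothetical field this PR would add and is shown commented out, since it does not exist in today's gnmi.proto; the `sample` helper type is likewise an assumption for the example:

```go
package sketch

import gnmipb "github.com/openconfig/gnmi/proto/gnmi"

// sample is one counter reading taken on a linecard, carrying the
// hardware timestamp at which it was actually collected.
type sample struct {
	path *gnmipb.Path
	val  *gnmipb.TypedValue
	ts   int64 // nanoseconds since epoch, per gNMI section 2.2.1
}

// packProposed coalesces several samples into one Notification while
// (hypothetically) preserving each sample's source timestamp.
func packProposed(samples []sample, serializedAt int64) *gnmipb.Notification {
	n := &gnmipb.Notification{Timestamp: serializedAt} // retained for backwards compatibility
	for _, s := range samples {
		n.Update = append(n.Update, &gnmipb.Update{
			Path: s.path,
			Val:  s.val,
			// Timestamp: s.ts, // proposed optional field; not in gnmi.proto today
		})
	}
	return n
}
```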

@robshakir (Contributor)

I don't think that we should make this change.

(//cc @gcsl @marcushines)

We specifically made the choice when designing gNMI that Notification was the unit at which the timestamp is expressed. This allows some packing of updates into a single Notification, but also means that packing is limited by the commonality of their timestamps. Approaches that give more flexibility - such as this one - lend themselves to more complex packing schemes, which can negatively impact latency.

I have a number of other observations:

  • You say that the timestamp is being used to represent the coalesced view when there are more granular timestamps for the underlying data. If the 'true source' has these, they should be expressed in individual Notification messages - such that this information is not lost (see the sketch at the end of this comment). It doesn't seem like we need a change in the specification to accommodate not losing this information. The choice to pack data with different timestamps into a single Notification sounds like a bug here, rather than a feature.
  • The "cost" of additional Notification messages doesn't seem like it has generally been observed to be a problem across the implementations whose discussions I've been involved in, such that the advantage of 'packing' here doesn't seem clear. Do you have an analysis of what the advantage of this packing is?
  • I think there is a major backwards-compatibility concern here -- collector implementations would now need to consider these different updates within a Notification, and allow for reconstructing Notification messages based on the updates. It's not clear to me that this is something that is simple to introduce in a backwards-compatible way. There are also numerous mature implementations in production deployment, such that the cost of this change could be very high.
  • I have some concerns that allowing different timestamp values means that data will be expected to come from disparate producers which supply different paths. If this is the case (obviously, I am not sure whether this will be true in your implementation) then this will break the use of the prefix field. The data-volume reductions we've seen in production from use of prefix are high, so this is quite a risk to deployments AFAICS.

Overall, I'm therefore not inclined to accept the change. I'd appreciate others' thoughts.
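
For comparison, a minimal sketch of the alternative described in the first bullet, under the same assumptions as the sketch in the PR description (the `sample` helper type is hypothetical): group readings by their source timestamp and emit one spec-compliant Notification per distinct timestamp, so no timestamp information is lost:

```go
package sketch

import gnmipb "github.com/openconfig/gnmi/proto/gnmi"

// sample mirrors the helper type from the earlier sketch: one counter
// reading plus the hardware timestamp at which it was collected.
type sample struct {
	path *gnmipb.Path
	val  *gnmipb.TypedValue
	ts   int64
}

// packPerSpec emits one Notification per distinct source timestamp, as
// the current specification intends: every Update within a Notification
// shares the Notification's timestamp, and the common prefix is kept.
func packPerSpec(samples []sample, prefix *gnmipb.Path) []*gnmipb.Notification {
	byTS := map[int64][]*gnmipb.Update{}
	for _, s := range samples {
		byTS[s.ts] = append(byTS[s.ts], &gnmipb.Update{Path: s.path, Val: s.val})
	}
	var out []*gnmipb.Notification
	for ts, updates := range byTS {
		out = append(out, &gnmipb.Notification{
			Timestamp: ts,
			Prefix:    prefix,
			Update:    updates,
		})
	}
	return out
}
```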

@gcsl (Collaborator) commented Mar 30, 2022

I concur with @robshakir here. The single timestamp was a deliberate decision given the streaming implementation and the goal for low-latency updates. Data that is collected together can be sent together so long as the timestamp is accurate for that data. We specifically did not allow for data at different times to be sent in a single payload as it would mean there would be too much flexibility in building massive payloads that are bursty on both the server and client side of the connection. Within gNMI there should be no need to focus on building large payloads to make efficient use of MTU at the TCP layer as gRPC streaming abstracts that away. Data should be queued for sending as soon as it is ready, not buffered in the application space for packing.
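
A minimal sketch of that send-as-ready model, assuming a `ready` channel as a stand-in for wherever the target produces updates; the stream type is the generated gNMI Subscribe server interface:

```go
package sketch

import gnmipb "github.com/openconfig/gnmi/proto/gnmi"

// relay forwards each Notification to the subscriber as soon as it is
// produced; gRPC and TCP handle framing, so nothing is held back in the
// application just to build larger payloads.
func relay(stream gnmipb.GNMI_SubscribeServer, ready <-chan *gnmipb.Notification) error {
	for n := range ready {
		resp := &gnmipb.SubscribeResponse{
			Response: &gnmipb.SubscribeResponse_Update{Update: n},
		}
		if err := stream.Send(resp); err != nil {
			return err
		}
	}
	return nil
}
```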

From the collection side of gNMI we break all notifications into single update payloads for indexing and distribution to clients as seen in the examples in github.com/openconfig/gnmi. There is an obvious on-the-wire overhead for a client that wants everything as it eliminates the savings of using a common prefix across multiple updates, but it enables efficient delivery of only the desired information to clients that request only a specific subset. The point being, we already actively use the worst case, one update per notification, at full production scale.
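
A rough sketch of the flattening described here (a simplification for illustration, not the actual openconfig/gnmi code):

```go
package sketch

import gnmipb "github.com/openconfig/gnmi/proto/gnmi"

// flatten splits a Notification carrying many updates into the
// one-update-per-notification form described above, copying the
// timestamp and prefix so each piece stays self-describing.
func flatten(n *gnmipb.Notification) []*gnmipb.Notification {
	out := make([]*gnmipb.Notification, 0, len(n.Update))
	for _, u := range n.Update {
		out = append(out, &gnmipb.Notification{
			Timestamp: n.Timestamp,
			Prefix:    n.Prefix,
			Update:    []*gnmipb.Update{u},
		})
	}
	return out
}
```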

I am sure a metric could be derived that would favor more packing but I think that metric would be contrary to the real-time goals of delivering data with low latency.

@marcushines (Contributor)

I concur with both rob and csl - the notification was built as the boundary between atomic things, not updates.

@ccole-juniper

The motivating issue is that the underlying hardware is tagging counters with "this was good @ ts" but those counters are getting batched together in a notification (since they are getting polled in a batch). This is causing jitter in computed rates due to loss of granularity.

  • The "cost" of additional Notification messages doesn't seem like it has generally been observed to be a problem across implementations that I've been involved in discussions, such that the advantage of 'packing' here doesn't seem clear.

As was pointed out, one alternative approach that we have considered is disaggregating updates in those cases. Our concern was potential performance impact (payload size and latency). We can measure this and see the effect of disaggregation.

@samante commented Apr 1, 2022

@marcushines, @robshakir: This request originated from me, and I would kindly ask that you reconsider this PR. As Ebben points out, distributed systems are designed around asynchronous polling at irregular intervals that updates octet or packet statistics from, for example, ASIC registers to linecard CPU (to Routing Engine/Processor).

Let's assume for the sake of argument that the ASIC publishes an update to the linecard CPU DRAM approximately every 2 seconds for all of the interfaces in use on that ASIC, e.g. 8 interfaces per ASIC. Let's also assume that the collector receives telemetry at a separate rate of approximately every 10 seconds (this frequency doesn't really matter, as the problem still occurs at lower export rates as well). What one can easily observe externally at the collector is a smooth average bits/sec and/or packets/sec, then at a random point in time within a ~1 - 2 minute window a "spike" in the traffic rate, after which it returns to a uniform average.

The reason this occurs is that, internally, within the distributed system an ASIC will send a uniform, consistent 4 samples of statistics to the linecard CPU DRAM within a given 10 second window most of the time before the data is published externally. However, due to the nature of asynchronous polling, you end up with random points in time where the ASIC updates the linecard CPU DRAM with 5 samples of octet, packet and error statistics for the interfaces on that ASIC within that 10 second window before it is published externally to a collector.

The router implementation is not computing a rate (bits/sec or packets/sec); instead, that is left to the external collector. However, the collector has no way of knowing the start/stop times of the period over which the statistics were collected in the linecard CPU DRAM. Having that information in the packet exported to the collector, for the corresponding statistic, gives the collector the information necessary to calculate a proper, consistent average rate. IMO, this is critically important for determining, at lower time periods/granularities, the average rate of bits/sec, packets/sec, errors/sec (of various forms), etc. that the ASIC is observing.

While I do understand your points about disaggregating the counters to leverage the single timestamp in the Notification packet, that seems like a potentially large overhead when we also consider the potentially large number of interfaces, ASICs and corresponding counters in the system that would need to be disaggregated and exported.
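
To make the spike concrete, a small self-contained Go example with purely illustrative numbers (a constant 1000 B/s at the ASIC, 2-second snapshots, a nominal 10-second export window - all assumed, not measured): a collector that divides the counter delta by the nominal interval sees phantom spikes and dips whenever a window happens to contain 4 or 6 snapshots instead of 5.

```go
package main

import "fmt"

func main() {
	const trueRate = 1000       // bytes/sec at the ASIC (assumed, constant)
	const exportInterval = 10.0 // seconds, nominal collector interval

	// Number of 2s ASIC snapshots landing in each export window; usually 5,
	// occasionally 4 then 6 because the polling is asynchronous.
	for _, snaps := range []int{5, 5, 4, 6, 5} {
		delta := trueRate * 2 * snaps // octets accumulated in this window
		apparent := float64(delta) / exportInterval
		fmt.Printf("%d snapshots -> apparent rate %.0f B/s (true rate %d B/s)\n",
			snaps, apparent, trueRate)
	}
}
```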

@gcsl (Collaborator) commented Apr 1, 2022

@samante This sounds like a misattribution of timestamps, not a need to send data with different timestamps in a single payload. If a line card is sending a payload every 2s and the collector is collecting every 10s, then having 4 vs. 5 samples (or maybe even 6) in the window should have no effect on the computed rate, provided the data is sent with the timestamp at which the last data was collected. This has the implication that you might not have a consistent window size (8, 10 or 12s), but the computed rate should be consistent. If you also wanted a consistent window, given there would be some jitter in the collection, you would trigger the output on a multiple of 5 samples instead of on a 10s timer. In either case, the timestamp reported to the collector should always be that of the raw data, not an artificially created multiple of the requested sample rate applied at serialization time.
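
In code, the fix described here is simply to divide by the span between the raw-data timestamps rather than by the nominal export interval; a minimal sketch (function name assumed). With the windows from the example above, dividing by the true 8s or 12s span instead of a fixed 10s recovers the constant rate:

```go
package sketch

// rate computes bytes/sec from two counter readings using the timestamps
// at which the data was actually collected (gNMI timestamps are nanoseconds
// since epoch), rather than the nominal export interval.
func rate(c1, c2 uint64, t1, t2 int64) float64 {
	return float64(c2-c1) / (float64(t2-t1) / 1e9)
}
```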

@samante commented Apr 1, 2022

@gcsl Your point above is logical, but unfortunately it is not necessarily how implementations in the field have been built, to my knowledge. I believe the question posed here is: do we require every router to implement the behavior you suggest (and would that even be feasible with existing router hardware and software), or might it be more straightforward to allow the router to export the timestamp, as an optional attribute in the packet corresponding to a statistic, and thus allow the collector to compute the rate appropriately?

@gcsl (Collaborator) commented Apr 6, 2022

@samante The current encoding allows correct timestamps to be assigned to every given statistic. gNMI is a stream, so we are talking about whether to send multiple notifications, one update each with the correct timestamp, as the current spec is written, vs. the request here to pack multiple updates with different timestamps into the same notification. As stated above, the current encoding was an intentional design decision. We have seen early implementations try to dump the entire tree in a single notification, which is precisely the opposite of our goal for streaming telemetry, as it introduces massive latency for the individual updates: the entire tree must be built as a proto in memory, then serialized, the full proto transmitted, and finally deserialized (also completely in memory) before the client can access a single metric. There is overhead in streaming, but the average latency can be reduced by up to half, as a client can act on each notification as it arrives. If there are metrics that truly have the same timestamp, they can be bundled into the same notification for additional efficiency.
