Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add diagnostic_msgs/Heartbeat message #179

Draft
wants to merge 2 commits into
base: noetic-devel
Choose a base branch
from

Conversation

peci1
Copy link

@peci1 peci1 commented Apr 28, 2022

Deep analysis of the current state in ROS 1 along with discussion can be found here: https://discourse.ros.org/t/add-heartbeat-message-type/24162 .

A short summary for why this message should be added:

The usage I’m most interested in is being able to figure out (both live and from recording) the rate of publication and delay (at publish time) of messages on a single topic that are too big to have multiple subscribers or are too big to be recorded at full frequency. I also want to be able to check the rate/delay-related properties of individual published messages (i.e. reading their Header). As this need showed up several times in different places during my professional life, I’d like to converge at something that is easily and generically usable in many places and that has hope for having good support in generic diagnostics tools.

A longer summary of the ROS 1 analysis:

  • no suitable Heartbeat message type is available in "official" ROS repos
  • runtime checks
    • Topic statistics can be used to check frequency/delay of messages
      • this is a global option - either all topics or no topics (it can be selectively turned on/off for nodes by changing the value of /enable_statistics parameter when launching nodes, but it is impractical as it requires synchronizing the launch sequence)
      • topic statistics do not provide information about which exact message was missing or late
      • all statistics are published to a single topic (/statistics) so finding data relevant for a single topic needs matching the topic name against all messages in the topic
      • it requires no change to existing code, all subscribers implicitly support it
      • the statistics are available only when at least one subscriber is present (but it is not necessary to subscribe to the topic from the statistics-analyzing node)
    • FrequencyStatus and TimeStampStatus diagnostic tasks can be used to check frequency/delay of published messages
      • setup of these diagnostic tasks can be sometimes simple and sometimes more complicated (switching to AsyncSpinner and so on)
      • the tasks do not provide information about which exact message was missing or late
      • getting machine-readable information from the tasks requires string-parsing, which is error-prone
      • the statistics are provided even when there are no subscribers of the topic
    • Heartbeat diagnostic task is not suitable as it reports a plain timer-based heartbeat telling that the process of the node is running
    • Using Bond messages would break the ROS principle of using semantically-fitting message types (bond is inherently point-to-point, publishers are point-to-multipoint)
  • recordable (post-mission debug) checks
    • rate/delay of recorded topics can be found out pretty easily by playing them back and running rostopic hz, looking at rqt_bag visualization etc.
    • non-recorded topic statistics can be found in bag files if topic statistics were enabled and the /statistics topic was recorded
      • to debug a single topic, you need to filter the messages on /statistics by topic name
      • there is no way of figuring out which exact message was missing/late
      • statistics are available only for publishers that had at least one subscriber
      • on a system with a lot of connections, recording /statistics can have noticeable performance impact (observed in SubT Virtual challenge)
    • FrequencyStatus and TimeStampStatus tasks can be recorded and used in playback to give an idea how the publications were working
      • to get exact frequencies/delays, string-parsing is needed
      • there is no way of figuring out which exact message was missing/late

@ros-discourse
Copy link

This pull request has been mentioned on ROS Discourse. There might be relevant details there:

https://discourse.ros.org/t/add-heartbeat-message-type/24162/27

Copy link
Member

@tfoote tfoote left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’d like to converge at something that is easily and generically usable in many places and that has hope for having good support in generic diagnostics tools.

I completely agree with this sentiment. And want to get there too. But I think that this message as proposed is a decent beginning. However I think that it's important to not try to pick the message before fleshing out the use cases and validating that this has the right data necessary to use it for the invisioned use cases from https://discourse.ros.org/t/add-heartbeat-message-type/24162

With a proposal to common_msgs we typically want to have a full reference implementation available for the evaluation to make sure that we're doing things correctly. Because if we create one and then find that it's not right our stability standards are very high and generally we won't want to change anything in a non-backwards compatible way within the distro, because diagnostic_msgs is very near the core.

It's generally going to be a better idea to prototype this in a package of it's own with the implementations nearby to demonstrate it. And then once there's a full demonstration that can be evaluated we can consider promoting the message into the core once it has been proven to be valuable.

On the technical side should we be indicating a way to correleate the data to the intended parallel stream of data?

# This message can be for example sent along with a large message (like point cloud)
# to allow another node to measure publishing frequency and delay without subscribing
# to the large message itself.
Header header
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think that a header is appropriate. The frame_id is irrelevant here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also what is the policy for the timestamp data? It's presumably important to set that on publish?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the thing. If you put just stamp there, you have to write all the analytics tools from scratch. If you put there Header, you can rostopic hz, you can add TimeStampStatus diagnostics etc. right away.

And I wouldn't say frame_id is irrelevant. Consider e.g. a topic where you mix messages from multiple sensors and differentiate between them via frame_id (not saying this is optimal, but it is a supported case IMO). Yes, rostopic hz would lose sense on such a topic, but it would still be super-easy to write some kind of demux node which would then provide several streams of diagnostic data, one for each frame_id.

Yes, maybe it should be explicitly specified that stamp should be just whatever stamp was put in the parallel data stream (if there is any), or the publish time (if used "standalone").

@peci1
Copy link
Author

peci1 commented May 5, 2022

Thank you for the review, Tully. I'll try to make a set of packages utilizing this concept and report back here when it is done.

The approach I used to correlate the heartbeats to the parallel stream was by using a naming convention - I appended /heartbeat to the name of the parallel topic. So we e.g. have os_cloud_node/points with pointclouds and os_cloud_node/points/heartbeat with their heartbeat. I think this follows the same logic as e.g. camera_info.

@tfoote
Copy link
Member

tfoote commented May 5, 2022

Thanks a set of packages with a prototype would be great.

Yeah, that sort of naming convention makes sense generally. That should be documented clearly so people understand it. And it might be good to play out some of the cases that might be a problem, such as remapping, muxing, recording and playback etc.

I'm going to bump this to be a draft PR so it's not in our review queue for now.

@tfoote tfoote marked this pull request as draft May 5, 2022 22:49
@peci1
Copy link
Author

peci1 commented Apr 9, 2023

Slowly getting there:

Message: https://github.com/ctu-vras/cras_msgs/blob/master/msg/Heartbeat.msg

topic_tools-like library publishing heartbeat of any messages with header: https://github.com/ctu-vras/ros-utils/blob/master/cras_topic_tools/src/heartbeat.cpp (usage).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants