
Messages get dropped when larger than 0.5MB - using shared memory - QoS is Best_effort #739

Open
dk-teknologisk-lag opened this issue Jan 22, 2024 · 33 comments
Labels
more-information-needed Further information is required

Comments

@dk-teknologisk-lag

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04 - ros:rolling docker image
  • Installation type:
    • binary
  • Version or commit hash:
    • ros-rolling-fastrtps 2.11.2-1jammy.20231004.145650
  • DDS implementation:
    • Fast-RTPS
  • Client library (if applicable):
    • rclcpp

Steps to reproduce issue

This all stems from transferring point cloud data from Ouster's ROS 2 driver to any subscriber (ros2 bag, ros2 topic echo/hz, etc.), which also indicate dropped messages.
The minimal example here can reproduce it, though.
As far as I know, all sensors have their publishing QoS set to BEST_EFFORT, so this is also the case in this example.

To test it do:

1. clone https://github.com/dk-teknologisk-lag/ros2_test
2. Open the project in VS Code
3. Build the project once the Docker container has launched
4. Launch the publisher with the scripts, i.e. **./pub_10MB.bash** or **ros2 run cpp_pubsub talker --ros-args -p freq:=10 -p bytesize:=10000000**
5. Launch the listener with **ros2 run cpp_pubsub listener**
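
For context, a minimal sketch of what such a talker might look like is below (node, topic, and parameter handling are my assumptions based on the command line above, not code copied from the linked repo): it publishes a std_msgs/String of bytesize characters at freq Hz with BEST_EFFORT reliability.

    // Hypothetical minimal talker (names and defaults are assumptions):
    // publishes a `bytesize`-byte string at `freq` Hz with BEST_EFFORT QoS.
    #include <chrono>
    #include <memory>

    #include "rclcpp/rclcpp.hpp"
    #include "std_msgs/msg/string.hpp"

    class Talker : public rclcpp::Node
    {
    public:
      Talker() : Node("talker")
      {
        const double freq = declare_parameter<double>("freq", 10.0);
        const int64_t bytesize = declare_parameter<int64_t>("bytesize", 10000000);

        // BEST_EFFORT, as sensor drivers typically use.
        auto qos = rclcpp::QoS(rclcpp::KeepLast(10)).best_effort();
        pub_ = create_publisher<std_msgs::msg::String>("chatter", qos);

        msg_.data.assign(static_cast<size_t>(bytesize), 'x');
        timer_ = create_wall_timer(
          std::chrono::duration<double>(1.0 / freq),
          [this]() { pub_->publish(msg_); });
      }

    private:
      std_msgs::msg::String msg_;
      rclcpp::Publisher<std_msgs::msg::String>::SharedPtr pub_;
      rclcpp::TimerBase::SharedPtr timer_;
    };

    int main(int argc, char ** argv)
    {
      rclcpp::init(argc, argv);
      rclcpp::spin(std::make_shared<Talker>());
      rclcpp::shutdown();
      return 0;
    }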

Expected behavior

Messages are sent and received at the requested frequency.

Actual behavior

Messages get dropped occasionally, getting worse with higher frequency or larger message size. See image:
[screenshot]

Additional information

I have searched everywhere for a solution, but the majority of suggestions are to change buffer sizes, which doesn't seem applicable here, since shared memory is used.
As seen in the image, it seemingly only uses around 1.5 MB and has up to 64 MB available.

@fujitatomoya
Collaborator

CC: @Barry-Xu-2018

@dk-teknologisk-lag
Author

FYI, I have opened a discussion there as well, since I could reproduce it using their HelloWorld example with a few modifications: eProsima/Fast-DDS#4276

@Barry-Xu-2018
Contributor

I can reproduce this issue. But on the host (not in a container), message_lost_callback is never called.
If the QoS is set to RELIABLE, there is no problem. This issue is unrelated to the segment size of shared memory.

The Fast DDS shared memory example also uses RELIABLE QoS. I simply modified it (topic_qos) to BEST_EFFORT.
I didn't see this issue (same Fast DDS version 2.11.2). Maybe because the message size is small (only 1 MB).
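
For reference, the kind of change meant here is roughly the following QoS tweak in the example (a sketch, not the exact diff):

    // Sketch: switching the Fast-DDS HelloWorld example to BEST_EFFORT.
    // The Topic QoS only acts as a default; the DataWriter QoS is what the
    // endpoint actually uses, so it is the one that matters for sending.
    #include <fastdds/dds/publisher/qos/DataWriterQos.hpp>
    #include <fastdds/dds/topic/qos/TopicQos.hpp>

    using namespace eprosima::fastdds::dds;

    static void make_best_effort(TopicQos & topic_qos, DataWriterQos & writer_qos)
    {
      topic_qos.reliability().kind = BEST_EFFORT_RELIABILITY_QOS;
      writer_qos.reliability().kind = BEST_EFFORT_RELIABILITY_QOS;
    }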

@dk-teknologisk-lag
Author

I have updated the example linked in the other discussion, but after changing it to BEST_EFFORT and a message size of 10 MB I get the same behavior:
[screenshot]

@dk-teknologisk-lag
Author

I just tried changing to RELIABLE and here I also get dropped messages - also if I change back to just sending "Hello world", though a lot fewer:
[screenshot]

I noticed earlier that when I was using the RELIABLE QoS it didn't print the lost messages, but inspecting the message IDs, I could see that there were gaps.
In the above image, it jumps from 10195 to 10214, but that could be a limit of the console, which prints out of "order". Still, there are jumps in the message IDs.

@dk-teknologisk-lag
Author

How come it transfers 1024*1024 bytes and not 2 * 1024 * 1024, which the segment size is set to?

Also, what is the difference between the topic QoS and the DataWriter QoS - do they both require the same settings?

@dk-teknologisk-lag
Author

How come it transfers 1024*1024 bytes and not 2 * 1024 * 1024, which the segment size is set to?

Also, what is the difference between the topic QoS and the DataWriter QoS - do they both require the same settings?

I figured out it was the buffer size of the data, rather than the size of the segment or the string. Got it working with a string of 10 MB.

@dk-teknologisk-lag
Author

dk-teknologisk-lag commented Jan 23, 2024

So, if I increase the segment size to 10 * 1024 * 1024, so it can hold an entire message in shared memory, I can run at about 300-500 Hz (even though the sleep time is set to 1 ms, i.e. a 1000 Hz target), and there is seemingly no packet loss when sending 10 MB messages.
[screenshot]

I guess it's the entire HelloWorld data struct that gets copied, so I should allocate 11 MB + 4 bytes, since it has its data array of chars, consuming 1024*1024 bytes, and its uint32_t m_index field?

Can I set this using an XML file? I.e. force it to not use the builtin transports, but a specific shared memory transport with a larger segment size?

@Barry-Xu-2018
Contributor

Can I set this using an XML file? I.e. force it to not use the builtin transports, but a specific shared memory transport with a larger segment size?

Do you want to test it in a ROS 2 environment?
I have not used XML to configure the transport on ROS 2 before, but I think you can refer to section 6.4.3 in https://fast-dds.docs.eprosima.com/en/latest/fastdds/transport/shared_memory/shared_memory.html and prepare the XML as described in https://github.com/ros2/rmw_fastrtps/blob/rolling/README.md.

BTW, there is an easier way. Modify the segment size where the SHM transport descriptor is created in rmw_fastrtps:

auto shm_transport =
std::make_shared<eprosima::fastdds::rtps::SharedMemTransportDescriptor>();
domainParticipantQos.transport().user_transports.push_back(shm_transport);

        auto shm_transport =
          std::make_shared<eprosima::fastdds::rtps::SharedMemTransportDescriptor>();
        shm_transport->segment_size(xxxxxx);  // <== change the size of segment 
        domainParticipantQos.transport().user_transports.push_back(shm_transport);

Then rebuild only the rmw_fastrtps package.

@dk-teknologisk-lag
Author

Currently I'm using the binary package installation, so I would like to avoid having to deploy a custom-built rmw_fastrtps package.

Currently tried with:

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <!-- Create a descriptor for the new transport -->
        <transport_descriptor>
            <transport_id>shm_transport_only</transport_id>
            <type>SHM</type>
            <segment_size>12582912</segment_size>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="DisableBuiltinTransportsParticipant">
        <rtps>
            <!-- Link the Transport Layer to the Participant -->
            <userTransports>
                <transport_id>shm_transport_only</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
        </rtps>
    </participant>
</profiles>

It doesn't complain about this, whereas when I tried with segmentSize it did. But it doesn't seem to have an effect (I commented out the shared memory setup in the HelloWorldSharedMem example).

@fujitatomoya
Collaborator

Messages get dropped when larger than 0.5MB - using shared memory - QoS is Best_effort

This is expected. Shared memory or not, setting BEST_EFFORT means there is always the possibility of dropping messages.

https://github.com/dk-teknologisk-lag/ros2_test/blob/853dab56a842c373faf1f585e231511d6a262cb0/src/cpp_pubsub/src/publisher_member_function.cpp#L46

This is not a bounded data type, so it cannot use LoanedMessage nor Data Sharing Delivery.

See also eProsima/Fast-DDS#4276.

@dk-teknologisk-lag After all, I suggest that you try LoanedMessage; the message data type must be bounded. (Underneath, rmw_fastrtps will use Data Sharing Delivery to achieve zero-copy data sharing.)

Here is the demo code: https://github.com/ros2/demos/blob/rolling/demo_nodes_cpp/src/topics/talker_loaned_message.cpp
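
For reference, the loaned-message pattern looks roughly like this with a bounded type such as std_msgs/Float64 (a sketch based on the linked demo; node and topic names here are only illustrative):

    // Sketch: publishing via LoanedMessage with a bounded (fixed-size) type.
    // With rmw_fastrtps and data sharing enabled this can avoid copies.
    #include <memory>
    #include <utility>

    #include "rclcpp/rclcpp.hpp"
    #include "std_msgs/msg/float64.hpp"

    int main(int argc, char ** argv)
    {
      rclcpp::init(argc, argv);
      auto node = rclcpp::Node::make_shared("loaned_talker");
      auto pub = node->create_publisher<std_msgs::msg::Float64>("chatter_pod", 10);

      // Borrow a message owned by the middleware instead of allocating our own.
      auto loaned = pub->borrow_loaned_message();
      loaned.get().data = 123.456;

      // Publishing a loaned message hands ownership back to the middleware.
      pub->publish(std::move(loaned));

      rclcpp::shutdown();
      return 0;
    }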

@EduPonz

EduPonz commented Jan 23, 2024

Currently I'm using the binary package installation, so I would like to avoid having to deploy a custom-built rmw_fastrtps package.

Currently tried with:

<?xml version="1.0" encoding="UTF-8" ?>
<profiles xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <transport_descriptors>
        <!-- Create a descriptor for the new transport -->
        <transport_descriptor>
            <transport_id>shm_transport_only</transport_id>
            <type>SHM</type>
            <segment_size>12582912</segment_size>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="DisableBuiltinTransportsParticipant">
        <rtps>
            <!-- Link the Transport Layer to the Participant -->
            <userTransports>
                <transport_id>shm_transport_only</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
        </rtps>
    </participant>
</profiles>

It doesn't complain about this, whereas when I tried with segmentSize it did. But it doesn't seem to have an effect (I commented out the shared memory setup in the HelloWorldSharedMem example).

I'm afraid your participant profile is missing the is_default_profile="true" attribute, see for instance here.

@dk-teknologisk-lag
Author

Messages get dropped when larger than 0.5MB - using shared memory - QoS is Best_effort

This is expected. Shared memory or not, setting BEST_EFFORT means there is always the possibility of dropping messages.

https://github.com/dk-teknologisk-lag/ros2_test/blob/853dab56a842c373faf1f585e231511d6a262cb0/src/cpp_pubsub/src/publisher_member_function.cpp#L46

This is not a bounded data type, so it cannot use LoanedMessage nor Data Sharing Delivery.

See also eProsima/Fast-DDS#4276.

@dk-teknologisk-lag After all, I suggest that you try LoanedMessage; the message data type must be bounded. (Underneath, rmw_fastrtps will use Data Sharing Delivery to achieve zero-copy data sharing.)

Here is the demo code: https://github.com/ros2/demos/blob/rolling/demo_nodes_cpp/src/topics/talker_loaned_message.cpp

Yeah, I understand that. But since sending from an actual sensor to a PC can run at the full 20 Hz, with a somewhat more compressed point cloud format resulting in 16 MB/s for e.g. an Ouster OS1 lidar, it seems horrible if we can't get 20 Hz in IPC out of the box.
But as I experienced, increasing the segment_size, i.e. the shared memory buffer, seems to alleviate the dropped messages.

Yes, it's an unbounded type and hence limited to the shared memory feature and not loaned messages. That could for sure be interesting to look into, but it would require a change in the Ouster driver itself, which is a bit out of scope for our current project.

If we get into CPU overload or timing issues for lidar odometry or something similar, we might try the loaned message API.

@dk-teknologisk-lag
Author

I'm afraid your participant profile is missing the is_default_profile="true" attribute, see for instance here.

Think I did try that as well; currently debugging to figure out when and how the XML files are parsed. But with is_default_profile set, the... yeah, the default profiles get set to those values?

So when this is executed:

eprosima::fastdds::dds::DomainParticipantQos domainParticipantQos =

I should get the XML default values here?

I tried to get it working with the modified examples (only the HelloWorldSharedMem) from here:
eProsima/Fast-DDS@master...dk-teknologisk-lag:Fast-DDS:bestefforthelloworld

But looking more closely, it doesn't seem to use any default QoS, but creates its own - or should it work here as well?

But thanks for the suggestion, will try again tomorrow.

@Barry-Xu-2018
Contributor

<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <profiles>
        <transport_descriptors>
            <!-- Create a descriptor for the new transport -->
            <transport_descriptor>
                <transport_id>shm_transport</transport_id>
                <type>SHM</type>
                <segment_size>10485760</segment_size>                                                                                                                   
            </transport_descriptor>
        </transport_descriptors>
        <participant profile_name="SHMParticipant" is_default_profile="true">
            <rtps>
                <!-- Link the Transport Layer to the Participant -->
                <userTransports>
                    <transport_id>shm_transport</transport_id>
                </userTransports>
            </rtps>
        </participant>
    </profiles>
</dds>

Using this configuration works; it can significantly reduce the packet loss rate. However, even with an increased segment size (tested with 30 MB), there is still some packet loss.

RMW_FASTRTPS_USE_QOS_FROM_XML=1 FASTRTPS_DEFAULT_PROFILES_FILE=my_config.xml ros2 run cpp_pubsub talker --ros-args -p freq:=10 -p bytesize:=10000000
RMW_FASTRTPS_USE_QOS_FROM_XML=1 FASTRTPS_DEFAULT_PROFILES_FILE=pub_sub_config.xml ros2 run cpp_pubsub listener

@dk-teknologisk-lag
Author

<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <profiles>
        <transport_descriptors>
            <!-- Create a descriptor for the new transport -->
            <transport_descriptor>
                <transport_id>shm_transport</transport_id>
                <type>SHM</type>
                <segment_size>10485760</segment_size>                                                                                                                   
            </transport_descriptor>
        </transport_descriptors>
        <participant profile_name="SHMParticipant" is_default_profile="true">
            <rtps>
                <!-- Link the Transport Layer to the Participant -->
                <userTransports>
                    <transport_id>shm_transport</transport_id>
                </userTransports>
            </rtps>
        </participant>
    </profiles>
</dds>

Using this configuration works; it can significantly reduce the packet loss rate. However, even with an increased segment size (tested with 30 MB), there is still some packet loss.

RMW_FASTRTPS_USE_QOS_FROM_XML=1 FASTRTPS_DEFAULT_PROFILES_FILE=my_config.xml ros2 run cpp_pubsub talker --ros-args -p freq:=10 -p bytesize:=10000000
RMW_FASTRTPS_USE_QOS_FROM_XML=1 FASTRTPS_DEFAULT_PROFILES_FILE=pub_sub_config.xml ros2 run cpp_pubsub listener

It seems to somewhat work, yes. But it seems to add an additional buffer - i.e. take a look at the screenshot below:
[screenshot]

The red square marks when I launched with the XML file you provided. It creates one buffer of 0.5 MB and one of 10.5 MB.

The blue is launched with the XML, but with segment_size commented out, which seems to create a default-sized buffer, i.e. there are two of 0.5 MB.

The green is when launched without an XML file, which then just creates a single buffer of 0.5 MB.

So it seems it doesn't use the buffer supplied via XML, and that is probably why we still see the packet loss.

@EduPonz

EduPonz commented Jan 24, 2024

Hi @dk-teknologisk-lag,

The second buffer is there because you did not disable the builtin SHM transport, so you're adding a second one. Please try with the following:

<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <profiles>
        <transport_descriptors>
            <!-- Create a descriptor for the new transport -->
            <transport_descriptor>
                <transport_id>shm_transport</transport_id>
                <type>SHM</type>
                <segment_size>10485760</segment_size>                                                                                                                   
            </transport_descriptor>
        </transport_descriptors>
        <participant profile_name="SHMParticipant" is_default_profile="true">
            <rtps>
                <!-- Link the Transport Layer to the Participant -->
                <userTransports>
                    <transport_id>shm_transport</transport_id>
                </userTransports>
                <useBuiltinTransports>false</useBuiltinTransports>
            </rtps>
        </participant>
    </profiles>
</dds>

@dk-teknologisk-lag
Author

I don't even seem to be able to disable the SHM transport, i.e. like this (borrowed from eProsima/Fast-DDS#2287):

<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
<profiles>
    <transport_descriptors>
        <transport_descriptor>
            <transport_id>udp_transport</transport_id>
            <type>UDPv4</type>
        </transport_descriptor>
    </transport_descriptors>

    <participant profile_name="/topic">
        <rtps>
            <userTransports>
                <transport_id>udp_transport</transport_id>
            </userTransports>
            <useBuiltinTransports>false</useBuiltinTransports>
        </rtps>
    </participant>
</profiles>
</dds>

@dk-teknologisk-lag
Author

Hi @dk-teknologisk-lag,

The second buffer is there because you did not disable the builtin SHM transport, so you're adding a second one. Please try with the following:

<?xml version="1.0" encoding="UTF-8"?>
<dds xmlns="http://www.eprosima.com/XMLSchemas/fastRTPS_Profiles">
    <profiles>
        <transport_descriptors>
            <!-- Create a descriptor for the new transport -->
            <transport_descriptor>
                <transport_id>shm_transport</transport_id>
                <type>SHM</type>
                <segment_size>10485760</segment_size>                                                                                                                   
            </transport_descriptor>
        </transport_descriptors>
        <participant profile_name="SHMParticipant" is_default_profile="true">
            <rtps>
                <!-- Link the Transport Layer to the Participant -->
                <userTransports>
                    <transport_id>shm_transport</transport_id>
                </userTransports>
                <useBuiltinTransports>false</useBuiltinTransports>
            </rtps>
        </participant>
    </profiles>
</dds>

Ahh, thanks. It seems to work like this - I wonder why it didn't work with the previous one so that it just used UDP?

@dk-teknologisk-lag
Author

Ahh, I missed the is_default_profile="true" - it seems to work also with UDP, at 95 MB/s on the loopback interface.

Thanks a lot for the help. I think we can close this, unless the default should be something other than 0.5 MB, which seems quite low for a ROS application?

@dk-teknologisk-lag
Author

So this can't be configured per topic, since it requires is_default_profile="true" to be added?
Is it only the QoS that can be configured per topic?
https://fast-dds.docs.eprosima.com/en/latest/fastdds/ros2/ros2_configure.html#example

@dk-teknologisk-lag
Author

Never mind, I guess I can just omit the XML config file for those nodes that don't require a large amount of shared memory.

@dk-teknologisk-lag
Author

One more question: why do both the publisher and the subscriber create a shared memory buffer? According to this diagram
https://fast-dds.docs.eprosima.com/en/latest/fastdds/transport/shared_memory/shared_memory.html#definition-of-concepts

the shared memory on the subscriber side is not used?

@Barry-Xu-2018
Contributor

One more question: why do both the publisher and the subscriber create a shared memory buffer? According to this diagram
https://fast-dds.docs.eprosima.com/en/latest/fastdds/transport/shared_memory/shared_memory.html#definition-of-concepts

the shared memory on the subscriber side is not used?

On the subscriber side, I think you don't need to set the segment size.

@dk-teknologisk-lag
Author

Yeah, I just tried making another config with the defaults and it works well, but it still creates the smaller shared memory - I guess some of that is used for discovery?

On a side note, I can't get ros2 topic list to show the topic if I run it with the custom XML profile, even if I launch ros2 topic list with the same XML file.
[screenshot]

ros2 topic echo doesn't work either.

And ros2 topic hz works, but shows only 15 Hz when I published at 50 - but that might be the transition to Python; at least one core is maxed out, which seems to be the bottleneck.

@dk-teknologisk-lag
Author

If I disable all shared memory and run over UDP it works fine - even though ros2 topic hz still only shows about 15 Hz...

Seems to be what I will go for, for now.

@EduPonz

EduPonz commented Jan 24, 2024

Yeah, I just tried making another config with the defaults and it works well, but it still creates the smaller shared memory - I guess some of that is used for discovery?

On a side note, I can't get ros2 topic list to show the topic if I run it with the custom XML profile, even if I launch ros2 topic list with the same XML file.

ros2 topic echo doesn't work either.

And ros2 topic hz works, but shows only 15 Hz when I published at 50 - but that might be the transition to Python; at least one core is maxed out, which seems to be the bottleneck.

This is because you'd need to run ros2 daemon stop before calling ros2 topic list again. You probably had a daemon running with the default transports, which means discovery over UDP only. In any case, another thing you can do is add a transport descriptor for a UDP transport to your XML and let participants have both. That way you'd have the same as you would by default, but with larger segments in the SHM transport.

Regarding the reader side segment:

  1. ROS 2 nodes always have writers (actually 10 of them out of the box) for different things such as parameter service, ros_discovery_info, etc.
  2. If you only have a SHM transport, the discovery traffic uses it, so there are discovery writers as well
  3. The reliability meta-traffic goes from reader to writer
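
If it helps, the programmatic equivalent of "let participants have both" (a larger-segment SHM transport plus a UDP transport instead of the builtin ones) looks roughly like this against the Fast-DDS API; a sketch only, the XML approach described above achieves the same thing:

    // Sketch: participant QoS with a 10 MiB SHM segment plus a UDPv4 transport,
    // replacing the builtin transports. The segment size value is illustrative.
    #include <memory>

    #include <fastdds/dds/domain/qos/DomainParticipantQos.hpp>
    #include <fastdds/rtps/transport/UDPv4TransportDescriptor.h>
    #include <fastdds/rtps/transport/shared_mem/SharedMemTransportDescriptor.h>

    using namespace eprosima::fastdds;

    static dds::DomainParticipantQos make_participant_qos()
    {
      dds::DomainParticipantQos qos;

      auto shm = std::make_shared<rtps::SharedMemTransportDescriptor>();
      shm->segment_size(10 * 1024 * 1024);  // larger than the ~0.5 MB default
      qos.transport().user_transports.push_back(shm);

      auto udp = std::make_shared<rtps::UDPv4TransportDescriptor>();
      qos.transport().user_transports.push_back(udp);

      // Replace the builtin transports with the two above.
      qos.transport().use_builtin_transports = false;
      return qos;
    }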

@dk-teknologisk-lag
Author

Ah, yeah okay. It works fine now that I restarted the Docker container, but ros2 daemon stop would probably be fine too!
Thanks for the info.

@fujitatomoya
Collaborator

I think we can close this, unless the default should be something other than 0.5 MB, which seems quite low for a ROS application?

@EduPonz do you think this is something we should adjust in rmw_fastrtps? As far as I know we do not have this kind of setting in rmw_fastrtps, right? So should this be moved to https://github.com/eProsima/Fast-DDS? I am not sure if we want to set or change the default for ROS 2; small or big is really application dependent.

If we are not changing any default, I think we can close this issue.

@dk-teknologisk-lag
Author

I think we can close this, unless the default should be something other than 0.5 MB, which seems quite low for a ROS application?

@EduPonz do you think this is something we should adjust in rmw_fastrtps? As far as I know we do not have this kind of setting in rmw_fastrtps, right? So should this be moved to https://github.com/eProsima/Fast-DDS? I am not sure if we want to set or change the default for ROS 2; small or big is really application dependent.

If we are not changing any default, I think we can close this issue.

From my viewpoint, things should work out of the box. Generally, you should be able to send small messages even if you have allocated a "large" shared memory pool, but the other way around leads to packet drops, hence this issue.

How large the default should be is of course a bit difficult to guess, but to cover most cases one could look towards large point clouds or 8K-resolution camera images and set that as the target point.

The only downside is that you can run out of shared memory. In a default Docker container it's only 64 MB, but you get a nice error message that it could not allocate space if you run short of it.

In comparison, we have a NUC PC which has about 7 GB of shared memory and my laptop has 32 GB. So a default of 10 or 20 MB would only be a small subset of those. Double the size if it's not easy to set a lower value for subscribers (see below).

If the messages you try to send are larger than the shared memory available, you get no warning or error - just a lower rate / dropped messages.

One more question:
Is it possible to configure one SHM setting for publishers and a second for subscribers? I find it quite unfortunate that I have to prefix all ROS commands with
RMW_FASTRTPS_USE_QOS_FROM_XML=1 FASTRTPS_DEFAULT_PROFILES_FILE=my_config.xml
I think the QOS_FROM_XML part can be omitted in my case, but still.
It would be nice to be able to set this for the entire system, instead of for X number of sensor nodes.

An alternative to increasing the default value could be to parameterize it, so that when you create a publisher you can specify the amount of shared memory, which the driver maintainer can then estimate based on what is optimal for each of their drivers/sensors.

@Pleune

Pleune commented Jun 6, 2024

I would like to add that I have run into this exact issue trying to view (I think) small images in rqt, where anything over 420x420 resolution plays extremely poorly. This happens when rqt can no longer get the entire message through SHM, and I guess it struggles with the UDP method. I absolutely believe that the ROS defaults should be changed to have an SHM pool large enough for rqt to work with an average webcam.

@Mario-DL
Contributor

Just to add some more insight into the configuration options: for large data transmissions we have max_msg_size and sockets_size to adjust, among other things, the SHM segment sizes.

@fujitatomoya
Collaborator

I think it would probably be better to document rmw_fastrtps configuration and settings for these kinds of special cases in https://docs.ros.org/en/rolling/. We already have some information in the rmw_fastrtps repo, e.g. https://github.com/ros2/rmw_fastrtps?tab=readme-ov-file#large-data-transfer-over-lossy-network, but that is not where users would check.
