Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use custom allocator from publisher option. #2478

Open
wants to merge 2 commits into
base: rolling
Choose a base branch
from

Conversation

fujitatomoya
Copy link
Collaborator

address #2477

Copy link
Collaborator Author

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The related code is introduced by #1557, and that was fine at this moment. then, #1849 changed it to be allocated from stack. that has been addressed by #2443.

all local tests with --packages-select rclcpp pass.

@clalancette can you take a look?

Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me with green CI.

@alsora
Copy link
Collaborator

alsora commented Apr 3, 2024

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@fujitatomoya
Copy link
Collaborator Author

@thomasmoore-torc windows failure https://ci.ros2.org/job/ci_windows/21402/testReport/junit/(root)/rclcpp/test_publisher_with_type_adapter_gtest_missing_result/ seems that it gets crashed on somewhere in test_large_message_unique.

TEST_F(TestPublisher, test_large_message_unique)
{
// There have been some bugs in the past when trying to type-adapt large messages
// (larger than the stack size). Here we just make sure that a 10MB message works,
// which is larger than the default stack size on Linux.
using StringTypeAdapter = rclcpp::TypeAdapter<std::string, rclcpp::msg::LargeMessage>;
auto node = std::make_shared<rclcpp::Node>("my_node", "/ns", rclcpp::NodeOptions());
const std::string topic_name = "topic_name";
auto pub = node->create_publisher<StringTypeAdapter>(topic_name, 1);
static constexpr size_t length = 10 * 1024 * 1024;
auto message_data = std::make_unique<std::string>();
message_data->reserve(length);
std::fill(message_data->begin(), message_data->begin() + length, '#');
pub->publish(std::move(message_data));
}

Do you have any clue for that?

@fujitatomoya
Copy link
Collaborator Author

I can not reproduce this failure with linux platform.

@thomasmoore-torc
Copy link

@fujitatomoya - I don't see anything obvious as to why this would fail on the Windows platform. However, I don't readily have access to a Windows-based development environment to try and debug.

Did the most recent build fail with the same issues after you forced pushed the additional changes? I haven't discovered how to find those build results and the Rpr__rclcpp__ubuntu_noble_amd64 build in the checks seems different.

@fujitatomoya
Copy link
Collaborator Author

Did the most recent build fail with the same issues after you forced pushed the additional changes?

failure really related to this PR, but we can give it a shot only with rclcpp test.

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@fujitatomoya
Copy link
Collaborator Author

@thomasmoore-torc the same failure. I do not have windows either, i would like to ask some help on this verification. in the meantime, i will take a look what went wrong in the source code.

[ RUN      ] TestPublisher.test_large_message_unique
-- run_test.py: return code 3221226356
-- run_test.py: generate result file 'C:/ci/ws/build/rclcpp/test_results/rclcpp/test_publisher_with_type_adapter.gtest.xml' with failed test
-- run_test.py: verify result file 'C:/ci/ws/build/rclcpp/test_results/rclcpp/test_publisher_with_type_adapter.gtest.xml'

@fujitatomoya fujitatomoya added the help wanted Extra attention is needed label Apr 4, 2024
@fujitatomoya
Copy link
Collaborator Author

@clalancette @alsora any thoughts?

@Crola1702
Copy link
Contributor

I think the failing test @fujitatomoya is mentioning was caused by #2443 (check investigation).

It's a consistent issue in Windows Debug

@fujitatomoya
Copy link
Collaborator Author

@Crola1702 thanks for the info.

I got a question why CI #2443 (comment) on #2443 could not detect this warning on windows?

@Crola1702
Copy link
Contributor

I got a question why CI #2443 (comment) on #2443 could not detect this warning on windows?

This is because CI builds (such as the ones you linked) use sequential executor. Nightly builds are using parallel executor.

@fujitatomoya
Copy link
Collaborator Author

@thomasmoore-torc i think #2498 should fix our CI failure, let me try run CI again.

@fujitatomoya
Copy link
Collaborator Author

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@thomasmoore-torc
Copy link

@fujitatomoya - it looks like that didn't fix it. While it shouldn't be an issue, the implementation of struct TypeAdapter<std::string, rclcpp::msg::LargeMessage> would generally be considered dangerous because it performs blind memcpy operations. It might be worth trying the following change to this code:

  template<>
  struct TypeAdapter<std::string, rclcpp::msg::LargeMessage>
  {
    using is_specialized = std::true_type;
    using custom_type = std::string;
    using ros_message_type = rclcpp::msg::LargeMessage;
  
    static void
    convert_to_ros_message(
      const custom_type & source,
      ros_message_type & destination)
    {
-     destination.size = source.size();
-     std::memcpy(destination.data.data(), source.data(), source.size());
+     destination.size = std::min(source.size(), destination.data.max_size());
+     std::memcpy(destination.data.data(), source.data(), destination.size);
    }
  
    static void
    convert_to_custom(
      const ros_message_type & source,
      custom_type & destination)
    {
      destination.resize(source.size);
      std::memcpy(destination.data(), source.data.data(), source.size);
    }
  };

@fujitatomoya
Copy link
Collaborator Author

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@clalancette
Copy link
Contributor

I poked at this a bit on my Windows VM, and it really doesn't like the change to create_ros_message_unique_ptr for some reason.

In particular, when I built this in Windows Debug, I got the following stack trace:

minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp(904) : Assertion failed: _CrtIsValidHeapPointer(block)
minkernel\crts\ucrt\src\appcrt\heap\debug_heap.cpp(908) : Assertion failed: is_block_type_valid(header->_block_use)
unknown file: error: SEH exception with code 0xc0000005 thrown in the test body.

While I don't know exactly what it means, it suggests to me that create_ros_message_unique_ptr isn't actually allocating the memory we think it is. I think someone needs to go in and validate what this line really allocates.

Comment on lines +121 to +122
destination.size = std::min(source.size(), destination.data.max_size());
std::memcpy(destination.data.data(), source.data(), destination.size);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, while I agree that the old code was dangerous, I think this isn't great either. In particular, if we don't have room in the destination to copy the source in, then we'll copy something truncated, which also seems wrong. Particularly in this test case, I think we should throw an exception if the destination isn't large enough.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants