Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an optional timeout_sec input to Client.call() to fix issue https://github.com/ros2/rclpy/issues/1181 #1188

Merged
merged 9 commits into from
Nov 17, 2023

Conversation

KKSTB
Copy link
Contributor

@KKSTB KKSTB commented Oct 17, 2023

Add an optional timeout_sec input so that the call() function can break away from the deadlock if RMW fails to response. The call() function returns future.result() which is None.

Another approach of adding timeout to the future returned by call_async() is much more difficult to implement, requiring changes from the Future class and upward. Also there is no timeout input for the rclcpp version of async_send_request(). So I would prefer not taking this approach instead.

Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, we could deprecate those sync APIs like this as rclpy, because rclpy provides async APIs and user can manage from there, including these timeout behavior as they like.

On the other hand, I see this is one of the enhancements, at lease at this moment.

any thoughts?

@KKSTB
Copy link
Contributor Author

KKSTB commented Oct 19, 2023

No idea too. I think it depends on how rclpy in ROS2 is positioned:

  1. if rclpy is to provide the fundamentals only, then the sync APIs are too much to exist
  2. if rclpy is to also provide implementations that others would normally do and would likely frequently do, then the sync APIs can provide a neat solution, and probably help avoid common problems that would probably encounter by having a proper implementation and with issues fixed

@sloretz sloretz self-assigned this Nov 2, 2023
Copy link
Contributor

@sloretz sloretz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM with one nit addressed

rclpy/rclpy/client.py Show resolved Hide resolved
@fujitatomoya
Copy link
Collaborator

address #1181

@fujitatomoya
Copy link
Collaborator

@KKSTB can you add the test for timeout case to https://github.com/ros2/rclpy/blob/rolling/rclpy/test/test_client.py, then i will start the CI.

@KKSTB
Copy link
Contributor Author

KKSTB commented Nov 14, 2023

This is a bit difficult now as I cannot build rclpy successfully due to recent change in rcl_node_type_description_service_init() ros2/rcl@1b79535

Have to wait for rolling update.

@fujitatomoya
Copy link
Collaborator

@KKSTB i am not sure what you mean but you are doing the source build? https://docs.ros.org/en/rolling/Installation/Alternatives/Ubuntu-Development-Setup.html#build-ros-2, this is what we do for the mainline rolling development.

@KKSTB
Copy link
Contributor Author

KKSTB commented Nov 15, 2023

@fujitatomoya thanks. Although it takes a long time to build from source.

I was originally using binary install of rolling, therefore failed to build.

Copy link
Collaborator

@fujitatomoya fujitatomoya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall lgtm, several comments to discuss.

rclpy/test/test_client.py Outdated Show resolved Hide resolved
rclpy/rclpy/client.py Show resolved Hide resolved
rclpy/test/test_client.py Outdated Show resolved Hide resolved
rclpy/test/test_client.py Outdated Show resolved Hide resolved
rclpy/test/test_client.py Outdated Show resolved Hide resolved
KKSTB and others added 2 commits November 16, 2023 09:50
2. Remove test_sync_call_slow()
3. Shorten test time

Signed-off-by: KKSTB <[email protected]>
@fujitatomoya
Copy link
Collaborator

@ros-pull-request-builder retest this please

@clalancette
Copy link
Contributor

The Rpr jobs aren't going to succeed here until we do releases of rcl, rclpy, and rclcpp. We'll just have to rely on the jobs from https://ci.ros2.org for now.

@fujitatomoya
Copy link
Collaborator

CI:

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

Signed-off-by: KKSTB <[email protected]>
@KKSTB
Copy link
Contributor Author

KKSTB commented Nov 17, 2023

Yet another flake8 and fixed

@fujitatomoya
Copy link
Collaborator

yeah you fixed the final flake8 error, thanks starting CI 🤞

Building on the built-in node in workspace /var/lib/jenkins/jobs/ci_launcher/workspace

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Windows Build Status

@tonynajjar
Copy link
Contributor

I was just about to implement this and saw that it's already on rolling, can this be backported to humble please?

tonynajjar pushed a commit to angsa-robotics/rclpy that referenced this pull request Apr 12, 2024
@tonynajjar
Copy link
Contributor

Additionally a comment about the PR. It's surprising to me that you went with returning None if the call timed out instead of raising an Exception. Just returning None increases the risk of "silent bugs", especially for services not expecting a response.
@fujitatomoya would you be open if I opened a PR to raise an Exception?

@fujitatomoya
Copy link
Collaborator

Additionally a comment about the PR. It's surprising to me that you went with returning None if the call timed out instead of raising an Exception. Just returning None increases the risk of "silent bugs", especially for services not expecting a response. @fujitatomoya would you be open if I opened a PR to raise an Exception?

@tonynajjar @KKSTB can you review #1271?

i think it makes more sense to raise TimeoutError instead of None that goes not give us any information.

@fujitatomoya
Copy link
Collaborator

I was just about to implement this and saw that it's already on rolling, can this be backported to humble please?

@tonynajjar i am not inclined to do that. #1188 (review), probably we eventually deprecate this method in the future... @clalancette @sloretz @adityapande-1995 @wjwwood what do you think?

@wjwwood
Copy link
Member

wjwwood commented Apr 14, 2024

We can't change a stable function in Humble to suddenly be able to throw an exception (or return None). You could consider a rewrite of this feature as a separate function, so it's purely new API. But even then we have to be careful that we don't accidentally change the behavior of existing functions either intentionally or as a new bug. That's why we're generally not inclined to back port features to older versions.

@KKSTB
Copy link
Contributor Author

KKSTB commented Apr 15, 2024

Thanks @tonynajjar. You are right that returning None increases the chance of silent bug. What I considered at that time was to return whatever future.result() has without modifying the function heavily, therefore resulting in this. Maybe its an improvement to raise TimeoutError.

As for backporting to older version, I also hope this to happen in ROS Iron which I am using. But I go with the way similar to what @wjwwood suggest so that the fix (or any other customization or convenient wrappers) becomes readily available in my version of ROS Iron:

  1. Make a new class inheriting Node and Client, and customize the Client.call()
  2. override node.create_client() that returns your custom Client class, by assigning __class__ to your new Client class, like this: https://stackoverflow.com/questions/8062161/can-i-dynamically-convert-an-instance-of-one-class-to-another
  3. You may create a custom rclpy class that overrides rclpy.create_node() to conveniently create your custom node.

Its just my personal suggestion as a ROS user, and not representing what ROS team is suggesting. Hope that help.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants