Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bridge: implement MSGQ to ZMQ bridge with subscriber-based publishing #32862

Merged
merged 2 commits into from
Aug 28, 2024

Conversation

deanlee
Copy link
Contributor

@deanlee deanlee commented Jun 28, 2024

resolve commaai/msgq#282, Implementing an efficient MSGQ to ZMQ bridge that only publishes services with active subscribers.

Changes:

  1. New MSGQ to ZMQ Bridge Implementation: Utilizes ZMQ's socket monitoring to bridge MSGQ services to ZMQ efficiently.
  2. Active Subscriber Detection: Only connects to MSGQ services requested by subscribers and publishes events when subscribers are active. Closes the connection to MSGQ if no subscribers are active.
  3. Resource Management: Enters a sleep state when no subscribers are available, ensuring minimal overhead and scalability.
  4. Always-On Operation: With no overhead when there are no subscribers, the bridge can always run on the car.

This update ensures efficient resource utilization and scalability by dynamically managing connections based on subscriber activity.

Copy link
Contributor

github-actions bot commented Jun 28, 2024

Thanks for contributing to openpilot! In order for us to review your PR as quickly as possible, check the following:

  • Convert your PR to a draft unless it's ready to review
  • Read the contributing docs
  • Before marking as "ready for review", ensure:
    • the goal is clearly stated in the description
    • all the tests are passing
    • the change is something we merge
    • include a route or your device' dongle ID if relevant

@deanlee deanlee force-pushed the msgq_to_zmq_bridge branch 6 times, most recently from 39328a2 to 58d6a7b Compare June 28, 2024 16:28
if (errno == EINTR) {
// Due to frequent EINTR signals from msgq, introduce a brief delay (200 ms)
// to reduce CPU usage during retry attempts.
util::sleep_for(200);
Copy link
Contributor Author

@deanlee deanlee Jun 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add a sleep here to reduce CPU usage while polling.
the MSGQ uses SIGUSR1 for notification between publishers and subscribers, resulting in extremely frequent EINTR signals. This interrupts the poll function frequently, leading to a busy retry loop. This can increase CPU usage from an expected 0% to around 4% during the polling period while waiting for subscribers to connect. Other interrupt-driven modules like camerad, sensorsd may face similar issues.

There's also a bug related to the SIGUSR1 signal: even after the subscriber disconnects, msgq's publisher keeps sending signals to the closed subscriber process.

commaai/msgq#617 is a solution. However, it still needs more testing to confirm its reliability.
msqg: use futex for IPC notifications commaai/msgq#625 is a soluton

@deanlee deanlee force-pushed the msgq_to_zmq_bridge branch 5 times, most recently from d86738b to 6591636 Compare June 30, 2024 10:44
@deanlee deanlee marked this pull request as ready for review June 30, 2024 10:51
@deanlee deanlee force-pushed the msgq_to_zmq_bridge branch 2 times, most recently from 1ae1769 to 0f51583 Compare July 8, 2024 14:50
@deanlee deanlee closed this Jul 12, 2024
@deanlee deanlee reopened this Jul 12, 2024
Copy link
Contributor

This PR has had no activity for 14 days. It will be automatically closed in 3 days if there is no activity.

@adeebshihadeh adeebshihadeh merged commit 2faa08c into commaai:master Aug 28, 2024
16 checks passed
@sshane
Copy link
Contributor

sshane commented Aug 28, 2024

@deanlee now PJ streaming looks rather choppy and sometimes skips a split second of data, can you take another look at this?

@deanlee
Copy link
Contributor Author

deanlee commented Aug 29, 2024

@sshane I didn't find issue when reviewing the code. Running zmq=1 ./junggle.py --stream also appears to be normal.Has this problem always been present, or is it intermittent?

@deanlee
Copy link
Contributor Author

deanlee commented Aug 29, 2024

I found the issue.looks like the old bridge has the same issue too.I'll fix it when back to my computer

ZwX1616 pushed a commit that referenced this pull request Sep 13, 2024
…#32862)

implement MSGQ to ZMQ bridge with subscriber-based publishing

Co-authored-by: Adeeb Shihadeh <[email protected]>
old-commit-hash: 2faa08c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bridge: only republish services that have subscribers
3 participants