Orbit should retry extension registry in case of failure #4337

Closed
lucasmrod opened this issue Feb 23, 2022 · 3 comments
Labels
~agent Related to Fleet's osquery runtime and agent autoupdater (Orbit) ~backend Backend-related issue.

Comments

@lucasmrod
Member

Goal

The issue with #3678 was that extension registration occasionally took more than 3s, so Orbit timed out and brought down both itself and osquery (due to our use of oklog.Execute).

We fixed this in #3836 by increasing the timeout for Orbit's extension registry from 3s to 5m.

Zach also proposed retrying extension registration in Runner.Execute (see #3836 (comment)):
The retry would increase reliability in case of any failure (timeout or otherwise) of the extension runner, or any other issue in osquery.

Currently, Runner.Execute simply returns when Run fails; it should retry instead:

```go
if err := r.srv.Run(); err != nil {
	return err
}
```

@lucasmrod lucasmrod added ~agent Related to Fleet's osquery runtime and agent autoupdater (Orbit) idea ~backend Backend-related issue. labels Feb 23, 2022
@lucasmrod
Member Author

Some bad news:

Even with the 5m timeout, I've hit the "i/o timeout error while registering extensions" in our CI/automation on Windows: https://github.com/fleetdm/fleet/actions/runs/1986053580/attempts/1

So, I think we should do both:

  1. Implement Zach's proposal to retry extension registration instead of failing (this issue).
  2. Create a separate issue to troubleshoot what's really going on. It's probably a bug in osquery or osquery-go (see osquery/osquery-go#80, "Extension cannot connect on first try when loaded automatically").

@zhumo
Contributor

zhumo commented Sep 28, 2022

What's the status of this?

@lucasmrod
Member Author

Extension registration on Windows seems to be more stable in recent releases. There have been changes, like this one, that could possibly have helped with this issue.

We haven't received reports of this issue for a while now. Let's close and re-open if need be.
