-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix data race when accessing unversionedHome #3669
Conversation
It is possible to have a data race when reading/modifying unversionedHome via the helper functions. This commit adds a mutex to prevent the data race.
This pull request does not have a backport label. Could you fix it @belimawr? 🙏
NOTE: |
SonarQube Quality Gate |
We have had this fix attempted before, I'm not convinced the mutex actually fixes the underlying bug, it only silences the race detector: Reposting that comment below |
Looking at the stack trace in the race, the race is triggered by test code:
I think it might be specifically coming from this function changing the value that was read by the command runtime used for the fake components: elastic-agent/pkg/component/runtime/manager_test.go Lines 3272 to 3290 in 2675646
I'm not actually sure that the mutexes here fix the problem. They guarantee only one goroutine reads or writes at a time, but it gives you no guarantee about the ordering of those reads an writes. If one goroutine reads the value and continues on, then another updates it, then the update is effectively lost if the first goroutine never reads the value again. It's like it never happened. I think this is what the race detector is telling us here. If my understanding of this is correct then:
elastic-agent/pkg/component/runtime/manager_test.go Lines 169 to 176 in 2675646
I would bet that the problem is actually a race from the previous instance of the runtime manager since the paths are all global, every goroutine in this test is reading from the same source. The ordering I suspect is:
If this is true the fix isn't mutexes, it's to stop using global state. We need to be able to create a set of paths per test. We could do this by having the paths all read from a single structure which can be atomically swapped out or copied. In the example above test A would never see the global paths write from test B because they would each have their own copy of the paths. You could probably also fix this by guaranteeing that all goroutines from previous tests exited before starting the next one. Doing both things would be nice, but you probably only need one of them. |
💔 Tests Failed
Expand to view the summary
Build stats
Test stats 🧪
Test errorsExpand to view the tests failures
|
Thanks @cmacknz. I saw it was triggered by test come, but because the container command uses one of those functions, I was not sure it it could actually happen. Avoiding global state is definitely a good idea. I'll close this PR and re-run the tests on my PR. |
What does this PR do?
It is possible to have a data race when reading/modifying unversionedHome via the helper functions. This PR adds a mutex to prevent the data race.
Why is it important?
It fixes this data race.
Jenkins Job: https://fleet-ci.elastic.co/blue/organizations/jenkins/elastic-agent%2Felastic-agent-mbp/detail/PR-3465/29/pipeline/
Failing PR: #3465
Checklist
- [ ] I have commented my code, particularly in hard-to-understand areas- [ ] I have made corresponding changes to the documentation- [ ] I have made corresponding change to the default configuration files- [ ] I have added tests that prove my fix is effective or that my feature works./changelog/fragments
using the changelog tool- [ ] I have added an integration test or an E2E test## Author's ChecklistHow to test this PR locally
You can run the test that triggered the data race
pkg/component/runtime TestManager_FakeInput_Configure
however it's not guaranteed that the data race will trigger. I ran the tests a 100 times and it was not enough to trigger the condition on my machine.## Related issues## Use cases## Screenshots## LogsQuestions to ask yourself