Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Fix race condition in citus_set_coordinator_host when adding multiple…
… coordinator nodes concurrently (#7682) When multiple sessions concurrently attempt to add the same coordinator node using `citus_set_coordinator_host`, there is a potential race condition. Both sessions may pass the initial metadata check (`isCoordinatorInMetadata`), but only one will succeed in adding the node. The other session will fail with an assertion error (`Assert(!nodeAlreadyExists)`), causing the server to crash. Even though the `AddNodeMetadata` function takes an exclusive lock, it appears that the lock is not preventing the race condition before the initial metadata check. - **Issue**: The current logic allows concurrent sessions to pass the check for existing coordinators, leading to an attempt to insert duplicate nodes, which triggers the assertion failure. - **Impact**: This race condition leads to crashes during operations that involve concurrent coordinator additions, as seen in #7646. **Test Plan:** - Isolation Test Limitation: An isolation test was added to simulate concurrent additions of the same coordinator node, but due to the behavior of PostgreSQL locking mechanisms, the test does not trigger the edge case. The lock applied within the function serializes the operations, preventing the race condition from occurring in the isolation test environment. While the edge case is difficult to reproduce in an isolation test, the fix addresses the core issue by ensuring concurrency control through proper locking. - Existing Tests: All existing tests related to node metadata and coordinator management have been run to ensure that no regressions were introduced. **After the Fix:** - Concurrent attempts to add the same coordinator node will be serialized. One session will succeed in adding the node, while the others will skip the operation without crashing the server. Co-authored-by: Mehmet YILMAZ <[email protected]>
- Loading branch information