Flaky Test: RingHash_SwitchToLowerPriorityAndThenBack #7783

Open
easwars opened this issue Oct 25, 2024 · 0 comments
Labels: Area: Testing, Type: Bug

easwars commented Oct 25, 2024

We haven't seen this so far on GitHub Actions, but it looks like a bug in the code rather than in the test.

Full test log here: https://pastebin.com/u2v2JshT

    third_party/golang/grpc/internal/grpctest/grpctest.go:44: Leaked goroutine: goroutine 14686 [select]:
        google3/third_party/golang/grpc/xds/internal/balancer/outlierdetection/outlierdetection.(*outlierDetectionBalancer).run(0xc000f6cf70)
        	third_party/golang/grpc/xds/internal/balancer/outlierdetection/balancer.go:687 +0x285
        created by google3/third_party/golang/grpc/xds/internal/balancer/outlierdetection/outlierdetection.bb.Build in goroutine 14678
        	third_party/golang/grpc/xds/internal/balancer/outlierdetection/balancer.go:76 +0x991
    third_party/golang/grpc/internal/grpctest/grpctest.go:44: Leaked goroutine: goroutine 14687 [select]:
        google3/third_party/golang/grpc/internal/grpcsync/grpcsync.(*CallbackSerializer).run(0xc00120f3a0, {0x1d08a78, 0xc0017093b0})
        	third_party/golang/grpc/internal/grpcsync/callback_serializer.go:88 +0x1e9
        created by google3/third_party/golang/grpc/internal/grpcsync/grpcsync.NewCallbackSerializer in goroutine 14678
        	third_party/golang/grpc/internal/grpcsync/callback_serializer.go:52 +0x205
    third_party/golang/grpc/internal/grpctest/grpctest.go:71: Goroutine leak check disabled for future tests

The problem seems to be as follows:

  • There are two priorities with backends in both
  • We start with priority0, and the priority becomes READY, and RPCs succeed
  • The backend then fails, priority0 is no longer usable, and we switch to priority1
  • RPCs succeed to priority1
  • The backend in priority0 comes back up, we switch back to it, and RPCs succeed

The test passes, but there is a leaked goroutine. The child of priority1, which is outlier detection, is not closed. The child of outlier detection, which is clusterimpl, is not closed either.

I believe the problem arises because when priority1 is closed, it is moved to the idle cache in the balancergroup. When the priority LB is then closed soon after, the child in the idle cache is, for some reason, not cleaned up.

This failure happens about 2 times out of 100k, but I feel it is worth investigating.

purnesh42H added the Area: Testing label on Oct 29, 2024