Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix goroutine leaks in grpc integration tests #5340

Conversation

WillSewell
Copy link
Contributor

@WillSewell WillSewell commented Apr 8, 2024

Which problem is this PR solving?

Description of the changes

  • The key issue was that the grpc client was only being killed in a runtime.SetFinalizer - i.e. when it is GCed. I think in the tests this was not shutting down before goleak had decided that the goroutine had leaked.
  • This change switches to a more conventional approach of killing the client as part of the Close method and then ensuring this is called in each of the tests.

How was this change tested?

  • While this change does not include a call to activate goleak, it was tested with this. The PR does not include goleak detection because there are still other violations related to elasticsearch in this package.

Checklist

Copy link

codecov bot commented Apr 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 95.21%. Comparing base (4681e36) to head (1b2cdee).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5340      +/-   ##
==========================================
+ Coverage   95.18%   95.21%   +0.03%     
==========================================
  Files         343      343              
  Lines       16780    16787       +7     
==========================================
+ Hits        15972    15984      +12     
+ Misses        608      605       -3     
+ Partials      200      198       -2     
Flag Coverage Δ
badger 10.58% <0.00%> (-0.01%) ⬇️
cassandra-3.x 18.51% <0.00%> (-0.02%) ⬇️
cassandra-4.x 18.51% <0.00%> (-0.02%) ⬇️
elasticsearch-5.x 20.97% <0.00%> (-0.02%) ⬇️
elasticsearch-6.x 20.97% <0.00%> (-0.03%) ⬇️
elasticsearch-7.x 21.02% <0.00%> (-0.03%) ⬇️
elasticsearch-8.x 21.21% <0.00%> (-0.03%) ⬇️
grpc 14.68% <100.00%> (+0.10%) ⬆️
kafka 10.21% <0.00%> (-0.01%) ⬇️
opensearch-1.x 21.07% <0.00%> (-0.02%) ⬇️
opensearch-2.x 21.07% <0.00%> (-0.02%) ⬇️
unittests 91.75% <50.00%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@WillSewell WillSewell force-pushed the fix-goroutine-leaks-in-grpc-integration-tests branch from 643e366 to e1db93f Compare April 8, 2024 22:31
The key issue was that the grpc client was only being killed in
a `runtime.SetFinalizer` - i.e. when it is GCed. I think in the
tests this was not shutting down before goleak had decided that
the goroutine had leaked.

This change switches to a more conventional approach of killing
the client as part of the Close method and then ensuring this
is called in each of the tests.

While this change does not include a call to active goleak, it
was tested with this. It does not include goleak detection
because there are still other violations related to elasticsearch
in this package.

Signed-off-by: Will Sewell <[email protected]>
@WillSewell WillSewell force-pushed the fix-goroutine-leaks-in-grpc-integration-tests branch from e1db93f to 74d4c6e Compare April 8, 2024 22:33
@WillSewell WillSewell marked this pull request as ready for review April 8, 2024 22:36
@WillSewell WillSewell requested a review from a team as a code owner April 8, 2024 22:36
@yurishkuro yurishkuro added the changelog:test Change that's adding missing tests or correcting existing tests label Apr 8, 2024
@yurishkuro
Copy link
Member

I merged a large PR that affected some of the test files. There are now merge conflicts (which would've been more difficult to resolve in that large PR than here, so I picked this one to resolve conflicts).

@WillSewell
Copy link
Contributor Author

No worries. The changes actually make this PR simpler.

One area that is pretty unpleasant about this change is the fact that that each subtest is calling CleanUp which in this case is actually initializing for the following subtest. This means the final subtest pointlessly initializes. For now I've taken the approach of just explicitly closing this after each test.

I think a nicer approach would be to introduce separate a Initialize callback (separate from CleanUp) but this is a larger change because I think it should be done across all integration tests.

@yurishkuro
Copy link
Member

I also don't like how CleanUp functions often call initialize(), this was not at all how it was meant to work, the cleanup functions were intended to remove data from the DB, not to reset the connection completely.

I am not opposed to larger refactoring (may not be as large as it seems) if we can make the workflow more robust and more intuitive.

I don't even remember why Refresh callback was needed - nothing implements it except for ES, and in case of ES the reason seems to be to ensure that the data is queryable after write once Refresh returns, which actually doesn't seem to work reliably anyway and we already have most of the assertions on reading data executing within waitForCondition, which makes synchronous flushing irrelevant.

@@ -109,6 +113,7 @@ func TestGRPCStorage(t *testing.T) {
}
s.initialize(t)
s.RunAll(t)
s.close(t)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should really be called before RunAll via defer()

Signed-off-by: Yuri Shkuro <[email protected]>
@yurishkuro yurishkuro enabled auto-merge (squash) April 11, 2024 21:52
@yurishkuro yurishkuro merged commit a9a7bc2 into jaegertracing:main Apr 11, 2024
37 checks passed
@WillSewell WillSewell deleted the fix-goroutine-leaks-in-grpc-integration-tests branch April 12, 2024 08:07
varshith257 pushed a commit to varshith257/jaeger that referenced this pull request May 3, 2024
## Which problem is this PR solving?
- Solves part of jaegertracing#5006

## Description of the changes
- The key issue was that the grpc client was only being killed in a
`runtime.SetFinalizer` - i.e. when it is GCed. I think in the tests this
was not shutting down before goleak had decided that the goroutine had
leaked.
- This change switches to a more conventional approach of killing the
client as part of the Close method and then ensuring this is called in
each of the tests.

## How was this change tested?
- While this change does not include a call to activate goleak, it _was_
tested with this. The PR does not include goleak detection because there
are still other violations related to elasticsearch in this package.

## Checklist
- [x] I have read
https://github.com/jaegertracing/jaeger/blob/master/CONTRIBUTING_GUIDELINES.md
- [x] I have signed all commits
- [ ] I have added unit tests for the new functionality
- [ ] I have run lint and test steps successfully
  - for `jaeger`: `make lint test`
  - for `jaeger-ui`: `yarn lint` and `yarn test`

---------

Signed-off-by: Will Sewell <[email protected]>
Signed-off-by: Yuri Shkuro <[email protected]>
Co-authored-by: Yuri Shkuro <[email protected]>
Signed-off-by: Vamshi Maskuri <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog:test Change that's adding missing tests or correcting existing tests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants