Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

netty: netty tests resource leakage issue #11593

Open
wants to merge 40 commits into
base: master
Choose a base branch
from

Conversation

vinodhabib
Copy link
Contributor

@vinodhabib vinodhabib commented Oct 3, 2024

I can see there are multiple tests in netty server and client side are randomly failing with resource leakage error and this PR will be fixing some of them at Server and Client side as below.

Note : Made some common method changes (as per Eric Suggestion/Reference) in all 3 tests files and currently tests which are shown below in Pending Server Base are one which are majorly impacting in both Server and Client test randomly which has common root cause while invoking the dataFrame() which we discussed earlier and tried suggestion as shown in the comment
#3353 (comment)

Server Side UTs Fixes:
sendFrameShouldSucceed()
inboundDataWithEndStreamShouldForwardToStreamListener()
inboundDataShouldForwardToStreamListener()
keepAliveEnforcer_sendingDataResetsCounters()
maxRstCount_withinLimit_succeeds()
maxRstCount_exceedsLimit_fails()

NettyHandlerTestBase.dataSizeSincePingAccumulates()

Client Side UTs Fixes
sendFrameShouldSucceed()

Pending UTs at Server Base
NettyHandlerTestBase.bdpPingWindowResizing
NettyHandlerTestBase.bdpPingLimitOutstanding
NettyHandlerTestBase.windowUpdateMatchesTarget
NettyHandlerTestBase.testPingBackoff

Note : you can refer the comment #3353 (comment) for more details on these 4 pending Server Base UTs behavior

Fixes #3353

# Conflicts:
#	netty/src/test/java/io/grpc/netty/NettyHandlerTestBase.java
@vinodhabib vinodhabib changed the title Defect 3353 - netty tests resource leak fix netty - netty tests resource leakage issue Oct 4, 2024
@vinodhabib vinodhabib changed the title netty - netty tests resource leakage issue netty: netty tests resource leakage issue Oct 4, 2024
…stenerMessageQueue in NettyServerTests as it's not working as expected
@vinodhabib vinodhabib changed the title netty: netty tests resource leakage issue grpc-netty: netty tests resource leakage issue Oct 14, 2024
@@ -1346,7 +1368,7 @@ private ByteBuf emptyGrpcFrame(int streamId, boolean endStream) throws Exception
try {
return dataFrame(streamId, endStream, buf);
} finally {
buf.release();
buf.clear();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this change.

Copy link
Contributor Author

@vinodhabib vinodhabib Oct 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if we revert this change then below tests are failing with mentioned reason in snippet. so I used clear() also you can refer previous comment on the same.
#3353 (comment)
image

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per documentation clear() does not release memory: https://github.com/netty/netty/blob/64c52255c073d4a308d3dbb56a12d3cca7692c47/buffer/src/main/java/io/netty/buffer/ByteBuf.java#L467

I tried with your PR and not calling either byteBuf.clear() or byteBuf.release() here and I did not observe the leak report for any of the 3 tests calling emptyDataFrame() viz headersWithErrAndEndStreamReturnErrorButNotThrowNpe, streamErrorShouldNotCloseChannel and clientHalfCloseShouldForwardToStreamListener.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the above comment and snippet we are discussing about 3 tests failures with refCnt decrement error if we use byteBuf.release() instead of byteBuf.clear(), not about memory leakage issue

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, my question is why byteBuf.release() is even required when it doesn't show any report leak.

Copy link
Contributor Author

@vinodhabib vinodhabib Nov 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean to say we need to remove this try block which is doing the cleanup part, as this was an existing code? please confirm.

Copy link
Contributor

@kannanjgithub kannanjgithub Nov 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, leave it as is. For some reason I see
minus on buf.release() and
plus on buf.clear()
making it appear like it was your change.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the release() looks like it should go away (the ownership is being transferred to dataFrame(), but clear() is useless here. Just delete the line?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

/**
* Must be called by subclasses to initialize the handler and channel.
*/
protected final void initChannel(Http2HeadersDecoder headersDecoder) throws Exception {
content = Unpooled.copiedBuffer("hello world", UTF_8);
//ByteBuf content = Unpooled.copiedBuffer("hello world", UTF_8);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -869,7 +873,8 @@ public void keepAliveEnforcer_sendingDataResetsCounters() throws Exception {
future.get();
for (int i = 0; i < 10; i++) {
future = enqueue(
new SendGrpcFrameCommand(stream.transportState(), content().retainedSlice(), false));
//new SendGrpcFrameCommand(stream.transportState(), content().retainedSlice(), false));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove commented line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@vinodhabib vinodhabib changed the title grpc-netty: netty tests resource leakage issue netty: netty tests resource leakage issue Oct 18, 2024
@kannanjgithub kannanjgithub added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Nov 11, 2024
@grpc-kokoro grpc-kokoro removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Nov 11, 2024
@@ -16,8 +16,8 @@

package io.grpc.netty;

import static com.google.common.base.Charsets.UTF_8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you make this change?

Copy link
Contributor Author

@vinodhabib vinodhabib Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this was changed as per your earlier suggestion/reference #3353 (comment)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand. I said nothing about this, and it wasn't in the code change I linked.

Copy link
Contributor Author

@vinodhabib vinodhabib Nov 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ejona86 Below snippet is captured from code change which you linked where I can see the UTF_8 import is already exists.

FileName - NettyHandlerTestBase.java master...ejona86:grpc-java:netty-leak-detector-in-tests#:~:text=final%20byte%5B%5D-,contentAsArray,-()%20%7B

image

Note : I removed chars \000\000\000\000\015 from String as its giving check style violation while build

@@ -1315,7 +1313,9 @@ private void rapidReset(int burstSize) throws Exception {
for (int period = 0; period < 3; period++) {
for (int i = 0; i < burstSize; i++) {
channelRead(headersFrame(streamId, headers));
channel().releaseOutbound();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need both releaseOutbound()s?

Copy link
Contributor Author

@vinodhabib vinodhabib Nov 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added these releaseOutbounds() here as i saw leakage error in the below 2 UTs which are calling the rapidReset() method and it has outbound messages which are not released.

maxRstCount_withinLimit_succeeds()
maxRstCount_exceedsLimit_fails()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

releaseOutbound() is the sort of thing that only needs to be run once at the end of tests, to avoid leaks. Running it at the end of @Before could be okay if we don't want tests to see any queued messages, but there's no point to calling it multiple times here. Really, I'm surprised it isn't placed mostly just once in an @After to clean up after every test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

@@ -1346,7 +1368,7 @@ private ByteBuf emptyGrpcFrame(int streamId, boolean endStream) throws Exception
try {
return dataFrame(streamId, endStream, buf);
} finally {
buf.release();
buf.clear();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree the release() looks like it should go away (the ownership is being transferred to dataFrame(), but clear() is useless here. Just delete the line?

@ejona86 ejona86 added the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Nov 12, 2024
@grpc-kokoro grpc-kokoro removed the kokoro:run Add this label to a PR to tell Kokoro the code is safe and tests can be run label Nov 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Resource leak in netty tests
4 participants