GrpcServiceException: INTERNAL: RST_STREAM closed stream #881

Open
patriknw opened this issue May 4, 2023 · 1 comment
patriknw commented May 4, 2023

When testing the Projection over gRPC samples in AWS with an Application Load Balancer (ALB ingress controller), the following error occurs:

akka.grpc.GrpcServiceException: INTERNAL: RST_STREAM closed stream. HTTP/2 error code: INTERNAL_ERROR

client logs:

[2023-05-04 11:37:42,136] [DEBUG] [io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler] [] [] [grpc-default-worker-ELG-1-2] - [id: 0x5670e347, L:/192.168.54.227:34182 - R:k8s-shopping-shopping-604179632a-148180922.us-east-2.elb.amazonaws.com/3.14.190.250:443] INBOUND RST_STREAM: streamId=5 errorCode=1

[2023-05-04 11:39:29,467] [DEBUG] [io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler] [] [] [grpc-default-worker-ELG-1-2] - [id: 0x5670e347, L:/192.168.54.227:34182 - R:k8s-shopping-shopping-604179632a-148180922.us-east-2.elb.amazonaws.com/3.14.190.250:443] OUTBOUND PING: ack=false bytes=1111
[2023-05-04 11:39:29,569] [DEBUG] [io.grpc.netty.shaded.io.grpc.netty.NettyClientHandler] [] [] [grpc-default-worker-ELG-1-2] - [id: 0x5670e347, L:/192.168.54.227:34182 - R:k8s-shopping-shopping-604179632a-148180922.us-east-2.elb.amazonaws.com/3.14.190.250:443] INBOUND PING: ack=true bytes=1111

This tears down the connection, and the projections are restarted.

There is nothing in the server logs that explains it.

It happens once per minute, which is the idle timeout of the load balancer.

Tried this on the server, without success:

akka.http.server.http2.ping-interval=10s

In the client we have these channelBuilderOverrides:

    _.keepAliveWithoutCalls(true)
      .keepAliveTime(10, TimeUnit.SECONDS)
      .keepAliveTimeout(5, TimeUnit.SECONDS)
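
For context, a minimal sketch of how these overrides could be wired in, assuming akka-grpc's GrpcClientSettings.withChannelBuilderOverrides and a consumer that connects via connectToServiceAt (the host and port are placeholders):

    import java.util.concurrent.TimeUnit
    import akka.actor.typed.ActorSystem
    import akka.grpc.GrpcClientSettings

    // Sketch only: pass the Netty keepalive settings quoted above to the
    // channel builder used by the consumer's gRPC client.
    def clientSettings(implicit system: ActorSystem[_]): GrpcClientSettings =
      GrpcClientSettings
        .connectToServiceAt("shopping-cart-service.example.com", 443) // placeholder endpoint
        .withTls(true)
        .withChannelBuilderOverrides(
          _.keepAliveWithoutCalls(true)             // ping even when no call is in flight
            .keepAliveTime(10, TimeUnit.SECONDS)    // client ping interval
            .keepAliveTimeout(5, TimeUnit.SECONDS)) // close the connection if the ping is not acked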

One way to work around the problem is to periodically update the consumer filter, i.e. a request from the client. If this is a problem that we need to solve, we could automatically emit keep-alive messages from the GrpcReadJournal.
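
A hypothetical sketch of that first workaround, assuming the ConsumerFilter extension and its UpdateFilter command from akka-projection-grpc (the stream id and criteria would be whatever the consumer already uses):

    import scala.concurrent.duration._
    import akka.actor.typed.ActorSystem
    import akka.projection.grpc.consumer.ConsumerFilter

    // Sketch only: periodically re-send the current (unchanged) filter so that
    // some traffic flows on the stream before the ALB idle timeout expires.
    def scheduleFilterKeepAlive(
        system: ActorSystem[_],
        streamId: String,
        currentCriteria: Vector[ConsumerFilter.FilterCriteria]): Unit = {
      import system.executionContext
      system.scheduler.scheduleWithFixedDelay(30.seconds, 30.seconds) { () =>
        ConsumerFilter(system).ref ! ConsumerFilter.UpdateFilter(streamId, currentCriteria)
      }
    }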

Maybe related: https://stackoverflow.com/questions/66818645/http2-ping-frames-over-aws-alb-grpc-keepalive-ping

@johanandren

I highly suspect the issue is specific to ALB, which does not pass HTTP/2 pings through, does not seem to care about pings from the server to the ALB, and signals timeouts in a weird way: an RST_STREAM with protocol_error to the client (which, as far as I understand it, is supposed to mean the client/server is speaking invalid HTTP/2).

Looks like a possible workaround would be to tune the ALB config to a much longer idle timeout.
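
For example, with the AWS Load Balancer Controller the idle timeout can be raised through an Ingress annotation along these lines (the value is just illustrative):

    alb.ingress.kubernetes.io/load-balancer-attributes: idle_timeout.timeout_seconds=4000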

It may still be worth working around this with gRPC message-level keepalives, but as far as I know we have not seen this from any load balancers/proxies other than ALB, so an upstream fix of some sort also sounds like it would make sense. I couldn't find any public issue tracker for ALB/ELB to look for existing reports.
