p2p: resolved deadlock on p2p server shutdown #2183

Merged
merged 2 commits into from
Jan 26, 2024

Conversation

MatusKysel
Contributor

Description

This PR removes the forceful stop of the p2p server, which was caused by a deadlock on channels. In protoTracker() we should wait for a signal from every handler on handlerDoneCh, but because of the return on stopCh we returned early, and the protocol handlers got stuck sending their signal to handlerDoneCh. This change removes the deadlock, along with the part of the p2p server code that forcefully killed the server.
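The deadlock described above can be reproduced in isolation. The following is a minimal sketch, not the actual protoTracker() code: trackHandlers, runShutdown, and the returnOnStop flag are hypothetical names, and the 20 ms sleep stands in for whatever cleanup a real handler does before it signals completion. It assumes, as the description implies, that handlers send on an unbuffered channel.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// trackHandlers is a hypothetical stand-in for the tracker loop: it collects
// one completion signal per handler. With returnOnStop set it mimics the old
// behaviour and returns as soon as stopCh is closed.
func trackHandlers(n int, doneCh, stopCh <-chan struct{}, returnOnStop bool) int {
	var stop <-chan struct{} // a nil channel is never ready in a select
	if returnOnStop {
		stop = stopCh
	}
	received := 0
	for received < n {
		select {
		case <-doneCh:
			received++
		case <-stop:
			return received // early return: nobody receives from doneCh anymore
		}
	}
	return received
}

// runShutdown starts n handlers, triggers shutdown, and reports how many
// handlers are still blocked on their send shortly after the tracker returns.
// In the early-return mode the blocked goroutines are leaked on purpose.
func runShutdown(n int, returnOnStop bool) int {
	doneCh := make(chan struct{}) // unbuffered: a send needs a live receiver
	stopCh := make(chan struct{})
	var finished int32

	for i := 0; i < n; i++ {
		go func() {
			<-stopCh
			time.Sleep(20 * time.Millisecond) // simulated cleanup work
			doneCh <- struct{}{}              // blocks forever without a receiver
			atomic.AddInt32(&finished, 1)
		}()
	}

	close(stopCh)
	trackHandlers(n, doneCh, stopCh, returnOnStop)

	// Give handlers time to get past their send if anything received it.
	time.Sleep(100 * time.Millisecond)
	return n - int(atomic.LoadInt32(&finished))
}

func main() {
	fmt.Println("stuck handlers, tracker returns on stop:", runShutdown(4, true))
	fmt.Println("stuck handlers, tracker waits for all:  ", runShutdown(4, false))
}
```

In the early-return mode every handler stays parked on its send, so anything that later waits for those goroutines would hang; when the tracker waits for all signals, every handler completes.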

@zzzckck
Collaborator

zzzckck commented Jan 26, 2024

Could you also post the deadlock callstack?

@zzzckck zzzckck requested a review from galaio January 26, 2024 05:43
p2p/server.go Outdated
@@ -448,18 +445,7 @@ func (srv *Server) Stop() {
}
close(srv.quit)
srv.lock.Unlock()

stopChan := make(chan struct{})
Contributor

The previous code skips srv.loopWG.Wait() once stopTimeout is reached; maybe that was added to work around earlier issues?
Would it be OK to just remove the case <-cs.handler.stopCh: branch above and keep this logic unchanged?

Contributor Author

Maybe it's OK, but I removed it because:

  • it would hide potential deadlocks in the future
  • there is no right timeout value: it was 5s, but a graceful shutdown of a node with 2k connections takes much longer

I am OK with restoring it with the timeout increased to something like 30s, but still, geth has nothing like that in its code and simply waits.
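To make the trade-off concrete, here is a rough sketch of the general "wait, but give up after a timeout" pattern under discussion. It is not the code removed from p2p/server.go: waitWithTimeout and completesWithin are hypothetical helpers, and the durations are arbitrary values chosen only for illustration.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// waitWithTimeout waits for wg but gives up after timeout. It returns true
// only if every goroutine tracked by wg finished in time. On timeout the
// helper goroutine (and whatever wg is tracking) keeps running.
func waitWithTimeout(wg *sync.WaitGroup, timeout time.Duration) bool {
	done := make(chan struct{})
	go func() {
		wg.Wait()
		close(done)
	}()
	select {
	case <-done:
		return true
	case <-time.After(timeout):
		return false
	}
}

// completesWithin simulates a shutdown that needs workMs milliseconds and
// reports whether it finished within a timeout of timeoutMs milliseconds.
func completesWithin(workMs, timeoutMs int) bool {
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(time.Duration(workMs) * time.Millisecond)
	}()
	return waitWithTimeout(&wg, time.Duration(timeoutMs)*time.Millisecond)
}

func main() {
	// The same shutdown work is either cut short or allowed to finish,
	// depending only on the timeout that was picked.
	fmt.Println("finished with short timeout:", completesWithin(50, 5))
	fmt.Println("finished with long timeout: ", completesWithin(50, 500))
}
```

A timeout that is too short abandons a shutdown that would have succeeded, while one that is long enough behaves like a plain wg.Wait() in the healthy case and only delays the moment a real deadlock becomes visible.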

@MatusKysel MatusKysel merged commit 220be95 into develop Jan 26, 2024
6 checks passed
@MatusKysel MatusKysel deleted the fix-p2p-server-timeout branch January 26, 2024 09:58