Moonbase node crashes every 10 min #2502

Closed
jonathanudd opened this issue Sep 27, 2023 · 5 comments

Comments

@jonathanudd
jonathanudd commented Sep 27, 2023

I'm running a Moonbase Alpha node, currently version 0.33.0.
The node crashes every 10 min and works fine after a restart. This problem has occurred since at least August, so it is not isolated to 0.33.0.
See log for more details.
moonbase_crash.log

Arguments used
--chain=alphanet --state-pruning=archive --rpc-max-connections=1000 --execution=wasm --wasm-execution=compiled --rpc-external --rpc-port=9933 --rpc-cors=all --rpc-methods=unsafe --prometheus-external --name="\U0001F6E1 DWELLIR MOONBASE ALPHA RPC 1 \U0001F6E1" --wasm-runtime-overrides=/home/polkadot/wasm --runtime-cache-size=16 --max-runtime-instances=32 -- --execution=wasm --bootnodes=/dns/0.westend.paritytech.net/tcp/30333/p2p/12D3KooWKer94o1REDPtAhjtYR4SdLehnSrN8PEhBnZm5NBoCrMC --bootnodes=/dns/westend.bootnode.amforc.com/tcp/30333/p2p/12D3KooWJ5y9ZgVepBQNW4aabrxgmnrApdVnscqgKWiUu4BNJbC8

@crystalin
Collaborator

Sep 27 07:21:36 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:36 [🌗] 💤 Idle (1 peers), best: #5186945 (0xd139…b580), finalized #5186944 (0x1f86…e42a), ⬇ 0.2kiB/s ⬆ 0.2kiB/s
Sep 27 07:21:36 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:36 Accepting new connection 1/1000
Sep 27 07:21:36 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:36 HTTP serve connection failed hyper::Error(Shutdown, Os { code: 107, kind: NotConnected, message: "Transport endpoint is not connected" })
Sep 27 07:21:37 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:37 Accepting new connection 1/1000
Sep 27 07:21:37 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:37 HTTP serve connection failed hyper::Error(Shutdown, Os { code: 107, kind: NotConnected, message: "Transport endpoint is not connected" })
Sep 27 07:21:38 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:38 Accepting new connection 1/1000
Sep 27 07:21:38 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:38 HTTP serve connection failed hyper::Error(Shutdown, Os { code: 107, kind: NotConnected, message: "Transport endpoint is not connected" })
Sep 27 07:21:39 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:39 Accepting new connection 1/1000
Sep 27 07:21:41 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:41 [Relaychain] 💤 Idle (8 peers), best: #12248368 (0x20e5…ae7c), finalized #12248365 (0x0629…5836), ⬇ 2.7kiB/s ⬆ 1.7kiB/s
Sep 27 07:21:41 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:41 [🌗] 💤 Idle (2 peers), best: #5186945 (0xd139…b580), finalized #5186944 (0x1f86…e42a), ⬇ 0.6kiB/s ⬆ 0.5kiB/s
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] ✨ Imported #12248369 (0x2556…6c25)
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] ♻️  Reorg on #12248369,0x2556…6c25 to #12248369,0xe211…c557, common ancestor #12248368,0x20e5…ae7c
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] ✨ Imported #12248369 (0xe211…c557)
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="availability-distribution-subsystem" err=FromOrigin { origin: "availability-distribution", source: IncomingMessageChannel(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="candidate-validation-subsystem" err=FromOrigin { origin: "candidate-validation", source: Generated(Context("Signal channel is terminated and empty.")) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="statement-distribution-subsystem" err=FromOrigin { origin: "statement-distribution", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] Overseer exited with error err=Generated(SubsystemStalled("availability-store-subsystem"))
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="bitfield-signing-subsystem" err=FromOrigin { origin: "bitfield-signing", source: Generated(Context("Signal channel is terminated and empty.")) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] Essential task `overseer` failed. Shutting down service.
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] error receiving message from subsystem context: Generated(Context("Signal channel is terminated and empty.")) err=Generated(Context("Signal channel is terminated and empty."))
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] Failed to receive a message from Overseer, exiting err=Generated(Context("Signal channel is terminated and empty."))
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="network-bridge-tx-subsystem" err=FromOrigin { origin: "network-bridge", source: SubsystemError(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="dispute-distribution-subsystem" err=FromOrigin { origin: "dispute-distribution", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="chain-api-subsystem" err=FromOrigin { origin: "chain-api", source: Generated(Context("Signal channel is terminated and empty.")) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="availability-recovery-subsystem" err=FromOrigin { origin: "availability-recovery", source: Generated(Context("Signal channel is terminated and empty.")) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="approval-voting-subsystem" err=FromOrigin { origin: "approval-voting", source: Generated(Context("Signal channel is terminated and empty.")) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="dispute-coordinator-subsystem" err=FromOrigin { origin: "dispute-coordinator", source: SubsystemReceive(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] err=Subsystem(Generated(Context("Signal channel is terminated and empty.")))
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="network-bridge-rx-subsystem" err=FromOrigin { origin: "network-bridge", source: SubsystemError(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="provisioner-subsystem" err=FromOrigin { origin: "provisioner", source: OverseerExited(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="candidate-backing-subsystem" err=FromOrigin { origin: "candidate-backing", source: OverseerExited(Generated(Context("Signal channel is terminated and empty."))) }
Sep 27 07:21:42 juju-1b6dd3-0 polkadot[1442906]: 2023-09-27 07:21:42 [Relaychain] subsystem exited with error subsystem="runtime-api-subsystem" err=Generated(Context("Signal channel is terminated and empty."))
Sep 27 07:22:42 juju-1b6dd3-0 polkadot[1442906]: Error: Service(Other("Essential task failed."))

@bkchr I thought this was already fixed (this is using Polkadot v0.9.43).
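
For anyone triaging a similar log, here is a minimal sketch that pulls out the one line that differs from the rest: the overseer reporting `SubsystemStalled("availability-store-subsystem")`, after which the other subsystems exit with "Signal channel is terminated and empty." The function names `summarize_crash` and `summarize_file` are hypothetical helpers, not part of Moonbeam or Polkadot, and the regular expressions assume the log format shown above.

```python
import re

# Overseer line naming the stalled subsystem, e.g.
#   Overseer exited with error err=Generated(SubsystemStalled("availability-store-subsystem"))
STALLED_RE = re.compile(r'SubsystemStalled\("([^"]+)"\)')

# Lines for subsystems that exited with an error, e.g.
#   subsystem exited with error subsystem="chain-api-subsystem" err=...
EXITED_RE = re.compile(r'subsystem exited with error subsystem="([^"]+)"')


def summarize_crash(lines):
    """Return (stalled, exited): subsystem names reported as stalled,
    and subsystem names that exited with an error, in log order."""
    stalled, exited = [], []
    for line in lines:
        match = STALLED_RE.search(line)
        if match:
            stalled.append(match.group(1))
            continue
        match = EXITED_RE.search(line)
        if match:
            exited.append(match.group(1))
    return stalled, exited


def summarize_file(path):
    """Convenience wrapper (hypothetical): read a saved log file and summarize it."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return summarize_crash(handle)
```

Calling `summarize_file("moonbase_crash.log")` on the attached log would return the stalled subsystem names first, which is usually the place to start looking; the list of exited subsystems mostly shows the resulting cascade.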

@bkchr
bkchr commented Sep 27, 2023

Good question. Should have been? I don't remember 🙈

@crystalin
Collaborator

I reported it there, let's see:
paritytech/polkadot-sdk#1730

@jonathanudd
Author

I deployed a new node, which is now fully synced and in use, and it doesn't have this problem.
So this no longer affects our services.

I still have the old node if you want me to try something with it.

@RomarQ
Contributor

RomarQ commented Oct 11, 2024

This does not seem to be an issue anymore, closing.

RomarQ closed this as completed Oct 11, 2024