Race condition in net_plugin that can lead to unnecessary unlinkable block exception #643

arhag · 2024-08-26T22:23:07Z

The net_plugin will send a request to peer asking it to return a particular span of blocks referenced by block number. It will then send another request with a later span of block to next peer (round robin). The net_plugin waits until it receives (which includes successful unpacking of the serialized data) all of the blocks from the latest requested span before sending out the request for the next span of blocks. But it does not validate the integrity of the block data received prior to making the next request.

So if the block data was corrupted some how, e.g. missing a transaction, the net_plugin would realize this after it had already sent out the next request. The corrupted block would of course be dropped, but the block that comes after would likely be received by another peer before the net_plugin has a chance to re-request the block that was corrupted. So that block would run into a unlinkable block exception when attempting to add the block to the fork database since its necessary parent block (the corrupted block) would not yet be in the fork database. The unlinkable block exception causes net_plugin to post that block into the queue for the main thread to process along with the other remaining blocks it receives from that peer for the span of blocks requested.

Those sequence of blocks will all ultimately fail to link because the corrupted block is missing. But after a certain number of failure, net_plugin will close the connection with its "bad" peer, and then after some time, after it reconnects, it will re-request the necessary blocks and resolve the issue automatically. But it gets there after a very inefficient process.

It would be more efficient if the net_plugin was aware of the block corruption and the need to request blocks starting from the block it needs.

One simple way to achieve this is to wait until all the received blocks from one requested span of blocks have been minimally verified to have block integrity before requesting the next span. For example, ensuring the last block of the span has been successfully been added into the fork database prior to requesting the next span is one way of achieving this goal. If it encounters a corrupted block, it can disconnect from that peer immediately (ignoring the remainder of the blocks in the span that the peer is sending), and then send a request for a new span of blocks to the next peer starting with the block number of the corrupted block. This solution can be part of the work of #529 since that also changes the behavior of net_plugin to never post an unlinkable block to the queue for the main thread.

enf-ci-bot added the triage label Aug 26, 2024

heifner added 👍 lgtm and removed triage labels Aug 27, 2024

arhag added this to the Spring v1.1.0-rc1 milestone Aug 27, 2024

heifner self-assigned this Sep 26, 2024

heifner added the OCI Work exclusive to OCI team label Sep 26, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Race condition in net_plugin that can lead to unnecessary unlinkable block exception #643

Race condition in net_plugin that can lead to unnecessary unlinkable block exception #643

arhag commented Aug 26, 2024

Race condition in net_plugin that can lead to unnecessary unlinkable block exception #643

Race condition in net_plugin that can lead to unnecessary unlinkable block exception #643

Comments

arhag commented Aug 26, 2024