-
Notifications
You must be signed in to change notification settings - Fork 39
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
feat: Add GenesisHash to AttestationStatusClient and reject unexpecte…
…d updates (BFT-496) (#167) ## What ❔ - [x] Add the initial `GenesisHash` to `AttestationStatus` and change `AttestationStatusWatch::update` to take `GenesisHash` and return an error if the latest from the client indicates a change in genesis. This stops the runner and thus should stop the node itself when the error bubbles up. On external nodes this is redundant with the routine that monitors the main node API for changes. On the main node I shouldn't happen. - [x] Change `next_batch_to_attest() -> Option<BatchNumber>` into `attestation_status() -> Option<AttestationStatus>` on `AttestationStatusClient`, so it now includes the `GenesisHash` as well. - [x] `LocalAttestationStatusClient` returns the `GenesisHash` it was started with ## Why ❔ This came out of a discussion here: matter-labs/zksync-era#2544 (comment) Notably the `ConsensusDal::attestation_status()` now returns the `GenesisHash`, which I thought was only to signal through the API from the main node to the external nodes that a reorg has happened (which they would eventually learn anyway through polling the genesis endpoint, causing them to restart), and they should not attest for batches if they are on a different fork. The comment raised the issue that on the main node I discarded the `genesis` field from the response because I assumed it can only be the same as the one we started the `Executor` with, and second guessing it in the `BatchStore` would be awkward: the original genesis wasn't available to check against (it's cached in the `BlockStore`) and running a query to fetch it (even if the `PersistentBatchStore` supported it) would just compare it to what it already is. The way I handled this mismatch for external nodes was to just stop updating the status by returning `None` and wait for the restart, treating it like any other transient RPC error, it was just logged. It does make sense though to elevate it higher and be able to stop the node if it's in an inconsistent state. Now it's part of the `AttestationStatusWatch` because that is what we consider to be our interface with the node, (and not the `AttestationStatusRunner` which is a convenience construct). ### Reorgs without restart? If an inconsistency happened on the main node it wouldn't necessarily have to be fatal: if the genesis was allowed to change, and the components such as the `BlockStore` were able to handle it at all, then ostensibly the `AttestationStatus*` stuff could recover as well, for example by resetting the `BatchVotes` if they see a change in the `genesis`. The problem currently is that they might have already received and discarded votes for the new fork, which would not be gossiped again until clients are reconnected. Apparently we don't plan to handle regenesis on the fly, so `AttestationStatus::genesis` is only present to prevent any attempt at changing it by _causing_ a restart instead. ### Atomicity In the first version of this PR the `BatchStore` and `PersistentBatchStore` were also changed to have an `attestation_status()` method that returned a `GenesisHash` along with the `BatchNumber`. The only way I could make sense of this was if 1) changes in genesis while running the main node were allowed and 2) they could happen between calls to `BlockStore::genesis()` and `BatchStore::next_batch_to_attest()`, since those are not atomic methods. We discussed that this will not be supported, the `Genesis` cached in `BlockStore` will not change. Since we only start the `Executor` and the `AttestationStatusRunner` [after regenesis](https://github.com/matter-labs/zksync-era/blob/1d206c0af8f28eb00eb1498d6f2cdbb45ffef72a/core/node/consensus/src/mn.rs#L39) in `zksync-era`, we don't have to worry about this changing. In that light it would be counter intuitive to return the genesis hash from `BatchStore`. (In the future we could consider using Software Transactional Memory to have an in-memory representation of the state where we can do consistent read and writes between multiple Watch-like references. The DAL is capable of transactional reads, and it would be nice if we could combine reading for example here the genesis and the last/next batch and not have to worry about having to compare hashes, when all of these are local reads.)
- Loading branch information
Showing
6 changed files
with
152 additions
and
49 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.