port notion
kelemeno committed Oct 26, 2024
1 parent 9e437bf commit eabacb5
Showing 33 changed files with 6,855 additions and 1 deletion.
Binary file added docs/custom_da_support/custom_da.png
49 changes: 49 additions & 0 deletions docs/custom_da_support/custom_da_support.md
@@ -0,0 +1,49 @@
# Custom DA support


## Intro

We introduced modularity into our contracts to support multiple DA layers, to make it easier to support Validium and Rollup modes, and to enable settlement via the Gateway.

![custom_da.png](./custom_da.png)

### Background

**Pubdata** - the information published by a ZK Chain that can be used to reconstruct its state. It consists of L2→L1 logs, L2→L1 messages, contract bytecodes, and compressed state diffs.

```rust
struct PubdataInput {
pub(crate) user_logs: Vec<L1MessengerL2ToL1Log>,
pub(crate) l2_to_l1_messages: Vec<Vec<u8>>,
pub(crate) published_bytecodes: Vec<Vec<u8>>,
pub(crate) state_diffs: Vec<StateDiffRecord>,
}
```

The current version of ZK Chains supports the following data availability (DA) modes:

- `Calldata` - uses Ethereum tx calldata as pubdata storage
- `Blobs` - uses Ethereum (EIP-4844) blobs as pubdata storage
- `No DA Validium` - posting pubdata is not enforced
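
Purely as an illustration, the three modes above could be captured as a small configuration enum. This is a sketch; the name `PubdataMode` and its variants are assumptions made for this example, not the actual contract types.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical representation of the DA modes listed above.
enum PubdataMode {
    Calldata, // pubdata posted as Ethereum tx calldata
    Blobs, // pubdata posted as EIP-4844 blobs
    NoDAValidium // posting pubdata is not enforced
}
```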

The goal is to create a general-purpose solution that ensures DA consistency and verifiability, on top of which we can build what many partners have requested and what covers many use cases, such as on-chain games and DEXes: **Validium with Abstract DA.**

This means that a separate solution like AvailDA, EigenDA, Celestia, etc. would be used to store the pubdata. The idea is that every such solution (a `DA layer`) provides a proof of inclusion of our pubdata in its storage, and this proof can later be verified on Ethereum. This results in an approach that has stronger security guarantees than `No DA Validium`, but lower fees than `Blobs` (assuming that Ethereum usage grows and blobs become more expensive).

## Proposed solution

The proposed solution is to introduce an abstract third-party DA layer that the sequencer publishes the data to. When a batch is sealed, the hashes of the data related to that batch are made available on L1. Then, after the DA layer has confirmed that its state is synchronized, the sequencer calls the `commitBatches` function with the proofs required to verify DA inclusion on L1.

### Challenges

On the protocol level, the complexity lies in introducing two new components: the L1 and L2 DA verifiers. They are required to ensure the verifiable delivery of the DA inclusion proofs to L1 and the subsequent verification of these proofs.

The L2 verifier would validate the pubdata correctness and compute a final DA commitment called `outputHash`. It consists of the hashes of `L2→L1 logs and messages`, `bytecodes`, and `compressed state diffs` (blob hashes in the case of blobs). This contract has to be deployed by the chain operator, and it has to be tied to the DA layer's logic, e.g. if the DA layer accepts 256kb blobs, then at the final hash computation stage the pubdata has to be packed into chunks of <256kb, and either the hashes of all blobs or a rolling hash has to be part of the `outputHash` preimage.
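
To make the chunking requirement above concrete, here is a minimal sketch of one possible rolling-hash construction over DA chunks. The function name and the exact preimage layout are assumptions for this example; the real layout is defined by the chain's L2 DA validator.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical helper: folds the hashes of <256kb DA chunks into a single
// rolling hash that could form part of the `outputHash` preimage.
function rollingChunkHash(bytes[] memory chunks) pure returns (bytes32 rolling) {
    for (uint256 i = 0; i < chunks.length; i++) {
        rolling = keccak256(abi.encodePacked(rolling, keccak256(chunks[i])));
    }
}
```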

The `outputHash` will be sent to L1 as an L2→L1 log, so this process is part of the bootloader execution and can be trusted.

The hashes of data chunks alongside the inclusion proofs have to be provided in the calldata of the L1 diamond proxy’s `commitBatches` function.

L1 contracts have to recalculate the `outputHash` and make sure it matches the one from the logs, after which the abstract DA verification contract is called. In general terms, it would accept the set of chunk hashes (a chunk here means a DA blob, not to be confused with a 4844 blob) and a set of inclusion proofs, which should be enough to verify that the preimage (the chunk data) is included in the DA layer. This verification would be done by a specific contract, e.g. an `Attestation Bridge`, which holds the state tree information and can perform verification against it.
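
As an illustration of this abstract verification step, an L1 DA verification entry point could look like the sketch below. The interface name, function name, and parameters are assumptions made for this example and do not describe the actual contracts.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical interface for the abstract DA verification contract described above.
interface IL1DAVerifier {
    /// @notice Verifies that every DA chunk backing a batch's `outputHash`
    ///         is available in the external DA layer.
    /// @param chunkHashes hashes of the DA chunks (DA blobs, not 4844 blobs)
    /// @param inclusionProofs DA-layer-specific inclusion proofs, one per chunk
    function verifyChunks(
        bytes32[] calldata chunkHashes,
        bytes[] calldata inclusionProofs
    ) external view returns (bool success);
}
```
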
Binary file removed docs/gateway/Hyperchain-scheme.png
Binary file removed docs/gateway/L1-GM-Chain.png
Binary file removed docs/gateway/L1-L2.png
Binary file removed docs/gateway/chain-asset-id-registration.png
Binary file removed docs/gateway/chain-migration.png
Binary file added docs/gateway/gateway-images/create_new_chain.png
Binary file added docs/gateway/gateway-images/image.png
Binary file added docs/gateway/gateway-images/l1_l2_messaging.png
Binary file added docs/gateway/gateway-images/l1_l3_messaging.png
Binary file added docs/gateway/gateway-images/migrate_to_gw.png
Binary file added docs/gateway/gateway-images/withdraw_from_gw.png
347 changes: 347 additions & 0 deletions docs/gateway/gateway.md

Large diffs are not rendered by default.

@@ -0,0 +1,180 @@
# Gateway protocol versioning and upgradability

One of the hardest parts about the gateway (GW) is how to synchronize the interaction between the L1 and L2 parts, which can potentially have different versions of contracts. This synchronization should be compatible with any future STM that may be present on the gateway.

Here we describe various scenarios of standard/emergency upgrades and how those will play out in the gateway setup.

# General idea

We do not enshrine any particular approach on the protocol level of the GW. The following is the approach used by the standard Era STM, which also manages GW.

Upgrades will be split into two parts:

- “Inactive chain upgrades” ⇒ intended to update the contract code only and either not touch the state or touch it only minimally. The main motivation is to be able to upgrade the L1 contracts without, e.g., adding new upgrade transactions.
- “Active chain upgrades” ⇒ same as the ones we have today: a full-on upgrade that also updates the bootloader, inserts the system upgrade transaction, and so on.

In other words:

`active upgrade = inactive upgrade + bootloader changes + setting upgrade tx`

The other difference is that while “active chain upgrades” usually need to be forced to ensure that the contracts/protocol are up to date, “inactive chain upgrades” typically only involve changes in the facets’ bytecode and are only needed before a migration completes, to ensure that the contracts are compatible.

To reduce boilerplate and make managing the upgrades easier, the abstraction will basically be implemented at the upgrade-implementation level, which will check `if block.chainid == s.settlementLayer { ... perform active upgrade stuff } else { ... perform inactive upgrade stuff, typically nothing }`, as sketched below.
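
A minimal sketch of that branch inside an upgrade implementation is shown below. The contract, storage, and helper names are illustrative assumptions, not the actual upgrade contracts.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical upgrade implementation illustrating the active/inactive branch.
contract ExampleUpgrade {
    struct UpgradeStorage {
        uint256 settlementLayer; // chain id of the settlement layer of this instance
    }

    UpgradeStorage internal s;

    function upgrade() external {
        if (block.chainid == s.settlementLayer) {
            // Active instance: update facets and the bootloader, set the system upgrade tx.
            _performActiveUpgrade();
        } else {
            // Inactive instance: only the contract code changes; typically nothing else to do.
            _performInactiveUpgrade();
        }
    }

    function _performActiveUpgrade() internal {
        // facet/bootloader updates and the forced upgrade transaction would go here
    }

    function _performInactiveUpgrade() internal {
        // usually a no-op beyond the code swap itself
    }
}
```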

# Lifecycle of a chain

While the chain settles on L1 only, it will just do “active chain upgrades”. Everything is the same as now.

When a chain starts its migration to a new settlement layer (regardless of whether it is gateway or not):

1. It will be checked that the `protocolVersion` is the latest one in the STM on the current settlement layer (just so we do not have to bother with backwards compatibility).
2. The `s.settlementLayer` will be set for the chain. Now the chain becomes inactive and can only take “inactive” upgrades.
3. When the migration finishes, it will be double-checked that the `protocolVersion` is the same as the one in the target chain’s STM.

   If the chain has already been deployed there, it will be checked that the `protocolVersion` of the deployed contracts there is the same as the one of the chain that is being moved.
4. All “inactive” instances of a chain can receive “inactive” upgrades of the chain. The single “active” instance of the chain (the one on the settlement layer) can receive only active upgrades.

In case step (3) fails (or the chain fails for any other reason), a migration recovery process should be available (the `L1AssetRouter.bridgeRecoverFailedTransfer` method). Recovering a chain basically just means changing its `settlementLayerId` back to the current `block.chainid`. It will be double-checked that the chain has not conducted any inactive upgrades in the meantime, i.e. that the `protocolVersion` of the chain is the same as it was when the chain started its migration.

In case we ever need to do more than simply reset `settlementLayerId` for a chain after a failed migration, it is the responsibility of the STM to ensure that the logic is compatible across all versions.

# Stuck state for L1→GW migration

The only unrecoverable state that a chain can achieve is:

- It tries to migrate and it fails.
- While the migration was happening, an “inactive” upgrade was conducted.
- Now recovery of the chain is not possible as the “protocol version” check will fail.

This is considered to be a rare event, but it will be strongly recommended that the migration transaction be finalized before conducting any inactive upgrades. (TODO: we could actively enforce this, but it is a separate feature, i.e. requiring confirmation of a successful migration before any upgrades on a migrated chain can be done.)

# Safety guards for GW→L1 migrations

Migrations from GW to L1 do not have any chain recovery mechanism, i.e. if step (3) from above fails for some reason (e.g. a new protocol version id is available on the STM), then the chain is basically lost.

### Protocol version safety guards

- Before a new protocol version is released, all migrations will be paused, i.e. the `pauseMigration` function will be called by the owner of the Bridgehub on both L1 and L2. This should prevent migrations from happening during the risky period when the new version is published to the STM.
- Assuming that no new protocol versions are published to the STM during the migration, the migration must succeed, since the STMs on GW and on L1 will have the same version and so the checks will work fine.
- The finalization of any chain withdrawal is permissionless, so in the short term the team could help finalize outstanding migrations to prevent loss of funds.



> The approach above is somewhat tricky, as it requires careful coordination with governance to ensure that at the time when the new protocol version is published to the STM, there are no outstanding migrations.
> In the future we will either make it more robust or add a recovery mechanism for failed GW → L1 migrations.
### Batch number safety guards

Another potential issue that may leave a chain unable to migrate back to L1 is a very high number of outstanding batches, which can make the migration cost too much gas and become non-executable on L1.

To prevent that, chains migrating from GW are required to have all their batches executed. This ensures that the number of batch hashes to be copied to L1 is constant (i.e. just the last batch); a small sketch of such a guard follows below.
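
A minimal sketch of this pre-migration guard, with assumed storage field names (this is not the actual contract code):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical pre-migration guard; field names are illustrative only.
contract MigrationGuardExample {
    uint256 internal totalBatchesCommitted;
    uint256 internal totalBatchesExecuted;

    function _checkReadyToMigrateToL1() internal view {
        // All batches must be executed so that only the last batch hash
        // has to be copied back to L1.
        require(
            totalBatchesExecuted == totalBatchesCommitted,
            "execute all batches before migrating back to L1"
        );
    }
}
```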

# Motivation

The job of this proposal is to reduce to a minimum the number of potential states in which the system can find itself. The cases that are removed:

- The need to be able to migrate to a chain that has contracts from a different protocol version.
- The need for the STM to support migration of chains with different versions. Only `bridgeRecoverFailedTransfer` has to be supported for all versions, but its logic is trivial.

The reason why we cannot conduct “active” upgrades everywhere on both the L1 and L2 parts is that for the settlement layer we need to write the new protocol upgrade tx while NOT allowing it to be overridden. On the other hand, for the “inactive” chain contracts, we need to ignore the upgrade transaction.

# Forcing “active chain upgrade”

For L1-based chains, forcing those upgrades will work exactly the same as before: during `commitBatches`, the STM double-checks that the protocol version is up to date, as sketched below.
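
A small sketch of that check, with assumed names (this is not the actual STM code):

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical check performed when a chain commits batches.
contract CommitGuardExample {
    uint256 public stmProtocolVersion;
    mapping(uint256 => uint256) public chainProtocolVersion; // chainId => version

    function _checkUpToDate(uint256 chainId) internal view {
        // A chain whose contracts lag behind the STM's protocol version
        // cannot commit new batches until it takes the upgrade.
        require(
            chainProtocolVersion[chainId] == stmProtocolVersion,
            "protocol version is outdated"
        );
    }
}
```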

The admin of the STM (GW) will call the STM (GW) with the new protocol version’s data. This transaction should not fail, but even if it does fail, we should be able to just retry. For now, the GW operator will be trusted not to be malicious.

### Case of malicious Gateway operator

In the future, a malicious Gateway operator may try to exploit a known vulnerability in an STM.

The recommended approach here is the following:

- The admin of the STM (GW) will first commit to the upgrade (for example, by preemptively freezing all the chains).
- Once the chains are frozen, it can use L1→L2 communication to pass the new protocol upgrade to the STM.

> The approach above basically states that “if the operator is censoring, we’ll be able to use the standard censorship-resistance mechanism of a chain to bypass it”. The freezing part is just a way to not tell the world about the issue before all chains are safe from exploits.
> It is the responsibility of the STM to ensure that all the supported settlement layers are trusted enough to uphold the above protocol. Using any sort of Validium will be especially discouraged, since in theory those could get frozen forever without any true censorship-resistance mechanism.
>
> Also, note that the freezing period should be long enough to ensure that censorship-resistance mechanisms have enough time to kick in.
# Forcing “inactive chain upgrade”

Okay, imagine that there is a bug in an L1 implementation of a chain that has migrated to Gateway. This is a rather rare event, as most of the action, together with the ability to steal most of the funds, happens on the settlement layer.

In case such situation does happen however, the current plan is just to:

- Freeze the ecosystem.
- Ask the admins nicely to upgrade their implementation. Decentralized token governance can also force-upgrade those via STM on L1.

# Backwards compatibility

With this proposal, the protocol version on the L1 part and on the settlement layer part is allowed to be completely out of sync. This means that all new mailboxes need to support both accepting and sending all versions of relayed (L1 → GW → L2) transactions.

For now, this is considered okay. In the future, some stricter versioning could apply.

## Notes

### Regular chain migration - moving chain X from Y to Z (where Y is Z’s settlement layer)

So assume that Y is L1, and Z is ‘Gateway’.

Definition:

`Hyperchain(X)` - a.k.a. the ST / DiamondProxy for a given chain id X

`STM(X)` - the State transition manager for a given chain id X

1. Check that `Hyperchain(X).protocol_version == STM(X).protocol_version` on chain Y.
2. Start the ‘burn’ process (on chain Y):
   1. collect the `payload` (including the `protocol_version`) from `Hyperchain(X)` and `STM(X)` on chain Y.
   2. set `Hyperchain(X).settlement_layer` to `address(Hyperchain(Z))` on chain Y.
3. Start the ‘mint’ process (on chain Z):
   1. check that `STM(X).protocol_version == payload.protocol_version`.
   2. create a new `Hyperchain(X)` on chain Z and register it in the local bridgehub & STM.
   3. pass the `payload` to `Hyperchain(X)` and `STM(X)` to initialize the state.
4. If the ‘mint’ fails - recover (on chain Y):
   1. check that `Hyperchain(X).protocol_version == payload.protocol_version` (important: here we’re actually looking at the ‘Hyperchain’ protocol version and not necessarily the STM protocol version).
   2. set `Hyperchain(X).settlement_layer` to `0` on chain Y.
   3. pass the `payload` to `IHyperchain(X)` and `STM(X)` to initialize the state.
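
The flow above can be summarized in the following simplified sketch. All contract, function, and field names (`startBurn`, `finishMint`, `recover`, the `Payload` layout) are assumptions made for this illustration, not the actual interfaces.

```solidity
// SPDX-License-Identifier: MIT
pragma solidity ^0.8.0;

// Hypothetical, heavily simplified view of the burn / mint / recover checks above.
struct Payload {
    uint256 chainId;
    uint256 protocolVersion;
    bytes state; // opaque chain state collected on the source chain
}

contract MigrationSketch {
    mapping(uint256 => uint256) public hyperchainProtocolVersion; // Hyperchain(X) version
    mapping(uint256 => uint256) public stmProtocolVersion; // STM(X) version
    mapping(uint256 => uint256) public settlementLayer; // chain id the chain settles on

    // Steps 1-2 (on chain Y): check versions, collect the payload, point the chain at Z.
    function startBurn(uint256 chainId, uint256 targetChainId) external {
        require(
            hyperchainProtocolVersion[chainId] == stmProtocolVersion[chainId],
            "chain must be on the latest protocol version"
        );
        settlementLayer[chainId] = targetChainId;
        // ... collect the payload and send it to chain Z ...
    }

    // Step 3 (on chain Z): verify the version, register Hyperchain(X), initialize its state.
    function finishMint(Payload calldata payload) external {
        require(
            stmProtocolVersion[payload.chainId] == payload.protocolVersion,
            "destination STM is on a different protocol version"
        );
        // ... create Hyperchain(X) on chain Z and initialize it from payload.state ...
    }

    // Step 4 (on chain Y): if the mint failed, roll the chain back.
    function recover(Payload calldata payload) external {
        require(
            hyperchainProtocolVersion[payload.chainId] == payload.protocolVersion,
            "chain upgraded during migration; recovery not possible"
        );
        settlementLayer[payload.chainId] = 0;
        // ... re-initialize Hyperchain(X) and STM(X) from payload.state ...
    }
}
```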

### ‘Reverse’ chain migration - moving chain X ‘back’ from Z to Y.

(moving back from gateway to L1).

1. Same as above (check the protocol version - but on chain Z).
2. Same as above (start the ‘burn’ process - but on chain Z):
   1. same as above
   2. TODO: should we ‘remove’ the IHyperchain from Z completely? (the ‘parent’ chain Y doesn’t really have an address on Z)
3. Same as above (start the ‘mint’ process - but on chain Y):
   1. same as above
   2. creation is probably not needed - the contract was already there in the first place.
   3. same as above - but the state is ‘re-initialized’
4. Same as above (recover, if needed) - but on chain ‘Z’

### What can go wrong:

**Check 1 - protocol version**

- the chain is on an older protocol version before the migration starts
  - resolution: don’t allow the migration; tell the chain to upgrade itself first.

**Check 3a — protocol version on the destination chain**

- the destination chain’s STM is on an OLDER version than the payload
  - resolution: fail the transfer - it seems the STMs were not upgraded.
- the destination chain’s STM is on a NEWER version than the payload
  - for simplicity, we could fail the transfer here too.

**Check 4a — protocol version on the source chain in case of transfer failure**

- the source IHyperchain is on an ‘older’ protocol version than the payload
  - in theory this is impossible, as it would mean that the IHyperchain protocol version was ‘reverted’.
- the source IHyperchain is on a ‘newer’ protocol version than the payload
  - this is the **main** worst-case scenario, as it means that the IHyperchain was updated (via an ‘inactive’ upgrade) while the protocol transfer was ongoing.
  - this is the ‘stuck state’ case described in the paragraph above.