This repository has been archived by the owner on Aug 2, 2022. It is now read-only.

statetrack_plugin implementation #6321

Closed · wants to merge 56 commits

Conversation

andresberrios

@andresberrios andresberrios commented Nov 15, 2018

Change Description

This PR is for reviewing the implementation of the statetrack_plugin and related code changes to chainbase.

This is an early version of the code and it still requires more testing. There are various open questions on which we wanted to request feedback from the Block.one developers.
The details on the plugin and motivation are explained in the plugin's readme file: https://github.com/mmcs85/eos/tree/master/plugins/statetrack_plugin

The associated pull request for the chainbase changes is here: EOSIO/chainbase#28

The associated receiver for the operations sent by this plugin is here: https://github.com/andresberrios/statemirror

Consensus Changes

This plugin doesn't directly introduce changes to consensus, but there might be some implications regarding transaction processing time limits that should be evaluated. The details are explained in the above-referenced plugin readme file.

API Changes

No API changes are introduced, but new hooks would be available for plugins to utilize if the chainbase pull request is merged.

Documentation Additions

Most of the documentation related to the current implementation of the plugin is detailed in the above referenced readme file.

Contributor

@wanderingbort wanderingbort left a comment


@andresberrios Thanks for taking on such an ambitious addition to nodeos

I have three high-level concerns with the signal model as presented, and I would like to hear your thoughts on them before digging in more.

  1. The Signals as spec'd do not seem to provide enough information for a downstream consumer to get a consistent view of the state at a particular point in time. For instance, if I wanted to query a few different bits of information at "the end of block A" or "after applying transaction X", there doesn't seem to be enough information produced to achieve this. The information can be used to create a replicated copy of the consensus state that follows the database pretty closely in realtime, but it lacks the contextual information needed to understand whether a create/update/delete is associated with a certain higher-level construct (like a transaction) or a low-level construct (like undoing a speculatively executed transaction).
  2. Pulling a page from dmux: if a downstream consumer wanted to "reduce" state into meaningful metadata, perhaps deriving information at state transitions, this level of signaling does not adequately represent micro forks as orphaned bits outside of consensus. For instance, if Node 1 sees transitions A -> B -> (C' -> C'-rollback ->) C -> D, where C' is a microfork, and Node 2 sees A -> B -> C -> D, it is hard to guarantee that their reduced meta state is equivalent. Put another way, this doesn't abstract away any of the complexity of consensus, and (1) implies that you cannot reconstruct a method of handling it downstream.
  3. A more meta concern is that this produces signaling outside of the atomicity of actions. This may create a false value proposition for contract authors to write more often than necessary (multiple times to the same row in a single action). While that is valid, it may present a bad incentive to place more burden on the shared, scarce resources of on-chain processing to facilitate off-chain side effects. I'm not opposed to allowing things like this, but I do want to have a discussion of the trade-offs it creates.
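To make concern 2 concrete, here is a small sketch (hypothetical op streams and a deliberately simple reducer, not the plugin's actual output format) showing that two nodes' reductions only agree if a microfork rollback emits compensating inverse operations:

```python
# Hypothetical illustration of concern 2: a reducer that folds raw
# create/update/delete ops in arrival order, with no microfork context,
# only converges across nodes if rollbacks emit inverse ops.

def reduce_ops(ops):
    """Naive reducer: net row count per table, folded in arrival order."""
    counts = {}
    for kind, table, delta in ops:
        counts[table] = counts.get(table, 0) + delta
    return counts

# Node 1 sees a microfork C' that is later rolled back.
node1 = [
    ("emplace", "accounts", +1),   # block A
    ("emplace", "accounts", +1),   # block B
    ("emplace", "accounts", +1),   # block C' (microfork)
    ("erase",   "accounts", -1),   # C' rollback streamed as inverse op
    ("emplace", "accounts", +1),   # block C
]
# Node 2 never sees C'.
node2 = [
    ("emplace", "accounts", +1),   # block A
    ("emplace", "accounts", +1),   # block B
    ("emplace", "accounts", +1),   # block C
]

assert reduce_ops(node1) == reduce_ops(node2)
```

Remove the rollback line from `node1` and the final assertion fails: without compensating inverse operations (or explicit fork context), the two nodes' reduced meta states diverge.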

Again, thank you for attempting this; it is truly ambitious and starts a very good conversation.

@mmcs85

mmcs85 commented Nov 16, 2018

Hi there @wanderingbort. Thank you for reviewing this, because we really think this project needs validation from EOSIO devs to evaluate our expectations and understand its viability.

To be honest, my idea (and I think Andrés's as well) for this plugin at this phase was only to stream the database operations to a queue without any context such as the trx/block/fork they belong to, just as the chain_plugin fetches rows on every API request without taking that context into consideration.

  1. I think that to support this, it is always possible to subscribe to other signals and construct, on the plugin side, the relation between a DB op and its trx/blocks. For example, it is possible to add an event on applied actions and (I'm assuming this, so I need your advice) group ops by action, by storing the ops that happen before an action is applied and attaching them to it. This only works on a single thread and only if sequential order is guaranteed.
    Another possible solution is to add that information to each index operation. Could that impact performance?

  2. It depends on whether your reducer function has the ability to correct the algorithm on the undo operations.
    Also, I understand that in some cases you may want to know which op is being reversed as well, and for that an op needs some kind of ID identifying it as the undo of a previous operation.

Maybe I'm wrong, but since I subscribe to the undo signals and emit their reverse operations, in theory I'm streaming fork operations and the reverse operations when they are removed/(orphaned?) as well.

Should I subscribe to events on the reversible_blocks chainbase in the controller and to fork database push_block and pop_block?
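The grouping idea from point 1 could be sketched roughly like this (hypothetical event handler names; assumes single-threaded, strictly sequential signal delivery as noted above):

```python
# Sketch of grouping DB ops by action: buffer ops as they arrive,
# then attach the buffered batch to the applied-action event that
# follows them. Only valid if signals are delivered sequentially
# on a single thread.

class OpGrouper:
    def __init__(self):
        self.pending_ops = []
        self.grouped = []          # list of (action, [ops]) pairs

    def on_db_op(self, op):
        self.pending_ops.append(op)

    def on_applied_action(self, action):
        self.grouped.append((action, self.pending_ops))
        self.pending_ops = []

g = OpGrouper()
g.on_db_op(("emplace", "accounts", "row1"))
g.on_db_op(("modify", "balances", "row2"))
g.on_applied_action("transfer")
g.on_db_op(("erase", "accounts", "row1"))
g.on_applied_action("close")

assert g.grouped[0] == ("transfer", [("emplace", "accounts", "row1"),
                                     ("modify", "balances", "row2")])
```

This keeps the correlation work out of the signal emission path; the same logic could live on the receiver side instead of in the plugin.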

  3. Indeed, the price we pay for these features is that operations can be duplicated due to:
  • Multiple changes to the same row within an action
  • Fork apply and undo operations
  • Failed transactions with exceptions

But the thing is, as long as there is a use case for tracking this information, it's OK for me.
Also, there are some mechanisms to reduce its impact:

  • Avoid using this plugin on a node that has a public API available
  • Get only previously validated and executed transactions from local nodes if possible.
  • Filter the tables you want information from, since contract developers mostly care about a small subset of contract tables.
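The table-filtering mitigation could look something like this (the whitelist shape and field names here are assumptions for illustration, not the plugin's actual options):

```python
# Sketch of filtering streamed ops to a whitelist of tables:
# only forward ops whose (code, scope, table) triple is of interest.

FILTER = {("mycontract", "mycontract", "balances")}  # assumed config

def should_emit(op):
    return (op["code"], op["scope"], op["table"]) in FILTER

ops = [
    {"code": "mycontract", "scope": "mycontract", "table": "balances"},
    {"code": "othercode",  "scope": "othercode",  "table": "stats"},
]
emitted = [op for op in ops if should_emit(op)]
assert len(emitted) == 1
```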

Also, I just realized that a new state history plugin is being developed by you guys, so I'm not sure this is the best way to approach the need to track only table state.

@andresberrios
Author

Thanks @wanderingbort for taking the time to review this. To expand on what @mmcs85 has mentioned:

It is true that we initially were looking to only stream the state DB operations, without much associated context such as the action/transaction that caused them. This is simply because we were pursuing the use cases where that would be enough: after querying the state, the client could get notified of further modifications to the received snapshot via WebSockets or whatever other mechanism the developer implements on the receiver side, thus maintaining a consistent view of the state after any necessary undos are executed.

This does not mean that there aren't more use cases this system can cover very effectively, and that's also why we wanted to get in touch with you guys and see what you consider important to support. On top of that, as @mmcs85 explained, we already had some ideas for supporting further use cases.

  1. As @mmcs85 mentioned, there are some straightforward potential ways to support more context in the operations:

    • Regarding action/transaction context: Assuming a single thread for action processing and consistent sequential order of events (both assumptions seem to be currently valid), one could use the applied action event to track the checkpoints of action start and finish, and consider all state DB operations in between to belong to that particular action. This could be done either on the plugin side or on the receiver side, avoiding putting more load on the nodeos instance for tracking this.
    • Regarding undo operations: In a way, we already had this functionality supported in a previous version of the implementation. We originally were sending the commit, squash, and undo operations as well, along with the chainbase revision they corresponded to. This gave us context in the receiver to keep track of revisions (which are just denoted by the block number) and allowed us to group "changesets" or groups of state DB operations according to what block they belonged to. We would also keep a set of reversed operations ready to be applied in case of an undo, and these reversed operations were discarded once we received a commit operation, which (as we deduced) should signal block finality by committing the specified revision, making it un-undoable. You can check this implementation here, in a previous version. We eventually discarded this implementation for the time being since we found a way to send the actual reversed operations directly from chainbase, saving a lot of extra queries on the receiver side. We did keep in mind that some context for block finality and undo operations should be reintroduced at some point.
  2. Please correct me if I'm wrong, but I think this point is covered, since your reducer implementation would most likely handle the undos gracefully without any further complexity. For instance, if you wanted to keep track of the total sum of the numbers stored in the amount column for each row in the table, you would need to do 3 things:
    1. Handle the emplace operations by adding the amount of that row to the total.
    2. Handle the erase operations by subtracting the amount of the row to be removed from the total.
    3. Handle the modify operations by subtracting the old amount and adding the new one.
    This way, your reducer is agnostic of whether there was an undo or a fork, since it will always simply maintain consistency with whatever is actually stored in the table, as the undos will emit the reversed operations required to apply them. We rely on nodeos' own ability to keep its own chainbase DBs in a consistent state when forks happen, and we just follow those DBs as the single source of truth. As @mmcs85 said:

Maybe I'm wrong, but since I subscribe to the undo signals and emit their reverse operations, in theory I'm streaming fork operations and the reverse operations when they are removed/(orphaned?) as well.

Should I subscribe to events on the reversible_blocks chainbase in the controller and to fork database push_block and pop_block?

If this behaves as we believe it does, then we shouldn't need any further checks with reversible blocks or the fork database, as all of that would be handled by nodeos and applied to the state chainbase, which we hooked into.
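The three reducer steps above can be sketched as follows (hypothetical op shape; the point is that an undo arriving as an ordinary reverse operation needs no special handling):

```python
# Sketch of the three-step reducer described above: maintain the sum
# of an "amount" column from emplace/erase/modify ops. Because undos
# are streamed as ordinary reverse operations, the same three handlers
# keep the total consistent through forks without special casing.

def apply_op(total, op):
    kind = op["kind"]
    if kind == "emplace":
        return total + op["amount"]
    if kind == "erase":
        return total - op["amount"]
    if kind == "modify":
        return total - op["old_amount"] + op["amount"]
    raise ValueError(kind)

ops = [
    {"kind": "emplace", "amount": 10},
    {"kind": "emplace", "amount": 5},
    {"kind": "modify", "old_amount": 5, "amount": 7},   # total becomes 17
    # An undo of the modify arrives as its reverse operation:
    {"kind": "modify", "old_amount": 7, "amount": 5},   # back to 15
]
total = 0
for op in ops:
    total = apply_op(total, op)
assert total == 15
```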

Besides this, in some cases the developer might find it more valuable to run a reducer based on actual applied actions and their parameters instead of state DB operations. In that case, using this plugin, they can simply use the actions DB table. Each applied action will be in this table (which has (code, scope, table) as (system, system, actions) in the current implementation), and they can apply a forward reduction step when an action is emplaced, and a reversed reduction step when actions are erased from the table, such as when there's a fork.
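A sketch of that action-level reduction (the action shape here is hypothetical, not the actual actions table schema): apply a forward step when an action row is emplaced and the corresponding reverse step when it is erased:

```python
# Sketch of reducing over the actions table instead of raw state ops:
# forward step on emplace, reverse step on erase (e.g. after a fork
# removes the action row).

def forward(state, action):
    if action["name"] == "transfer":
        state[action["to"]] = state.get(action["to"], 0) + action["qty"]
        state[action["from"]] = state.get(action["from"], 0) - action["qty"]
    return state

def reverse(state, action):
    if action["name"] == "transfer":
        state[action["to"]] -= action["qty"]
        state[action["from"]] += action["qty"]
    return state

state = {}
t = {"name": "transfer", "from": "alice", "to": "bob", "qty": 3}
state = forward(state, t)
assert state == {"bob": 3, "alice": -3}
state = reverse(state, t)   # the fork erased the action row
assert state == {"bob": 0, "alice": 0}
```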

  3. I'm glad to see you've thought so deeply about this and found this potential mis-incentive. I can only answer with my personal view, as I have no hard proof, but it seems to me that it's hard to come up with a use case where developers would want to write twice to the same DB row in the same action just to trigger two different side-effect handlers. In any case I can think of, for side-effects (unlike reducers), it would make more sense to hook into applied actions (and maybe even irreversible ones) than into state DB changes. Even though this opens the possibility of hooking things up in weird, esoteric ways, I don't see it necessarily as an incentive to do so. The fine-grained (sub-atomic) nature of the state DB operation updates is more beneficial for other things, like reducers, state DB mirroring (to support more complex queries and indexing), and frontend real-time updates.

There might be other use cases that we're not yet considering, but for most dapp development needs that we've thought of, this implementation fulfills all needs so far. It's also worth mentioning that we've had strong interest in this system from some of the biggest dapps out there, so there's definitely a demand.

@b1bart
Contributor

b1bart commented Jul 30, 2019

In the time between this PR and now, we've successfully created plumbing to allow plugins to exist out of the repo. I am going to close this PR as it is stale; it should probably exist as a sibling repo that leverages this feature. See #5026

@b1bart b1bart closed this Jul 30, 2019