Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't require contracts to be deployed for computation game #117

Closed
johannbarbie opened this issue May 29, 2019 · 34 comments
Closed

don't require contracts to be deployed for computation game #117

johannbarbie opened this issue May 29, 2019 · 34 comments
Assignees
Labels
Milestone

Comments

@johannbarbie
Copy link
Member

johannbarbie commented May 29, 2019

problem: currently the contract assume that code is deployed onchain, to be able to read the the op-code for the execution. this assumption is invalid, as explained in this diagram.

task: create a proof system giving an efficient way to provide the opcode at position of pc.

Roles

bounty gardener: @peara / 15%
bounty worker: @peara / 60%
bounty reviewer: name / 25%

@johannbarbie
Copy link
Member Author

same as #18 really, just needs to be done without excuses

@peara
Copy link
Member

peara commented Jun 1, 2019

So Enforer and Verifier are on mainnet while SpendingConditions are on plasma?
I remember issues of the approach at #18 was opcodes like PUSH32 require 2 words and cannot be submitted easily?

@peara
Copy link
Member

peara commented Jun 1, 2019

Idea copied from verification game.
Something likes this 🤔

opcode_submit_proposal1

When requesting:

  • Opcode at 0 - PUSH1: submit ((PUSH1, '01'), [H2, H10, H14])
  • Opcode at 2 - PUSH2: submit ((PUSH2, '0102'), [H4, H9, H14])
  • Opcode at 4 - PUSH32: submit ((PUSH32, '01..32'), [H6, H12, H13])
  • Opcode at 6 - ADD: submit ((ADD, RETURN), [H8, H11, H13])
  • Opcode at 7 - RETURN: submit ((RETURN, Z(0)), [H7, H11, H13])

@peara
Copy link
Member

peara commented Jun 1, 2019

Hmm, the above solution cannot account for the differences in length of opcodes and data.
Maybe we could insert pc to leaf nodes...

opcode_submit_proposal1-2

We may also remove the redundant nodes and used Sorted Merkle Tree.

@johannbarbie
Copy link
Member Author

So Enforer and Verifier are on mainnet while SpendingConditions are on plasma?

exactly. the code and data that is used for the game are provided in the mainnet transaction that initializes the game. at this point the code could be saved, but it would have 2 disadvantages:
expensive + sets a limit to the code length.

opcodes like PUSH32 require 2 words and cannot be submitted easily?

i would like to dissect the problem here. there is the proof needed for the opCode at pc, and there is data for range opcodes, like push32 and jump that also requires access to the code. i would look at these as separate problems, and try to solve them one after the other, opCode by opCode.

@peara
Copy link
Member

peara commented Jun 3, 2019

I think using the above Merkle tree is enough to solve both issues.
We also need a function that called before submitProof to supply those.
Depend on opcodes we may require 1 or 2 Merkle proofs.
But this is a lot of gases and time required.

We may record the values submitted and stub it in EVMCode.
If no values are supplied, the submitProof just got reverted.

@johannbarbie johannbarbie added this to the Song of Dawn milestone Jun 5, 2019
@peara
Copy link
Member

peara commented Jun 8, 2019

@johannbarbie
Can I take this one?
Have you estimate it?

@johannbarbie
Copy link
Member Author

@peara you are welcome to take it. you want to be gardener and worker? I scheduled a sizing meeting next wednesday, then we'll know the size

@peara
Copy link
Member

peara commented Jun 9, 2019

@johannbarbie Thanks.
I can be gardener. Will give a expected changes tomorrow.

@peara
Copy link
Member

peara commented Jun 10, 2019

opcode_submit_proposal1(1)

@peara peara self-assigned this Jun 11, 2019
@peara
Copy link
Member

peara commented Jun 11, 2019

Changes:

  • EVMCode: change to use MerkleProof of code. Support query for code (can be 1 byte or 32 bytes) at a specific pc. Return null if proof at such pc is not known.
  • Enforcer: drop codeContractAddress when register new dispute
  • Verifier: drop codeContractAddress as Enforcer. submitProof will require codes with proofs. At most 2 Merkle proofs are need. 1 for the opcode at current pc. 1 for the data as in PUSH or for another opcode as in JUMP.
  • EVMRuntime: in run, revert if encounter unknown code.
  • Fix tests: Merkelizer, Verifier, Enforcer, disputes, i.e. a lots 😄

@johannbarbie johannbarbie added the size-L ~ 13 hours label Jun 12, 2019
@pinkiebell
Copy link
Contributor

We also have to take opcodes like CODESIZE, CODECOPY into account.
That means we basically need dynamic-size proofs with a enforced limit to the tree size.
I also vote that each leaf is fixed-size WORD = 32 bytes and we should either put the array length as the first leaf or derive/save it somewhere else. In addition, the leafHash would be like h(position, data), so that we can check that this word is indeed that one we requested.

And on the plus side, we could re-use that implementation for callData and returnData.

@peara
Copy link
Member

peara commented Jun 13, 2019

@pinkiebell Thanks, I have forgotten CODECOPY.
What do you mean by each leaf is fixed-size? And array length is the size of code, in 32-bytes words, isn't it?

I think we need dynamic-size leaves? As in #117 (comment), it could simplify the checking in contract. Although producing this tree could be much more complicated on client side, as the client needs to be aware of actually opcodes being run.

If we use fixed-size leaves, then a Merkle proof as in Sparse Merkle tree should be enough. One can just submit the position at the same time.

Although I like to use dynamic-size leaves, I may go with fixed-size leaves for now. Because distinguishing opcodes from data isn't easy and can be tricked if not don't well.

@pinkiebell
Copy link
Contributor

@pinkiebell Thanks, I have forgotten CODECOPY.
What do you mean by each leaf is fixed-size? And array length is the size of code, in 32-bytes words, isn't it?

Yes, fixed size of 32 bytes for leaf, but the length should be given in bytes not number of words.
But this might be not so important now, because we seem to settle on storing the information of the actual size of the bytecode elsewhere. See leapdao/leap-node#242

I think we need dynamic-size leaves? As in #117 (comment), it could simplify the checking in contract. Although producing this tree could be much more complicated on client side, as the client needs to be aware of actually opcodes being run.

If we use fixed-size leaves, then a Merkle proof as in Sparse Merkle tree should be enough. One can just submit the position at the same time.

Although I like to use dynamic-size leaves, I may go with fixed-size leaves for now. Because distinguishing opcodes from data isn't easy and can be tricked if not don't well.

If we use dynamic-size leaves we would save 62 bytes in the best case, that would be ~4216 in gas savings at best, correct me if I'm wrong here.
I'm still open to dynamic-size leaves, but I don't see the advantage yet 🙂

@peara
Copy link
Member

peara commented Jun 13, 2019

Well the saving is more than that. Because we can do a simple compare in most case to determine the opcode and data. If we use fixed-size leaves, we need to calculate the exact position of the required opcodes or data. Not to mention the case we need to supply 3 Merkle proofs if data is spanned across 2 adjacent words.
It isn't only saving gases, it also reduce potential bugs.
Assumed that we implement the dynamic-sized tree correctly 😉

@peara
Copy link
Member

peara commented Jun 13, 2019

Just recognized that CODECOPY doesn't have a limit to the length of copied code 😟
How should we process this one?
Maybe a lot of Merkle proofs...

@pinkiebell
Copy link
Contributor

@peara That limit needs to specified and enforced TODO 😀

@pinkiebell
Copy link
Contributor

pinkiebell commented Jun 13, 2019

Unknown

Roll Roll ..Roll your proof gently down the stack... Verify Verify Verify ...the values must be checked.

@peara
Copy link
Member

peara commented Jun 14, 2019

@pinkiebell @johannbarbie
I think we can solve this by adding a step before submitProof.
Let's call it submitData, only callable by solver.
This step will supply memory, calldata, return and code.
All of them should have this format:

[
  {
    pos: uint256,
    value: bytes32
  }
]

After solver call submitData, dispute will enter DataSubmitted state with a timeout similar to a respond.
At this time, anyone can submit a Merkle proof to refute the submitted data.
For example: refute("code", pos, proofs).
If any of these proofs is correct, the dispute is resolved and solver will be slashed.
After this timeout period, solver can continue with submitProof, this time no additional data is supply, just trigger EVMRuntime.step. We may call it just proof.

In DataSubmitted period, any false data can be detected. When proof is run, any missing data will result in solver being slashed.

I think the downside of this approach is storage cost of data, which is significantly more than current approach. What do you guys think?

@pinkiebell
Copy link
Contributor

I think the downside of this approach is storage cost of data, which is significantly more than current approach. What do you guys think?

How much storage that would be? Do I understand correctly that we have to save all data required for that step?

@peara
Copy link
Member

peara commented Jun 14, 2019

Yes, I think so. For memory it is normally 2 bytes32, for code it is 2-3 bytes32. For callData and returnData, there are no limit 🤔
Maybe we should increase bondAmount to take these into account, with a mechanism to refund exceeding amount.

@pinkiebell
Copy link
Contributor

I fear that this could drastically raise gas cost 😨
@johannbarbie What's your opinion on this?

@peara
Copy link
Member

peara commented Jun 14, 2019

Store 1 word cost is equal ~15 Merkle proofs verification cost 😕
I guess I will go with Merkle proofs for now.

@peara
Copy link
Member

peara commented Jun 16, 2019

After some more thinking, I think the cost can be reduce to 1 word in storage. It should be similar to how Ethereum storage works and could be used as EVMStateRoot.

For example, we could create a Sparse Merkle Tree from spendcond's codes.
Each part (1 word) of codes would correspond to a leaf:

leaves[keccak256(["code", position, value])] = value = value

When submitting proofs, solver will submit data of the form [{pos, val},...]. Then the hash root of those value will be calculated as above and checked with EVMStateRoot.

In this scheme, we use only 1 hashing procedure for all data from ExecutionState and can support for arbitrary data length.

Current ExecutionState:

    struct ExecutionState {
        bytes data;
        bytes32[] stack;
        bytes32[] mem;
        bytes32 customEnvironmentHash;
        bytes returnData;
        uint pc;
        uint gasRemaining;
        uint stackSize;
        uint memSize;
    }

For bytes32 values such as pc and gasRemaining, we can let them have position 0 to 4 in state tree:

leaves[0] = pc
leaves[1] = gasRemaining
leaves[2] = stackSize
leaves[3] = memSize
leaves[4] = customEnvironmentHash

For others such as stack, memory and data, each word's position will be determined by:

leaves[keccak256('data_type', position, value)] = value

The main cost of this scheme should be calculating Sparse Merkle Tree. This should be similar or lower than current procedure of calculating trees for different type of data.

@johannbarbie @pinkiebell What do you guys think?

Notes:

  • We can put codeLength to this tree as well

@johannbarbie
Copy link
Member Author

one tree to rule them all 🤔
we have code for sparse merkle tree with 160 bits key length here already: https://github.com/leapdao/leap-contracts/blob/master/test/sparseMerkleTree.js

extending this to 256 key length wouldn't be hard.

2 things i don't understand yet:

  • every proof would need to run 256 hashes, how fast would we reach block gas limit?
  • what exactly is the advantage of the sparse tree over the single merkle trees?

@johannbarbie
Copy link
Member Author

how is the storage related to this proof process, btw? 🤔

@pinkiebell
Copy link
Contributor

One tree for everything sounds nice at first but how much will updates/inserts to this tree construction cost? I think this would be substantially more?

@peara
Copy link
Member

peara commented Jun 17, 2019

@johannbarbie
It should be less than 256 hashes, depends on ExecutionState
Each hash costs around 40 gases (load 2 words to stack then keccak256), so I think this won't be an issue.
The advantage of sparse tree is it's convenience. We don't have an intrinsic order of all leaves to use in a normal tree.
I mentioned storage because before, I suggested store code data on storage.

@pinkiebell
I don't expect any update/insert. This tree would be constructed only at submitProof.

@pinkiebell
Copy link
Contributor

pinkiebell commented Jun 17, 2019

I don't expect any update/insert. This tree would be constructed only at submitProof.

In submitProof we have to check if all the data matches to the previousStep and after running one step in submitProof needs to update the leaves and check if the root hash matches computationPath.right.

Or does this design change anything of this procedure?

@peara
Copy link
Member

peara commented Jun 17, 2019

I see. So this would be 2 tree construction at most.

@pinkiebell
Copy link
Contributor

The idea was just that we could do

A sparse linked list for opcode at byte pos X, would work exactly the same as in EVMMemory.
You can of course skip the assembly and use 'normal' solidity code.
Example:
EVMCode > { bytePos: 4, value: 0xfa, next: <memory ptr to B> } B: { bytePos: 7, value: 0xaa, next: 0}

@peara
Copy link
Member

peara commented Jun 26, 2019

What I'm doing is https://github.com/leapdao/solEVM-enforcer/pull/137/files#diff-1aa016b76b63dd417b9159a6d65d3a32
What is the purpose of using a linked list?

@peara
Copy link
Member

peara commented Jun 26, 2019

Is it to replace dynamic array? I'm not so sure about memory in EVM. They won't be messed up? If so I will change to this

@pinkiebell
Copy link
Contributor

Is it to replace dynamic array?

Yes

I'm not so sure about memory in EVM. They won't be messed up? If so I will change to this

As long as you make sure you update free memory pointer (if you manually allocate), if you don't use manual memory management, you should be fine.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants