Unify/fix expiration bump logic in host. #957
Conversation
- Store bumps directly in the storage. This addresses several issues with the previous bump implementation, namely rollbacks (these weren't supported at all) and consistency with storage (previously it was possible to bump an existing entry).
- Move autobump logic to host.
- Make inclusion/exclusion logic for current ledger consistent when computing expiration ledger (only include it for the new entries).
- Add `bump_contract_instance_and_code` host fn to allow bumping arbitrary contracts - useful for factory contracts.

For simplicity we assume max expiration is correct for newly-created entries (these are managed exclusively by the host, so we can trust them to not exceed the limits).
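As a rough illustration of why storing bumps directly in the storage map gives rollback behavior "for free" (a minimal sketch with made-up types, not the actual host storage code): a bump recorded as part of the entry's state is discarded together with every other change when a failed sub-invocation is rolled back.

```rust
use std::collections::HashMap;

// Hypothetical sketch: entry state (including its expiration ledger) lives in
// one map, so snapshotting/restoring that map rolls back bumps together with
// every other change made by a failed sub-invocation.

type Key = String;

#[derive(Clone)]
struct Entry {
    data: Vec<u8>,
    expiration_ledger: u32,
}

#[derive(Default, Clone)]
struct Storage {
    entries: HashMap<Key, Entry>,
}

impl Storage {
    /// Bump an existing entry's expiration (no-op if the requested value is lower).
    fn bump(&mut self, key: &Key, new_expiration: u32) {
        if let Some(entry) = self.entries.get_mut(key) {
            entry.expiration_ledger = entry.expiration_ledger.max(new_expiration);
        }
    }

    /// Run a sub-invocation and roll back *all* changes, bumps included, on failure.
    fn with_rollback<E>(
        &mut self,
        f: impl FnOnce(&mut Storage) -> Result<(), E>,
    ) -> Result<(), E> {
        let snapshot = self.clone();
        let res = f(self);
        if res.is_err() {
            *self = snapshot;
        }
        res
    }
}
```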
A few nits and questions. I think it's basically ok in approach, though I'm nervous about the "modifying read-only entries and checking for a subset of the changes later" part; somehow I convinced myself before that this approach literally can't work safely, but I'm not finding any record of how, so that's too vague an objection to block this on.
(I also thought the current behaviour wrt. rollbacks or lack thereof was intentional -- can you check with @jayz22 on that?)
We could do a parallel map of read-only bumps and store it separately in the
I don't see how the lack of rollbacks can be intentional. Performing bumps from e.g. failed contract calls doesn't really make any sense.
So... I recovered the conversation in which I talked this logic through before. I was mis-remembering: it was with @sisuresh, not @jay, and it had to do with read-write dependencies and parallelism. The thing is, if you have two threads that both do a bump on a read-only entry, they really might execute in parallel. So you have to ensure that:
I think this may have to do with why there were no rollbacks on bumps, though I'm not sure. I'll continue to try to figure that out.

I want to be clear though: it's essential that we not accidentally introduce any read-after-write data dependencies on values we're assuming to be "read-only" with this work. It's entirely possible for a mistake here to defeat the whole concurrency control protocol we have planned, serializing everything into a single thread. We need to be very careful.

The code in the PR, as written, I think violates this condition. It reads expiry times before bumping them, which means if two txs run back to back on a single thread and they both "bump a read-only entry", they will be observing their co-situation on that thread; scheduling the two of them on two separate threads would produce different results. That's not ok. We need a design that does not have that characteristic.
This is defined, but probably not in a discoverable place... Currently we charge bumps relative to the current ledger state (so if A bumps by 1000 and then B bumps by 1100, then A will be charged for a bump of 1000 and B will be charged for a bump of 100). When we have concurrent execution, everyone will be charged against the ledger snapshot taken before the tx is executed - that's the only way to get deterministic fees. Note that this logic has nothing to do with the host, really.
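A minimal sketch contrasting the two charging models described above (all names are hypothetical; this is neither the host nor the core code):

```rust
/// Charging relative to the current (already bumped) state: if A bumps to
/// snapshot + 1000 first, a later B asking for snapshot + 1100 only pays
/// for the remaining 100 ledgers.
fn charged_sequential(current_expiration: u32, requested_expiration: u32) -> u32 {
    requested_expiration.saturating_sub(current_expiration)
}

/// Charging against the pre-execution snapshot: every tx pays for the full
/// delta it requested, independently of what other txs did, which is what
/// makes the fee deterministic under concurrent execution.
fn charged_snapshot(snapshot_expiration: u32, requested_expiration: u32) -> u32 {
    requested_expiration.saturating_sub(snapshot_expiration)
}

fn main() {
    let snapshot = 100_000_u32;
    // Sequential model, A applied before B:
    assert_eq!(charged_sequential(snapshot, snapshot + 1000), 1000); // A pays 1000
    assert_eq!(charged_sequential(snapshot + 1000, snapshot + 1100), 100); // B pays 100
    // Snapshot model: B pays for its full requested delta, regardless of ordering.
    assert_eq!(charged_snapshot(snapshot, snapshot + 1100), 1100);
}
```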
The final bump is not observable of course. But every host instance can (and should) observe the bump it has performed within the invocation.
I'm not sure what you mean here. All the host instances are independent, aren't they? Nothing in the current design suggests running several host functions in parallel. Every invocation needs to have a separate host with separate storage and budget.
Rollbacks are about the logic of the current invocation. They should behave in an intuitive fashion, and that behavior shouldn't depend on the environment the host runs in, concurrency, etc. From the contract perspective it is expected that
I think this comes back to the point above - it is surprising to me that you're talking about running several operations on the same host instance. That's not possible with the current design for many more reasons than just rent bumps. If this ever becomes a design goal, we'll need to change a lot about how the host is implemented. This PR (and the follow-ups) works under the assumption that every invocation is isolated within a host instance. That's what the current design definitely allows for. It's up to the embedder to provide the proper storage state that doesn't lead to non-determinism.
I also want to emphasize once more that this PR doesn't change the observable host behavior compared to the current implementation. It only addresses a few bugs (or at least non-intuitive behavior, depending on what you call it). Currently it's core that does the autobumps and aggregates the duplicate bumps, following exactly the same logic as has been implemented here.
Another thought: we definitely don't enforce any tx sequencing in any way in general. So it's possible that tx 1 modifies an entry, then tx 2 reads it, etc. Clearly, we will need a protocol change to introduce concurrent execution. It will also take care of bumps, but I really think these are actually much simpler than everything else - for bumps you just need to be sure to provide the initial expiration ledger of all the RO entries. RW-RO entry coordination, on the other hand, would need some complex tx ordering changes.
No, this PR changes semantics. It's very important that we get clear on expectations here because this can break concurrency.

The assumed/intended model is that a txset S will get broken into N logical partitions P1..PN and each logical partition has sequential semantics: if two transactions T1 and T2 in P1 both have key K in their RW footprint then T2 will observe T1's write to K. The logical partitions are driven by the footprints and encapsulate all such observable sequential data dependency, such that we may choose to schedule P1 and P2 on a single thread, or on two separate threads, and the transactions in either partition cannot observe that scheduling decision. Creating observable read-after-write dependencies that span partitions (as a write to a RO footprint entry would do) breaks this model.

This has nothing to do with the lifecycle of a host. One transaction will still get one host instance. But changes to storage written by T1 are visible to T2. That is why there are RW entries in the footprint at all: to group together all the transactions with overlapping RW dependencies into a partition to give them a defined serialization order.
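To make the partitioning idea concrete, here is an illustrative sketch (hypothetical types, not stellar-core's actual scheduler) of grouping a txset by overlapping RW footprints:

```rust
use std::collections::HashMap;

type TxId = usize;

struct Tx {
    id: TxId,
    // Keys this tx declares as read-write; read-only keys never force a merge.
    rw_footprint: Vec<&'static str>,
}

/// Union-find `find` with path compression over the partition forest.
fn find(parent: &mut Vec<usize>, i: usize) -> usize {
    let p = parent[i];
    if p != i {
        let root = find(parent, p);
        parent[i] = root;
    }
    parent[i]
}

/// Group transactions into partitions: any two txs sharing an RW key end up
/// in the same partition (directly or transitively), so they execute with a
/// defined serialization order relative to each other.
fn partition(txs: &[Tx]) -> Vec<Vec<TxId>> {
    let mut parent: Vec<usize> = (0..txs.len()).collect();
    let mut first_user: HashMap<&'static str, usize> = HashMap::new();
    for (i, tx) in txs.iter().enumerate() {
        for key in &tx.rw_footprint {
            if let Some(j) = first_user.get(key).copied() {
                // Merge with the partition of the first tx that declared this RW key.
                let ri = find(&mut parent, i);
                let rj = find(&mut parent, j);
                parent[ri] = rj;
            } else {
                first_user.insert(*key, i);
            }
        }
    }
    let mut groups: HashMap<usize, Vec<TxId>> = HashMap::new();
    for (i, tx) in txs.iter().enumerate() {
        groups.entry(find(&mut parent, i)).or_default().push(tx.id);
    }
    groups.into_values().collect()
}
```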
We are not going to make up a concurrency control protocol in a future protocol change. The system design already has a concurrency control protocol. It's why there are footprints. If you're not aware of how it works, that's fine and I'm happy to keep discussing it until it's clear, but there's definitely already a design here.
There are no observable dependencies here (or, to be more precise, preventing them has nothing to do with the host). The scheduler will need to make sure to provide the host with the proper data snapshot, such that the RO entry passed to P1 is the same as the RO entry passed to P2. This is true for both the current implementation and the updated one; it's just that the code was scattered around before.

In any case, core will need to reconcile the bumps to pick the max one, but that will only happen after all transactions are applied. And that logic has nothing to do with the host logic. The host is just a black box that takes a ledger snapshot and produces some delta to it (which is represented in an explicit fashion in the follow-up PR). I don't think we should really care about what happens within the host itself - the host storage is not a direct abstraction of the ledger and it's not meant to be blindly written to the ledger.
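An illustrative sketch of that "black box" shape (hypothetical types, not the actual host API): each invocation emits a delta against its snapshot, and a separate reconciliation step picks the maximum requested expiration per key after all transactions have been applied.

```rust
use std::collections::HashMap;

type Key = String;

#[derive(Default)]
struct InvocationDelta {
    /// Written/updated RW entries (payloads elided for brevity).
    rw_writes: HashMap<Key, Vec<u8>>,
    /// Requested expiration ledger for every entry the invocation bumped.
    bump_requests: HashMap<Key, u32>,
}

/// Core-side reconciliation: the winning expiration for each key is the
/// largest one requested across all deltas, never below the snapshot value.
fn reconcile_bumps(
    snapshot_expirations: &HashMap<Key, u32>,
    deltas: &[InvocationDelta],
) -> HashMap<Key, u32> {
    let mut result = snapshot_expirations.clone();
    for delta in deltas {
        for (key, &requested) in &delta.bump_requests {
            let current = result.entry(key.clone()).or_insert(requested);
            *current = (*current).max(requested);
        }
    }
    result
}
```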
Hm, ok, I .. can imagine what you're saying might work, apologies for overstating my case; but I am still a little concerned about the details. I think there are .. maybe a few separate issues we need to work through on this subject, not all of which are directly caused-by or solvable-by this PR. I guess I just want to make sure this PR is not itself painting us further into a corner, and also make sure we're all on the same page about the corner we're currently already painted into with bumps, and maybe try to formulate a plan to fix it rather than making it worse:
Yes, this is what I referred to when I said that I don't change anything functionally. The implementation is also temporary, as we don't do RO bumps properly at all. This change doesn't alter that, however - core will still get the bumps in exactly the same fashion as before.
I don't quite agree here. There is nothing wrong with these semantics from the contract standpoint. Bumps shouldn't be in any way special to the user. While cross-tx bumps indeed shouldn't be observable until the whole tx set is processed, from the tx standpoint they definitely do have to be observable, as otherwise we won't be able to come up with proper fees and meta. So every tx must be executed 'as if' it was the only tx that does bumps. Given that requirement, it doesn't really matter what we do in the host. The only thing we need to do to ensure parallelism is to provide the proper ledger snapshot to the host. Otherwise, I might make the same statement about RW entries - if you try to execute two host instances that touch the same RW value in parallel, then you'll get inconsistent results. It shouldn't really be the host's business to make these decisions.
Also, the current expiration time is only necessary for the autobumps. Here we look at the max expiration, which is a function of the ledger sequence. If you're concerned about autobumps, we might as well get rid of them (though that would need a broader discussion). But again, given the proper snapshot, there is no issue here either: let's say tx A and tx B see an entry at expiration N and autobump it to N+10. Both pay for a 10-ledger bump, then core reconciles this into an N+10 expiration.
"Observable" here can mean two things: one is "you can observe that a bump was issued that wants a delta of N" (which isn't a problem or even possible to prevent -- you can always just emit an event saying that you did it), the other is "you can observe the post-state of that delta, that current expiry is K" which I'm not ok with.
But the whole point of entries being marked RW is that all txs sharing a given RW dependency get grouped into a sequentially-executed partition.
I think we're on the same page here, and this change doesn't cause any observability changes. The output of the function call is 'this invocation wants to bump the expiration of entry K from N to N + delta'. It doesn't imply that the bump should be applied immediately.
Sure, but what prevents us from defining the semantics for bumps (which would be 'RO bumps shouldn't immediately modify the ledger state')? This really has nothing to do with the host logic in the end. This implementation is just more convenient and less error-prone than the alternatives. We could have a separate map with bumps only, but it would need to care about the storage state in the same fashion. If you feel like that's somehow safer, then we can go for that as well, but really it doesn't change the final output and doesn't fix the issue of core not providing the proper snapshot for RO entries (which is being addressed separately).
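For comparison, a sketch of the "separate bump map" alternative (hypothetical names, not code from this PR): it produces the same output, and it still has to track deletions and rollbacks just like the storage-resident approach.

```rust
use std::collections::HashMap;

type Key = String;

// A parallel map of bump requests kept outside the storage map.
#[derive(Default, Clone)]
struct BumpMap {
    requested_expiration: HashMap<Key, u32>,
}

impl BumpMap {
    /// Record a bump request, keeping only the largest expiration per key.
    fn record(&mut self, key: Key, expiration_ledger: u32) {
        let e = self
            .requested_expiration
            .entry(key)
            .or_insert(expiration_ledger);
        *e = (*e).max(expiration_ledger);
    }

    /// Keep the map consistent with storage: forget bumps for entries that
    /// were deleted by the same invocation.
    fn on_delete(&mut self, key: &Key) {
        self.requested_expiration.remove(key);
    }
}

// Rolling back a failed sub-invocation can be modeled by cloning the map at
// the call boundary and restoring the clone on failure - i.e. the map has to
// participate in rollbacks exactly like the rest of the storage state.
```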
Discussed in person, we'll revisit the issues around bump concurrency semantics later.
What
Add `bump_contract_instance_and_code` host fn to allow bumping arbitrary contracts - useful for factory contracts.

Why
Bugfixes and closing feature gaps.
Known limitations
N/A