-
Notifications
You must be signed in to change notification settings - Fork 683
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
approval-voting: coalesce multiple approvals in a single signed message #701
Comments
We've discussed lengthening tranche time with a similar lengthening of the no-show time, which likely makes the timing here less tight. Also, if we do not do that then we could've some extra information in the assignment VRFs which says when we expect to start work, and acts like delaying receiving some of the assignments within the VRF, so giving them a later no-show time. |
Even waiting half a second at a time to send out approval votes shouldn't impact finality much. Waiting a bit longer, i.e. up to 1/2 of 1 GRANDPA round, would probably help reduce message volume a lot as well. no-show times aren't the only factor here. we obviously don't want to impose delays that cause nodes to be mistakenly seen as no-shows, but we also don't want to delay finality any longer than necessary. I suppose there's a balance here, and it's driven by the performance impact of network message volume vs. amount of bundled messages. |
That one is going to be tricky with disputes, where we assume that there is one signed message per candidate/dispute. Also in order to verify the signature, we need to provide the candidate hash for the signing payload. Currently we just pass the hash of the candidate in question for checking the signature. If we combine multiple, then suddenly we would need to know all the other candidate hashes as well. I think this is solvable by reconstructing the signing payload (all signed candidate hashes) at the receiving side of the approval distribution and store it in the local representation of the vote. Some more compact representation might be possible as well. All we need is a proof that a given Whether or not a bitfield is ideal is also a different question. If we assume only a couple of candidates to be checked by each validators (which should be the case), having a list of u16 indices might be more efficient and also more future proof:
Size: 4xu16 = 8byte Depending on what we do with the above |
Excellent point @eskimor . I actually did not consider disputes at the time of writing this :) |
It's true As an aside, if there are equivocations then you've the same assignments on all equivocations, but afaik we rerun the whole process on all equivocations, and do not fine tune messages to be applicable across equivocations. We might want to change this really, but overall this requires more nuanced refinements. not necessarily what I original wrote down, but anyways any bitfields need to identify candidates unambiguously. |
Yes, this I also believe that, but it also depends on the other overheads aside from the sig checks like internal book keeping. Coalescing would make processing approvals in parallel less efficient assuming that a work is split by candidate hash - workers would have to check signatures twice, but the other overhead should not be duplicated. |
Why twice? You're talking about a worker thread/future which processes approval (and assignment) gossip messages, and pushes them onto the tallying data structure, yes? We'd need some extra |
Importing approval-votes for time overruns and for disputes will become very messy with this. @burdges Wouldn't batch verification get us similar speedups? In case the problem is not clear, let me expand a bit: If we get a time overrun on one candidate or a dispute, then we would no longer have a single signed approval to import on chain, but one big one - e.g. for X other unrelated candidates. This will lead to either huge amounts of duplicated storage in the dispute coordinator for example and also makes chain imports X times more expensive - in case of multiple disputes/time overruns also importing lots of redundant data. Or we add some indirection to avoid the duplication everywhere, likely possible but would be a significant effort, adding complexity. |
It's orthogonal, but batch verifications bring exactly the same problem I thought @sandreim described. This is also not really true: As messages take different routes, we're not talking huge batches here, so the improved asymptotics of pippenger lag somewhat. Assuming batch verification works out like https://github.com/dalek-cryptography/ed25519-dalek then we should reach roughly a 50% savings quickly, but afterwards savings do not accrue too quickly. It's possible those benchmarks predate HdV adding pippenger, but that's ages ago, so probably not. If we can really check a signature on a
Isn't this irrelevant? We import the approval, which is a signed We should sign a Merkle root of the I'm unsure if this is priority, but yes it becomes a TODO. At 1000 cores we'd risk 32 kb extra per votes to put on chain. I suppose the batch verification looks less complex, once we count this merkle thing to prevent spaming, but right now we should expect only 50% savings from batch verification, batch verification adds much complexity too, and this Merkle-of-Vec thing should really be a standard tool for us. |
An easier fix, you cap the Vec lengths. Alternatively, you could cap the approval votes messages from each sender within a time window, not sure if this demands uncapped Vecs or the Merkle thing. This messes up our gossip assumptions, which sucks, but maybe if we used the sender's time stamp here then either their messages become somehow irrelevant, or else we merge these into a slashable equivocation in their approval votes. Also, we're not talking about backers votes here. I guess those contain more information. |
Yeah I talk about having separate threads/workers that get assigned mutually exclusive per candidate work. A mutex could work here, to avoid the work if another thread has checked the signature before, but will defeat the purpose of parallelization of work, since we could block up to the time it takes to check a signature on a different worker. If we do not use the mutex, we need to check the signature twice if 2 workers need to look at the messages because they both have candidates to work on in the same message. @eskimor you make an excellent point about duplicating storage, while yes, as @burdges suggests, capping how many approvals can we coalesce at a time would mitigate some of that. We could also add some additional constraints to how the coalescing works, such that the message itself is for candidates that will be checked by a single worker - assuming we define a function |
As for data structures, if you're worried about stalling on mutexes, then at the extreme each candidate record could contain:
As a signature checker thread, you append only to your own linked list, so they never compete for the
so the tallier only holds each mutex for one swap instruction every half second or so. It's now the talliers job to integrate each linked list into it's data structure before doing its tally. These linked lists are not mmap friendly of course, even if the talliers data structures are made mmap friendly, so this limits syncing to disk, but that's not too bad. We definitely have some complexity costs here so maybe we should double check how much this'll really save. |
I don't think the solution above can be applied to our already complicated codebase. The complexity goes really deep as we process approvals/assignments in 2 subsystems. In approval distribution we do some minimal processing to ensure deduplication of messages, avoiding working on finalized blocks as well as book keeping of individual messages via finger printing. All of this requires us to keep some state that we inspect in order to decide how we process an individual approval - for example, if we haven't seen an assignment before the approval we discard the approval (without importing it) and decrease the reputation of the peer that sent it. However, the heavy lifting is done by approval voting when approval distribution requests it to import approvals and assignments. It's important to note that approval voting reads/writes to the DB, so having multiple workers requires that they work on differenet data sets. This makes sense wrt to the in memory state that we keep about the assignments and votes - That being said, the simplest way I see forward for implementing this parallel processing is to shard the state at the DB level by candidates, having the workers operate on different candidates in approval voting and enabling approval distribution to do the processing without waiting for individual approvals/assignments to be imported by approval voting workers. I totally agree this complicates things some more, but it will be mostly plumbing work to avoid the serial processing of requests that we do now. |
Also worth reconsidering the lazy checking idea again, should be even faster and probably ends up being less complicated/invasive. Major blocker: Gossip concerns. |
By lazy you mean only checking the signature of approvals if the sender is not the signer? |
We'd need to check before forwarding the message ourselves. Abstractly I'd expect this adds complexity for little gain in the worst case (and the worst case matters)
I suspect this breaks batch verification too, indeed batch verification should be harder than checking a short Vec. Ignoring our specific subsystem divisions, we abstractly have roughly 4 phases in the pipeline:
We've diverse concerns at boundaries between these phases, like 1-2 risks spam if messages accumulate unchecked, and 2 must know the forwarder to adjust reputation. Yet, we cannot do any fancy signature optimizations if 2 does not happen independently first before 3 and 4. In particular, we cannot do batch verification for messages forwarded by multiple peers: If Alice and Eve both send us messages, then we can batch verify the messages from each on separately. If otoh we've some abstract verifier that batch verifies them both together then Eve can poison Alice's batch, and we must then reject everything before we touch reputation, so we've made our DoS problem worse, not better. Also, this abstract batch verifier has its own internal mutexes, etc. |
One thing to note is that we do have batches of assignments or approvals coming from the same peer here https://github.com/paritytech/polkadot/blob/master/node/network/approval-distribution/src/lib.rs#L568, and we are processing those right now serially by sending them one by one to the approval-voting and waiting for the result of the signature. You are saying we could batch those and check them all at once, with a possible max speedup of 50% compared with checking them individually ? |
Yes, that's true, but.. We're seemingly not verifying them there, only later in the pipeline, where likely they're separated by core, right? We'd gain less by then because mostly they'd all land on different cores. I suppose our information dependency here is the candidate hash itself being "valid", so maybe the current system first checking the candidate hashes exist and splits them by candidate, yes? I'm essentially saying this candidate hash check should only "attach" where the message goes, but not do the split until after the signature check happens. |
If we do this then we should consider what happens when processing a vote in which some candidate hashes make sense, but not others. We presumably now run politeness at the level of relay chain blocks now, no? Assuming yes then we'd require approval votes always derive from only one specific relay chain block, maybe even include its block hash. I think then little changes during the regular path, but..
|
…er approval-voting In, the current implementation every time we process an assignment or an approval that needs checking after the approval voting, we will wait till approval-voting answers the message. Given that approval-voting will execute some signatures checks that take significant time(between 400us and 1 millis) per message, that's where most of the time in the approval-distribution, see https://github.com/paritytech/polkadot/issues/6608#issuecomment-1590942235 for the numbers. So, modify approval-distribution, so that it picks another message from the queue while the approval-voting is busy doing it's work. This will have a few benefits: 1. Better pipelinening of the messages, approval-voting will always have work to do and it won't have to wait for the approval-distribution to send it a message. Additionally, some of the works of the approval-distribution will be executed in parallel with work in approval-voting instead of serially. 2. By allowing approval-distribution to process messages from it's queue while approval-voting confirms that a message is valid we give the approval-distribution the ability to build a better view about what messages other peers already know, so it won't decide to gossip messages to some of it's peers once we confirm that message as being correct. 3. It opens the door for other optimizations in approval-voting subsystem, which would still be the bottleneck. Note! I still expect the amount of work the combo of this two systems can do, to still be bounded by the numbers of signatures checks it has to do, so we would have to stack this with other optimizations we have in the queue. - https://github.com/paritytech/polkadot/issues/6608 - https://github.com/paritytech/polkadot/issues/6831 [] Evaluate impact in versi [] Cleanup code an make CI happy to make the PR meargeable. Signed-off-by: Alexandru Gheorghe <[email protected]>
…voting In, the current implementation every time we process an assignment or an approval that needs checking in the approval voting, we will wait till approval-voting answers the message. Given that approval-voting will execute some signatures checks that take significant time(between 400us and 1 millis) per message, that's where most of the time in the approval-distribution, see https://github.com/paritytech/polkadot/issues/6608#issuecomment-1590942235 for the numbers. So, modify approval-distribution, so that it picks another message from the queue while the approval-voting is busy doing it's work. This will have a few benefits: 1. Better pipelinening of the messages, approval-voting will always have work to do and it won't have to wait for the approval-distribution to send it a message. Additionally, some of the works of the approval-distribution will be executed in parallel with work in approval-voting instead of serially. 2. By allowing approval-distribution to process messages from it's queue while approval-voting confirms that a message is valid we give the approval-distribution the ability to build a better view about what messages other peers already know, so it won't decide to gossip messages to some of it's peers once we confirm that message as being correct. 3. It opens the door for other optimizations in approval-voting subsystem, which would still be the bottleneck. Note! I still expect the amount of work the combo of this two systems can do, to still be bounded by the numbers of signatures checks it has to do, so we would have to stack this with other optimizations we have in the queue. - https://github.com/paritytech/polkadot/issues/6608 - https://github.com/paritytech/polkadot/issues/6831 [] Evaluate impact in versi [] Cleanup code an make CI happy to make the PR meargeable. Signed-off-by: Alexandru Gheorghe <[email protected]>
As both this and #607 look complex, should we experiment with a faster signature anyways? I guess not right now, as doing that is not completely trivial either. |
What do you mean by faster signatures, do you mean pulling the improvements mentioned here: #732 or using a different algorithm ? Another path, that I plan to experiment with is the lazy checking(suggested by @eskimor) where we do not check the approvals at all the moment we received them and check them in the moment we have to import them on chain, but that has some other implication with the gossip algorithm, which I'm currently trying to fully understand. On the best case, we would have to not gossip at all and ask each validator to send its approval to all others. |
SummaryAlright, based on the feedback and data I've got in paritytech/polkadot#7482, paritytech/polkadot#7393 and other discussions we had inside the team, I think this optimisation is what should implement next and this together with paritytech/polkadot#6782 should gives us enough gains, so that the approvals-voting & approvals-distribution, not be the bottleneck when increasing the numbers of validators and parachains. Addressing concernsWhy not
|
Also I think these could stack. Once core groups implemented, assignments/approvals will refer to a core group index. A bitfield representing the signed core groups or a vec of core group indices (assuming u8). It will however require to get know all candidate hashes when checking the signature on node side and runtime.
We also need to implement some constraints here like maximum wait time until we sign approval and gossip such that it doesn't interfere with finality. I'd have two different configuration options: FWIW, this sounds like a good plan. |
Added an initial draft of the proposal here: paritytech/polkadot#7554, feedback welcomed. |
* Adds minimal config for snowbase. * Formatting * Removes space. * Adds runtime benchmarks for eth beacon client for snowbase * Adds generated weights for snowbase and updates template path comments in weights files. * Fix formatting.
Initial implementation for the plan discussed here: #701 Built on top of #1178 v0: paritytech/polkadot#7554, ## Overall idea When approval-voting checks a candidate and is ready to advertise the approval, defer it in a per-relay chain block until we either have MAX_APPROVAL_COALESCE_COUNT candidates to sign or a candidate has stayed MAX_APPROVALS_COALESCE_TICKS in the queue, in both cases we sign what candidates we have available. This should allow us to reduce the number of approvals messages we have to create/send/verify. The parameters are configurable, so we should find some values that balance: - Security of the network: Delaying broadcasting of an approval shouldn't but the finality at risk and to make sure that never happens we won't delay sending a vote if we are past 2/3 from the no-show time. - Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 & MAX_APPROVALS_COALESCE_TICKS =0, is what we have now and we know from the measurements we did on versi, it bottlenecks approval-distribution/approval-voting when increase significantly the number of validators and parachains - Block storage: In case of disputes we have to import this votes on chain and that increase the necessary storage with MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that disputes are not the normal way of the network functioning and we will limit MAX_APPROVAL_COALESCE_COUNT in the single digits numbers, this should be good enough. Alternatively, we could try to create a better way to store this on-chain through indirection, if that's needed. ## Other fixes: - Fixed the fact that we were sending random assignments to non-validators, that was wrong because those won't do anything with it and they won't gossip it either because they do not have a grid topology set, so we would waste the random assignments. - Added metrics to be able to debug potential no-shows and mis-processing of approvals/assignments. ## TODO: - [x] Get feedback, that this is moving in the right direction. @ordian @sandreim @eskimor @burdges, let me know what you think. - [x] More and more testing. - [x] Test in versi. - [x] Make MAX_APPROVAL_COALESCE_COUNT & MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration. - [x] Make sure the backwards compatibility works correctly - [x] Make sure this direction is compatible with other streams of work: #635 & #742 - [x] Final versi burn-in before merging --------- Signed-off-by: Alexandru Gheorghe <[email protected]>
Bumps [lru](https://github.com/jeromefroe/lru-rs) from 0.7.5 to 0.7.6. - [Release notes](https://github.com/jeromefroe/lru-rs/releases) - [Changelog](https://github.com/jeromefroe/lru-rs/blob/master/CHANGELOG.md) - [Commits](jeromefroe/lru-rs@0.7.5...0.7.6) --- updated-dependencies: - dependency-name: lru dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
This is aimed at reducing the amount of messages and work done per relay chain block. Currently we send one signed message for each candidate approved under a block as soon as candidate validation completes the check.
We should look at coalescing multiple approval votes for different candidates in a single message, instead of immediately sending individual messages out. This change would introduce some latency and we should calibrate the time we wait for more candidates to be approved to avoid no-shows. I expect this to work really well for tranche 0 approvals with paritytech/polkadot#6782. In the happy case, each validator would send out exactly 2 messages per relay chain block as a tranche0 checker.
We would need to define some V3 primitives for ApprovalVote such that it wraps a bitfield in which each bit represents the candidate by index in the inclusion relay chain block.
Also, IndirectSignedApprovalVote wire message needs be updated to provide the signing context for verification.
The text was updated successfully, but these errors were encountered: