Approve multiple candidates with a single signature #1191

alexggh · 2023-08-27T10:43:45Z

This PR migrates paritytech/polkadot#7554; preliminary measurements and tests are discussed there.

Initial implementation of the plan discussed in #701, built on top of #1178.

Overall idea

When approval-voting checks a candidate and is ready to advertise the approval, it defers the approval in a per-relay-chain-block queue until either MAX_APPROVAL_COALESCE_COUNT candidates are ready to be signed or a candidate has stayed MAX_APPROVALS_COALESCE_TICKS in the queue. In both cases, we sign all the candidates we have available with a single signature.
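A minimal Rust sketch of the deferral logic described above, assuming a simplified tick clock and 32-byte stand-ins for the relay-chain block and candidate hashes. `Coalescer`, `on_candidate_checked` and `on_tick` are hypothetical names, not the approval-voting subsystem's actual API, and the no-show safety check is left out.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins: the real subsystem uses `CandidateHash` and
// relay-chain block hashes (32-byte hashes) and its own tick clock.
type BlockHash = [u8; 32];
type CandidateHash = [u8; 32];
type Tick = u64;

/// Approvals for one relay-chain block that are waiting to be signed together.
#[derive(Default)]
struct PendingApprovals {
    candidates: Vec<CandidateHash>,
    /// Tick at which the oldest queued candidate entered the queue.
    oldest_tick: Option<Tick>,
}

/// Sketch of per-relay-chain-block approval coalescing.
struct Coalescer {
    max_coalesce_count: usize,
    max_coalesce_ticks: Tick,
    pending: HashMap<BlockHash, PendingApprovals>,
}

impl Coalescer {
    fn new(max_coalesce_count: usize, max_coalesce_ticks: Tick) -> Self {
        Self { max_coalesce_count, max_coalesce_ticks, pending: HashMap::new() }
    }

    /// Queues a checked candidate. Returns the batch to sign once the
    /// block's queue reaches `max_coalesce_count` candidates.
    fn on_candidate_checked(
        &mut self,
        block: BlockHash,
        candidate: CandidateHash,
        now: Tick,
    ) -> Option<Vec<CandidateHash>> {
        let limit = self.max_coalesce_count;
        let entry = self.pending.entry(block).or_default();
        entry.candidates.push(candidate);
        entry.oldest_tick.get_or_insert(now);
        if entry.candidates.len() >= limit {
            self.pending.remove(&block).map(|p| p.candidates)
        } else {
            None
        }
    }

    /// Called on every tick. Returns (and removes) the batches whose oldest
    /// candidate has waited at least `max_coalesce_ticks`.
    fn on_tick(&mut self, now: Tick) -> Vec<(BlockHash, Vec<CandidateHash>)> {
        let max_wait = self.max_coalesce_ticks;
        let expired: Vec<BlockHash> = self
            .pending
            .iter()
            .filter(|(_, p)| p.oldest_tick.map_or(false, |t| now.saturating_sub(t) >= max_wait))
            .map(|(block, _)| *block)
            .collect();
        expired
            .into_iter()
            .filter_map(|block| self.pending.remove(&block).map(|p| (block, p.candidates)))
            .collect()
    }
}
```

With `Coalescer::new(1, 0)` every checked candidate is returned for signing immediately, which corresponds to the current one-signature-per-candidate behaviour.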

Coalescing approvals this way should allow us to reduce the number of approval messages we have to create, send and verify. The parameters are configurable, so we should find values that balance:

Security of the network: delaying the broadcast of an approval shouldn't put finality at risk. To make sure that never happens, we won't delay sending a vote once we are past 2/3 of the no-show time.
Scalability of the network: MAX_APPROVAL_COALESCE_COUNT = 1 & MAX_APPROVALS_COALESCE_TICKS = 0 is what we have now, and we know from the measurements we did on Versi that it bottlenecks approval-distribution/approval-voting when the number of validators and parachains increases significantly.
Block storage: in case of disputes, we have to import these votes on chain, which increases the required storage by MAX_APPROVAL_COALESCE_COUNT * CandidateHash per vote. Given that disputes are not the normal mode of operation and we will limit MAX_APPROVAL_COALESCE_COUNT to single digits, this should be good enough. Alternatively, if needed, we could find a better way to store this on chain through indirection.
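The security and storage constraints above can be made concrete with a short Rust sketch. It assumes the no-show window is measured in ticks from a per-candidate starting point (`start`, a hypothetical parameter; the text does not say which reference point is used). A `CandidateHash` is a 32-byte hash, so with MAX_APPROVAL_COALESCE_COUNT = 6 a single vote references at most 192 bytes of candidate hashes. Function names are illustrative only.

```rust
// Hypothetical tick clock, mirroring the tick-based timing used in the text.
type Tick = u64;

/// Returns true when an approval must be sent immediately instead of being
/// queued for coalescing, i.e. more than 2/3 of the no-show window has elapsed
/// since `start`. Integer form of the check:
/// elapsed > 2/3 * window  <=>  3 * elapsed > 2 * window.
fn must_send_now(start: Tick, no_show_window: Tick, now: Tick) -> bool {
    let elapsed = now.saturating_sub(start);
    elapsed.saturating_mul(3) > no_show_window.saturating_mul(2)
}

/// A `CandidateHash` wraps a 32-byte hash.
const CANDIDATE_HASH_BYTES: usize = 32;

/// Upper bound on the candidate-hash bytes one coalesced approval carries
/// when it has to be imported on chain during a dispute.
fn max_candidate_hash_bytes_per_vote(max_approval_coalesce_count: usize) -> usize {
    max_approval_coalesce_count * CANDIDATE_HASH_BYTES
}
```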

Other fixes:

Fixed sending random assignments to non-validators. This was wrong because non-validators don't do anything with them and won't gossip them either, since they do not have a grid topology set, so the random assignments were wasted.
Added metrics to be able to debug potential no-shows and mis-processing of approvals/assignments.
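The random-assignment fix boils down to restricting the set of possible recipients. A minimal sketch, with `PeerId` as a hypothetical `u64` stand-in for the real peer identifier and the validator set passed in explicitly; the random subset would then be drawn from the returned peers only.

```rust
use std::collections::HashSet;

// Hypothetical stand-in for a network peer identifier.
type PeerId = u64;

/// Returns the connected peers that may receive random (non-grid) assignment
/// gossip: only known validators, because non-validators have no grid
/// topology and would neither use nor re-gossip the assignment.
fn random_assignment_targets(connected: &[PeerId], validators: &HashSet<PeerId>) -> Vec<PeerId> {
    connected
        .iter()
        .copied()
        .filter(|peer| validators.contains(peer))
        .collect()
}
```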

TODO:

Get feedback that this is moving in the right direction. @ordian @sandreim @eskimor @burdges, let me know what you think.
More and more testing.
Test on Versi.
Make MAX_APPROVAL_COALESCE_COUNT & MAX_APPROVAL_COALESCE_WAIT_MILLIS a parachain host configuration.
Make sure backwards compatibility works correctly.
Make sure this direction is compatible with other streams of work: Slash approval voters on approving invalid blocks - dynamically #635 & Time Disputes #742
Final versi burn-in before merging


Polkadot-Forum · 2024-01-12T07:33:49Z

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/raising-awareness-new-network-validation-protocol-version-v3-coming/5639/1

## Summary

Built on top of the tooling and ideas introduced in #2528, this PR introduces a synthetic benchmark for measuring and assessing the performance characteristics of the approval-voting and approval-distribution subsystems. Currently, this allows us to simulate the behaviour of these subsystems along the following dimensions:

```
TestConfiguration:
# Test 1
- objective: !ApprovalsTest
    last_considered_tranche: 89
    min_coalesce: 1
    max_coalesce: 6
    enable_assignments_v2: true
    send_till_tranche: 60
    stop_when_approved: false
    coalesce_tranche_diff: 12
    workdir_prefix: "/tmp"
    num_no_shows_per_candidate: 0
    approval_distribution_expected_tof: 6.0
    approval_distribution_cpu_ms: 3.0
    approval_voting_cpu_ms: 4.30
  n_validators: 500
  n_cores: 100
  n_included_candidates: 100
  min_pov_size: 1120
  max_pov_size: 5120
  peer_bandwidth: 524288000000
  bandwidth: 524288000000
  latency:
    min_latency:
      secs: 0
      nanos: 1000000
    max_latency:
      secs: 0
      nanos: 100000000
  error: 0
  num_blocks: 10
```

## The approach

1. We build a real overseer with the real implementations of the approval-voting and approval-distribution subsystems.
2. For a given network size, we pre-compute, for each validator, all the potential assignments and approvals it would send. Because this is a computationally heavy operation, the result is cached in a file on disk and re-used if the generation parameters don't change.
3. The messages are sent according to the configured parameters, which are split into 3 main benchmarking scenarios.

## Benchmarking scenarios

### Best case scenario

*approvals_throughput_best_case.yaml*

It sends to approval-distribution only the minimum tranches required to gather the needed_approvals, so that a candidate is approved.

### Behaviour in the presence of no-shows

*approvals_no_shows.yaml*

It sends the tranches needed to approve a candidate when there are at most *num_no_shows_per_candidate* tranches with no-shows for each candidate.
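To illustrate how the best-case and no-show scenarios differ in what they send, here is a deliberately simplified Rust model: tranches are sent in order until the accumulated assignments cover `needed_approvals`, and each no-show is assumed to require one extra assignment. This is a hypothetical approximation for intuition only, not the benchmark's actual tranche-selection code, and the real approval logic covers no-shows based on tranche timing.

```rust
/// Simplified, hypothetical model: returns the index of the last tranche that
/// has to be sent so that the cumulative number of assignments covers
/// `needed_approvals` plus one extra assignment per no-show.
/// Returns `None` if all configured tranches together are not enough.
fn last_tranche_to_send(
    assignments_per_tranche: &[u32],
    needed_approvals: u32,
    no_shows: u32,
) -> Option<usize> {
    let target = needed_approvals + no_shows;
    let mut covered = 0u32;
    for (tranche, &count) in assignments_per_tranche.iter().enumerate() {
        covered += count;
        if covered >= target {
            return Some(tranche);
        }
    }
    None
}
```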
### Maximum throughput

*approvals_throughput.yaml*

It sends all the tranches for each block and measures the CPU and network bandwidth used by the approval-voting and approval-distribution subsystems.

## How to run it

```
cargo run -p polkadot-subsystem-bench --release -- test-sequence --path polkadot/node/subsystem-bench/examples/approvals_throughput.yaml
```

## Evaluating performance

### Use the real subsystems metrics

If you follow the steps in https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-grafana to install Prometheus and Grafana locally, all the real metrics for `approval-distribution`, `approval-voting` and the overseer are available, e.g.:

[Screenshots (2023-12-05): Grafana dashboards with approval-distribution, approval-voting and overseer metrics]

### Profile with pyroscope

1. Set up Pyroscope following the steps in https://github.com/paritytech/polkadot-sdk/tree/master/polkadot/node/subsystem-bench#install-pyroscope, then run any of the benchmark scenarios with the `--profile` argument.
2. Open the Pyroscope dashboard in Grafana, e.g.:

[Screenshot (2024-01-09): Pyroscope dashboard in Grafana]

### Useful logs

1. Network bandwidth requirements:
   ```
   Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
   Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
   ```
2. CPU usage by the approval-distribution/approval-voting subsystems:
   ```
   approval-distribution CPU usage 84.061s
   approval-distribution CPU usage per block 8.406s
   approval-voting CPU usage 96.532s
   approval-voting CPU usage per block 9.653s
   ```
3. Time passed until a given block is approved:
   ```
   Chain selection approved after 3500 ms hash=0x0101010101010101010101010101010101010101010101010101010101010101
   Chain selection approved after 4500 ms hash=0x0202020202020202020202020202020202020202020202020202020202020202
   ```

### Using the benchmark to quantify improvements from #1178 + #1191

Using a Versi node, we compare the scenario where all new optimisations are disabled with a scenario where tranche0 assignments are sent in a single message and approval coalescing is conservatively simulated as giving just a 50% reduction in the number of messages we send. Overall, we see a speedup of around 30-40% in the time it takes to process the necessary messages and a 30-40% reduction in the necessary bandwidth.

#### Best case scenario comparison (minimum required tranches sent)
Unoptimised:
```
Number of blocks: 10
Payload bytes received from peers: 53289 KiB total, 5328 KiB/block
Payload bytes sent to peers: 52489 KiB total, 5248 KiB/block
approval-distribution CPU usage 6.732s
approval-distribution CPU usage per block 0.673s
approval-voting CPU usage 9.523s
approval-voting CPU usage per block 0.952s
```
vs. optimisation enabled:
```
Number of blocks: 10
Payload bytes received from peers: 32141 KiB total, 3214 KiB/block
Payload bytes sent to peers: 37314 KiB total, 3731 KiB/block
approval-distribution CPU usage 4.658s
approval-distribution CPU usage per block 0.466s
approval-voting CPU usage 6.236s
approval-voting CPU usage per block 0.624s
```

#### Worst case: all tranches sent (very unlikely; happens when sharding breaks)

Unoptimised:
```
Number of blocks: 10
Payload bytes received from peers: 746393 KiB total, 74639 KiB/block
Payload bytes sent to peers: 729151 KiB total, 72915 KiB/block
approval-distribution CPU usage 118.681s
approval-distribution CPU usage per block 11.868s
approval-voting CPU usage 124.118s
approval-voting CPU usage per block 12.412s
```
vs. optimised:
```
Number of blocks: 10
Payload bytes received from peers: 503993 KiB total, 50399 KiB/block
Payload bytes sent to peers: 629971 KiB total, 62997 KiB/block
approval-distribution CPU usage 84.061s
approval-distribution CPU usage per block 8.406s
approval-voting CPU usage 96.532s
approval-voting CPU usage per block 9.653s
```

## TODOs

- [x] Polish implementation.
- [x] Use what we have so far to evaluate #1191 before merging.
- [x] List of features and additional dimensions we want to use for benchmarking.
- [x] Run benchmark on hardware similar to Versi and Kusama nodes.
- [ ] Add benchmark to be run in CI for catching performance regressions.
- [ ] Rebase on latest changes for network emulation.
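The relative improvements in the best-case comparison can be recomputed directly from the reported totals. The small Rust helper below does that arithmetic; the constant simply copies the best-case figures from the logs above, and the names are illustrative.

```rust
/// Relative reduction, in percent, going from `unoptimised` to `optimised`.
fn reduction_pct(unoptimised: f64, optimised: f64) -> f64 {
    (unoptimised - optimised) / unoptimised * 100.0
}

/// Best-case totals as reported above: (metric, unoptimised, optimised).
const BEST_CASE: [(&str, f64, f64); 4] = [
    ("payload received (KiB)", 53289.0, 32141.0),
    ("payload sent (KiB)", 52489.0, 37314.0),
    ("approval-distribution CPU (s)", 6.732, 4.658),
    ("approval-voting CPU (s)", 9.523, 6.236),
];

/// Computes the reduction for every best-case metric.
fn best_case_reductions() -> Vec<(&'static str, f64)> {
    BEST_CASE
        .iter()
        .map(|&(name, before, after)| (name, reduction_pct(before, after)))
        .collect()
}
```

For example, `reduction_pct(6.732, 4.658)` gives roughly 30.8% for the approval-distribution CPU time.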
---------

Signed-off-by: Andrei Sandu <[email protected]>
Signed-off-by: Alexandru Gheorghe <[email protected]>
Co-authored-by: Andrei Sandu <[email protected]>
Co-authored-by: Andrei Sandu <[email protected]>

... to add the approval_voting_params API, which will allow us to enable the approval coalescing implementation from paritytech/polkadot-sdk#1191. Note: bumping the version will not enable the new logic; that will be enabled at a later date, when we decide to call set_approval_voting_params with max_approval_coalesce_count greater than 1. Signed-off-by: Alexandru Gheorghe <[email protected]>
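The activation rule in the note above can be sketched as follows. `ApprovalVotingParams` here is a hypothetical local mirror that models only the `max_approval_coalesce_count` field named in the text; it is not the runtime's actual type definition.

```rust
/// Hypothetical mirror of the runtime's approval-voting parameters;
/// only the field named in the text is modelled.
#[derive(Clone, Copy, Debug)]
struct ApprovalVotingParams {
    max_approval_coalesce_count: u32,
}

/// Coalescing only takes effect once set_approval_voting_params has been
/// called with a count greater than 1; with a count of 1, every approval
/// is still signed individually.
fn coalescing_enabled(params: &ApprovalVotingParams) -> bool {
    params.max_approval_coalesce_count > 1
}
```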



Polkadot-Forum · 2024-05-21T14:59:56Z

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/what-are-subsystem-benchmarks/8212/1

Polkadot-Forum · 2024-05-21T16:34:05Z

This pull request has been mentioned on Polkadot Forum. There might be relevant details there:

https://forum.polkadot.network/t/update-validator-set-size-increase-on-kusama/8218/1

sandreim and others added 5 commits August 25, 2023 19:15

merge from archived repo

7230df4

Signed-off-by: Andrei Sandu <[email protected]>

cargo lock

d04c182

Signed-off-by: Andrei Sandu <[email protected]>

Merge remote-tracking branch 'origin' into sandreim/the_v2_assignments

f4f0e70

Signed-off-by: Andrei Sandu <[email protected]>

Approve multiple candidates with a single signature

341c7af

The pr migrates: - paritytech/polkadot#7554 Signed-off-by: Alexandru Gheorghe <[email protected]>

Fix build warnings

619fff2

Signed-off-by: Alexandru Gheorghe <[email protected]>

alexggh force-pushed the alexaggh/feature/approve_multiple_candidates_polkadot_sdk branch from 342308e to 619fff2 August 27, 2023 11:20

alexggh mentioned this pull request Aug 27, 2023

[DNM] Migrate PR/7554 from polkadot repo #1172

Closed

alexggh added 5 commits August 27, 2023 14:51

Merge remote-tracking branch 'origin/master' into feature/approve_mul…

5f1558d

…tiple_candidates_polkadot_sdk

ci: fix worker binaries could not be found

ed1d9d0

Signed-off-by: Alexandru Gheorghe <[email protected]>

Add missing bits

7d7b82c

Signed-off-by: Alexandru Gheorghe <[email protected]>

Build with network-protocol-staging

7bc13d3

Signed-off-by: Alexandru Gheorghe <[email protected]>

Validate disconnect theory

53f8556

Signed-off-by: Alexandru Gheorghe <[email protected]>

alexggh force-pushed the alexaggh/feature/approve_multiple_candidates_polkadot_sdk branch from 1cb26cd to 7bc13d3 August 28, 2023 09:22

sandreim and others added 3 commits August 28, 2023 15:54

Merge branch 'master' of github.com:paritytech/polkadot-sdk into sand…

442b1e4

…reim/the_v2_assignments

Log errors when banning peers

5e004e1

Signed-off-by: Alexandru Gheorghe <[email protected]>

fix zombienet test

9850b2f

Signed-off-by: Andrei Sandu <[email protected]>

alexggh mentioned this pull request Aug 29, 2023

Versi high number of PeerDisconnect when scaling up number of validators and parachains #1263

Closed

sandreim and others added 10 commits August 29, 2023 19:06

Merge branch 'master' of github.com:paritytech/polkadot-sdk into sand…

f71eb31

…reim/the_v2_assignments Signed-off-by: Andrei Sandu <[email protected]>

cargo lock

46cfaf1

Signed-off-by: Andrei Sandu <[email protected]>

Merge branch 'master' of github.com:paritytech/polkadot-sdk into sand…

0086502

…reim/the_v2_assignments Signed-off-by: Andrei Sandu <[email protected]>

superfluous

47beabd

Signed-off-by: Andrei Sandu <[email protected]>

Merge branch 'master' into sandreim/the_v2_assignments

ee88408

Separate approval

3d3e37c

Signed-off-by: Alexandru Gheorghe <[email protected]>

Revert "Log errors when banning peers"

da61d98

This reverts commit 5e004e1.

Merge remote-tracking branch 'origin/sandreim/the_v2_assignments' int…

9c0375c

…o feature/approve_multiple_candidates_polkadot_sdk_v2

Cleanup post migrating hacks when migrating from polkadot repo

f3fee24

Signed-off-by: Alexandru Gheorghe <[email protected]>

Fixup clippy

6338d33

Signed-off-by: Alexandru Gheorghe <[email protected]>

sandreim mentioned this pull request Sep 19, 2023

[DNM] Test / Debug #1635

Closed

alexggh added 2 commits September 25, 2023 16:37

Merge remote-tracking branch 'origin/master' into feature/approve_mul…

d4fb01a

…tiple_candidates_polkadot_sdk_v3

Merge remote-tracking branch 'origin/master' into sandreim/the_v2_ass…

5832ad7

…ignments

alexggh and others added 2 commits December 12, 2023 13:56

Fixup 0002-upgrade-node failures

8099c16

V2 was not put into the list of fallbacks for the validation protocol, so the test wrongly fell back to v1. Signed-off-by: Alexandru Gheorghe <[email protected]>

Merge branch 'master' into alexaggh/feature/approve_multiple_candidat…

44f0210

…es_polkadot_sdk

alexggh merged commit a84dd0d into master Dec 13, 2023
115 of 116 checks passed

alexggh deleted the alexaggh/feature/approve_multiple_candidates_polkadot_sdk branch December 13, 2023 06:43

alexggh mentioned this pull request Jan 12, 2024

Support for new network validation protocol(v3) qdrvm/kagome#1923

Closed

github-actions bot mentioned this pull request Feb 19, 2024

Update substrate/polkadot/cumulus from v1.3.0 to v1.6.0 moondance-labs/tanssi#419

Closed

alexggh mentioned this pull request Feb 28, 2024

Bump ParachainHost to api version 10 on kusama polkadot-fellows/runtimes#204

Merged

github-actions bot mentioned this pull request Mar 13, 2024

Update polkadot-sdk from v1.3.0 to v1.7.2 moonbeam-foundation/moonbeam#2703

Closed

serban300 pushed a commit to serban300/polkadot-sdk that referenced this pull request Apr 8, 2024

Message transactions mortality (paritytech#1191)

dbf4d3b

* transactions mortality in message and complex relays * logging + enable in test deployments * spellcheck * fmt


bkchr pushed a commit that referenced this pull request Apr 10, 2024

Message transactions mortality (#1191)

1ef41a5

* transactions mortality in message and complex relays * logging + enable in test deployments * spellcheck * fmt

This was referenced Jun 5, 2024

Update polkadot-sdk from v1.7.0 to v1.11.0 moondance-labs/tanssi#573

Closed

Update polkadot-sdk from v1.10.0 to v1.11.0 moondance-labs/tanssi#577

Closed
