Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

backing: move the min votes threshold to the runtime #7577

Draft
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

alindima
Copy link
Contributor

@alindima alindima commented Aug 3, 2023

Companion: paritytech/cumulus#3050

  • the value is stored in the HostConfiguration
  • the client will now retrieve this value via a new runtime call (currently staging) and cache it per-session

TODO: test with versi

@alindima alindima marked this pull request as draft August 3, 2023 08:56
@paritytech-ci paritytech-ci requested review from a team August 3, 2023 08:56
@alindima alindima requested a review from sandreim August 3, 2023 08:57
@eskimor
Copy link
Member

eskimor commented Aug 3, 2023

Yes this should only ever change on session boundaries, but we should not put it into SessionInfo directly because of this. It should be session buffered as SessionInfo right now though, just not packed together in a single struct.

@@ -155,6 +162,34 @@ impl RuntimeInfo {
self.get_session_info_by_index(sender, relay_parent, session_index).await
}

/// Get minimum_backing_votes by relay parent hash.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically this is per session, but you use the relay parent hash to query the state at.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cache is indexed by the session index but if the value is not present in the cache it falls back to requesting it from the runtime (which always needs the relay parent, right?).

Maybe the doc comment is confusing, is that it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like Get the minimum backing votes for the specified session. would be a better description.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relay parent is needed in case there is a cache miss, because you need to fetch the state of some relay parent. Comment is indeed misleading, as we are fetching by session index, we just use the relay parent to access the state.

/// requisite number of votes for validity from a group.
fn requisite_votes(&self, group: &Self::GroupId) -> usize;
/// Get a validator group size.
fn get_group_size(&self, group: &Self::GroupId) -> Option<usize>;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does it need to return Option ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it should ever return None under the current usage model but it could if there's a bug.

Still, the method needs to return Option because it queries a Hashmap and we can't know at compile-time whether or not the validator group is present for all the paraIDs.

node/core/backing/src/lib.rs Outdated Show resolved Hide resolved
primitives/src/runtime_api.rs Outdated Show resolved Hide resolved
also cache it per-session in the backing subsystem

Signed-off-by: alindima <[email protected]>
@alindima alindima force-pushed the alindima-add-min-backing-votes-to-runtime branch from 424f3a4 to 5feadf4 Compare August 21, 2023 11:14
@alindima
Copy link
Contributor Author

I had to force-push because of the numerous conflicts after merging the async backing PR, sorry about that.
How are such situations handled normally? (conflicts with the base branch)

@paritytech-cicd-pr
Copy link

The CI pipeline was cancelled due to failure one of the required jobs.
Job name: test-linux-stable
Logs: https://gitlab.parity.io/parity/mirrors/polkadot/-/jobs/3419482

@rphmeier
Copy link
Contributor

Yes this should only ever change on session boundaries, but we should not put it into SessionInfo directly because of paritytech/polkadot-sdk#734.

Implementing session buffering for many storage items is probably painful. As suggested in the issue I still believe the best way forward is to have a struct SessionInfo used by the runtime but not exposed elsewhere.

@alindima alindima added A3-in_progress Pull request is in progress. No review needed at this stage. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed. E0-runtime_migration PR introduces code that might require downstream chains to run a runtime upgrade. labels Aug 22, 2023
also enable it for rococo/versi for testing
this dependency has been recently introduced by async backing
this is not needed, as the RuntimeAPISubsystem already takes care
of versioning and will return NotSupported if the version is not
right.
alindima added a commit to paritytech/cumulus that referenced this pull request Aug 23, 2023
@alindima alindima added A0-please_review Pull request needs code review. and removed A3-in_progress Pull request is in progress. No review needed at this stage. labels Aug 24, 2023
Copy link
Member

@eskimor eskimor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall!

@@ -155,6 +162,34 @@ impl RuntimeInfo {
self.get_session_info_by_index(sender, relay_parent, session_index).await
}

/// Get minimum_backing_votes by relay parent hash.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The relay parent is needed in case there is a cache miss, because you need to fetch the state of some relay parent. Comment is indeed misleading, as we are fetching by session index, we just use the relay parent to access the state.

on_demand_fee_variability : Perbill::from_percent(3),
on_demand_target_queue_utilization : Perbill::from_percent(25),
on_demand_ttl : 5u32.into(),
minimum_backing_votes : 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wasn't there a constant we can use?

@@ -85,6 +85,10 @@ fn default_allowed_relay_parent_tracker() -> AllowedRelayParentsTracker<Hash, Bl
allowed
}

fn minimum_backing_votes(group_len: usize) -> usize {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should expose a variant of this function in primitives and use it both on the node side and the runtime side.

@@ -983,10 +996,18 @@ async fn construct_per_relay_parent_state<Context>(

let parent = relay_parent;

let (validators, groups, session_index, cores) = futures::try_join!(
let session_index =
try_runtime_api!(runtime_info.get_session_index_for_child(ctx.sender(), parent).await);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this wasn't introduced by you, but this macro completely defeats the point of our error handling. If we don't want to bring down the node, the correct way of handling this would be to make all runtime errors non fatal, instead of swallowing them in a macro.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only fatal error we have here is canceled which never happens, except the node is shutting down, so this is fine.

try_runtime_api!(runtime_info.get_session_index_for_child(ctx.sender(), parent).await);

let minimum_backing_votes =
request_min_backing_votes(parent, ctx.sender(), |parent, sender| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this wrapper? Can't get_min_backing_votes not already fall back to the constant in case of unsupported?

relay_parent: Hash,
minimum_backing_votes: u32,
) {
self.minimum_backing_votes.put(relay_parent, minimum_backing_votes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. This should be cached per SessionIndex not per relay parent as it is only supposed to change at session boundaries.

state
.per_session
.insert(session_index, PerSessionState::new(session_info, &state.keystore));
let minimum_backing_votes = request_min_backing_votes(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's why you have this wrapper, because we don't use RuntimeInfo everywhere. One point of RuntimeInfo was developer ergonomics. Therefore it feels kind of bad to make the calls to runtime_info more clunky. I would do the legacy check right in the get function and provide a standalone utility function for subsystems not using RuntimeInfo.

Conceptually we should be using RuntimeInfo everywhere or not at all. So full cleanup would be to port subsystems to the same, but that's a different story not needed to be tackled now. With the SessionInfo split up we might re-consider a few things anyway.

@@ -606,6 +606,8 @@ pub enum RuntimeApiRequest {
Authorities(RuntimeApiSender<Vec<AuthorityDiscoveryId>>),
/// Get the current validator set.
Validators(RuntimeApiSender<Vec<ValidatorId>>),
/// Get the minimum required backing votes.
MinimumBackingVotes(RuntimeApiSender<u32>),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should contain the SessionIndex.

Copy link
Contributor

@sandreim sandreim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @alindima , it looks good to me so far.

Also, before any test on Versi you should do a first check with Zombienet.

StorageVersion::new(9).put::<Pallet<T>>();

weight_consumed
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: We could be more descriptive in the warning message, and also account for the case when someone is accidentally skipping a migration and current storage version is 7 when upgrading from an older runtime.

pub async fn request_min_backing_votes<'a, S, Func, Fut>(
parent: Hash,
sender: &'a mut S,
get_min_backing_votes: Func,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just move the code in the function instead of having to pass the closure ?

@@ -155,6 +162,34 @@ impl RuntimeInfo {
self.get_session_info_by_index(sender, relay_parent, session_index).await
}

/// Get minimum_backing_votes by relay parent hash.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like Get the minimum backing votes for the specified session. would be a better description.

total_validity += 1
}
fn get_group_size(&self, group: &Self::GroupId) -> Option<usize> {
let count = self.authorities.values().filter(|g| *g == group).count();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it doesn't make sense to require the backing group size without having a candidate backed first by a group of at least 1 validator.

Can count ever be 0 in any valid scenario ? I think it could only happen if there are more backing groups than number of validators, but that implies some bug that allows candidates to be backed outside the context of our scheduling.

So, we could just return 0 and you wouldn't need the Option.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D5-nicetohaveaudit ⚠️ PR contains trivial changes to logic that should be properly reviewed. E0-runtime_migration PR introduces code that might require downstream chains to run a runtime upgrade.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants