backing: move the min votes threshold to the runtime #7577
base: master
Conversation
Yes this should only ever change on session boundaries, but we should not put it into
@@ -155,6 +162,34 @@ impl RuntimeInfo {
		self.get_session_info_by_index(sender, relay_parent, session_index).await
	}

	/// Get minimum_backing_votes by relay parent hash.
Technically this is per session, but you use the relay parent hash to query the state at.
The cache is indexed by the session index but if the value is not present in the cache it falls back to requesting it from the runtime (which always needs the relay parent, right?).
Maybe the doc comment is confusing, is that it?
Something like "Get the minimum backing votes for the specified session." would be a better description.
The relay parent is needed in case there is a cache miss, because you need to fetch the state of some relay parent. Comment is indeed misleading, as we are fetching by session index, we just use the relay parent to access the state.
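For illustration, a minimal sketch of that behaviour (the type names here are assumptions, not the actual RuntimeInfo API): the cache key is the session index, and the relay parent hash only selects the state to query on a cache miss.

use std::collections::HashMap;

type SessionIndex = u32;
type Hash = [u8; 32];

struct RuntimeInfoSketch {
	minimum_backing_votes: HashMap<SessionIndex, u32>,
}

impl RuntimeInfoSketch {
	/// Get the minimum backing votes for the given session, touching the runtime state at
	/// `relay_parent` only when the session is not cached yet.
	fn get_min_backing_votes(
		&mut self,
		relay_parent: Hash,
		session_index: SessionIndex,
		query_runtime_at: impl Fn(Hash) -> u32,
	) -> u32 {
		*self
			.minimum_backing_votes
			.entry(session_index)
			.or_insert_with(|| query_runtime_at(relay_parent))
	}
}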
	/// requisite number of votes for validity from a group.
	fn requisite_votes(&self, group: &Self::GroupId) -> usize;
	/// Get a validator group size.
	fn get_group_size(&self, group: &Self::GroupId) -> Option<usize>;
Why does it need to return Option?
I don't think it should ever return None under the current usage model, but it could if there's a bug. Still, the method needs to return Option because it queries a HashMap, and we can't know at compile time whether the validator group is present for all paraIDs.
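For context, a compact sketch (field names assumed) of why the signature ends up as Option: the size comes out of a HashMap lookup, so absence has to be representable even if it would indicate a bug.

use std::collections::HashMap;

type GroupId = u32;

struct TableContextSketch {
	// Assumed layout: one size entry per scheduled validator group.
	group_sizes: HashMap<GroupId, usize>,
}

impl TableContextSketch {
	fn get_group_size(&self, group: &GroupId) -> Option<usize> {
		// The map may simply not contain the group, so the caller has to handle None.
		self.group_sizes.get(group).copied()
	}
}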
also cache it per-session in the backing subsystem Signed-off-by: alindima <[email protected]>
Force-pushed from 424f3a4 to 5feadf4
I had to force-push because of the numerous conflicts after merging the async backing PR, sorry about that.
The CI pipeline was cancelled due to the failure of one of the required jobs.
Implementing session buffering for many storage items is probably painful. As suggested in the issue, I still believe the best way forward is to have a
…cking-votes-to-runtime
also enable it for rococo/versi for testing
this dependency has been recently introduced by async backing
this is not needed, as the RuntimeAPISubsystem already takes care of versioning and will return NotSupported if the version is not right.
Companion for paritytech/polkadot#7577 Signed-off-by: alindima <[email protected]>
Looks good overall!
	on_demand_fee_variability: Perbill::from_percent(3),
	on_demand_target_queue_utilization: Perbill::from_percent(25),
	on_demand_ttl: 5u32.into(),
	minimum_backing_votes: 2
Wasn't there a constant we can use?
@@ -85,6 +85,10 @@ fn default_allowed_relay_parent_tracker() -> AllowedRelayParentsTracker<Hash, Bl
	allowed
}

fn minimum_backing_votes(group_len: usize) -> usize {
I think we should expose a variant of this function in primitives and use it both on the node side and the runtime side.
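A minimal sketch of what such a shared helper could look like (the name and exact signature are assumptions): a single free function in primitives that both the node side and the runtime side call.

/// Compute the effective minimum number of backing votes for a group of `group_len`
/// validators, given the configured minimum.
pub fn effective_minimum_backing_votes(
	group_len: usize,
	configured_minimum_backing_votes: u32,
) -> usize {
	// The threshold can never exceed the size of the backing group itself.
	std::cmp::min(group_len, configured_minimum_backing_votes as usize)
}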
@@ -983,10 +996,18 @@ async fn construct_per_relay_parent_state<Context>(

	let parent = relay_parent;

	let (validators, groups, session_index, cores) = futures::try_join!(
	let session_index =
		try_runtime_api!(runtime_info.get_session_index_for_child(ctx.sender(), parent).await);
Ok, this wasn't introduced by you, but this macro completely defeats the point of our error handling. If we don't want to bring down the node, the correct way of handling this would be to make all runtime errors non-fatal, instead of swallowing them in a macro.
The only fatal error we have here is canceled, which never happens except when the node is shutting down, so this is fine.
		try_runtime_api!(runtime_info.get_session_index_for_child(ctx.sender(), parent).await);

	let minimum_backing_votes =
		request_min_backing_votes(parent, ctx.sender(), |parent, sender| {
Why do we need this wrapper? Can't get_min_backing_votes already fall back to the constant in case of NotSupported?
		relay_parent: Hash,
		minimum_backing_votes: u32,
	) {
		self.minimum_backing_votes.put(relay_parent, minimum_backing_votes);
No. This should be cached per SessionIndex, not per relay parent, as it is only supposed to change at session boundaries.
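A sketch of the suggested keying (the struct name is assumed; the put/get calls mirror the LruCache-style API already used by the existing caches): the session index, not the relay parent hash, is the cache key.

use lru::LruCache;
use std::num::NonZeroUsize;

type SessionIndex = u32;

struct RequestResultCacheSketch {
	minimum_backing_votes: LruCache<SessionIndex, u32>,
}

impl RequestResultCacheSketch {
	fn new() -> Self {
		Self {
			minimum_backing_votes: LruCache::new(NonZeroUsize::new(64).expect("64 is non-zero")),
		}
	}

	/// The value only changes at session boundaries, so the session index is the right key.
	fn cache_minimum_backing_votes(&mut self, session_index: SessionIndex, votes: u32) {
		self.minimum_backing_votes.put(session_index, votes);
	}

	fn minimum_backing_votes(&mut self, session_index: SessionIndex) -> Option<u32> {
		self.minimum_backing_votes.get(&session_index).copied()
	}
}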
	state
		.per_session
		.insert(session_index, PerSessionState::new(session_info, &state.keystore));
	let minimum_backing_votes = request_min_backing_votes(
That's why you have this wrapper: we don't use RuntimeInfo everywhere. One point of RuntimeInfo was developer ergonomics, so it feels kind of bad to make the calls to runtime_info more clunky. I would do the legacy check right in the get function and provide a standalone utility function for subsystems not using RuntimeInfo.

Conceptually we should be using RuntimeInfo everywhere or not at all, so the full cleanup would be to port all subsystems to it, but that's a different story and doesn't need to be tackled now. With the SessionInfo split we might reconsider a few things anyway.
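For illustration, a self-contained sketch of that suggestion (the error type and the LEGACY_MIN_BACKING_VOTES name are assumptions, not the actual subsystem API): the fallback to the legacy constant lives in one standalone helper, so neither RuntimeInfo users nor other subsystems need to pass a closure at every call site.

#[derive(Debug, PartialEq)]
enum RuntimeApiError {
	NotSupported,
}

// Assumed value of the threshold used before the runtime exposed it.
const LEGACY_MIN_BACKING_VOTES: u32 = 2;

/// Map a NotSupported response to the legacy constant; leave everything else untouched.
fn min_backing_votes_or_legacy(
	raw: Result<u32, RuntimeApiError>,
) -> Result<u32, RuntimeApiError> {
	match raw {
		Err(RuntimeApiError::NotSupported) => Ok(LEGACY_MIN_BACKING_VOTES),
		other => other,
	}
}

fn main() {
	assert_eq!(min_backing_votes_or_legacy(Ok(3)), Ok(3));
	assert_eq!(
		min_backing_votes_or_legacy(Err(RuntimeApiError::NotSupported)),
		Ok(LEGACY_MIN_BACKING_VOTES)
	);
}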
@@ -606,6 +606,8 @@ pub enum RuntimeApiRequest {
	Authorities(RuntimeApiSender<Vec<AuthorityDiscoveryId>>),
	/// Get the current validator set.
	Validators(RuntimeApiSender<Vec<ValidatorId>>),
	/// Get the minimum required backing votes.
	MinimumBackingVotes(RuntimeApiSender<u32>),
This should contain the SessionIndex.
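A sketch of the suggested shape (simplified sender type, not the actual message definitions): the request variant carries the SessionIndex it is asking about.

type SessionIndex = u32;
// Stand-in for the oneshot response sender used by the real messages.
type ResponseSender<T> = std::sync::mpsc::Sender<T>;

enum RuntimeApiRequestSketch {
	/// Get the minimum required backing votes for the given session.
	MinimumBackingVotes(SessionIndex, ResponseSender<u32>),
}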
Thanks @alindima, it looks good to me so far.
Also, before any test on Versi you should do a first check with Zombienet.
		StorageVersion::new(9).put::<Pallet<T>>();

		weight_consumed
	} else {
nit: We could be more descriptive in the warning message, and also account for the case when someone accidentally skips a migration and the current storage version is 7 when upgrading from an older runtime.
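For example, the guard could log both the expected and the found storage version so a skipped migration shows up clearly in the logs (a sketch; the log target and wording are assumptions):

fn warn_on_unexpected_storage_version(on_chain: u16, expected: u16) {
	if on_chain != expected {
		log::warn!(
			target: "runtime::configuration",
			"Skipping migration to storage version {}: found on-chain version {}. \
			 A previous migration may have been skipped.",
			expected,
			on_chain,
		);
	}
}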
pub async fn request_min_backing_votes<'a, S, Func, Fut>(
	parent: Hash,
	sender: &'a mut S,
	get_min_backing_votes: Func,
Why not just move the code into the function instead of having to pass the closure?
		total_validity += 1
	}

	fn get_group_size(&self, group: &Self::GroupId) -> Option<usize> {
		let count = self.authorities.values().filter(|g| *g == group).count();
IMO it doesn't make sense to require the backing group size without first having a candidate backed by a group of at least one validator. Can count ever be 0 in any valid scenario? I think it could only happen if there are more backing groups than validators, but that implies some bug that allows candidates to be backed outside the context of our scheduling. So we could just return 0 and you wouldn't need the Option.
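A sketch of that alternative (simplified types, mirroring the test context above): return the count directly and let 0 mean the group is not present.

use std::collections::HashMap;

type AuthorityId = u32;
type GroupId = u32;

struct TestTableContextSketch {
	// Assumed: mapping from authority to the group it is assigned to.
	authorities: HashMap<AuthorityId, GroupId>,
}

impl TestTableContextSketch {
	fn get_group_size(&self, group: &GroupId) -> usize {
		// 0 can only happen if no validator is assigned to the group, which should never be
		// the case for a candidate that was actually backed.
		self.authorities.values().filter(|g| *g == group).count()
	}
}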
Companion: paritytech/cumulus#3050
TODO: test with versi