Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Missed rounds #1245

Open
wants to merge 33 commits into
base: main
Choose a base branch
from
Open

Missed rounds #1245

wants to merge 33 commits into from

Conversation

drsk0
Copy link
Contributor

@drsk0 drsk0 commented Oct 16, 2024

Purpose

Compute the missed rounds for validators and update them in the scheduler.

Changes

See above

Checklist

  • My code follows the style of this project.
  • The code compiles without warnings.
  • I have performed a self-review of the changes.
  • I have documented my code, in particular the intent of the
    hard-to-understand areas.
  • (If necessary) I have updated the CHANGELOG.

CLA acceptance

_Remove if not applicable.

By submitting the contribution I accept the terms and conditions of the
Contributor License Agreement v1.0

@drsk0 drsk0 force-pushed the suspend_resume_configure_baker branch 2 times, most recently from 0eda6ae to be864c2 Compare October 17, 2024 13:13
Base automatically changed from suspend_resume_configure_baker to main October 21, 2024 09:01
@drsk0 drsk0 force-pushed the missed_rounds branch 3 times, most recently from bd0dc99 to 94c9f4a Compare October 22, 2024 12:45
@drsk0 drsk0 marked this pull request as ready for review October 23, 2024 08:28
@drsk0 drsk0 requested a review from td202 October 23, 2024 10:48
finalizationAwake :: !Bool
finalizationAwake :: !Bool,
-- | The number of missed rounds in the reward period
missedRounds :: !Word16
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use Word64 so we don't even have to consider overflow being a problem for this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field also has to be conditional, because it won't be present before P8.

Copy link
Contributor Author

@drsk0 drsk0 Oct 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed it to Word64. I think even Word16 is unlikely to overflow though. General question: Do we optimize on the size of serialized types or we don't care?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also made the field optional. Another general question: What are the benefits of this current approach of checking for backwards compatibility in the type system, over just supporting one protocol version at HEAD and have well tested migration paths from the previous protocol version? I feel the effort is not proportional to the benefits at the moment, but maybe I'm missing something.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Word16 shouldn't overflow with the current chain parameters, but we shouldn't rely on that.

I think it is generally good to optimize for space somewhat, but balance the amount of space saved vs complexity. In this case, I think it could be worth doing some space saving. For instance, finalizationAwake takes a whole byte of serialization, but only the low-order bit is used - you could use another bit to record if missedRounds is non-zero. Usually it will be zero, and as there's going to be at least one of these written per block, the savings could be worth while.

On the backwards compatibility question, I'm not entirely sure what the alternative you are suggesting is. In practise, the node would need to support at least 2 protocol versions at once in order to migrate. As it stands, the node can catch up from the original genesis to the latest block. If we were implementing a new node fresh today, maybe we wouldn't bother with the older protocol versions at all, and we might do a bunch of things differently from the outset. However, the node as it exists is a result of a series of incremental steps, and the approach to protocol versions that we have allows us to make them reasonably easily, in my opinion.

Comment on lines +171 to +172
-- | The current round.
Round ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not really needed, because the current round will be tcRound tc + 1 if the timeout certificate is present.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't the tcRound point to some earlier round?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. In the timeout certificate checking code, we have:

                Present tc
                    -- Check that the TC round is correct.
                    | tcRound tc /= blockRound pendingBlock - 1 -> do
                        logEvent Konsensus LLTrace $ "Block " <> show pbHash <> " timeout certificate is not for the correct round."
                        flag $ BlockTCRoundInconsistent sBlock
                        rejectBlock

@drsk0 drsk0 force-pushed the missed_rounds branch 7 times, most recently from 8c77619 to e55f767 Compare November 6, 2024 17:08
Copy link
Contributor

@td202 td202 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. I think the main thing is to add/update testing for rotating the capital distribution.

concordium-consensus/blb755855-143.dat Outdated Show resolved Hide resolved
Comment on lines +277 to +279
-- | Rotate the capital distribution, so that the current capital distribution
-- is replaced by the next one, and set up empty pool rewards, except that
-- missed rounds are carried over from the old pool rewards.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should have some tests for this. SchedulerTests.KonsensusV1.EpochTransition already has tests for rotating capital distribution, so I'd suggest adding some there. In particular, we want to test cases where there are bakers with missed rounds and some bakers are added and removed to ensure the result is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are currently pinned to P7. I will move those to P8 so we can test something meaningful.

Comment on lines +171 to +172
-- | The current round.
Round ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. In the timeout certificate checking code, we have:

                Present tc
                    -- Check that the TC round is correct.
                    | tcRound tc /= blockRound pendingBlock - 1 -> do
                        logEvent Konsensus LLTrace $ "Block " <> show pbHash <> " timeout certificate is not for the correct round."
                        flag $ BlockTCRoundInconsistent sBlock
                        rejectBlock

@drsk0 drsk0 force-pushed the missed_rounds branch 3 times, most recently from 876fd34 to 71a853b Compare November 7, 2024 17:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants