Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[1.0-beta3] avoid unnecessary state history log "unrotations" that can remove retained logs on replay/resync #298

Merged
merged 2 commits into from
Jun 16, 2024

Conversation

spoonincode
Copy link
Member

@spoonincode spoonincode commented Jun 15, 2024

In the early days, ship would truncate its log any time the applied block number was less than the log's head block. But this could lead to substantial chunks of the log being removed in the case of a replay or resync (such as when restarting from a snapshot).1

So instead, sometime in leap, the log behavior was modified such that when the applied block number decreases relative to the head of the log, the log would check if the blockid in the log matched that as the new applied block. If it matched, it'd do nothing, since that is indicative of a replay or resync. If it did not match, the log would be truncated as before, since that is indicative of a new fork being applied.

The log rewrite in spring 1.0 maintains this strategy. However, the blockid duplication check is performed in the log, not the log_catalog. This means for a partitioned log where a block is being applied in a retained log the log_catalog would always unrotate any higher numbered logs before figuring out it doesn't need to "truncate" (append now in spring1) the log.

So, add a duplication check in log_catalog too to guard against this. And add a handful of tests that are in the same ballpark as replay/resync (writing identical blocks to the log again).

(it is a little uneasy to see some code duplication in log and log_catalog. The original idea was that log and log_catalog would share the same public interface such that if nodeos was configured without log splitting it could simply use log directly -- maybe via some template even since the interface would completely match up. This commonality of interface still remains: pack_and_write_entry(), get_entry(), get_block_id(), block_range(), and empty() all have an identical interface between the two. But there was nothing ever attempted to strictly use log if that's all that is required)

Footnotes

  1. There also could be disastrous consequences if accidentally restarting from genesis that would erase the entire log. That is guarded against differently than what is further discussed here.

//check if this log already has the same blockid at the given blocknum. This is indicative of a resync or replay and there is no need to write the
// same log entry again. otherwise we risk unrotating and blowing away existing log files
if(get_block_id(block_num) == id)
return;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to add this feature now, but I have had to provide a patch in the past to bypass this behavior so that a corrupted SHiP log could be repaired by replaying from a snapshot before the corrupted log.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep that reminds me of AntelopeIO/leap#2393

Probably we should add something to spring-util to allow simple mutations of the log similar to the block log.

@ericpassmore
Copy link
Contributor

Note:start
group: STABILITY
category: INTERNALS
summary: Improved block id duplication logic to retain the correct state history with replay/resync.
Note:end

@spoonincode spoonincode merged commit 253c12f into release/1.0-beta3 Jun 16, 2024
36 checks passed
@spoonincode spoonincode deleted the avoid_unrotate_+_rewrite_tests_beta3 branch June 16, 2024 00:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants