-
Notifications
You must be signed in to change notification settings - Fork 970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Archival bucketlist #4403
base: master
Are you sure you want to change the base?
Archival bucketlist #4403
Conversation
76b6594
to
05d6144
Compare
5cace22
to
ca87d88
Compare
Cargo.toml
Outdated
@@ -3,7 +3,7 @@ members = ["src/rust", "lib/tracy-client-sys"] | |||
|
|||
[patch.crates-io] | |||
tracy-client-sys = { path = "lib/tracy-client-sys" } | |||
stellar-xdr = { git = "https://github.com/stellar/rs-stellar-xdr", rev = "ae805d0f8c28ca86327a834eea9ce7d29b0a63bb" } | |||
stellar-xdr = { git = "https://github.com/stellar/rs-stellar-xdr", rev = "9a31b81412712ab2dfbe32cbeaba6bdb754c264b" } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to bump this to the most recent revision and wrap all the new XDR uses into the next version macro (or wait until the protocol release)
src/main/main.cpp
Outdated
@@ -382,7 +382,7 @@ main(int argc, char* const* argv) | |||
// built with vnext, causing a curr diff against next. This works now | |||
// because the xdr is indentical, but the moment that changes this checkk | |||
// will fail and will need to be fixed. | |||
checkXDRFileIdentity(); | |||
// checkXDRFileIdentity(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to revert this
src/bucket/BucketList.cpp
Outdated
template <> BucketListDepth BucketListBase<LiveBucket>::kNumLevels = 11; | ||
|
||
// TODO: I made this number up, do some analysis and pick a better value or | ||
// make this a configurable network config. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, can we actually change this value at runtime in-between ledgers?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. We will probably end up making this a config setting at which point I'll change it to a dynamic runtime value, but for now it's easier just to inherit the older BucketList's global const value for this refactor.
src/bucket/BucketList.cpp
Outdated
|
||
for (uint32_t i = static_cast<uint32>(mLevels.size()) - 1; i != 0; --i) | ||
{ | ||
/* |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to remove this (or uncomment if that's universally useful for debuggin)
if (protocolVersionStartsFrom(currentHeader.ledgerVersion, | ||
ProtocolVersion::V_22)) | ||
{ | ||
// TODO: Hash Archive Bucket |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the status of this TODO? I.e. is the PR blocked on this?
@@ -1148,26 +1304,30 @@ BucketManagerImpl::assumeState(HistoryArchiveState const& has, | |||
ZoneScoped; | |||
releaseAssertOrThrow(mApp.getConfig().MODE_ENABLES_BUCKETLIST); | |||
|
|||
for (uint32_t i = 0; i < BucketList::kNumLevels; ++i) | |||
// TODO: Assume archival bucket state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about this TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The outstanding TODOs are outside the scope of this PR. I've added some additional context on their dependencies. They require changing the history archive structure, which is a pretty big endeavor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's maybe open an issue or a few issues for these - in general, Core codebase avoids TODOs merged, and if we do have some, I would prefer to have them point at the issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do, the master issue has most of these covered in sub-issues: #4392. I agree it's a little messy but there's a lot of changes that are all kinda have dependencies and I'm trying to merge them in small pieces
src/bucket/BucketSnapshot.h
Outdated
std::vector<BulkLoadReturnT>& result, | ||
LedgerKeyMeter* lkMeter) const; | ||
|
||
// friend struct BucketLevelSnapshot<BucketT>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove
src/bucket/LedgerCmp.h
Outdated
if (aty != HOT_ARCHIVE_DELETED && aty != HOT_ARCHIVE_LIVE) | ||
{ | ||
throw std::runtime_error( | ||
"Malformed bucket: expected DELETED/RESTORED key."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't have 'RESTORED' key category, do we?
src/util/ProtocolVersion.h
Outdated
@@ -33,7 +33,8 @@ enum class ProtocolVersion : uint32_t | |||
V_18, | |||
V_19, | |||
V_20, | |||
V_21 | |||
V_21, | |||
V_22 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't forget to update this to include v23 and use it where appropriate.
I also would suggest using a named constant instead of the direct version checks in order to make changes easier.
Note: changes in this PR impact BucketListDB quite a bit, I think we need to resolve #4433 to make sure we have good coverage. Wdyt? |
6ca5887
to
344647f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM on the surface level. The code looks fine for now. It would be nicer if this PR was smaller, but I'm not sure if there is a way to stage this change. Rebasing should at least help getting rid of meta diffs.
src/bucket/Bucket.h
Outdated
}; | ||
|
||
template <class BucketT> class SearchableBucketListSnapshot; | ||
class LiveBucket : public Bucket, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Live/Hot buckets/bucket lists would benefit from at least top-level doc comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some comments and rebased.
2ee0adc
to
4e5753c
Compare
4e5753c
to
1022bf7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for slow review. Mostly all makes sense and looks .. well, a little gnarly, but good! Especially in light of recent chat.
Requested changes fall into 3 groups:
- A handful of "definite" changes (like I think one or two little typo-sized bugs)
- A handful of "maybe you could refactor/dedupe this new duplication" cases
- An even-more-general / open-ended "is this really the best way to do the type parameterization" questions.
I'd like to at least see cases 1 fixed and 2 attempted (or explained why not to, if you have already tried and the result is worse). Case 3 is more a latent request for discussion / exploration of the alternative parameterization patterns, like maybe it all relates to the impl pattern or something? I'm curious but won't block on the issue either way. I can also try some sketching alternative parameterization patterns on my own.
I guess I would like, if nothing else, for all the function bodies that contain a branch on "if-constexpr type-A-or-B" to also contain the "static-assert-type-is-either-A-or-B" call (and for it to be factored out into a named helper). It seems to be true in many cases, but I'd like it to at least be true in all cases where the if-constexpr thing is employed. Just like so we can get a static list of places to fix when A-or-B becomes A-or-B-or-C in the future.
src/bucket/Bucket.cpp
Outdated
releaseAssertOrThrow(protocolVersionStartsFrom( | ||
maxProtocolVersion, | ||
Bucket::FIRST_PROTOCOL_SUPPORTING_PERSISTENT_EVICTION)); | ||
meta.ext = oi.getMetadata().ext; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this not read meta.ext = ni.getMetadata().ext
? Like take from ni
not oi
?
src/bucket/Bucket.cpp
Outdated
// well | ||
if (ni.getMetadata().ext.v() == 1) | ||
{ | ||
releaseAssertOrThrow(protocolVersionStartsFrom( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a vague recollection that the maxProtocolVersion threaded into this function can sometimes be higher or lower than one might expect, due to some approximation in it's derivation, but I can't remember the details; I would just suggest researching it if you haven't yet to be sure this assert is going to always hold.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the relevant function here is calculateMergeProtocolVersion
. The current protocol version always gets fed in as maxProtocolVersion
in Bucket::merge
(the relevant entry points are the FutureBucket
constructions in BucketLevel<BucketT>::prepare
). calculateMergeProtocolVersion
just takes the max protocol of the input buckets, then makes sure it's not greater than maxProtocolVersion
. The funkiness comes in when shadows are involved, which will never happen in the archival buckets.
src/bucket/BucketInputIterator.h
Outdated
static_assert(std::is_same_v<BucketT, LiveBucket> || | ||
std::is_same_v<BucketT, HotArchiveBucket>); | ||
|
||
using BucketEntryT = std::conditional_t<std::is_same_v<BucketT, LiveBucket>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this could be accomplished slightly less-awkwardly by defining BucketEntryT as a member-type of the type parameter? Or have I forgotten how templates even work?
src/bucket/BucketInputIterator.cpp
Outdated
{ | ||
ZoneScoped; | ||
if (mIn.readOne(mEntry)) | ||
{ | ||
mEntryPtr = &mEntry; | ||
if (mEntry.type() == METAENTRY) | ||
bool isMeta; | ||
if constexpr (std::is_same_v<BucketEntryT, BucketEntry>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again this feels a bit like it could be a member of BucketEntryT?
@@ -80,7 +80,8 @@ IndexBucketsWork::IndexWork::postWork() | |||
|
|||
if (!self->mIndex) | |||
{ | |||
self->mIndex = BucketIndex::createIndex( | |||
// TODO: Fix this when archive BucketLists assume state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not entirely sure what this TODO means but .. should it be done still? If not, perhaps file a followup bug or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that we need to pipe the Bucket type into IndexBucketsWork
so we can replace this LiveBucket
with the proper templated BucketT
. The problem is, IndexBucketsWork
gets called on raw HAS reference files. The HAS doesn't currently support HotArchiveBucketList, so we can't pipe in the types yet. The umbrella issue is #4398.
Sorry if the outstanding TODOs aren't super clear, they're intended to be reminders for me in the not-so-obvious places I'll need to fix when I work on the assumeState
path later.
src/bucket/BucketManagerImpl.cpp
Outdated
// Because BucketManger has an impl class, no public templated functions | ||
// can be declared. This means we have to manually enforce types via | ||
// `getLiveBucketByHash` and `getHotBucketByHash`, leading to this ugly | ||
// cast. | ||
auto ret = std::dynamic_pointer_cast<BucketT>(i->second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha! Is the need for the amount of explicit code duplication (rather than type-parametricity) all related to the impl-class pattern? This might actually answer 9/10 of my comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also .. is this safe? I mean: you could in theory wind up with the same bucket file (identified by hash) in both lists, and thereby mis-identify the type of the wrapper class owning it, thinking it was a live bucket when it is actually an archive bucket? I guess it's nearly impossible since the bits would differ in structural ways. But I wonder if it might be better/safer (and avoid the cast) to have two separate mSharedBuckets
variables, one per bucket list type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah unfortunately templated virtual functions are against the c++ spec, so the impl-class pattern is inconvenient here. I templated the inner BucketManagerImpl version, but we can only either have explicit return types with a messy dynamic cast, or just return a BucketT
type in the BucketManager
interface and have to cast on the caller side. I think I might be able to get by with a BucketT
level interface for futures and get rid of some of the constexpr branching, but we definitely need to be able to return LiveBucket
explicitly in the getBucketByHash
case.
I'm fairly confident this is safe, since we have the Bucket type as part of meta so we're guaranteed to not have hash collision (except for empty buckets, but that should be fine). All that being said, I think separate mSharedBuckets
would be useful so we could at least get rid of the dynamic cast, but we're stuck with the split getLiveBucketByHash
interface I'm afraid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm so I can't actually use a BucketT
interface for futures either, but I can still have separate storage types in BucketManager
to remove the cast. I think I can get rid of all the constexpr branching by changing the getBucket and getFutures interface to static getter that takes a BucketManager&
. It's not a very elegant interface, but avoids removing the impl construct and removes most constexpr branching, so it's probably the least evil approach.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on splitting shared buckets to ensure safety. Regarding the bucketlist/manager interface, do we actually need the impl-class pattern and the abstract BucketManager? I'm just wondering if the old patterns we're stuck with might be causing more harm than good at this point. Like, can we just have concrete bm/bl templated classes? yes we'd lose some encapsulation and probably degrade compile-time perf, but maybe that's a lesser evil? Right now the interface if quite difficult to follow, and I'm afraid maintenance is going to be a burden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that this approach might be a bit annoying in tests, if we implement BucketList stub subclasses and such, but probably still worth exploring
src/bucket/BucketManagerImpl.cpp
Outdated
// Because BucketManger has an impl class, no public templated functions | ||
// can be declared. This means we have to manually enforce types via | ||
// leading to this ugly variadic get that throws if the type is not correct. | ||
return std::get<std::shared_future<std::shared_ptr<BucketT>>>(i->second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Likewise here, would this go away if you had two separate sets of futures, separated by bucket type?
src/bucket/BucketListSnapshot.cpp
Outdated
} | ||
|
||
std::optional<std::vector<HotArchiveBucketEntry>> | ||
SearchableHotArchiveBucketListSnapshot::loadKeysFromLedger( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, is the duplication necessary on this one (relative to the earlier very-similar live-bucketlist version of loadKeysFromLedger?)
src/bucket/BucketList.cpp
Outdated
std::vector<LedgerEntry> const& initEntries, | ||
std::vector<LedgerEntry> const& liveEntries, | ||
std::vector<LedgerKey> const& deadEntries) | ||
HotArchiveBucketList::addBatch(Application& app, uint32_t currLedger, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only differences between the two variants of this function are the presence of shadows, and how they pass their 3 arguments to the fresh-bucket routine. Do you think these differences could be factored out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. A note for the reader, I used a parameter pack for the base class addBatch
. While this isn't really necessary right now since both the HotArchiveBucketList
and LiveBucketList
have 3 input vector types, this is not true for the upcoming ColdArchiveBucketList::addBatch
variant.
bm, getAppLedgerVersion(app), {}, | ||
LedgerTestUtils::generateValidUniqueLedgerEntriesWithExclusions( | ||
{CONFIG_SETTING}, 8), | ||
LedgerTestUtils::generateValidUniqueLedgerEntries(8), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the excluded CONFIG_SETTINGS not excluded anymore?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can't have CONFIG_SETTINGS
types as a deadEntry
or initEntry
due to invariants. CONFIG_SETTING
can't be deleted, and since it is guaranteed to already exist, it can't be initialized either. However, it can (and does) get updated periodically, so I adjusted the test accordingly for slightly better coverage.
fd3f34e
to
7295f2a
Compare
I should have addressed all the comments. I was able to refactor out almost all the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checkpointing here, as I'm still reviewing the change. Overall, really cool to see us getting closer to state archival! Thanks for putting it together. I left a few comments, mostly echoing @graydon's concerns about potential maintenance burden, and code readability. I think it's important we agree on the interface before we land this change.
src/bucket/BucketOutputIterator.cpp
Outdated
bme.metaEntry() = mMeta; | ||
put(bme); | ||
} | ||
else |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This worries me a bit, since there could be other Bucket types. Can we explicitly require hot archive bucket here?
@@ -37,34 +42,93 @@ BucketOutputIterator::BucketOutputIterator(std::string const& tmpDir, | |||
|
|||
if (protocolVersionStartsFrom( | |||
meta.ledgerVersion, | |||
Bucket::FIRST_PROTOCOL_SUPPORTING_INITENTRY_AND_METAENTRY)) | |||
LiveBucket::FIRST_PROTOCOL_SUPPORTING_INITENTRY_AND_METAENTRY)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition should not be the case for hot archive buckets, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but HotArchiveBucket
only occurs post p23 anyway. I don't think this check hurts us in any way and simplifies the code with one less if constexpr
. I don't think it's worth special casing backwards compatibility code used only in reply for the new Bucket types.
TESTING_STARTING_EVICTION_SCAN_LEVEL = | ||
readInt<uint32_t>(item, 1, BucketList::kNumLevels - 1); | ||
TESTING_STARTING_EVICTION_SCAN_LEVEL = readInt<uint32_t>( | ||
item, 1, LiveBucketList::kNumLevels - 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does the archive bucketlist have a different number of levels?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So in future work this will be necessary, but for now they actually can be identical, so I simplified the assignment a little.
src/bucket/BucketList.h
Outdated
@@ -398,11 +398,15 @@ class BucketListDepth | |||
friend class testutil::BucketListDepthModifier; | |||
}; | |||
|
|||
class BucketList | |||
class BucketListBase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please try to extract classes into separate files, as bucket-related classes are getting pretty bloated
std::vector<BucketLevel> mLevels; | ||
|
||
public: | ||
// Trivial pure virtual destructor to make this an abstract class | ||
virtual ~BucketListBase() = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
= default?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No this was intended. I want the base BucketList class to be pure virtual, but I define all the functions it declares. so I define a trivial pure virtual destructor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh, I didn't realize this was the only pure virtual function in this class. This feels a bit hacky to me. An abstract class implies there's an interface subclasses should be implementing, or they can re-use the default implementation in the base class (which means base class functions would be virtual). Looking at the interface, why is addBatch
not pure virtual for example? Forcing subclasses to implement destructors seems a bit weird, since it's a default implementation anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So subclasses actually don't have to implement the destructor (and in the case of bucketlist don't) since the default destructor will fulfill the implementation requirement. According to stack overflow, this seems to be a pretty standard way of making a class abstract when it doesn't actually have any pure virtual functions already.
The reason we do this is that there are no functions that the derived BucketList should implement, but we still want an interface where the base class can't be instantiated directly. I agree that addBatch
is the only candidate function that could be virtual. The issue is, we add batches of different entry types for the BucketList, so addBatch
needs to be templated (see addBatchInternal
). Virtual templated classes aren't allowed in the C++ spec, so I think we have to go with the pure virtual destructor. While hacky, it seems common design and doesn't impose any additional requirements on derived classes.
src/bucket/BucketManagerImpl.cpp
Outdated
// Because BucketManger has an impl class, no public templated functions | ||
// can be declared. This means we have to manually enforce types via | ||
// `getLiveBucketByHash` and `getHotBucketByHash`, leading to this ugly | ||
// cast. | ||
auto ret = std::dynamic_pointer_cast<BucketT>(i->second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 on splitting shared buckets to ensure safety. Regarding the bucketlist/manager interface, do we actually need the impl-class pattern and the abstract BucketManager? I'm just wondering if the old patterns we're stuck with might be causing more harm than good at this point. Like, can we just have concrete bm/bl templated classes? yes we'd lose some encapsulation and probably degrade compile-time perf, but maybe that's a lesser evil? Right now the interface if quite difficult to follow, and I'm afraid maintenance is going to be a burden.
src/bucket/FutureBucket.cpp
Outdated
@@ -412,20 +473,48 @@ FutureBucket::makeLive(Application& app, uint32_t maxProtocolVersion, | |||
auto& bm = app.getBucketManager(); | |||
if (hasOutputHash()) | |||
{ | |||
auto b = bm.getBucketByHash(hexToBin256(getOutputHash())); | |||
std::shared_ptr<BucketT> b; | |||
if constexpr (std::is_same_v<BucketT, LiveBucket>) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, if possible, we should try to make the interface as readable and easy to reason about as possible
src/bucket/BucketManagerImpl.cpp
Outdated
uint256 const& hash, MergeKey* mergeKey, | ||
std::unique_ptr<BucketIndex const> index) | ||
{ | ||
auto& bmImpl = static_cast<BucketManagerImpl&>(bm); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think we wouldn't need to do this if we didn't follow the impl-class pattern right?
src/bucket/BucketManagerImpl.cpp
Outdated
// Because BucketManger has an impl class, no public templated functions | ||
// can be declared. This means we have to manually enforce types via | ||
// `getLiveBucketByHash` and `getHotBucketByHash`, leading to this ugly | ||
// cast. | ||
auto ret = std::dynamic_pointer_cast<BucketT>(i->second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
note that this approach might be a bit annoying in tests, if we implement BucketList stub subclasses and such, but probably still worth exploring
79ad957
to
36fc351
Compare
36fc351
to
3f365af
Compare
Description
Resolves #4394
Depends on XDR changes in stellar/stellar-xdr#200.
This change refactors the BucketList so that we can have multiple Bucket Lists simultaneously. This is necessary for State Archival, where validators will maintain a live BucketList, hot archive BucketList, and cold archive BucketList. This change does not yet support writing the new archival BucketLists to history,
Each BucketList has small differences wrt to the entry types it stores, merge logic, and how lookups are done, but the overall BucketList level logic is the same. Because of this, it seemed easiest to template the Bucket related classes and specialize a few functions. The difference are as follows (I'll be updating the Archival CAP with this info soon)
Entry Types:
Live BucketList:
BucketEntry
Hot Archive BucketList:
HotArchiveBucketEntry
.This change was necessary due to how LedgerKeys are stored. In the Live BucketList, only
DEADENTRY
store plainLedgerKey
.DEADENTRY
is equivalent to null, so these LedgerKeys are dropped at the bottom level bucket.Hot Archives require two types that store LedgerKeys:
HA_LIVE
andHA_DELETED
.HA_LIVE
is equivalent to the "null" type in the Live BucketList, and is dropped in the bottom bucket. However,HA_DELETED
needs to be persisted.Merging:
In the hot archive,
LedgerEntry
are never updated. They may be overwritten by anotherHotArchiveBucketEntry
type, but theLedgerEntry
contents itself never change. This means that theINITENTRY
,LIVEENTRY
, andDEADENTRY
abstraction doesn't really make sense. This makes the merge logic for Hot Archive buckets very simple, the newer entry always overwrites the older entry.This PR adds the concept of a "Tombstone" entry. A tombstone entry essentially represents null in the given BucketList and can be dropped once it reaches the bottom bucket. On the live BucketList, DEADENTRY is the tombstone type. On the Hot Archive BucketList,
HA_LIVE
is the tombstone.BucketListDB lookup:
The live BucketList only ever returns
LedgerEntry
types when queried. Any DEADENTRY LedgerKey is simply omitted from the return, as this is the "null" tombstone type.Hot archive lookups must return both LedgerKey and LedgerEntry types. HA_LIVE entries are of type LedgerKey and can be omitted from the return since they are the tombstone type. However, HA_ARCHIVED entries are not tombstone entries and must be returned when found. Do to this, Hot Archive lookups return
HotArchiveBucketEntry
instead ofLedgerEntry
when queried.Future updates will add more functionality to Hot Archives, further distinguishing it from the live BucketList, and add a third BucketList type called the cold archive. This cold archive will add a third BucketEntry type and have different merge logic as well, so I think the templated Bucket classes, while verbose, are warranted.
This is currently a draft until the XDR changes are merged.
Checklist
clang-format
v8.0.0 (viamake format
or the Visual Studio extension)