-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rename ObjPtr<T> to DiskAddress #204
Conversation
eb56deb
to
f2adba2
Compare
The name "Object Pointer" implies an in-memory pointer, whereas what it is actually being used for is a reference to an object stored on disk ObjPtr had a generic type T used to indicate the type of object stored at that location. This had a lot of cognitive overhead and I didn't find this all that useful, particularly since there were only a few types of T that could be stored. Almost always this is a `Node`, but it could also be a `CompactSpaceHeader` for free space within a file. DiskAddress is implemented as a Option<NonZeroUsize>, which required a partial change from u64 addresses to usize types. This could use more work; it would be better to perhaps use DiskAddress types down at the IO layers. I avoided this and stopped just short of that, adding a bunch of `as usize` and `as u64` to avoid blowing up this diff too much. DiskAddress of 0 is invalid -- there is a header stored there that is not used by the caching system. However, the code had some "bugs" where it was reading unused data at address 0 (related to the blob deprecation for eth) as well as some test cases that don't initialize the database correctly, leading to objects at address 0. These were not addressed in this fix, but left with TODOs to cause panics when operating at address 0. Also noticed that merkle.rs uses Option<DiskAddress> which is (and probably was) redundant, since DiskAddress(0) isn't valid for anything that we want to cache. There is a file header there but it's handled separately already. The basics of this change were: - Change all ObjPtr<T> to DiskAddress - Change all ObjPtr::new_from_addr(a) to DiskAddress::from(a) - Change all ObjPtr::null() to DiskAddress::null() - Implement basic math operations on DiskAddress - std::ops::{Add, Sub, AddAssign, SubAssign} etc - allow adding/subtracting DiskAddress as well as usize types - Implement serialization and deserialization to/from [u8; 8] - added to_le_bytes() for serialization - added From<[u8; 8]> and From<&[u8]> for deserialization - Change ObjPtr::addr() to DiskAddress::get() - This should panic if DiskAddress is `None`, but doesn't currently
f2adba2
to
a6d971b
Compare
shale/src/diskaddress.rs
Outdated
|
||
/// The virtual disk address of an object | ||
#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq)] | ||
pub struct DiskAddress(pub Option<NonZeroUsize>); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why pub
? I feel like DerefMut
is enough, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can at least be pub(crate)
but you can't even reference the object if it isn't pub
of some type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a tough change, so I'd like to defer it. The trouble is that so many other things are marked as pub since this shale and friends used to be in different crates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comment was bad I think. Did you try to make the internal Option
private? Happy to defer the change regardless
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, no I misunderstood! Arguably DiskAddress shouldn't leak outside of firewood though.
shale/src/diskaddress.rs
Outdated
/// serialization from disk | ||
impl From<&[u8]> for DiskAddress { | ||
fn from(value: &[u8]) -> Self { | ||
let bytes: [u8; 8] = value.try_into().unwrap(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔, wouldn't we rather implement TryFrom
instead of having an unwrap
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's possible, but From
is much more convenient! I'm going to guess that if someone passes something other than 8 bytes into this function, it's going to be quickly found out here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Eventually, we are going to want our codebase to be void of unwrap
or at least it should be extremely rare. It's fine to leave this as is for now, (I'm looking at the conversation tab, I haven't seen all of the changes yet), just to get the PR landed. But in general, the further up the callstack that we can push the unwrap
s, the easier it will be to remove them entirely.
shale/src/diskaddress.rs
Outdated
fn from(value: DiskAddress) -> usize { | ||
value.get() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not only have the logic live here instead of From::from
delegating to DiskAddress::get
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of the TODO that says we shouldn't map None to 0 in many cases.
shale/src/diskaddress.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, do you think there are any tests that would be worth adding here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this, but I didn't see much value and saw more maintenance, particularly since I plan on changing it so that None is really "none" and you can't read an object at location 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about this
Good, that's all I was checking, lol. I took a look and also couldn't see any tests that would really be valuable so I figured I would double-check your opinion too.
All the use cases seem to be in the implementation of Debug for most other types that embed this, with the exception of dump. I switched dump to use Debug as well as it uses Debug for most other things it dumps anyway and it seems to mostly be for debugging.
The name "Object Pointer" implies an in-memory pointer, whereas what it is actually being used for is a reference to an object stored on disk
ObjPtr had a generic type T used to indicate the type of object stored at that location. This had a lot of cognitive overhead and I didn't find this all that useful, particularly since there were only a few types of T that could be stored. Almost always this is a
Node
, but it could also be aCompactSpaceHeader
for free space within a file.DiskAddress is implemented as a Option, which required a partial change from u64 addresses to usize types. This could use more work; it would be better to perhaps use DiskAddress types down at the IO layers. I avoided this and stopped just short of that, adding a bunch of
as usize
andas u64
to avoid blowing up this diff too much.DiskAddress of 0 is invalid -- there is a header stored there that is not used by the caching system.
Also noticed that merkle.rs uses Option which is (and probably was) redundant, since DiskAddress(0) isn't valid for anything that we want to cache. There is a file header there but it's handled separately already.
The basics of this change were:
None
; see above for why it doesn't yet.