Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename ObjPtr<T> to DiskAddress #204

Merged
merged 4 commits into from
Aug 21, 2023
Merged

Conversation

rkuris
Copy link
Collaborator

@rkuris rkuris commented Aug 18, 2023

The name "Object Pointer" implies an in-memory pointer, whereas what it is actually being used for is a reference to an object stored on disk

ObjPtr had a generic type T used to indicate the type of object stored at that location. This had a lot of cognitive overhead and I didn't find this all that useful, particularly since there were only a few types of T that could be stored. Almost always this is a Node, but it could also be a CompactSpaceHeader for free space within a file.

DiskAddress is implemented as a Option, which required a partial change from u64 addresses to usize types. This could use more work; it would be better to perhaps use DiskAddress types down at the IO layers. I avoided this and stopped just short of that, adding a bunch of as usize and as u64 to avoid blowing up this diff too much.

DiskAddress of 0 is invalid -- there is a header stored there that is not used by the caching system.

Also noticed that merkle.rs uses Option which is (and probably was) redundant, since DiskAddress(0) isn't valid for anything that we want to cache. There is a file header there but it's handled separately already.

The basics of this change were:

  • Change all ObjPtr to DiskAddress
  • Change all ObjPtr::new_from_addr(a) to DiskAddress::from(a)
  • Change all ObjPtr::null() to DiskAddress::null()
  • Implement basic math operations on DiskAddress
    • std::ops::{Add, Sub, AddAssign, SubAssign} etc
    • allow adding/subtracting DiskAddress as well as usize types
  • Implement serialization and deserialization to/from [u8; 8]
    • added to_le_bytes() for serialization
    • added From<[u8; 8]> and From<&[u8]> for deserialization
  • Change ObjPtr::addr() to DiskAddress::get()
    • This should panic if DiskAddress is None; see above for why it doesn't yet.

@rkuris rkuris force-pushed the rkuris/rename-objptr-to-diskaddress branch 4 times, most recently from eb56deb to f2adba2 Compare August 18, 2023 22:54
@rkuris rkuris marked this pull request as ready for review August 18, 2023 23:12
The name "Object Pointer" implies an in-memory pointer, whereas what
it is actually being used for is a reference to an object stored on
disk

ObjPtr had a generic type T used to indicate the type of object stored
at that location. This had a lot of cognitive overhead and I didn't
find this all that useful, particularly since there were only a few
types of T that could be stored. Almost always this is a `Node`, but
it could also be a `CompactSpaceHeader` for free space within a file.

DiskAddress is implemented as a Option<NonZeroUsize>, which required
a partial change from u64 addresses to usize types. This could use
more work; it would be better to perhaps use DiskAddress types down
at the IO layers. I avoided this and stopped just short of that, adding
a bunch of `as usize` and `as u64` to avoid blowing up this diff too
much.

DiskAddress of 0 is invalid -- there is a header stored there that is
not used by the caching system. However, the code had some "bugs" where
it was reading unused data at address 0 (related to the blob deprecation
for eth) as well as some test cases that don't initialize the database
correctly, leading to objects at address 0. These were not addressed in
this fix, but left with TODOs to cause panics when operating at address
0.

Also noticed that merkle.rs uses Option<DiskAddress> which is (and
probably was) redundant, since DiskAddress(0) isn't valid for anything
that we want to cache. There is a file header there but it's handled
separately already.

The basics of this change were:
 - Change all ObjPtr<T> to DiskAddress
 - Change all ObjPtr::new_from_addr(a) to DiskAddress::from(a)
 - Change all ObjPtr::null() to DiskAddress::null()
 - Implement basic math operations on DiskAddress
   - std::ops::{Add, Sub, AddAssign, SubAssign} etc
   - allow adding/subtracting DiskAddress as well as usize types
 - Implement serialization and deserialization to/from [u8; 8]
   - added to_le_bytes() for serialization
   - added From<[u8; 8]> and From<&[u8]> for deserialization
 - Change ObjPtr::addr() to DiskAddress::get()
   - This should panic if DiskAddress is `None`, but doesn't currently
@rkuris rkuris force-pushed the rkuris/rename-objptr-to-diskaddress branch from f2adba2 to a6d971b Compare August 18, 2023 23:34
shale/src/diskaddress.rs Outdated Show resolved Hide resolved

/// The virtual disk address of an object
#[derive(Debug, Copy, Clone, Eq, Hash, Ord, PartialOrd, PartialEq)]
pub struct DiskAddress(pub Option<NonZeroUsize>);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why pub? I feel like DerefMut is enough, isn't it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It can at least be pub(crate) but you can't even reference the object if it isn't pub of some type.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a tough change, so I'd like to defer it. The trouble is that so many other things are marked as pub since this shale and friends used to be in different crates.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My comment was bad I think. Did you try to make the internal Option private? Happy to defer the change regardless

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no I misunderstood! Arguably DiskAddress shouldn't leak outside of firewood though.

shale/src/diskaddress.rs Outdated Show resolved Hide resolved
shale/src/diskaddress.rs Outdated Show resolved Hide resolved
/// serialization from disk
impl From<&[u8]> for DiskAddress {
fn from(value: &[u8]) -> Self {
let bytes: [u8; 8] = value.try_into().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔, wouldn't we rather implement TryFrom instead of having an unwrap here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's possible, but From is much more convenient! I'm going to guess that if someone passes something other than 8 bytes into this function, it's going to be quickly found out here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Eventually, we are going to want our codebase to be void of unwrap or at least it should be extremely rare. It's fine to leave this as is for now, (I'm looking at the conversation tab, I haven't seen all of the changes yet), just to get the PR landed. But in general, the further up the callstack that we can push the unwraps, the easier it will be to remove them entirely.

fn from(value: DiskAddress) -> usize {
value.get()
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not only have the logic live here instead of From::from delegating to DiskAddress::get?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because of the TODO that says we shouldn't map None to 0 in many cases.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do you think there are any tests that would be worth adding here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this, but I didn't see much value and saw more maintenance, particularly since I plan on changing it so that None is really "none" and you can't read an object at location 0.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this

Good, that's all I was checking, lol. I took a look and also couldn't see any tests that would really be valuable so I figured I would double-check your opinion too.

All the use cases seem to be in the implementation of Debug for most
other types that embed this, with the exception of dump. I switched dump
to use Debug as well as it uses Debug for most other things it dumps
anyway and it seems to mostly be for debugging.
@rkuris rkuris merged commit 5d57d07 into main Aug 21, 2023
5 checks passed
@rkuris rkuris deleted the rkuris/rename-objptr-to-diskaddress branch August 21, 2023 17:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants