Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: use smallvec for keys #31

Merged
merged 6 commits into from
Sep 17, 2024
Merged

Conversation

cchudant
Copy link
Member

@cchudant cchudant commented May 3, 2024

I have a few things to note here and questions

-> I don't like the SByteVec name, but I don't have a better one
-> do we keep the BonsaiDb api unchanged, with Vec<u8>s?
if we do, we don't have to expose SByteVec to the api at all actually

-> smallvec => maybe just having type SByteVec = [u8; 32]; would be better but we would have to pad the data in some cases

for background, a standard vec basically a tuple (capacity, len, ptr), a smallvec is (capacity, len, union { inline: [u8; 32], inheap: ptr }) and they know if it's one of the two variants by checking if capacity >= 32
I dont think we would have that much of a difference in perf when fixing N to 32 for every key or even part of the api

-> I have changed some stuff like

let _: Vec<u8> = [
        id.to_bytes().as_slice(),
        &[KEY_SEPARATOR],
        key.as_slice(),
        &[key.into()],
        &[OLD_VALUE],
    ]
    .concat();

to iter based methods

let _: SByteVec = id.to_bytes().into_iter()
    .chain(iter::once(KEY_SEPARATOR))
    .chain(key.as_slice().iter().copied())
    .chain(iter::once(key.into()))
    .chain(iter::once(OLD_VALUE))
    .collect()

That kind of looks bad, using iterators like this for individual bytes may mean this doesn't lower to a bunch of optimized copies
especially as we are using smallvec which is out of std
I don't see it in my profiler, so I assume this is fine and llvm inlines everything correctly but I havent checked the assembly

-> there are still lots of copies when converting bitvecs to smallvecs which could be fixed
Optimizing these is not necessarily worth the effort it just bothers me a little

Results

image
The benches use HashMap db and they are handcrafted, they may not accurately translate to the real world

The drop storage bench is not really relevant for real world, i've just added it because on my profiler it shows libc free was actually one of the most time consuming functions of my binary

Other

There should only be very minor conflicts between this PR and #30

@cchudant
Copy link
Member Author

cchudant commented May 3, 2024

@AurelienFT

Copy link
Collaborator

@AurelienFT AurelienFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we change the APIs like this we have to edit all the examples and I think it might be less usable. Can we keep Vec and convert them directly ?
Otherwise i'm ok with the PR.

@cchudant
Copy link
Member Author

cchudant commented May 7, 2024

If we change the APIs like this we have to edit all the examples and I think it might be less usable. Can we keep Vec and convert them directly ? Otherwise i'm ok with the PR.

The only API that is changed here is the BonsaiDb trait, you only implement that trait if the default Rocksdb impl isnt sufficient for your usecase. I think it's ok to have SByteVec here

To be fair, this is probably not worth making the api more complex but I kind of aspire to have absolutely no useless copies in the whole crate

That aspiration probably clashes at times with the goal of making the crate as simple as possible ahah

@AurelienFT I think I'll remove the api change for now it can always be added later

@AurelienFT
Copy link
Collaborator

The only API that is changed here is the BonsaiDb trait, you only implement that trait if the default Rocksdb impl isnt sufficient for your usecase. I think it's ok to have SByteVec here

It also change the API of the main structure for get_keys etc

@AurelienFT
Copy link
Collaborator

@AurelienFT I think I'll remove the api change for now it can always be added later

Yeah I think also

@cchudant cchudant mentioned this pull request Sep 16, 2024
@cchudant cchudant merged commit 0128877 into madara-alliance:oss Sep 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants