Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: matching merkle db encoding #258

Closed
wants to merge 3 commits into from
Closed

feat: matching merkle db encoding #258

wants to merge 3 commits into from

Conversation

xinifinity
Copy link
Contributor

@xinifinity xinifinity commented Sep 11, 2023

No description provided.

@xinifinity xinifinity marked this pull request as ready for review September 13, 2023 20:57
firewood/src/codec.rs Show resolved Hide resolved
firewood/src/codec.rs Show resolved Hide resolved
firewood/src/codec.rs Show resolved Hide resolved
firewood/src/codec.rs Show resolved Hide resolved
firewood/src/codec.rs Show resolved Hide resolved
firewood/tests/codec.rs Show resolved Hide resolved
trailing: Vec<u8>,
}

pub fn encode_int(val: i64) -> Result<Vec<u8>, CodecError> {
Copy link
Collaborator

@rkuris rkuris Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub => pub(crate)

Please consider making these methods on Data:

impl Data {
    fn encode_int(val: i64) -> Result<Self, CodecError> {
        // TODO: construct Data{}
    }
}

I'm not sure how these get used, but if you're just appending to this trailing vec every time, then these methods should probably be called append_int or something, and can take &mut self instead.

firewood/Cargo.toml Show resolved Hide resolved
@rkuris
Copy link
Collaborator

rkuris commented Sep 18, 2023

https://docs.rs/varint/latest/varint/trait.VarintWrite.html can be used for varint protobuf like encoding. This does mean unpacking nodes will be less efficient. I wonder if this should be a feature flag we can enable or disable -- it would be so much faster to avoid all this encoding

Copy link
Contributor

@richardpringle richardpringle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yikes... whirlwind of a review. Let me know if you need clarification anywhere (I'm sure you will)

firewood/src/codec.rs Show resolved Hide resolved

#[serde(skip)]
trailing: Vec<u8>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs? Seems weird to serde(skip) here...

firewood/src/codec.rs Show resolved Hide resolved
firewood/src/codec.rs Show resolved Hide resolved
firewood/src/codec.rs Show resolved Hide resolved
pub fn encode_str(val: &[u8]) -> Result<Vec<u8>, CodecError> {
let res = encode_int(val.len() as i64)?;
Ok([&res[..], val].concat())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 , have you looked at the serde::Serializer implementation for strings? https://docs.rs/bincode/latest/src/bincode/ser/mod.rs.html#121-124

It looks like the length is already encoded...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the difference now, my bad.

I still think we should use marker traits here (see my comment on this file).

use serde::Serializer;
use std::io::Write;

trait Encode: Serialize {
    fn encode(&self) -> Result<bincode::Serializer::Ok, bincode::Serializer::Error> {
        DefaultOptions::new().serialize(self)
    }
}

impl Encode for str {
    fn encode(&self) -> Result<S::Ok, S::Error> {
        let mut data = Vec::with_capacity(s.len() + 1);
        DefaultOptions::new().serialize_into(&mut data, s.len() as i64)?;

        data.write(self.as_bytes());

        Ok(data)
    }
}

If we just start with str, it's not really beneficial to have Encode: Serialize, but that would come in super handy for all the other primitives.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUT... They might potentially make some changes in MerkleDB so we could potentially just use the default implementation.

firewood/src/codec.rs Show resolved Hide resolved
firewood/src/codec.rs Show resolved Hide resolved
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing the whole file, I think everything can be greatly simplified:

use bincode::{Options, Error}; 
use serde::{Serialize, Deserialize};

pub(crate) fn encode<T: Serialize>(t: &T) -> Result<Vec<u8>, Error> {
    DefaultOptions::new().serialize(t)
}

pub(crate) fn decode<'de, T: Deserialize<'de>>(bytes: &'de [u8]) -> Result<T, Error> {
    DefaultOptions::new().deserialize(bytes)
}

If we want to add further constraints on types that are encodable/decodable, we can have the following traits:

trait Encode: Serialize {}
trait Decode<'de>: Deserialize<'de> {}

// can make this easier with a macro
impl Encode for String {}
impl Decode<'_> for String {}

impl Encode for i64 {}
impl Decode<'_> for i64 {}

And bound the encode and decode functions by these traits instead.

firewood/tests/codec.rs Show resolved Hide resolved
@xinifinity xinifinity marked this pull request as draft September 19, 2023 06:08
@xinifinity
Copy link
Contributor Author

Closed as it may no longer applicable ATM, will reopen if needed.

@xinifinity xinifinity closed this Sep 26, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

3 participants