Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: matching merkle db encoding #258

Closed
wants to merge 3 commits into from
Closed
Show file tree
Hide file tree
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions firewood/Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,7 @@ sha3 = "0.10.2"
thiserror = "1.0.38"
tokio = { version = "1.21.1", features = ["rt", "sync", "macros"] }
typed-builder = "0.16.0"
bincode = "1.3.3"
xinifinity marked this conversation as resolved.
Show resolved Hide resolved

[dev-dependencies]
criterion = "0.5.1"
Expand Down
60 changes: 60 additions & 0 deletions firewood/src/codec.rs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reviewing the whole file, I think everything can be greatly simplified:

use bincode::{Options, Error}; 
use serde::{Serialize, Deserialize};

pub(crate) fn encode<T: Serialize>(t: &T) -> Result<Vec<u8>, Error> {
    DefaultOptions::new().serialize(t)
}

pub(crate) fn decode<'de, T: Deserialize<'de>>(bytes: &'de [u8]) -> Result<T, Error> {
    DefaultOptions::new().deserialize(bytes)
}

If we want to add further constraints on types that are encodable/decodable, we can have the following traits:

trait Encode: Serialize {}
trait Decode<'de>: Deserialize<'de> {}

// can make this easier with a macro
impl Encode for String {}
impl Decode<'_> for String {}

impl Encode for i64 {}
impl Decode<'_> for i64 {}

And bound the encode and decode functions by these traits instead.

Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
// Copyright (C) 2023, Ava Labs, Inc. All rights reserved.
// See the file LICENSE.md for licensing terms.

use bincode::Options;
use thiserror::Error;

const MIN_BYTES_LEN: usize = 1;

#[derive(Debug, Error)]
pub enum CodecError {
#[error("bincode error")]
BincodeError(#[from] bincode::Error),
xinifinity marked this conversation as resolved.
Show resolved Hide resolved
#[error("no such node")]
UnexpectedEOFError,
xinifinity marked this conversation as resolved.
Show resolved Hide resolved
#[error("from utf8 error")]
FromUtf8Error(#[from] std::string::FromUtf8Error),
xinifinity marked this conversation as resolved.
Show resolved Hide resolved
}

use serde::Deserialize;
xinifinity marked this conversation as resolved.
Show resolved Hide resolved

#[derive(Deserialize, Debug)]
struct Data {
len: i64,

#[serde(skip)]
trailing: Vec<u8>,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Docs? Seems weird to serde(skip) here...


pub fn encode_int(val: i64) -> Result<Vec<u8>, CodecError> {
Copy link
Collaborator

@rkuris rkuris Sep 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pub => pub(crate)

Please consider making these methods on Data:

impl Data {
    fn encode_int(val: i64) -> Result<Self, CodecError> {
        // TODO: construct Data{}
    }
}

I'm not sure how these get used, but if you're just appending to this trailing vec every time, then these methods should probably be called append_int or something, and can take &mut self instead.

Ok(bincode::DefaultOptions::new().serialize(&val)?)
}
xinifinity marked this conversation as resolved.
Show resolved Hide resolved

pub fn decode_int(src: &[u8]) -> Result<i64, CodecError> {
Ok(bincode::DefaultOptions::new().deserialize(src)?)
}

pub fn encode_str(val: &[u8]) -> Result<Vec<u8>, CodecError> {
let res = encode_int(val.len() as i64)?;
Ok([&res[..], val].concat())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 , have you looked at the serde::Serializer implementation for strings? https://docs.rs/bincode/latest/src/bincode/ser/mod.rs.html#121-124

It looks like the length is already encoded...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see the difference now, my bad.

I still think we should use marker traits here (see my comment on this file).

use serde::Serializer;
use std::io::Write;

trait Encode: Serialize {
    fn encode(&self) -> Result<bincode::Serializer::Ok, bincode::Serializer::Error> {
        DefaultOptions::new().serialize(self)
    }
}

impl Encode for str {
    fn encode(&self) -> Result<S::Ok, S::Error> {
        let mut data = Vec::with_capacity(s.len() + 1);
        DefaultOptions::new().serialize_into(&mut data, s.len() as i64)?;

        data.write(self.as_bytes());

        Ok(data)
    }
}

If we just start with str, it's not really beneficial to have Encode: Serialize, but that would come in super handy for all the other primitives.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUT... They might potentially make some changes in MerkleDB so we could potentially just use the default implementation.


pub fn decode_str(src: &[u8]) -> Result<String, CodecError> {
if src.len() < MIN_BYTES_LEN {
return Err(CodecError::UnexpectedEOFError);
}
xinifinity marked this conversation as resolved.
Show resolved Hide resolved

let mut cursor = src;
let mut data: Data = bincode::DefaultOptions::new().deserialize_from(&mut cursor)?;
data.trailing = cursor.to_owned();

let len = data.len;
if len < 0 {
return Err(CodecError::UnexpectedEOFError);
} else if len == 0 {
return Ok(String::default());
} else if len as usize > src.len() {
return Err(CodecError::UnexpectedEOFError);
}
xinifinity marked this conversation as resolved.
Show resolved Hide resolved
Ok(String::from_utf8(data.trailing)?)
}
3 changes: 3 additions & 0 deletions firewood/src/config.rs
Original file line number Diff line number Diff line change
@@ -1,3 +1,6 @@
// Copyright (C) 2023, Ava Labs, Inc. All rights reserved.
// See the file LICENSE.md for licensing terms.

pub use crate::storage::{buffer::DiskBufferConfig, WalConfig};
use typed_builder::TypedBuilder;

Expand Down
1 change: 1 addition & 0 deletions firewood/src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -186,6 +186,7 @@
//! No change is required for other historical ghost space instances. Finally, we can phase out
//! some very old ghost space to keep the size of the rolling window invariant.
//!
pub mod codec;
pub mod db;
pub(crate) mod file;
pub mod merkle;
Expand Down
18 changes: 18 additions & 0 deletions firewood/tests/codec.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
use firewood::codec::{decode_int, decode_str, encode_int, encode_str};
xinifinity marked this conversation as resolved.
Show resolved Hide resolved

#[test]
fn test_bincode() {
let val: i64 = 11;
let encoded = encode_int(val).unwrap();
println!("encoded: {:?}", encoded);

let decoded: i64 = decode_int(&encoded[..]).unwrap();
assert_eq!(val, decoded);

let val = "hello world";
let encoded = encode_str(val.as_bytes()).unwrap();
println!("encoded: {:?}", encoded);

let decoded: String = decode_str(&encoded[..]).unwrap();
assert_eq!(val, decoded);
}