Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

types: serialize: constrain the new serialization traits to make them easier and safer to use #855

Merged
merged 3 commits into from
Nov 24, 2023

Conversation

piodul
Copy link
Collaborator

@piodul piodul commented Oct 26, 2023

Currently, SerializeRow and SerializeCql traits are just given a mutable reference to a Vec and asked to append their CQL representation to the end. While simple, there are some issues with the interface:

  • The serialize method has access to the serialized representation of the values that were appended before it. It's not necessary for a correct implementation to have access to it.
  • Implementors technically can append any byte sequence to the end, but actually are expected to produce a CQL [value] containing the serialized value.

While the SerializeRow and SerializeCql traits are not generally meant to be manually implemented by the users, we can make the interface easier to use and harder to misuse by making it append-only, restricting what the users are allowed to append and requiring the users to append anything by using a dash of type-level magic.

Introduce RowWriter and CellWriter traits which satisfy the above wishes and constraints, and pass them instead of Vec in SerializeRow and SerializeCql.

The new traits have two implementations - a Vec backed one that actually appends the bytes given to it, and a usize-backed one which just measures the length of the output without writing anything. Passing the latter before doing the actual serialization will allow to preallocate the right amount of bytes and then serialize without reallocations. It should be measured whether the reallocation cost always outweighs the calculation cost before implementing this optimization.

Refs: #801

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

@piodul piodul requested a review from Lorak-mmk October 26, 2023 09:56
@piodul piodul force-pushed the adjust-interface branch 2 times, most recently from 281cabd to 25677ba Compare October 27, 2023 07:25
@piodul
Copy link
Collaborator Author

piodul commented Nov 16, 2023

@Lorak-mmk review ping

@piodul
Copy link
Collaborator Author

piodul commented Nov 17, 2023

v2:

  • More docstrings
  • RowWriter now does not prepend what it wrote with a u16 value indicating the count of the values written
  • Some items that were accidentally imported via _macro_internal path in tests are now imported through a proper path

@piodul piodul force-pushed the adjust-interface branch 2 times, most recently from e672d2f to 7038cbc Compare November 17, 2023 14:33
@piodul
Copy link
Collaborator Author

piodul commented Nov 17, 2023

v2.1:

  • Rebased
  • Addressed clippy's complaints about needless std::mem::drop

Introduce the `read_value` function which is able to read a [value], as
specified in the CQL protocol. It will be used in the next commit, in
order to make the interface of the SerializedValue iterators more
correct.
Currently, the SerializedValues' `iter()` method treats both null and
unset values as None, and `iter_name_value_pairs()` just assumes that
values are never null/unset and panics if they are.

Make the interface more correct by adjusting both methods to return
RawValue. The iterators will be used in the next commit to implement the
fallback that allows to implement `SerializeRow`/`SerializeCql` via
legacy `ValueList`/`Value` traits.
@Lorak-mmk
Copy link
Collaborator

Actually I have one question: is mod.rs the best place for those new traits and structs? I don't really know Rust conventions in this matter, but it is quite a lot of code, wouldn't it be better to make a new file for it?

@piodul
Copy link
Collaborator Author

piodul commented Nov 23, 2023

Actually I have one question: is mod.rs the best place for those new traits and structs? I don't really know Rust conventions in this matter, but it is quite a lot of code, wouldn't it be better to make a new file for it?

Makes sense, I can move the CellWriter and friends to a separate module.

@piodul
Copy link
Collaborator Author

piodul commented Nov 23, 2023

v3: moved the newly introduced types and traits to a separate module (they are reexported from the place they were defined previously)

… interfaces

Currently, `SerializeRow` and `SerializeCql` traits are just given a
mutable reference to a Vec<u8> and asked to append their CQL
representation to the end. While simple, there are some issues with the
interface:

- The serialize method has access to the serialized representation of
  the values that were appended before it. It's not necessary for a
  correct implementation to have access to it.
- Implementors technically can append any byte sequence to the end, but
  actually are expected to produce a CQL [value] containing the
  serialized value.

While the `SerializeRow` and `SerializeCql` traits are not generally
meant to be manually implemented by the users, we can make the interface
easier to use and harder to misuse by making it append-only, restricting
what the users are allowed to append and requiring the users to append
anything by using a dash of type-level magic.

Introduce `RowWriter` and `CellWriter` traits which satisfy the above
wishes and constraints, and pass them instead of Vec<u8> in
`SerializeRow` and `SerializeCql`.

The new traits have two implementations - a Vec<u8> backed one that
actually appends the bytes given to it, and a usize-backed one which
just measures the length of the output without writing anything. Passing
the latter before doing the actual serialization will allow to
preallocate the right amount of bytes and then serialize without
reallocations. It should be measured whether the reallocation cost
always outweighs the calculation cost before implementing this
optimization.
@piodul
Copy link
Collaborator Author

piodul commented Nov 23, 2023

v3.1: fixed links in the docs

Copy link
Collaborator

@Lorak-mmk Lorak-mmk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is nothing blocking this, do you think we can merge it?

@piodul
Copy link
Collaborator Author

piodul commented Nov 24, 2023

I think there is nothing blocking this, do you think we can merge it?

Yes, I'll go ahead and merge it.

@piodul piodul merged commit 46e33c9 into scylladb:main Nov 24, 2023
8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants