Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deserialization refactor: introduce new deserialization traits #970

Merged
merged 41 commits into from
Jun 13, 2024

Conversation

wprzytula
Copy link
Collaborator

@wprzytula wprzytula commented Mar 24, 2024

Fixes #961
Fixes #963

Huge credits to @piodul for creating most of the new deserialization framework!

The new traits

This PR introduces the new deserialization traits, DeserializeValue and DeserializeRow, which are going to replace FromCqlVal and FromRow, respectively.

If a type implements DeserializeValue<'f>, this means that it can take a buffer of lifetime 'f and deserialize itself, treating the contents of the buffer as a CQL value. Analogously, DeserializeRow<'f> allows for deserialization of types that are supposed to represent a whole row returned in the results.

Deserialization is now split into two phases: type checking and actual deserialization. Both traits have two methods: type_check and deserialize. The idea behind this is to validate the column types of the response only once, after the response is received. The deserialize method is then called for each row/value to perform the actual parsing: that method is allowed to assume that type_check was called and may skip some type checking for performance (although it cannot use that assumption to perform unsafe operations, it is not an unsafe method).

The previous framework only supported deserialization of types that exclusively own their data, for example String or Vec<u8>. However, the new framework allows for types that borrow or co-own the frame buffer:
- Types that borrow the buffer are &'f str and &'f [u8]. They just point to the raw data from the serialized response. Rust's lifetime system makes sure that the the user doesn't deallocate the serialized response until using the deserialized types that point to the buffer are dropped. Apart from the aforementioned types, a bunch of iterator types have been introduced that allow consuming collections without allocations.
- Serialized frame is represented with bytes::Bytes. It is possible to create a subslice (also of type Bytes) that keep the original alive. The type being deserialized can obtain access to such subslice. Like in the case of e.g. &str vs. String, keeping Bytes allows to avoid an allocation, but it is also easier to handle as deserialized type doesn't have to be bound by any lifetime. It important to be careful when handling such types as they keep whole serialized frame alive, even if they only point to a subslice. This can lead to a space leak and more memory being used than necessary.

Implementation of the new traits for all supported types

All the previously supported types (i.e. those that could be deserialized using the old framework) are supported now as well.
There is only one exception: CQL List is no longer deserializable into a set (HashSet or BTreeSet), because such conversion could be lossy, which is an unwanted property for deserialization.

Errors

The error framework is analogous to one used in the serialization framework: DeserializationError is added that resembles SerializationError and is type-erased as well.

As the new deserialization framework has separate type_check() and deserialize() function (contrary to the serialization framework, which has only one serialize() method), it makes sense for them to return distinct errors.
That's why a new TypeCheckError is added, too, which is type-erased as well.

Testing

  • All new types are unit tested.
  • Additionally, a decent test suite is added to verify compatibility of the old and the new framework. As in a later commit the old framework (deser_cql_value) is refactored to use the new one, the compatibility tests are then removed.
  • Another test suite is added to check that serialization composed with deserialization yields identity.
  • New errors are tested, too, although not exhaustively. I'd prefer not to bog down too much in testing them thoroughly.

Pre-review checklist

  • I have split my patch into logically separate commits.
  • All commit messages clearly explain what they change and why.
  • I added relevant tests for new features and bug fixes.
  • All commits compile, pass static checks and pass test.
  • PR description sums up the changes and reasons why they should be introduced.
  • I have provided docstrings for the public items that I want to introduce.
  • I have adjusted the documentation in ./docs/source/.
  • I added appropriate Fixes: annotations to PR description.

Copy link

github-actions bot commented Mar 24, 2024

cargo semver-checks detected some API incompatibilities in this PR.
Checked commit: 30d5149

See the following report for details:

cargo semver-checks output
./scripts/semver-checks.sh --baseline-rev 3d3c35392250c98ee9ad5c0d23e71b9ec42892ca
+ cargo semver-checks -p scylla -p scylla-cql --baseline-rev 3d3c35392250c98ee9ad5c0d23e71b9ec42892ca
     Cloning 3d3c35392250c98ee9ad5c0d23e71b9ec42892ca
     Parsing scylla v0.13.0 (current)
      Parsed [  23.587s] (current)
     Parsing scylla v0.13.0 (baseline)
      Parsed [  21.139s] (baseline)
    Checking scylla v0.13.0 -> v0.13.0 (no change)
     Checked [   0.523s] 72 checks; 72 passed, 0 unnecessary
    Finished [  45.300s] scylla
     Parsing scylla-cql v0.2.0 (current)
      Parsed [   9.810s] (current)
     Parsing scylla-cql v0.2.0 (baseline)
      Parsed [   9.627s] (baseline)
    Checking scylla-cql v0.2.0 -> v0.2.0 (no change)
     Checked [   0.297s] 72 checks; 71 passed, 1 failed, 0 unnecessary

--- failure enum_variant_added: enum variant added on exhaustive enum ---

Description:
A publicly-visible enum without #[non_exhaustive] has a new variant.
        ref: https://doc.rust-lang.org/cargo/reference/semver.html#enum-variant-new
       impl: https://github.com/obi1kenobi/cargo-semver-checks/tree/v0.31.0/src/lints/enum_variant_added.ron

Failed in:
  variant ParseError:DeserializationError in /home/runner/work/scylla-rust-driver/scylla-rust-driver/scylla-cql/src/frame/frame_errors.rs:43
     Summary semver requires new major version: 1 major and 0 minor checks failed
    Finished [  19.775s] scylla-cql
make: *** [Makefile:61: semver-rev] Error 1

@wprzytula wprzytula marked this pull request as ready for review March 24, 2024 14:52
@wprzytula wprzytula force-pushed the value_tests branch 2 times, most recently from bed4295 to a1cf055 Compare March 24, 2024 15:14
@wprzytula wprzytula self-assigned this Mar 26, 2024
@wprzytula wprzytula added the performance Improves performance of existing features label Mar 26, 2024
scylla-cql/src/types/deserialize/mod.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/deserialize/mod.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/deserialize/mod.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/deserialize/row.rs Show resolved Hide resolved
scylla-cql/src/types/deserialize/row.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/deserialize/value.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/deserialize/value.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/deserialize/value.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/deserialize/value.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/deserialize/value.rs Outdated Show resolved Hide resolved
@wprzytula wprzytula added the waiting-on-author Waiting for a response from issue/PR author label May 7, 2024
@wprzytula wprzytula marked this pull request as draft May 8, 2024 10:07
@github-actions github-actions bot added the semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes label May 8, 2024
@wprzytula wprzytula force-pushed the value_tests branch 3 times, most recently from faa92aa to 8f09a90 Compare May 10, 2024 08:25
@wprzytula wprzytula removed the waiting-on-author Waiting for a response from issue/PR author label May 10, 2024
@wprzytula wprzytula marked this pull request as ready for review May 10, 2024 08:26
@wprzytula wprzytula added this to the 0.14.0 milestone May 10, 2024
@wprzytula
Copy link
Collaborator Author

v2:

  • addressed most review comments (though DeserializeCql -> DeserializeValue rename has not yet been done),
  • set up and used an error framework similar to that used in serialization.

While reviewing, please pay attention to the error types used, especially to the use of GenericParseError variant (which I hate, but before errors refactor I can do little about this).

@wprzytula wprzytula requested a review from piodul May 10, 2024 08:36
Copy link
Contributor

@muzarski muzarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional question: do we want to expose all of the iterator types to the users? I'm aware that UdtIterator might be useful to implement deserialization for custom user types. What about other iterators such as ListLikeIterator or MapIterator? Do you see any reason for exposing these?

scylla-cql/src/types/deserialize/value.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/deserialize/value.rs Outdated Show resolved Hide resolved
scylla-cql/src/types/deserialize/value.rs Outdated Show resolved Hide resolved
@wprzytula
Copy link
Collaborator Author

Additional question: do we want to expose all of the iterator types to the users? I'm aware that UdtIterator might be useful to implement deserialization for custom user types. What about other iterators such as ListLikeIterator or MapIterator? Do you see any reason for exposing these?

  1. exposing them might be useful for cpp-rust-driver (but in this aspect it's better to start with private visilibity and make it public only if really needed there),
  2. it's not scylla-cql crate that should control visibility of the exports; it's scylla crate's responsibility to organise exports, just like you've recently done wrt some other exports from scylla-cql. Ideally, only leaf items (e.g. structs, consts, enums, etc.) are re-exported by scylla from scylla-cql, no node items (i.e. modules) - this gives the finest control over visibility.

wprzytula and others added 8 commits June 13, 2024 11:24
Co-authored-by: Piotr Dulikowski <[email protected]>
There is a purposeful change from the previous framework: CQL List can
no longer be deserialized straight to a HashSet or a BTreeSet. Such
deserialization would be lossy, which is a property we don't want in our
framework.

Co-authored-by: Piotr Dulikowski <[email protected]>
@wprzytula
Copy link
Collaborator Author

Rebased on main.

wprzytula and others added 13 commits June 13, 2024 11:32
This implementation is important for two reasons:
1. It enables using the upper layers of the old framework over the new
   one, which makes the transition smoother.
2. Some users (perhaps ORM users?) are going to need the dynamic
   capabilities that the previous framework offered: receiving rows
   consisting of arbitrary number of columns of arbitrary types.
   This is a perfect use case for Row.

Co-authored-by: Piotr Dulikowski <[email protected]>
This is an iterator over rows, allowing lazy and flexible
deserialization. Returns ColumnIterator for each row.

Co-authored-by: Piotr Dulikowski <[email protected]>
This iterator wraps over RowIterator and for each row consumes the
ColumnIterator and deserializes the given type.

Co-authored-by: Piotr Dulikowski <[email protected]>
In the future, we will probably deprecate and remove `deser_cql_value`
altogether. For now, let's make it at least less bloaty.

To reduce code duplication, `deser_cql_value()` now uses
DeserializeValue impls for nearly all of the deserialized types.
Two notable exceptions are:
1. CQL Map - because it is represented as Vec<(CqlValue, CqlValue)>
   in CqlValue, and Vec<T> is only deserializable from CQL Set|Map.
   Therefore, MapIterator is deserialized using its DeserializeValue
   impl, and then collected into Vec.
2. CQL Tuple - because it is represented in CqlValue much differently
   than in DeserializeValue impls: Vec<CqlValue> vs (T1, T2, ..., Tn).
   Therefore, it's similarly to how it was before, just style is changed
   from imperative to iterator-based, and DeserializeValue impl
   is called instead of `deser_cql_value` there.

As a bonus, we get more descriptive error messages (as compared to old
`ParseError::BadIncomingData` ones).
In a manner similar to the previous commit, old imperative logic
in `deser_row` is replaced with new iterator-based one, which uses
the new deserialization framework.

As a bonus, we get more descriptive error messages (as compared to
old `ParseError::BadIncomingData` ones).
As `deser_cql_value` was made use the new deserialization framework,
these tests no longer test anything. Therefore, they are deleted. Their
presence in the previous commits is useful, though, to prove that
compatibility.
It is worth noting that now, with `deser_cql_value` using the new
framework, tests there in frame/response/result.rs now are used to test
the deserialization implementation in the new framework.
A suite of tests is added, which assert that serialization composed with
deserialization yields identity.
@wprzytula
Copy link
Collaborator Author

v6.3:

  • changed UdtIterator Item a bit (will be useful for derive macros in the follow up)
  • added test cases that checks errors when collections are attempted to be deserialized from null.

@wprzytula wprzytula merged commit ab9a80e into scylladb:main Jun 13, 2024
11 checks passed
@wprzytula wprzytula deleted the value_tests branch June 13, 2024 09:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Improves performance of existing features semver-checks-breaking cargo-semver-checks reports that this PR introduces breaking API changes
Projects
None yet
4 participants