Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

File Sink Client Typing #849

Merged
merged 28 commits into from
Aug 26, 2024
Merged

File Sink Client Typing #849

merged 28 commits into from
Aug 26, 2024

Conversation

michaeldjeffrey
Copy link
Contributor

@michaeldjeffrey michaeldjeffrey commented Jul 30, 2024

Typing a FileSInkClient

FileSinkClient is widely used in this codebase, it will currently write anything to a file that is Vec<u8>.
Some parts of Oracles take more than a few handles to FileSinkClient, and the only way to know what is being written is by the variable name, and following it's usage to where an actual .write() happens.

Written filenames are determined by a provided FileType, but there is no guarantee that the message being written matches the FileType or the variable name. An example where this could go wrong is during a refactor of a component that takes multiple FileSinkClients, and the argument order of a constructor is changed. Being hidden in a larger diff block, it would be a very difficult thing to find. Or adding a file sink to a component, and the argument provided is not in the same position as the argument added.

With a FileSinkClient<T>, there's less to find. Even if the variable is named incorrectly the compiler would shout at you if the wrong message type was being written.

FileSinkWriteExt trait

An example of this trait

impl_file_sink!(
    packet_verifier::InvalidPacket,    // proto type to be written
    FileType::InvalidPacket.to_str(),  // filename is `&'static str`, does not include file ext.
    "invalid_packets"                  // metric suffix
);

and usage

//  (FileSinkClient<InvalidPacket>, FileSink<InvalidPacket>)
let (invalid_packets, invalid_packets_server) = InvalidPacket::file_sink(
            store_base_path,
            file_upload.clone(),
            Some(DEFAULT_ROLL_TIME),
            env!("CARGO_PKG_NAME"), // metric prefix
        )
        .await?;

FileSinks and prost::Message

impl<T: prost::Message> FileSinkWriteExt for T {} seemed like a good idea until it was made abundantly clear that prost::Message has an impl for basically every type there is. Which makes things like having a FileSinkClient<String> or FileSinkClient<Vec<u8>> very difficult, as they would use prost::Message and the values you get out of the file would be tainted with proto encoding.

MsgBytes

The FileSinkWriteExt trait sits on top of MsgBytes, anything that implements MsgBytes can be used with a FileSink. Implementations for String, Vec<u8>, and bytes::Bytes are provided already in case someone wants to use a FileSink without a proto message.

Assumptions

From what I understand...
auto_commit saves a file for every write.
roll_time attempts to save a file every so often if there is something to save.

It doesn't seem to make sense to me to see if there's something to save and save after every write. The helper combines these two flags behind an Option.

@michaeldjeffrey michaeldjeffrey marked this pull request as ready for review August 3, 2024 20:26
Instead of dealing with bundles of Vec<u8> each file sink and client
will be paired over any type that can be turned into bytes.
…ost traits

To keep from dealing with Vec<u8> and String auto implementations for
prost::Message that disallows using a file sync with anything but a
prost::Message.

We're going to follow the other Msg* trait conventions where we
implement them only for the types we are using.

This allows someone to make a file sync for whatever type they may have,
and convenience for proto types we write reports about.
The abstraction comes in two parts.

The first we can get a file sink from a type if it implements the
FileSinkWrite trait.

The second is a type being able to be written to a file sink if it can
be turned into bytes.

This separates the convenience from the actual functionality.
this allows us to attach the metric name to the file type
implementation, and only have to pass in the calling crate for the
prefix.

FileSinks are really only created once, and there are not so many of
them that owning a string introduces a crazy amount of overhead.
The FileSinkWrite trait is a convenience method for creating file sinks,
but it should not be tied to being able to use a file sink. The only
requirement for that is being able to turn your message into bytes.
This is a specialized helper function, every crate using it was only
using these two options. We can provide the base functionality with a
single option.

Ideally the Builder would be updated to take some sort of enum for the
commit type as it doesn't make a lot of sense to auto_commit on every
write _and_ roll the file at a certain interval.
- include new verified mapper code
- update deps
- fix botched rebase conflicts
Taking a static string allows FileSinkWriteExt to be implemented by
users outside the oracles codebase, and starts the journey of moving
away from the FileType enum.
Copy link
Contributor

@jeffgrunewald jeffgrunewald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

@michaeldjeffrey michaeldjeffrey merged commit ca12bb6 into main Aug 26, 2024
17 checks passed
@michaeldjeffrey michaeldjeffrey deleted the mj/file-sink-client-typing branch August 26, 2024 17:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants