In ingest, various preprocessing steps are defined and run as part of dataset ingestion. The inputs to these steps are defined, but it would also be useful to see their effects logged in the `Config` object that gets stored with every processed and archived dataset. A few use cases:

- `filter_rows`: number of rows filtered out (or both the number of rows before the filter and the number after)
- `append_prev`: the version that was actually appended to
Currently, each ingest processor takes a dataframe and some kwargs and returns a dataframe. There are a few ways this could be handled. One is a general approach that records some statistics before and after each processing step. But it seems like the logging needs to be specific to each step: something like `append_prev` really should log the version appended to, and there's no generalized way to capture that info outside of the step itself.
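One possible shape for the step-specific option, as a rough sketch only: each processor returns its output together with a small summary dict, and the runner collects those summaries so they could later be attached to the `Config`. Everything here is an assumption for illustration and not the existing ingest code: `StepResult`, `run_steps`, the `ARCHIVE` lookup, the `column`/`value`/`version` kwargs, and the summary key names are all hypothetical. A plain list of dict rows stands in for a dataframe so the example has no dependencies.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# Stand-in for a dataframe so the sketch has no third-party dependencies.
Rows = list[dict[str, Any]]

# Hypothetical archive of previously processed versions, keyed by version name.
ARCHIVE: dict[str, Rows] = {
    "v1": [{"id": 1, "keep": True}],
    "v2": [{"id": 1, "keep": True}, {"id": 2, "keep": True}],
}


@dataclass
class StepResult:
    """Hypothetical return type: transformed rows plus step-specific effects."""
    rows: Rows
    summary: dict[str, Any] = field(default_factory=dict)


def filter_rows(rows: Rows, *, column: str, value: Any) -> StepResult:
    """Keep rows where `column` equals `value` and report the row counts."""
    kept = [r for r in rows if r[column] == value]
    summary = {
        "rows_before": len(rows),
        "rows_after": len(kept),
        "rows_removed": len(rows) - len(kept),
    }
    return StepResult(kept, summary)


def append_prev(rows: Rows, *, version: str = "latest") -> StepResult:
    """Append to an archived version and report which version was resolved."""
    # Only the step knows what "latest" resolved to, so it records that itself.
    resolved = max(ARCHIVE) if version == "latest" else version
    return StepResult(ARCHIVE[resolved] + rows, {"appended_to_version": resolved})


def run_steps(
    rows: Rows,
    steps: list[tuple[Callable[..., StepResult], dict[str, Any]]],
) -> tuple[Rows, list[dict[str, Any]]]:
    """Run each step in order and collect one log entry per step."""
    log = []
    for func, kwargs in steps:
        result = func(rows, **kwargs)
        rows = result.rows
        log.append({"name": func.__name__, "summary": result.summary})
    return rows, log


new_rows = [{"id": 3, "keep": True}, {"id": 4, "keep": False}]
final_rows, step_log = run_steps(
    new_rows,
    [
        (filter_rows, {"column": "keep", "value": True}),
        (append_prev, {"version": "latest"}),
    ],
)
```

Here `step_log` holds one entry per step and could be stored alongside the step inputs. The tradeoff is that every processor's return type changes; a generic before/after wrapper would avoid that, but it could not capture something like the resolved version.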