
Convert this asset to use current TS processor API #26

Open
kstaken opened this issue Mar 7, 2019 · 1 comment

Comments

@kstaken (Member) commented Mar 7, 2019

This is still using the old-style processor APIs and should be updated at some point.

@macgyver603 (Contributor) commented:

This is mostly done, with the compressed_file_reader being the only processor left to convert. Its slicer is substantially different from the file_reader's, so I think the biggest question here is whether to modernize the compressed_file_reader on its own or to fold its functionality into the file_reader.

Currently, the processor uncompresses files to a separate working directory before slicing them for processing. Once the last slice of a file is processed, an archive mechanism (subject to the known slice-order issue in #17) moves the file to an "archive" directory. If the job is persistent, a timer in the slicer also checks the specified directory for new files at some interval. Finally, the processor maintains on-disk state for each file being processed (I think this should be removed in favor of just logging file statuses where applicable).
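
For reference, a rough TypeScript sketch of that kind of directory polling; the interval value, the enqueue callback, and the dedupe-by-name approach are illustrative assumptions, not the actual slicer code:

import { promises as fs } from 'fs';
import * as path from 'path';

// Illustrative only: poll a directory on an interval and hand any
// file we haven't seen before to an enqueue callback. The real
// slicer's timer mechanism may track state differently.
function watchDirectory(
    dir: string,
    intervalMs: number,
    enqueue: (filePath: string) => void
): NodeJS.Timeout {
    const seen = new Set<string>();
    return setInterval(async () => {
        for (const name of await fs.readdir(dir)) {
            if (!seen.has(name)) {
                seen.add(name);
                enqueue(path.join(dir, name));
            }
        }
    }, intervalMs);
}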


If compression were added as an option for the file_reader, I imagine it would have a compression_type setting with a schema like this:

compression_type: {
    doc: 'Determines whether or not to uncompress files',
    default: 'uncompressed',
    format: ['uncompressed', 'lz4', ...]
}
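
As a sketch of how the slicer might consume that option when opening a file (openReadStream is a hypothetical helper, not part of the file_reader; gzip is shown only because Node's built-in zlib supports it, while lz4 would need a third-party streaming decoder):

import * as fs from 'fs';
import { createGunzip } from 'zlib';

// Hypothetical helper: map the compression_type setting to a readable
// stream over the file. 'lz4' would need a streaming decoder from a
// third-party package, so only the built-in gzip case is shown here.
function openReadStream(filePath: string, compressionType: string): NodeJS.ReadableStream {
    const raw = fs.createReadStream(filePath);
    switch (compressionType) {
        case 'uncompressed':
            return raw;
        case 'gzip':
            return raw.pipe(createGunzip());
        default:
            throw new Error(`unsupported compression_type: ${compressionType}`);
    }
}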

For decompression jobs, the slicer could just decompress the files in place and add both the compressed path and the uncompressed path as metadata for each record. For now, the files would be left on disk as-is, to be cleaned up after the job by an operator or some other process. Adding this to the file_reader should be fairly straightforward, since it would just be a matter of adding the compression utilities to the slicer. The next question is whether or not to preserve the persistent-job logic. I think all of the file reader jobs I have encountered so far have been once (non-persistent) jobs, but if there is a need for persistent file reader jobs, this functionality should at least be extended to the file_reader as well.
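
For example, attaching both paths as record metadata might look something like this sketch; DataEntity.make comes from @terascope/job-components, but the metadata key names here are just placeholders, not an established convention:

import { DataEntity } from '@terascope/job-components';

// Sketch: tag each parsed record with where it came from. The
// compressed_path/uncompressed_path keys are illustrative only.
function attachSourcePaths(
    records: Record<string, unknown>[],
    compressedPath: string,
    uncompressedPath: string
): DataEntity[] {
    return records.map((record) => DataEntity.make(record, {
        compressed_path: compressedPath,
        uncompressed_path: uncompressedPath,
    }));
}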
