Develop best-practice for linking PosePipeline to analysis schemas #4

peabody124 · 2021-03-24T17:03:58Z

Right now the analysis pipeline is standalone, which is something of a strength. However, videos from experiments have their own organizational structure, which should be reflected here and would benefit from using DataJoint.

Approach 1) One option is to use the upcoming Deferred schemas feature https://github.com/datajoint/datajoint-elements/blob/main/DesignPrinciples.md. This would in principle allow creating a modular PosePipeline under each analysis schema, with foreign keys indicating all the data links. However, the downside is that the computational framework will need to run on and populate each of these instances of the pipeline. Depending on the infrastructure, this could just mean adding tasks that check each analysis schema for new Video entries and then process them.

Two other limitations, which may be more of a technical barrier to the datajoint-elements approach, are:

In my current analysis paradigm, videos depend on different parent nodes (for example, videos collected from a cell phone versus from stationary cameras), even though I want them all to run through the same "modular" analysis pipeline. Would this require instantiating the hierarchy twice? I suspect this can't even be done.
From the Deferred schemas document, it appears that the linking (i.e. parent) schema is determined at run time. However, the parent node name likely has to be fixed in the definition, so it would have to be the same across the different analysis pipelines.
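To make the second concern concrete, here is a toy sketch in plain Python. It is not DataJoint code: it only assumes, as suspected above, that a definition refers to its parent by name and that the name is looked up in a linking module at activation time. `VideoSource`, `resolve_parents`, and the three example pipelines are hypothetical names invented for illustration.

```python
from types import SimpleNamespace

# Written once inside the reusable pipeline; "VideoSource" is a hypothetical parent name.
VIDEO_DEFINITION = """
-> VideoSource
---
video_path : varchar(255)
"""


def resolve_parents(definition, linking_module):
    """Look up every '-> Name' parent of a definition in the linking module."""
    parents = {}
    for line in definition.splitlines():
        line = line.strip()
        if line.startswith("->"):
            name = line[2:].strip()
            # Raises AttributeError when the linking module lacks that exact name.
            parents[name] = getattr(linking_module, name)
    return parents


# Two analysis pipelines can plug in different parents, but only under the shared name.
phone_pipeline = SimpleNamespace(VideoSource="phone_study.PhoneRecording")
camera_pipeline = SimpleNamespace(VideoSource="lab_study.CameraRecording")
# A pipeline that exposes its parent under a different name cannot be linked.
misnamed_pipeline = SimpleNamespace(CameraRecording="lab_study.CameraRecording")

phone_parents = resolve_parents(VIDEO_DEFINITION, phone_pipeline)
camera_parents = resolve_parents(VIDEO_DEFINITION, camera_pipeline)

try:
    resolve_parents(VIDEO_DEFINITION, misnamed_pipeline)
    misnamed_resolved = True
except AttributeError:
    misnamed_resolved = False
```

If this assumption holds, each analysis pipeline would have to expose its parent node under one agreed name, and a single analysis pipeline with two kinds of parent (cell phone and stationary camera) would have no obvious way to supply both.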

Approach 2) The alternative, and what I'm currently doing, is to give Video a primary key consisting of (video_project, filename), where each analysis pipeline uses at least one unique project name to isolate its videos. Each analysis pipeline also has a node that contains the same video_project and filename, so that the join:

pose_pipeline.Video (filename, video_project) * analysis_pipeline.VideoLinkNode (filename, video_project)

connects the two. Overriding the key source in the nodes downstream of VideoLinkNode to include the relevant join makes them populate from the right data, e.g.:

@property
def key_source(self):
    # Only populate for link entries that have a matching upstream pose result.
    return LinkNode & pose_pipeline.ExposePerson
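The join and the key source restriction can be illustrated outside DataJoint. The following is a minimal sketch using Python's built-in sqlite3 rather than DataJoint on MySQL, on the assumption that DataJoint's `*` behaves like a natural join on the shared attributes and `&` like a semi-join. The tables `video`, `video_link_node`, and `pose_result`, the `subject_id` column, and all rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video (
    video_project TEXT, filename TEXT,
    PRIMARY KEY (video_project, filename)
);
-- Lives in the analysis pipeline; note there is no foreign key to video.
CREATE TABLE video_link_node (
    subject_id INTEGER, video_project TEXT, filename TEXT,
    PRIMARY KEY (subject_id, video_project, filename)
);
CREATE TABLE pose_result (
    video_project TEXT, filename TEXT,
    PRIMARY KEY (video_project, filename)
);
""")
conn.executemany(
    "INSERT INTO video VALUES (?, ?)",
    [("gait_study", "a.mp4"), ("gait_study", "b.mp4"), ("other_study", "a.mp4")],
)
conn.executemany(
    "INSERT INTO video_link_node VALUES (?, ?, ?)",
    [(1, "gait_study", "a.mp4"), (2, "gait_study", "b.mp4")],
)
conn.execute("INSERT INTO pose_result VALUES (?, ?)", ("gait_study", "a.mp4"))

# Analogue of Video * VideoLinkNode: match rows on the shared column names.
joined = conn.execute("""
    SELECT subject_id, video_project, filename
    FROM video NATURAL JOIN video_link_node
    ORDER BY subject_id
""").fetchall()

# Analogue of the key_source restriction: link rows that have a pose result.
ready = conn.execute("""
    SELECT l.subject_id, l.video_project, l.filename
    FROM video_link_node AS l
    WHERE EXISTS (
        SELECT 1 FROM pose_result AS p
        WHERE p.video_project = l.video_project AND p.filename = l.filename
    )
    ORDER BY l.subject_id
""").fetchall()
```

The unique project name is what keeps `other_study`'s `a.mp4` out of this pipeline's results, even though the filename collides.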

The big downside is that deleting a VideoLinkNode entry doesn't delete the corresponding Video. The leftover Video then blocks inserting that video again unless you manually delete it first, and if you repopulate the analysis on that Video, it won't populate correctly.
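The orphaning problem follows directly from the missing foreign key. Below is a small sketch, again using sqlite3 as a stand-in for DataJoint on MySQL, with hypothetical table names, column names, and rows. It shows the Video row surviving the deletion of its link and then blocking re-insertion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE video (
    video_project TEXT, filename TEXT,
    PRIMARY KEY (video_project, filename)
);
-- Linked to video only by matching column values, not by a foreign key.
CREATE TABLE video_link_node (
    subject_id INTEGER, video_project TEXT, filename TEXT,
    PRIMARY KEY (subject_id, video_project, filename)
);
""")
conn.execute("INSERT INTO video VALUES ('gait_study', 'a.mp4')")
conn.execute("INSERT INTO video_link_node VALUES (1, 'gait_study', 'a.mp4')")

# Deleting the link does not cascade to video.
conn.execute("DELETE FROM video_link_node WHERE subject_id = 1")

# Videos with no remaining link row are orphans.
orphans = conn.execute("""
    SELECT v.video_project, v.filename
    FROM video AS v
    LEFT JOIN video_link_node AS l
        ON l.video_project = v.video_project AND l.filename = v.filename
    WHERE l.subject_id IS NULL
""").fetchall()

# Re-importing the same video now collides with the orphaned primary key.
try:
    conn.execute("INSERT INTO video VALUES ('gait_study', 'a.mp4')")
    reinsert_blocked = False
except sqlite3.IntegrityError:
    reinsert_blocked = True
```

An orphan query like this one could serve as a manual cleanup step before re-importing, though it is a workaround and does not restore the integrity guarantees of a real foreign key.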

Ideally, there would be a way to get the best of both worlds: to have the data-integrity benefits of foreign keys, but have some Videos point to one schema for their foreign keys and others point to a different one. This is still technically a DAG, but I don't think it can be done with MySQL.


peabody124 added the documentation and enhancement labels on Mar 24, 2021
