Streaming Heterogeneous Event Data
- The tooling for managing the event model should be as transparent and
  small as possible.
  - shed-streaming accomplishes this by having only two additional nodes,
    FromEventStream and ToEventStream, which convert data from the event
    model to base types/numpy and from base types/numpy to the event model.
  - Everything else will be handled by streamz nodes operating on base types
    and numpy.
- We should track data provenance with as little burden on the user as
  possible.
  - Since users have agreed to be part of our streamz-based ecosystem, we
    should track data provenance without any additional work on their part.
  - This is accomplished by having the translation nodes keep track of:
    - the source of the data coming into the graph
    - when the data entered the graph
    - the graph itself
  - Data provenance should support:
    - Replaying data analysis
    - Environment tracking
    - Playing new data through old analysis
    - Editing analysis and replaying it
- Data should be stored via a DataBroker, which has a structure similar to
  that of the experimental data.
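The split between the two translation nodes and everything else can be illustrated with a minimal sketch. It assumes a simplified event model: documents are (name, body) pairs named start, descriptor, event and stop, loosely following the bluesky layout, with most fields omitted. The functions from_event_stream and to_event_stream are hypothetical stand-ins, not the shed-streaming FromEventStream/ToEventStream API, and plain Python generators take the place of streamz nodes:

```python
import time
import uuid
from typing import Any, Dict, Iterable, Iterator, Tuple

# A document is a (name, body) pair; names follow the start/descriptor/
# event/stop layout, but the bodies are simplified for illustration.
Document = Tuple[str, Dict[str, Any]]


def from_event_stream(docs: Iterable[Document], field: str) -> Iterator[Any]:
    """Hypothetical stand-in for FromEventStream: yield the value of
    ``field`` from each event's ``data`` dict, dropping all other documents."""
    for name, doc in docs:
        if name == "event":
            yield doc["data"][field]


def to_event_stream(values: Iterable[Any], field: str) -> Iterator[Document]:
    """Hypothetical stand-in for ToEventStream: wrap plain values back into a
    start document, one descriptor, one event per value, and a stop document."""
    start_doc = {"uid": str(uuid.uuid4()), "time": time.time()}
    yield "start", start_doc
    descriptor = {
        "uid": str(uuid.uuid4()),
        "run_start": start_doc["uid"],
        "data_keys": {field: {"source": "analysis"}},
    }
    yield "descriptor", descriptor
    for seq_num, value in enumerate(values, start=1):
        yield "event", {
            "uid": str(uuid.uuid4()),
            "descriptor": descriptor["uid"],
            "seq_num": seq_num,
            "time": time.time(),
            "data": {field: value},
        }
    yield "stop", {"run_start": start_doc["uid"], "exit_status": "success"}


# Example input: one run with two events carrying a detector reading "det".
raw = [
    ("start", {"uid": "s1", "time": 0.0}),
    ("descriptor", {"uid": "d1", "run_start": "s1", "data_keys": {"det": {}}}),
    ("event", {"uid": "e1", "descriptor": "d1", "seq_num": 1,
               "time": 1.0, "data": {"det": 2.0}}),
    ("event", {"uid": "e2", "descriptor": "d1", "seq_num": 2,
               "time": 2.0, "data": {"det": 5.0}}),
    ("stop", {"run_start": "s1", "exit_status": "success"}),
]

# Everything between the two translation steps works on plain floats.
doubled = (x * 2 for x in from_event_stream(raw, "det"))
out = list(to_event_stream(doubled, "det_doubled"))
print([name for name, _ in out])
# ['start', 'descriptor', 'event', 'event', 'stop']
print([doc["data"]["det_doubled"] for name, doc in out if name == "event"])
# [4.0, 10.0]
```

In this sketch the doubling step knows nothing about documents, which is the point of confining event-model handling to the two translation steps.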
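The provenance goals can be illustrated with a similarly hypothetical sketch. ProvenanceRecord, STEPS, run and analyze are illustrative names only; the graph is reduced to a linear list of named steps rather than a streamz graph, and a plain dict stands in for the DataBroker. The record holds the three items the translation nodes are meant to track (data source, entry time, graph) plus a minimal environment entry, which is enough to replay an analysis, run new data through it, or edit it and replay:

```python
import platform
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class ProvenanceRecord:
    """Hypothetical record of what the translation nodes track."""
    source_uid: str      # which input run the data came from
    entered_at: float    # when the data entered the graph
    graph: List[str]     # the graph itself, here an ordered list of step names
    environment: Dict[str, str] = field(default_factory=dict)


# Hypothetical step registry; a real graph would be a streamz pipeline.
STEPS: Dict[str, Callable[[float], float]] = {
    "subtract_background": lambda x: x - 1.0,
    "double": lambda x: x * 2.0,
}


def run(graph: List[str], values: List[float]) -> List[float]:
    """Push each value through the named steps in order."""
    results = []
    for value in values:
        for step in graph:
            value = STEPS[step](value)
        results.append(value)
    return results


def analyze(source_uid: str, values: List[float],
            graph: List[str]) -> Tuple[List[float], ProvenanceRecord]:
    """Run the graph and return the results together with a provenance record."""
    record = ProvenanceRecord(
        source_uid=source_uid,
        entered_at=time.time(),
        graph=list(graph),  # copy so later edits cannot alter the record
        environment={"python": platform.python_version()},
    )
    return run(graph, values), record


# A plain dict keyed by run uid stands in for the data store.
store = {"run-1": [3.0, 5.0], "run-2": [10.0]}

result, record = analyze("run-1", store["run-1"], ["subtract_background"])
replayed = run(record.graph, store[record.source_uid])  # replay the analysis
new_result = run(record.graph, store["run-2"])          # new data, old analysis
edited = run(record.graph + ["double"], store[record.source_uid])  # edit, replay
print(result, replayed, new_result, edited)
# [2.0, 4.0] [2.0, 4.0] [9.0] [4.0, 8.0]
```

Keeping the graph in the record as data, rather than only as live code, is what lets each replay case in this sketch reduce to a call to the same run function.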