Roadmap

MariusDanner edited this page Mar 25, 2020 · 6 revisions

Though the pipeline is fully usable as-is, there are some features in our backlog that could serve as ideas for future enhancements. Here they are, in no particular order:

  • Backend
    • C++ as execution environment. This would involve creating an equivalent to mpci_utils.r for C++ and setting up a Dockerfile according to the requirements laid out here.
      • Once C++ support is implemented, it would be nice to add a faster, parallelized version of PC from this repo. Ideally, one would first implement discrete conditional independence tests for this project and embed it as a linked library.
    • It would be nice to support prior knowledge using the fixed_edges and fixed_gaps parameters of pcalg. This would include a feature to create a new experiment based on the annotations of a validated experiment. To that end, the EdgeInformation entries marked missing or approved need to be converted to fixed_edges, and those marked declined to fixed_gaps.
    • To allow a different type of prior knowledge, it might be interesting to group nodes together. This would allow custom edge orientation rules, e.g., that a certain group of nodes cannot be the effect of another group of nodes because it always occurs earlier in time.
    • There is an existing endpoint that picks out notable edges according to edge weight; finding notable paths would be interesting as well.
    • One could periodically save intermediate results for long computations. For example, run pcalg with m.max=1 and return the intermediate result, then run it with m.max=2 and the already known fixed_gaps, and so on.
    • Merge the redundant features of is_ground_truth and EdgeInformation. EdgeInformation offers more functionality, so it might make sense to remove the is_ground_truth property of Edges and integrate the Ground Truth upload and the comparison metrics into the EdgeInformation feature.
    • The caching of dataset metadata could be done as properties of the dataset model.
    • Make scheduling more advanced. Currently, sequential jobs can only run if no other job is running in the whole environment (Kubernetes namespace). This has several drawbacks: when there is more than one server, there can still be only one job at a time, and jobs in other namespaces can still be running on the same server. Scheduling should therefore be made server-specific, ideally with a way to block a server for all namespaces.
  • Frontend
    • When viewing datasets, one could display a preview of the observation matrix by loading the first N rows of a dataset and putting them into a table element. It might then make sense to allow the user to group columns (nodes) together, to help them navigate the graph exploration and define prior knowledge (edge orientation rules).
    • When displaying interventions of relationships with bidirectional confounders, one valid set of parents within the equivalence class is selected to perform the intervention. One could try to think of ways to display these sets intuitively in order to perform separate interventions.
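
To make the prior-knowledge idea from the backend list more concrete, here is a minimal Python sketch of how annotations could be turned into the two constraint matrices. It is only an illustration under assumptions: the annotation record layout (from node, to node, label), the label strings, and the function name annotations_to_constraints are hypothetical and not taken from the pipeline. The only fixed point is that the pc function of pcalg expects its fixedEdges and fixedGaps arguments as symmetric logical matrices over the nodes, which is why both directions of each pair are set.

```python
from typing import Iterable, List, Sequence, Tuple

# Hypothetical annotation record: (from_node, to_node, label).
# The labels mirror the EdgeInformation states named in the roadmap,
# but their exact spelling in the pipeline is an assumption.
Annotation = Tuple[str, str, str]
BoolMatrix = List[List[bool]]


def annotations_to_constraints(
    nodes: Sequence[str], annotations: Iterable[Annotation]
) -> Tuple[BoolMatrix, BoolMatrix]:
    """Build symmetric fixed_edges / fixed_gaps matrices from annotations."""
    index = {name: i for i, name in enumerate(nodes)}
    n = len(nodes)
    fixed_edges = [[False] * n for _ in range(n)]
    fixed_gaps = [[False] * n for _ in range(n)]

    for src, dst, label in annotations:
        i, j = index[src], index[dst]
        if label in ("approved", "missing"):
            # The edge is known to exist, so it must never be removed.
            target = fixed_edges
        elif label == "declined":
            # The edge is known to be absent, so it is removed up front.
            target = fixed_gaps
        else:
            # Unknown labels carry no prior knowledge in this sketch.
            continue
        # Both matrices are symmetric, so set both directions.
        target[i][j] = True
        target[j][i] = True

    return fixed_edges, fixed_gaps


if __name__ == "__main__":
    edges, gaps = annotations_to_constraints(
        ["A", "B", "C"],
        [("A", "B", "approved"), ("B", "C", "declined"), ("A", "C", "missing")],
    )
    print(edges)
    print(gaps)
```

The resulting nested lists could then be serialized and passed to the R side as logical matrices. A real implementation would also need to decide how to handle contradictory annotations for the same node pair, which this sketch does not address.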