Testing
Gaurav Vaidya edited this page Sep 16, 2016 · 2 revisions
- Prevent regressions: once a phyloreference works (identifies the correct nodes on a phylogenetic tree), it should keep working, however we change our ontology in the future.
- Test suite for all phyloreferences: our test suite could be used to validate any alternate way of calculating phyloreferences.
- Determining limits: particularly complicated phyloreferences can be added to the test suite to check that they work and to identify where logical reasoning fails to function correctly.
- Comparing representations: we can try representing the same phyloreference in different ways and identify the best representation for (1) speed of computer reasoning and (2) ease of human understanding.
- Our scripts: ensure that all our scripts provide predictable outputs to provided inputs.
- Data representation: ensure that ontologies and phyloreferences are clearly and completely represented, retaining as much data as possible from the input and representing no nodes incorrectly.
- Reasoning output: once phylogeny nodes have been categorized by the reasoner, we need to test to ensure that they are categorized correctly.
Tools we plan to use:
- pytest: http://www.pytest.org/
- rdflib: https://rdflib.readthedocs.io
- Comparisons of text files:
- Many output files can be compared directly against pre-generated expected files. If an output file changes, a developer can inspect the difference using pytest (which can provide a diff of the two files: http://docs.pytest.org/en/latest/assert.html#making-use-of-context-sensitive-comparisons) and simply replace the pre-generated file if the change is an improvement.
- This may require scripts to strip changeable parts of a file (e.g. the date of creation) before comparison, but that should be easy to do.
- Comparison of RDF files as triples:
- rdflib can load the major RDF representations (XML, Turtle) as a triple-store.
- This can then be tested in four ways:
- Lists of expected objects can be checked to ensure that they exist and are of the right type.
- A list of expected triples can be checked to ensure that they have been generated correctly.
- Requirements could be expressed in Python, such as statements like "every instance in the Node class should have a sibling or a parent node".
- Two RDF/XML files can be compared at the level of triples, ensuring that they are identical (comparing them as text files will likely be easier).
- Validation of triples
- We can use Shape Expressions (ShEx: http://shex.io/), SHACL (https://www.w3.org/TR/shacl/) or write tests in SPARQL (http://ceur-ws.org/Vol-952/paper_2.pdf).
- Currently, I’ve managed to get ShEx working with a lot of hacking on https://github.com/hsolbrig/shexypy -- this is probably not something we want to use unless it’s a big improvement over something simpler, like SPARQL.
- Example ShExML file: https://github.com/gaurav/phylo2owl/blob/fc32a7e8d81ac994afc5af35717dfd5d77a8d3fa/tests/shex/Node.shexml
- Hacky script to use shexypy library to test nodes against ShExML: https://github.com/gaurav/phylo2owl/blob/fc32a7e8d81ac994afc5af35717dfd5d77a8d3fa/tests/shex/shex.py
- Validation of reasoning
- Any of the techniques listed above for testing RDF or triples can also be applied to the output from a reasoner. Our primary goal will probably be to test whether a particular phyloreference matches the correct nodes in a phylogeny without matching any incorrect nodes.
- This means we need three things:
- An input ontology (as Newick or RDF),
- An input phyloreference (as OWL), and
- A list of expected nodes that should be matched (as RDF)
- These could be coded as Python tests using rdflib, as in the examples above, but based on the reasoned triples.
Funded by the US National Science Foundation through collaborative grants DBI-1458484 and DBI-1458604. See Funding for details.