Utilities for ontology development using linked data tables (LDTab)
In addition to the requirements in requirements.txt, gadget also needs wiring.py for export and tree. Follow the instructions for wiring.py setup, then install these requirements in the same virtual environment.
To install wiring.py outside of a virtual environment:
- Install maturin (pip install -U pip maturin)
- Navigate to wiring.py/wiring.rs
- Build the wheel (maturin build)
- Install the wheel (pip install --force-reinstall target/wheels/wiring_rs-*.whl)
gadget operates on a linked data table (LDTab).
Each gadget module uses a SQL database version of an RDF or OWL ontology to create outputs. SQL database inputs should be created from OWL using LDTab to ensure they are in the right format. LDTab creates SQLite databases, but we also support PostgreSQL. The database is specified by -d/--database.
When loading from a SQLite database, use the path to the database (foo.db). When loading a PostgreSQL database, use a path to a configuration file ending in .ini (e.g., conf.ini) with a [postgresql] section. We recommend that the configuration file contain at least database, user, and password fields. An example configuration file with all optional fields looks like:
[postgresql]
host = 127.0.0.1
database = my_ontology
user = postgres
password = secret_password
port = 5432
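As a rough illustration of how such a file can be consumed, the sketch below parses the example configuration with Python's standard configparser and assembles a PostgreSQL connection URL. The function name postgres_url, the fallback values for host and port, and the URL layout are assumptions made for this example; gadget's own loading code may differ.

```python
from configparser import ConfigParser
from urllib.parse import quote_plus

# The example configuration from above, inlined so the sketch is self-contained.
EXAMPLE_INI = """\
[postgresql]
host = 127.0.0.1
database = my_ontology
user = postgres
password = secret_password
port = 5432
"""


def postgres_url(ini_text: str) -> str:
    """Build a connection URL from the [postgresql] section (hypothetical helper)."""
    parser = ConfigParser(interpolation=None)  # allow '%' in passwords
    parser.read_string(ini_text)
    if not parser.has_section("postgresql"):
        raise ValueError("configuration is missing a [postgresql] section")
    cfg = parser["postgresql"]
    # Fallbacks for host and port are assumptions of this sketch.
    host = cfg.get("host", fallback="localhost")
    port = cfg.get("port", fallback="5432")
    user = quote_plus(cfg["user"])
    password = quote_plus(cfg["password"])
    return f"postgresql://{user}:{password}@{host}:{port}/{cfg['database']}"


print(postgres_url(EXAMPLE_INI))
# prints: postgresql://postgres:secret_password@127.0.0.1:5432/my_ontology
```

A URL of this shape can be passed to sqlalchemy's create_engine when psycopg2 is installed.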
The following prefixes must be defined in the prefix table for the gadget commands to work:
- owl: http://www.w3.org/2002/07/owl#
- rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
- rdfs: http://www.w3.org/2000/01/rdf-schema#
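One way to check this requirement before running a command is sketched below with the standard sqlite3 module. It assumes the prefix table has columns named prefix and base, which is an assumption about the LDTab layout, and missing_prefixes is a hypothetical helper rather than part of gadget.

```python
import sqlite3

# The three prefixes listed above.
REQUIRED_PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def missing_prefixes(conn: sqlite3.Connection) -> list:
    """Return required prefixes that are absent or bound to a different base."""
    # Assumes a table named "prefix" with columns "prefix" and "base".
    defined = dict(conn.execute("SELECT prefix, base FROM prefix").fetchall())
    return sorted(
        name for name, base in REQUIRED_PREFIXES.items() if defined.get(name) != base
    )


# Demo on an in-memory database that lacks the rdfs prefix.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE prefix (prefix TEXT PRIMARY KEY, base TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO prefix VALUES (?, ?)",
    [
        ("owl", "http://www.w3.org/2002/07/owl#"),
        ("rdf", "http://www.w3.org/1999/02/22-rdf-syntax-ns#"),
    ],
)
print(missing_prefixes(conn))  # prints: ['rdfs']
```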
When using gadget as a Python module, all operations accept a sqlalchemy Connection object. For details on the Connection, see Working with Engines and Connections. We currently support SQLite and PostgreSQL. Using other connections may result in unanticipated errors due to slight variations in syntax. Note that if you use a PostgreSQL database, you must install or include psycopg2 in your requirements.
This view maps each term ID to a search name. It is only required for gadget.search. The search name is formatted as "LABEL - SYNONYM [TERM ID]". Replace PREDICATE_ID with the predicate used for synonyms, e.g., IAO:0000118 (alternative term), and replace STATEMENT with the name of your statement table. You can add additional synonyms to the synonyms CTE with one or more UNION statements:
UNION SELECT DISTINCT subject, object FROM STATEMENT WHERE predicate = 'PREDICATE_ID'
CREATE VIEW STATEMENT_search_view AS
WITH term_ids AS (
  SELECT * FROM (
    SELECT DISTINCT subject AS subject FROM STATEMENT
    UNION
    SELECT DISTINCT predicate FROM STATEMENT
  ) AS ids
),
labels AS (
  SELECT DISTINCT subject, object
  FROM STATEMENT WHERE predicate = 'rdfs:label'
),
synonyms AS (
  SELECT * FROM (
    SELECT DISTINCT subject, object FROM STATEMENT
    WHERE predicate = 'PREDICATE_ID'
  ) AS syns
)
SELECT
  t.subject AS subject,
  COALESCE(l.object, '') || COALESCE(' - ' || s.object, '') || ' [' || t.subject || ']' AS label
FROM term_ids t
LEFT JOIN labels l ON t.subject = l.subject
LEFT JOIN synonyms s ON t.subject = s.subject;
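To see what the view produces, the sketch below creates a condensed form of it in an in-memory SQLite database and prints the resulting search names. The three-column statement table and the EX: term IDs are illustrative assumptions; a real LDTab statement table has more columns.

```python
import sqlite3

STATEMENT = "statement"            # name of the statement table
SYNONYM_PREDICATE = "IAO:0000118"  # alternative term

# Condensed version of the view above, with the placeholders filled in.
VIEW_SQL = f"""
CREATE VIEW {STATEMENT}_search_view AS
WITH term_ids AS (
  SELECT DISTINCT subject AS subject FROM {STATEMENT}
  UNION
  SELECT DISTINCT predicate FROM {STATEMENT}
),
labels AS (
  SELECT DISTINCT subject, object FROM {STATEMENT}
  WHERE predicate = 'rdfs:label'
),
synonyms AS (
  SELECT DISTINCT subject, object FROM {STATEMENT}
  WHERE predicate = '{SYNONYM_PREDICATE}'
)
SELECT
  t.subject AS subject,
  COALESCE(l.object, '') || COALESCE(' - ' || s.object, '') || ' [' || t.subject || ']' AS label
FROM term_ids t
LEFT JOIN labels l ON t.subject = l.subject
LEFT JOIN synonyms s ON t.subject = s.subject
"""

conn = sqlite3.connect(":memory:")
# Minimal stand-in for a statement table (illustrative data only).
conn.execute(f"CREATE TABLE {STATEMENT} (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    f"INSERT INTO {STATEMENT} VALUES (?, ?, ?)",
    [
        ("EX:0000001", "rdfs:label", "apple"),
        ("EX:0000001", "IAO:0000118", "pomme"),
        ("EX:0000002", "rdfs:label", "pear"),
    ],
)
conn.execute(VIEW_SQL)

search_names = dict(
    conn.execute(f"SELECT subject, label FROM {STATEMENT}_search_view").fetchall()
)
print(search_names["EX:0000001"])  # prints: apple - pomme [EX:0000001]
print(search_names["EX:0000002"])  # prints: pear [EX:0000002]
```

Because predicates are included in term_ids, the view also contains rows for rdfs:label and IAO:0000118. In this toy data they have no labels of their own, so their search names consist only of the bracketed ID.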
The following modules are also supported as CLI commands:
Each command expects a -d <database>/--database <database> option that provides the path to the database to operate on. You can also provide -S <statement>/--statement <statement> to specify the name of your ontology statement table, if it is not the default: statement.
The term or terms as CURIEs or labels are specified with -t <term>/--term <term>. You may also specify multiple terms with -T <file>/--terms <file>, where the file contains a list of CURIEs.
You may also specify which predicates you would like to include with -p <term>/--predicate <term> or -P <file>/--predicates <file>, where the file contains a list of predicate CURIEs or labels. Otherwise, the output includes all predicates.
The export module creates a table (default TSV) output containing the terms and their predicates written to stdout.
python3 -m gadget.export -d [path-to-database] -S [statement-table] -t [term] > [output-tsv]
Export options:
- -f <fmt>/--format <fmt>: specify the format of the export (TSV, CSV, or JSON)
- -s <char>/--split <char>: specify the character to split multiple cell values on (|)
- -V <values>/--values <values>: specify the default value format, i.e., how cell values are rendered (CURIE, IRI, or LABEL)
  - Note: if you do not specify any predicates to include, the default value format option for each term will be included as the first column
- -a/--include-annotation: include annotations as additional columns following the predicate when present
The extract module creates a new table called extract containing the term(s) and term ancestors up to owl:Thing as a new extracted module.
python3 -m gadget.extract -d [path-to-database] -S [statement-table] -t [term]
This table can then be exported to TTL using LDTab:
ldtab export -t extract [path-to-database] [output-ttl]
Extract options:
- -e <table>/--extract-table <table>: name of extract table to create
- -n/--no-hierarchy: do not include ancestors of the input term(s)
- -I <intermediates>/--intermediates <intermediates>: specify intermediate nodes to include (all or none)
- -m <ontology-IRI>/--imported-from <ontology-IRI>: annotate extracted terms with a source ontology IRI
- -M <predicate>/--imported-from-property <predicate>: property to use for the imported-from annotation ('imported from')
- -i <file>/--imports <file>: see below, Creating Import Modules
- -s <source>/--source <source>: see below, Creating Import Modules
gadget.extract can also be used with import configuration files (-i <file>/--imports <file>):
python3 -m gadget.extract -d [path-to-database] -i [path-to-imports]
These files contain the terms you wish to include in your import module, along with some other details. The required headers are:
- ID: The term ID to include
- Label: optional; the term's label for documentation
- Parent ID: optional; a term ID of a parent to assert
- Parent Label: optional; the parent's label for documentation
- Related: optional; related entities to include
- Source: optional; a short ontology ID that specifies the source
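As an illustration, the sketch below writes a small imports file with these headers using Python's csv module. The tab delimiter is assumed from the imports.tsv file name used in this document, and the EX: term IDs and labels are placeholders, not real ontology terms.

```python
import csv
import io

HEADERS = ["ID", "Label", "Parent ID", "Parent Label", "Related", "Source"]

# Placeholder rows: only ID is filled in everywhere, the rest is optional.
ROWS = [
    {
        "ID": "EX:0000002",
        "Label": "example child",
        "Parent ID": "EX:0000001",
        "Parent Label": "example parent",
        "Related": "ancestors children",
        "Source": "obi",
    },
    {"ID": "EX:0000003", "Source": "obi"},
]


def write_imports(rows, handle):
    """Write rows as a tab-separated imports file; missing cells stay empty."""
    writer = csv.DictWriter(
        handle, fieldnames=HEADERS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()  # swap for open("imports.tsv", "w", newline="") to write a file
write_imports(ROWS, buffer)
print(buffer.getvalue())
```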
Including the source can be useful when you have one file that you are using to create more than one import module. When you specify the Source column, you can use the -s <source>/--source <source> option in the command line. For example, if one of the sources in your import config is obi:
python3 -m gadget.extract -d obi.db -i imports.tsv -s obi
The Related column should be a space-separated list that can use zero or more of the following. When included, all terms that match the relationship from the input database will be included in the output:
- ancestors: Get all intermediate terms between the given term and its first ancestor that is included in the input terms. If --intermediates none, get the first included ancestor OR the top-level ancestor if an ancestor is not included in the input terms.
- children: Include the direct children of the term, regardless of whether they are included in the input terms.
- descendants: Get all intermediate terms between the given term and the lowest-level term (terms that do not have children). If --intermediates none, get all bottom-level terms only.
- parents: Include the direct parents of the term, regardless of whether they are included in the input terms.
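The ancestors behaviour can be pictured with a toy model. The sketch below walks up a single-parent hierarchy stored in a dictionary and applies the two --intermediates settings as described above. It is a simplified reading of that description, not gadget's implementation. The function name and the letter term IDs are made up, and the included ancestor itself is returned along with the intermediates.

```python
def related_ancestors(term, parent_of, input_terms, intermediates="all"):
    """Toy model of the 'ancestors' option on a single-parent hierarchy."""
    path = []
    current = parent_of.get(term)
    while current is not None:
        path.append(current)
        if current in input_terms:
            break  # stop at the first ancestor that is an input term
        current = parent_of.get(current)
    if intermediates == "none":
        # Keep only the first included ancestor, or the top-level ancestor
        # when no ancestor is among the input terms.
        return path[-1:]
    return path


# Hierarchy: a <- b <- c <- d (a is the top-level term).
PARENT_OF = {"d": "c", "c": "b", "b": "a"}

print(related_ancestors("d", PARENT_OF, {"d", "b"}))          # prints: ['c', 'b']
print(related_ancestors("d", PARENT_OF, {"d", "b"}, "none"))  # prints: ['b']
print(related_ancestors("d", PARENT_OF, {"d"}, "none"))       # prints: ['a']
```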
You can also pass a source configuration file that contains the options for each source ontology used in your imports file using -c <file>/--config <file>. Note that with this option, a --source is always required:
python3 -m gadget.extract -d obi.db -i imports.tsv -c config.tsv -s obi
This is a TSV or CSV with the following headers:
- Source: a short ontology ID that specifies the source; matches a source in the imports file
- IRI: optional; the IRI of the ontology to be added to each term as the value of an 'imported from' statement
- Intermediates: optional; an --intermediates option: all or none
- Predicates: optional; a space-separated list of predicate IDs to include from the import
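A file with these headers can be read into per-source options as in the sketch below. It assumes a tab-separated file, and the helper load_source_options, the keys of the returned dictionary, and the sample row are illustrative choices rather than part of gadget.

```python
import csv
import io

# Sample config content (illustrative values), tab-separated.
CONFIG_TSV = (
    "Source\tIRI\tIntermediates\tPredicates\n"
    "obi\thttp://purl.obolibrary.org/obo/obi.owl\tnone\trdfs:label IAO:0000115\n"
)


def load_source_options(handle):
    """Map each Source to its options; empty optional cells become None or []."""
    options = {}
    for row in csv.DictReader(handle, delimiter="\t"):
        intermediates = row.get("Intermediates") or None
        if intermediates not in (None, "all", "none"):
            raise ValueError(f"Intermediates must be all or none, got {intermediates!r}")
        options[row["Source"]] = {
            "iri": row.get("IRI") or None,
            "intermediates": intermediates,
            # Predicates is a space-separated list of predicate IDs.
            "predicates": (row.get("Predicates") or "").split(),
        }
    return options


options = load_source_options(io.StringIO(CONFIG_TSV))
print(options["obi"]["predicates"])  # prints: ['rdfs:label', 'IAO:0000115']
```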
The config file can be useful for handling multiple imports with different options in a Makefile. If your imports all use the same --intermediates option and the same predicates, there is no need to specify a config file.