OWL2Vec-Star

OWL2Vec*: Embedding OWL ontologies

Features

Once installed, OWL2Vec* v0.2.0 exposes a CLI with two subcommands, one for each of its two main programs. You can also run the two original Python programs directly, without installation (see the requirements in setup.py).

Installation command:

$ make install

Standalone

This command embeds a single ontology. It is configured through the configuration file default.cfg; see the examples and comments in that file for usage.

Running command:

$ owl2vec_star standalone --config_file default.cfg

Running program:

$ python OWL2Vec_Standalone.py --config_file default.cfg

Note: Unlike the experimental code, the standalone command implements all OWL ontology-related procedures in Python with Owlready, but it also lets the user supply pre-calculated annotations/axioms/entities/projection to generate the corpus.

Standalone Multi

This command embeds multiple ontologies into a single embedding model by merging the documents generated from each ontology. One example use case is embedding all the conference-related ontologies of the OAEI conference track at once.

Running command:

$ owl2vec_star standalone-multi --config_file default_multi.cfg

Running program:

$ python OWL2Vec_Standalone_Multi.py --config_file default_multi.cfg

Note: Unlike the standalone command, this command for multiple ontologies does NOT accept pre-calculated or external annotations/axioms/entities/projection.
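
The merging step described above can be pictured with a small sketch. It assumes that each ontology contributes a list of sentences (token lists) and that merging means concatenating them into one corpus. The helper name merge_documents and the toy sentences are made up for illustration and are not part of owl2vec_star.

```python
from itertools import chain
from typing import Dict, List

Sentence = List[str]


def merge_documents(documents_by_ontology: Dict[str, List[Sentence]]) -> List[Sentence]:
    """Concatenate the sentences of all ontologies into one corpus.

    Hypothetical helper: the real command builds its documents internally.
    """
    return list(chain.from_iterable(documents_by_ontology.values()))


# Toy sentences standing in for the documents of two ontologies.
documents_by_ontology = {
    "cmt.owl": [["cmt:Paper", "subclassof", "cmt:Document"], ["paper"]],
    "ekaw.owl": [["ekaw:Paper", "writtenby", "ekaw:Author"]],
}

merged = merge_documents(documents_by_ontology)
print(len(merged), "sentences in the merged corpus")
```

A single embedding model would then be trained on the merged corpus, so entities and words from all ontologies end up in one vector space.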

Accessing Embeddings

The embedding model is saved in $embedding_dir (or in $cache_dir/output if $embedding_dir is not set). The class IRI vector can be accessed as follows:

>> import gensim
>> from owlready2 import *
>> model = gensim.models.Word2Vec.load(word2vec_embedding_file)
>> onto = get_ontology(onto_file).load()
>> classes = list(onto.classes())
>> c = classes[0]
>> c.iri in model.wv.index_to_key
>> iri_v = model.wv.get_vector(c.iri)
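
To find the path to pass as word2vec_embedding_file, the fallback rule for the save location can be sketched as below. This is only an illustration: resolve_embedding_location is a hypothetical helper, not a function of owl2vec_star, and treating an empty string as "not set" is an assumption.

```python
import os
from typing import Optional


def resolve_embedding_location(embedding_dir: Optional[str], cache_dir: str) -> str:
    """Return where the embedding model is expected to be saved.

    Hypothetical helper mirroring the rule "use $embedding_dir,
    else $cache_dir/output".
    """
    if embedding_dir:  # assumption: None and "" both count as "not set"
        return embedding_dir
    return os.path.join(cache_dir, "output")


# embedding_dir not set: fall back to the cache directory
print(resolve_embedding_location(None, "./cache"))
# embedding_dir set: it takes precedence
print(resolve_embedding_location("./my_embeddings", "./cache"))
```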

The class word vector, i.e. the average of the vectors of the words in the class label (defined by e.g. rdfs:label), can be computed in a similar way:

>> import re
>> import numpy as np
>> from nltk import word_tokenize
>> label = c.label[0]
>> # strip URLs from the label before tokenizing
>> text = ' '.join([re.sub(r'https?:\/\/.*[\r\n]*', '', w, flags=re.MULTILINE) for w in label.lower().split()])
>> words = [token.lower() for token in word_tokenize(text) if token.isalpha()]
>> # average the vectors of the label words found in the vocabulary
>> n = 0
>> word_v = np.zeros(model.vector_size)
>> for word in words:
       if word in model.wv.index_to_key:
           word_v += model.wv.get_vector(word)
           n += 1
>> word_v = word_v / n if n > 0 else word_v

Note: the class IRI vector and the class word vector can be used independently or concatenated.
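
As a rough sketch of the concatenation option, the example below builds a single class vector from an IRI vector and an averaged word vector. To stay runnable without a trained model, it uses a plain dict of lists in place of model.wv. The function names, the toy vectors and the zero-vector fallback for a missing IRI are all assumptions made for illustration.

```python
from typing import List, Mapping, Sequence


def average_vectors(keys: Sequence[str],
                    vectors: Mapping[str, Sequence[float]],
                    dim: int) -> List[float]:
    """Average the vectors of the keys found in the vocabulary (zeros if none)."""
    total = [0.0] * dim
    n = 0
    for key in keys:
        if key in vectors:
            total = [t + x for t, x in zip(total, vectors[key])]
            n += 1
    return [t / n for t in total] if n > 0 else total


def class_embedding(iri: str,
                    label_words: Sequence[str],
                    vectors: Mapping[str, Sequence[float]],
                    dim: int) -> List[float]:
    """Concatenate the IRI vector with the averaged label word vector."""
    # assumption: an IRI missing from the vocabulary falls back to zeros
    iri_v = list(vectors[iri]) if iri in vectors else [0.0] * dim
    word_v = average_vectors(label_words, vectors, dim)
    return iri_v + word_v  # list concatenation, length 2 * dim


# Toy 2-dimensional vocabulary standing in for model.wv
toy_vectors = {
    "http://example.org/onto#Pizza": [1.0, 0.0],
    "pizza": [0.0, 2.0],
    "margherita": [2.0, 0.0],
}
vec = class_embedding("http://example.org/onto#Pizza",
                      ["margherita", "pizza", "unknown"], toy_vectors, dim=2)
print(vec)
```

With a trained model, the same idea applies to numpy arrays, for example np.concatenate((iri_v, word_v)) on the two vectors computed above.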

Publications

Main Reference

  • Jiaoyan Chen, Pan Hu, Ernesto Jimenez-Ruiz, Ole Magnus Holter, Denvar Antonyrajah, and Ian Horrocks. OWL2Vec*: Embedding of OWL ontologies. Machine Learning, Springer, 2021. [PDF] [@Springer] [Collection] [Codes in package or folder]

Applications with OWL2Vec*

  • Jiaoyan Chen, Ernesto Jimenez-Ruiz, Ian Horrocks, Denvar Antonyrajah, Ali Hadian, Jaehun Lee. Augmenting Ontology Alignment by Semantic Embedding and Distant Supervision. European Semantic Web Conference, ESWC 2021. [PDF] [LogMap Matcher work]
  • Ashley Ritchie, Jiaoyan Chen, Leyla Jael Castro, Dietrich Rebholz-Schuhmann, Ernesto Jiménez-Ruiz. Ontology Clustering with OWL2Vec*. DeepOntoNLP ESWC Workshop 2021. [PDF]

Preliminary Publications

  • Ole Magnus Holter, Erik Bryhn Myklebust, Jiaoyan Chen and Ernesto Jimenez-Ruiz. Embedding OWL ontologies with OWL2Vec. International Semantic Web Conference. Poster & Demos. 2019. [PDF]
  • Ole Magnus Holter. Semantic Embeddings for OWL 2 Ontologies. MSc thesis, University of Oslo. 2019. [PDF] [GitLab]

Case Studies

Data and code for class membership prediction on the Healthy Lifestyles (HeLis) ontology, and for class subsumption prediction on the food ontology FoodOn and the Gene Ontology (GO), are in the folder case_studies/.

Credits

Code under owl2vec_star/rdf2vec/, which mainly implements walking strategies over RDF graphs, is derived from pyRDF2Vec (version 0.0.3, last access: 03/2020) with revisions.

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template. Many thanks to Vincenzo Cutrona for preparing this package.
