OLS 4 Executive overview

henrietteharmse edited this page Jan 24, 2023 · 9 revisions

OLS 4 is available at www.ebi.ac.uk/ols4/. Note that the trailing slash is required.
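One place a missing trailing slash can cause trouble in client code is relative URL resolution. The following is a minimal sketch, assuming an `https` scheme and a hypothetical relative path `api/ontologies` (neither is stated on this page); the helper `normalize_base` is illustrative only.

```python
from urllib.parse import urljoin


def normalize_base(url: str) -> str:
    """Return the URL with exactly one trailing slash appended if it is missing."""
    return url if url.endswith("/") else url + "/"


# Assumption: https scheme; the page only gives "www.ebi.ac.uk/ols4/".
BASE = "https://www.ebi.ac.uk/ols4"

# Hypothetical relative path, used only to illustrate the resolution rule.
RELATIVE = "api/ontologies"

# Without the trailing slash, urljoin treats "ols4" as a file name and replaces it.
without_slash = urljoin(BASE, RELATIVE)

# With the trailing slash, "ols4/" is treated as a directory and is preserved.
with_slash = urljoin(normalize_base(BASE), RELATIVE)

print(without_slash)
print(with_slash)
```

Normalising the base URL once, before any joins, avoids silently dropping the `ols4` path segment.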

Why

The key reasons for the redesign of OLS are:

  1. Data releases started to take longer and longer. A few years ago we were able to publish a new data release every 24 hours, but in the last year releases often ran for 72 hours or even for more than a week. When a release runs for more than a week, the cluster terminates the job, which corrupts the database. The long time required for data releases meant that we could not consider any new use cases that might require re-indexing all the ontologies hosted on OLS.
  2. Even when the data release job ran for less than a week, we still had frequent corruptions of the database. In most cases we were able to catch these corruptions before they were propagated to OLS production, and thus our users experienced few outages. However, our users have been directly affected by delays in updates and indirectly affected by the team doing busy work (that is, firefighting to keep the service up) rather than adding useful use cases.
  3. OLS 3 did not index all the information available in the .owl file of an ontology. An example of information that is not indexed in OLS 3 is annotations on annotations. For example, for synonyms you may want to capture additional metadata stating citation information.

How

OLS 4 implements a number of technical improvements. Only the key changes from a user perspective are highlighted here.

  1. The root cause of the increasingly long data releases was that OLS 3 used a reasoner and stored the complete ontology in memory. As a result, OLS 3 required 150GB of memory on the cluster to index. Because of this huge memory requirement, the OLS indexing job often waited, potentially for days, before a node with that amount of memory became available. OLS 4 assumes that ontologies are pre-reasoned and thus does not do any reasoning on the ontologies it indexes, which means there is no reason to load the complete ontology into memory. This allows OLS 4 to make use of streaming, and hence the memory footprint of OLS 4 is small.

  2. OLS 4 uses an external database and no longer uses an embedded database. The embedded database was the root cause of many of the data corruption issues we experienced.

  3. We understand that many of our users rely on the OLS API for the implementation of their pipelines. For this reason, we aimed for full backward compatibility of the OLS 4 API with OLS 3, to limit the impact that migrating from OLS 3 to OLS 4 will have on our users.
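The streaming idea in item 1 can be illustrated with a minimal sketch. This is not the OLS 4 implementation; it only shows the general technique of processing an ontology file one element at a time instead of loading it whole. The sample document, the `example.org` IRIs, and the function name `stream_classes` are all hypothetical.

```python
import io
import xml.etree.ElementTree as ET

OWL_NS = "http://www.w3.org/2002/07/owl#"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS_NS = "http://www.w3.org/2000/01/rdf-schema#"

# Hypothetical two-class ontology in RDF/XML, standing in for a large .owl file.
SAMPLE = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:owl="http://www.w3.org/2002/07/owl#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <owl:Class rdf:about="http://example.org/A">
    <rdfs:label>alpha</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/B">
    <rdfs:label>beta</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def stream_classes(source):
    """Yield (iri, label) for each owl:Class as soon as its end tag is parsed."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == f"{{{OWL_NS}}}Class":
            iri = elem.get(f"{{{RDF_NS}}}about")
            label = elem.findtext(f"{{{RDFS_NS}}}label")
            yield iri, label
            # Drop the element's children, text, and attributes once handled,
            # so processed classes do not accumulate in memory.
            elem.clear()


# Collected into a list here only so the result can be inspected; a real
# indexer would hand each record to the index as it arrives.
records = list(stream_classes(io.BytesIO(SAMPLE)))
print(records)
```

Because each class is handled and cleared as it is read, memory use depends on the size of a single element rather than on the size of the whole ontology. This only works when no step needs the full ontology at once, which is why skipping reasoning matters.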

Impact

Once it has stabilised, we expect OLS 4 to bring the following benefits to our users:

  1. Predictable daily data releases. This means a new version of your ontology should be available on OLS within 24-48 hours.

  2. Because we can now index all ontologies in a few hours, we can consider more interesting new use cases that may affect indexing of all the ontologies. Once OLS 4 is stabilised we will have an OLS user day to discuss planned new use cases to be implemented.

When

Here is our tentative roadmap. We will keep this updated as the rollout of OLS 4 progresses. Roadmap