Proposal: Different languages for model specification (#538)
# Motivation

There are a number of formats for specifying models in systems biology, each with its specific strengths and weaknesses. PEtab version 1.0.0 only allows Systems Biology Markup Language (SBML) models. While SBML is supported by a large number of tools, there are good reasons to use other formats. For example, rule-based model formats (e.g., BioNetGenLanguage) permit more abstract and compact specification of models based on rules, which are generalisations of reactions. Therefore, and based on user request (#436), we propose to lift PEtab’s restriction to SBML models and allow arbitrary model formats.

# Proposed changes

* Changes to the PEtab YAML file:
  * Change `sbml_files` to `models`
  * `models` entries will be model IDs (following the existing conventions for PEtab IDs) mapping to:
    * `location`: path / URL to the model
    * `language`: model format
      Initial set of model format identifiers (to be extended as needed):
      * SBML: `sbml`
      * CellML: `cellml`
      * BNGL: `bngl`
      * PySB: `pysb`
  * An additional entry for mapping tables (see below) is added

  Example:

  **Before:**
  ```yaml
  format_version: 1
  parameter_file: parameters.tsv
  problems:
  - condition_files:
    - conditions.tsv
    measurement_files:
    - measurements.tsv
    observable_files:
    - observables.tsv
    sbml_files:
    - model1.xml
  ```

  **After:**
  ```yaml
  format_version: 2.0.0
  parameter_file: parameters.tsv
  problems:
  - condition_files:
    - conditions.tsv
    measurement_files:
    - measurements.tsv
    observable_files:
    - observables.tsv
    mapping_file: mappings.tsv # optional 
    models:
      id_for_model1:
        location: model1.xml
        language: sbml
  ```



* Changes to the format of existing tables/files:
  * Condition/Observable/Parameter Table
    All symbols that previously referenced the ID of SBML entities, such as parameter IDs or compartment IDs, now refer to (globally unique) named entities in the model, such as parameters, observables, or expressions. For example, condition table columns may correspond to parameters, states, or species of the referenced model.
    For species, assignments in the condition table set the initial value at the beginning of the simulation for that condition, potentially replacing the initialization from preequilibration. For all other entities, values are statically replaced at all time points. For entities that assign values to other entities, such as SBML AssignmentRules, the value of the target of that rule is statically replaced at all time points.
* Additional files
  * Mapping Table: 
    Maps PEtab entity IDs to entity IDs in the model. This optional file may be used to reference model entities in PEtab files where the ID in the model would not be a valid identifier in PEtab (e.g., because it contains blanks, dots, or other special characters).
    The TSV file has two mandatory columns: `petabEntityId` and `modelEntityId`. Additional columns are allowed. Each `modelEntityId` must be a unique identifier in the model. The mapping table must not map a `modelEntityId` to a `petabEntityId` that is also defined in any other part of the PEtab problem. A `modelEntityId` may not refer to other `petabEntityId`s, including those defined in the mapping table. A `petabEntityId` defined in the mapping table may be referenced in condition, measurement, parameter and observable tables, but cannot be referenced in the model itself.
    For example, in SBML, local parameters may be referenced as `$reactionId.$localParameterId`, which are not valid PEtab IDs as they contain a `.` character. Similarly, this table may be used to reference specific species in a BNGL model that may contain many unsupported characters such as `,`, `(` or `.`. However, please note that IDs must exactly match the species names in the BNGL-generated network file and no pattern matching will be performed.
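
The YAML change listed under "Proposed changes" (replacing `sbml_files` with an ID-keyed mapping of `location` and `language`) could be automated roughly as follows. This is a minimal sketch operating on an already-parsed YAML dictionary, not an official converter: the function names and the rule for deriving a model ID from the file name are hypothetical. Since the proposal text uses the key `models` while the schema in this commit uses `model_files`, the key name is left as a parameter.

```python
import re
from pathlib import PurePosixPath


def model_id_from_path(path: str) -> str:
    """Derive an ID matching ^[a-zA-Z_]\\w*$ from a file name (hypothetical rule)."""
    stem = PurePosixPath(path).stem
    # Replace every character outside [A-Za-z0-9_] with an underscore.
    candidate = re.sub(r"\W", "_", stem, flags=re.ASCII)
    # IDs must not start with a digit (or be empty), so prefix an underscore.
    if not re.match(r"[a-zA-Z_]", candidate):
        candidate = "_" + candidate
    return candidate


def upgrade_problem(problem: dict, models_key: str = "models") -> dict:
    """Return a copy of one `problems` entry with `sbml_files` converted."""
    upgraded = {k: v for k, v in problem.items() if k != "sbml_files"}
    models = {}
    for path in problem.get("sbml_files", []):
        model_id = model_id_from_path(path)
        if model_id in models:
            raise ValueError(f"duplicate model ID {model_id!r} derived from {path!r}")
        # Version 1 only allowed SBML, so the language is known.
        models[model_id] = {"location": path, "language": "sbml"}
    upgraded[models_key] = models
    return upgraded


def upgrade_config(config: dict, models_key: str = "models") -> dict:
    """Return a version-2-style copy of a parsed version-1 PEtab YAML file."""
    upgraded = dict(config)
    upgraded["format_version"] = "2.0.0"
    upgraded["problems"] = [
        upgrade_problem(problem, models_key) for problem in config["problems"]
    ]
    return upgraded
```

Calling `upgrade_config(config, models_key="model_files")` would produce the key name used by the schema instead. The input dictionary is left unmodified.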

# Implications

* Tools need to check the model format and provide an informative message if the given format cannot be handled
* Validators will skip model-dependent validation when encountering unknown model types. Ideally, a plugin mechanism would allow such validation to be added.
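
The two implications above could look roughly like this in a tool. This is only a sketch under stated assumptions: the exception class, both function names, and the idea of a validator registry keyed by language are hypothetical and do not describe any existing PEtab library API.

```python
from typing import Callable, Dict, List, Set


class UnsupportedModelLanguageError(ValueError):
    """Raised when a tool cannot handle the language of a referenced model."""


def check_model_languages(models: Dict[str, dict], supported: Set[str]) -> None:
    """Fail early, with an informative message, on unsupported languages."""
    for model_id, entry in models.items():
        language = entry["language"]
        if language not in supported:
            raise UnsupportedModelLanguageError(
                f"Model {model_id!r} ({entry['location']}) uses language "
                f"{language!r}, which this tool cannot handle. "
                f"Supported languages: {', '.join(sorted(supported))}."
            )


def validate_models(
    models: Dict[str, dict],
    validators: Dict[str, Callable[[str], None]],
) -> List[str]:
    """Run model-dependent validation where a validator is registered.

    Returns the IDs of models whose validation was skipped because no
    validator is registered for their language (hypothetical plugin registry).
    """
    skipped = []
    for model_id, entry in models.items():
        validator = validators.get(entry["language"])
        if validator is None:
            skipped.append(model_id)
        else:
            validator(entry["location"])
    return skipped
```

A simulation tool would call `check_model_languages` before loading anything, whereas a validator would call `validate_models` and report the skipped model IDs instead of failing.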

--- 

Co-authored by @FFroehlich @fbergmann. Also thanks to everybody participating in these discussions during the last COMBINE meeting.

---------



Co-authored-by: FFroehlich <[email protected]>
Co-authored-by: Dilan Pathirana <[email protected]>
Co-authored-by: Frank T. Bergmann <[email protected]>
4 people committed Jul 3, 2024
1 parent 80b1f35 commit 298ed8b
Showing 2 changed files with 124 additions and 44 deletions.
37 changes: 29 additions & 8 deletions doc/_static/petab_schema.yaml
@@ -38,13 +38,26 @@ properties:
files and optional visualization files.
properties:

sbml_files:
type: array
description: List of PEtab SBML files.

items:
type: string
description: PEtab SBML file name or URL.
model_files:
type: object
description: One or multiple models

# the model ID
patternProperties:
"^[a-zA-Z_]\\w*$":
type: object
properties:
location:
type: string
description: Model file name or URL
language:
type: string
description: |
Model language, e.g., 'sbml', 'cellml', 'bngl', 'pysb'
required:
- location
- language
additionalProperties: false

measurement_files:
type: array
@@ -78,8 +91,16 @@ properties:
type: string
description: PEtab visualization file name or URL.

mapping_files:
type: array
description: List of PEtab mapping files.

items:
type: string
description: PEtab mapping file name or URL.

required:
- sbml_files
- model_files
- observable_files
- measurement_files
- condition_files
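
The new `model_files` schema fragment can be mirrored in a few lines of plain Python, which may help when checking a problem file without a JSON Schema validator. This is a minimal sketch, not the reference validation: `check_model_files` is a hypothetical name, and it assumes that `additionalProperties: false` applies to the `model_files` object itself, so that keys not matching the ID pattern are rejected.

```python
import re

# Same pattern as in the schema; re.ASCII restricts \w to [A-Za-z0-9_].
MODEL_ID = re.compile(r"[a-zA-Z_]\w*", re.ASCII)
REQUIRED_KEYS = ("location", "language")


def check_model_files(model_files: object) -> list:
    """Return a list of human-readable problems (empty if none were found)."""
    if not isinstance(model_files, dict):
        return ["model_files must be a mapping of model IDs to model entries"]

    problems = []
    for model_id, entry in model_files.items():
        if not isinstance(model_id, str) or not MODEL_ID.fullmatch(model_id):
            problems.append(f"invalid model ID: {model_id!r}")
            continue
        if not isinstance(entry, dict):
            problems.append(f"{model_id}: entry must be a mapping")
            continue
        for key in REQUIRED_KEYS:
            # Both keys are required and must be strings.
            if not isinstance(entry.get(key), str):
                problems.append(f"{model_id}: '{key}' is required and must be a string")
    return problems
```

For example, `check_model_files({"1model": {"location": "m.xml", "language": "sbml"}})` reports one problem, because the ID starts with a digit.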
131 changes: 95 additions & 36 deletions doc/documentation_data_format.rst
@@ -2,7 +2,7 @@ PEtab data format specification
===============================


Format version: 1
Format version: 2.0.0

This document explains the PEtab data format.

@@ -41,12 +41,11 @@ Overview
---------

The PEtab data format specifies a parameter estimation problem using a number
of text-based files (`Systems Biology Markup Language (SBML) <http://sbml.org>`_
and
of text-based files (
`Tab-Separated Values (TSV) <https://www.iana.org/assignments/media-types/text/tab-separated-values>`_)
(Figure 2), i.e.

- An SBML model [SBML]
- A model

- A measurement file to fit the model to [TSV]

@@ -67,6 +66,9 @@ and
- (optional) A visualization file, which contains specifications how the data
and/or simulations should be plotted by the visualization routines [TSV]

- (optional) A mapping file, which allows mapping PEtab entity IDs to entity
IDs in the model, which might not have valid PEtab IDs themselves [TSV]

.. figure:: gfx/petab_files.png
:alt: Files constituting a PEtab problem

@@ -91,11 +93,11 @@ problem as such.
- Fields in "[]" are optional and may be left empty.


SBML model definition
---------------------

The model must be specified as valid SBML. There are no further restrictions.
Model definition
----------------

PEtab 2.0.0 is agnostic of specific model formats. A model file is referenced
in the PEtab problem description (YAML) via its file name or a URL.

Condition table
---------------
@@ -107,7 +109,7 @@ different experimental conditions).
This is specified as a tab-separated value file in the following way:

+--------------+------------------+------------------------------------+-----+---------------------------------------+
| conditionId | [conditionName] | parameterOrSpeciesOrCompartmentId1 | ... | parameterOrSpeciesOrCompartmentId${n} |
| conditionId | [conditionName] | modelEntityId1 | ... | modelEntityId${n} |
+==============+==================+====================================+=====+=======================================+
| STRING | [STRING] | NUMERIC\|STRING | ... | NUMERIC\|STRING |
+--------------+------------------+------------------------------------+-----+---------------------------------------+
@@ -140,32 +142,44 @@ Detailed field description
Condition names are arbitrary strings to describe the given condition.
They may be used for reporting or visualization.

- ``${parameterOrSpeciesOrCompartmentId1}``

Further columns may be global parameter IDs, IDs of species or compartments
as defined in the SBML model. Only one column is allowed per ID.
Values for these condition parameters may be provided either as numeric
values, or as IDs defined in the SBML model, the parameter table or both.

- ``${parameterId}``

The values will override any parameter values specified in the model.

- ``${speciesId}``

If a species ID is provided, it is interpreted as the initial
condition of that species (as amount if `hasOnlySubstanceUnits` is set to `True`
for the respective species, as concentration otherwise) and will override the
initial condition given in the SBML model or given by a preequilibration
condition. If no value is provided for a condition, the result of the
preequilibration (or initial condition from the SBML model, if
no preequilibration is defined) is used.

- ``${compartmentId}``

If a compartment ID is provided, it is interpreted as the initial
compartment size.

- ``${modelEntityId}``

Further columns may be the IDs of model entities that have globally unique
IDs, such as parameters, species or compartments defined in the model to set
condition-specific values. Only one column is allowed per ID.
Values for these entities may be provided either as numeric values, or as IDs
of globally unique entity IDs as defined in the model, the mapping table or
the parameter table.

Any non-``NaN`` value will override the original values of the model, or if
preequilibration was used, they will override the value obtained from
preequilibration. A ``NaN`` value indicates that the original value of the
model is to be used (when used in the preequilibration condition, or in the
simulation condition if no preequilibration is used) or that the result of
preequilibration is to be used (when used in the simulation condition after
preequilibration).

The value in the condition table either replaces the initial value or the
value at all timepoints based on whether the model entity has a rate law
assigned or not:

* For model entities that have constant algebraic assignments
(but not necessarily constant values), i.e., that do not have a rate of
change with respect to time assigned and that are not subject to event
assignments, the algebraic assignment is replaced statically at all
timepoints. Examples for such model entities are the targets of SBML
`AssignmentRules`.

* For all other entities, e.g., those that are assigned by SBML `RateRules`,
only the initial value can be assigned in the condition table. If an
assignment of the rate of change with respect to time or event assignment
is desired, the values of model entities that are used to define rate of
change or event assignments must be assigned in the condition table.
If no such model entities exist, assignment is not possible.

If the model has a concept of species and a species ID is provided, its
value is interpreted as amount or concentration in the same way as anywhere
else in the model.
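
The ``NaN`` rule for condition table values described above amounts to a simple precedence order. A minimal sketch, assuming purely numeric entries (string entries that reference other IDs are out of scope here); `resolve_value` and its parameter names are hypothetical:

```python
import math
from typing import Optional


def resolve_value(
    condition_value: float,
    model_value: float,
    preeq_result: Optional[float] = None,
) -> float:
    """Pick the value a model entity starts a simulation with.

    condition_value: entry from the condition table (may be NaN)
    model_value:     original value defined in the model
    preeq_result:    value obtained from preequilibration, or None if no
                     preequilibration precedes this condition
    """
    # Any non-NaN condition value wins over both other sources.
    if not math.isnan(condition_value):
        return condition_value
    # NaN after preequilibration: keep the preequilibration result.
    if preeq_result is not None:
        return preeq_result
    # NaN without preequilibration: fall back to the model's own value.
    return model_value
```

For a preequilibration condition itself, `preeq_result` would be left as `None`, so a ``NaN`` entry falls back to the model value.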

Measurement table
-----------------
@@ -705,6 +719,49 @@ Detailed field description
legend and which defaults to the value in ``datasetId``.


Mapping table
-------------

Mapping PEtab entity IDs to entity IDs in the model. This optional file may be
used to reference model entities in PEtab files where the ID in the model would
not be a valid identifier in PEtab (e.g., due to inclusion of blanks, dots, or
other special characters).

The TSV file has two mandatory columns, ``petabEntityId`` and
``modelEntityId``. Additional columns are allowed.

+---------------+---------------+
| petabEntityId | modelEntityId |
+===============+===============+
| STRING | STRING |
+---------------+---------------+
| reaction1_k1 | reaction1.k1 |
+---------------+---------------+


Detailed field description
~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``petabEntityId`` [STRING, NOT NULL]

A valid PEtab identifier that is not defined in any other part of the PEtab
problem. This identifier may be referenced in condition, measurement,
parameter and observable tables, but cannot be referenced in the model
itself.

- ``modelEntityId`` [STRING, NOT NULL]

A globally unique identifier defined in the model,
*that is not a valid PEtab ID* (see :ref:`identifiers`).

For example, in SBML, local parameters may be referenced as
``$reactionId.$localParameterId``, which are not valid PEtab IDs as they
contain a ``.`` character. Similarly, this table may be used to reference
specific species in a BNGL model that may contain many unsupported
characters such as ``,``, ``(`` or ``.``. However, please note that IDs must
exactly match the species names in the BNGL-generated network file, and no
pattern matching will be performed.
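
Some of the mapping table rules above can be checked without access to the model. The following is a sketch only: `check_mapping_table` is a hypothetical helper, and it assumes that valid PEtab IDs follow the same pattern the schema uses for model IDs (``^[a-zA-Z_]\w*$``). Whether each ``modelEntityId`` actually exists in the model, and is unique there, cannot be verified at this level.

```python
import csv
import io
import re

# Assumed PEtab identifier pattern (the regex the schema uses for model IDs).
PETAB_ID = re.compile(r"[a-zA-Z_]\w*", re.ASCII)
REQUIRED_COLUMNS = ("petabEntityId", "modelEntityId")


def check_mapping_table(tsv_text: str, other_petab_ids: set) -> list:
    """Return a list of problems found in a mapping table given as TSV text.

    other_petab_ids: IDs defined in other parts of the PEtab problem.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing mandatory column(s): " + ", ".join(missing)]

    rows = list(reader)
    mapped_ids = {row["petabEntityId"] for row in rows}
    problems = []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        petab_id = row["petabEntityId"] or ""
        model_id = row["modelEntityId"] or ""
        if not PETAB_ID.fullmatch(petab_id):
            problems.append(f"line {line_no}: {petab_id!r} is not a valid PEtab ID")
        if petab_id in other_petab_ids:
            problems.append(f"line {line_no}: {petab_id!r} is already defined elsewhere")
        if not model_id:
            problems.append(f"line {line_no}: modelEntityId must not be empty")
        elif model_id in mapped_ids:
            # A modelEntityId may not refer to a petabEntityId of this table.
            problems.append(
                f"line {line_no}: modelEntityId {model_id!r} refers to a petabEntityId"
            )
    return problems
```

Additional columns are ignored by this check, in line with the rule that they are allowed.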

Extensions
~~~~~~~~~~

@@ -743,7 +800,7 @@ Parameter estimation problems combining multiple models
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Parameter estimation problems can comprise multiple models. For now, PEtab
allows to specify multiple SBML models with corresponding condition and
allows one to specify multiple models with corresponding condition and
measurement tables, and one joint parameter table. This means that the parameter
namespace is global. Therefore, parameters with the same ID in different models
will be considered identical.
@@ -1070,6 +1127,8 @@ float values are demoted to boolean values. For example, in ``1 + true``,
the expression is interpreted as ``true && true = true``.


.. _identifiers:

Identifiers
-----------
