Encourage the use of COMBINE archives as exchange format for the model and its execution #18

Open
draeger opened this issue Aug 13, 2020 · 6 comments
Labels
modelFormat Requests for model file formats

Comments

@draeger
Collaborator

draeger commented Aug 13, 2020

Description of the issue:

SBML is a declarative file format that specifies model components, their structure, and their interactions. However, it does not specify how to run the model or how to reproduce the figures of a scientific paper from it. Depending on which solver is used to run a model, or in which framework the model is interpreted, the results may diverge.

By using the additional format SED-ML (Simulation Experiment Description Markup Language), it becomes possible to specify how to interpret and run a model, including the typical steps in a simulation life cycle.

To make the use of two separate files less cumbersome for the user, the COMBINE archive format wraps both in a ZIP-based archive, together with a manifest file that specifies the relationship between the model and the SED-ML script. Further data can be added to the archive, e.g., annotation glossaries, original publications, image files with pathways, or SBGN-ML files that define pathway maps.

Expected feature/value/output:

Instead of SBML, the exchange format of COBRA tools would become a COMBINE archive file (typically with the extension .omex). It would contain the SBML file with the model, possibly annotations in a separate file, a SED-ML file that specifies how to execute the model, and perhaps more.

Current feature/value/output:

The steps to run the model would be encoded in the SED-ML file allowing third-party software to execute the same steps, hence improving the interoperability of various software and reproducibility of the results.

Reproducing these results:

There are implementations available in Python and other languages to access content within COMBINE archives and to read/write the manifest file.
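As a rough illustration of what such access involves, here is a minimal sketch that uses only the Python standard library rather than any of the dedicated COMBINE libraries. It assumes the manifest is stored as manifest.xml at the archive root and uses the OMEX manifest namespace; the helper name `list_manifest_entries` is hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the OMEX manifest (assumed here to match the archive being read).
OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"


def list_manifest_entries(archive):
    """Return (location, format, is_master) tuples from an archive's manifest.

    `archive` may be a path or a binary file-like object.
    This is a hypothetical helper, not part of any COMBINE library.
    """
    with zipfile.ZipFile(archive) as zf:
        # The manifest is expected at the root of the ZIP container.
        root = ET.fromstring(zf.read("manifest.xml"))

    entries = []
    for content in root.findall(f"{{{OMEX_NS}}}content"):
        entries.append(
            (
                content.get("location"),
                content.get("format"),
                # The "master" attribute is optional and defaults to false.
                content.get("master", "false") == "true",
            )
        )
    return entries
```

A caller could, for example, pick out the entry flagged as master to find the SED-ML file that drives the simulation.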

@haowang-bioinfo haowang-bioinfo added the modelFormat Requests for model file formats label Aug 14, 2020
@mihai-sysbio
Member

mihai-sysbio commented Aug 14, 2020

Interesting idea @draeger.

As a concept, a COMBINE archive is a great step forward in solving problems in modelling. However, being a ZIP file limits what it can achieve compared to versioning (git) and infrastructure (GitHub). I would see some advantages if there were a way to combine (no pun intended) the two approaches.

For situations like these, I default to the six thinking hats method. It's easier in person, but in my experience it works well in writing too.

White hat - facts:

  • the COMBINE archive is a ZIP
  • a COMBINE archive needs hosting
  • the SBML format is XML based
  • the SED-ML format is XML based
  • XML formats can be versioned by git
  • the SBML format is a requirement of standard-GEM
  • a release on GitHub is a zip (of the repository state)
  • hosting a release on GitHub is free

Red hat - emotion:

  • standard-GEM is very lightweight, the addition of COMBINE might be too much

Black hat - judgement:

  • git is not meant for versioning binaries (zip)
  • git LFS can version binaries but it adds requirements/complexity
  • releases on GitHub are not permanent (but with Zenodo they could be; see "Require tagged releases to Zenodo", #14)

Contributions are needed; it would be great if you could label ideas with a hat color, too.

@Midnighter
Collaborator

You can create additional artefacts that can become part of a release. I could envision each release (tag that is also on Zenodo then) to provide the following separately:

  • Stand alone model as SBML (plus whatever formats are desired)
  • COMBINE archive that wraps the model, key data (such as growth and essentiality), and instructions for reproducing key simulation results
  • A zip of the repository state at release

@mihai-sysbio
Member

Green hat - possibilities (building on what @Midnighter described above):

  • presently, the standard requires the use of a model/ folder as the location for various model files, including SBML, but this could be expanded to include other files that would normally belong in a COMBINE archive, particularly SED-ML
  • when creating a new release on GitHub, a COMBINE archive could be supplied as an additional artifact, which would be automatically published on Zenodo

@mihai-sysbio
Member

Looking at the contents of the COMBINE archive (section 3.3), Table 1 in the showcase and the example repository, the archive consists of:

  1. manifest.xml
    This file essentially lists the file tree together with the file formats. standard-GEM already imposes requirements on the main directories, extensions, and some file names, so adopting a similar manifest in standard-GEM would be redundant.
  2. authorship information
    In any git-based versioning system, this information is provided by the author or committer fields, and it is deeply embedded in platforms such as GitHub. Moreover, as models are curated over time, a plain list of authors/contributors is not rich enough to link people to their actual contributions (commits).
  3. fixed file tree
    There is some overlap here, and we should aim to increase compatibility where possible. The directories specified by COMBINE are:
    3.1 documentation/ : files that describe and document the model and/or experiment
    In standard-GEM, documentation is kept close to the element it documents, i.e., within the data/ and code/ folders.
    3.2 model/ : files that encode and visualise the biological system
    Essentially the same approach here.
    3.3 experiment/ : files that encode the in silico setup of the experiment
    3.4 result/ : files that result from running the experiment

As mentioned in the previous post, I think something should be done regarding items 3.3 and 3.4. @yahanma has taken a similar approach by creating an analysis/ directory over at vna-GEM.

As a follow-up on the idea of automatically creating COMBINE archives: work in this direction seems to have already started with CombineArchiveWeb, where instead of uploading file by file, one could point directly to a repository that follows standard-GEM.

@mihai-sysbio
Member

mihai-sysbio commented Jul 28, 2021

Following up on the CombineArchiveWeb idea, it looks like it is possible to create archives from a Git repository:


[Two screenshots]

Here is what I think would need to be done in order to close the issue:

  • create an empty folder called analysis with a Readme saying that the folder is meant to contain experiments and results
  • add a reference (can) to .standard-GEM.md in the Releases section to encourage attaching a COMBINE archive to a release

@draeger what else would you recommend so this issue can be resolved?
Are there any thoughts from the watchers of this issue?

@draeger
Collaborator Author

draeger commented Aug 17, 2021

I think this is very nice. Have you tried it out? A build script could possibly also wrap a set of files in an archive and write the manifest file during local execution. A web service can certainly do the same, although it requires transmitting the data.
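A local build script of that kind might look roughly like the following sketch, which uses only the Python standard library. Several things here are assumptions made for illustration: the function name `build_archive`, the extension-to-format mapping (in particular, treating every .xml file as SBML), and the fallback format identifier. A real script would need a mapping that matches the repository's actual contents.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

OMEX_NS = "http://identifiers.org/combine.specifications/omex-manifest"
SPEC = "http://identifiers.org/combine.specifications/"

# Assumed mapping from file extension to format identifier (illustrative only;
# not every .xml file in a repository is necessarily SBML).
FORMATS = {".xml": SPEC + "sbml", ".sedml": SPEC + "sed-ml"}
# Assumed fallback for any other file type.
DEFAULT_FORMAT = "http://purl.org/NET/mediatypes/application/octet-stream"


def build_archive(source_dir, archive_path, master=None):
    """Wrap all files under source_dir in a ZIP archive with a manifest.xml.

    `master` is an optional POSIX-style path relative to source_dir
    (e.g. "analysis/growth.sedml") that is flagged as the master file.
    Hypothetical helper; a manifest.xml already present in source_dir
    is not handled.
    """
    source_dir = Path(source_dir)
    # Collect files before writing, so the new archive is never included.
    files = sorted(p for p in source_dir.rglob("*") if p.is_file())

    ET.register_namespace("", OMEX_NS)
    tag = f"{{{OMEX_NS}}}content"
    root = ET.Element(f"{{{OMEX_NS}}}omexManifest")
    # The archive itself and the manifest are listed first.
    ET.SubElement(root, tag, {"location": ".", "format": SPEC + "omex"})
    ET.SubElement(
        root, tag, {"location": "./manifest.xml", "format": SPEC + "omex-manifest"}
    )
    for path in files:
        rel = path.relative_to(source_dir).as_posix()
        attrs = {
            "location": f"./{rel}",
            "format": FORMATS.get(path.suffix.lower(), DEFAULT_FORMAT),
        }
        if rel == master:
            attrs["master"] = "true"
        ET.SubElement(root, tag, attrs)

    manifest = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("manifest.xml", manifest)
        for path in files:
            zf.write(path, path.relative_to(source_dir).as_posix())
    return archive_path
```

Run locally, for instance as `build_archive("path/to/repo", "model.omex", master="analysis/growth.sedml")`, this keeps all data on the user's machine, and the resulting file could then be attached to a release.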


4 participants