Metabolomics hackathon: MS2 spectra matching for metabolite identification #13

mmattano · 2022-10-11T16:15:50Z

MS2 spectra matching for metabolite identification

Abstract

One major open topic in untargeted metabolomics is the identification of unknown compounds from mass spectra. Because MS1 comparisons can be ambiguous (especially for small molecules), we need to look at MS2 spectra and compare them to public MS2 databases in order to differentiate compounds in the same mass range.
Currently, the best-performing methods for compound identification are GNPS and SIRIUS. They provide the user with a list of potential compounds, but in some cases the uncertainty is very high or multiple candidates are suggested, which makes downstream analysis labor-intensive. GNPS improves its predictions by using molecular networks and taking biological information into account. SIRIUS improves its predictions by comparing the structural similarity of the compounds.
We would like to set up a novel system with modular parts that can be tested separately. Each part of the pipeline can be improved or modified individually, and multiple methods can be combined as an ensemble. Such a system can also serve as a benchmark of existing scoring and matching functions and as a testing playground for novel ideas.

Project Plan

The general purpose is to have an (automated) workflow for MS2 spectra matching that does not rely on cosine similarity scoring alone. Specifically, we would like to:

Automatically clean MS2 spectra:
- pick the MS2 spectrum with the highest precursor ion intensity or the highest total ion current for each LC-MS feature;
- remove peaks with relative intensities below 0.01 compared to the highest-intensity peak;
- remove peaks outside a set m/z window;
- remove peaks outside a set intensity window;
- (optional) calculate neutral losses within each MS2 spectrum and compare them across multiple spectra to identify similar functional groups (that might have been lost).
Optimize the pipeline with additional steps (e.g., QC points, interactive visualization).
Optimize scoring. As a starting point, commonly used scoring functions can be compared, e.g., a normalized dot product (cosine score), spec2vec (inspired by Word2Vec) – structure similarity, …
Rank the identification results. We do not want to rely on the score alone, as it can be misleading. One way would be to include information that helps us exclude, or lower the weight of, unlikely candidates.
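
The cleaning steps listed in the plan could be sketched roughly as follows. This is a minimal standard-library illustration, not the project's implementation: the `Spectrum` container, the function names, and the default m/z and intensity windows are assumptions made for the example; only the 0.01 relative-intensity cutoff comes from the plan. In practice, the filters that ship with matchms would likely be used instead.

```python
from dataclasses import dataclass


@dataclass
class Spectrum:
    """Hypothetical container for one MS2 scan (not a matchms class)."""
    feature_id: str
    precursor_mz: float
    peaks: list  # list of (mz, intensity) tuples


def pick_representative(spectra):
    """Keep, for each LC-MS feature, the scan with the highest total ion current."""
    best = {}
    for spec in spectra:
        tic = sum(intensity for _, intensity in spec.peaks)
        current = best.get(spec.feature_id)
        if current is None or tic > current[0]:
            best[spec.feature_id] = (tic, spec)
    return [spec for _, spec in best.values()]


def clean_peaks(peaks, min_rel_intensity=0.01, mz_window=(50.0, 1000.0),
                intensity_window=(0.0, float("inf"))):
    """Drop peaks below the relative cutoff or outside the m/z / intensity windows.

    The relative intensity is computed against the highest peak of the
    unfiltered spectrum. Both window defaults are assumed values.
    """
    if not peaks:
        return []
    base = max(intensity for _, intensity in peaks)
    if base <= 0:
        return []
    mz_low, mz_high = mz_window
    int_low, int_high = intensity_window
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if intensity / base >= min_rel_intensity
        and mz_low <= mz <= mz_high
        and int_low <= intensity <= int_high
    ]


def neutral_losses(precursor_mz, peaks):
    """Mass differences between the precursor and every fragment below it."""
    return [round(precursor_mz - mz, 4) for mz, _ in peaks if mz < precursor_mz]
```

Neutral-loss lists computed this way could then be compared across spectra, for instance by counting shared values within a tolerance, to flag spectra that may have lost the same functional group.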

Technical Details

Main language: Python

Packages:
- matchms (cosine score, modified cosine score)
- spec2vec
Workflow includes: GNPS, SIRIUS
GitHub
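
For orientation, the cosine score mentioned for matchms can be illustrated with a small standalone sketch. This is a simplified greedy variant written only with the standard library; the function name `cosine_greedy`, the 0.01 m/z tolerance, and the peak-list format are assumptions for the example. matchms provides its own implementations (such as `CosineGreedy` and `ModifiedCosine`), which should be preferred in the actual workflow.

```python
import math


def cosine_greedy(peaks_a, peaks_b, tolerance=0.01):
    """Greedy cosine score between two lists of (mz, intensity) tuples.

    Returns (score, number_of_matched_peaks).
    """
    # Collect every pair of peaks whose m/z values agree within the tolerance.
    candidates = []
    for i, (mz_a, int_a) in enumerate(peaks_a):
        for j, (mz_b, int_b) in enumerate(peaks_b):
            if abs(mz_a - mz_b) <= tolerance:
                candidates.append((int_a * int_b, i, j))

    # Assign pairs greedily, largest intensity product first; use each peak once.
    candidates.sort(reverse=True)
    used_a, used_b = set(), set()
    dot = 0.0
    n_matches = 0
    for product, i, j in candidates:
        if i in used_a or j in used_b:
            continue
        used_a.add(i)
        used_b.add(j)
        dot += product
        n_matches += 1

    # Normalize by the Euclidean norms of all intensities in each spectrum.
    norm_a = math.sqrt(sum(x * x for _, x in peaks_a))
    norm_b = math.sqrt(sum(x * x for _, x in peaks_b))
    if norm_a == 0 or norm_b == 0:
        return 0.0, 0
    return dot / (norm_a * norm_b), n_matches
```

A query spectrum would typically be scored against every library entry this way, with the number of matched peaks reported alongside the score, since a high score based on a single shared peak is weak evidence.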

Contact Information

Members of the metabolomics research group led by Thomas Moritz at the NNF Center for Basic Metabolic Research, Faculty of Health Research, University of Copenhagen

Matthias Mattanovich ([email protected])
Muyao Xi ([email protected])
Lawrence Egyir ([email protected])

tobiasko · 2022-11-08T08:42:14Z

Dear @mmattano,

I am happy to inform you that your proposal has been selected for the DevMeeting2023! Participants will decide which hackathon to join after the pitch on Monday.

Best,
Tobi

tobiasko · 2022-11-08T08:44:27Z

Maybe Muyao and Lawrence could leave a short comment here so they also become participants in this issue! THX!

lawtrea · 2022-11-08T20:52:01Z

Thank you @tobiasko

MuyaoXi9271 · 2022-11-09T08:26:47Z

Thanks @tobiasko
Please add me in👍

tobiasko · 2022-12-21T15:02:21Z

Hello everyone,

I just created a Slack workspace for the DevMeeting and a channel named metabolomics for this hack. You should receive an invite to join by email.

Best,
Tobi

mmattano · 2023-01-25T16:14:07Z

Summary paragraph

During the metabolomics-related hackathon, spectral similarity scoring was explored. To identify a metabolite from an MS1 or MS2 spectrum, different scores are applied to match the spectrum in question to a database entry or, more commonly, to an entry in an in-house library. Currently, the cosine similarity score is the one most frequently used in the field. Here, we set up a pipeline to compare multiple ways of scoring spectral similarity, along with an array of variants and of their respective input parameters. The data, which were prepared specifically for the hackathon, also allowed for statistics on false positives, false negatives, etc. Furthermore, we set up systems to test the robustness of these scores to intensity perturbations, which are very common when dealing with biological samples, and tested for a possible correlation between structural and spectral similarity.
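
One way such a robustness test could look is sketched below. The summary does not describe the perturbation scheme that was used, so everything here is an assumption for illustration: the uniform multiplicative noise model, the function names, and the choice to perturb intensities only while keeping m/z values fixed (which lets the cosine score be computed on aligned intensity vectors).

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two aligned intensity vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def perturb(intensities, rel_noise, rng):
    """Scale each intensity by a random factor in [1 - rel_noise, 1 + rel_noise]."""
    return [x * (1.0 + rng.uniform(-rel_noise, rel_noise)) for x in intensities]


def robustness_profile(intensities, noise_levels, n_repeats=100, seed=0):
    """Mean cosine score between a spectrum and its perturbed copies, per noise level."""
    rng = random.Random(seed)  # fixed seed so repeated runs are comparable
    profile = {}
    for level in noise_levels:
        scores = [
            cosine(intensities, perturb(intensities, level, rng))
            for _ in range(n_repeats)
        ]
        profile[level] = sum(scores) / len(scores)
    return profile


# Example: how quickly does the score degrade as the noise level grows?
profile = robustness_profile([100.0, 50.0, 10.0, 5.0], [0.0, 0.2, 0.8])
```

Swapping `cosine` for another scoring function would let the same profile be computed for each candidate score, making their sensitivity to intensity noise directly comparable.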

tobiasko added the selected label Nov 8, 2022
