Add documentation.
LLautenbacher committed Oct 21, 2024
1 parent 5c02d14 commit a8907c9
Showing 2 changed files with 68 additions and 0 deletions.
34 changes: 34 additions & 0 deletions models/3dmolms/3dmolms_orbitrap/notes.yaml
@@ -0,0 +1,34 @@
description: |
  The 3DMolMS model is a deep neural network that predicts tandem mass spectrometry (MS/MS) spectra from the 3D conformations of chemical compounds. The goal of the model is to improve compound identification in untargeted metabolomics.
  The model was trained using approximately 70,000 MS/MS spectra from the Agilent Personal Compound Database and Library (PCDL) and the NIST20 spectral library. These libraries were chosen because they contain spectra from a large number of compounds acquired using similar instruments and settings. The model was then tested on spectra from the same libraries, as well as on an independent test set from the MassBank of North America (MoNA) library. The MoNA library contains spectra acquired from a variety of Q-TOF instruments, allowing for evaluation of the model’s generalizability to different instruments.
  The 3DMolMS model consists of an encoder and a decoder. The encoder takes as input a point set representation of the 3D conformation of a compound. This point set is built by generating a 3D conformer of the compound with the ETKDG algorithm and then encoding each atom in the conformer as a point. Each point is represented by a 21-dimensional vector, which includes the x, y, and z coordinates of the atom as well as other attributes, such as the atom type, the number of neighbors, and the atomic mass. The encoder then uses six 3DMolConv-based hidden layers to extract features from the input point set. The decoder takes the output of the encoder and uses five fully connected layers to predict the MS/MS spectrum of the compound.
  The performance of the 3DMolMS model was evaluated using the cosine similarity between the predicted and experimental spectra. The model achieved high cosine similarities for both the positive and negative ion modes, indicating that it can accurately predict MS/MS spectra. Notably, the model outperformed existing spectrum prediction algorithms, including CFM-ID, NEIMS, and MassFormer. In addition, the molecular representation learned by 3DMolMS was successfully transferred to predict other chemical properties, such as the elution time in liquid chromatography and the collision cross section in ion mobility spectrometry.
citation: |
  3DMolMS: prediction of tandem mass spectra from 3D molecular conformations,
  Yuhui Hong, Sujun Li, Christopher J Welch, Shane Tichy, Yuzhen Ye, Haixu Tang,
  Bioinformatics, Volume 39, Issue 6, June 2023, btad354, https://doi.org/10.1093/bioinformatics/btad354
tag: "Metabolomics fragment intensity"
examples:
  inputs:
    [
      {
        "name": "smiles",
        "httpdtype": "BYTES",
        "data": '["CN1C=NC2=C1C(=O)N(C(=O)N2C)C"]',
        "shape": "[1,1]"
      },
      {
        "name": "precursor_types",
        "httpdtype": "BYTES",
        "data": '["[M+H]+"]',
        "shape": "[1,1]"
      },
      {
        "name": "collision_energies",
        "httpdtype": "FP32",
        "data": '[20]',
        "shape": "[1,1]"
      }
    ]
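
The example inputs above store `data` and `shape` as JSON-encoded strings. As a rough illustration of how such an entry might be turned into an inference request body, the Python sketch below decodes those strings and assembles a dictionary in the style of the KServe v2 (Triton) HTTP protocol. The mapping of `httpdtype` to the protocol's `datatype` field, the helper name `build_infer_body`, and the request `id` are assumptions made for illustration; nothing in this commit confirms how the notes file is consumed, and no server endpoint is contacted here.

```python
import json

# Example inputs copied from notes.yaml. There, "data" and "shape" are strings
# that themselves contain JSON, so they need to be decoded before use.
EXAMPLE_INPUTS = [
    {"name": "smiles", "httpdtype": "BYTES",
     "data": '["CN1C=NC2=C1C(=O)N(C(=O)N2C)C"]', "shape": "[1,1]"},
    {"name": "precursor_types", "httpdtype": "BYTES",
     "data": '["[M+H]+"]', "shape": "[1,1]"},
    {"name": "collision_energies", "httpdtype": "FP32",
     "data": "[20]", "shape": "[1,1]"},
]


def build_infer_body(example_inputs, request_id="0"):
    """Assemble a KServe-v2-style request body (hypothetical helper).

    Assumption: the "httpdtype" key in notes.yaml corresponds to the
    "datatype" field of the inference protocol.
    """
    inputs = []
    for entry in example_inputs:
        inputs.append({
            "name": entry["name"],
            "datatype": entry["httpdtype"],
            "shape": json.loads(entry["shape"]),  # "[1,1]" -> [1, 1]
            "data": json.loads(entry["data"]),    # '["[M+H]+"]' -> ["[M+H]+"]
        })
    return {"id": request_id, "inputs": inputs}


body = build_infer_body(EXAMPLE_INPUTS)
print(json.dumps(body, indent=2))
```

The resulting dictionary could then be serialized and sent with any HTTP client; the endpoint to post it to is deliberately left out because it is not stated in this commit.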
34 changes: 34 additions & 0 deletions models/3dmolms/3dmolms_qtof/notes.yaml
@@ -0,0 +1,34 @@
description: |
  The 3DMolMS model is a deep neural network that predicts tandem mass spectrometry (MS/MS) spectra from the 3D conformations of chemical compounds. The goal of the model is to improve compound identification in untargeted metabolomics.
  The model was trained using approximately 70,000 MS/MS spectra from the Agilent Personal Compound Database and Library (PCDL) and the NIST20 spectral library. These libraries were chosen because they contain spectra from a large number of compounds acquired using similar instruments and settings. The model was then tested on spectra from the same libraries, as well as on an independent test set from the MassBank of North America (MoNA) library. The MoNA library contains spectra acquired from a variety of Q-TOF instruments, allowing for evaluation of the model’s generalizability to different instruments.
  The 3DMolMS model consists of an encoder and a decoder. The encoder takes as input a point set representation of the 3D conformation of a compound. This point set is built by generating a 3D conformer of the compound with the ETKDG algorithm and then encoding each atom in the conformer as a point. Each point is represented by a 21-dimensional vector, which includes the x, y, and z coordinates of the atom as well as other attributes, such as the atom type, the number of neighbors, and the atomic mass. The encoder then uses six 3DMolConv-based hidden layers to extract features from the input point set. The decoder takes the output of the encoder and uses five fully connected layers to predict the MS/MS spectrum of the compound.
  The performance of the 3DMolMS model was evaluated using the cosine similarity between the predicted and experimental spectra. The model achieved high cosine similarities for both the positive and negative ion modes, indicating that it can accurately predict MS/MS spectra. Notably, the model outperformed existing spectrum prediction algorithms, including CFM-ID, NEIMS, and MassFormer. In addition, the molecular representation learned by 3DMolMS was successfully transferred to predict other chemical properties, such as the elution time in liquid chromatography and the collision cross section in ion mobility spectrometry.
citation: |
  3DMolMS: prediction of tandem mass spectra from 3D molecular conformations,
  Yuhui Hong, Sujun Li, Christopher J Welch, Shane Tichy, Yuzhen Ye, Haixu Tang,
  Bioinformatics, Volume 39, Issue 6, June 2023, btad354, https://doi.org/10.1093/bioinformatics/btad354
tag: "Metabolomics fragment intensity"
examples:
  inputs:
    [
      {
        "name": "smiles",
        "httpdtype": "BYTES",
        "data": '["CN1C=NC2=C1C(=O)N(C(=O)N2C)C"]',
        "shape": "[1,1]"
      },
      {
        "name": "precursor_types",
        "httpdtype": "BYTES",
        "data": '["[M+H]+"]',
        "shape": "[1,1]"
      },
      {
        "name": "collision_energies",
        "httpdtype": "FP32",
        "data": '[20]',
        "shape": "[1,1]"
      }
    ]
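
The descriptions in both notes files say the model was evaluated with the cosine similarity between predicted and experimental spectra. A minimal Python sketch of that metric follows, assuming each spectrum is a list of (m/z, intensity) peaks that is first binned into a fixed-length intensity vector. The bin width, the m/z range, the function names, and the toy peak lists are all illustrative assumptions, not the settings or data used by the 3DMolMS authors.

```python
import math


def bin_spectrum(peaks, bin_width=1.0, max_mz=1500.0):
    """Sum peak intensities into fixed-width m/z bins (assumed settings)."""
    n_bins = int(math.ceil(max_mz / bin_width))
    vector = [0.0] * n_bins
    for mz, intensity in peaks:
        index = int(mz / bin_width)
        if 0 <= index < n_bins:  # peaks outside the range are ignored
            vector[index] += intensity
    return vector


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty spectra
    return dot / (norm_a * norm_b)


# Toy peak lists, made up purely for illustration.
predicted = [(100.2, 1.0), (150.7, 0.5)]
experimental = [(100.4, 0.9), (150.1, 0.6), (200.3, 0.1)]

score = cosine_similarity(bin_spectrum(predicted), bin_spectrum(experimental))
print(f"cosine similarity: {score:.3f}")
```

With this binning, peaks that fall into the same 1 Da bin are treated as matching, so the score depends on the chosen bin width; a finer or coarser resolution would give different values for the same pair of spectra.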
