Add MS2PIP model descriptions #129

Merged
merged 1 commit into from
Oct 21, 2024
12 changes: 11 additions & 1 deletion models/ms2pip/ms2pip_CID_TMT/notes.yaml
@@ -1,11 +1,21 @@
description: |
This model was trained on observed spectrum intensities from 72,138 unique TMT-labeled peptides.
MS2 spectra were acquired in the ion trap with CID fragmentation (trap-type CID). Raw train/test
and evaluation data are available via PRIDE, with the identifiers PXD041002 and PXD005890,
respectively. Processed data is available at https://doi.org/10.5281/zenodo.7833635.

Predicted intensities will always assume TMT labeling, regardless of the input modification
state. Modifications on the input peptide are only considered for the MS2 peak m/z values.
Prediction accuracy for peptides with other modifications may vary and should be evaluated on a
case-by-case basis.

Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a>.

If you use predictions generated by this model please cite the following paper.


citation: |
Updated MS²PIP web server supports cutting-edge proteomics applications.
Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
Nucleic Acids Research doi:10.1093/nar/gkad335

15 changes: 14 additions & 1 deletion models/ms2pip/ms2pip_HCD2021/notes.yaml
@@ -1,10 +1,23 @@
description: |
This model was trained on observed spectrum intensities from six projects, all using HCD
(beam-type CID) fragmentation and Orbitrap acquisition. These projects contain immunopeptides
(PXD012308, PXD006939, PXD009925, PXD000394, and PXD004894), and peptides from a chymotrypsin
digest (PXD010154). The model was evaluated on four distinct datasets with HLA-I immunopeptides
(PXD005231), HLA-II immunopeptides (PXD020011), chymotrypsin peptides (PXD010154), and
trypsin peptides (PXD008034), respectively.

The model can be applied to any peptide, regardless of digestion enzyme, with lengths between
7 and 40 amino acids. The modification state is not considered for intensity predictions, only
for the m/z values of the MS2 peaks. The model was trained on peptides with oxidation of
methionine and fixed or variable carbamidomethylation of cysteine. Prediction accuracy for
peptides with other modifications may vary and should be evaluated on a case-by-case basis.

Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a>.

If you use predictions generated by this model please cite the following paper.

citation: |
Updated MS²PIP web server supports cutting-edge proteomics applications.
Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
Nucleic Acids Research doi:10.1093/nar/gkad335
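
The HCD2021 description above states an applicable peptide length range of 7 to 40 amino acids. As a minimal sketch, a caller could partition peptides by that range before requesting predictions. The function name `split_by_supported_length` is hypothetical, the bounds are assumed to be inclusive, and sequences are assumed to be plain one-letter strings without modification notation.

```python
# Length range stated in the HCD2021 model description (assumed inclusive).
MIN_LENGTH = 7
MAX_LENGTH = 40


def split_by_supported_length(peptides, min_length=MIN_LENGTH, max_length=MAX_LENGTH):
    """Partition peptides into those inside and outside the stated length range.

    Hypothetical helper: assumes each peptide is a plain one-letter
    amino acid string, so len() equals the number of residues.
    """
    supported, unsupported = [], []
    for peptide in peptides:
        in_range = min_length <= len(peptide) <= max_length
        (supported if in_range else unsupported).append(peptide)
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_by_supported_length(["ACDEK", "A" * 13])
    print("supported:", ok)
    print("outside stated range:", skipped)
```

Peptides outside the range are returned rather than silently dropped, so the caller can decide whether to skip them or evaluate them case by case.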

8 changes: 7 additions & 1 deletion models/ms2pip/ms2pip_Immuno_HCD/notes.yaml
@@ -1,10 +1,16 @@
description: |
This model was trained on observed spectrum intensities from five projects, all using HCD
(beam-type CID) fragmentation and Orbitrap acquisition of immunopeptides (PXD012308, PXD006939,
PXD009925, and PXD000394). The model was evaluated on four distinct datasets with HLA-I
immunopeptides (PXD005231), HLA-II immunopeptides (PXD020011), chymotrypsin peptides (PXD010154),
and trypsin peptides (PXD008034), respectively.

Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a>.

If you use predictions generated by this model please cite the following paper.

citation: |
Updated MS²PIP web server supports cutting-edge proteomics applications.
Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
Nucleic Acids Research doi:10.1093/nar/gkad335

12 changes: 8 additions & 4 deletions models/ms2pip/ms2pip_TTOF5600/notes.yaml
@@ -1,13 +1,17 @@
description: |
This model was trained on 215 713 unique peptides acquired on a TripleTOF 5600+ mass spectrometer
in beam-type CID mode (PXD000954). It was evaluated on 15 111 unique peptides from PXD001587.

Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a> and in
the following publication:
Gabriels, R., Martens, L., & Degroeve, S. (2019). Updated MS²PIP web server delivers fast and accurate MS2 peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Research doi:10.1093/nar/gkz299

If you use predictions generated by this model please cite the following paper.

citation: |
Updated MS²PIP web server supports cutting-edge proteomics applications.
Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
Nucleic Acids Research doi:10.1093/nar/gkad335

tag: "Intensity"
examples:
inputs:
16 changes: 13 additions & 3 deletions models/ms2pip/ms2pip_iTRAQphospho/notes.yaml
@@ -1,13 +1,23 @@
description: |
This model was trained on 183 383 unique peptides from the NIST "Human Orbitrap - HCD iTRAQ-4
Phospho" spectral library. It was evaluated on 9088 unique peptides from PXD001189.

Predicted intensities will always assume iTRAQ labeling and phosphorylations, regardless of the
input modification state. Modifications on the input peptide are only considered for the MS2 peak
m/z values. Prediction accuracy for peptides with other modifications may vary and should be
evaluated on a case-by-case basis.

Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a> and in
the following publication:
Gabriels, R., Martens, L., & Degroeve, S. (2019). Updated MS²PIP web server delivers fast and accurate MS2 peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Research doi:10.1093/nar/gkz299

If you use predictions generated by this model please cite the following paper.

citation: |
Updated MS²PIP web server supports cutting-edge proteomics applications.
Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
Nucleic Acids Research doi:10.1093/nar/gkad335

tag: "Intensity"
examples:
inputs:
26 changes: 24 additions & 2 deletions models/ms2pip/ms2pip_timsTOF2023/notes.yaml
@@ -1,12 +1,34 @@
description: |
The peak intensity models were trained using timsTOF data from two different labs. This dataset
includes peptides from JY (trypsin and elastase digests) and HeLa (trypsin digests), as well as
HLA class I immunoprecipitation-enriched peptides from JY, HeLa, SK-MEL-37, and HL60 samples,
with multiple collision energy settings applied.

In total, 251,149 unique peptidoforms, considering sequence, charge, and modifications, were
used for model training, ensuring comprehensive coverage of various peptide types. For each
unique peptidoform, the highest-scoring PSM was retained for training, while 10,045 peptides
were set aside for evaluation purposes.

The training data are available under the following dataset identifiers. The JY
immunopeptidomics data are available under PXD043026 (ProteomeXchange) and JPST002158 (jPOST).
Data from the Carapito lab have been deposited to ProteomeXchange under PXD046535 (HL60
immunopeptidomics) and PXD046543 (HeLa tryptic proteomics).

This model can be applied to tryptic, elastase, and HLA class I immunopeptide spectra acquired
on timsTOF instruments.

Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a> and in
the following publication:
Gomez-Zepeda, D., Arnold-Schild, D., Beyrle, J., et al. (2024). Thunder-DDA-PASEF enables high-coverage immunopeptidomics and is boosted by MS2Rescore with MS2PIP timsTOF fragmentation prediction model. Nature Communications, 15, 2288. https://doi.org/10.1038/s41467-024-46380-y

If you use predictions generated by this model please cite the following paper.

citation: |
Updated MS²PIP web server supports cutting-edge proteomics applications.
Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
Nucleic Acids Research doi:10.1093/nar/gkad335
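
The timsTOF2023 description above says that, for each unique peptidoform (sequence, charge, and modifications), the highest-scoring PSM was retained for training. The selection step can be sketched as follows. This is an illustration only, not the authors' pipeline: the `PSM` record, its field names, the modification label, and the assumption that a higher score is better are all hypothetical.

```python
from typing import NamedTuple


class PSM(NamedTuple):
    """Hypothetical PSM record; field names are illustrative only."""
    sequence: str
    charge: int
    modifications: str
    score: float  # assumption: higher means a better match


def best_psm_per_peptidoform(psms):
    """Keep the single highest-scoring PSM for each peptidoform.

    A peptidoform is keyed on (sequence, charge, modifications),
    following the definition given in the model description.
    """
    best = {}
    for psm in psms:
        key = (psm.sequence, psm.charge, psm.modifications)
        if key not in best or psm.score > best[key].score:
            best[key] = psm
    return list(best.values())


if __name__ == "__main__":
    psms = [
        PSM("ACDEMK", 2, "", 10.0),
        PSM("ACDEMK", 2, "", 25.5),
        PSM("ACDEMK", 3, "", 5.0),
        PSM("ACDEMK", 2, "Oxidation", 1.0),
    ]
    for psm in best_psm_per_peptidoform(psms):
        print(psm)
```

Ties keep the PSM seen first, which makes the result depend on input order; a real pipeline would need an explicit tie-breaking rule.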

tag: "Intensity"
examples:
inputs:
40 changes: 26 additions & 14 deletions models/ms2pip/ms2pip_timsTOF2024/notes.yaml
@@ -1,27 +1,39 @@
description: |
The training data for this model builds upon the set from the timsTOF 2023 model. It
includes trypsin, elastase, and class I immunopeptide data (PXD046535 and PXD040385), expanded
with class II immunopeptides from Hoenisch Gravel et al. (PXD038782). A total of 505,289
highest-scoring peptidoforms were selected across all datasets, accounting for precursor charge
as part of the peptidoform. These peptidoforms were then divided into a training set
(480,024 peptidoforms) and a test set (25,265 peptidoforms) using a stratified split based on
dataset origin to ensure balanced representation of class I, class II, trypsin-digested, and
elastase-digested peptides in both subsets. All processed data is publicly available on
Zenodo at <a href="https://doi.org/10.5281/zenodo.11277943">10.5281/zenodo.11277943</a>.

Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a> and in
the following publication:
Declercq, A., Devreese, R., Scheid, J., Jachmann, C., Van Den Bossche, T., Preikschat, A., Gomez-Zepeda, D., Rijal, J. B., Hirschler, A., Krieger, J. R., Srikumar, T., Rosenberger, G., Trede, D., Carapito, C., Tenzer, S., Walz, J. S., Degroeve, S., Bouwmeester, R., Martens, L., & Gabriels, R. (2024). TIMS2Rescore: A DDA-PASEF optimized data-driven rescoring pipeline based on MS2Rescore. bioRxiv. https://doi.org/10.1101/2024.05.29.596400

If you use predictions generated by this model please cite the following paper.

citation: |
TIMS2Rescore: A DDA-PASEF optimized data-driven rescoring pipeline based on MS2Rescore
Arthur Declercq, Robbe Devreese, Jonas Scheid, et al.
bioRxiv 2024.05.29.596400; doi: https://doi.org/10.1101/2024.05.29.596400
Updated MS²PIP web server supports cutting-edge proteomics applications.
Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
Nucleic Acids Research doi:10.1093/nar/gkad335
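
The timsTOF2024 description above reports a stratified split by dataset origin: 505,289 peptidoforms divided into 480,024 for training and 25,265 for testing, which is about 5% held out. The idea can be sketched with the standard library alone. This is not the authors' code: the function name, the `origin_of` callback, the seed, and the per-group rounding are assumptions made for illustration.

```python
import random
from collections import defaultdict


def stratified_split(items, origin_of, test_fraction=0.05, seed=0):
    """Split items into (train, test) while preserving per-origin proportions.

    Hypothetical sketch: `origin_of` maps an item to its dataset origin
    (for example class I, class II, trypsin, or elastase). Each origin
    group is shuffled and split separately, so every origin contributes
    roughly `test_fraction` of its items to the test set.
    """
    rng = random.Random(seed)
    by_origin = defaultdict(list)
    for item in items:
        by_origin[origin_of(item)].append(item)

    train, test = [], []
    for origin in sorted(by_origin):  # sorted for a reproducible order
        group = by_origin[origin]
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


if __name__ == "__main__":
    items = [("class_I", i) for i in range(100)] + [("trypsin", i) for i in range(40)]
    train, test = stratified_split(items, origin_of=lambda item: item[0])
    print(len(train), len(test))
```

Splitting within each origin group, rather than over the pooled data, is what keeps small groups from being over- or under-represented in the test set by chance.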

tag: "Intensity"
examples:
inputs:
[
  {
    "name": "peptide_sequences",
    "httpdtype": "BYTES",
    "data": '["ACDEK", "AAAAAAAAAAAAA"]',
    "shape": "[2,1]",
  },
  {
    "name": "precursor_charges",
    "httpdtype": "INT32",
    "data": "[2, 3]",
    "shape": "[2,1]",
  },
]
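
The example inputs above list a name, a data type, data, and a shape for each tensor. As a hedged sketch, they could be assembled into a JSON request body as shown below. It assumes the serving endpoint follows the KServe v2 (Open Inference Protocol) request layout, where the type field is called `datatype`; the notes file only shows the key `httpdtype`, so that mapping is an assumption, and `build_infer_request` is a hypothetical helper. No endpoint URL is given in the source, so none is used here.

```python
import json


def build_infer_request(sequences, charges, request_id="0"):
    """Build a KServe v2 style inference request body (assumed layout).

    Hypothetical helper: tensor names and data types are taken from the
    example inputs in notes.yaml; the envelope ("id", "inputs",
    "datatype") is an assumption based on the KServe v2 protocol.
    """
    if len(sequences) != len(charges):
        raise ValueError("sequences and charges must have the same length")
    n = len(sequences)
    return {
        "id": request_id,
        "inputs": [
            {
                "name": "peptide_sequences",
                "shape": [n, 1],
                "datatype": "BYTES",
                "data": list(sequences),
            },
            {
                "name": "precursor_charges",
                "shape": [n, 1],
                "datatype": "INT32",
                "data": list(charges),
            },
        ],
    }


if __name__ == "__main__":
    body = build_infer_request(["ACDEK", "A" * 13], [2, 3])
    print(json.dumps(body, indent=2))
```

Deriving the shape from the input length keeps the two tensors consistent, which the hand-written example has to maintain manually.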