wilhelm-lab · LLautenbacher · Oct 21, 2024 · Oct 15, 2024
diff --git a/models/ms2pip/ms2pip_CID_TMT/notes.yaml b/models/ms2pip/ms2pip_CID_TMT/notes.yaml
@@ -1,11 +1,21 @@
 description: |
+  This model was trained on observed spectrum intensities from 72,138 unique TMT-labeled peptides.
+  MS2 spectra were acquired in the ion trap with CID fragmentation (trap-type CID). Raw train/test
+  and evaluation data are available via PRIDE, with the identifiers PXD041002 and PXD005890,
+  respectively. Processed data is available at https://doi.org/10.5281/zenodo.7833635.
+
+  Predicted intensities will always assume TMT labeling, regardless of the input modification
+  state. Modifications on the input peptide are only considered for the MS2 peak m/z values.
+  Prediction accuracy for peptides with other modifications may vary and should be evaluated on a
+  case-by-case basis.
+
   Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a>.
 
   If you use predictions generated by this model please cite the following paper.
 
 
 citation: |
-  Updated MS²PIP web server supports cutting-edge proteomics applications. 
+  Updated MS²PIP web server supports cutting-edge proteomics applications.
   Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
   Nucleic Acids Research doi:10.1093/nar/gkad335
 

diff --git a/models/ms2pip/ms2pip_HCD2021/notes.yaml b/models/ms2pip/ms2pip_HCD2021/notes.yaml
@@ -1,10 +1,23 @@
 description: |
+  This model was trained on observed spectrum intensities from six projects, all using HCD
+  (beam-type CID) fragmentation and Orbitrap acquisition. These projects contain immunopeptides
+  (PXD012308, PXD006939, PXD009925, PXD000394, and PXD004894), and peptides from a chymotrypsin
+  digest (PXD010154). The model was evaluated on four distinct datasets with HLA-I immunopeptides
+  (PXD005231), HLA-II immunopeptides (PXD020011), chymotrypsin peptides (PXD010154), and
+  trypsin peptides (PXD008034), respectively.
+
+  The model can be applied to any peptide, regardless of digestion enzyme, with lengths between
+  7 and 40 amino acids. The modification state is not considered for intensity predictions, only
+  for the m/z values of the MS2 peaks. The model was trained on peptides with oxidation of
+  methionine and fixed or variable carbamidomethylation of cysteine. Prediction accuracy for
+  peptides with other modifications may vary and should be evaluated on a case-by-case basis.
+
   Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a>.
 
   If you use predictions generated by this model please cite the following paper.
 
 citation: |
-  Updated MS²PIP web server supports cutting-edge proteomics applications. 
+  Updated MS²PIP web server supports cutting-edge proteomics applications.
   Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
   Nucleic Acids Research doi:10.1093/nar/gkad335
 

diff --git a/models/ms2pip/ms2pip_Immuno_HCD/notes.yaml b/models/ms2pip/ms2pip_Immuno_HCD/notes.yaml
@@ -1,10 +1,16 @@
 description: |
+  This model was trained on observed spectrum intensities from five projects, all using HCD
+  (beam-type CID) fragmentation and Orbitrap acquisition of immunopeptides (PXD012308, PXD006939,
+  PXD009925, and PXD000394). The model was evaluated on four distinct datasets with HLA-I
+  immunopeptides (PXD005231), HLA-II immunopeptides (PXD020011), chymotrypsin peptides (PXD010154),
+  and trypsin peptides (PXD008034), respectively.
+
   Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a>.
 
   If you use predictions generated by this model please cite the following paper.
 
 citation: |
-  Updated MS²PIP web server supports cutting-edge proteomics applications. 
+  Updated MS²PIP web server supports cutting-edge proteomics applications.
   Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
   Nucleic Acids Research doi:10.1093/nar/gkad335
 

diff --git a/models/ms2pip/ms2pip_TTOF5600/notes.yaml b/models/ms2pip/ms2pip_TTOF5600/notes.yaml
@@ -1,13 +1,17 @@
 description: |
-  Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a>.
+  This model was trained on 215,713 unique peptides acquired on a TripleTOF 5600+ mass spectrometer
+  in beam-type CID mode (PXD000954). It was evaluated on 15,111 unique peptides from PXD001587.
+
+  Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a> and in
+  the following publication:
+  Gabriels, R., Martens, L., & Degroeve, S. (2019). Updated MS²PIP web server delivers fast and accurate MS2 peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Research doi:10.1093/nar/gkz299
 
-  If you use predictions generated by this model please cite the following paper.
 
 citation: |
-  Updated MS²PIP web server supports cutting-edge proteomics applications. 
+  Updated MS²PIP web server supports cutting-edge proteomics applications.
   Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
   Nucleic Acids Research doi:10.1093/nar/gkad335
-  
+
 tag: "Intensity"
 examples:
   inputs:

diff --git a/models/ms2pip/ms2pip_iTRAQphospho/notes.yaml b/models/ms2pip/ms2pip_iTRAQphospho/notes.yaml
@@ -1,13 +1,23 @@
 description: |
-  Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a>.
+  This model was trained on 183,383 unique peptides from the NIST "Human Orbitrap - HCD iTRAQ-4
+  Phospho" spectral library. It was evaluated on 9,088 unique peptides from PXD001189.
+
+  Predicted intensities will always assume iTRAQ labeling and phosphorylations, regardless of the
+  input modification state. Modifications on the input peptide are only considered for the MS2 peak
+  m/z values. Prediction accuracy for peptides with other modifications may vary and should be
+  evaluated on a case-by-case basis.
+
+  Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a> and in
+  the following publication:
+  Gabriels, R., Martens, L., & Degroeve, S. (2019). Updated MS²PIP web server delivers fast and accurate MS2 peak intensity prediction for multiple fragmentation methods, instruments and labeling techniques. Nucleic Acids Research doi:10.1093/nar/gkz299
 
   If you use predictions generated by this model please cite the following paper.
 
 citation: |
-  Updated MS²PIP web server supports cutting-edge proteomics applications. 
+  Updated MS²PIP web server supports cutting-edge proteomics applications.
   Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
   Nucleic Acids Research doi:10.1093/nar/gkad335
-  
+
 tag: "Intensity"
 examples:
   inputs:

diff --git a/models/ms2pip/ms2pip_timsTOF2023/notes.yaml b/models/ms2pip/ms2pip_timsTOF2023/notes.yaml
@@ -1,12 +1,34 @@
 description: |
-  Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a>.
+  The peak intensity models were trained using timsTOF data from two different labs. This dataset
+  includes peptides from JY (trypsin and elastase digests) and HeLa (trypsin digests), as well as
+  HLA class I immunoprecipitation-enriched peptides from JY, HeLa, SK-MEL-37, and HL60 samples,
+  with multiple collision energy settings applied.
+
+  In total, 251,149 unique peptidoforms, considering sequence, charge, and modifications, were
+  used for model training, ensuring comprehensive coverage of various peptide types. For each
+  unique peptidoform, the highest-scoring PSM was retained for training, while 10,045 peptides
+  were set aside for evaluation purposes.
+
+  The training data can be accessed through the following dataset identifiers. The JY
+  immunopeptidomics data are available as PXD043026 on ProteomeXchange and as JPST002158
+  on jPOST. Data from the Carapito lab have been deposited to ProteomeXchange as
+  PXD046535 (HL60 immunopeptidomics files) and PXD046543 (HeLa tryptic proteomics
+  files).
+
+  This model can be applied to tryptic, elastase, and HLA class I immunopeptide spectra acquired
+  on timsTOF instruments.
+
+  Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a> and in
+  the following publication:
+  Gomez-Zepeda, D., Arnold-Schild, D., Beyrle, J., et al. (2024). Thunder-DDA-PASEF enables high-coverage immunopeptidomics and is boosted by MS2Rescore with MS2PIP timsTOF fragmentation prediction model. Nature Communications, 15, 2288. https://doi.org/10.1038/s41467-024-46380-y
 
   If you use predictions generated by this model please cite the following paper.
 
 citation: |
-  Updated MS²PIP web server supports cutting-edge proteomics applications. 
+  Updated MS²PIP web server supports cutting-edge proteomics applications.
   Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
   Nucleic Acids Research doi:10.1093/nar/gkad335
+
 tag: "Intensity"
 examples:
   inputs:

diff --git a/models/ms2pip/ms2pip_timsTOF2024/notes.yaml b/models/ms2pip/ms2pip_timsTOF2024/notes.yaml
@@ -1,27 +1,39 @@
 description: |
-  Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a>.
+  The training data for this model builds upon the set from the timsTOF 2023 model. It
+  includes trypsin, elastase, and class I immunopeptide data (PXD046535 and PXD040385), expanded
+  with class II immunopeptides from Hoenisch Gravel et al. (PXD038782). A total of 505,289
+  highest-scoring peptidoforms were selected across all datasets, accounting for precursor charge
+  as part of the peptidoform. These peptidoforms were then divided into a training set
+  (480,024 peptidoforms) and a test set (25,265 peptidoforms) using a stratified split based on
+  dataset origin to ensure balanced representation of class I, class II, trypsin-digested, and
+  elastase-digested peptides in both subsets. All processed data is publicly available on
+  Zenodo at <a href="https://doi.org/10.5281/zenodo.11277943">10.5281/zenodo.11277943</a>.
+
+  Find out more about this model <a href="https://github.com/compomics/ms2pip">here</a> and in
+  the following publication:
+  Declercq, A., Devreese, R., Scheid, J., Jachmann, C., Van Den Bossche, T., Preikschat, A., Gomez-Zepeda, D., Rijal, J. B., Hirschler, A., Krieger, J. R., Srikumar, T., Rosenberger, G., Trede, D., Carapito, C., Tenzer, S., Walz, J. S., Degroeve, S., Bouwmeester, R., Martens, L., & Gabriels, R. (2024). TIMS2Rescore: A DDA-PASEF optimized data-driven rescoring pipeline based on MS2Rescore. bioRxiv. https://doi.org/10.1101/2024.05.29.596400
 
   If you use predictions generated by this model please cite the following paper.
 
 citation: |
-  TIMS2Rescore: A DDA-PASEF optimized data-driven rescoring pipeline based on MS2Rescore 
-  Arthur Declercq, Robbe Devreese, Jonas Scheid, et al.
-  bioRxiv 2024.05.29.596400; doi: https://doi.org/10.1101/2024.05.29.596400
-  
+  Updated MS²PIP web server supports cutting-edge proteomics applications.
+  Declercq, A., Bouwmeester, R., Chiva, C., Sabidó, E., Hirschler, et al.
+  Nucleic Acids Research doi:10.1093/nar/gkad335
+
 tag: "Intensity"
 examples:
   inputs:
     [
       {
-          "name": "peptide_sequences",
-          "httpdtype": "BYTES",
-          "data": '["ACDEK", "AAAAAAAAAAAAA"]',
-          "shape": "[2,1]"
+        "name": "peptide_sequences",
+        "httpdtype": "BYTES",
+        "data": '["ACDEK", "AAAAAAAAAAAAA"]',
+        "shape": "[2,1]",
       },
       {
-          "name": "precursor_charges",
-          "httpdtype": "INT32",
-          "data": '[2, 3]',
-          "shape": "[2,1]"
-      }
+        "name": "precursor_charges",
+        "httpdtype": "INT32",
+        "data": "[2, 3]",
+        "shape": "[2,1]",
+      },
     ]
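
The reformatted `examples.inputs` entries store `data` and `shape` as JSON-encoded strings. As a rough sanity check, the sketch below decodes one entry and confirms that the number of values matches the declared shape. It assumes both fields parse as JSON, as they do in the entries above. The helper name `check_example_input` is hypothetical and not part of the repository.

```python
import json
from math import prod


def check_example_input(entry: dict) -> list:
    """Decode one example input and confirm that its data fills its shape.

    Assumes `data` and `shape` are JSON-encoded strings, as in the
    notes.yaml example. Hypothetical helper, not repository code.
    """
    values = json.loads(entry["data"])
    shape = json.loads(entry["shape"])
    if len(values) != prod(shape):
        raise ValueError(
            f"{entry['name']}: {len(values)} values do not fill shape {shape}"
        )
    return values


# Entries copied from the timsTOF2024 example in the diff above.
example_inputs = [
    {
        "name": "peptide_sequences",
        "httpdtype": "BYTES",
        "data": '["ACDEK", "AAAAAAAAAAAAA"]',
        "shape": "[2,1]",
    },
    {
        "name": "precursor_charges",
        "httpdtype": "INT32",
        "data": "[2, 3]",
        "shape": "[2,1]",
    },
]

decoded = {e["name"]: check_example_input(e) for e in example_inputs}
```

Both entries decode to two values, which matches the `[2,1]` shape. An entry whose `data` and `shape` disagree raises `ValueError`.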