Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #46 from wilhelm-lab/prosit_2023_intensity_tof
Add Prosit 2023 intensity TOF model.
Showing 19 changed files with 651 additions and 71 deletions.
Empty file.
@@ -0,0 +1,133 @@
max_batch_size: 1000
platform: "ensemble"
input [
  {
    name: 'peptide_sequences',
    data_type: TYPE_STRING,
    dims: [-1]
  },
  {
    name: 'precursor_charges',
    data_type: TYPE_INT32,
    dims: [1],
  },
  {
    name: 'collision_energies',
    data_type: TYPE_FP32,
    dims: [1],
  }
]
output [
  {
    name: 'intensities',
    data_type: TYPE_FP32,
    dims: [174]
  },
  {
    name: 'mz',
    data_type: TYPE_FP32,
    dims: [174]
  },
  {
    name: 'annotation',
    data_type: TYPE_STRING,
    dims: [174]
  }
]

ensemble_scheduling {
  step [
    {
      model_name: "Prosit_Preprocess_charge"
      model_version: 1
      input_map {
        key: "precursor_charges"
        value: "precursor_charges"
      },
      output_map {
        key: "precursor_charges_in:0"
        value: "precursor_charges_in_preprocessed:0"
      }
    },
    {
      model_name: "Prosit_Preprocess_peptide"
      model_version: 1
      input_map {
        key: "peptide_sequences"
        value: "peptide_sequences"
      },
      output_map {
        key: "peptides_in:0"
        value: "peptides_in:0"
      }
    },
    {
      model_name: "Prosit_Preprocess_collision_energy"
      model_version: 1
      input_map {
        key: "raw_collision_energy"
        value: "collision_energies"
      },
      output_map {
        key: "norm_collision_energy"
        value: "norm_collision_energy"
      }
    },
    {
      model_name: "Prosit_2023_intensity_TOF_core"
      model_version: 1
      input_map {
        key: "peptides_in"
        value: "peptides_in:0"
      },
      input_map {
        key: "collision_energy_in"
        value: "norm_collision_energy"
      },
      input_map {
        key: "precursor_charge_in"
        value: "precursor_charges_in_preprocessed:0"
      }
      output_map {
        key: "out"
        value: "out/Reshape:0"
      }
    },
    {
      model_name: "Prosit_2019_intensity_postprocess"
      model_version: 1
      input_map {
        key: "peptides_in:0"
        value: "peptide_sequences"
      },
      input_map {
        key: "precursor_charges_in:0"
        value: "precursor_charges_in_preprocessed:0"
      }
      input_map {
        key: "peaks_in:0",
        value: "out/Reshape:0"
      }
      output_map {
        key: "intensities"
        value: "intensities"
      }
      output_map {
        key: "mz"
        value: "mz"
      }
    },
    {
      model_name: "Prosit_Helper_annotation"
      model_version: 1
      input_map {
        key: "precursor_charges"
        value: "precursor_charges"
      },
      output_map {
        key: "annotation"
        value: "annotation"
      }
    }
  ]
}
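The `ensemble_scheduling` block wires the steps together purely by tensor name: every `input_map` value must be either an ensemble input or a tensor that some step produces as an `output_map` value, and every ensemble output must be produced by some step. Triton performs its own checks when it loads the ensemble; the following is only a minimal sketch of that name-based wiring, with the steps transcribed by hand from the config above into plain tuples. The helper `check_ensemble` is hypothetical and is not part of Triton or `tritonclient`.

```python
# Ensemble-level tensors declared in the config's input/output sections.
ENSEMBLE_INPUTS = {"peptide_sequences", "precursor_charges", "collision_energies"}
ENSEMBLE_OUTPUTS = {"intensities", "mz", "annotation"}

# Each step: (model name, input_map, output_map), where input_map maps the
# step model's input name to an ensemble tensor and output_map maps the step
# model's output name to an ensemble tensor. Transcribed from the config.
STEPS = [
    ("Prosit_Preprocess_charge",
     {"precursor_charges": "precursor_charges"},
     {"precursor_charges_in:0": "precursor_charges_in_preprocessed:0"}),
    ("Prosit_Preprocess_peptide",
     {"peptide_sequences": "peptide_sequences"},
     {"peptides_in:0": "peptides_in:0"}),
    ("Prosit_Preprocess_collision_energy",
     {"raw_collision_energy": "collision_energies"},
     {"norm_collision_energy": "norm_collision_energy"}),
    ("Prosit_2023_intensity_TOF_core",
     {"peptides_in": "peptides_in:0",
      "collision_energy_in": "norm_collision_energy",
      "precursor_charge_in": "precursor_charges_in_preprocessed:0"},
     {"out": "out/Reshape:0"}),
    ("Prosit_2019_intensity_postprocess",
     {"peptides_in:0": "peptide_sequences",
      "precursor_charges_in:0": "precursor_charges_in_preprocessed:0",
      "peaks_in:0": "out/Reshape:0"},
     {"intensities": "intensities", "mz": "mz"}),
    ("Prosit_Helper_annotation",
     {"precursor_charges": "precursor_charges"},
     {"annotation": "annotation"}),
]


def check_ensemble(steps, inputs, outputs):
    """Return a list of wiring problems; an empty list means the graph is consistent."""
    produced = {}
    problems = []
    # Collect every tensor produced by a step; each name may be produced once.
    for model, _, out_map in steps:
        for tensor in out_map.values():
            if tensor in produced or tensor in inputs:
                problems.append(f"{tensor!r} produced more than once ({model})")
            produced[tensor] = model
    # Every step input must come from the ensemble inputs or from some step.
    for model, in_map, _ in steps:
        for tensor in in_map.values():
            if tensor not in inputs and tensor not in produced:
                problems.append(f"{model}: input {tensor!r} is never produced")
    # Every declared ensemble output must be produced by some step.
    for tensor in outputs:
        if tensor not in produced:
            problems.append(f"ensemble output {tensor!r} is never produced")
    return problems


print(check_ensemble(STEPS, ENSEMBLE_INPUTS, ENSEMBLE_OUTPUTS))  # → []
```

The sketch deliberately ignores the order of the `step` list, since Triton runs an ensemble step once its inputs are available. Dropping the `Prosit_2023_intensity_TOF_core` step from `STEPS`, for example, makes it report that `out/Reshape:0` is never produced.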
@@ -0,0 +1,37 @@
description: |
  The HCD Prosit 2020 model was fine-tuned on 277,781 MS/MS spectra of both tryptic and non-tryptic synthesized peptides measured on a timsTOF Pro, yielding the TOF Prosit 2023 model; the model architecture remained unchanged. The data was split into three distinct sets, with each peptide (and every subsequence of a peptide) included in only one of them: training (80%, 153,809 tryptic PSMs and 77,577 non-tryptic PSMs), validation (10%, 16,483 tryptic PSMs and 7,778 non-tryptic PSMs), and test (10%, 14,262 tryptic PSMs and 7,872 non-tryptic PSMs).
  For this project, over 300,000 non-tryptic peptides from the ProteomeTools project were measured at collision energies ranging from 20.81 eV to 69.77 eV. The data was analyzed with MaxQuant version 2.1.2.0, with carbamidomethylated cysteine specified as a fixed modification and methionine oxidation as a variable modification.
  The HCD Prosit 2020 model was originally trained on approximately 30 million MS/MS spectra: 9 million spectra of non-tryptic peptides and 21 million previously published tryptic spectra. Compared with the HCD Prosit 2020 model, the TOF Prosit 2023 model substantially improves the normalized spectral contrast angle (SA) between predicted and experimental timsTOF MS/MS spectra for both non-tryptic and tryptic peptides. The TOF Prosit 2023 model achieved an SA ≥ 0.9 for 26.3% of non-tryptic spectra (compared to 2.4% with HCD Prosit 2020) and 42.1% of tryptic spectra (compared to 0.2% with HCD Prosit 2020).
  The TOF Prosit 2023 model performs consistently across precursor charges, peptide lengths, and collision energies, with minimal bias towards C- and N-terminal amino acids. The tryptic and non-tryptic timsTOF data are available via PRIDE under the identifiers PXD019086 and PXD043844, respectively.
citation: |
  Fragment ion intensity prediction improves the identification rate of non-tryptic peptides in TimsTOF
  Charlotte Adams, Wassim Gabriel, Kris Laukens, Mathias Wilhelm, Wout Bittremieux, Kurt Boonen
  bioRxiv 2023.07.17.549401; doi: https://doi.org/10.1101/2023.07.17.549401
tag: "Intensity"
tag_url: "https://www.proteomicsdb.org/"
examples:
  inputs:
    [
      {
        "name": "peptide_sequences",
        "httpdtype": "BYTES",
        "shape": "[2,1]",
        "data": '["AAAAAKAK", "AAAAAKAK"]'
      },
      {
        "name": "precursor_charges",
        "httpdtype": "INT32",
        "shape": "[2,1]",
        "data": '[1,2]'
      },
      {
        "name": "collision_energies",
        "httpdtype": "FP32",
        "shape": "[2,1]",
        "data": '[25, 25]'
      }
    ]
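The description reports model quality as the normalized spectral contrast angle (SA). In its commonly used form, SA = 1 − 2·θ/π, where θ = arccos of the cosine similarity between the L2-normalized predicted and observed intensity vectors, so SA = 1 for spectra of identical shape and SA = 0 for orthogonal ones. A minimal NumPy sketch of that formula; treating NaN or negative entries as "no ion" is an assumption of this sketch, not necessarily how the reported numbers were computed, and `spectral_angle` is a hypothetical helper name.

```python
import numpy as np


def spectral_angle(pred, obs):
    """Normalized spectral contrast angle (SA) between two intensity vectors.

    Returns a value in [0, 1]; 1 means the two spectra have identical shape.
    ASSUMPTION: NaN or negative entries mark absent/impossible fragment ions
    and are treated as zero intensity.
    """
    pred = np.maximum(np.nan_to_num(np.asarray(pred, dtype=np.float64), nan=0.0), 0.0)
    obs = np.maximum(np.nan_to_num(np.asarray(obs, dtype=np.float64), nan=0.0), 0.0)
    norm_pred, norm_obs = np.linalg.norm(pred), np.linalg.norm(obs)
    if norm_pred == 0.0 or norm_obs == 0.0:
        return 0.0  # an empty spectrum is treated as matching nothing
    # Cosine of the angle between the L2-normalized spectra, clipped for arccos.
    cos = np.clip(np.dot(pred, obs) / (norm_pred * norm_obs), -1.0, 1.0)
    # With non-negative intensities the angle lies in [0, pi/2]; map it onto [0, 1].
    return float(1.0 - 2.0 * np.arccos(cos) / np.pi)


print(round(spectral_angle([1, 0], [1, 1]), 6))  # → 0.5
```

Because both vectors are normalized, SA is insensitive to overall intensity scale: `spectral_angle([1, 2, 3], [2, 4, 6])` is 1 up to floating-point rounding.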
@@ -0,0 +1 @@
https://zenodo.org/record/8211811/files/model.savedmodel.zip?download=1
@@ -0,0 +1 @@
max_batch_size: 1000
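This config and the ensemble config both set `max_batch_size: 1000`, so the server rejects a request whose batch dimension exceeds 1000 peptides. A client with more peptides has to split its input arrays into aligned chunks and send one request per chunk. A minimal sketch of that splitting; `iter_batches` is a hypothetical helper (not part of `tritonclient`), and the input arrays are made-up examples shaped like the ensemble's `(N, 1)` inputs.

```python
import numpy as np

MAX_BATCH_SIZE = 1000  # value taken from the model configs in this commit


def iter_batches(*arrays, batch_size=MAX_BATCH_SIZE):
    """Yield aligned slices of equally long arrays, at most batch_size rows each."""
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all input arrays must have the same number of rows")
    for start in range(0, n, batch_size):
        yield tuple(a[start:start + batch_size] for a in arrays)


# Hypothetical inputs shaped like the ensemble's: (N, 1) per tensor.
sequences = np.array([["AAAAAKAK"]] * 2500, dtype=object)
charges = np.full((2500, 1), 2, dtype=np.int32)
ces = np.full((2500, 1), 25.0, dtype=np.float32)

sizes = [len(seq) for seq, _, _ in iter_batches(sequences, charges, ces)]
print(sizes)  # → [1000, 1000, 500]
```

Each yielded chunk can be wrapped in `InferInput` objects exactly as in the inference tests of this commit, and the per-chunk outputs concatenated afterwards with `np.concatenate`.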
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,64 @@
from test.server_config import SERVER_GRPC, SERVER_HTTP
import tritonclient.grpc as grpcclient
import numpy as np
from pathlib import Path
import requests

# Derive MODEL_NAME from this file's name, which must be test_<MODEL_NAME>.py
MODEL_NAME = Path(__file__).stem.replace("test_", "")


def test_available_http():
    req = requests.get(f"{SERVER_HTTP}/v2/models/{MODEL_NAME}", timeout=1)
    assert req.status_code == 200


def test_available_grpc():
    triton_client = grpcclient.InferenceServerClient(url=SERVER_GRPC)
    assert triton_client.is_model_ready(MODEL_NAME)


def test_inference():
    SEQUENCES = np.load(
        "test/Prosit/arr_Prosit_2023_intensity_TOF_seq.npy", allow_pickle=True
    )
    charge = np.load("test/Prosit/arr_Prosit_2023_intensity_TOF_charge.npy")
    ces = np.load("test/Prosit/arr_Prosit_2023_intensity_TOF_ce.npy")

    triton_client = grpcclient.InferenceServerClient(url=SERVER_GRPC)

    in_pep_seq = grpcclient.InferInput("peptide_sequences", SEQUENCES.shape, "BYTES")
    in_pep_seq.set_data_from_numpy(SEQUENCES)

    in_charge = grpcclient.InferInput("precursor_charges", charge.shape, "INT32")
    in_charge.set_data_from_numpy(charge)

    in_ces = grpcclient.InferInput("collision_energies", ces.shape, "FP32")
    in_ces.set_data_from_numpy(ces)

    result = triton_client.infer(
        MODEL_NAME,
        inputs=[in_pep_seq, in_charge, in_ces],
        outputs=[
            grpcclient.InferRequestedOutput("intensities"),
            grpcclient.InferRequestedOutput("mz"),
            grpcclient.InferRequestedOutput("annotation"),
        ],
    )

    intensities = result.as_numpy("intensities")
    fragmentmz = result.as_numpy("mz")
    annotation = result.as_numpy("annotation")

    assert intensities.shape == (SEQUENCES.shape[0], 174)
    assert fragmentmz.shape == (SEQUENCES.shape[0], 174)
    assert annotation.shape == (SEQUENCES.shape[0], 174)

    # Intensities must match the stored reference predictions (NaNs compare equal)
    assert np.allclose(
        intensities,
        np.load("test/Prosit/arr_Prosit_2023_intensity_TOF_int.npy"),
        rtol=0,
        atol=1e-5,
        equal_nan=True,
    )
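The tests above query the HTTP endpoint only for readiness and run inference over gRPC. Triton also accepts inference over HTTP using the KServe v2 inference protocol, i.e. a `POST` to `{SERVER_HTTP}/v2/models/{MODEL_NAME}/infer` with a JSON body listing each input's `name`, `shape`, `datatype`, and flattened `data`. A minimal sketch that builds such a body for this ensemble from the same example values as the model metadata; `build_infer_request` is a hypothetical helper, and actually sending the body (for example with `requests.post(url, json=body)`) is left out so the snippet runs without a server.

```python
import json


def build_infer_request(sequences, charges, collision_energies):
    """Build a KServe v2 JSON inference body for the Prosit TOF ensemble."""
    n = len(sequences)
    if not (len(charges) == len(collision_energies) == n):
        raise ValueError("all inputs must have the same length")
    return {
        "inputs": [
            # Shapes include the batch dimension: (n, 1) for every input.
            {"name": "peptide_sequences", "shape": [n, 1], "datatype": "BYTES",
             "data": list(sequences)},
            {"name": "precursor_charges", "shape": [n, 1], "datatype": "INT32",
             "data": [int(c) for c in charges]},
            {"name": "collision_energies", "shape": [n, 1], "datatype": "FP32",
             "data": [float(ce) for ce in collision_energies]},
        ],
        # Request the same three outputs as the gRPC test.
        "outputs": [{"name": "intensities"}, {"name": "mz"}, {"name": "annotation"}],
    }


body = build_infer_request(["AAAAAKAK", "AAAAAKAK"], [1, 2], [25, 25])
payload = json.dumps(body)
print(body["inputs"][1])
# → {'name': 'precursor_charges', 'shape': [2, 1], 'datatype': 'INT32', 'data': [1, 2]}
```

The JSON route is convenient for quick checks from scripts or `curl`; for large batches the binary gRPC path used in the tests avoids the cost of serializing every intensity as text.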
@@ -0,0 +1,52 @@
from test.server_config import SERVER_GRPC, SERVER_HTTP
import tritonclient.grpc as grpcclient
import numpy as np
from pathlib import Path
import requests

# Derive MODEL_NAME from this file's name, which must be test_<MODEL_NAME>.py
MODEL_NAME = Path(__file__).stem.replace("test_", "")


def test_available_http():
    req = requests.get(f"{SERVER_HTTP}/v2/models/{MODEL_NAME}", timeout=1)
    assert req.status_code == 200


def test_available_grpc():
    triton_client = grpcclient.InferenceServerClient(url=SERVER_GRPC)
    assert triton_client.is_model_ready(MODEL_NAME)


def test_inference():
    seq = np.load("test/Prosit/arr_Prosit_2023_intensity_TOF_seq_encoding.npy")
    charge = np.load("test/Prosit/arr_Prosit_2023_intensity_TOF_charge_onehot.npy")
    ces = np.load("test/Prosit/arr_Prosit_2023_intensity_TOF_ce_norm.npy")

    triton_client = grpcclient.InferenceServerClient(url=SERVER_GRPC)

    in_pep_seq = grpcclient.InferInput("peptides_in", seq.shape, "INT32")
    in_pep_seq.set_data_from_numpy(seq)

    in_charge = grpcclient.InferInput("precursor_charge_in", charge.shape, "FP32")
    in_charge.set_data_from_numpy(charge)

    in_ces = grpcclient.InferInput("collision_energy_in", ces.shape, "FP32")
    in_ces.set_data_from_numpy(ces)

    result = triton_client.infer(
        MODEL_NAME,
        inputs=[in_pep_seq, in_charge, in_ces],
        outputs=[
            grpcclient.InferRequestedOutput("out"),
        ],
    )

    intensities = result.as_numpy("out")

    # Raw core-model output must match the stored reference predictions
    assert np.allclose(
        intensities,
        np.load("test/Prosit/arr_Prosit_2023_intensity_TOF_int_raw.npy"),
        rtol=0,
        atol=1e-4,
    )
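Unlike the ensemble test, the core-model test feeds already preprocessed tensors: an integer-encoded sequence (`peptides_in`, INT32), a one-hot precursor charge (`precursor_charge_in`, FP32), and a normalized collision energy (`collision_energy_in`, FP32). The exact encodings are produced by the preprocessing models, which are not shown in this diff. The sketch below therefore ASSUMES the conventions of the original Prosit release (index 0 as padding, the 20 standard amino acids numbered 1 to 20 in alphabetical order of their one-letter codes, sequences padded to 30 residues, and a one-hot width of 6 charges); `encode_sequences` and `one_hot_charges` are hypothetical helpers. Modified residues and the collision-energy normalization are deliberately not covered.

```python
import numpy as np

# ASSUMPTION: Prosit-style encoding. Index 0 is padding, the 20 standard amino
# acids are numbered 1-20 in alphabetical order of their one-letter codes, and
# sequences are zero-padded to 30 residues. Modified residues are not handled.
ALPHABET = {aa: i for i, aa in enumerate("ACDEFGHIKLMNPQRSTVWY", start=1)}
MAX_LENGTH = 30
MAX_CHARGE = 6  # ASSUMPTION: one-hot width for precursor charges 1..6


def encode_sequences(sequences):
    """Map peptide strings to an (N, MAX_LENGTH) int32 array, zero-padded."""
    out = np.zeros((len(sequences), MAX_LENGTH), dtype=np.int32)
    for row, seq in enumerate(sequences):
        if len(seq) > MAX_LENGTH:
            raise ValueError(f"{seq!r} is longer than {MAX_LENGTH} residues")
        out[row, :len(seq)] = [ALPHABET[aa] for aa in seq]
    return out


def one_hot_charges(charges):
    """Map integer charges 1..MAX_CHARGE to an (N, MAX_CHARGE) float32 one-hot array."""
    charges = np.asarray(charges).reshape(-1)
    if charges.min() < 1 or charges.max() > MAX_CHARGE:
        raise ValueError(f"charges must lie in 1..{MAX_CHARGE}")
    out = np.zeros((charges.size, MAX_CHARGE), dtype=np.float32)
    # Set a single 1.0 per row, in the column for that row's charge.
    out[np.arange(charges.size), charges - 1] = 1.0
    return out


print(encode_sequences(["AAAAAKAK"])[0, :9])  # → [1 1 1 1 1 9 1 9 0]
print(one_hot_charges([1, 2]))
```

If the deployed preprocessing models use a different alphabet or padding length, only `ALPHABET`, `MAX_LENGTH`, and `MAX_CHARGE` need to change; comparing the output against `arr_Prosit_2023_intensity_TOF_seq_encoding.npy` and `arr_Prosit_2023_intensity_TOF_charge_onehot.npy` would confirm whether the assumption holds.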