diff --git a/docs/tutorials/raw_readers.ipynb b/docs/tutorials/raw_readers.ipynb new file mode 100644 index 0000000..f33d9a2 --- /dev/null +++ b/docs/tutorials/raw_readers.ipynb @@ -0,0 +1,1482 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Built-in Raw data readers" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "AlphaRaw supports direct access to Thermo's Raw data and Sciex's Wiff data via PythonNet. PythonNet requires Mono to be installed if the OS is macOS or Linux. See the installation section of alpharaw (https://github.com/mannlabs/alpharaw). " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Thermo Raw\n", + "\n", + "`alpharaw.thermo.ThermoRawData` contains all the functionality needed to load Thermo's Raw data. To enable fast data loading, alpharaw uses multiprocessing when `process_count` > 1. This reader can load different kinds of spectrum information into columns of `spectrum_df`. By default, the columns are:\n", + "\n", + "- `spec_idx`: the zero-based index of a spectrum in the raw file. Its value is `scan number - 1`.\n", + "- `peak_start_idx`: the start row index of peaks in `peak_df` (see `mzml_reader.peak_df` below) for the spectrum.\n", + "- `peak_stop_idx`: the stop row index of peaks in `peak_df` (see `mzml_reader.peak_df` below) for the spectrum.\n", + "- `rt`: retention time in minutes. We will use `rt_sec` for retention time in seconds in the alphaX ecosystem.\n", + "- `precursor_mz`: the precursor m/z of the given MS2 scan. For an MS1 scan, the value is always -1. For DIA MS2, the default value is the isolation center of the MS2 scan. For DDA MS2, `precursor_mz` may refer to the monoisotopic m/z of the precursor when `precursor_charge` is not 0; otherwise it is the isolation center.\n", + "- `precursor_charge`: For DIA, this value is always 0. 
For DDA, it can be nonzero when the monoisotopic m/z is determined.\n", + "- `isolation_lower_mz`: the lower (or left) m/z boundary of the isolation window.\n", + "- `isolation_upper_mz`: the upper (or right) m/z boundary of the isolation window.\n", + "- `ms_level`: the MS level (1 for MS1, 2 for MS2, ...); it starts from one.\n", + "- `nce`: normalized collision energy as defined by Thermo.\n", + "\n", + "There are also some optional spectrum columns (auxiliary_item) that can be loaded into the `spectrum_df`:\n", + "\n", + "- `injection_time`: `Ion Injection Time (ms)` in the scan header.\n", + "- `cv`: source fragmentation CV???\n", + "- `max_ion_time`: `Max. Ion Time (ms)` in the scan header.\n", + "- `agc_target`: `AGC target` in the scan header.\n", + "- `energy_ev`: `HCD Energy V` in the scan header. This is the actual collision energy in eV.\n", + "- `injection_optics_settling_time`: `Injection Optics Settling Time (ms)` in the scan header.\n", + "- `funnel_rf_level`: `Funnel RF Level` in the scan header.\n", + "- `faims_cv`: `FAIMS CV` in the scan header.\n", + "- `activation`: activation type, for example, HCD, CID, ETD, ...\n", + "- `analyzer`: analyzer type, for example, FTMS, Astral, ITMS, ...\n", + "- `activation_id`: Thermo's built-in IDs of `activation` types.\n", + "- `analyzer_id`: Thermo's built-in IDs of `analyzer` types." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
|  | spec_idx | peak_start_idx | peak_stop_idx | rt | precursor_mz | precursor_charge | isolation_lower_mz | isolation_upper_mz | ms_level | nce | ... | max_ion_time | agc_target | energy_ev | injection_optics_settling_time | funnel_rf_level | faims_cv | activation | analyzer | activation_id | analyzer_id |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 254 | 0.002983 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| 1 | 1 | 254 | 665 | 0.006392 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| 2 | 2 | 665 | 1131 | 0.009808 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| 3 | 3 | 1131 | 1663 | 0.013224 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| 4 | 4 | 1663 | 2169 | 0.016641 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3932 | 3932 | 1100271 | 1101512 | 5.994985 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| 3933 | 3933 | 1101512 | 1101528 | 5.997334 | 362.537140 | 0 | 361.837140 | 363.237140 | 2 | 30.0 | ... | 28.0 | 100000 | 16.500000 | 0.0 | 40.0 | 0.0 | HCD | FTMS | 5 | 4 |
| 3934 | 3934 | 1101528 | 1102758 | 5.998843 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |
| 3935 | 3935 | 1102758 | 1102771 | 6.001193 | 425.326569 | 0 | 424.626569 | 426.026569 | 2 | 30.0 | ... | 28.0 | 100000 | 18.690001 | 0.0 | 40.0 | 0.0 | HCD | FTMS | 5 | 4 |
| 3936 | 3936 | 1102771 | 1103989 | 6.002935 | -1.000000 | 0 | -1.000000 | -1.000000 | 1 | 0.0 | ... | 25.0 | 3000000 | 0.000000 | 0.0 | 40.0 | 0.0 | MS1 | FTMS | 255 | 4 |

3937 rows × 22 columns
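
The `peak_start_idx` and `peak_stop_idx` columns described above can be read as slice bounds into a flat peak table. The following is a minimal, self-contained sketch that uses plain Python lists in place of `spectrum_df` and `peak_df`. The helper names (`peak_count`, `scan_number`, `rt_seconds`, `peaks_of`) and the placeholder `peak_mz` list are hypothetical and not part of alpharaw. It also assumes half-open ranges `[start, stop)`, which the table above suggests because each `peak_stop_idx` equals the next row's `peak_start_idx`.

```python
# Toy stand-in for the first five rows of `spectrum_df` shown above.
spectra = [
    {"spec_idx": 0, "peak_start_idx": 0, "peak_stop_idx": 254, "rt": 0.002983},
    {"spec_idx": 1, "peak_start_idx": 254, "peak_stop_idx": 665, "rt": 0.006392},
    {"spec_idx": 2, "peak_start_idx": 665, "peak_stop_idx": 1131, "rt": 0.009808},
    {"spec_idx": 3, "peak_start_idx": 1131, "peak_stop_idx": 1663, "rt": 0.013224},
    {"spec_idx": 4, "peak_start_idx": 1663, "peak_stop_idx": 2169, "rt": 0.016641},
]

# Placeholder for one column of a flat peak table (values are made up).
peak_mz = [float(i) for i in range(2169)]


def peak_count(spec):
    """Number of peaks in a spectrum, assuming a half-open [start, stop) range."""
    return spec["peak_stop_idx"] - spec["peak_start_idx"]


def scan_number(spec):
    """`spec_idx` is zero-based, so the scan number is `spec_idx + 1`."""
    return spec["spec_idx"] + 1


def rt_seconds(spec):
    """`rt` is stored in minutes; convert it to seconds."""
    return spec["rt"] * 60.0


def peaks_of(spec, peak_values):
    """Slice the flat peak list down to the peaks of one spectrum."""
    return peak_values[spec["peak_start_idx"]:spec["peak_stop_idx"]]


counts = [peak_count(s) for s in spectra]
second_spectrum_peaks = peaks_of(spectra[1], peak_mz)
```

With a real pandas `peak_df`, the same slice would presumably be written with positional indexing, for example `peak_df.iloc[start:stop]`.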
|  | spec_idx | peak_start_idx | peak_stop_idx | rt | ms_level | precursor_mz | precursor_charge | isolation_lower_mz | isolation_upper_mz | nce |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 100 | 0.000417 | 1 | -1.00 | 0 | -1.0 | -1.0 | 0.0 |
| 1 | 1 | 100 | 447 | 0.001133 | 2 | 403.55 | 0 | 399.5 | 407.6 | 19.0 |
| 2 | 2 | 447 | 924 | 0.001383 | 2 | 411.25 | 0 | 406.6 | 415.9 | 20.0 |
| 3 | 3 | 924 | 1286 | 0.001650 | 2 | 419.25 | 0 | 414.9 | 423.6 | 20.0 |
| 4 | 4 | 1286 | 1943 | 0.001900 | 2 | 426.95 | 0 | 422.6 | 431.3 | 20.0 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 42232 | 42232 | 73839218 | 73841218 | 11.627550 | 2 | 715.15 | 0 | 711.1 | 719.2 | 34.0 |
| 42233 | 42233 | 73841218 | 73843218 | 11.627817 | 2 | 722.30 | 0 | 718.2 | 726.4 | 35.0 |
| 42234 | 42234 | 73843218 | 73845218 | 11.628067 | 2 | 729.70 | 0 | 725.4 | 734.0 | 35.0 |
| 42235 | 42235 | 73845218 | 73847218 | 11.628317 | 2 | 737.35 | 0 | 733.0 | 741.7 | 35.0 |
| 42236 | 42236 | 73847218 | 73849218 | 11.628583 | 2 | 745.05 | 0 | 740.7 | 749.4 | 36.0 |

42237 rows × 10 columns
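
The MS2 rows in the table above look consistent with `precursor_mz` being the center of the isolation window, as the column description states for DIA data. The sketch below checks this on the first few rows using plain Python. The function name `isolation_window` and the `rows` list are hypothetical illustrations, not part of alpharaw, and the check assumes that a `precursor_mz` of -1 marks an MS1 scan with no isolation window.

```python
import math

# First five rows of the table above, reduced to the isolation-related columns.
rows = [
    {"ms_level": 1, "precursor_mz": -1.00, "isolation_lower_mz": -1.0, "isolation_upper_mz": -1.0},
    {"ms_level": 2, "precursor_mz": 403.55, "isolation_lower_mz": 399.5, "isolation_upper_mz": 407.6},
    {"ms_level": 2, "precursor_mz": 411.25, "isolation_lower_mz": 406.6, "isolation_upper_mz": 415.9},
    {"ms_level": 2, "precursor_mz": 419.25, "isolation_lower_mz": 414.9, "isolation_upper_mz": 423.6},
    {"ms_level": 2, "precursor_mz": 426.95, "isolation_lower_mz": 422.6, "isolation_upper_mz": 431.3},
]


def isolation_window(row):
    """Return (center, width) of the isolation window, or None for MS1 scans."""
    if row["ms_level"] == 1 or row["precursor_mz"] < 0:
        return None
    lower = row["isolation_lower_mz"]
    upper = row["isolation_upper_mz"]
    return (lower + upper) / 2.0, upper - lower


windows = [isolation_window(r) for r in rows]

# For every MS2 row, compare the computed window center with `precursor_mz`.
centers_match = all(
    math.isclose(w[0], r["precursor_mz"], abs_tol=1e-6)
    for w, r in zip(windows, rows)
    if w is not None
)
```

Adjacent windows in these rows overlap by about 1 m/z (for example, 406.6 to 407.6), which is why the widths differ slightly from the spacing between centers.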
|  | spec_idx | peak_start_idx | peak_stop_idx | rt | precursor_mz | precursor_charge | isolation_lower_mz | isolation_upper_mz | ms_level |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0 | 10739 | 0.004935 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 1 | 1 | 10739 | 25554 | 0.007897 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 2 | 2 | 25554 | 26039 | 0.011218 | 810.79 | 0 | 810.29 | 811.29 | 2 |
| 3 | 3 | 26039 | 27045 | 0.022838 | 837.34 | 0 | 836.84 | 837.84 | 2 |
| 4 | 4 | 27045 | 27882 | 0.034925 | 725.36 | 0 | 724.86 | 725.86 | 2 |
| 5 | 5 | 27882 | 28532 | 0.048620 | 558.87 | 0 | 558.37 | 559.37 | 2 |
| 6 | 6 | 28532 | 29294 | 0.061923 | 812.33 | 0 | 811.83 | 812.83 | 2 |
| 7 | 7 | 29294 | 37374 | 0.075015 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 8 | 8 | 37374 | 54285 | 0.077788 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 9 | 9 | 54285 | 54837 | 0.081203 | 810.75 | 0 | 810.25 | 811.25 | 2 |
| 10 | 10 | 54837 | 55778 | 0.092903 | 837.96 | 0 | 837.46 | 838.46 | 2 |
| 11 | 11 | 55778 | 56413 | 0.104803 | 644.06 | 0 | 643.56 | 644.56 | 2 |
| 12 | 12 | 56413 | 57205 | 0.117215 | 725.23 | 0 | 724.73 | 725.73 | 2 |
| 13 | 13 | 57205 | 57874 | 0.130022 | 559.19 | 0 | 558.69 | 559.69 | 2 |
| 14 | 14 | 57874 | 66994 | 0.143452 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 15 | 15 | 66994 | 81922 | 0.146408 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 16 | 16 | 81922 | 82501 | 0.149755 | 811.41 | 0 | 810.91 | 811.91 | 2 |
| 17 | 17 | 82501 | 83417 | 0.161442 | 837.36 | 0 | 836.86 | 837.86 | 2 |
| 18 | 18 | 83417 | 84087 | 0.173370 | 643.80 | 0 | 643.30 | 644.30 | 2 |
| 19 | 19 | 84087 | 84761 | 0.186658 | 558.94 | 0 | 558.44 | 559.44 | 2 |
| 20 | 20 | 84761 | 85652 | 0.200695 | 725.14 | 0 | 724.64 | 725.64 | 2 |
| 21 | 21 | 85652 | 94665 | 0.213673 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 22 | 22 | 94665 | 105242 | 0.216747 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 23 | 23 | 105242 | 105821 | 0.220073 | 810.84 | 0 | 810.34 | 811.34 | 2 |
| 24 | 24 | 105821 | 106759 | 0.232923 | 837.42 | 0 | 836.92 | 837.92 | 2 |
| 25 | 25 | 106759 | 107548 | 0.244745 | 674.64 | 0 | 674.14 | 675.14 | 2 |
| 26 | 26 | 107548 | 108235 | 0.259172 | 643.74 | 0 | 643.24 | 644.24 | 2 |
| 27 | 27 | 108235 | 109100 | 0.272663 | 725.36 | 0 | 724.86 | 725.86 | 2 |
| 28 | 28 | 109100 | 119764 | 0.285483 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 29 | 29 | 119764 | 133663 | 0.288898 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 30 | 30 | 133663 | 134601 | 0.303703 | 837.39 | 0 | 836.89 | 837.89 | 2 |
| 31 | 31 | 134601 | 135254 | 0.315650 | 643.80 | 0 | 643.30 | 644.30 | 2 |
| 32 | 32 | 135254 | 135956 | 0.328527 | 558.75 | 0 | 558.25 | 559.25 | 2 |
| 33 | 33 | 135956 | 136543 | 0.342915 | 882.45 | 0 | 881.95 | 882.95 | 2 |
| 34 | 34 | 136543 | 145312 | 0.358558 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 35 | 35 | 145312 | 156612 | 0.361428 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 36 | 36 | 156612 | 157184 | 0.364755 | 810.73 | 0 | 810.23 | 811.23 | 2 |
| 37 | 37 | 157184 | 158248 | 0.376578 | 837.35 | 0 | 836.85 | 837.85 | 2 |
| 38 | 38 | 158248 | 158901 | 0.388673 | 643.73 | 0 | 643.23 | 644.23 | 2 |
| 39 | 39 | 158901 | 159776 | 0.401962 | 725.68 | 0 | 725.18 | 726.18 | 2 |
| 40 | 40 | 159776 | 160530 | 0.415132 | 674.70 | 0 | 674.20 | 675.20 | 2 |
| 41 | 41 | 160530 | 178264 | 0.428483 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 42 | 42 | 178264 | 193614 | 0.433222 | -1.00 | 0 | -1.00 | -1.00 | 1 |
| 43 | 43 | 193614 | 194225 | 0.436567 | 810.82 | 0 | 810.32 | 811.32 | 2 |
| 44 | 44 | 194225 | 195235 | 0.448320 | 837.78 | 0 | 837.28 | 838.28 | 2 |
| 45 | 45 | 195235 | 195948 | 0.460565 | 674.84 | 0 | 674.34 | 675.34 | 2 |
| 46 | 46 | 195948 | 196607 | 0.473103 | 558.90 | 0 | 558.40 | 559.40 | 2 |
| 47 | 47 | 196607 | 197243 | 0.487237 | 882.54 | 0 | 882.04 | 883.04 | 2 |