diff --git a/docs/tutorials/raw_readers.ipynb b/docs/tutorials/raw_readers.ipynb new file mode 100644 index 0000000..f33d9a2 --- /dev/null +++ b/docs/tutorials/raw_readers.ipynb @@ -0,0 +1,1482 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Built-in Raw data readers" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "AlphaRaw supports direct access to Thermo's Raw data and Sciex's Wiff data through PythonNet. PythonNet requires Mono to be installed on macOS and Linux. See the installation section of alpharaw (https://github.com/mannlabs/alpharaw). " + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Thermo Raw\n", + "\n", + "`alpharaw.thermo.ThermoRawData` contains all the functionality needed to load Thermo's Raw data. To speed up data loading, alpharaw uses multiprocessing when `process_count` > 1. This reader can load different kinds of spectrum information into columns of `spectrum_df`. By default, the columns are:\n", + "\n", + "- `spec_idx`: the index of a spectrum in the raw file, starting from zero. Its value is `scan number - 1`.\n", + "- `peak_start_idx`: the start row index of peaks in `peak_df` (see `mzml_reader.peak_df` below) for the spectrum.\n", + "- `peak_stop_idx`: the stop row index of peaks in `peak_df` (see `mzml_reader.peak_df` below) for the spectrum.\n", + "- `rt`: retention time in minutes. We will use `rt_sec` for retention time in seconds in the alphaX ecosystem.\n", + "- `precursor_mz`: the precursor m/z of an MS2 scan. For an MS1 scan, the value is always -1. For DIA MS2, the default value is the isolation center of the MS2 scan. For DDA MS2, `precursor_mz` may refer to the monoisotopic m/z of the precursor when `precursor_charge` is not 0; otherwise it is the isolation center.\n", + "- `precursor_charge`: For DIA, this value is always 0. 
For DDA, it can be nonzero when the monoisotopic m/z is determined.\n", + "- `isolation_lower_mz`: the lower (or left) m/z boundary of the isolation window.\n", + "- `isolation_upper_mz`: the upper (or right) m/z boundary of the isolation window.\n", + "- `ms_level`: the MS level (MS1, MS2, ...), starting from one.\n", + "- `nce`: normalized collision energy designed by Thermo.\n", + "\n", + "There are also some optional spectrum columns (`auxiliary_items`) that can be loaded into the `spectrum_df`:\n", + "\n", + "- `injection_time`: `Ion Injection Time (ms)` in the scan header.\n", + "- `cv`: source fragmentation CV???\n", + "- `max_ion_time`: `Max. Ion Time (ms)` in the scan header.\n", + "- `agc_target`: `AGC target` in the scan header.\n", + "- `energy_ev`: `HCD Energy V` in the scan header. This is the actual collision energy in eV.\n", + "- `injection_optics_settling_time`: `Injection Optics Settling Time (ms)` in the scan header.\n", + "- `funnel_rf_level`: `Funnel RF Level` in the scan header.\n", + "- `faims_cv`: `FAIMS CV` in the scan header.\n", + "- `activation`: activation type, for example, HCD, CID, ETD, ...\n", + "- `analyzer`: analyzer type, for example, FTMS, Astral, ITMS, ...\n", + "- `activation_id`: Thermo's built-in IDs of `activation` types.\n", + "- `analyzer_id`: Thermo's built-in IDs of `analyzer` types." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<p>3937 rows × 22 columns</p>\n", + "</div>\n", + "
" + ], + "text/plain": [ + " spec_idx peak_start_idx peak_stop_idx rt precursor_mz \\\n", + "0 0 0 254 0.002983 -1.000000 \n", + "1 1 254 665 0.006392 -1.000000 \n", + "2 2 665 1131 0.009808 -1.000000 \n", + "3 3 1131 1663 0.013224 -1.000000 \n", + "4 4 1663 2169 0.016641 -1.000000 \n", + "... ... ... ... ... ... \n", + "3932 3932 1100271 1101512 5.994985 -1.000000 \n", + "3933 3933 1101512 1101528 5.997334 362.537140 \n", + "3934 3934 1101528 1102758 5.998843 -1.000000 \n", + "3935 3935 1102758 1102771 6.001193 425.326569 \n", + "3936 3936 1102771 1103989 6.002935 -1.000000 \n", + "\n", + " precursor_charge isolation_lower_mz isolation_upper_mz ms_level \\\n", + "0 0 -1.000000 -1.000000 1 \n", + "1 0 -1.000000 -1.000000 1 \n", + "2 0 -1.000000 -1.000000 1 \n", + "3 0 -1.000000 -1.000000 1 \n", + "4 0 -1.000000 -1.000000 1 \n", + "... ... ... ... ... \n", + "3932 0 -1.000000 -1.000000 1 \n", + "3933 0 361.837140 363.237140 2 \n", + "3934 0 -1.000000 -1.000000 1 \n", + "3935 0 424.626569 426.026569 2 \n", + "3936 0 -1.000000 -1.000000 1 \n", + "\n", + " nce ... max_ion_time agc_target energy_ev \\\n", + "0 0.0 ... 25.0 3000000 0.000000 \n", + "1 0.0 ... 25.0 3000000 0.000000 \n", + "2 0.0 ... 25.0 3000000 0.000000 \n", + "3 0.0 ... 25.0 3000000 0.000000 \n", + "4 0.0 ... 25.0 3000000 0.000000 \n", + "... ... ... ... ... ... \n", + "3932 0.0 ... 25.0 3000000 0.000000 \n", + "3933 30.0 ... 28.0 100000 16.500000 \n", + "3934 0.0 ... 25.0 3000000 0.000000 \n", + "3935 30.0 ... 28.0 100000 18.690001 \n", + "3936 0.0 ... 25.0 3000000 0.000000 \n", + "\n", + " injection_optics_settling_time funnel_rf_level faims_cv activation \\\n", + "0 0.0 40.0 0.0 MS1 \n", + "1 0.0 40.0 0.0 MS1 \n", + "2 0.0 40.0 0.0 MS1 \n", + "3 0.0 40.0 0.0 MS1 \n", + "4 0.0 40.0 0.0 MS1 \n", + "... ... ... ... ... 
\n", + "3932 0.0 40.0 0.0 MS1 \n", + "3933 0.0 40.0 0.0 HCD \n", + "3934 0.0 40.0 0.0 MS1 \n", + "3935 0.0 40.0 0.0 HCD \n", + "3936 0.0 40.0 0.0 MS1 \n", + "\n", + " analyzer activation_id analyzer_id \n", + "0 FTMS 255 4 \n", + "1 FTMS 255 4 \n", + "2 FTMS 255 4 \n", + "3 FTMS 255 4 \n", + "4 FTMS 255 4 \n", + "... ... ... ... \n", + "3932 FTMS 255 4 \n", + "3933 FTMS 5 4 \n", + "3934 FTMS 255 4 \n", + "3935 FTMS 5 4 \n", + "3936 FTMS 255 4 \n", + "\n", + "[3937 rows x 22 columns]" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from alpharaw.thermo import ThermoRawData\n", + "\n", + "raw_data = ThermoRawData(\n", + " process_count=1,\n", + " auxiliary_items=[\n", + " \"injection_time\", \"cv\",\n", + " \"max_ion_time\", \"agc_target\", \"energy_ev\",\n", + " \"injection_optics_settling_time\", \n", + " \"funnel_rf_level\", \"faims_cv\",\n", + " \"activation\", \"analyzer\",\n", + " \"activation_id\", \"analyzer_id\",\n", + " # \"multinotch\",\n", + " ]\n", + ")\n", + "raw_data.import_raw(\"../../nbs_tests/test_data/iRT.raw\")\n", + "raw_data.spectrum_df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Sciex Wiff\n", + "\n", + "AlphaRaw can access basic scan (spectrum) information from Sciex Wiff data. The peaks are usually not centroided." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n", + "<p>42237 rows × 10 columns</p>\n", + "</div>\n", + "
" + ], + "text/plain": [ + " spec_idx peak_start_idx peak_stop_idx rt ms_level \\\n", + "0 0 0 100 0.000417 1 \n", + "1 1 100 447 0.001133 2 \n", + "2 2 447 924 0.001383 2 \n", + "3 3 924 1286 0.001650 2 \n", + "4 4 1286 1943 0.001900 2 \n", + "... ... ... ... ... ... \n", + "42232 42232 73839218 73841218 11.627550 2 \n", + "42233 42233 73841218 73843218 11.627817 2 \n", + "42234 42234 73843218 73845218 11.628067 2 \n", + "42235 42235 73845218 73847218 11.628317 2 \n", + "42236 42236 73847218 73849218 11.628583 2 \n", + "\n", + " precursor_mz precursor_charge isolation_lower_mz isolation_upper_mz \\\n", + "0 -1.00 0 -1.0 -1.0 \n", + "1 403.55 0 399.5 407.6 \n", + "2 411.25 0 406.6 415.9 \n", + "3 419.25 0 414.9 423.6 \n", + "4 426.95 0 422.6 431.3 \n", + "... ... ... ... ... \n", + "42232 715.15 0 711.1 719.2 \n", + "42233 722.30 0 718.2 726.4 \n", + "42234 729.70 0 725.4 734.0 \n", + "42235 737.35 0 733.0 741.7 \n", + "42236 745.05 0 740.7 749.4 \n", + "\n", + " nce \n", + "0 0.0 \n", + "1 19.0 \n", + "2 20.0 \n", + "3 20.0 \n", + "4 20.0 \n", + "... ... \n", + "42232 34.0 \n", + "42233 35.0 \n", + "42234 35.0 \n", + "42235 35.0 \n", + "42236 36.0 \n", + "\n", + "[42237 rows x 10 columns]" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from alpharaw.sciex import SciexWiffData\n", + "\n", + "wiff_data = SciexWiffData()\n", + "wiff_data.import_raw(\n", + " \"../../nbs_tests/test_data/02112022_Zeno1_TiHe_DIAMA_HeLa_200ng_EVO5_01.wiff\"\n", + ")\n", + "wiff_data.spectrum_df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## mzML\n", + "\n", + "mzML is partially supported, the basic spectrum information is extracted." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
" + ], + "text/plain": [ + " spec_idx peak_start_idx peak_stop_idx rt precursor_mz \\\n", + "0 0 0 10739 0.004935 -1.00 \n", + "1 1 10739 25554 0.007897 -1.00 \n", + "2 2 25554 26039 0.011218 810.79 \n", + "3 3 26039 27045 0.022838 837.34 \n", + "4 4 27045 27882 0.034925 725.36 \n", + "5 5 27882 28532 0.048620 558.87 \n", + "6 6 28532 29294 0.061923 812.33 \n", + "7 7 29294 37374 0.075015 -1.00 \n", + "8 8 37374 54285 0.077788 -1.00 \n", + "9 9 54285 54837 0.081203 810.75 \n", + "10 10 54837 55778 0.092903 837.96 \n", + "11 11 55778 56413 0.104803 644.06 \n", + "12 12 56413 57205 0.117215 725.23 \n", + "13 13 57205 57874 0.130022 559.19 \n", + "14 14 57874 66994 0.143452 -1.00 \n", + "15 15 66994 81922 0.146408 -1.00 \n", + "16 16 81922 82501 0.149755 811.41 \n", + "17 17 82501 83417 0.161442 837.36 \n", + "18 18 83417 84087 0.173370 643.80 \n", + "19 19 84087 84761 0.186658 558.94 \n", + "20 20 84761 85652 0.200695 725.14 \n", + "21 21 85652 94665 0.213673 -1.00 \n", + "22 22 94665 105242 0.216747 -1.00 \n", + "23 23 105242 105821 0.220073 810.84 \n", + "24 24 105821 106759 0.232923 837.42 \n", + "25 25 106759 107548 0.244745 674.64 \n", + "26 26 107548 108235 0.259172 643.74 \n", + "27 27 108235 109100 0.272663 725.36 \n", + "28 28 109100 119764 0.285483 -1.00 \n", + "29 29 119764 133663 0.288898 -1.00 \n", + "30 30 133663 134601 0.303703 837.39 \n", + "31 31 134601 135254 0.315650 643.80 \n", + "32 32 135254 135956 0.328527 558.75 \n", + "33 33 135956 136543 0.342915 882.45 \n", + "34 34 136543 145312 0.358558 -1.00 \n", + "35 35 145312 156612 0.361428 -1.00 \n", + "36 36 156612 157184 0.364755 810.73 \n", + "37 37 157184 158248 0.376578 837.35 \n", + "38 38 158248 158901 0.388673 643.73 \n", + "39 39 158901 159776 0.401962 725.68 \n", + "40 40 159776 160530 0.415132 674.70 \n", + "41 41 160530 178264 0.428483 -1.00 \n", + "42 42 178264 193614 0.433222 -1.00 \n", + "43 43 193614 194225 0.436567 810.82 \n", + "44 44 194225 195235 0.448320 837.78 \n", + "45 45 
195235 195948 0.460565 674.84 \n", + "46 46 195948 196607 0.473103 558.90 \n", + "47 47 196607 197243 0.487237 882.54 \n", + "\n", + " precursor_charge isolation_lower_mz isolation_upper_mz ms_level \n", + "0 0 -1.00 -1.00 1 \n", + "1 0 -1.00 -1.00 1 \n", + "2 0 810.29 811.29 2 \n", + "3 0 836.84 837.84 2 \n", + "4 0 724.86 725.86 2 \n", + "5 0 558.37 559.37 2 \n", + "6 0 811.83 812.83 2 \n", + "7 0 -1.00 -1.00 1 \n", + "8 0 -1.00 -1.00 1 \n", + "9 0 810.25 811.25 2 \n", + "10 0 837.46 838.46 2 \n", + "11 0 643.56 644.56 2 \n", + "12 0 724.73 725.73 2 \n", + "13 0 558.69 559.69 2 \n", + "14 0 -1.00 -1.00 1 \n", + "15 0 -1.00 -1.00 1 \n", + "16 0 810.91 811.91 2 \n", + "17 0 836.86 837.86 2 \n", + "18 0 643.30 644.30 2 \n", + "19 0 558.44 559.44 2 \n", + "20 0 724.64 725.64 2 \n", + "21 0 -1.00 -1.00 1 \n", + "22 0 -1.00 -1.00 1 \n", + "23 0 810.34 811.34 2 \n", + "24 0 836.92 837.92 2 \n", + "25 0 674.14 675.14 2 \n", + "26 0 643.24 644.24 2 \n", + "27 0 724.86 725.86 2 \n", + "28 0 -1.00 -1.00 1 \n", + "29 0 -1.00 -1.00 1 \n", + "30 0 836.89 837.89 2 \n", + "31 0 643.30 644.30 2 \n", + "32 0 558.25 559.25 2 \n", + "33 0 881.95 882.95 2 \n", + "34 0 -1.00 -1.00 1 \n", + "35 0 -1.00 -1.00 1 \n", + "36 0 810.23 811.23 2 \n", + "37 0 836.85 837.85 2 \n", + "38 0 643.23 644.23 2 \n", + "39 0 725.18 726.18 2 \n", + "40 0 674.20 675.20 2 \n", + "41 0 -1.00 -1.00 1 \n", + "42 0 -1.00 -1.00 1 \n", + "43 0 810.32 811.32 2 \n", + "44 0 837.28 838.28 2 \n", + "45 0 674.34 675.34 2 \n", + "46 0 558.40 559.40 2 \n", + "47 0 882.04 883.04 2 " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "from alpharaw.mzml import MzMLReader\n", + "\n", + "mzml_reader = MzMLReader()\n", + "mzml_reader.load_raw(\"../../nbs_tests/test_data/small.pwiz.1.1.mzML\")\n", + "mzml_reader.spectrum_df" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "base", + "language": "python", + "name": "python3" + }, + "language_info": { 
+ "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.4" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +}