diff --git a/docs/README.md b/docs/README.md index dbd3b63..6b2af8e 100644 --- a/docs/README.md +++ b/docs/README.md @@ -2,7 +2,10 @@
 The `trgtools` repository contains a collection of tools and scripts to emulate, test and analyze the performance of trigger and trigger algorithms.
+Use `pip install -r requirements.txt` to install all the Python packages necessary to run the `*_dump.py` scripts and the `trgtools.plot` submodule.
+
- `process_tpstream`: Example of a simple pipeline to process TPStream files (slice by slice) and apply a trigger activity algorithm.
- `ta_dump.py`: Script that loads HDF5 files containing trigger activities and plots various diagnostic information. [Documentation](ta-dump.md).
- `tc_dump.py`: Script that loads HDF5 files containing trigger candidates and plots various diagnostic information. [Documentation](tc-dump.md).
- `tp_dump.py`: Script that loads HDF5 files containing trigger primitives and plots various diagnostic information. [Documentation](tp-dump.md).
+- Python `trgtools` module: Reading and plotting module that specializes in reading TP, TA, and TC fragments for a given HDF5 file. The submodule `trgtools.plot` has a common `PDFPlotter` that is used in the `*_dump.py` scripts. [Documentation](py-trgtools.md).
diff --git a/docs/py-trgtools.md b/docs/py-trgtools.md new file mode 100644 index 0000000..6730431 --- /dev/null +++ b/docs/py-trgtools.md @@ -0,0 +1,83 @@
+# Python trgtools Module
+
+Reading a DUNE-DAQ HDF5 file for the TP, TA, and TC contents can be done easily using the `trgtools` Python module.
+
+# Example
+
+## Common Methods
+```python
+import trgtools
+
+tp_data = trgtools.TPReader(hdf5_file_name)
+
+# Get all the available paths for TPs in this file.
+frag_paths = tp_data.get_fragment_paths()
+
+# Read all fragment paths. Appends results to tp_data.tp_data.
+tp_data.read_all_fragments()
+
+# Read only one fragment. Returns the result and appends to tp_data.tp_data.
+frag0_tps = tp_data.read_fragment(frag_paths[0])
+
+# Reset tp_data.tp_data. Keeps the current fragment paths.
+tp_data.clear_data()
+
+# Reset the fragment paths to the initialized state.
+tp_data.reset_fragment_paths()
+```
+
+## Data Accessing
+```python
+tp_data = trgtools.TPReader(hdf5_file_name)
+ta_data = trgtools.TAReader(hdf5_file_name)
+tc_data = trgtools.TCReader(hdf5_file_name)
+
+tp_data.read_all_fragments()
+ta_data.read_all_fragments()
+tc_data.read_all_fragments()
+
+# Primary contents of the fragments
+# np.ndarray with each index as one T*
+tp_data.tp_data
+ta_data.ta_data
+tc_data.tc_data
+
+# Secondary contents of the fragments
+# List with each index as the TPs/TAs in the TA/TC
+ta_data.tp_data
+tc_data.ta_data
+
+ta0_contents = ta_data.tp_data[0]
+tc0_contents = tc_data.ta_data[0]
+```
Data accessing follows a very similar procedure across the different readers. The TAReader and TCReader also contain the secondary information about the TPs and TAs that formed the TAs and TCs, respectively. For the `np.ndarray` objects, one can also specify the member data they want to access. For example,
```python
ta_data.ta_data['time_start']  # Returns a np.ndarray of the time_starts for all read TAs
```
The available data members for each reader can be listed (and used) from the NumPy dtypes: `tp_data.tp_dt`; `ta_data.ta_dt` and `ta_data.tp_dt`; and `tc_data.tc_dt` and `tc_data.ta_dt`.

Look at the contents of `*_dump.py` for more detailed examples of data member usage.
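As a quick sketch of the dtype usage (assuming `ta_data` has already read its fragments, as in the example above), the dtype objects also list the available member names directly:
```python
# Show the available TA data members, then access two of them.
print(ta_data.ta_dt.names)

ta_integrals = ta_data.ta_data['adc_integral']
ta_starts = ta_data.ta_data['time_start']
```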
+
+While using interactive Python, one can do `help(tp_data)` and `help(tp_data.read_fragment)` for documentation on their usage (and similarly for the other readers and plotters).
+
+# Plotting
+There is also a submodule `trgtools.plot` that features a class `PDFPlotter`. This class contains the common plotting functionality that was repeated across the `*_dump.py` scripts. Loading this class requires `matplotlib` to be installed, but simply doing `import trgtools` does not have this requirement.
+
+## Example
+```python
+import trgtools
+from trgtools.plot import PDFPlotter
+
+tp_data = trgtools.TPReader(file_to_read)
+tp_data.read_all_fragments()
+
+pdf_save_name = 'example.pdf'
+pdf_plotter = PDFPlotter(pdf_save_name)
+
+plot_style_dict = dict(title="ADC Peak Histogram", xlabel="ADC Counts", ylabel="Count")
+pdf_plotter.plot_histogram(tp_data.tp_data['adc_peak'], plot_style_dict)
+```
+
+By design, the `plot_style_dict` requires the keys `title`, `xlabel`, and `ylabel` at a minimum. More options are available to further change the style of the plot, and examples of this are available in the `*_dump.py` scripts.
+
+### Development
+The set of common plots available in `PDFPlotter` is rather limited right now. For the moment these plots are sufficient, but more common plotting functions can be added, as sketched below.
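For plots that `PDFPlotter` does not provide, the underlying `PdfPages` object is available through `get_pdf()`, so custom pages can be appended to the same PDF. A minimal sketch, assuming the `pdf_plotter` and `tp_data` objects from the example above:
```python
import matplotlib.pyplot as plt

pdf = pdf_plotter.get_pdf()  # Shared PdfPages object.

# Append a custom page to the same PDF.
plt.figure(figsize=(6, 4))
plt.scatter(tp_data.tp_data['time_peak'], tp_data.tp_data['channel'], s=2)
plt.title("Custom Plot: Channel vs Peak Time")
plt.xlabel("Peak Time (Ticks)")
plt.ylabel("Channel")
pdf.savefig()
plt.close()
```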
diff --git a/docs/ta-dump.md b/docs/ta-dump.md index 574c75f..1a498c9 100644 --- a/docs/ta-dump.md +++ b/docs/ta-dump.md @@ -2,23 +2,24 @@
 `ta_dump.py` is a plotting script that shows TA diagnostic information, such as: algorithms produced, number of TPs per TA, event displays, ADC integral histogram, and a plot of the time starts.
-A new directory is created that identifies the HDF5 file the script was run on and increments based on preceding plots for the same file. One can overwrite the first set of plots generated by using `--overwrite`. Plots are saved in two multi-page PDFs: histograms for member data and event displays.
+A new save name is found that identifies the HDF5 file the script was run on and increments based on preceding plots for the same file. One can overwrite the first set of plots generated by using `--overwrite`. Plots are saved in two multi-page PDFs: 1) member data histograms and light analysis plots and 2) event displays.
 There are two plotting options `--linear` and `--log` that set the y-scale for the plots. By default, plots use both scales with linear on the left y-axis and log on the right y-axis. There is an additional plotting option `--seconds` to produce time plots using seconds instead of ticks.
-While running, this prints warnings for empty fragments that are skipped in the given HDF5 file. These outputs can be suppressed with `--quiet`.
+While running, this can print information about the file reading using `-v` (warnings) and `-vv` (all). Errors and useful output information (save names and location) are always printed.
-One can specify which fragments to _attempt_ to load from with the `--start-frag` option. This is `-10` by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is `0` by default (for the previously mentioned reason).
+One can specify which fragments to _attempt_ to load from with the `--start-frag` option. This is `-10` by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is `N` by default (for the previously mentioned reason).
 Event displays are processed by default. If many TAs were loaded, event display plotting may take a while. The `--no-displays` option skips event display plotting.
-A text file named `ta_anomalies.txt` is generated that gives reference statistics for each TA data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
+A text file is generated that gives reference statistics for each TA data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
 ## Example
 ```bash
 python ta_dump.py file.hdf5
 python ta_dump.py file.hdf5 --help
-python ta_dump.py file.hdf5 --quiet
+python ta_dump.py file.hdf5 -v
+python ta_dump.py file.hdf5 -vv
 python ta_dump.py file.hdf5 --start-frag 50 --end-frag 100 # Attempts 50 fragments
 python ta_dump.py file.hdf5 --no-displays
 python ta_dump.py file.hdf5 --no-anomaly
diff --git a/docs/tc-dump.md b/docs/tc-dump.md index eebebef..8ecb82b 100644 --- a/docs/tc-dump.md +++ b/docs/tc-dump.md @@ -1,22 +1,26 @@
 # Trigger Candidate Dump Info
-`tc_dump.py` is a plotting script that shows TC diagnostic information. This includes histograms of all the available data members and ADC integral (sum of all contained TA ADC integrals) and a few light analysis plots:time difference histogram (various start and end time definitions), ADC integral vs number of TAs scatter plot, time spans per TC plot (including a calculation for the number of ticks per TC). These plots are written to a single PDF with multiple pages.
+`tc_dump.py` is a plotting script that shows TC diagnostic information. This includes histograms of all the available data members and ADC integral (sum of all contained TA ADC integrals) and a few light analysis plots: time difference histogram (various start and end time definitions), ADC integral vs number of TAs scatter plot, time spans per TC plot (including a calculation for the number of ticks per TC). These plots are written to a single PDF with multiple pages.
+
+By default, a new PDF is generated (with naming based on the existing PDFs). One can pass `--overwrite` to overwrite the 0th PDF for a given HDF5 file.
 There are two plotting options `--linear` and `--log` that set the y-scale for the plots. By default, plots use both scales with linear on the left y-axis and log on the right y-axis. There is an additional plotting option `--seconds` to produce time plots using seconds instead of ticks.
-While running, this prints warnings and general information about loading and plotting. These outputs can be suppressed with `--quiet`.
+While running, this can print information about the file reading using `-v` (warnings) and `-vv` (all). Errors and useful output information (save names and location) are always printed.
-One can specify which fragments to _attempt_ to load from with the `--start-frag` option. This is `-10` by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is `0` by default (for the previously mentioned reason).
+One can specify which fragments to _attempt_ to load from with the `--start-frag` option. This is `-10` by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is `N` by default (for the previously mentioned reason).
-A text file named `tc_anomalies_-.txt` is generated that gives reference statistics for each TA data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
+A text file is generated that gives reference statistics for each TC data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
 ## Example
 ```bash
 python tc_dump.py file.hdf5
 python tc_dump.py file.hdf5 --help
-python tc_dump.py file.hdf5 --quiet
+python tc_dump.py file.hdf5 -v
+python tc_dump.py file.hdf5 -vv
 python tc_dump.py file.hdf5 --start-frag 50 --end-frag 100 # Attempts 50 fragments
 python tc_dump.py file.hdf5 --no-anomaly
 python tc_dump.py file.hdf5 --log
 python tc_dump.py file.hdf5 --linear
 python tc_dump.py file.hdf5 --seconds
+python tc_dump.py file.hdf5 --overwrite
 ```
diff --git a/docs/tp-dump.md b/docs/tp-dump.md index d7af42d..4cf930d 100644 --- a/docs/tp-dump.md +++ b/docs/tp-dump.md @@ -2,21 +2,22 @@
 `tp_dump.py` is a plotting script that shows TP diagnostic information, such as: TP channel histogram and channel vs time over threshold. Plots are saved as SVGs, PDFs, and PNGs.
-A new directory is created that identifies the HDF5 file the script was run on and increments based on preceding plots for the same file. One can overwrite the first set of plots generated by using `--overwrite`.
+A new save name is found that identifies the HDF5 file the script was run on and increments based on preceding plots for the same file. One can overwrite the first set of plots generated by using `--overwrite`.
 There are two plotting options `--linear` and `--log` that set the y-scale for the plots. By default, plots use both scales with linear on the left y-axis and log on the right y-axis. There is an additional plotting option `--seconds` to produce time plots using seconds instead of ticks.
-While running, this script prints various loading information. These outputs can be suppressed with `--quiet`.
+While running, this can print information about the file reading using `-v` (warnings) and `-vv` (all). Errors and useful output information (save names and location) are always printed.
-One can specify which fragments to load from with the `--start-frag` option. This is -10 by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is 0 by default (for the previously mentioned reason).
+One can specify which fragments to load from with the `--start-frag` option. This is -10 by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is N by default (for the previously mentioned reason).
-A text file named `tp_anomaly_summary.txt` is generated that gives reference statistics for each TP data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
+A text file is generated that gives reference statistics for each TP data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
 ## Example
 ```bash
 python tp_dump.py file.hdf5 # Loads last 10 fragments by default.
 python tp_dump.py file.hdf5 --help
-python tp_dump.py file.hdf5 --quiet
+python tp_dump.py file.hdf5 -v
+python tp_dump.py file.hdf5 -vv
 python tp_dump.py file.hdf5 --start-frag 50 --end-frag 100 # Loads 50 fragments.
 python tp_dump.py file.hdf5 --no-anomaly
 python tp_dump.py file.hdf5 --log
diff --git a/python/setup.cfg b/python/setup.cfg new file mode 100644 index 0000000..7787a0b --- /dev/null +++ b/python/setup.cfg @@ -0,0 +1,3 @@
+[flake8]
+# Conforms to DUNE-DAQ style guide
+max-line-length = 120
diff --git a/python/trgtools/HDF5Reader.py b/python/trgtools/HDF5Reader.py new file mode 100644 index 0000000..0b76dc1 --- /dev/null +++ b/python/trgtools/HDF5Reader.py @@ -0,0 +1,97 @@
+"""
+Generic HDF5Reader class to read and store data.
+"""
+from hdf5libs import HDF5RawDataFile
+
+import abc
+
+
+class HDF5Reader(abc.ABC):
+    """
+    Abstract reader class for HDF5 files.
+
+    Derived classes must complete all methods
+    decorated with @abc.abstractmethod.
+    """
+
+    # Useful print colors
+    _FAIL_TEXT_COLOR = '\033[91m'
+    _WARNING_TEXT_COLOR = '\033[93m'
+    _BOLD_TEXT = '\033[1m'
+    _END_TEXT_COLOR = '\033[0m'
+
+    # Counts the number of empty fragments.
+    _num_empty = 0
+
+    def __init__(self, filename: str, verbosity: int = 0) -> None:
+        """
+        Loads a given HDF5 file.
+
+        Parameters:
+            filename (str): HDF5 file to open.
+            verbosity (int): Verbose level. 0: Only errors. 1: Warnings. 2: All.
+
+        Returns nothing.
+        """
+        # Generic loading
+        self._h5_file = HDF5RawDataFile(filename)
+        self._fragment_paths = self._h5_file.get_all_fragment_dataset_paths()
+        self.run_id = self._h5_file.get_int_attribute('run_number')
+        self.file_index = self._h5_file.get_int_attribute('file_index')
+
+        self._verbosity = verbosity
+
+        self._filter_fragment_paths()  # Derived class must define this.
+
+        return None
+
+    @abc.abstractmethod
+    def _filter_fragment_paths(self) -> None:
+        """
+        Filter the fragment paths of interest.
+
+        This should be according to the derived reader's
+        data type of interest, e.g., filter for TriggerActivity.
+        """
+        ...
+
+    def get_fragment_paths(self) -> list[str]:
+        """ Return the list of fragment paths. """
+        return list(self._fragment_paths)
+
+    def set_fragment_paths(self, fragment_paths: list[str]) -> None:
+        """ Set the list of fragment paths. """
+        self._fragment_paths = fragment_paths
+        return None
+
+    @abc.abstractmethod
+    def read_fragment(self, fragment_path: str) -> None:
+        """ Read one fragment from :fragment_path:. """
+        ...
+
+    def read_all_fragments(self) -> None:
+        """ Read all fragments. """
+        for fragment_path in self._fragment_paths:
+            _ = self.read_fragment(fragment_path)
+
+        # self.read_fragment should increment self._num_empty.
+        # Print how many were empty as a debug.
+        if self._verbosity >= 1 and self._num_empty != 0:
+            print(
+                    self._FAIL_TEXT_COLOR
+                    + self._BOLD_TEXT
+                    + f"WARNING: Skipped {self._num_empty} frags."
+                    + self._END_TEXT_COLOR
+            )
+
+        return None
+
+    @abc.abstractmethod
+    def clear_data(self) -> None:
+        """ Clear the contents of the member data. """
+        ...
+
+    def reset_fragment_paths(self) -> None:
+        """ Reset the fragment paths to the initialized state. """
+        self._fragment_paths = self._h5_file.get_all_fragment_dataset_paths()
+        self._filter_fragment_paths()
diff --git a/python/trgtools/TAData.py b/python/trgtools/TAData.py deleted file mode 100644 index 9f39064..0000000 --- a/python/trgtools/TAData.py +++ /dev/null @@ -1,174 +0,0 @@
-"""
-Class containing TA data and the ability to process data.
-""" -import os - -import numpy as np - -from hdf5libs import HDF5RawDataFile -import daqdataformats -import trgdataformats - -class TAData: - """ - Class that loads a given TPStream file and can - process the TA fragments within. - - Loading fragments populates obj.ta_data and obj.tp_data. - Numpy dtypes of ta_data and tp_data are available as - obj.ta_dt and obj.tp_dt. - - Gives warnings and information when trying to load - empty fragments. Can be quieted with quiet=True on - init. - """ - ## Useful print colors - _FAIL_TEXT_COLOR = '\033[91m' - _WARNING_TEXT_COLOR = '\033[93m' - _BOLD_TEXT = '\033[1m' - _END_TEXT_COLOR = '\033[0m' - - ## TA data type - ta_dt = np.dtype([ - ('adc_integral', np.uint64), - ('adc_peak', np.uint64), - ('algorithm', np.uint8), - ('channel_end', np.int32), - ('channel_peak', np.int32), - ('channel_start', np.int32), - ('detid', np.uint16), - ('num_tps', np.uint64), # Greedy - ('time_activity', np.uint64), - ('time_end', np.uint64), - ('time_peak', np.uint64), - ('time_start', np.uint64), - ('type', np.uint8), - ('version', np.uint16) - ]) - ## TP data type - tp_dt = np.dtype([ - ('adc_integral', np.uint32), - ('adc_peak', np.uint32), - ('algorithm', np.uint8), - ('channel', np.int32), - ('detid', np.uint16), - ('flag', np.uint16), - ('time_over_threshold', np.uint64), - ('time_peak', np.uint64), - ('time_start', np.uint64), - ('type', np.uint8), - ('version', np.uint16) - ]) - - def __init__(self, filename, quiet=False): - """ - Loads the given HDF5 file and inits memmber data. - """ - self._h5_file = HDF5RawDataFile(filename) - self._set_ta_frag_paths(self._h5_file.get_all_fragment_dataset_paths()) - self._quiet = quiet - - self.ta_data = np.array([], dtype=self.ta_dt) # Will concatenate new TAs - self.tp_data = [] # tp_data[i] will be a np.ndarray of TPs from the i-th TA - self._ta_size = trgdataformats.TriggerActivityOverlay().sizeof() - - # Masking frags that are found as empty. - self._nonempty_frags_mask = np.ones((len(self._frag_paths),), dtype=bool) - self._num_empty = 0 - - # File identification attributes - self.run_id = self._h5_file.get_int_attribute('run_number') - self.file_index = self._h5_file.get_int_attribute('file_index') - - def _set_ta_frag_paths(self, frag_paths) -> None: - """ - Only collect the fragment paths that are for TAs. - """ - self._frag_paths = [] - for frag_path in frag_paths: - if 'Trigger_Activity' in frag_path: - self._frag_paths.append(frag_path) - - def get_ta_frag_paths(self) -> list: - return self._frag_paths - - def load_frag(self, frag_path, index=None) -> None: - """ - Load a fragment from a given fragment path. - Saves the results to self.ta_data and self.tp_data. - Returns nothing. - """ - frag = self._h5_file.get_frag(frag_path) - frag_data_size = frag.get_data_size() - if frag_data_size == 0: - self._num_empty += 1 - if not self._quiet: - print(self._FAIL_TEXT_COLOR + self._BOLD_TEXT + "WARNING: Empty fragment. Returning empty array." + self._END_TEXT_COLOR) - print(self._WARNING_TEXT_COLOR + self._BOLD_TEXT + f"INFO: Fragment Path: {frag_path}" + self._END_TEXT_COLOR) - if index != None: - self._nonempty_frags_mask[index] = False - return - if index == None: - index = self._frag_paths.index(frag_path) - - ta_idx = 0 # Only used to output. - byte_idx = 0 # Variable TA sizing, must do while loop. 
- while (byte_idx < frag_data_size): - if (not self._quiet): - print(f"Fragment Index: {ta_idx}.") - ta_idx += 1 - print(f"Byte Index / Frag Size: {byte_idx} / {frag_data_size}") - ## Process TA data - ta_datum = trgdataformats.TriggerActivity(frag.get_data(byte_idx)) - np_ta_datum = np.array([( - ta_datum.data.adc_integral, - ta_datum.data.adc_peak, - np.uint8(ta_datum.data.algorithm), - ta_datum.data.channel_end, - ta_datum.data.channel_peak, - ta_datum.data.channel_start, - np.uint16(ta_datum.data.detid), - ta_datum.n_inputs(), - ta_datum.data.time_activity, - ta_datum.data.time_end, - ta_datum.data.time_peak, - ta_datum.data.time_start, - np.uint8(ta_datum.data.type), - np.uint16(ta_datum.data.version))], - dtype=self.ta_dt) - self.ta_data = np.hstack((self.ta_data, np_ta_datum)) - byte_idx += ta_datum.sizeof() - if (not self._quiet): - print(f"Upcoming byte: {byte_idx}") - - ## Process TP data - np_tp_data = np.zeros(np_ta_datum['num_tps'], dtype=self.tp_dt) - for tp_idx, tp in enumerate(ta_datum): - np_tp_data[tp_idx] = np.array([( - tp.adc_integral, - tp.adc_peak, - tp.algorithm, - tp.channel, - tp.detid, - tp.flag, - tp.time_over_threshold, - tp.time_peak, - tp.time_start, - tp.type, - tp.version)], - dtype=self.tp_dt) - self.tp_data.append(np_tp_data) # Jagged array - - return - - def load_all_frags(self) -> None: - """ - Load all fragments. - - Returns nothing. Data is accessible from obj.ta_data - and obj.tp_data. - """ - for idx, frag_path in enumerate(self._frag_paths): - self.load_frag(frag_path, idx) - if self._num_empty != 0 and not self._quiet: - print(self._FAIL_TEXT_COLOR + self._BOLD_TEXT + f"WARNING: Skipped {self._num_empty} frags." + self._END_TEXT_COLOR) diff --git a/python/trgtools/TAReader.py b/python/trgtools/TAReader.py new file mode 100644 index 0000000..64c3886 --- /dev/null +++ b/python/trgtools/TAReader.py @@ -0,0 +1,169 @@ +""" +Reader class for TA data. +""" +from .HDF5Reader import HDF5Reader + +import daqdataformats # noqa: F401 : Not used, but needed to recognize formats. +import trgdataformats + +import numpy as np + + +class TAReader(HDF5Reader): + """ + Class that reads a given HDF5 data file and can + process the TA fragments within. + + Loading fragments appends to :self.ta_data: and :self.tp_data:. + NumPy dtypes of :self.ta_data: and :self.tp_data: are available + as :TAReader.ta_dt: and :TAReader.tp_dt:. + + TA reading can print information that is relevant about the + loading process by specifying the verbose level. 0 for errors + only. 1 for warnings. 2 for all information. + """ + # TA data type + ta_dt = np.dtype([ + ('adc_integral', np.uint64), + ('adc_peak', np.uint64), + ('algorithm', np.uint8), + ('channel_end', np.int32), + ('channel_peak', np.int32), + ('channel_start', np.int32), + ('detid', np.uint16), + ('num_tps', np.uint64), # Greedy + ('time_activity', np.uint64), + ('time_end', np.uint64), + ('time_peak', np.uint64), + ('time_start', np.uint64), + ('type', np.uint8), + ('version', np.uint16) + ]) + ta_data = np.array([], dtype=ta_dt) + + # TP data type + tp_dt = np.dtype([ + ('adc_integral', np.uint32), + ('adc_peak', np.uint32), + ('algorithm', np.uint8), + ('channel', np.int32), + ('detid', np.uint16), + ('flag', np.uint16), + ('time_over_threshold', np.uint64), + ('time_peak', np.uint64), + ('time_start', np.uint64), + ('type', np.uint8), + ('version', np.uint16) + ]) + tp_data = [] + + def __init__(self, filename: str, verbosity: int = 0) -> None: + """ + Loads a given HDF5 file. 
+
+        Parameters:
+            filename (str): HDF5 file to open.
+            verbosity (int): Verbose level. 0: Only errors. 1: Warnings. 2: All.
+
+        Returns nothing.
+        """
+        super().__init__(filename, verbosity)
+        # Use per-instance containers so that multiple readers do not share state.
+        self.ta_data = np.array([], dtype=self.ta_dt)
+        self.tp_data = []
+        return None
+
+    def _filter_fragment_paths(self) -> None:
+        """ Filter the fragment paths for TAs. """
+        fragment_paths = []
+
+        # TA fragment paths contain their name in the path.
+        for path in self._fragment_paths:
+            if "Trigger_Activity" in path:
+                fragment_paths.append(path)
+
+        self._fragment_paths = fragment_paths
+        return None
+
+    def read_fragment(self, fragment_path: str) -> np.ndarray:
+        """
+        Read from the given data fragment path.
+
+        Returns a np.ndarray of the TAs that were read and appends to
+        :self.ta_data:.
+        """
+        if self._verbosity >= 2:
+            print("="*60)
+            print(f"INFO: Reading from the path\n{fragment_path}")
+
+        fragment = self._h5_file.get_frag(fragment_path)
+        fragment_data_size = fragment.get_data_size()
+
+        if fragment_data_size == 0:
+            self._num_empty += 1
+            if self._verbosity >= 1:
+                print(
+                        self._FAIL_TEXT_COLOR
+                        + self._BOLD_TEXT
+                        + "WARNING: Empty fragment. Returning empty array."
+                        + self._END_TEXT_COLOR
+                )
+                print("="*60)
+            return np.array([], dtype=self.ta_dt)
+
+        ta_idx = 0  # Debugging output.
+        byte_idx = 0  # Variable TA sizing, must do while loop.
+        frag_ta_data = np.array([], dtype=self.ta_dt)  # TAs read from this fragment.
+        while byte_idx < fragment_data_size:
+            if self._verbosity >= 2:
+                print(f"INFO: Fragment Index: {ta_idx}.")
+                ta_idx += 1
+                print(f"INFO: Byte Index / Frag Size: {byte_idx} / {fragment_data_size}")
+
+            # Read TA data
+            ta_datum = trgdataformats.TriggerActivity(fragment.get_data(byte_idx))
+            np_ta_datum = np.array([(
+                                ta_datum.data.adc_integral,
+                                ta_datum.data.adc_peak,
+                                np.uint8(ta_datum.data.algorithm),
+                                ta_datum.data.channel_end,
+                                ta_datum.data.channel_peak,
+                                ta_datum.data.channel_start,
+                                np.uint16(ta_datum.data.detid),
+                                ta_datum.n_inputs(),
+                                ta_datum.data.time_activity,
+                                ta_datum.data.time_end,
+                                ta_datum.data.time_peak,
+                                ta_datum.data.time_start,
+                                np.uint8(ta_datum.data.type),
+                                np.uint16(ta_datum.data.version))],
+                                dtype=self.ta_dt)
+
+            frag_ta_data = np.hstack((frag_ta_data, np_ta_datum))
+
+            byte_idx += ta_datum.sizeof()
+            if self._verbosity >= 2:
+                print(f"INFO: Upcoming byte index: {byte_idx}")
+
+            # Process TP data
+            np_tp_data = np.zeros(np_ta_datum['num_tps'], dtype=self.tp_dt)
+            for tp_idx, tp in enumerate(ta_datum):
+                np_tp_data[tp_idx] = np.array([(
+                                        tp.adc_integral,
+                                        tp.adc_peak,
+                                        tp.algorithm,
+                                        tp.channel,
+                                        tp.detid,
+                                        tp.flag,
+                                        tp.time_over_threshold,
+                                        tp.time_peak,
+                                        tp.time_start,
+                                        tp.type,
+                                        tp.version)],
+                                        dtype=self.tp_dt)
+            self.tp_data.append(np_tp_data)  # Jagged array
+
+        # Append this fragment's TAs to the full set, and return only this fragment's TAs.
+        self.ta_data = np.hstack((self.ta_data, frag_ta_data))
+
+        if self._verbosity >= 2:
+            print("INFO: Finished reading.")
+            print("="*60)
+        return frag_ta_data
+
+    def clear_data(self) -> None:
+        """ Clear the stored TA and TP contents. """
+        self.ta_data = np.array([], dtype=self.ta_dt)
+        self.tp_data = []
diff --git a/python/trgtools/TCData.py b/python/trgtools/TCData.py deleted file mode 100644 index 4e9bb76..0000000 --- a/python/trgtools/TCData.py +++ /dev/null @@ -1,196 +0,0 @@
-"""
-TriggerCandidate reader class to read and store data
-from an HDF5 file.
-"""
-import daqdataformats # Not directly used, but necessary to interpret formats.
-from hdf5libs import HDF5RawDataFile
-import trgdataformats
-
-import numpy as np
-
-import os
-
-
-class TCData():
-    """
-    Class that reads a given HDF5 TPStream file and stores
-    the TC fragments within
-
-    Loading fragments populates obj.tc_data and obj.ta_data.
-    Numpy dtypes of ta_data and tp_data are available as
-    obj.ta_dt and obj.tp_dt.
- - Gives warnings and information when trying to load - empty fragments. Can be quieted with quiet=True on - init. - """ - - # TC data type - tc_dt = np.dtype([ - ('algorithm', np.uint8), - ('detid', np.uint16), - ('num_tas', np.uint64), # Greedy - ('time_candidate', np.uint64), - ('time_end', np.uint64), - ('time_start', np.uint64), - ('type', np.uint8), - ('version', np.uint16), - ]) - - # TA data type - ta_dt = np.dtype([ - ('adc_integral', np.uint64), - ('adc_peak', np.uint64), - ('algorithm', np.uint8), - ('channel_end', np.int32), - ('channel_peak', np.int32), - ('channel_start', np.int32), - ('detid', np.uint16), - ('time_activity', np.uint64), - ('time_end', np.uint64), - ('time_peak', np.uint64), - ('time_start', np.uint64), - ('type', np.uint8), - ('version', np.uint16) - ]) - - tc_data = np.array([], dtype=tc_dt) # Will concatenate new TCs - ta_data = [] # ta_data[i] will be a np.ndarray of TAs from the i-th TC - _tc_size = trgdataformats.TriggerCandidateOverlay().sizeof() - _num_empty = 0 - - def __init__(self, filename: str, quiet: bool = False) -> None: - """ - Loads a given HDF5 file. - - Parameters: - filename (str): HDF5 file to open. - quiet (bool): Quiets outputs if true. - """ - self._h5_file = HDF5RawDataFile(os.path.expanduser(filename)) - self._set_tc_frag_paths(self._h5_file.get_all_fragment_dataset_paths()) - self._quiet = quiet - self.run_id = self._h5_file.get_int_attribute('run_number') - self.file_index = self._h5_file.get_int_attribute('file_index') - - return None - - # TODO STYLE: Conforming to TAData and TPData, but this should be 'filter' not 'set'. - def _set_tc_frag_paths(self, fragment_paths: list[str]) -> None: - """ - Filter fragment paths for TriggerCandidates. - - Parameters: - fragment_paths (list[str]): List of fragment paths that can be loaded from. - Returns: - Nothing. Creates a new instance member self._fragment_paths list[str]. - """ - self._fragment_paths = [] - for fragment_path in fragment_paths: - if 'Trigger_Candidate' in fragment_path: - self._fragment_paths.append(fragment_path) - - return None - - def get_tc_frag_paths(self) -> list[str]: - """ Returns the list of fragment paths. """ - return self._fragment_paths - - # TODO STYLE: Conforming to TAData and TPData, but this should be 'read_fragment'. - def load_frag(self, fragment_path: str) -> None: - """ - Read from the given fragment path and store. - - Parameters: - fragment_path (str): Fragment path to load from the initialized HDF5. - Returns: - Nothing. Stores the result in instance members :tc_data: and :ta_data:. - """ - frag = self._h5_file.get_frag(fragment_path) - frag_data_size = frag.get_data_size() - - if frag_data_size == 0: # Empty fragment - self._num_empty += 1 - if not self._quiet: - print( - self._FAIL_TEXT_COLOR - + self._BOLD_TEXT - + "WARNING: Empty fragment." - + self._END_TEXT_COLOR - ) - print( - self._WARNING_TEXT_COLOR - + self._BOLD_TEXT - + f"INFO: Fragment Path: {fragment_path}" - + self._END_TEXT_COLOR - ) - - tc_idx = 0 # Debugging output. - byte_idx = 0 # Variable TC sizing, must do a while loop. 
- while byte_idx < frag_data_size: - if not self._quiet: - print(f"INFO: Fragment Index: {tc_idx}.") - tc_idx += 1 - print(f"INFO: Byte Index / Frag Size: {byte_idx} / {frag_data_size}") - - # Process TC data - tc_datum = trgdataformats.TriggerCandidate(frag.get_data_bytes(byte_idx)) - np_tc_datum = np.array([( - tc_datum.data.algorithm, - tc_datum.data.detid, - tc_datum.n_inputs(), - tc_datum.data.time_candidate, - tc_datum.data.time_end, - tc_datum.data.time_start, - tc_datum.data.type, - tc_datum.data.version)], - dtype=self.tc_dt - ) - self.tc_data = np.hstack((self.tc_data, np_tc_datum)) - - byte_idx += tc_datum.sizeof() - if not self._quiet: - print(f"Upcoming byte index: {byte_idx}.") - - # Process TA data - np_ta_data = np.zeros(np_tc_datum['num_tas'], dtype=self.ta_dt) - for ta_idx, ta in enumerate(tc_datum): - np_ta_data[ta_idx] = np.array( - [( - ta.adc_integral, - ta.adc_peak, - np.uint8(ta.algorithm), - ta.channel_end, - ta.channel_peak, - ta.channel_start, - np.uint16(ta.detid), - ta.time_activity, - ta.time_end, - ta.time_peak, - ta.time_start, - np.uint8(ta.type), - ta.version - )], - dtype=self.ta_dt - ) - self.ta_data.append(np_ta_data) # Jagged array - - return None - - # TODO STYLE: Conforming to TAData and TPData, but this should be 'read_all_fragments'. - def load_all_frags(self) -> None: - """ - Read from all fragments in :self._fragment_paths:. - - Returns: - Nothing. Stores the result in instance members :tc_data: and :ta_data:. - """ - for frag_path in self._fragment_paths: - self.load_frag(frag_path) - if self._num_empty != 0 and not self._quiet: - print( - self._FAIL_TEXT_COLOR - + self._BOLD_TEXT - + f"WARNING: Skipped {self._num_empty} frags." - + self._END_TEXT_COLOR - ) diff --git a/python/trgtools/TCReader.py b/python/trgtools/TCReader.py new file mode 100644 index 0000000..e2dbd12 --- /dev/null +++ b/python/trgtools/TCReader.py @@ -0,0 +1,160 @@ +""" +Reader class for TC data. +""" +from .HDF5Reader import HDF5Reader + +import daqdataformats # noqa: F401 : Not used, but needed to recognize formats. +import trgdataformats + +import numpy as np + + +class TCReader(HDF5Reader): + """ + Class that reads a given HDF5 data file and can + process the TC fragments within. + + Loading fragments appends to :self.tc_data: and :self.ta_data:. + NumPy dtypes of :self.tc_data: and :self.ta_data: are available + as :TCReader.tc_dt: and :TCReader.ta_dt:. + + TC reading can print information that is relevant about the + loading process by specifying the verbose level. 0 for errors + only. 1 for warnings. 2 for all information. + """ + # TC data type + tc_dt = np.dtype([ + ('algorithm', np.uint8), + ('detid', np.uint16), + ('num_tas', np.uint64), # Greedy + ('time_candidate', np.uint64), + ('time_end', np.uint64), + ('time_start', np.uint64), + ('type', np.uint8), + ('version', np.uint16), + ]) + tc_data = np.array([], dtype=tc_dt) # Will concatenate new TCs + + # TA data type + ta_dt = np.dtype([ + ('adc_integral', np.uint64), + ('adc_peak', np.uint64), + ('algorithm', np.uint8), + ('channel_end', np.int32), + ('channel_peak', np.int32), + ('channel_start', np.int32), + ('detid', np.uint16), + ('time_activity', np.uint64), + ('time_end', np.uint64), + ('time_peak', np.uint64), + ('time_start', np.uint64), + ('type', np.uint8), + ('version', np.uint16) + ]) + ta_data = [] # ta_data[i] will be a np.ndarray of TAs from the i-th TC + + def __init__(self, filename: str, verbosity: int = 0) -> None: + """ + Loads a given HDF5 file. 
+
+        Parameters:
+            filename (str): HDF5 file to open.
+            verbosity (int): Verbose level. 0: Only errors. 1: Warnings. 2: All.
+
+        Returns nothing.
+        """
+        super().__init__(filename, verbosity)
+        # Use per-instance containers so that multiple readers do not share state.
+        self.tc_data = np.array([], dtype=self.tc_dt)
+        self.ta_data = []
+        return None
+
+    def _filter_fragment_paths(self) -> None:
+        """ Filter the fragment paths for TCs. """
+        fragment_paths = []
+
+        # TC fragment paths contain their name in the path.
+        for path in self._fragment_paths:
+            if "Trigger_Candidate" in path:
+                fragment_paths.append(path)
+
+        self._fragment_paths = fragment_paths
+        return None
+
+    def read_fragment(self, fragment_path: str) -> np.ndarray:
+        """
+        Read from the given data fragment path.
+
+        Returns a np.ndarray of the TCs that were read and appends to :self.tc_data:.
+        """
+        if self._verbosity >= 2:
+            print("="*60)
+            print(f"INFO: Reading from the path\n{fragment_path}")
+
+        fragment = self._h5_file.get_frag(fragment_path)
+        fragment_data_size = fragment.get_data_size()
+
+        if fragment_data_size == 0:  # Empty fragment
+            self._num_empty += 1
+            if self._verbosity >= 1:
+                print(
+                        self._FAIL_TEXT_COLOR
+                        + self._BOLD_TEXT
+                        + "WARNING: Empty fragment."
+                        + self._END_TEXT_COLOR
+                )
+                print("="*60)
+            return np.array([], dtype=self.tc_dt)
+
+        tc_idx = 0  # Debugging output.
+        byte_idx = 0  # Variable TC sizing, must do a while loop.
+        frag_tc_data = np.array([], dtype=self.tc_dt)  # TCs read from this fragment.
+        while byte_idx < fragment_data_size:
+            if self._verbosity >= 2:
+                print(f"INFO: Fragment Index: {tc_idx}.")
+                tc_idx += 1
+                print(f"INFO: Byte Index / Frag Size: {byte_idx} / {fragment_data_size}")
+
+            # Process TC data
+            tc_datum = trgdataformats.TriggerCandidate(fragment.get_data_bytes(byte_idx))
+            np_tc_datum = np.array([(
+                                tc_datum.data.algorithm,
+                                tc_datum.data.detid,
+                                tc_datum.n_inputs(),
+                                tc_datum.data.time_candidate,
+                                tc_datum.data.time_end,
+                                tc_datum.data.time_start,
+                                tc_datum.data.type,
+                                tc_datum.data.version)],
+                                dtype=self.tc_dt)
+
+            frag_tc_data = np.hstack((frag_tc_data, np_tc_datum))
+
+            byte_idx += tc_datum.sizeof()
+            if self._verbosity >= 2:
+                print(f"INFO: Upcoming byte index: {byte_idx}.")
+
+            # Process TA data
+            np_ta_data = np.zeros(np_tc_datum['num_tas'], dtype=self.ta_dt)
+            for ta_idx, ta in enumerate(tc_datum):
+                np_ta_data[ta_idx] = np.array([(
+                                        ta.adc_integral,
+                                        ta.adc_peak,
+                                        np.uint8(ta.algorithm),
+                                        ta.channel_end,
+                                        ta.channel_peak,
+                                        ta.channel_start,
+                                        np.uint16(ta.detid),
+                                        ta.time_activity,
+                                        ta.time_end,
+                                        ta.time_peak,
+                                        ta.time_start,
+                                        np.uint8(ta.type),
+                                        ta.version)],
+                                        dtype=self.ta_dt)
+            self.ta_data.append(np_ta_data)  # Jagged array
+
+        # Append this fragment's TCs to the full set, and return only this fragment's TCs.
+        self.tc_data = np.hstack((self.tc_data, frag_tc_data))
+
+        if self._verbosity >= 2:
+            print("INFO: Finished reading.")
+            print("="*60)
+        return frag_tc_data
+
+    def clear_data(self) -> None:
+        """ Clear the stored TC and TA contents. """
+        self.tc_data = np.array([], dtype=self.tc_dt)
+        self.ta_data = []
diff --git a/python/trgtools/TPData.py b/python/trgtools/TPData.py deleted file mode 100644 index 8f71cd3..0000000 --- a/python/trgtools/TPData.py +++ /dev/null @@ -1,108 +0,0 @@
-"""
-Class containing TP data and the ability to process data.
-"""
-
-import os
-
-import numpy as np
-
-from hdf5libs import HDF5RawDataFile
-import daqdataformats
-import trgdataformats
-
-class TPData:
-    """
-    Class that loads a given TPStream file and can
-    process the TP fragments within.
-
-    Loading fragments populates obj.tp_data. The
-    Numpy dtype is available on obj.tp_dt.
-    """
-    ## Useful print colors
-    _FAIL_TEXT_COLOR = '\033[91m'
-    _WARNING_TEXT_COLOR = '\033[93m'
-    _BOLD_TEXT = '\033[1m'
-    _END_TEXT_COLOR = '\033[0m'
-
-    ## TP data type
-    tp_dt = np.dtype([
-        ('adc_integral', np.uint32),
-        ('adc_peak', np.uint32),
-        ('algorithm', np.uint8),
-        ('channel', np.int32),
-        ('detid', np.uint16),
-        ('flag', np.uint16),
-        ('time_over_threshold', np.uint64),
-        ('time_peak', np.uint64),
-        ('time_start', np.uint64),
-        ('type', np.uint8),
-        ('version', np.uint16)
-    ])
-
-    def __init__(self, filename, quiet=False):
-        """
-        Loads the given HDF5 file and inits member data.
-        """
-        self._h5_file = HDF5RawDataFile(filename)
-        self._set_tp_frag_paths(self._h5_file.get_all_fragment_dataset_paths())
-        self.tp_data = np.array([], dtype=self.tp_dt) # Will concatenate TPs
-        self._quiet = quiet
-
-        # File identification attributes
-        self.run_id = self._h5_file.get_int_attribute('run_number')
-        self.file_index = self._h5_file.get_int_attribute('file_index')
-
-    def _set_tp_frag_paths(self, frag_paths) -> None:
-        """
-        Only collect the fragment paths that are for TAs.
-        """
-        self._frag_paths = []
-        for frag_path in frag_paths:
-            if 'Trigger_Primitive' in frag_path:
-                self._frag_paths.append(frag_path)
-
-    def get_tp_frag_paths(self) -> list:
-        return self._frag_paths
-
-    def load_frag(self, frag_path) -> np.ndarray:
-        """
-        Load a fragment from a given fragment path
-        and append to self.tp_data.
-
-        Returns an np.ndarray of dtype self.tp_dt.
-        """
-        frag = self._h5_file.get_frag(frag_path)
-
-        data_size = frag.get_data_size()
-        tp_size = trgdataformats.TriggerPrimitive.sizeof()
-        num_tps = data_size // tp_size
-        if (not self._quiet):
-            print(f"Loaded frag with {num_tps} TPs.")
-
-        np_tp_data = np.zeros((num_tps,), dtype=self.tp_dt)
-        for idx, byte_idx in enumerate(range(0, data_size, tp_size)): # Increment by TP size
-            tp_datum = trgdataformats.TriggerPrimitive(frag.get_data(byte_idx))
-
-            np_tp_data[idx] = np.array([(
-                tp_datum.adc_integral,
-                tp_datum.adc_peak,
-                tp_datum.algorithm,
-                tp_datum.channel,
-                tp_datum.detid,
-                tp_datum.flag,
-                tp_datum.time_over_threshold,
-                tp_datum.time_peak,
-                tp_datum.time_start,
-                tp_datum.type,
-                tp_datum.version)],
-                dtype=self.tp_dt)
-        self.tp_data = np.hstack((self.tp_data, np_tp_data))
-
-        return np_tp_data
-
-    def load_all_frags(self) -> None:
-        """
-        Load all of the fragments to the current object.
-        """
-        for idx, frag_path in enumerate(self._frag_paths):
-            _ = self.load_frag(frag_path) # Only care about the append to self.tp_data
diff --git a/python/trgtools/TPReader.py b/python/trgtools/TPReader.py new file mode 100644 index 0000000..b445d66 --- /dev/null +++ b/python/trgtools/TPReader.py @@ -0,0 +1,122 @@
+"""
+Reader class for TP data.
+"""
+from .HDF5Reader import HDF5Reader
+
+import daqdataformats  # noqa: F401 : Not used, but needed to recognize formats.
+import trgdataformats
+
+import numpy as np
+
+
+class TPReader(HDF5Reader):
+    """
+    Class that reads a given HDF5 data file and can
+    process the TP fragments within.
+
+    Loading fragments appends to :self.tp_data:. The
+    NumPy dtype of :self.tp_data: is available as
+    :TPReader.tp_dt:.
+
+    TP reading can print information that is relevant about the
+    loading process by specifying the verbose level. 0 for errors
+    only. 1 for warnings. 2 for all information.
+    """
+    # TP data type
+    tp_dt = np.dtype([
+        ('adc_integral', np.uint32),
+        ('adc_peak', np.uint32),
+        ('algorithm', np.uint8),
+        ('channel', np.int32),
+        ('detid', np.uint16),
+        ('flag', np.uint16),
+        ('time_over_threshold', np.uint64),
+        ('time_peak', np.uint64),
+        ('time_start', np.uint64),
+        ('type', np.uint8),
+        ('version', np.uint16)
+    ])
+    tp_data = np.array([], dtype=tp_dt)
+
+    def __init__(self, filename: str, verbosity: int = 0) -> None:
+        """
+        Loads a given HDF5 file.
+
+        Parameters:
+            filename (str): HDF5 file to open.
+            verbosity (int): Verbose level. 0: Only errors. 1: Warnings. 2: All.
+
+        Returns nothing.
+        """
+        super().__init__(filename, verbosity)
+        # Use a per-instance array so that multiple readers do not share state.
+        self.tp_data = np.array([], dtype=self.tp_dt)
+        return None
+
+    def _filter_fragment_paths(self) -> None:
+        """ Filter the fragment paths for TPs. """
+        fragment_paths = []
+
+        # TP fragment paths contain their name in the path.
+        for path in self._fragment_paths:
+            if "Trigger_Primitive" in path:
+                fragment_paths.append(path)
+
+        self._fragment_paths = fragment_paths
+        return None
+
+    def read_fragment(self, fragment_path: str) -> np.ndarray:
+        """
+        Read from the given data fragment path.
+
+        Returns a np.ndarray of the TPs that were read and appends to
+        :self.tp_data:.
+        """
+        if self._verbosity >= 2:
+            print("="*60)
+            print(f"INFO: Reading from the path\n{fragment_path}")
+
+        fragment = self._h5_file.get_frag(fragment_path)
+        fragment_data_size = fragment.get_data_size()
+
+        if fragment_data_size == 0:
+            self._num_empty += 1
+            if self._verbosity >= 1:
+                print(
+                        self._FAIL_TEXT_COLOR
+                        + self._BOLD_TEXT
+                        + "WARNING: Empty fragment. Returning empty array."
+                        + self._END_TEXT_COLOR
+                )
+                print("="*60)
+            return np.array([], dtype=self.tp_dt)
+
+        tp_size = trgdataformats.TriggerPrimitive.sizeof()
+        num_tps = fragment_data_size // tp_size
+        if self._verbosity >= 2:
+            print(f"INFO: Loaded fragment with {num_tps} TPs.")
+
+        np_tp_data = np.zeros((num_tps,), dtype=self.tp_dt)
+        for idx, byte_idx in enumerate(range(0, fragment_data_size, tp_size)):  # Increment by TP size
+            tp_datum = trgdataformats.TriggerPrimitive(fragment.get_data(byte_idx))
+
+            np_tp_data[idx] = np.array([(
+                                tp_datum.adc_integral,
+                                tp_datum.adc_peak,
+                                tp_datum.algorithm,
+                                tp_datum.channel,
+                                tp_datum.detid,
+                                tp_datum.flag,
+                                tp_datum.time_over_threshold,
+                                tp_datum.time_peak,
+                                tp_datum.time_start,
+                                tp_datum.type,
+                                tp_datum.version)],
+                                dtype=self.tp_dt)
+        self.tp_data = np.hstack((self.tp_data, np_tp_data))
+
+        if self._verbosity >= 2:
+            print("INFO: Finished reading.")
+            print("="*60)
+        return np_tp_data
+
+    def clear_data(self) -> None:
+        """ Clear the stored TP contents. """
+        self.tp_data = np.array([], dtype=self.tp_dt)
diff --git a/python/trgtools/__init__.py b/python/trgtools/__init__.py index 263a1ad..b8d1489 100644 --- a/python/trgtools/__init__.py +++ b/python/trgtools/__init__.py @@ -1,3 +1,3 @@
-from .TAData import TAData
-from .TPData import TPData
-from .TCData import TCData
+from .TAReader import TAReader
+from .TCReader import TCReader
+from .TPReader import TPReader
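Since the `trgtools` exports change from the `T*Data` classes to the `T*Reader` classes, downstream scripts need the corresponding renames. A minimal before/after sketch of the migration (the file name here is a placeholder):
```python
import trgtools

# Before: quieted TAData loading.
# data = trgtools.TAData("file.hdf5", quiet=True)
# data.load_all_frags()
# frag_paths = data.get_ta_frag_paths()

# After: TAReader with a verbosity level (0: errors only).
data = trgtools.TAReader("file.hdf5", verbosity=0)
data.read_all_fragments()
frag_paths = data.get_fragment_paths()
```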
diff --git a/python/trgtools/plot/PDFPlotter.py b/python/trgtools/plot/PDFPlotter.py new file mode 100644 index 0000000..c351124 --- /dev/null +++ b/python/trgtools/plot/PDFPlotter.py @@ -0,0 +1,192 @@
+"""
+Plotter with common plots to put on a single PDF.
+"""
+import matplotlib.pyplot as plt
+from matplotlib.backends.backend_pdf import PdfPages
+import numpy as np
+
+
+class PDFPlotter:
+    """
+    Plotter with common plots to put on a single PDF.
+
+    Using this class requires the matplotlib module.
+    It creates a PdfPages object for shared plot saving
+    and has a few common plots that can be used.
+
+    The common plots take a dictionary of plot identifiers,
+    such as title, xlabel, and ylabel, and a sub-dictionary
+    that is used for modifying the style of the plots.
+
+    Plotting methods require that a 'title', 'xlabel', and 'ylabel'
+    are within the passed plotting style dictionary. An error is
+    raised if any of the three are missing.
+    """
+
+    _DEFAULT_FIG_SIZE = (6, 4)
+
+    _DEFAULT_HIST_STYLE = dict(
+            linear=True,
+            linear_style=dict(color='#63ACBE', alpha=0.6, label='Linear'),
+            log=True,
+            log_style=dict(color='#EE442F', alpha=0.6, label='Log'),
+            bins=100
+    )
+
+    _DEFAULT_ERRORBAR_STYLE = dict(
+            capsize=4,
+            color='k',
+            ecolor='r',
+            marker='.',
+            markersize=0.01
+    )
+
+    def __init__(self, save_name: str) -> None:
+        """
+        Inits the PdfPages with the given :save_name:.
+
+        Parameters:
+            save_name (str): Initializes the PdfPages object to save to with the given name.
+
+        Returns:
+            Nothing.
+        """
+        if save_name.endswith('.pdf'):
+            self._pdf = PdfPages(save_name)
+        else:
+            self._pdf = PdfPages(f"{save_name}.pdf")
+
+        return None
+
+    def _check_title_and_labels(self, plot_details_dict) -> None:
+        """
+        Check that the given :plot_details_dict: contains a title
+        and labels.
+
+        Parameters:
+            plot_details_dict (dict): Dictionary containing details on plotting styles.
+
+        Raises an error if missing.
+        """
+        if 'title' not in plot_details_dict:
+            raise KeyError("Missing 'title' in plot_details_dict!")
+        if 'xlabel' not in plot_details_dict:
+            raise KeyError("Missing 'xlabel' in plot_details_dict!")
+        if 'ylabel' not in plot_details_dict:
+            raise KeyError("Missing 'ylabel' in plot_details_dict!")
+
+        return None
+
+    def get_pdf(self) -> PdfPages:
+        """
+        Returns the PdfPages object.
+
+        This is useful for appending plots that are outside of this class.
+        """
+        return self._pdf
+
+    def __del__(self):
+        """ Close the PdfPages object before deletion. """
+        self._pdf.close()
+        return None
+
+    def plot_histogram(
+            self,
+            data: np.ndarray,
+            plot_details_dict: dict) -> None:
+        """
+        Plot a histogram onto the PdfPages object.
+
+        Parameters:
+            data (np.ndarray): Array to plot a histogram of.
+            plot_details_dict (dict): Dictionary with keys such as 'title', 'xlabel', etc.
+
+        Returns:
+            Nothing. Mutates :self._pdf: with the new plot.
+        """
+        self._check_title_and_labels(plot_details_dict)
+
+        plt.figure(figsize=plot_details_dict.get('figsize', self._DEFAULT_FIG_SIZE))
+        ax = plt.gca()
+
+        # Apply custom xticks when given, e.g., for categorical or integer-valued axes.
+        if 'xticks' in plot_details_dict:
+            plt.xticks(**plot_details_dict['xticks'])
+
+        bins = plot_details_dict.get('bins', self._DEFAULT_HIST_STYLE['bins'])
+
+        if plot_details_dict.get('linear', True) and plot_details_dict.get('log', True):
+            ax.hist(data, bins=bins, **plot_details_dict.get('linear_style', self._DEFAULT_HIST_STYLE['linear_style']))
+            ax.set_yscale('linear')
+
+            ax2 = ax.twinx()
+            ax2.hist(data, bins=bins, **plot_details_dict.get('log_style', self._DEFAULT_HIST_STYLE['log_style']))
+            ax2.set_yscale('log')
+
+            # Setting the plot order
+            ax.set_zorder(2)
+            ax.patch.set_visible(False)
+            ax2.set_zorder(1)
+
+            handles, labels = ax.get_legend_handles_labels()
+            handles2, labels2 = ax2.get_legend_handles_labels()
+            handles = handles + handles2
+            labels = labels + labels2
+            plt.legend(handles=handles, labels=labels)
+        else:
+            hist_style = plot_details_dict.get('linear_style', self._DEFAULT_HIST_STYLE['linear_style'])
+            if plot_details_dict.get('log', self._DEFAULT_HIST_STYLE['log']):
+                hist_style = plot_details_dict.get('log_style', self._DEFAULT_HIST_STYLE['log_style'])
+            plt.hist(data, bins=bins, **hist_style)
+            if plot_details_dict.get('log', False):  # Default to linear, so only change on log
+                plt.yscale('log')
+
+        plt.title(plot_details_dict['title'])
+        ax.set_xlabel(plot_details_dict['xlabel'])
+        ax.set_ylabel(plot_details_dict['ylabel'])
+        if 'xlim' in plot_details_dict:
+            plt.xlim(plot_details_dict['xlim'])
+
+        plt.tight_layout()
+        self._pdf.savefig()
+        plt.close()
+
+        return None
+
+    def plot_errorbar(
+            self,
+            x_data: np.ndarray,
+            y_data: np.ndarray,
+            plot_details_dict: dict) -> None:
+        """
+        Plot a scatter plot for the given x and y data to the PdfPages object.
+
+        Parameters:
+            x_data (np.ndarray): Array to use as x values.
+            y_data (np.ndarray): Array to use as y values.
+            plot_details_dict (dict): Dictionary with keys such as 'title', 'xlabel', etc.
+
+        Returns:
+            Nothing. Mutates :self._pdf: with the new plot.
+
+        Error bars are handled by :plot_details_dict: since they are a style
+        choice.
+        """
+        self._check_title_and_labels(plot_details_dict)
+
+        plt.figure(figsize=plot_details_dict.get('figsize', self._DEFAULT_FIG_SIZE))
+
+        errorbar_style = plot_details_dict.get('errorbar_style', self._DEFAULT_ERRORBAR_STYLE)
+        plt.errorbar(x_data, y_data, **errorbar_style)
+
+        plt.title(plot_details_dict['title'])
+        plt.xlabel(plot_details_dict['xlabel'])
+        plt.ylabel(plot_details_dict['ylabel'])
+
+        # Only draw a legend when a label was given in the style.
+        if 'label' in errorbar_style:
+            plt.legend()
+
+        plt.tight_layout()
+        self._pdf.savefig()
+        plt.close()
+        return None
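For reference, `plot_errorbar` follows the same dictionary pattern as `plot_histogram`; a minimal usage sketch (the arrays and save name here are hypothetical placeholders):
```python
import numpy as np
from trgtools.plot import PDFPlotter

pdf_plotter = PDFPlotter("errorbar_example.pdf")

x = np.arange(10)
y = x**2

plot_style_dict = dict(title="Example Scatter", xlabel="x", ylabel="y")
pdf_plotter.plot_errorbar(x, y, plot_style_dict)
```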
""" - -import os -import sys -import argparse +import trgtools +from trgtools.plot import PDFPlotter import numpy as np import matplotlib.pyplot as plt from matplotlib.backends.backend_pdf import PdfPages from scipy import stats -import trgtools +import os +import argparse + + +TICK_TO_SEC_SCALE = 16e-9 # s per tick -TICK_TO_SEC_SCALE = 16e-9 # s per tick -def time_start_plot(start_times, seconds=False): +def find_save_name(run_id: int, file_index: int, overwrite: bool) -> str: """ - Plot TA start times vs TA index. + Find a new save name or overwrite an existing one. - Optionally, use ticks or seconds for scaling. + Parameters: + run_id (int): The run number for the read file. + file_index (int): The file index for the run number of the read file. + overwrite (bool): Overwrite the 0th plot directory of the same naming. + + Returns: + (str): Save name to write as. + + This is missing the file extension. It's the job of the save/write command + to append the extension. """ - time_unit = 'Ticks' - if seconds: - start_times = start_times * TICK_TO_SEC_SCALE - time_unit = 's' + # Try to find a new name. + name_iter = 0 + save_name = f"ta_{run_id}-{file_index:04}_figures_{name_iter:04}" - first_time = start_times[0] - total_time = start_times[-1] - start_times[0] - total_tas = start_times.shape[0] - avg_rate = total_tas / total_time + # Outputs will always create a PDF, so use that as the comparison. + while not overwrite and os.path.exists(save_name + ".pdf"): + name_iter += 1 + save_name = f"ta_{run_id}-{file_index:04}_figures_{name_iter:04}" + print(f"Saving outputs to ./{save_name}.*") - plt.figure(figsize=(6,4)) - plt.plot(start_times - first_time, 'k', label=f"Average Rate: {avg_rate:.3} TA/{time_unit}") + return save_name - plt.title(f"TA Start Times: Shifted by {first_time:.4e} {time_unit}") - plt.xlabel("TA Order") - plt.ylabel(f"Start Time ({time_unit})") - plt.legend() +def plot_all_event_displays(tp_data: list[np.ndarray], run_id: int, file_index: int, seconds: bool = False) -> None: + """ + Plot all event displays. - plt.tight_layout() - plt.savefig("ta_start_times.svg") - plt.close() + Parameters: + tp_data (list[np.ndarray]): List of TPs for each TA. + run_id (int): Run number. + file_index (int): File index of this run. + seconds (bool): If True, plot using seconds as units. + + Saves event displays to a single PDF. + """ + time_unit = 's' if seconds else 'Ticks' + + with PdfPages(f"event_displays_{run_id}.{file_index:04}.pdf") as pdf: + for tadx, ta in enumerate(tp_data): + if seconds: + ta = ta * TICK_TO_SEC_SCALE + plt.figure(figsize=(6, 4)) + + plt.scatter(ta['time_peak'], ta['channel'], c='k', s=2) + + # Auto limits were too wide; this narrows it. + max_time = np.max(ta['time_peak']) + min_time = np.min(ta['time_peak']) + time_diff = max_time - min_time + + # Only change the xlim if there is more than one TP. + if time_diff != 0: + plt.xlim((min_time - 0.1*time_diff, max_time + 0.1*time_diff)) + + plt.title(f'Run {run_id}.{file_index:04} Event Display: {tadx:03}') + plt.xlabel(f"Peak Time ({time_unit})") + plt.ylabel("Channel") + + plt.tight_layout() + pdf.savefig() + plt.close() + + return None def plot_pdf_time_delta_histograms( ta_data: np.ndarray, tp_data: list[np.ndarray], pdf: PdfPages, - time_label: str) -> None: + time_label: str, + logarithm: bool) -> None: """ Plot the different time delta histograms to a PdfPages. @@ -60,6 +102,7 @@ def plot_pdf_time_delta_histograms( tp_data (list[np.ndarray]): List of TPs per TA. tp_data[i] holds TP data for the i-th TA. 
        pdf (PdfPages): PdfPages object to append plot to.
        time_label (str): Time label to plot with (ticks vs seconds).
+       logarithm (bool): Use logarithmic scaling if true.
 
     Returns:
        Nothing. Mutates :pdf: with the new plot.
 @@ -100,180 +143,128 @@ def plot_pdf_time_delta_histograms(
             alpha=0.6
     )
 
+    if logarithm:
+        plt.yscale('log')
+
     plt.title("Time Difference Histograms")
     plt.xlabel(time_label)
     plt.legend(framealpha=0.4)
 
     plt.tight_layout()
     pdf.savefig()
     plt.close()
 
     return None
 
 
-def write_summary_stats(data, filename, title):
+def write_summary_stats(data: np.ndarray, filename: str, title: str) -> None:
     """
-    Writes the given summary statistics to 'filename'.
+    Writes the given summary statistics to :filename:.
+
+    Parameters:
+        data (np.ndarray): Array of a TA data member.
+        filename (str): File to append outputs to.
+        title (str): Title of the TA data member.
+
+    Appends statistics to the given file.
     """
     # Algorithm, Det ID, etc. are not expected to vary.
     # Check first that they don't vary, and move on if so.
     if np.all(data == data[0]):
         print(f"{title} data member is the same for all TAs. Skipping summary statistics.")
-        return
+        return None
 
     summary = stats.describe(data)
     std = np.sqrt(summary.variance)
     with open(filename, 'a') as out:
         out.write(f"{title}\n")
-        out.write(f"Reference Statistics:\n" \
-                  f"\tTotal # TPs = {summary.nobs},\n" \
-                  f"\tMean = {summary.mean:.2f},\n" \
-                  f"\tStd = {std:.2f},\n" \
-                  f"\tMin = {summary.minmax[0]},\n" \
                  f"\tMax = {summary.minmax[1]}.\n")
+        out.write(f"Reference Statistics:\n"
+                  f"\tTotal # TAs = {summary.nobs},\n"
+                  f"\tMean = {summary.mean:.2f},\n"
+                  f"\tStd = {std:.2f},\n"
+                  f"\tMin = {summary.minmax[0]},\n"
                  f"\tMax = {summary.minmax[1]}.\n")
-        std3_count = np.sum(data > summary.mean + 3*std) \
-                   + np.sum(data < summary.mean - 3*std)
-        std2_count = np.sum(data > summary.mean + 2*std) \
-                   + np.sum(data < summary.mean - 2*std)
-        out.write(f"Anomalies:\n" \
-                  f"\t# of >3 Sigma TPs = {std3_count},\n" \
-                  f"\t# of >2 Sigma TPs = {std2_count}.\n")
+        std3_count = np.sum(data > summary.mean + 3*std) + np.sum(data < summary.mean - 3*std)
+        std2_count = np.sum(data > summary.mean + 2*std) + np.sum(data < summary.mean - 2*std)
+        out.write(f"Anomalies:\n"
                  f"\t# of >3 Sigma TAs = {std3_count},\n"
                  f"\t# of >2 Sigma TAs = {std2_count}.\n")
         out.write("\n\n")
 
 
-def plot_pdf_histogram(data, plot_details_dict, pdf, linear=True, log=True):
-    """
-    Plot a histogram for the given data to a PdfPage object.
-    """
-    plt.figure(figsize=(6,4))
-    ax = plt.gca()
-    bins = 100
-
-    # Custom xticks are for specific typing. Expect to see much
-    # smaller plots, so only do linear and use less bins.
- if 'xticks' in plot_details_dict: - linear = True - log = False - plt.xticks(**plot_details_dict['xticks']) - if 'bins' in plot_details_dict: - bins = plot_details_dict['bins'] - - if linear and log: - ax.hist(data, bins=bins, color='#63ACBE', label='Linear', alpha=0.6) - ax.set_yscale('linear') - - ax2 = ax.twinx() - ax2.hist(data, bins=bins, color='#EE442F', label='Log', alpha=0.6) - ax2.set_yscale('log') - - # Setting the plot order - ax.set_zorder(2) - ax.patch.set_visible(False) - ax2.set_zorder(1) - - handles, labels = ax.get_legend_handles_labels() - handles2, labels2 = ax2.get_legend_handles_labels() - handles = handles + handles2 - labels = labels + labels2 - plt.legend(handles=handles, labels=labels) - else: - plt.hist(data, bins=bins, color='k') - if log: # Default to linear, so only change on log - plt.yscale('log') - - plt.title(plot_details_dict['title'] + " Histogram") - ax.set_xlabel(plot_details_dict['xlabel']) - if 'xlim' in plot_details_dict: - plt.xlim(plot_details_dict['xlim']) - - plt.tight_layout() - pdf.savefig() - plt.close() - -def all_event_displays(tp_data, run_id, file_index, seconds=False): - """ - Plot all event_displays as pages in a PDF. - - Optionally, use ticks or seconds for scaling. - """ - time_unit = 's' if seconds else 'Ticks' - - with PdfPages(f"event_displays_{run_id}.{file_index:04}.pdf") as pdf: - for tadx, ta in enumerate(tp_data): - if seconds: - ta = ta * TICK_TO_SEC_SCALE - plt.figure(figsize=(6,4)) - - plt.scatter(ta['time_peak'], ta['channel'], c='k', s=2) - - # Auto limits were too wide; this narrows it. - max_time = np.max(ta['time_peak']) - min_time = np.min(ta['time_peak']) - time_diff = max_time - min_time - plt.xlim((min_time - 0.1*time_diff, max_time + 0.1*time_diff)) - - plt.title(f'Run {run_id}.{file_index:04} Event Display: {tadx:03}') - plt.xlabel(f"Peak Time ({time_unit})") - plt.ylabel("Channel") - - plt.tight_layout() - pdf.savefig() - plt.close() - - -def event_display(peak_times, channels, idx, seconds=False): - """ - Plot an individual event display. - - Optionally, use ticks or seconds for scaling. - """ - time_unit = 'Ticks' - if seconds: - peak_times = peak_times * TICK_TO_SEC_SCALE - time_unit = 's' - - plt.figure(figsize=(6,4)) - - plt.scatter(peak_times, channels, c='k', s=2) - max_time = np.max(peak_times) - min_time = np.min(peak_times) - time_diff = max_time - min_time - - plt.xlim((min_time - 0.1*time_diff, max_time + 0.1*time_diff)) - - plt.title(f"Event Display: {idx:03}") - plt.xlabel(f"Peak Time ({time_unit})") - plt.ylabel("Channel") + return None - plt.tight_layout() - plt.savefig(f"./event_display_{idx:03}.svg") - plt.close() def parse(): """ Parses CLI input arguments. """ - parser = argparse.ArgumentParser(description="Display diagnostic information for TAs for a given tpstream file.") - parser.add_argument("filename", help="Absolute path to tpstream file to display.") - parser.add_argument("--quiet", action="store_true", help="Stops the output of printed information. Default: False.") - parser.add_argument("--no-displays", action="store_true", help="Stops the processing of event displays.") - parser.add_argument("--start-frag", type=int, help="Starting fragment index to process from. Takes negative indexing. Default: -10.", default=-10) - parser.add_argument("--end-frag", type=int, help="Fragment index to stop processing (i.e. not inclusive). Takes negative indexing. Default: 0.", default=0) - parser.add_argument("--no-anomaly", action="store_true", help="Pass to not write 'ta_anomaly_summary.txt'. 
Default: False.") - parser.add_argument("--seconds", action="store_true", help="Pass to use seconds instead of time ticks. Default: False.") - parser.add_argument("--linear", action="store_true", help="Pass to use linear histogram scaling. Default: plots both linear and log.") - parser.add_argument("--log", action="store_true", help="Pass to use logarithmic histogram scaling. Default: plots both linear and log.") - parser.add_argument("--overwrite", action="store_true", help="Overwrite old outputs. Default: False.") + parser = argparse.ArgumentParser( + description="Display diagnostic information for TAs for a given tpstream file." + ) + parser.add_argument( + "filename", + help="Absolute path to tpstream file to display." + ) + parser.add_argument( + "--verbose", '-v', + action="count", + help="Increment the verbose level (errors, warnings, all)." + "Save names and skipped writes are always printed. Default: 0.", + default=0 + ) + parser.add_argument( + "--no-displays", + action="store_true", + help="Stops the processing of event displays." + ) + parser.add_argument( + "--start-frag", + type=int, + help="Starting fragment index to process from. Takes negative indexing. Default: -10.", + default=-10 + ) + parser.add_argument( + "--end-frag", + type=int, + help="Fragment index to stop processing (i.e. not inclusive). Takes negative indexing. Default: N.", + default=0 + ) + parser.add_argument( + "--no-anomaly", + action="store_true", + help="Pass to not write 'ta_anomaly_summary.txt'. Default: False." + ) + parser.add_argument( + "--seconds", + action="store_true", + help="Pass to use seconds instead of time ticks. Default: False." + ) + parser.add_argument( + "--linear", + action="store_true", + help="Pass to use linear histogram scaling. Default: plots both linear and log." + ) + parser.add_argument( + "--log", + action="store_true", + help="Pass to use logarithmic histogram scaling. Default: plots both linear and log." + ) + parser.add_argument( + "--overwrite", + action="store_true", + help="Overwrite old outputs. Default: False." + ) return parser.parse_args() + def main(): """ Drives the processing and plotting. """ - ## Process Arguments & Data + # Process Arguments & Data args = parse() filename = args.filename - quiet = args.quiet + verbosity = args.verbose no_displays = args.no_displays start_frag = args.start_frag end_frag = args.end_frag @@ -289,53 +280,63 @@ def main(): linear = True log = True - data = trgtools.TAData(filename, quiet) + data = trgtools.TAReader(filename, verbosity) # Load all case. if start_frag == 0 and end_frag == -1: - data.load_all_frags() # Has extra debug/warning info - else: # Only load some. - if end_frag != 0: # Python doesn't like [n:0] - frag_paths = data.get_ta_frag_paths()[start_frag:end_frag] + data.read_all_fragments() # Has extra debug/warning info + else: # Only load some. + if end_frag != 0: # Python doesn't like [n:0] + frag_paths = data.get_fragment_paths()[start_frag:end_frag] elif end_frag == 0: - frag_paths = data.get_ta_frag_paths()[start_frag:] + frag_paths = data.get_fragment_paths()[start_frag:] - # Does not count empty frags. 
for path in frag_paths: - data.load_frag(path) - - # Try to find an empty plotting directory - plot_iter = 0 - plot_dir = f"{data.run_id}-{data.file_index}_figures_{plot_iter:04}" - while not overwrite and os.path.isdir(plot_dir): - plot_iter += 1 - plot_dir = f"{data.run_id}-{data.file_index}_figures_{plot_iter:04}" - print(f"Saving outputs to ./{plot_dir}/") - # If overwriting and it does exist, don't need to make it. - # So take the inverse to mkdir. - if not (overwrite and os.path.isdir(plot_dir)): - os.mkdir(plot_dir) - os.chdir(plot_dir) - - print(f"Number of TAs: {data.ta_data.shape[0]}") # Enforcing output for useful metric - - ## Plotting - # General Data Member Plots + data.read_fragment(path) + + # Find a new save name or overwrite an old one. + save_name = find_save_name(data.run_id, data.file_index, overwrite) + + print(f"Number of TAs: {data.ta_data.shape[0]}") # Enforcing output for useful metric + + # Plotting + + if not no_anomaly: + anomaly_filename = f"{save_name}.txt" + print(f"Writing descriptive statistics to {anomaly_filename}.") + if os.path.isfile(anomaly_filename): + # Prepare a new ta_anomaly_summary.txt + os.remove(anomaly_filename) + time_label = "Time (s)" if seconds else "Time (Ticks)" # Dictionary containing unique title, xlabel, and xticks (only some) - plot_dict = { + plot_hist_dict = { 'adc_integral': { - 'title': "ADC Integral", - 'xlabel': "ADC Integral" + 'title': "ADC Integral Histogram", + 'xlabel': "ADC Integral", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'adc_peak': { - 'title': "ADC Peak", - 'xlabel': "ADC Count" + 'title': "ADC Peak Histogram", + 'xlabel': "ADC Count", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'algorithm': { - 'title': "Algorithm", + 'title': "Algorithm Histogram", 'xlabel': 'Algorithm Type', + 'ylabel': "Count", + 'linear': True, # TODO: Hard set for now. + 'linear_style': dict(color='k'), + 'log': False, 'bins': 8, 'xlim': (-0.5, 7.5), 'xticks': { @@ -359,44 +360,93 @@ def main(): # inconsistent between detectors (APA/CRP). # Requires loading channel maps. 
'channel_end': { - 'title': "Channel End", - 'xlabel': "Channel Number" + 'title': "Channel End Histogram", + 'xlabel': "Channel Number", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'channel_peak': { - 'title': "Channel Peak", - 'xlabel': "Channel Number" + 'title': "Channel Peak Histogram", + 'xlabel': "Channel Number", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'channel_start': { - 'title': "Channel Start", - 'xlabel': "Channel Number" + 'title': "Channel Start Histogram", + 'xlabel': "Channel Number", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'detid': { - 'title': "Detector ID", - 'xlabel': "Detector IDs" + 'title': "Detector ID Histogram", + 'xlabel': "Detector IDs", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'num_tps': { - 'title': "Number of TPs per TA", - 'xlabel': "Number of TPs" + 'title': "Number of TPs per TA Histogram", + 'xlabel': "Number of TPs", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'time_activity': { - 'title': "Relative Time Activity", - 'xlabel': time_label + 'title': "Relative Time Activity Histogram", + 'xlabel': time_label, + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'time_end': { - 'title': "Relative Time End", - 'xlabel': time_label + 'title': "Relative Time End Histogram", + 'xlabel': time_label, + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'time_peak': { - 'title': "Relative Time Peak", - 'xlabel': time_label + 'title': "Relative Time Peak Histogram", + 'xlabel': time_label, + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'time_start': { - 'title': "Relative Time Start", - 'xlabel': time_label + 'title': "Relative Time Start Histogram", + 'xlabel': time_label, + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'type': { - 'title': "Type", + 'title': "Type Histogram", 'xlabel': "Type", + 'ylabel': "Count", + 'linear': True, # TODO: Hard set for now. 
+ 'linear_style': dict(color='k'), + 'log': False, 'bins': 3, 'xlim': (-0.5, 2.5), 'xticks': { @@ -407,36 +457,83 @@ def main(): } }, 'version': { - 'title': "Version", - 'xlabel': "Versions" + 'title': "Version Histogram", + 'xlabel': "Versions", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') } } - if not no_anomaly: - anomaly_filename = "ta_anomalies.txt" - if not quiet: - print(f"Writing descriptive statistics to {anomaly_filename}.") - if os.path.isfile(anomaly_filename): - # Prepare a new ta_anomaly_summary.txt - os.remove(anomaly_filename) - with PdfPages("ta_data_member_histograms.pdf") as pdf: - for ta_key in data.ta_data.dtype.names: + pdf_plotter = PDFPlotter(f"{save_name}.pdf") + # Generic Plots + for ta_key in data.ta_data.dtype.names: + if 'time' in ta_key: # Special case. + time = data.ta_data[ta_key] + if seconds: + time = time * TICK_TO_SEC_SCALE + min_time = np.min(time) # Prefer making the relative time change. + pdf_plotter.plot_histogram(time - min_time, plot_hist_dict[ta_key]) if not no_anomaly: - write_summary_stats(data.ta_data[ta_key], anomaly_filename, plot_dict[ta_key]['title']) - if 'time' in ta_key: - time = data.ta_data[ta_key] - if seconds: - time = time * TICK_TO_SEC_SCALE - min_time = np.min(time) # Prefer making the relative time change. - plot_pdf_histogram(time - min_time, plot_dict[ta_key], pdf, linear, log) - continue - plot_pdf_histogram(data.ta_data[ta_key], plot_dict[ta_key], pdf, linear, log) - - # Analysis Plots - plot_pdf_time_delta_histograms(data.ta_data, data.tp_data, pdf, time_label) - - if (not no_displays): - all_event_displays(data.tp_data, data.run_id, data.file_index, seconds) + write_summary_stats(time - min_time, anomaly_filename, ta_key) + continue + + pdf_plotter.plot_histogram(data.ta_data[ta_key], plot_hist_dict[ta_key]) + if not no_anomaly: + write_summary_stats(data.ta_data[ta_key], anomaly_filename, ta_key) + + # Analysis Plots + pdf = pdf_plotter.get_pdf() # Needed for extra plots that are not general. 
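+    # The analysis plots below are TA-specific, so they bypass the generic PDFPlotter interface and append their pages straight to the underlying PdfPages handle.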
+    # ==== Time Delta Comparisons ===== + if linear: + plot_pdf_time_delta_histograms(data.ta_data, data.tp_data, pdf, time_label, False) + if log: + plot_pdf_time_delta_histograms(data.ta_data, data.tp_data, pdf, time_label, True) + # ================================= + + # ==== Time Spans Per TA ==== + time_peak = data.ta_data['time_peak'] + time_end = data.ta_data['time_end'] + time_start = data.ta_data['time_start'] + ta_min_time = np.min((time_peak, time_end, time_start)) + + time_peak -= ta_min_time + time_end -= ta_min_time + time_start -= ta_min_time + + if seconds: + ta_min_time = ta_min_time * TICK_TO_SEC_SCALE + time_peak = time_peak * TICK_TO_SEC_SCALE + time_end = time_end * TICK_TO_SEC_SCALE + time_start = time_start * TICK_TO_SEC_SCALE + + yerr = np.array([time_peak - time_start, time_end - time_peak]).astype(np.float64)  # Keep as float: an int cast truncates second-scale spans to zero. + time_unit = "Seconds" if seconds else "Ticks" + time_spans_dict = { + 'title': "TA Relative Time Spans", + 'xlabel': "TA", + 'ylabel': time_label, + 'errorbar_style': { + 'yerr': yerr, + 'capsize': 4, + 'color': 'k', + 'ecolor': 'r', + 'label': f"Avg {time_unit} / TA: {(time_peak[-1] - time_peak[0]) / len(time_peak):.2f}", + 'marker': '.', + 'markersize': 0.01 + } + } + ta_count = np.arange(len(time_peak)) + pdf_plotter.plot_errorbar(ta_count, time_peak, time_spans_dict) + # =========================== + + if not no_displays: + plot_all_event_displays(data.tp_data, data.run_id, data.file_index, seconds) + + return None + if __name__ == "__main__": main() diff --git a/scripts/tc_dump.py b/scripts/tc_dump.py index c34d530..3fe9a23 100755 --- a/scripts/tc_dump.py +++ b/scripts/tc_dump.py @@ -3,8 +3,8 @@ Display diagnostic information for TCs for a given tpstream file. """ - import trgtools +from trgtools.plot import PDFPlotter import numpy as np import matplotlib.pyplot as plt @@ -18,76 +18,32 @@ TICK_TO_SEC_SCALE = 16e-9 # s per tick -def plot_pdf_histogram( - data: np.ndarray, - plot_details_dict: dict, - pdf: PdfPages, - linear: bool = True, - log: bool = True) -> None: +def find_save_name(run_id: int, file_index: int, overwrite: bool) -> str: """ - Plot a histogram for the given data to a PdfPages object. + Find a new save name or overwrite an existing one. Parameters: - data (np.ndarray): Array to plot a histogram of. - plot_details_dict (dict): Dictionary with keys such as 'title', 'xlabel', etc. - pdf (PdfPages): The PdfPages object that this plot will be appended to. - linear (bool): Option to use linear y-scale. - log (bool): Option to use logarithmic y-scale. + run_id (int): The run number for the read file. + file_index (int): The file index for the run number of the read file. + overwrite (bool): Overwrite the 0th plot directory of the same naming. Returns: - Nothing. Mutates :pdf: with the new plot. - """ - plt.figure(figsize=(6, 4)) - ax = plt.gca() - bins = 100 + (str): Save name to write as. - # Custom xticks are for specific typing. Expect to see much - # smaller plots, so only do linear and use less bins.
- if 'xticks' in plot_details_dict: - linear = True - log = False - plt.xticks( - plot_details_dict['xticks']['ticks'], - plot_details_dict['xticks']['labels'], - rotation=plot_details_dict['xticks']['rotation'], - ha=plot_details_dict['xticks']['ha'] - ) - if 'bins' in plot_details_dict: - bins = plot_details_dict['bins'] - - if linear and log: - ax.hist(data, bins=bins, color='#63ACBE', label='Linear', alpha=0.6) - ax.set_yscale('linear') - - ax2 = ax.twinx() - ax2.hist(data, bins=bins, color='#EE442F', label='Log', alpha=0.6) - ax2.set_yscale('log') - - # Setting the plot order - ax.set_zorder(2) - ax.patch.set_visible(False) - ax2.set_zorder(1) - - handles, labels = ax.get_legend_handles_labels() - handles2, labels2 = ax2.get_legend_handles_labels() - handles = handles + handles2 - labels = labels + labels2 - plt.legend(handles=handles, labels=labels) - else: - plt.hist(data, bins=bins, color='k') - if log: # Default to linear, so only change on log - plt.yscale('log') - - plt.title(plot_details_dict['title'] + " Histogram") - ax.set_xlabel(plot_details_dict['xlabel']) - if 'xlim' in plot_details_dict: - plt.xlim(plot_details_dict['xlim']) + This is missing the file extension. It's the job of the save/write command + to append the extension. + """ + # Try to find a new name. + name_iter = 0 + save_name = f"tc_{run_id}-{file_index:04}_figures_{name_iter:04}" - plt.tight_layout() - pdf.savefig() - plt.close() + # Outputs will always create a PDF, so use that as the comparison. + while not overwrite and os.path.exists(save_name + ".pdf"): + name_iter += 1 + save_name = f"tc_{run_id}-{file_index:04}_figures_{name_iter:04}" + print(f"Saving outputs to ./{save_name}.*") - return None + return save_name def plot_pdf_scatter( @@ -126,49 +82,12 @@ def plot_pdf_scatter( return None -def plot_pdf_errorbar( - x_data: np.ndarray, - y_data: np.ndarray, - plot_details_dict: dict, - pdf: PdfPages) -> None: - """ - Plot a scatter plot for the given x and y data to a PdfPages object. - Error bars are handled by :plot_details_dict: since they are a style - choice. - - Parameters: - x_data (np.ndarray): Array to use as x values. - y_data (np.ndarray): Array to use as y values. - plot_details_dict (dict): Dictionary with keys such as 'title', 'xlabel', etc. - pdf (PdfPages): The PdfPages object that this plot will be appended to. - - Returns: - Nothing. Mutates :pdf: with the new plot. - """ - plt.figure(figsize=plot_details_dict.get('figsize', (6, 4))) - - errorbar_style = plot_details_dict.get('errorbar_style', {}) - plt.errorbar(x_data, y_data, **errorbar_style) - - # Asserts that the following need to be in the plotting details. - # Who wants unlabeled plots? - plt.title(plot_details_dict['title']) - plt.xlabel(plot_details_dict['xlabel']) - plt.ylabel(plot_details_dict['ylabel']) - - plt.legend() - - plt.tight_layout() - pdf.savefig() - plt.close() - return None - - def plot_pdf_time_delta_histograms( tc_data: np.ndarray, ta_data: list[np.ndarray], pdf: PdfPages, - time_label: str) -> None: + time_label: str, + logarithm: bool) -> None: """ Plot the different time delta histograms to a PdfPages. @@ -177,6 +96,7 @@ def plot_pdf_time_delta_histograms( ta_data (list[np.ndarray]): List of TAs per TC. ta_data[i] holds TA data for the i-th TC. pdf (PdfPages): PdfPages object to append plot to. time_label (str): Time label to plot with (ticks vs seconds). + logarithm (bool): Use logarithmic scaling if true. Returns: Nothing. Mutates :pdf: with the new plot. 
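With the twin-axis overlay gone, `plot_pdf_time_delta_histograms` produces one page per y-scale via the new `logarithm` flag. A minimal sketch of the call pattern (assuming `data` is an already-read `trgtools.TCReader`; `tc_time_deltas.pdf` and the hard-coded `time_label` are hypothetical stand-ins for what `main()` builds):

```python
from matplotlib.backends.backend_pdf import PdfPages

time_label = "Time (Ticks)"  # "Time (s)" when --seconds is used.

# Two pages in the same PDF: linear y-scale first, then logarithmic.
with PdfPages("tc_time_deltas.pdf") as pdf:
    plot_pdf_time_delta_histograms(data.tc_data, data.ta_data, pdf, time_label, False)
    plot_pdf_time_delta_histograms(data.tc_data, data.ta_data, pdf, time_label, True)
```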
@@ -223,13 +143,15 @@ def plot_pdf_time_delta_histograms( alpha=0.6 ) + if logarithm: + plt.yscale('log') + plt.title("Time Difference Histograms") plt.xlabel(time_label) plt.legend(framealpha=0.4) plt.tight_layout() pdf.savefig() - plt.close() return None @@ -239,17 +161,17 @@ def write_summary_stats(data: np.ndarray, filename: str, title: str) -> None: Writes the given summary statistics to 'filename'. Parameters: - data (np.ndarray): Data set to calculate statistics on. - filename (str): File to write to. - title (str): Title of this data set. - Returns: - Nothing. Appends to the given filename. + data (np.ndarray): Array of a TC data member. + filename (str): File to append outputs to. + title (str): Title of the TC data member. + + Appends statistics to the given file. """ # Algorithm, Det ID, etc. are not expected to vary. # Check first that they don't vary, and move on if so. if np.all(data == data[0]): print(f"{title} data member is the same for all TCs. Skipping summary statistics.") - return + return None summary = stats.describe(data) std = np.sqrt(summary.variance) @@ -268,6 +190,8 @@ def write_summary_stats(data: np.ndarray, filename: str, title: str) -> None: f"\t# of >2 Sigma TCs = {std2_count}.\n") out.write("\n\n") + return None + def parse(): """ @@ -281,9 +205,11 @@ def parse(): help="Absolute path to tpstream file to display." ) parser.add_argument( - "--quiet", - action="store_true", - help="Stops the output of printed information. Default: False." + "--verbose", "-v", + action="count", + help="Increment the verbose level (errors, warnings, all). " + "Save names and skipped writes are always printed. Default: 0.", + default=0 ) parser.add_argument( "--start-frag", @@ -294,8 +220,7 @@ parser.add_argument( "--end-frag", type=int, - help="Fragment index to stop processing (i.e. not inclusive)." - + "Takes negative indexing. Default: 0.", + help="Fragment index to stop processing (i.e. not inclusive). Takes negative indexing. Default: 0.", default=0 ) parser.add_argument( @@ -318,6 +243,11 @@ action="store_true", help="Pass to use logarithmic histogram scaling. Default: plots both linear and log." ) + parser.add_argument( + "--overwrite", + action="store_true", + help="Overwrite old outputs. Default: False." + ) return parser.parse_args() @@ -329,11 +259,13 @@ def main(): # Process Arguments & Data args = parse() filename = args.filename - quiet = args.quiet + verbosity = args.verbose start_frag = args.start_frag end_frag = args.end_frag no_anomaly = args.no_anomaly seconds = args.seconds + overwrite = args.overwrite + linear = args.linear log = args.log @@ -342,30 +274,30 @@ linear = True log = True - data = trgtools.TCData(filename, quiet) - run_id = data.run_id - file_index = data.file_index + data = trgtools.TCReader(filename, verbosity) # Load all case. if start_frag == 0 and end_frag == -1: - data.load_all_frags() # Has extra debug/warning info + data.read_all_fragments() # Has extra debug/warning info else: # Only load some. if end_frag != 0: # Python doesn't like [n:0] - frag_paths = data.get_tc_frag_paths()[start_frag:end_frag] + frag_paths = data.get_fragment_paths()[start_frag:end_frag] elif end_frag == 0: - frag_paths = data.get_tc_frag_paths()[start_frag:] + frag_paths = data.get_fragment_paths()[start_frag:] - # Does not count empty frags. for path in frag_paths: - data.load_frag(path) + data.read_fragment(path) + + # Find a new save name or overwrite an old one.
+ save_name = find_save_name(data.run_id, data.file_index, overwrite) print(f"Number of TCs: {data.tc_data.shape[0]}") # Enforcing output for useful metric # Plotting - anomaly_filename = f"tc_anomalies_{run_id}-{file_index:04}.txt" if not no_anomaly: - if not quiet: + anomaly_filename = f"{save_name}.txt" + if verbosity >= 2: print(f"Writing descriptive statistics to {anomaly_filename}.") if os.path.isfile(anomaly_filename): # Prepare a new ta_anomaly_summary.txt @@ -374,11 +306,15 @@ def main(): time_label = "Time (s)" if seconds else "Time (Ticks)" # Dictionary containing unique title, xlabel, and xticks (only some) - plot_dict = { + plot_hist_dict = { 'algorithm': { 'bins': np.arange(-0.5, 9.5, 1), 'title': "Algorithm", 'xlabel': 'Algorithm Type', + 'ylabel': "Count", + 'linear': True, # TODO: Hard set for now. + 'linear_style': dict(color='k'), + 'log': False, 'xlim': (-1, 9), 'xticks': { 'ticks': range(0, 9), @@ -399,35 +335,69 @@ def main(): }, 'detid': { 'title': "Detector ID", - 'xlabel': "Detector IDs" + 'xlabel': "Detector IDs", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'num_tas': { 'title': "Number of TAs per TC", - 'xlabel': "Number of TAs" + 'xlabel': "Number of TAs", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'time_candidate': { 'title': "Relative Time Candidate", - 'xlabel': time_label + 'xlabel': time_label, + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'time_end': { 'title': "Relative Time End", - 'xlabel': time_label + 'xlabel': time_label, + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'time_peak': { 'title': "Relative Time Peak", - 'xlabel': time_label + 'xlabel': time_label, + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'time_start': { 'title': "Relative Time Start", - 'xlabel': time_label + 'xlabel': time_label, + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'type': { 'bins': np.arange(-0.5, 10.5, 1), 'title': "Type", 'xlabel': "Type", + 'ylabel': "Count", + 'linear': True, # TODO: Hard set for now. + 'linear_style': dict(color='k'), + 'log': False, 'xlim': (-1, 10), 'xticks': { - 'ticks': range(0, 10), # Ticks to change + 'ticks': range(0, 10), 'labels': ( 'Unknown', 'Timing', @@ -446,90 +416,102 @@ def main(): }, 'version': { 'title': "Version", - 'xlabel': "Versions" + 'xlabel': "Versions", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') } } - with PdfPages(f"tc_data_member_histograms_{run_id}-{file_index:04}.pdf") as pdf: - - # Generic plots - for tc_key in data.tc_data.dtype.names: - if 'time' in tc_key: # Special case. 
- time = data.tc_data[tc_key] - if seconds: - time = time * TICK_TO_SEC_SCALE - min_time = np.min(time) # Prefer making the relative time change. - plot_pdf_histogram(time - min_time, plot_dict[tc_key], pdf, linear, log) - if not no_anomaly: - write_summary_stats(time - min_time, anomaly_filename, - plot_dict[tc_key]['title']) - continue - - plot_pdf_histogram(data.tc_data[tc_key], plot_dict[tc_key], pdf, linear, log) + + pdf_plotter = PDFPlotter(f"{save_name}.pdf") + + # Generic plots + for tc_key in data.tc_data.dtype.names: + if 'time' in tc_key: # Special case. + time = data.tc_data[tc_key] + if seconds: + time = time * TICK_TO_SEC_SCALE + min_time = np.min(time) # Prefer making the relative time change. + pdf_plotter.plot_histogram(time - min_time, plot_hist_dict[tc_key]) if not no_anomaly: - write_summary_stats(data.tc_data[tc_key], anomaly_filename, - plot_dict[tc_key]['title']) - - # "Analysis" plots - # ==== Time Delta Comparisons ===== - plot_pdf_time_delta_histograms(data.tc_data, data.ta_data, pdf, time_label) - - tc_adc_integrals = np.array([np.sum(tas['adc_integral']) for tas in data.ta_data]) - adc_integrals_dict = { - 'title': "TC ADC Integrals", - 'xlabel': "ADC Integral" - } - plot_pdf_histogram(tc_adc_integrals, adc_integrals_dict, pdf, linear, log) - # ================================= - - # ==== ADC Integral vs Number of TAs ==== - integral_vs_num_tas_dict = { - 'title': "TC ADC Integral vs Number of TAs", - 'xlabel': "Number of TAs", - 'ylabel': "TC ADC Integral", - 'scatter_style': { - 'alpha': 0.6, - 'c': 'k', - 's': 2 - } - } - plot_pdf_scatter(data.tc_data['num_tas'], tc_adc_integrals, integral_vs_num_tas_dict, pdf) - # ======================================= - - # ==== Time Spans Per TC ==== - time_candidate = data.tc_data['time_candidate'] - time_end = data.tc_data['time_end'] - time_start = data.tc_data['time_start'] - tc_min_time = np.min((time_candidate, time_end, time_start)) - - time_candidate -= tc_min_time - time_end -= tc_min_time - time_start -= tc_min_time - - if seconds: - tc_min_time = tc_min_time * TICK_TO_SEC_SCALE - time_candidate = time_candidate * TICK_TO_SEC_SCALE - time_end = time_end * TICK_TO_SEC_SCALE - time_start = time_start * TICK_TO_SEC_SCALE - - yerr = np.array([time_candidate - time_start, time_end - time_candidate]).astype(np.int64) - time_unit = "Seconds" if seconds else "Ticks" - time_spans_dict = { - 'title': "TC Relative Time Spans", - 'xlabel': "TC", - 'ylabel': time_label, - 'errorbar_style': { - 'yerr': yerr, - 'capsize': 4, - 'color': 'k', - 'ecolor': 'r', - 'label': f"Avg {time_unit} / TC: {(time_candidate[-1] - time_candidate[0]) / len(time_candidate):.2f}", - 'marker': '.', - 'markersize': 0.01 - } - } - tc_count = np.arange(len(time_candidate)) - plot_pdf_errorbar(tc_count, time_candidate, time_spans_dict, pdf) - # =========================== + write_summary_stats(time - min_time, anomaly_filename, tc_key) + continue + + pdf_plotter.plot_histogram(data.tc_data[tc_key], plot_hist_dict[tc_key]) + if not no_anomaly: + write_summary_stats(data.tc_data[tc_key], anomaly_filename, tc_key) + + pdf = pdf_plotter.get_pdf() + # Analysis plots + # ==== Time Delta Comparisons ===== + if linear: + plot_pdf_time_delta_histograms(data.tc_data, data.ta_data, pdf, time_label, False) + if log: + plot_pdf_time_delta_histograms(data.tc_data, data.ta_data, pdf, time_label, True) + # ================================= + + # ==== TC ADC Integrals ==== + tc_adc_integrals = np.array([np.sum(tas['adc_integral']) for tas in data.ta_data]) + adc_integrals_dict
= { + 'title': "TC ADC Integrals", + 'xlabel': "ADC Integral", + 'ylabel': "Count" + } + pdf_plotter.plot_histogram(tc_adc_integrals, adc_integrals_dict) + # ========================== + + # ==== ADC Integral vs Number of TAs ==== + integral_vs_num_tas_dict = { + 'title': "TC ADC Integral vs Number of TAs", + 'xlabel': "Number of TAs", + 'ylabel': "TC ADC Integral", + 'scatter_style': { + 'alpha': 0.6, + 'c': 'k', + 's': 2 + } + } + plot_pdf_scatter(data.tc_data['num_tas'], tc_adc_integrals, integral_vs_num_tas_dict, pdf) + # ======================================= + + # ==== Time Spans Per TC ==== + time_candidate = data.tc_data['time_candidate'] + time_end = data.tc_data['time_end'] + time_start = data.tc_data['time_start'] + tc_min_time = np.min((time_candidate, time_end, time_start)) + + time_candidate -= tc_min_time + time_end -= tc_min_time + time_start -= tc_min_time + + if seconds: + tc_min_time = tc_min_time * TICK_TO_SEC_SCALE + time_candidate = time_candidate * TICK_TO_SEC_SCALE + time_end = time_end * TICK_TO_SEC_SCALE + time_start = time_start * TICK_TO_SEC_SCALE + + yerr = np.array([time_candidate - time_start, time_end - time_candidate]).astype(np.float64)  # Keep as float: an int cast truncates second-scale spans to zero. + time_unit = "Seconds" if seconds else "Ticks" + time_spans_dict = { + 'title': "TC Relative Time Spans", + 'xlabel': "TC", + 'ylabel': time_label, + 'errorbar_style': { + 'yerr': yerr, + 'capsize': 4, + 'color': 'k', + 'ecolor': 'r', + 'label': f"Avg {time_unit} / TC: " + f"{(time_candidate[-1] - time_candidate[0]) / len(time_candidate):.2f}", + 'marker': '.', + 'markersize': 0.01 + } + } + tc_count = np.arange(len(time_candidate)) + pdf_plotter.plot_errorbar(tc_count, time_candidate, time_spans_dict) + # =========================== return None diff --git a/scripts/tp_dump.py b/scripts/tp_dump.py index 2a84c0a..fcf32d4 100755 --- a/scripts/tp_dump.py +++ b/scripts/tp_dump.py @@ -3,80 +3,115 @@ Display diagnostic information for TPs in a given tpstream file. """ - -import os -import sys -import argparse +import trgtools +from trgtools.plot import PDFPlotter import numpy as np import matplotlib.pyplot as plt from matplotlib.backends.backend_pdf import PdfPages from scipy import stats -import trgtools +import os +import argparse -TICK_TO_SEC_SCALE = 16e-9 # secs per tick -def channel_tot(tp_data): - """ - Plot the TP channel vs time over threshold scatter plot. - """ - plt.figure(figsize=(6,4), dpi=200) +TICK_TO_SEC_SCALE = 16e-9 # s per tick - plt.scatter(tp_data['channel'], tp_data['time_over_threshold'], c='k', s=2, label='TP') - plt.title("TP Time Over Threshold vs Channel") - plt.xlabel("Channel") - plt.ylabel("Time Over Threshold (Ticks)") - plt.legend() +def find_save_name(run_id: int, file_index: int, overwrite: bool) -> str: + """ + Find a new save name or overwrite an existing one. - plt.tight_layout() - plt.savefig("tp_channel_vs_tot.png") # Many scatter points makes this a PNG - plt.close() + Parameters: + run_id (int): The run number for the read file. + file_index (int): The file index for the run number of the read file. + overwrite (bool): Overwrite the 0th plot directory of the same naming. -def tp_percent_histogram(tp_data): - """ - Plot the count of TPs that account for 1%, 0.1%, and 0.01% - of the total channel count. + Returns: + (str): Save name to write as. + + This is missing the file extension. It's the job of the save/write command + to append the extension. """ - counts, _ = np.histogram(tp_data['channel'], bins=np.arange(0.5, 3072.5, 1)) + # Try to find a new name.
+    name_iter = 0 + save_name = f"tp_{run_id}-{file_index:04}_figures_{name_iter:04}" - count_mask = np.ones(counts.shape, dtype=bool) - total_counts = np.sum(counts) + # Outputs will always create a PDF, so use that as the comparison. + while not overwrite and os.path.exists(save_name + ".pdf"): + name_iter += 1 + save_name = f"tp_{run_id}-{file_index:04}_figures_{name_iter:04}" + print(f"Saving outputs to ./{save_name}.*") - percent01 = np.sum(counts > 0.01*total_counts) - count_mask[np.where(counts > 0.01*total_counts)] = False + return save_name - percent001 = np.sum(counts[count_mask] > 0.001*total_counts) - count_mask[np.where(counts[count_mask] > 0.001*total_counts)] = False - percent0001 = np.sum(counts[count_mask] > 0.0001*total_counts) - count_mask[np.where(counts[count_mask] > 0.0001*total_counts)] = False +def plot_pdf_tot_vs_channel(tp_data: np.ndarray, pdf: PdfPages) -> None: + """ + Plot the TP channel vs time over threshold scatter plot. - remaining = np.sum(counts[count_mask]) + Parameters: + tp_data (np.ndarray): Array of TPs. + pdf (PdfPages): The PdfPages object that this plot will be appended to. - plt.figure(figsize=(6,4)) - plt.stairs([percent01, percent001, percent0001, remaining], [0,1,2,3,4], color='k', fill=True) + Returns: + Nothing. Mutates :pdf: with the new plot. - plt.xlim((0,4)) - plt.xticks([0.5, 1.5, 2.5, 3.5], ['>1%', '>0.1%', '>0.01%', "Remainder"]) + Does not have a seconds option. This plot is more informative in the ticks version. + """ + plt.figure(figsize=(6, 4), dpi=200) - plt.title(f"Number of Channels Contributing % To Total Count {total_counts}") + plt.scatter(tp_data['channel'], tp_data['time_over_threshold'], c='k', s=2, label='TP', rasterized=True) + + plt.title("TP Time Over Threshold vs Channel") + plt.xlabel("Channel") + plt.ylabel("Time Over Threshold (Ticks)") + plt.legend() plt.tight_layout() - plt.savefig("percent_total.svg") + pdf.savefig()  # Rasterized scatter keeps this page small in the PDF. plt.close() + return None + -def plot_adc_integral_vs_peak(tp_data): +def plot_pdf_adc_integral_vs_peak(tp_data: np.ndarray, pdf: PdfPages, verbosity: int = 0) -> None: """ Plot the ADC Integral vs ADC Peak. + + Parameters: + tp_data (np.ndarray): Array of TPs. + pdf (PdfPages): The PdfPages object that this plot will be appended to. + verbosity (int): Verbose level to print information. + + Returns: + Nothing. Mutates :pdf: with the new plot.
""" - plt.figure(figsize=(6, 4), dpi=200) - plt.scatter(tp_data['adc_peak'], tp_data['adc_integral'], c='k', s=2, label='TP') - print("Number of ADC Integrals at Signed 16 Limit:", np.sum(tp_data['adc_integral'] == np.power(2, 15)-1)) - print("Total number of TPs:", len(tp_data['adc_peak'])) + if verbosity >= 2: + print( + "Number of ADC Integrals at Signed 16 Limit:", + np.sum(tp_data['adc_integral'] == np.power(2, 15)-1) + ) + print("Total number of TPs:", len(tp_data['adc_peak'])) high_integral_locs = np.where(tp_data['adc_integral'] == np.power(2, 15)-1) - plt.scatter(tp_data['adc_peak'][high_integral_locs], tp_data['adc_integral'][high_integral_locs], c='#63ACBE', s=2, marker='+', label=r'$2^{15}-1$') + + plt.figure(figsize=(6, 4), dpi=200) + + plt.scatter( + tp_data['adc_peak'], + tp_data['adc_integral'], + c='k', + s=2, + label='TP', + rasterized=True + ) + plt.scatter( + tp_data['adc_peak'][high_integral_locs], + tp_data['adc_integral'][high_integral_locs], + c='#63ACBE', + s=2, marker='+', + label=r'$2^{15}-1$', + rasterized=True + ) plt.title("ADC Integral vs ADC Peak") plt.xlabel("ADC Peak") @@ -84,273 +119,256 @@ def plot_adc_integral_vs_peak(tp_data): plt.legend() plt.tight_layout() - plt.savefig("tp_adc_integral_vs_peak.png") # Many scatter plot points makes this a PNG + pdf.savefig() plt.close() + return None -def plot_pdf_histogram(data, plot_details_dict, pdf, linear=True, log=True): - """ - Plot a histogram for the given data to a PdfPage object. - """ - plt.figure(figsize=(6,4)) - ax = plt.gca() - bins = 100 - # Custom xticks are for specific typing. Expect to see much - # smaller plots, so only do linear and use less bins. - if 'xticks' in plot_details_dict: - linear = True - log = False - plt.xticks(**plot_details_dict['xticks']) - if 'bins' in plot_details_dict: - bins = plot_details_dict['bins'] - - if linear and log: - ax.hist(data, bins=bins, color='#63ACBE', label='Linear', alpha=0.6) - ax.set_yscale('linear') - - ax2 = ax.twinx() - ax2.hist(data, bins=bins, color='#EE442F', label='Log', alpha=0.6) - ax2.set_yscale('log') - - # Setting the plot order - ax.set_zorder(2) - ax.patch.set_visible(False) - ax2.set_zorder(1) - - handles, labels = ax.get_legend_handles_labels() - handles2, labels2 = ax2.get_legend_handles_labels() - handles = handles + handles2 - labels = labels + labels2 - plt.legend(handles=handles, labels=labels) - else: - plt.hist(data, bins=bins, color='k') - if log: # Default to linear, so only change on log - plt.yscale('log') - - plt.title(plot_details_dict['title'] + " Histogram") - ax.set_xlabel(plot_details_dict['xlabel']) - if 'xlim' in plot_details_dict: - plt.xlim(plot_details_dict['xlim']) +def write_summary_stats(data: np.ndarray, filename: str, title: str) -> None: + """ + Writes the given summary statistics to :filename:. - plt.tight_layout() - pdf.savefig() - plt.close() + Parameters: + data (np.ndarray): Array of a TP data member. + filename (str): File to append outputs to. + title (str): Title of the TP data member. -def write_summary_stats(data, filename, title): - """ - Writes the given summary statistics to 'filename'. + Appends statistics to the given file. """ # Algorithm, Det ID, etc. are not expected to vary. # Check first that they don't vary, and move on if so. if np.all(data == data[0]): print(f"{title} data member is the same for all TPs. 
Skipping summary statistics.") - return + return None + summary = stats.describe(data) std = np.sqrt(summary.variance) with open(filename, 'a') as out: out.write(f"{title}\n") - out.write(f"Reference Statistics:\n" \ - f"\tTotal # TPs = {summary.nobs},\n" \ - f"\tMean = {summary.mean:.2f},\n" \ - f"\tStd = {std:.2f},\n" \ - f"\tMin = {summary.minmax[0]},\n" \ + out.write(f"Reference Statistics:\n" + f"\tTotal # TPs = {summary.nobs},\n" + f"\tMean = {summary.mean:.2f},\n" + f"\tStd = {std:.2f},\n" + f"\tMin = {summary.minmax[0]},\n" f"\tMax = {summary.minmax[1]}.\n") - std3_count = np.sum(data > summary.mean + 3*std) \ - + np.sum(data < summary.mean - 3*std) - std2_count = np.sum(data > summary.mean + 2*std) \ - + np.sum(data < summary.mean - 2*std) - out.write(f"Anomalies:\n" \ - f"\t# of >3 Sigma TPs = {std3_count},\n" \ + std3_count = np.sum(data > summary.mean + 3*std) + np.sum(data < summary.mean - 3*std) + std2_count = np.sum(data > summary.mean + 2*std) + np.sum(data < summary.mean - 2*std) + out.write(f"Anomalies:\n" + f"\t# of >3 Sigma TPs = {std3_count},\n" f"\t# of >2 Sigma TPs = {std2_count}.\n") out.write("\n\n") -def plot_summary_stats(tp_data, no_anomaly=False, quiet=False): - """ - Plot summary statistics on the various TP member data. - Displays as box plots on multiple pages of a PDF. - """ - # 'Sanity' titles _should_ all be the same value. - titles = { - 'adc_integral': "ADC Integral Summary", - 'adc_peak': "ADC Peak Summary", - 'algorithm': "Algorithm (Sanity) Summary", - 'channel': "Channel Summary", - 'detid': "Detector ID (Sanity) Summary", - 'flag': "Flag (Sanity) Summary", - 'time_over_threshold': "Time Over Threshold Summary", - 'time_peak': "Time Peak Summary", - 'time_start': "Time Start Summary", - 'type': "Type (Sanity) Summary", - 'version': "Version (Sanity) Summary" - } - labels = { - 'adc_integral': "ADC Integral", - 'adc_peak': "ADC Count", - 'algorithm': "", - 'channel': "Channel Number", - 'detid': "", - 'flag': "", - 'time_over_threshold': "Ticks", - 'time_peak': "Ticks", - 'time_start': "Ticks", - 'type': "", - 'version': "" - } - - anomaly_filename = 'tp_anomaly_summary.txt' - - if not no_anomaly: - if not quiet: - print(f"Writing descriptive statistics to {anomaly_filename}.") - if os.path.isfile(anomaly_filename): - # Prepare a new tp_anomaly_summary.txt - os.remove(anomaly_filename) - - with PdfPages("tp_summary_stats.pdf") as pdf: - for tp_key, title in titles.items(): - plt.figure(figsize=(6,4)) - ax = plt.gca() + return None - # Only plot the 'tp_key' - plt.boxplot(tp_data[tp_key], notch=True, vert=False, sym='+') - plt.yticks([]) - ax.xaxis.grid(True) - plt.xlabel(labels[tp_key]) - - plt.title(title) - - plt.tight_layout() - pdf.savefig() - plt.close() - - # Write anomalies to file. - if not no_anomaly: - if "Sanity" in title and np.all(tp_data[tp_key] == tp_data[tp_key][0]): - # Either passed check or all wrong in the same way. - continue - write_summary_stats(tp_data[tp_key], anomaly_filename, title) def parse(): - parser = argparse.ArgumentParser(description="Display diagnostic information for TAs for a given tpstream file.") - parser.add_argument("filename", help="Absolute path to tpstream file to display.") - parser.add_argument("--quiet", action="store_true", help="Stops the output of printed information. Default: False.") - parser.add_argument("--start-frag", type=int, help="Fragment to start loading from (inclusive); can take negative integers. 
Default: -10", default=-10) - parser.add_argument("--end-frag", type=int, help="Fragment to stop loading at (exclusive); can take negative integers. Default: 0", default=0) - parser.add_argument("--no-anomaly", action="store_true", help="Pass to not write 'tp_anomaly_summary.txt'. Default: False.") - parser.add_argument("--seconds", action="store_true", help="Pass to use seconds instead of time ticks. Default: False.") - parser.add_argument("--linear", action="store_true", help="Pass to use linear histogram scaling. Default: plots both linear and log.") - parser.add_argument("--log", action="store_true", help="Pass to use logarithmic histogram scaling. Default: plots both linear and log.") - parser.add_argument("--overwrite", action="store_true", help="Overwrite old outputs. Default: False.") + parser = argparse.ArgumentParser( + description="Display diagnostic information for TAs for a given tpstream file." + ) + parser.add_argument( + "filename", + help="Absolute path to tpstream file to display." + ) + parser.add_argument( + "--verbose", "-v", + action="count", + help="Increment the verbose level (errors, warnings, all)." + "Save names and skipped writes are always printed. Default: 0.", + default=0 + ) + parser.add_argument( + "--start-frag", + type=int, + help="Fragment to start loading from (inclusive); can take negative integers. Default: -10", + default=-10 + ) + parser.add_argument( + "--end-frag", + type=int, + help="Fragment to stop loading at (exclusive); can take negative integers. Default: N", + default=0 + ) + parser.add_argument( + "--no-anomaly", + action="store_true", + help="Pass to not write 'tp_anomaly_summary.txt'. Default: False." + ) + parser.add_argument( + "--seconds", + action="store_true", + help="Pass to use seconds instead of time ticks. Default: False." + ) + parser.add_argument( + "--linear", + action="store_true", + help="Pass to use linear histogram scaling. Default: plots both linear and log." + ) + parser.add_argument( + "--log", + action="store_true", + help="Pass to use logarithmic histogram scaling. Default: plots both linear and log." + ) + parser.add_argument( + "--overwrite", + action="store_true", + help="Overwrite old outputs. Default: False." + ) return parser.parse_args() + def main(): """ Drives the processing and plotting. """ - ## Process Arguments & Data + # Process Arguments & Data args = parse() filename = args.filename - quiet = args.quiet + verbosity = args.verbose start_frag = args.start_frag end_frag = args.end_frag no_anomaly = args.no_anomaly + seconds = args.seconds overwrite = args.overwrite + linear = args.linear log = args.log - seconds = args.seconds # User didn't pass either flag, so default to both being true. if (not linear) and (not log): linear = True log = True - data = trgtools.TPData(filename, quiet) - if end_frag == 0: # Ex: [-10:0] is bad. 
- frag_paths = data.get_tp_frag_paths()[start_frag:] + data = trgtools.TPReader(filename, verbosity) + + # Load all case + if start_frag == 0 and end_frag == -1: + data.read_all_fragments() # Has extra debug/warning info else: - frag_paths = data.get_tp_frag_paths()[start_frag:end_frag] - - for path in frag_paths: - if (not quiet): - print("Fragment Path:", path) - data.load_frag(path) - - # Try to find an empty plotting directory - plot_iter = 0 - plot_dir = f"{data.run_id}-{data.file_index}_figures_{plot_iter:04}" - while not overwrite and os.path.isdir(plot_dir): - plot_iter += 1 - plot_dir = f"{data.run_id}-{data.file_index}_figures_{plot_iter:04}" - print(f"Saving outputs to ./{plot_dir}/") - # If overwriting and it does exist, don't need to make it. - # So take the inverse to mkdir. - if not (overwrite and os.path.isdir(plot_dir)): - os.mkdir(plot_dir) - os.chdir(plot_dir) - - if (not quiet): - print("Size of tp_data:", data.tp_data.shape) - - ## Plots with more involved analysis - channel_tot(data.tp_data) - tp_percent_histogram(data.tp_data) - plot_adc_integral_vs_peak(data.tp_data) - - ## Basic Plots: Histograms & Box Plots - # Dictionary containing unique title, xlabel, and xticks (only some) + if end_frag == 0: # Ex: [-10:0] is bad. + frag_paths = data.get_fragment_paths()[start_frag:] + else: + frag_paths = data.get_fragment_paths()[start_frag:end_frag] + + for path in frag_paths: + data.read_fragment(path) + + # Find a new save name or overwrite an old one. + save_name = find_save_name(data.run_id, data.file_index, overwrite) + + print(f"Number of TPs: {data.tp_data.shape[0]}") # Enforcing output for a useful metric. + + # Plotting + + if not no_anomaly: + anomaly_filename = f"{save_name}.txt" + if verbosity >= 2: + print(f"Writing descriptive statistics to {anomaly_filename}.") + if os.path.isfile(anomaly_filename): + # Prepare a new tp_anomaly_summary.txt + os.remove(anomaly_filename) + time_label = "Time (s)" if seconds else "Time (Ticks)" - plot_dict = { + # Dictionary containing unique title, xlabel, and xticks (only some) + plot_hist_dict = { 'adc_integral': { - 'title': "ADC Integral", - 'xlabel': "ADC Integral" + 'title': "ADC Integral Histogram", + 'xlabel': "ADC Integral", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'adc_peak': { - 'title': "ADC Peak", - 'xlabel': "ADC Count" + 'title': "ADC Peak Histogram", + 'xlabel': "ADC Count", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'algorithm': { - 'title': "Algorithm", + 'title': "Algorithm Histogram", 'xlabel': 'Algorithm Type', + 'ylabel': "Count", + 'linear': True, # TODO: Hard set for now + 'linear_style': dict(color='k'), + 'log': False, 'xlim': (-1, 2), 'xticks': { 'ticks': (0, 1), 'labels': ("Unknown", "TPCDefault") }, - 'bins': (-0.5, 0.5, 1.5) # TODO: Dangerous. Hides values outside of this range. + 'bins': (-0.5, 0.5, 1.5) # TODO: Dangerous. Hides values outside of this range. }, # TODO: Channel should bin on the available # channels; however, this is inconsistent # between detectors (APA/CRP). # Requires loading channel maps. 
'channel': { - 'title': "Channel", - 'xlabel': "Channel Number" + 'title': "Channel Histogram", + 'xlabel': "Channel Number", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'detid': { - 'title': "Detector ID", - 'xlabel': "Detector IDs" + 'title': "Detector ID Histogram", + 'xlabel': "Detector IDs", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'flag': { - 'title': "Flag", - 'xlabel': "Flags" + 'title': "Flag Histogram", + 'xlabel': "Flags", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'time_over_threshold': { - 'title': "Time Over Threshold", - 'xlabel': time_label + 'title': "Time Over Threshold Histogram", + 'xlabel': time_label, + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'time_peak': { - 'title': "Relative Time Peak", - 'xlabel': time_label + 'title': "Relative Time Peak Histogram", + 'xlabel': time_label, + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'time_start': { - 'title': "Relative Time Start", - 'xlabel': time_label + 'title': "Relative Time Start Histogram", + 'xlabel': time_label, + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') }, 'type': { - 'title': "Type", + 'title': "Type Histogram", 'xlabel': "Type", + 'ylabel': "Count", + 'linear': True, # TODO: Hard set for now + 'linear_style': dict(color='k'), + 'log': False, 'xlim': (-1, 3), 'xticks': { 'ticks': (0, 1, 2), @@ -359,30 +377,46 @@ def main(): 'bins': (-0.5, 0.5, 1.5, 2.5) # TODO: Dangerous. Hides values outside of this range. }, 'version': { - 'title': "Version", - 'xlabel': "Versions" + 'title': "Version Histogram", + 'xlabel': "Versions", + 'ylabel': "Count", + 'linear': linear, + 'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'), + 'log': log, + 'log_style': dict(color='#EE442F', alpha=0.6, label='Log') } } - if not no_anomaly: - anomaly_filename = "tp_anomalies.txt" - if not quiet: - print(f"Writing descriptive statistics to {anomaly_filename}.") - if os.path.isfile(anomaly_filename): - # Prepare a new tp_anomaly_summary.txt - os.remove(anomaly_filename) - with PdfPages("tp_data_member_histograms.pdf") as pdf: - for tp_key in data.tp_data.dtype.names: + pdf_plotter = PDFPlotter(f"{save_name}.pdf") + + # Generic plots + for tp_key in data.tp_data.dtype.names: + if tp_key == 'time_peak' or tp_key == 'time_start': # Special case; time_over_threshold is a duration and is not min-shifted. + time = data.tp_data[tp_key] + if seconds: + time = time * TICK_TO_SEC_SCALE + min_time = np.min(time) # Prefer making the relative time change.
+ pdf_plotter.plot_histogram(time - min_time, plot_hist_dict[tp_key]) if not no_anomaly: - write_summary_stats(data.tp_data[tp_key], anomaly_filename, plot_dict[tp_key]['title']) - if tp_key == 'time_start' or tp_key == 'time_peak': - time = data.tp_data[tp_key] - if seconds: - time = time * TICK_TO_SEC_SCALE - min_time = np.min(time) - plot_pdf_histogram(time - min_time, plot_dict[tp_key], pdf, linear, log) - continue - plot_pdf_histogram(data.tp_data[tp_key], plot_dict[tp_key], pdf, linear, log) + write_summary_stats(time - min_time, anomaly_filename, tp_key) + continue + + pdf_plotter.plot_histogram(data.tp_data[tp_key], plot_hist_dict[tp_key]) + if not no_anomaly: + write_summary_stats(data.tp_data[tp_key], anomaly_filename, tp_key) + + pdf = pdf_plotter.get_pdf() + # Analysis plots + # ==== Time Over Threshold vs Channel ==== + plot_pdf_tot_vs_channel(data.tp_data, pdf) + # ======================================== + + # ==== ADC Integral vs ADC Peak ==== + plot_pdf_adc_integral_vs_peak(data.tp_data, pdf, verbosity) + # =================================== + + return None + if __name__ == "__main__": main()
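Across all three scripts, the member-data histograms now run through the same `plot_hist_dict` schema consumed by `PDFPlotter.plot_histogram`: `title`, `xlabel`, and `ylabel` are required; `linear`/`log` toggle the y-scales; `linear_style`/`log_style` carry the matplotlib styling; and `bins`, `xlim`, and `xticks` are optional overrides. A minimal sketch of one entry (hypothetical file and save names; `TPReader` could equally be `TAReader` or `TCReader`):

```python
import trgtools
from trgtools.plot import PDFPlotter

reader = trgtools.TPReader("run001234_0000.hdf5")  # Hypothetical input file.
reader.read_all_fragments()

# One required label set, plus the linear/log styling used by the *_dump.py scripts.
plot_style = {
    'title': "ADC Integral Histogram",
    'xlabel': "ADC Integral",
    'ylabel': "Count",
    'linear': True,
    'linear_style': dict(color='#63ACBE', alpha=0.6, label='Linear'),
    'log': True,
    'log_style': dict(color='#EE442F', alpha=0.6, label='Log')
}

pdf_plotter = PDFPlotter("sketch.pdf")
pdf_plotter.plot_histogram(reader.tp_data['adc_integral'], plot_style)
```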