Commit

Merge pull request #23 from DUNE-DAQ/aeoranday/python-styles
Python Style Update
aeoranday authored Feb 28, 2024
2 parents 1600c8d + 64ebafc commit eee32c3
Showing 20 changed files with 1,680 additions and 1,206 deletions.
3 changes: 3 additions & 0 deletions docs/README.md
@@ -2,7 +2,10 @@

The `trgtools` repository contains a collection of tools and scripts to emulate, test, and analyze the performance of triggers and trigger algorithms.

Use `pip install -r requirements.txt` to install all the Python packages necessary to run the `*_dump.py` scripts and the `trgtools.plot` submodule.

- `process_tpstream`: Example of a simple pipeline to process TPStream files (slice by slice) and apply a trigger activity algorithm.
- `ta_dump.py`: Script that loads HDF5 files containing trigger activities and plots various diagnostic information. [Documentation](ta-dump.md).
- `tc_dump.py`: Script that loads HDF5 files containing trigger candidates and plots various diagnostic information. [Documentation](tc-dump.md).
- `tp_dump.py`: Script that loads HDF5 files containing trigger primitives and plots various diagnostic information. [Documentation](tp-dump.md).
- Python `trgtools` module: Reading and plotting module that specializes in reading TP, TA, and TC fragments from a given HDF5 file. The submodule `trgtools.plot` has a common `PDFPlotter` that is used in the `*_dump.py` scripts. [Documentation](py-trgtools.md).
83 changes: 83 additions & 0 deletions docs/py-trgtools.md
@@ -0,0 +1,83 @@
# Python trgtools Module

Reading a DUNE-DAQ HDF5 file for the TP, TA, and TC contents can be easily done using the `trgtools` Python module.

# Example

## Common Methods
```python
import trgtools

tp_data = trgtools.TPReader(hdf5_file_name)

# Get all the available paths for TPs in this file.
frag_paths = tp_data.get_fragment_paths()

# Read all fragment paths. Appends results to tp_data.tp_data.
tp_data.read_all_fragments()

# Read only one fragment. Return result and append to tp_data.tp_data.
frag0_tps = tp_data.read_fragment(frag_paths[0])

# Reset tp_data.tp_data. Keeps the current fragment paths.
tp_data.clear_data()

# Reset the fragment paths to the initialized state.
tp_data.reset_fragment_paths()
```

## Data Accessing
```python
tp_data = trgtools.TPReader(hdf5_file_name)
ta_data = trgtools.TAReader(hdf5_file_name)
tc_data = trgtools.TCReader(hdf5_file_name)

tp_data.read_all_fragments()
ta_data.read_all_fragments()
tc_data.read_all_fragments()

# Primary contents of the fragments
# np.ndarray with each index as one T*
tp_data.tp_data
ta_data.ta_data
tc_data.tc_data

# Secondary contents of the fragments
# List with each index as the TPs/TAs in the TA/TC
ta_data.tp_data
tc_data.ta_data

ta0_contents = ta_data.tp_data[0]
tc0_contents = tc_data.ta_data[0]
```
Data accessing follows a similar procedure across the different readers. The TAReader and TCReader also contain secondary information about the TPs and TAs that formed the TAs and TCs, respectively. For the `np.ndarray` objects, one can also specify the data member to access. For example,
```python
ta_data.ta_data['time_start'] # Returns a np.ndarray of the time_starts for all read TAs
```
The available data members for each reader are given by the dtypes `tp_data.tp_dt`, `ta_data.ta_dt` and `ta_data.tp_dt`, and `tc_data.tc_dt` and `tc_data.ta_dt`.
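The member-data access above relies on NumPy structured arrays. A standalone sketch of the same pattern (the dtype fields below are illustrative, not the exact `trgtools` dtypes):
```python
import numpy as np

# Illustrative structured dtype with two assumed TA-like fields.
ta_dt = np.dtype([("time_start", np.uint64), ("adc_integral", np.uint64)])
ta_data = np.array([(100, 5000), (250, 7200)], dtype=ta_dt)

print(ta_data["time_start"])   # Field access returns a np.ndarray: [100 250]
print(ta_data.dtype.names)     # ('time_start', 'adc_integral')
```
Indexing by field name returns a view over all entries, which is what makes calls like `ta_data.ta_data['time_start']` cheap for histogramming.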

Look at the contents of `*_dump.py` for more detailed examples of data member usage.

While using interactive Python, one can do `help(tp_data)` and `help(tp_data.read_fragment)` for documentation on their usage (and similarly for the other readers and plotters).

# Plotting
There is also a submodule `trgtools.plot` that features a class `PDFPlotter`. This class contains the common plotting code that was previously repeated across the `*_dump.py` scripts. Loading this class requires `matplotlib` to be installed, but simply doing `import trgtools` does not have this requirement.

## Example
```python
import trgtools
from trgtools.plot import PDFPlotter

tp_data = trgtools.TPReader(file_to_read)
tp_data.read_all_fragments()  # Populate tp_data.tp_data before plotting.

pdf_save_name = 'example.pdf'
pdf_plotter = PDFPlotter(pdf_save_name)

plot_style_dict = dict(title="ADC Peak Histogram", xlabel="ADC Counts", ylabel="Count")
pdf_plotter.plot_histogram(tp_data.tp_data['adc_peak'], plot_style_dict)
```

By design, the `plot_style_dict` requires the keys `title`, `xlabel`, and `ylabel` at a minimum. More options are available to further change the style of the plot, and examples of this are available in the `*_dump.py` scripts.
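A hypothetical sketch of the kind of minimum-key check described above (the helper name and behavior are assumptions for illustration, not the actual `trgtools.plot` API):
```python
REQUIRED_KEYS = ("title", "xlabel", "ylabel")


def validate_plot_style(style: dict) -> dict:
    """Raise if any of the minimum required style keys is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in style]
    if missing:
        raise KeyError(f"plot_style_dict is missing required keys: {missing}")
    return style


plot_style_dict = dict(title="ADC Peak Histogram", xlabel="ADC Counts", ylabel="Count")
validate_plot_style(plot_style_dict)  # Passes: all required keys present.
```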

### Development
The set of common plots available in `PDFPlotter` is rather limited right now. At the moment, these plots are sufficient, but more common plotting functions can be added.
11 changes: 6 additions & 5 deletions docs/ta-dump.md
@@ -2,23 +2,24 @@

`ta_dump.py` is a plotting script that shows TA diagnostic information, such as: algorithms produced, number of TPs per TA, event displays, ADC integral histogram, and a plot of the time starts.

A new directory is created that identifies the HDF5 file the script was run on and increments based on preceding plots for the same file. One can overwrite the first set of plots generated by using `--overwrite`. Plots are saved in two multi-page PDFs: histograms for member data and event displays.
A new save name is found that identifies the HDF5 file the script was run on and increments based on preceding plots for the same file. One can overwrite the first set of plots generated by using `--overwrite`. Plots are saved in two multi-page PDFs: 1) member data histograms and light analysis plots and 2) event displays.

There are two plotting options `--linear` and `--log` that set the y-scale for the plots. By default, plots use both scales with linear on the left y-axis and log on the right y-axis. There is an additional plotting option `--seconds` to produce time plots using seconds instead of ticks.

While running, this prints warnings for empty fragments that are skipped in the given HDF5 file. These outputs can be suppressed with `--quiet`.
While running, this can print information about the file reading using `-v` (warnings) and `-vv` (all). Errors and useful output information (save names and locations) are always printed.

One can specify which fragments to _attempt_ to load from with the `--start-frag` option. This is `-10` by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is `0` by default (for the previously mentioned reason).
One can specify which fragments to _attempt_ to load from with the `--start-frag` option. This is `-10` by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is `N` by default (for the previously mentioned reason).

Event displays are processed by default. If many TAs were loaded, this may take a while to plot. The `--no-display` option skips event display plotting.

A text file named `ta_anomalies.txt` is generated that gives reference statistics for each TA data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
A text file is generated that gives reference statistics for each TA data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
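The 2 sigma and 3 sigma counting described above can be sketched in a standalone way (illustrative only; the real script's statistics and output format may differ):
```python
import statistics


def anomaly_counts(values: list[float]) -> tuple[int, int]:
    """Count values at least 2 sigma and 3 sigma from the mean."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    two_sigma = sum(abs(v - mean) >= 2 * sigma for v in values)
    three_sigma = sum(abs(v - mean) >= 3 * sigma for v in values)
    return two_sigma, three_sigma


# Nine typical values and one outlier: mean = 2, sigma = 3,
# so the outlier at 11 is exactly 3 sigma from the mean.
print(anomaly_counts([1] * 9 + [11]))  # (1, 1)
```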

## Example
```bash
python ta_dump.py file.hdf5
python ta_dump.py file.hdf5 --help
python ta_dump.py file.hdf5 --quiet
python ta_dump.py file.hdf5 -v
python ta_dump.py file.hdf5 -vv
python ta_dump.py file.hdf5 --start-frag 50 --end-frag 100 # Attempts 50 fragments
python ta_dump.py file.hdf5 --no-display
python ta_dump.py file.hdf5 --no-anomaly
14 changes: 9 additions & 5 deletions docs/tc-dump.md
@@ -1,22 +1,26 @@
# Trigger Candidate Dump Info
`tc_dump.py` is a plotting script that shows TC diagnostic information. This includes histograms of all the available data members and ADC integral (sum of all contained TA ADC integrals) and a few light analysis plots:time difference histogram (various start and end time definitions), ADC integral vs number of TAs scatter plot, time spans per TC plot (including a calculation for the number of ticks per TC). These plots are written to a single PDF with multiple pages.
`tc_dump.py` is a plotting script that shows TC diagnostic information. This includes histograms of all the available data members and ADC integral (sum of all contained TA ADC integrals) and a few light analysis plots: time difference histogram (various start and end time definitions), ADC integral vs number of TAs scatter plot, time spans per TC plot (including a calculation for the number of ticks per TC). These plots are written to a single PDF with multiple pages.

By default, a new PDF is generated (with naming based on the existing PDFs). One can pass `--overwrite` to overwrite the 0th PDF for a given HDF5 file.

There are two plotting options `--linear` and `--log` that set the y-scale for the plots. By default, plots use both scales with linear on the left y-axis and log on the right y-axis. There is an additional plotting option `--seconds` to produce time plots using seconds instead of ticks.

While running, this prints warnings and general information about loading and plotting. These outputs can be suppressed with `--quiet`.
While running, this can print information about the file reading using `-v` (warnings) and `-vv` (all). Errors and useful output information (save names and locations) are always printed.

One can specify which fragments to _attempt_ to load from with the `--start-frag` option. This is `-10` by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is `0` by default (for the previously mentioned reason).
One can specify which fragments to _attempt_ to load from with the `--start-frag` option. This is `-10` by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is `N` by default (for the previously mentioned reason).

A text file named `tc_anomalies_<run_number>-<file_index>.txt` is generated that gives reference statistics for each TA data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
A text file is generated that gives reference statistics for each TC data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.

## Example
```bash
python tc_dump.py file.hdf5
python tc_dump.py file.hdf5 --help
python tc_dump.py file.hdf5 --quiet
python tc_dump.py file.hdf5 -v
python tc_dump.py file.hdf5 -vv
python tc_dump.py file.hdf5 --start-frag 50 --end-frag 100 # Attempts 50 fragments
python tc_dump.py file.hdf5 --no-anomaly
python tc_dump.py file.hdf5 --log
python tc_dump.py file.hdf5 --linear
python tc_dump.py file.hdf5 --seconds
python tc_dump.py file.hdf5 --overwrite
```
11 changes: 6 additions & 5 deletions docs/tp-dump.md
@@ -2,21 +2,22 @@

`tp_dump.py` is a plotting script that shows TP diagnostic information, such as: TP channel histogram and channel vs time over threshold. Plots are saved as SVGs, PDFs, and PNGs.

A new directory is created that identifies the HDF5 file the script was run on and increments based on preceding plots for the same file. One can overwrite the first set of plots generated by using `--overwrite`.
A new save name is found that identifies the HDF5 file the script was run on and increments based on preceding plots for the same file. One can overwrite the first set of plots generated by using `--overwrite`.

There are two plotting options `--linear` and `--log` that set the y-scale for the plots. By default, plots use both scales with linear on the left y-axis and log on the right y-axis. There is an additional plotting option `--seconds` to produce time plots using seconds instead of ticks.

While running, this script prints various loading information. These outputs can be suppressed with `--quiet`.
While running, this can print information about the file reading using `-v` (warnings) and `-vv` (all). Errors and useful output information (save names and locations) are always printed.

One can specify which fragments to load from with the `--start-frag` option. This is -10 by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is 0 by default (for the previously mentioned reason).
One can specify which fragments to load from with the `--start-frag` option. This is -10 by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is N by default (for the previously mentioned reason).

A text file named `tp_anomaly_summary.txt` is generated that gives reference statistics for each TP data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
A text file is generated that gives reference statistics for each TP data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.

## Example
```bash
python tp_dump.py file.hdf5 # Loads last 10 fragments by default.
python tp_dump.py file.hdf5 --help
python tp_dump.py file.hdf5 --quiet
python tp_dump.py file.hdf5 -v
python tp_dump.py file.hdf5 -vv
python tp_dump.py file.hdf5 --start-frag 50 --end-frag 100 # Loads 50 fragments.
python tp_dump.py file.hdf5 --no-anomaly
python tp_dump.py file.hdf5 --log
3 changes: 3 additions & 0 deletions python/setup.cfg
@@ -0,0 +1,3 @@
[flake8]
# Conforms to DUNE-DAQ style guide
max-line-length = 120
97 changes: 97 additions & 0 deletions python/trgtools/HDF5Reader.py
@@ -0,0 +1,97 @@
"""
Generic HDF5Reader class to read and store data.
"""
from hdf5libs import HDF5RawDataFile

import abc


class HDF5Reader(abc.ABC):
"""
Abstract reader class for HDF5 files.
Derived classes must complete all methods
decorated with @abc.abstractmethod.
"""

# Useful print colors
_FAIL_TEXT_COLOR = '\033[91m'
_WARNING_TEXT_COLOR = '\033[93m'
_BOLD_TEXT = '\033[1m'
_END_TEXT_COLOR = '\033[0m'

# Counts the number of empty fragments.
_num_empty = 0

def __init__(self, filename: str, verbosity: int = 0) -> None:
"""
Loads a given HDF5 file.
Parameters:
filename (str): HDF5 file to open.
verbosity (int): Verbose level. 0: Only errors. 1: Warnings. 2: All.
Returns nothing.
"""
# Generic loading
self._h5_file = HDF5RawDataFile(filename)
self._fragment_paths = self._h5_file.get_all_fragment_dataset_paths()
self.run_id = self._h5_file.get_int_attribute('run_number')
self.file_index = self._h5_file.get_int_attribute('file_index')

self._verbosity = verbosity

self._filter_fragment_paths() # Derived class must define this.

return None

@abc.abstractmethod
def _filter_fragment_paths(self) -> None:
"""
Filter the fragment paths of interest.
This should be according to the derived reader's
data type of interest, e.g., filter for TriggerActivity.
"""
...

def get_fragment_paths(self) -> list[str]:
""" Return the list of fragment paths. """
return list(self._fragment_paths)

def set_fragment_paths(self, fragment_paths: list[str]) -> None:
""" Set the list of fragment paths. """
self._fragment_paths = fragment_paths
return None

@abc.abstractmethod
def read_fragment(self, fragment_path: str) -> None:
""" Read one fragment from :fragment_path:. """
...

def read_all_fragments(self) -> None:
""" Read all fragments. """
for fragment_path in self._fragment_paths:
_ = self.read_fragment(fragment_path)

# self.read_fragment should increment self._num_empty.
# Print how many were empty as a debug.
if self._verbosity >= 1 and self._num_empty != 0:
print(
self._FAIL_TEXT_COLOR
+ self._BOLD_TEXT
+ f"WARNING: Skipped {self._num_empty} frags."
+ self._END_TEXT_COLOR
)

return None

@abc.abstractmethod
def clear_data(self) -> None:
""" Clear the contents of the member data. """
...

def reset_fragment_paths(self) -> None:
""" Reset the fragment paths to the initialized state. """
self._fragment_paths = self._h5_file.get_all_fragment_dataset_paths()
self._filter_fragment_paths()
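A derived reader only needs to supply the abstract hooks. A minimal standalone sketch of the subclassing pattern (using a toy base class instead of the real `HDF5Reader`, since `hdf5libs` is not available here; all names are illustrative):
```python
import abc


class ToyReader(abc.ABC):
    """Toy stand-in for HDF5Reader, illustrating the abstract-hook pattern."""

    def __init__(self, fragment_paths: list[str]) -> None:
        self._fragment_paths = list(fragment_paths)
        self.tp_data = []
        self._filter_fragment_paths()  # Derived class must define this.

    @abc.abstractmethod
    def _filter_fragment_paths(self) -> None:
        ...

    @abc.abstractmethod
    def read_fragment(self, fragment_path: str) -> None:
        ...

    def read_all_fragments(self) -> None:
        for fragment_path in self._fragment_paths:
            self.read_fragment(fragment_path)


class ToyTPReader(ToyReader):
    def _filter_fragment_paths(self) -> None:
        # Keep only paths for the data type of interest.
        self._fragment_paths = [
            p for p in self._fragment_paths if "TriggerPrimitive" in p
        ]

    def read_fragment(self, fragment_path: str) -> None:
        # A real reader would decode the fragment; here we just record the path.
        self.tp_data.append(fragment_path)


reader = ToyTPReader(["/a/TriggerPrimitive_0", "/a/TriggerActivity_0"])
reader.read_all_fragments()
print(reader.tp_data)  # ['/a/TriggerPrimitive_0']
```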