Commit

Merge pull request #23 from DUNE-DAQ/aeoranday/python-styles
Python Style Update
aeoranday authored Feb 28, 2024
2 parents 1600c8d + 64ebafc commit eee32c3
Showing 20 changed files with 1,680 additions and 1,206 deletions.
3 changes: 3 additions & 0 deletions docs/README.md
@@ -2,7 +2,10 @@

The `trgtools` repository contains a collection of tools and scripts to emulate, test, and analyze the performance of triggers and trigger algorithms.

Use `pip install -r requirements.txt` to install all the Python packages necessary to run the `*_dump.py` scripts and the `trgtools.plot` submodule.

- `process_tpstream`: Example of a simple pipeline to process TPStream files (slice by slice) and apply a trigger activity algorithm.
- `ta_dump.py`: Script that loads HDF5 files containing trigger activities and plots various diagnostic information. [Documentation](ta-dump.md).
- `tc_dump.py`: Script that loads HDF5 files containing trigger candidates and plots various diagnostic information. [Documentation](tc-dump.md).
- `tp_dump.py`: Script that loads HDF5 files containing trigger primitives and plots various diagnostic information. [Documentation](tp-dump.md).
- Python `trgtools` module: Reading and plotting module that specializes in reading TP, TA, and TC fragments from a given HDF5 file. The submodule `trgtools.plot` has a common `PDFPlotter` that is used in the `*_dump.py` scripts. [Documentation](py-trgtools.md).
83 changes: 83 additions & 0 deletions docs/py-trgtools.md
@@ -0,0 +1,83 @@
# Python trgtools Module

Reading a DUNE-DAQ HDF5 file for the TP, TA, and TC contents can be easily done using the `trgtools` Python module.

# Example

## Common Methods
```python
import trgtools

tp_data = trgtools.TPReader(hdf5_file_name)

# Get all the available paths for TPs in this file.
frag_paths = tp_data.get_fragment_paths()

# Read all fragment paths. Appends results to tp_data.tp_data.
tp_data.read_all_fragments()

# Read only one fragment. Return result and append to tp_data.tp_data.
frag0_tps = tp_data.read_fragment(frag_paths[0])

# Reset tp_data.tp_data. Keeps the current fragment paths.
tp_data.clear_data()

# Reset the fragment paths to the initialized state.
tp_data.reset_fragment_paths()
```

## Data Accessing
```python
tp_data = trgtools.TPReader(hdf5_file_name)
ta_data = trgtools.TAReader(hdf5_file_name)
tc_data = trgtools.TCReader(hdf5_file_name)

tp_data.read_all_fragments()
ta_data.read_all_fragments()
tc_data.read_all_fragments()

# Primary contents of the fragments
# np.ndarray with each index as one T*
tp_data.tp_data
ta_data.ta_data
tc_data.tc_data

# Secondary contents of the fragments
# List with each index as the TPs/TAs in the TA/TC
ta_data.tp_data
tc_data.ta_data

ta0_contents = ta_data.tp_data[0]
tc0_contents = tc_data.ta_data[0]
```
Data accessing follows a similar procedure across the different readers. The TAReader and TCReader also contain secondary information about the TPs and TAs that formed the TAs and TCs, respectively. For the `np.ndarray` objects, one can also specify the data member to access. For example,
```python
ta_data.ta_data['time_start'] # Returns a np.ndarray of the time_starts for all read TAs
```
The available data members for each reader are given by the dtypes `tp_data.tp_dt`, `ta_data.ta_dt` and `ta_data.tp_dt`, and `tc_data.tc_dt` and `tc_data.ta_dt`.
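The member-data access above relies on NumPy structured arrays. A standalone sketch of the same pattern (the dtype fields below are illustrative, not the exact `trgtools` dtypes):
```python
import numpy as np

# Illustrative structured dtype with two assumed TA-like fields.
ta_dt = np.dtype([("time_start", np.uint64), ("adc_integral", np.uint64)])
ta_data = np.array([(100, 5000), (250, 7200)], dtype=ta_dt)

print(ta_data["time_start"])   # Field access returns a np.ndarray: [100 250]
print(ta_data.dtype.names)     # ('time_start', 'adc_integral')
```
Indexing by field name returns a view over all entries, which is what makes calls like `ta_data.ta_data['time_start']` cheap for histogramming.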

Look at the contents of `*_dump.py` for more detailed examples of data member usage.

While using interactive Python, one can do `help(tp_data)` and `help(tp_data.read_fragment)` for documentation on their usage (and similarly for the other readers and plotters).

# Plotting
There is also a submodule `trgtools.plot` that features a class `PDFPlotter`. This class contains the common plotting code that was previously repeated across the `*_dump.py` scripts. Loading this class requires `matplotlib` to be installed, but simply doing `import trgtools` does not have this requirement.

## Example
```python
import trgtools
from trgtools.plot import PDFPlotter

tp_data = trgtools.TPReader(file_to_read)
tp_data.read_all_fragments()  # Populate tp_data.tp_data before plotting.

pdf_save_name = 'example.pdf'
pdf_plotter = PDFPlotter(pdf_save_name)

plot_style_dict = dict(title="ADC Peak Histogram", xlabel="ADC Counts", ylabel="Count")
pdf_plotter.plot_histogram(tp_data.tp_data['adc_peak'], plot_style_dict)
```

By design, the `plot_style_dict` requires the keys `title`, `xlabel`, and `ylabel` at a minimum. More options are available to further change the style of the plot, and examples of this are available in the `*_dump.py` scripts.
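A hypothetical sketch of the kind of minimum-key check described above (the helper name and behavior are assumptions for illustration, not the actual `trgtools.plot` API):
```python
REQUIRED_KEYS = ("title", "xlabel", "ylabel")


def validate_plot_style(style: dict) -> dict:
    """Raise if any of the minimum required style keys is missing."""
    missing = [key for key in REQUIRED_KEYS if key not in style]
    if missing:
        raise KeyError(f"plot_style_dict is missing required keys: {missing}")
    return style


plot_style_dict = dict(title="ADC Peak Histogram", xlabel="ADC Counts", ylabel="Count")
validate_plot_style(plot_style_dict)  # Passes: all required keys present.
```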

### Development
The set of common plots available in `PDFPlotter` is rather limited right now. At the moment, these plots are sufficient, but more common plotting functions can be added.
11 changes: 6 additions & 5 deletions docs/ta-dump.md
@@ -2,23 +2,24 @@

`ta_dump.py` is a plotting script that shows TA diagnostic information, such as: algorithms produced, number of TPs per TA, event displays, ADC integral histogram, and a plot of the time starts.

A new directory is created that identifies the HDF5 file the script was run on and increments based on preceding plots for the same file. One can overwrite the first set of plots generated by using `--overwrite`. Plots are saved in two multi-page PDFs: histograms for member data and event displays.
A new save name is found that identifies the HDF5 file the script was run on and increments based on preceding plots for the same file. One can overwrite the first set of plots generated by using `--overwrite`. Plots are saved in two multi-page PDFs: 1) member data histograms and light analysis plots and 2) event displays.

There are two plotting options `--linear` and `--log` that set the y-scale for the plots. By default, plots use both scales with linear on the left y-axis and log on the right y-axis. There is an additional plotting option `--seconds` to produce time plots using seconds instead of ticks.

While running, this prints warnings for empty fragments that are skipped in the given HDF5 file. These outputs can be suppressed with `--quiet`.
While running, this can print information about the file reading using `-v` (warnings) and `-vv` (all). Errors and useful output information (save names and locations) are always printed.

One can specify which fragments to _attempt_ to load from with the `--start-frag` option. This is `-10` by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is `0` by default (for the previously mentioned reason).
One can specify which fragments to _attempt_ to load from with the `--start-frag` option. This is `-10` by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is `N` by default (for the previously mentioned reason).

Event displays are processed by default. If many TAs were loaded, this may take a while to plot. The `--no-display` option skips event display plotting.

A text file named `ta_anomalies.txt` is generated that gives reference statistics for each TA data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
A text file is generated that gives reference statistics for each TA data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
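The 2 sigma and 3 sigma counting described above can be sketched in a standalone way (illustrative only; the real script's statistics and output format may differ):
```python
import statistics


def anomaly_counts(values: list[float]) -> tuple[int, int]:
    """Count values at least 2 sigma and 3 sigma from the mean."""
    mean = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    two_sigma = sum(abs(v - mean) >= 2 * sigma for v in values)
    three_sigma = sum(abs(v - mean) >= 3 * sigma for v in values)
    return two_sigma, three_sigma


# Nine typical values and one outlier: mean = 2, sigma = 3,
# so the outlier at 11 is exactly 3 sigma from the mean.
print(anomaly_counts([1] * 9 + [11]))  # (1, 1)
```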

## Example
```bash
python ta_dump.py file.hdf5
python ta_dump.py file.hdf5 --help
python ta_dump.py file.hdf5 --quiet
python ta_dump.py file.hdf5 -v
python ta_dump.py file.hdf5 -vv
python ta_dump.py file.hdf5 --start-frag 50 --end-frag 100 # Attempts 50 fragments
python ta_dump.py file.hdf5 --no-display
python ta_dump.py file.hdf5 --no-anomaly
14 changes: 9 additions & 5 deletions docs/tc-dump.md
@@ -1,22 +1,26 @@
# Trigger Candidate Dump Info
`tc_dump.py` is a plotting script that shows TC diagnostic information. This includes histograms of all the available data members and ADC integral (sum of all contained TA ADC integrals) and a few light analysis plots:time difference histogram (various start and end time definitions), ADC integral vs number of TAs scatter plot, time spans per TC plot (including a calculation for the number of ticks per TC). These plots are written to a single PDF with multiple pages.
`tc_dump.py` is a plotting script that shows TC diagnostic information. This includes histograms of all the available data members and ADC integral (sum of all contained TA ADC integrals) and a few light analysis plots: time difference histogram (various start and end time definitions), ADC integral vs number of TAs scatter plot, time spans per TC plot (including a calculation for the number of ticks per TC). These plots are written to a single PDF with multiple pages.

By default, a new PDF is generated (with naming based on the existing PDFs). One can pass `--overwrite` to overwrite the 0th PDF for a given HDF5 file.

There are two plotting options `--linear` and `--log` that set the y-scale for the plots. By default, plots use both scales with linear on the left y-axis and log on the right y-axis. There is an additional plotting option `--seconds` to produce time plots using seconds instead of ticks.

While running, this prints warnings and general information about loading and plotting. These outputs can be suppressed with `--quiet`.
While running, this can print information about the file reading using `-v` (warnings) and `-vv` (all). Errors and useful output information (save names and locations) are always printed.

One can specify which fragments to _attempt_ to load from with the `--start-frag` option. This is `-10` by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is `0` by default (for the previously mentioned reason).
One can specify which fragments to _attempt_ to load from with the `--start-frag` option. This is `-10` by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is `N` by default (for the previously mentioned reason).

A text file named `tc_anomalies_<run_number>-<file_index>.txt` is generated that gives reference statistics for each TA data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
A text file is generated that gives reference statistics for each TC data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.

## Example
```bash
python tc_dump.py file.hdf5
python tc_dump.py file.hdf5 --help
python tc_dump.py file.hdf5 --quiet
python tc_dump.py file.hdf5 -v
python tc_dump.py file.hdf5 -vv
python tc_dump.py file.hdf5 --start-frag 50 --end-frag 100 # Attempts 50 fragments
python tc_dump.py file.hdf5 --no-anomaly
python tc_dump.py file.hdf5 --log
python tc_dump.py file.hdf5 --linear
python tc_dump.py file.hdf5 --seconds
python tc_dump.py file.hdf5 --overwrite
```
11 changes: 6 additions & 5 deletions docs/tp-dump.md
@@ -2,21 +2,22 @@

`tp_dump.py` is a plotting script that shows TP diagnostic information, such as: TP channel histogram and channel vs time over threshold. Plots are saved as SVGs, PDFs, and PNGs.

A new directory is created that identifies the HDF5 file the script was run on and increments based on preceding plots for the same file. One can overwrite the first set of plots generated by using `--overwrite`.
A new save name is found that identifies the HDF5 file the script was run on and increments based on preceding plots for the same file. One can overwrite the first set of plots generated by using `--overwrite`.

There are two plotting options `--linear` and `--log` that set the y-scale for the plots. By default, plots use both scales with linear on the left y-axis and log on the right y-axis. There is an additional plotting option `--seconds` to produce time plots using seconds instead of ticks.

While running, this script prints various loading information. These outputs can be suppressed with `--quiet`.
While running, this can print information about the file reading using `-v` (warnings) and `-vv` (all). Errors and useful output information (save names and locations) are always printed.

One can specify which fragments to load from with the `--start-frag` option. This is -10 by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is 0 by default (for the previously mentioned reason).
One can specify which fragments to load from with the `--start-frag` option. This is -10 by default in order to get the last 10 fragments for the given file. One can also specify which fragment to end on (not inclusive) with `--end-frag` option. This is N by default (for the previously mentioned reason).

A text file named `tp_anomaly_summary.txt` is generated that gives reference statistics for each TP data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.
A text file is generated that gives reference statistics for each TP data member and gives a count of data members that are at least 2 sigma and 3 sigma from the mean. One can use `--no-anomaly` to stop this file generation.

## Example
```bash
python tp_dump.py file.hdf5 # Loads last 10 fragments by default.
python tp_dump.py file.hdf5 --help
python tp_dump.py file.hdf5 --quiet
python tp_dump.py file.hdf5 -v
python tp_dump.py file.hdf5 -vv
python tp_dump.py file.hdf5 --start-frag 50 --end-frag 100 # Loads 50 fragments.
python tp_dump.py file.hdf5 --no-anomaly
python tp_dump.py file.hdf5 --log
3 changes: 3 additions & 0 deletions python/setup.cfg
@@ -0,0 +1,3 @@
[flake8]
# Conforms to DUNE-DAQ style guide
max-line-length = 120
97 changes: 97 additions & 0 deletions python/trgtools/HDF5Reader.py
@@ -0,0 +1,97 @@
"""
Generic HDF5Reader class to read and store data.
"""
from hdf5libs import HDF5RawDataFile

import abc


class HDF5Reader(abc.ABC):
"""
Abstract reader class for HDF5 files.
Derived classes must complete all methods
decorated with @abc.abstractmethod.
"""

# Useful print colors
_FAIL_TEXT_COLOR = '\033[91m'
_WARNING_TEXT_COLOR = '\033[93m'
_BOLD_TEXT = '\033[1m'
_END_TEXT_COLOR = '\033[0m'

# Counts the number of empty fragments.
_num_empty = 0

def __init__(self, filename: str, verbosity: int = 0) -> None:
"""
Loads a given HDF5 file.
Parameters:
filename (str): HDF5 file to open.
verbosity (int): Verbose level. 0: Only errors. 1: Warnings. 2: All.
Returns nothing.
"""
# Generic loading
self._h5_file = HDF5RawDataFile(filename)
self._fragment_paths = self._h5_file.get_all_fragment_dataset_paths()
self.run_id = self._h5_file.get_int_attribute('run_number')
self.file_index = self._h5_file.get_int_attribute('file_index')

self._verbosity = verbosity

self._filter_fragment_paths() # Derived class must define this.

return None

@abc.abstractmethod
def _filter_fragment_paths(self) -> None:
"""
Filter the fragment paths of interest.
This should be according to the derived reader's
data type of interest, e.g., filter for TriggerActivity.
"""
...

def get_fragment_paths(self) -> list[str]:
""" Return the list of fragment paths. """
return list(self._fragment_paths)

def set_fragment_paths(self, fragment_paths: list[str]) -> None:
""" Set the list of fragment paths. """
self._fragment_paths = fragment_paths
return None

@abc.abstractmethod
def read_fragment(self, fragment_path: str) -> None:
""" Read one fragment from :fragment_path:. """
...

def read_all_fragments(self) -> None:
""" Read all fragments. """
for fragment_path in self._fragment_paths:
_ = self.read_fragment(fragment_path)

# self.read_fragment should increment self._num_empty.
# Print how many were empty as a debug.
if self._verbosity >= 1 and self._num_empty != 0:
print(
self._FAIL_TEXT_COLOR
+ self._BOLD_TEXT
+ f"WARNING: Skipped {self._num_empty} frags."
+ self._END_TEXT_COLOR
)

return None

@abc.abstractmethod
def clear_data(self) -> None:
""" Clear the contents of the member data. """
...

def reset_fragment_paths(self) -> None:
""" Reset the fragment paths to the initialized state. """
self._fragment_paths = self._h5_file.get_all_fragment_dataset_paths()
self._filter_fragment_paths()
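A derived reader only needs to supply the abstract hooks. A minimal standalone sketch of the subclassing pattern (using a toy base class instead of the real `HDF5Reader`, since `hdf5libs` is not available here; all names are illustrative):
```python
import abc


class ToyReader(abc.ABC):
    """Toy stand-in for HDF5Reader, illustrating the abstract-hook pattern."""

    def __init__(self, fragment_paths: list[str]) -> None:
        self._fragment_paths = list(fragment_paths)
        self.tp_data = []
        self._filter_fragment_paths()  # Derived class must define this.

    @abc.abstractmethod
    def _filter_fragment_paths(self) -> None:
        ...

    @abc.abstractmethod
    def read_fragment(self, fragment_path: str) -> None:
        ...

    def read_all_fragments(self) -> None:
        for fragment_path in self._fragment_paths:
            self.read_fragment(fragment_path)


class ToyTPReader(ToyReader):
    def _filter_fragment_paths(self) -> None:
        # Keep only paths for the data type of interest.
        self._fragment_paths = [
            p for p in self._fragment_paths if "TriggerPrimitive" in p
        ]

    def read_fragment(self, fragment_path: str) -> None:
        # A real reader would decode the fragment; here we just record the path.
        self.tp_data.append(fragment_path)


reader = ToyTPReader(["/a/TriggerPrimitive_0", "/a/TriggerActivity_0"])
reader.read_all_fragments()
print(reader.tp_data)  # ['/a/TriggerPrimitive_0']
```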