feat: Support reading Picard-style metrics files with comments above the header #103

msto · 2024-03-15T16:02:12Z

It would be helpful if Metric.read could parse metrics files with comments above the header, such as those produced by Picard. I've updated Metric.read to use the read_header() method introduced in #124 , and also simplified the typing on io.to_reader() and io.to_writer() (per Clint's suggestion)

note I want to keep this separate from the MetricWriter feature branch, but it's dependent on the Metric._read_header method introduced in that branch.

I am waiting to merge until #123 is merged.

codecov · 2024-03-15T16:02:59Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 94.07%. Comparing base (032388d) to head (6b094cc).

❗ Current head 6b094cc differs from pull request most recent head c8a083c

Please upload reports for the commit c8a083c to get more accurate results.

Additional details and impacted files

@@                         Coverage Diff                         @@
##           ms_metric-writer-feature-branch     #103      +/-   ##
===================================================================
+ Coverage                            88.56%   94.07%   +5.50%     
===================================================================
  Files                                   16       33      +17     
  Lines                                 1775     3441    +1666     
  Branches                               378      706     +328     
===================================================================
+ Hits                                  1572     3237    +1665     
- Misses                                 134      135       +1     
  Partials                                69       69

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

fgpyo/io/__init__.py

fgpyo/util/metric.py

TedBrookings

LGTM. I think you should wait to merge until Nils signs off. My general sense is that Metric at least originated as an internal tool for Fulcrum code, not a general-purpose TSV <-> data utility. It would then read and write in a way that Fulcrum approved of. Header comments and CSV instead of TSV were deliberately excluded. I personally don't see the harm in relaxing the allowed inputs while retaining strict outputs.

msto · 2024-03-15T17:58:15Z

That's fair.

I would be open to refactoring so .read() remains specific to Metric-formatted files. In that case I'd suggest providing a .read_picard() method or similar, which would filter comment lines before calling .read() under the hood.

I do think it would be generally valuable to have built-in support for reading Picard metrics into a Metric, and it would be very specifically helpful for one of my clients.

msto · 2024-03-15T17:58:25Z

@nh13 thoughts?

nh13 · 2024-03-15T18:23:06Z

@msto I'd like to review this one before it gets merged

clintval

I'm indifferent on the PR because for Picard files I use this library:

https://pypi.org/project/crimson/

It's not typed but it's been good enough for me since it doesn't force a co-install of things like pysam.

fgpyo/io/__init__.py

tfenne

Some reasonably minor suggestions.

fgpyo/util/metric.py

nh13

Minor comments from me, but a few others to fixup from others.

fgpyo/io/__init__.py

fgpyo/util/metric.py

* feat: add asdict * fix: typeguard fix: typeguard import doc: update docstring refactor: import Instance types refactor: import Instance types * fix: 3.8 compatible Dict typing * refactor: make instance checks private * tests: add coverage * fix: typeerror * doc: clarify that asdict does not format values

nh13

LGTM, take it or leave the additional comments/suggestions.

nh13 · 2024-06-05T21:27:29Z

fgpyo/util/metric.py

+        cls,
+        path: Path,
+        ignore_extra_fields: bool = True,
+        comment_prefix: str = "#",


Is this backwards incompatible? Should we bump the minor version?

Good question, it isn't.

One possible solution for backwards compatibility - I could update Metric._read_header to take an Optional comment prefix, and return the first line if the prefix is None (i.e. no skipping of prefixed or blank lines)

Is that something you'd like to see, or should I leave as is and bump the version?

I'd prefer it to the Optional[str] = None, but I don't have that strong of a feeling.

I'd also ask if comment_prefix is the name we want to go with for this parameter. @tfenne is a good judge of naming.

I might suggest comment_symbol?

Also i agree with @nh13 that ideally the method gets expanded to include both:

comment_symbol: Optional[str] = None, skip_blank_lines: bool = False

I might even argue for skip_blank_lines to even be a regex that defines lines to skip ... so e.g. I could use skip_lines_matching = "$\s*^" to skip stupid lines that are all tabs/spaces but no content!

fgpyo/util/metric.py

* feat: add read_header * fix: rm kw_only to support 3.9 * fix: use Typing.List to support 3.8 * fix: pr suggestions

chore: type hint fix: method name

msto · 2024-06-06T13:49:00Z

note I want to keep this separate from the MetricWriter feature branch, but it's dependent on the Metric._read_header method introduced in that branch.

I am waiting to merge until #123 is merged.

msto force-pushed the ms_read-picard-metrics branch 2 times, most recently from aad1b76 to 2ac66c4 Compare March 15, 2024 16:08

msto commented Mar 15, 2024

View reviewed changes

fgpyo/io/__init__.py Outdated Show resolved Hide resolved

fgpyo/util/metric.py Outdated Show resolved Hide resolved

msto marked this pull request as ready for review March 15, 2024 16:25

msto requested review from TedBrookings, nh13, clintval and geoffjentry March 15, 2024 16:25

TedBrookings approved these changes Mar 15, 2024

View reviewed changes

msto assigned nh13 Mar 18, 2024

clintval reviewed Mar 26, 2024

View reviewed changes

fgpyo/io/__init__.py Outdated Show resolved Hide resolved

msto changed the title ~~feat: Support reading metrics files with comments above the header~~ feat: Support reading Picard-style metrics files with comments above the header Apr 17, 2024

msto mentioned this pull request Apr 17, 2024

Metric wishlist #104

Open

6 tasks

tfenne requested changes Apr 17, 2024

View reviewed changes

fgpyo/util/metric.py Outdated Show resolved Hide resolved

fgpyo/util/metric.py Outdated Show resolved Hide resolved

fgpyo/util/metric.py Outdated Show resolved Hide resolved

clintval mentioned this pull request May 8, 2024

feat: Add MetricWriter class #107

Open

nh13 requested changes May 20, 2024

View reviewed changes

fgpyo/io/__init__.py Outdated Show resolved Hide resolved

fgpyo/util/metric.py Outdated Show resolved Hide resolved

msto marked this pull request as draft May 27, 2024 16:47

msto mentioned this pull request Jun 4, 2024

feat: Add read_header() #124

Merged

msto force-pushed the ms_read-picard-metrics branch from 6b094cc to ae55f2f Compare June 4, 2024 20:13

msto changed the base branch from main to ms_get-header June 4, 2024 20:13

msto force-pushed the ms_read-picard-metrics branch 2 times, most recently from f99f839 to 066cf19 Compare June 4, 2024 20:15

msto marked this pull request as ready for review June 4, 2024 23:35

msto requested review from nh13 and tfenne June 4, 2024 23:35

msto requested a review from clintval June 4, 2024 23:35

msto assigned tfenne Jun 4, 2024

msto mentioned this pull request Jun 5, 2024

refactor: Narrow typing on to_reader and to_writer #125

Merged

nh13 approved these changes Jun 5, 2024

View reviewed changes

feat: Add read_header() (#124)

48dd0ef

* feat: add read_header * fix: rm kw_only to support 3.9 * fix: use Typing.List to support 3.8 * fix: pr suggestions

Base automatically changed from ms_get-header to ms_metric-writer-feature-branch June 6, 2024 00:11

refactor: Narrow typing on to_reader and to_writer (#125)

032388d

msto force-pushed the ms_read-picard-metrics branch from dbdbe42 to fddbd40 Compare June 6, 2024 10:23

feat: update Metric.read to use _read_header

c8a083c

chore: type hint fix: method name

msto force-pushed the ms_read-picard-metrics branch from f833d92 to c8a083c Compare June 6, 2024 10:30

msto force-pushed the ms_metric-writer-feature-branch branch from 032388d to 78f2246 Compare June 6, 2024 14:46

msto mentioned this pull request Jun 6, 2024

feat: Add assertions to support MetricWriter #129

Open

msto assigned msto and unassigned nh13 and tfenne Sep 12, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: Support reading Picard-style metrics files with comments above the header #103

feat: Support reading Picard-style metrics files with comments above the header #103

msto commented Mar 15, 2024 •

edited

Loading

codecov bot commented Mar 15, 2024 •

edited

Loading

TedBrookings left a comment

msto commented Mar 15, 2024

msto commented Mar 15, 2024

nh13 commented Mar 15, 2024

clintval left a comment •

edited

Loading

tfenne left a comment

nh13 left a comment

nh13 left a comment

nh13 Jun 5, 2024

msto Jun 6, 2024

nh13 Jun 6, 2024

tfenne Jun 6, 2024

msto commented Jun 6, 2024 •

edited

Loading

feat: Support reading Picard-style metrics files with comments above the header #103

Are you sure you want to change the base?

feat: Support reading Picard-style metrics files with comments above the header #103

Conversation

msto commented Mar 15, 2024 • edited Loading

codecov bot commented Mar 15, 2024 • edited Loading

Codecov Report

TedBrookings left a comment

Choose a reason for hiding this comment

msto commented Mar 15, 2024

msto commented Mar 15, 2024

nh13 commented Mar 15, 2024

clintval left a comment • edited Loading

Choose a reason for hiding this comment

tfenne left a comment

Choose a reason for hiding this comment

nh13 left a comment

Choose a reason for hiding this comment

nh13 left a comment

Choose a reason for hiding this comment

nh13 Jun 5, 2024

Choose a reason for hiding this comment

msto Jun 6, 2024

Choose a reason for hiding this comment

nh13 Jun 6, 2024

Choose a reason for hiding this comment

tfenne Jun 6, 2024

Choose a reason for hiding this comment

msto commented Jun 6, 2024 • edited Loading

msto commented Mar 15, 2024 •

edited

Loading

codecov bot commented Mar 15, 2024 •

edited

Loading

clintval left a comment •

edited

Loading

msto commented Jun 6, 2024 •

edited

Loading