-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Support reading Picard-style metrics files with comments above the header #103
base: ms_metric-writer-feature-branch
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## ms_metric-writer-feature-branch #103 +/- ##
===================================================================
+ Coverage 88.56% 94.07% +5.50%
===================================================================
Files 16 33 +17
Lines 1775 3441 +1666
Branches 378 706 +328
===================================================================
+ Hits 1572 3237 +1665
- Misses 134 135 +1
Partials 69 69 ☔ View full report in Codecov by Sentry. |
aad1b76
to
2ac66c4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think you should wait to merge until Nils signs off. My general sense is that Metric at least originated as an internal tool for Fulcrum code, not a general-purpose TSV <-> data utility. It would then read and write in a way that Fulcrum approved of. Header comments and CSV instead of TSV were deliberately excluded. I personally don't see the harm in relaxing the allowed inputs while retaining strict outputs.
That's fair. I would be open to refactoring so I do think it would be generally valuable to have built-in support for reading Picard metrics into a |
@nh13 thoughts? |
@msto I'd like to review this one before it gets merged |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm indifferent on the PR because for Picard files I use this library:
It's not typed but it's been good enough for me since it doesn't force a co-install of things like pysam.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some reasonably minor suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments from me, but a few others to fixup from others.
* feat: add asdict * fix: typeguard fix: typeguard import doc: update docstring refactor: import Instance types refactor: import Instance types * fix: 3.8 compatible Dict typing * refactor: make instance checks private * tests: add coverage * fix: typeerror * doc: clarify that asdict does not format values
6b094cc
to
ae55f2f
Compare
f99f839
to
066cf19
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, take it or leave the additional comments/suggestions.
cls, | ||
path: Path, | ||
ignore_extra_fields: bool = True, | ||
comment_prefix: str = "#", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this backwards incompatible? Should we bump the minor version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question, it isn't.
One possible solution for backwards compatibility - I could update Metric._read_header
to take an Optional
comment prefix, and return the first line if the prefix is None
(i.e. no skipping of prefixed or blank lines)
Is that something you'd like to see, or should I leave as is and bump the version?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer it to the Optional[str] = None
, but I don't have that strong of a feeling.
I'd also ask if comment_prefix
is the name we want to go with for this parameter. @tfenne is a good judge of naming.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might suggest comment_symbol
?
Also i agree with @nh13 that ideally the method gets expanded to include both:
comment_symbol: Optional[str] = None,
skip_blank_lines: bool = False
I might even argue for skip_blank_lines
to even be a regex that defines lines to skip ... so e.g. I could use skip_lines_matching = "$\s*^"
to skip stupid lines that are all tabs/spaces but no content!
dbdbe42
to
fddbd40
Compare
chore: type hint fix: method name
f833d92
to
c8a083c
Compare
note I want to keep this separate from the I am waiting to merge until #123 is merged. |
032388d
to
78f2246
Compare
It would be helpful if
Metric.read
could parse metrics files with comments above the header, such as those produced by Picard. I've updatedMetric.read
to use theread_header()
method introduced in #124 , and also simplified the typing onio.to_reader()
andio.to_writer()
(per Clint's suggestion)note I want to keep this separate from the
MetricWriter
feature branch, but it's dependent on theMetric._read_header
method introduced in that branch.I am waiting to merge until #123 is merged.