Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

V2 asdf #56

Merged
merged 28 commits into from
Jul 24, 2024
Merged

V2 asdf #56

merged 28 commits into from
Jul 24, 2024

Conversation

krisvanneste
Copy link
Collaborator

In this branch, I add further support to read traces and station inventories from ASDF files:

  • Added new parse_asdf_inventory() function in input.station_metadata.py
  • Added new parse_asdf_traces() function in input.traces.py

These functions can be used interactively as follows:

from sourcespec2.input.station_metadata import parse_asdf_inventory
inventory = parse_asdf_inventory(asdf_file)

from sourcespec2.input.traces import parse_asdf_traces
tag = 'raw'
st = parse_asdf_traces(asdf_file, tag, read_headers=True)

In addition, I also modified input.event_parsers.quakeml.parse_qml_file(), input.station_metadata.read_station_metadata() and input.traces._read_trace_files() to choose between the ASDF/non-ASDF parsers depending on file extension (.asdf or .h5 for ASDF format). This should allow using ASDF file(s) in the CLI as well, as shown here interactively:

from sourcespec2.input import read_event_and_picks
config.options.qml_file = asdf_file
event, picks = read_event_and_picks()

from sourcespec2.input import read_station_metadata
config.station_metadata = asdf_file
inventory = read_station_metadata()

from sourcespec2.input.traces import read_traces
config.options.trace_path = [asdf_file]
st = read_traces()

However, as ASDF files may contain different waveform data labeled with tags, we might consider 1 or 2 additional configuration options: asdf_tag to specify the tag (the current default is 'raw') and perhaps also asdf_read_headers to specify whether or not to try reading trace headers (current default False). The latter is not specifically defined in the ASDF standard, but I implemented the way this is done in ESM as well as my own way (which is more logical).

@claudiodsf
Copy link
Member

Hi Kris, I just force-pushed some commits.

Here's what I did:

  • I prefer a different approach for reading ASDF file from command line
  • I reduced the nesting in parse_asdf_traces(): could you please check that the code still works?

More commits to come...

@krisvanneste
Copy link
Collaborator Author

Claudio,
Everything still works on my side after replacing config.options.qml_file with config.options.asdf_file.
Apparently, this new option only relates to event information, while for station metadata and traces the parsing still depends on file extension. Concerning this extension, I included both .asdf and .h5, but if you would like to come up with a custom HDF5 format for (processed) traces, you may want to restrict this to .asdf.

Concerning the use of waveform tags in ASDF files, we should either define a configuration parameter for this or else set the default value of 'tag' in input.traces.parse-asdf_traces() to an empty string or None, so that the first tag available is used. It is then up to the user to create ASDF files with just 1 tag.

Finally, I noticed that you hardcoded the tag "raw" in input.traces._parse_asdf_trace_headers(). I would prefer to add a tag argument in this function, otherwise it will break if parse_asdf_traces() is called with a different tag value.

@claudiodsf
Copy link
Member

Claudio, Everything still works on my side after replacing config.options.qml_file with config.options.asdf_file. Apparently, this new option only relates to event information, while for station metadata and traces the parsing still depends on file extension.

This is in the next commit to come 😉

Concerning this extension, I included both .asdf and .h5, but if you would like to come up with a custom HDF5 format for (processed) traces, you may want to restrict this to .asdf.

I'm thinking about a more robust way to detect the file, not based on the extension...

Concerning the use of waveform tags in ASDF files, we should either define a configuration parameter for this or else set the default value of 'tag' in input.traces.parse-asdf_traces() to an empty string or None, so that the first tag available is used. It is then up to the user to create ASDF files with just 1 tag.

Ok, I'm changing the default to None for the moment

Finally, I noticed that you hardcoded the tag "raw" in input.traces._parse_asdf_trace_headers(). I would prefer to add a tag argument in this function, otherwise it will break if parse_asdf_traces() is called with a different tag value.

Oops, that's an error made by Copilot in the refactoring process 😆

@claudiodsf
Copy link
Member

Concerning the use of waveform tags in ASDF files, we should either define a configuration parameter for this or else set the default value of 'tag' in input.traces.parse-asdf_traces() to an empty string or None, so that the first tag available is used. It is then up to the user to create ASDF files with just 1 tag.

Ok, I'm changing the default to None for the moment

Finally, I noticed that you hardcoded the tag "raw" in input.traces._parse_asdf_trace_headers(). I would prefer to add a tag argument in this function, otherwise it will break if parse_asdf_traces() is called with a different tag value.

Oops, that's an error made by Copilot in the refactoring process 😆

Force-pushed to fix these two points. Is it ok now?

@claudiodsf
Copy link
Member

Sorry, there was a small error. Force-pushed again

@krisvanneste
Copy link
Collaborator Author

Claudio, it works.
If I use input.traces.read_traces, the trace headers are not read (because the default value of read_headers is False).
If I use input.traces.parse_asdf_traces, then trace headers are read correctly, both using tag='raw' and tag=None.

@claudiodsf claudiodsf force-pushed the v2_asdf branch 2 times, most recently from 9dcffb8 to 1038ba2 Compare July 22, 2024 06:47
@claudiodsf
Copy link
Member

claudiodsf commented Jul 22, 2024

Hi Kris,
I worked a little bit more on this PR to make it possible the following workflow:

from sourcespec2.setup import config
config.options.asdf_file = ASDF_FILE_PATH
config.options.asdf_tag = 'raw'

from sourcespec2.input import read_event_and_picks
event, picks = read_event_and_picks()

from sourcespec2.input import read_station_metadata
inventory = read_station_metadata()

from sourcespec2.input import read_traces
st = read_traces()

Note that no other options need to be specified to config.options, since I'm using getattr(config.options, OPTION_NAME, None) to retrieve them.

Please let me know if you're satisfied with this workflow.

I'm making some further adjustments to include in the same workflow the possibility of reading event and station metadata from the SAC header.

krisvanneste and others added 11 commits July 22, 2024 09:40
Choose between parse_asdf_inventory and read_inventory depending on extension in read_station_metadata function.
Choose between obspy.read and parse_asdf_traces in _read_trace_files function depending on file extension.
Note: the option will allow also reading traces and metadata from the
ASDF file (to be implemented in the next commits).

Also, renamed asdf.py to asdf_event.py
The function was previously named parse_qml_file()
…ath` is None

Also, add `config.options.asdf_file` to the list of files to symlink.
@claudiodsf
Copy link
Member

claudiodsf commented Jul 22, 2024

I have a question concerning the function def parse_asdf_traces().

Its current signature is:

def parse_asdf_traces(asdf_file, tag=None, read_headers=False):
    ...

Is there any practical case in which tag is not None, but read_headers remains False?

@krisvanneste
Copy link
Collaborator Author

I have a question concerning the function def parse_asdf_traces().

Its current signature is:

def parse_asdf_traces(asdf_file, tag=None, read_headers=False):
    ...

Is there any practical case in which tag is not None, but read_headers remains False?

Yes, the 2 options are independent. All waveform data within an ASDF file are stored under a particular tag. The corresponding trace headers are stored in the auxiliary data with the same tag. The tag option is required if there is more than 1 waveform tag in the ASDF file; the read_headers option controls reading of trace headers, for which no standard is defined in the ASDF specification.

@claudiodsf claudiodsf mentioned this pull request Jul 22, 2024
8 tasks
@claudiodsf
Copy link
Member

Yes, the 2 options are independent. All waveform data within an ASDF file are stored under a particular tag. The corresponding trace headers are stored in the auxiliary data with the same tag. The tag option is required if there is more than 1 waveform tag in the ASDF file; the read_headers option controls reading of trace headers, for which no standard is defined in the ASDF specification.

Ok, so nothing to change then 😉 .

We will have to add documentation for all the changes and the new features (tracking this in #58).

@krisvanneste
Copy link
Collaborator Author

Hi Kris, I worked a little bit more on this PR to make it possible the following workflow:

from sourcespec2.setup import config
config.options.asdf_file = ASDF_FILE_PATH
config.options.asdf_tag = 'raw'

from sourcespec2.input import read_event_and_picks
event, picks = read_event_and_picks()

from sourcespec2.input import read_station_metadata
inventory = read_station_metadata()

from sourcespec2.input.traces import read_traces
st = read_traces()

Note that no other options need to be specified to config.options, since I'm using getattr(config.options, OPTION_NAME, None) to retrieve them.

Please let me know if you're satisfied with this workflow.

I'm making some further adjustments to include in the same workflow the possibility of reading event and station metadata from the SAC header.

Claudio,

Just tested, and I confirm it works.
This workflow is easier, but less flexible than before. I now requires that there is only 1 ASDF file, containing all the waveforms, station inventories and event information. This is indeed the most logical way to collect the data for a sourcespec run and acceptable if we indicate this in the documentation.
I'm wondering what happens if along with config.options.asdf_file, also one of config.options.qml_file, config.station_metadata or config.options.trace_path are set? Will these take precedence? I think they should.

@claudiodsf
Copy link
Member

OK, I see. I checked and it works without having config.options.outdir and config.options.evname.

Great!

By the way, wasn't the latter parameter supposed to be set when reading event data?

I admit, I'm lost. But I think that evname is retrieved throughout the code from the Event object.

Anyway, config.options is not going to stay!

@claudiodsf
Copy link
Member

We can maybe merge this PR now?

@krisvanneste
Copy link
Collaborator Author

We can maybe merge this PR now?

Can I still try adding support for wildcards in ASDF tags? I will look into it now.

@claudiodsf
Copy link
Member

We can maybe merge this PR now?

Can I still try adding support for wildcards in ASDF tags? I will look into it now.

Sure!

…ards.

Added 'header_key' argument to _parse_asdf_headers function, and allowed 'tag' argument to be list.
Improved error catching when trying to deserialize header values in _parse_asdf_headers function.
@krisvanneste
Copy link
Collaborator Author

I added support for wildcards ('?' or '*') in ASDF tags.
With an ASDF file downloaded from ESM, I can now set

config.options.asdf_tag = '00_hg?_emsc_20160824_0000006_dis_mp'

to read 3 traces (including trace headers), even though they are stored with different tags.
Note that I also added a header_key argument to _parse_asdf_headers() replacing the hardcoded "TraceHeaders".
This will allow us to write a notebook using data downloaded from ESM (if we find the time)...

@claudiodsf
Copy link
Member

I added support for wildcards ('?' or '*') in ASDF tags. With an ASDF file downloaded from ESM, I can now set

config.options.asdf_tag = '00_hg?_emsc_20160824_0000006_dis_mp'

to read 3 traces (including trace headers), even though they are stored with different tags. Note that I also added a header_key argument to _parse_asdf_headers() replacing the hardcoded "TraceHeaders". This will allow us to write a notebook using data downloaded from ESM (if we find the time)...

Great! Saw that!

The only thing missing, from my point of view, is command line support for multiple tags (i.e., -g/--tag option should take multiple arguments).
I can do that! 😉

@claudiodsf
Copy link
Member

claudiodsf commented Jul 24, 2024

Wait! This is not even supported from your code: the tag argument in parse_asdf_traces() is always just a string.
It can become a list later in _parse_asdf_trace_headers().

So, the use cases are:

  • a simple tag, e.g. tag
  • a tag with wildcards, e.g. tag* or tag??
  • no explicit list of tags, e.g. ['aaaa', 'bbbb', 'cccc']

If that is fine with you, there is nothing to change!

Alternatively, we could allow the latter by splitting tag by comma, e.g.:

config.options.asdf_tag = 'aaaa,bbbb,cccc'

What do you think?

@krisvanneste
Copy link
Collaborator Author

Wait! This is not even supported from your code: the tag argument in parse_asdf_traces() is always just a string. It can become a list later in _parse_asdf_trace_headers().

So, the use cases are:

* a simple tag, e.g. `tag`

* a tag with wildcards, e.g. `tag*` or `tag??`

* no explicit list of tags, e.g. `['aaaa', 'bbbb', 'cccc']`

If that is fine with you, there is nothing to change!

Alternatively, we could allow the latter by splitting tag by comma, e.g.:

config.options.asdf_tag = 'aaaa,bbbb,cccc'

What do you think?

Claudio, sorry I wasn't paying attention.

  • parse_asdf_traces() should accept only 1 tag (possibly containing wildcards), as it parses only 1 file
  • _parse_asdf_trace_headers() gets its tag argument from parse_asdf_traces, so no need to worry
  • We could support different ASDF tags for different ASDF files, but this should be handled in input.traces._read_asdf_traces(), which then dispatches the correct tag to parse_asdf_traces(). This could be done using a simple list, no need to embed commas in a string.
    Does that sound acceptable?

@claudiodsf
Copy link
Member

  • We could support different ASDF tags for different ASDF files, but this should be handled in input.traces._read_asdf_traces(), which then dispatches the correct tag to parse_asdf_traces(). This could be done using a simple list, no need to embed commas in a string.

Is this a plausible use case?
If yes, could you please go ahead and implement it?

Thanks!

@krisvanneste
Copy link
Collaborator Author

  • We could support different ASDF tags for different ASDF files, but this should be handled in input.traces._read_asdf_traces(), which then dispatches the correct tag to parse_asdf_traces(). This could be done using a simple list, no need to embed commas in a string.

Is this a plausible use case? If yes, could you please go ahead and implement it?

Thanks!

Not very likely, but possible. I will implement it in input.traces._read_asdf_traces().

@krisvanneste
Copy link
Collaborator Author

Done. I have no test case but at least the case with 1 tag still works...

@claudiodsf
Copy link
Member

Great, thanks!

I'll make my test, then some cosmetic changes (if necessary), then merge!

Most important change is log message when no ASDF tag is specified.
@claudiodsf
Copy link
Member

Ok, I pushed some small changes.

Should trace_tags be renamed to waveform_tags for consistency?

@krisvanneste
Copy link
Collaborator Author

Great, thanks!

I'll make my test, then some cosmetic changes (if necessary), then merge!

Actually I am able to test this with my multiple ASDF files case, with a list containing the same tag for each file. It works!
And if I change one tag in the list to something that is wrong, I get less traces.

@krisvanneste
Copy link
Collaborator Author

Ok, I pushed some small changes.

Should trace_tags be renamed to waveform_tags for consistency?

Do you mean in _parse_asdf_trace_headers()?

@claudiodsf
Copy link
Member

Ok, I pushed some small changes.
Should trace_tags be renamed to waveform_tags for consistency?

Do you mean in _parse_asdf_trace_headers()?

Yes, but also in parse_asdf_traces()

@krisvanneste
Copy link
Collaborator Author

OK, I see.
I think so, yes. From there on, if trace_tags is a list, it really refers to the tags of each individual trace.

@claudiodsf
Copy link
Member

OK, I see. I think so, yes. From there on, if trace_tags is a list, it really refers to the tags of each individual trace.

So we keep it to trace_tags?

@krisvanneste
Copy link
Collaborator Author

OK, I see. I think so, yes. From there on, if trace_tags is a list, it really refers to the tags of each individual trace.

So we keep it to trace_tags?

Yes, but maybe update the docstring to "trace tag(s) in ASDF file"?
I will be away ~1 hour for lunch...

@claudiodsf
Copy link
Member

Done! Let me know if you're happy with that and I'll merge!

Bon appetit!

@krisvanneste
Copy link
Collaborator Author

Done! Let me know if you're happy with that and I'll merge!

Bon appetit!

Looks good to me!

@claudiodsf
Copy link
Member

Time to hit that "Merge" button!

Thanks for this good work! 🙏

@claudiodsf claudiodsf merged commit 87ce571 into SeismicSource:v2 Jul 24, 2024
2 checks passed
@claudiodsf
Copy link
Member

I will not have time to further work on SourceSpec today.
I'll try to do some further work tomorrow (last day before vacation).

@krisvanneste
Copy link
Collaborator Author

Thank you too, Claudio.
What will you be working on tomorrow? Let me know if I can do something.

@claudiodsf
Copy link
Member

What will you be working on tomorrow? Let me know if I can do something.

I'll try to create the processing submodules. No hurry, though 😉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants