Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Coherence Sorting Method for Improved TF Estimates #119

Open
5 tasks
kkappler opened this issue Oct 12, 2021 · 3 comments
Open
5 tasks

Coherence Sorting Method for Improved TF Estimates #119

kkappler opened this issue Oct 12, 2021 · 3 comments
Assignees

Comments

@kkappler
Copy link
Collaborator

kkappler commented Oct 12, 2021

A basic coherence sorting has been implemented. This improves the highest frequency estimates at for example IAK34, but there are still some issues with phase, and a few bad periods in aurora that are not showing as bad in EMTF. The existing method is a sort-of jackknife that iteratively removes a segment of data and recomputes the coherence of the remaining ensembles, either local of remote. The coherence is only between two channels though (not multiple coherence)

To do multiple coherence we should make:

  • a properly sized segment weights placeholder vector (currently it is a NoneType)
  • a method to specify the "multiple coherence estimate band", but for now, we could use a table prefilled, or a simple let of rules, like "let the 'multiple coherence estimate band' have 3x the FCs as the TF estimation band
  • a variation on get_band_for_tf_estimate that extracts the coherence band
  • "preliminary Z-estimator", can be OLS, or a simple jackknife
  • mutliple-coherence computation (PSD(E_p)/PSD(E_obs)

There are several issues here:

  1. Need to automate the choice of coherence threshold (at each decimation level this can be different, in some cases it can differ between periods as well)
  2. May want to rerun leverage points downweighting after restricting to the coherence filtered FCs
  3. Decide if we drop the low coherence or just downweight them

References:

Egbert, Gary D. New approaches to estimation of magnetotelluric parameters. Final technical report, 1 August 1989--31 July 1991. No. DOE/ER/14057-2. Oregon State Univ., Corvallis, OR (United States). Coll. of Oceanography, 1991a.

Egbert, Gary D., and Dean W. Livelybrooks. "Single station magnetotelluric impedance estimation: Coherence weighting and the regression M-estimate." Geophysics 61.4 (1996): 964-970.

Jones, Alan G., and Hartmut Jödicke. "Magnetotelluric transfer function estimation improvement by a coherence-based rejection technique." SEG Technical Program Expanded Abstracts 1984. Society of Exploration Geophysicists, 1984. 51-55.

Smirnov, M. Yu. "Magnetotelluric data processing with a robust statistical procedure having a high breakdown point." Geophysical Journal International 152.1 (2003): 1-7.

@kkappler
Copy link
Collaborator Author

kkappler commented Oct 12, 2021

Notes from a chat with Gary

This is really a segment rejection method

Standard Multiple Coherence:
-R2 is just the fraction of the E field signal power that is Fit
-Use the residuals to weight the data --

Could also use magnetic vs magnetic from remote station. This would just identify segments, it is done before we do the processing pipeline.

You use a much wider band than your estimation band to do these estimations.

The FORTRAN implementation is aimed at single station data where there is no remote.

One way to implement this:

  • For each band that will be processed, select a representative frequency (center frequency)
  • Select a few harmonics from about the center frequency (note that even if your band is only one harmonic wide, we will still need to select some neighboring harmonics, otherwise the coherence for a segment will be degenerate with value = 1)
  • use these "coherence bands" to compute the cross-spectral powers, and those cross spectral powers to in turn give preliminary estimates of the impedance tensor (following for example Sims et al 1971), and use the RR estimates if RR is available
  • Now you can use the impedance tensor to estimate the cross powers (not the components), but estimate it once using auto-powers and once using cross-powers.
  • The cross power estimate will in general be smaller than the auto-power estimate, and the ratio of these estimates is a form of coherence, as it is bounded by [0,1] in absolute value.
  • That is, estimate
    <ExEx*>_1 = Zxx <ExHx*> + Zxy <ExHy*> [Equation 1] where <> denotes cross-power estimates from the fcs
  • But we can also estimate ExEx* another way,
    ...
    Consider that an autopower-based estimate of ExHx can be obtained via:
    <ExHx*>' ~= Zxx <HxHx*> + Zxy <HxHy*>, [Equation 2]
    and similarly we can get an autopower-based estimate of ExHy* from
    <ExHy*>' ~= Zxx <HxHy*> + Zxy <HyHy*>,[Equation 3]
    The estimates from Equations 2 & 3 can then be fed into Equation 1, yielding a second estimate of <ExEx*>_2.

Note that in some implementations, a direct ExEx is multiplied into the "diminished estimate" and the sqrt of their product is taken. In either case, 0 < [<ExEx*>_1 / <ExEx*>_2] < 1.0 and ranking these and selecting a precentile cut from below will increase SNR in the dataset fed to RME.

kkappler added a commit that referenced this issue Oct 15, 2021
Wrote and tested a simple coherence sorting.  Mostly tested on IAK34
dataset but did not implement -- i.e. commented it out in
transfer_function_helpers.py for this commit.

If this commit passes I will test moving mth5, mt_metadata to master branches

[Issue: #119]
@kkappler
Copy link
Collaborator Author

kkappler commented Dec 2, 2022

The task of extracting features (cross-powers and impedance estimates per time-window) should be treated as separate tasks from the coherence sorting. These are necessary, but not sufficient. The features may also find use for other weighting or data QC/viz schemes in future.

Go forward plan:

  • Decide where in the flow this will be implemented
  • A good place would be in process_mth5 right after the merged FC objects are collected and before the TF estimation process begins. This would make compute_coherence_weights a method that would be associated with STFTs rather than TFs, since coherence analysis is a process that should be available whether or not TFs will be computed.
  • The only change this would force is that the weights will need to be shared to process_tf_decimation_level function.
  • Note also, that the coherence measurements do not depend on having all the data in memory, but are computed separately on each segment (time-element of the STFT) so there is room to paralellize

Probably we can make STFT into a class, and then coherence_weights can be a method of that class.

  • A "CrossPower" class should be developed. The parent data come from STFT. The cross power class can manage the "coherency bands"
  • The CrossPower" class estimates the cross power for each time series window
    (there may be some tricky bits if the window length to estimate coherecy differs from that for estimating TFs)
  • A tool for generating the "coherency bands". These can be diffferent from the estimation bands but it is required that there is a mapping from each "estimation band" to a unique "coherence band". Probably use a central frequency and follow Borah et al. 2015 for a "spectral radius"
  • Initial Z-estimates
    The multiple coherences require estimates of the TF in order to take the ratio of predicted to observed power. If the initial estimates are terrible, then multiple coherences may not be much help.
    • OLS
    • Jackknife

Note that issue #316 or a similar solution will likely need to be addressed.
The SigMT version of this seems similar to the method that Gary described.

@kkappler kkappler self-assigned this Dec 22, 2023
kkappler added a commit that referenced this issue Jan 19, 2024
- tidied up workflow in process_transfer_functions
- created placeholders for segment weights
- factored out some methods applied to X, Y, RR for readability
kkappler added a commit that referenced this issue Jan 23, 2024
Add some placeholders for coherence sorting

- frequency_band_helpers.py
  - add some placeholder methods
  - especially Spectrogram class -- a candidate to replace stft_obj in main flow

- kernel_dataset.py:
  - Factor update_survey_metadata method into its own method for better readability

- coherence_weights.py
  - experimental code for Jackkknife coherence weights

- xarray_helpers.py
  - Deprecate unusued cast_3d_stft_to_2d_observations method from xarray_helpers
  - move notes from this method into stack_fcs()
@kkappler
Copy link
Collaborator Author

Consider these two distributions of simple coherence from the same band:
Screenshot from 2024-01-22 19-13-37
Screenshot from 2024-01-22 19-14-35
It turs out that the first one seems like a reasonable candidate for jaackknife coherence, but the second one has some fairly non-intuitive results, in that the ensemble that gives the lowest partial coherence is not necesarily one with particularly poor simple coherence. Something to think about ...
Probably, application of jackknife prematurely not a good idea ... probably should apply simple and or multiple coherence before jackknife.

kkappler added a commit that referenced this issue Jan 23, 2024
- added a note on issue #119 about how jackknife may be better applied
  after simple and multiple coherences
- misc notes/doc in coherence_weights and factoring
- adjust_band_for_coherence_sorting added to frequency_band_helpers
kkappler added a commit that referenced this issue Feb 2, 2024
- have been experimenting with a few approaches to coherence sorting
- have decided on a method using Spectrogram() class _crosspower method
- most of what is in coherence sorting now is just messy, half tested, experimental muck
- but it has a form of what I would like later, and there may be a few nuggets
- so, against my better judgement I am committing it and tagging associated with issue #119
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant