Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Hurst analysis #55

Open
lcoandrade opened this issue Oct 1, 2024 · 5 comments
Open

Hurst analysis #55

lcoandrade opened this issue Oct 1, 2024 · 5 comments

Comments

@lcoandrade
Copy link

I'm trying to analyze meteorological data with Nolds, using Hurst (R/S, DFA and GHE) with this code:

    def compute_Hurst(self):
        hurst_dict = {}

        # Computing Hurst using nolds package
        self.data["T2M"] = self.data[["T2M_MAX", "T2M_MIN"]].mean(axis=1)

        total = len(self.data["T2M"])

        # Calculating Hurst using Hurst exponent (hurst_rs)
        nstepss = np.arange(15, 31)
        nvalss = [
            nolds.logmid_n(total, ratio=1 / 4.0, nsteps=nsteps) for nsteps in nstepss
        ]
        hurst_rs = [
            nolds.hurst_rs(self.data[self.variable], nvals=nvals) for nvals in nvalss
        ]
        hurst_dict["rs_max"] = max(hurst_rs)
        hurst_dict["rs_min"] = min(hurst_rs)

        # Calculating Hurst using detrended fluctuation analysis (DFA) (dfa)
        factors = [1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2]
        nvalss = [nolds.logarithmic_n(4, 0.1 * total, factor) for factor in factors]
        hurst_dfa = [
            nolds.dfa(self.data[self.variable], nvals=nvals) for nvals in nvalss
        ]
        hurst_dict["dfa_max"] = max(hurst_dfa)
        hurst_dict["dfa_min"] = min(hurst_dfa)

        # Calculating Hurst using Generalized Hurst Exponent (mfhurst_b)
        factors = [1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8, 1.9, 2]
        distss = [
            nolds.logarithmic_n(1, max(20, 0.02 * total), factor) for factor in factors
        ]
        hurst_ghe = [
            nolds.mfhurst_b(self.data[self.variable], dists=dists) for dists in distss
        ]
        hurst_dict["ghe_max"] = max(hurst_ghe)
        hurst_dict["ghe_min"] = min(hurst_ghe)

        # Plotting the data
        plt.figure(figsize=(10, 6))
        plt.plot(
            nstepss,
            hurst_rs,
            color="b",
            label="Hurst R/S",
        )
        plt.title(f"Hurst R/S (Station: {self.config['station']})")
        plt.xlabel("Subserie size")
        plt.ylabel("Hurst exponent")
        plt.legend()
        plt.savefig(
            fname=self.config["hurst_rs_plot_path"], format=self.config["plot_format"]
        )
        plt.close()

        plt.figure(figsize=(10, 6))
        plt.plot(
            factors,
            hurst_dfa,
            color="r",
            label="Hurst DFA",
        )
        plt.title(f"Hurst DFA (Station: {self.config['station']})")
        plt.xlabel("Factor")
        plt.ylabel("Hurst exponent")
        plt.legend()
        plt.savefig(
            fname=self.config["hurst_dfa_plot_path"], format=self.config["plot_format"]
        )
        plt.close()

        plt.figure(figsize=(10, 6))
        plt.plot(
            factors,
            hurst_ghe,
            color="g",
            label="Hurst GHE",
        )
        plt.title(f"Hurst GHE (Station: {self.config['station']})")
        plt.xlabel("Factor")
        plt.ylabel("Hurst exponent")
        plt.legend()
        plt.savefig(
            fname=self.config["hurst_ghe_plot_path"], format=self.config["plot_format"]
        )
        plt.close()

        return hurst_dict

I'm trying to see the effects of different values. For R/S and DFA, I'm getting all values above 1.0 (it was supposed to go until 1.0, right?) and for GHE I'm getting low values (less than 0.2). Why Is this happening?

This is the data I'm using:
el_carmen.csv

These are the Hurst graphs:
el_carmen_hurst_dfa.pdf
el_carmen_hurst_ghe.pdf
el_carmen_hurst_rs.pdf

Can you help to understand?

@CSchoel
Copy link
Owner

CSchoel commented Oct 1, 2024

For hurst_rs and mfhurst_b, the theoretical limit is 1, yes. For dfa there can be higher numbers, though. See this part of the documentation:

the estimate alpha for the Hurst parameter (alpha < 1: stationary process similar to fractional Gaussian noise with H = alpha, alpha > 1: non-stationary process similar to fractional Brownian motion with H = alpha - 1)

While I've never seen values above 1 before myself, I notice that your outputs are still close to 1. I would imagine that the reason for this is just irregularities in the line fitting process. There are two things you can try to diagnose this:

  • Set the parameter fit="poly" for hurst_rs and fit_exp="poly" for dfa. The default setting is to use RANSAC, which is an algorithm involving randomness to be more robust against outliers. In the future, I'll use fit="poly" as default for all measures in nolds. After having a bit more experience with these algorithms, I'm not sure RANSAC really helps so much.
  • Most nolds measures have a parameter debug_plot, which you can set to true to see what is happening before the final estimate step. In hurst_rs, dfa, and mfhurst_b this will show you a plot of the line fitting. This should give a lot of insight into how "stable" the line fitting is and how well you chose the other parameters.

@lcoandrade
Copy link
Author

lcoandrade commented Oct 1, 2024

Thanks for your help!!!

Changing to poly gave me no changes.

The debug plots seem ok to me. Here some examples:
R/S:
el_carmen_hurst_rs_0.pdf
el_carmen_hurst_rs_1.pdf

DFA:
el_carmen_hurst_dfa_0.pdf
el_carmen_hurst_dfa_1.pdf

GHE:
el_carmen_hurst_ghe_0.pdf
el_carmen_hurst_ghe_1.pdf

@lcoandrade
Copy link
Author

One question, should I remove the mean from the series before running Hurst?

@lcoandrade
Copy link
Author

An improvement. I've changed the nvals calculation to the classical division by powers of 2.

I was using your logmid_n method as follows:

nstepss = np.arange(15, 31)
nvalss = [
    nolds.logmid_n(total, ratio=1 / 4.0, nsteps=nsteps) for nsteps in nstepss
]
"""
hurst_rs = [nolds.hurst_rs(data, nvals=nvals, fit="poly") for nvals in nvalss]
ltm_dict["rs_max"] = max(hurst_rs)
ltm_dict["rs_min"] = min(hurst_rs)

The idea was to check the Hurst behavior for different nvals.
The output graph was this for the station that I uploaded above:
el_carmen_hurst_rs.pdf

Changing to the divisions in powers of 2 (i.e 2ˆk for) like this:

# Calculating a series of nvals of powers of 2
# max_power is the maximum k such as 2ˆk <= total/2 (i.e. we want the maximum k that divides the series in 2 parts)
max_power = int(np.log2(total / 2))
powers = np.arange(4, max_power + 1)
nvalss = [2 ** np.arange(1, power + 1) for power in powers]

Gave this output graph:
el_carmen_hurst_rs.pdf

This seems acceptable and according to a Matlab calculation made by my colleague.

What are your thoughts on this?

@CSchoel
Copy link
Owner

CSchoel commented Oct 9, 2024

Hi @lcoandrade. Sorry for my sudden silence. My time for working on nolds is quite limited. I originally planned to spend a few more days on it over last and this week, but something else came up instead.

I'll definitely come back to this (and the other) issue eventually, but it may take me quite some time. 🙈 I hope you can manage yourself. There are a few notes I took on parameter settings in the examples.py file that may help.

nolds/nolds/examples.py

Lines 484 to 492 in b91761c

# Rationale for argument values:
# start with medium settings for min_tsep and lag, span a large area with trajectory_len, set fit_offset to 0
# up the embedding dimension until you get a clear line in the debug plot
# adjust trajectory_len and fit_offset to split off only the linear part
# in general: the longer the linear part of the plot, the better
lyap_r_args = dict(min_tsep=10, emb_dim=5, tau=dt, lag=5, trajectory_len=28, fit_offset=8, fit="poly")
lyap_rx = nolds.lyap_r(data[:, 0], **lyap_r_args)
lyap_ry = nolds.lyap_r(data[:, 1], **lyap_r_args)
lyap_rz = nolds.lyap_r(data[:, 2], **lyap_r_args)

Updates from your side are very much appreciated as any real-world use case may help debugging potential existing issues in the software.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants