Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle time normalization for nonexistent and ambiguous times #2133

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

scttnlsn
Copy link

@scttnlsn scttnlsn commented Jul 15, 2024

  • Closes NonExistentTimeError when calling hour_angle #2132
  • I am familiar with the contributing guidelines
  • Tests added
  • Adds description and name entries in the appropriate "what's new" file in docs/sphinx/source/whatsnew for all changes. Includes link to the GitHub Issue with :issue:`num` or this Pull Request with :pull:`num`. Includes contributor name and/or GitHub username (link with :ghuser:`user`).
  • Pull request is nearly complete and ready for detailed review.
  • Maintainer: Appropriate GitHub Labels (including remote-data) and Milestone are assigned to the Pull Request and linked Issue.

@cwhanse cwhanse added the bug label Jul 15, 2024
@cwhanse cwhanse added this to the v0.11.1 milestone Jul 15, 2024
Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix the whitespace and LGTM

@scttnlsn
Copy link
Author

Started running the code from this branch on a real problem and realized the handling of ambiguous='infer' is not sufficient. The issue can be reproduced in the following test case:

eot = np.array([-3.935172, -4.117227, -4.026295])
longitude = 82.3666
times = pd.DatetimeIndex([
    '2014-11-02 09:00:00',
    '2014-11-02 10:00:00',
    '2014-11-02 11:00:00',
]).tz_localize('America/Havana')
solarposition.hour_angle(times, longitude, eot)

We get:

pytz.exceptions.AmbiguousTimeError: There are 2 dst switches when there should only be 1.

I think the infer piece requires exactly 2 duplicate times but we may have more in our times array. Not sure what the best solution is at this point. Any ideas?

@scttnlsn
Copy link
Author

scttnlsn commented Jul 17, 2024

I suppose we could look at the UTC offsets to build a list of bools indicating which times are DST. We'd then pass ambigious=dst_list (https://pandas.pydata.org/docs/reference/api/pandas.Series.tz_localize.html).

EDIT: looks like the tzinfo already has info about DST

@scttnlsn scttnlsn requested a review from cwhanse July 17, 2024 15:11
@cwhanse
Copy link
Member

cwhanse commented Jul 17, 2024

Hmm...I see that I didn't think enough about the tests :) According to the pandas docs infer requires the timestamps to be monotonic, which they aren't after normalize.

@cwhanse
Copy link
Member

cwhanse commented Jul 17, 2024

This passes tests, but is likely slow due to the looping. I'm now thinking that perhaps ambiguous=NaT would be better.

The issue arises only when a user wants to use DST-aware timestamps in a few places and is not careful when constructing the series of timestamps. It seems better to suggest the user localize using a non-DST timezone such as GMT-4. As far as importing data recorded with DST timestamps, if the timestamps are correct, the ambiguous times won't be in the data.

@scttnlsn
Copy link
Author

It seems better to suggest the user localize using a non-DST timezone such as GMT-4

That's a good point. In my own use case I could handle the DST stuff before passing the times into hour_angle.

I'm now thinking that perhaps ambiguous=NaT would be better.

I think that would result in nans being returned from hour_angle which I think could be confusing. I think even letting the AmbiguousTimeError be raised is fine if the documentation indicates that DST should be handled outside pvlib. That way the user is forced to address what's happening instead of maybe being confused about receiving nans.

@cwhanse
Copy link
Member

cwhanse commented Jul 17, 2024

I think even letting the AmbiguousTimeError be raised is fine if the documentation indicates that DST should be handled outside pvlib. That way the user is forced to address what's happening instead of maybe being confused about receiving nans.

+1

@scttnlsn
Copy link
Author

Do you think I should just close this PR then? For my own use case, if I'm handling the DST stuff outside pvlib anyway then I don't expect to see a NonExistentTimeError either. If you'd like to keep the nonexistent='shift_forward' I can just remove the ambiguous=is_dst stuff.

Copy link
Member

@cwhanse cwhanse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still want to handle the NonExistentTimeError case, but lets raise the AmbiguousTimeError.

pvlib/solarposition.py Outdated Show resolved Hide resolved
pvlib/solarposition.py Outdated Show resolved Hide resolved
@cwhanse
Copy link
Member

cwhanse commented Jul 22, 2024

@scttnlsn can you add yourself to the Contributors list in the whatsnew file? I can do that for you, if needed.

Copy link
Member

@kandersolar kandersolar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @scttnlsn for the PR! I cannot stand these bizarre timezone issues...

I think even letting the AmbiguousTimeError be raised is fine if the documentation indicates that DST should be handled outside pvlib

Do we still want to add something to the docstring along these lines? I don't see any changes there currently.

pvlib/tests/test_solarposition.py Show resolved Hide resolved
pvlib/solarposition.py Outdated Show resolved Hide resolved
Co-authored-by: Adam R. Jensen <[email protected]>
@cwhanse
Copy link
Member

cwhanse commented Sep 11, 2024

@pvlib/pvanalytics-maintainer if no objections are raised, I will merge this on 9/13

@yhkee0404
Copy link
Contributor

yhkee0404 commented Sep 12, 2024

What about _local_times_from_hours_since_midnight and _times_to_hours_after_local_midnight? I think they too are subject to the same bug of tz_localize and normalize and can be fixed together in the same way easily.

def _local_times_from_hours_since_midnight(times, hours):
"""
converts hours since midnight from an array of floats to localized times
"""
tz_info = times.tz # pytz timezone info
naive_times = times.tz_localize(None) # naive but still localized
# normalize local, naive times to previous midnight and add the hours until
# sunrise, sunset, and transit
return pd.DatetimeIndex(
naive_times.normalize() + pd.to_timedelta(hours, unit='h'), tz=tz_info)

def _times_to_hours_after_local_midnight(times):
"""convert local pandas datetime indices to array of hours as floats"""
times = times.tz_localize(None)
hrs = (times - times.normalize()) / pd.Timedelta('1h')
return np.array(hrs)

Comment on lines +1405 to +1407
# Use Pandas functionality for shifting nonexistent times forward
normalized_times = naive_normalized_times.tz_localize(
times.tz, nonexistent='shift_forward', ambiguous='raise')
Copy link
Contributor

@yhkee0404 yhkee0404 Sep 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about this with the modification of def hour_angle(times, longitude, equation_of_time) into def hour_angle(times, longitude, equation_of_time, nonexistent='shift_forward', ambiguous='raise') to give users a choice to control the tz_localize behavior if they want rather than just receiving an error and being blocked?

Suggested change
# Use Pandas functionality for shifting nonexistent times forward
normalized_times = naive_normalized_times.tz_localize(
times.tz, nonexistent='shift_forward', ambiguous='raise')
# Use Pandas functionality for shifting nonexistent times forward
normalized_times = naive_normalized_times.tz_localize(
times.tz, nonexistent=nonexistent, ambiguous=ambiguous)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@scttnlsn could you comment on this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user would not necessarily be blocked if the exception is raised. They'd just need to convert the given times to a non-DST timezone ahead of calling hour_angle. The documentation is updated to mention this.

infer won't really work as mentioned here: #2133 (comment) The other options are to raise (current approach) or to have NaNs returned where the times are ambiguous (the user still likely needs to deal with that case).

I'm personally of the opinion that forcing the user to convert to a non-DST timezone is a fine approach but I could understand someone might want to have NaNs instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NonExistentTimeError when calling hour_angle
5 participants