Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fraction computation with large times produces strange timestamps which are not printed sanely #59

Closed
addisoncrump opened this issue Jul 30, 2024 · 2 comments · Fixed by #76
Labels
bug Something isn't working

Comments

@addisoncrump
Copy link

addisoncrump commented Jul 30, 2024

The following testcases trigger the observed behaviour:

pT026733320132.65M
PT834662292.2H
PT22444444452,6m
Pt843517081,1H

Looking a bit further, it seems that the routine for parsing fractional time is a bit strange in the presence of large values. Specifically (for the case of minutes), this code region:

if minutes > t::SpanMinutes::MAX_SELF {
    nanos +=
        (minutes - t::SpanMinutes::MAX_SELF) * t::NANOS_PER_MINUTE;
    minutes = t::NoUnits128::rfrom(t::SpanMinutes::MAX_SELF);
}

This code is explained by the following text:

The above is why we have `if unit_value > MAX { <do adjustments> }` in
the balancing code below. Basically, if we overshoot our limit, we back
out anything over the limit and carry it over to the lesser units. If
our value is truly too big, then the final call to set nanoseconds will
fail.

It is unclear to me why the units would be backfilled to the smallest unit. Shouldn't this be implemented in the other direction, preferring larger units for larger values?

Regardless, the result is that the nanos value is set to something large. For the first testcase, the resulting computed timestamp is printed as PT10518456960m972891790359s. This value is correct, but not parseable because the seconds unit is out of range:

>>> int(26733320132.65*60)
1603999207959
>>> 10518456960*60+972891790359
1603999207959

Note that in memory, this value is actually:

Span { minutes: 10518456960, seconds: 631107417600, milliseconds: 341784372759000, .. }

So it would appear that this issue is in part induced by the printer implementation.

This was caught by #57 because the printer emitted invalid timestamps.

@addisoncrump
Copy link
Author

To be clear, I think part of the problem here is that the printed duration now reads quite strangely. 10518456960 minutes AND 972891790359s? This is quite unintuitive, since I would never expect to need to compute the actual number of minutes based on the number of seconds as well. In other words, I think it is confusing that the seconds value ever is greater than or equal to 60 when minutes are presented, and similarly for other units.

@addisoncrump addisoncrump changed the title Fraction computation with large times produces strange timestamps which are not printed correctly Fraction computation with large times produces strange timestamps which are not printed sanely Jul 30, 2024
@BurntSushi
Copy link
Owner

Catching up here, here is a concrete program that fails that I believe should succeed:

use jiff::{Span, Unit};

fn main() -> anyhow::Result<()> {
    let span1: Span = "Pt843517081,1H".parse()?;
    let iso = span1.to_string();
    eprintln!("span1 serialized: {iso:?}");
    let span2: Span = iso.parse()?;
    assert_eq!(span1.total(Unit::Hour)?, span2.total(Unit::Hour)?);

    Ok(())
}

Output:

$ cargo -q r
span1 serialized: "PT175307616h10518456960m1774446656760s"
Error: failed to parse ISO 8601 duration string into `Span`: failed to set value 1774446656760 as second unit on span: parameter 'seconds' with value 1774446656760 is not in the required range of -631107417600..=631107417600

So this looks to me like there is a real bug here. Nice find! I'll investigate more later.

But yes, there are some shenanigans happening here:

  • Spans like 1 minute 120 seconds are unbalanced and weird, but totally valid. Serializing them should generally proceed as-is whenever possible.
  • There is a hiccup with ISO 8601 where it doesn't represent fractional seconds as individual units of milliseconds, microseconds and nanoseconds. This results in a loss of fidelity when serializing Span values, but is a necessary trade-off for ISO 8601 support. However, the actual elapsed time represented by the span should never change.
  • Each unit of a Span has minimum and maximum values that are less than the minimum and maximum Span value. So you can use the maximum amount of hours, but still add minutes on top of that to represent the span. And that is likely the source of the unbalanced spans you're seeing.

But I can't investigate more closely right now. I'll do so later.

@BurntSushi BurntSushi added the bug Something isn't working label Jul 30, 2024
BurntSushi added a commit that referenced this issue Aug 3, 2024
Because Jiff spans (like Temporal durations) represent each unit
individually, and because ISO 8601 require the use of fractional seconds
to represent sub-second units, there is an inherent loss of fidelity
when spans are serialized to the ISO 8601 duration format.

For example, a Jiff span like `1 second 2000 milliseconds` is distinct from
`3 seconds` even though they represent the same amount of elapsed time.
These types of spans are unbalanced.

When a span like `1 second 2000 milliseconds` is ISO 8601 serialized,
it's turned into `PT3s` because there simply is no way to represent
greater-than-second durations in smaller-than-second units in the ISO
8601 duration format. So when `PT3s` is deserialized, Jiff of course has
no idea that `1 second 2000 milliseconds` was actually intended, and
thus treats it as `3 seconds`.

On top of this, Jiff imposes fairly strict limits on the individual
units of a `Span`. Other than nanoseconds, every individual unit can
express the full range of time supported by Jiff (`-9999-01-01` through
`9999-12-31`) and nothing more. So if one serializes a span to the ISO
8601 format with large millisecond, microsecond or nanosecond
components, those have to be folded into the larger hour, minute or
second units. This in turn can create a ISO 8601 duration whose hour,
minute and second units exceed Jiff's unit limits. So in order to
preserve the ability to at least roundtrip all Jiff spans (even if the
individual unit values are lost), Jiff will automatically rebalance
these "larger" units into smaller units at parse time.

This is a big complicated mess and it turns out I got one part of this
wrong: I was only re-balancing units at parse time when we parsed a
fractional component. But in fact, we should be re-balancing spans even
when there isn't a fractional component. Namely, the milliseconds,
microseconds and nanosecond components can add up to a whole number of
seconds, resulting in a whole number of seconds in the corresponding ISO
8601 duration.

This bug was found by @addisoncrump's fuzzer. Nice work.

Fixes #59
BurntSushi added a commit that referenced this issue Aug 3, 2024
Because Jiff spans (like Temporal durations) represent each unit
individually, and because ISO 8601 require the use of fractional seconds
to represent sub-second units, there is an inherent loss of fidelity
when spans are serialized to the ISO 8601 duration format.

For example, a Jiff span like `1 second 2000 milliseconds` is distinct from
`3 seconds` even though they represent the same amount of elapsed time.
These types of spans are unbalanced.

When a span like `1 second 2000 milliseconds` is ISO 8601 serialized,
it's turned into `PT3s` because there simply is no way to represent
greater-than-second durations in smaller-than-second units in the ISO
8601 duration format. So when `PT3s` is deserialized, Jiff of course has
no idea that `1 second 2000 milliseconds` was actually intended, and
thus treats it as `3 seconds`.

On top of this, Jiff imposes fairly strict limits on the individual
units of a `Span`. Other than nanoseconds, every individual unit can
express the full range of time supported by Jiff (`-9999-01-01` through
`9999-12-31`) and nothing more. So if one serializes a span to the ISO
8601 format with large millisecond, microsecond or nanosecond
components, those have to be folded into the larger hour, minute or
second units. This in turn can create a ISO 8601 duration whose hour,
minute and second units exceed Jiff's unit limits. So in order to
preserve the ability to at least roundtrip all Jiff spans (even if the
individual unit values are lost), Jiff will automatically rebalance
these "larger" units into smaller units at parse time.

This is a big complicated mess and it turns out I got one part of this
wrong: I was only re-balancing units at parse time when we parsed a
fractional component. But in fact, we should be re-balancing spans even
when there isn't a fractional component. Namely, the milliseconds,
microseconds and nanosecond components can add up to a whole number of
seconds, resulting in a whole number of seconds in the corresponding ISO
8601 duration.

This bug was found by @addisoncrump's fuzzer. Nice work.

Fixes #59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants