
Optimize normalizeLongitude method #373

Open
zongweil opened this issue Jan 7, 2020 · 16 comments · Fixed by #460

Comments

@zongweil
Contributor

zongweil commented Jan 7, 2020

Currently, our normalizeLongitude() methods use a while loop to bring the longitude into an acceptable range. However, this can take a very long time if the integrating application passes in an extremely large longitude value. We can resolve this, while maintaining the same behavior, by computing the normalized longitude in a single step. Something like:

if longitude < -180
    divisor = (int) longitude / -360
    longitude = longitude + ((divisor + 1) * 360)
else if longitude > 180
    divisor = (int) longitude / 360
    longitude = longitude - ((divisor + 1) * 360)
if longitude == 180
    longitude = -180
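
For illustration, a direct Go transcription could look like the sketch below (just a sketch, not an implementation from any of the repos; it uses math.Trunc instead of an int cast so that inputs too large for an integer type stay in float64):

// normalizeLongitude folds an arbitrary longitude into [-180, 180)
// in a single step, so the cost is independent of input magnitude
func normalizeLongitude(longitude float64) float64 {
  if longitude < -180 {
    divisor := math.Trunc(longitude / -360)
    longitude += (divisor + 1) * 360
  } else if longitude > 180 {
    divisor := math.Trunc(longitude / 360)
    longitude -= (divisor + 1) * 360
  }
  if longitude == 180 {
    longitude = -180
  }
  return longitude
}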

This is the master issue for tracking the work to be done across the various implementations. See #370 for more context.

@kpym

kpym commented Jan 14, 2020

You don't need different cases; you can usually use something like mod(longitude + 180, 360) - 180 (a Go sketch follows below). This code, like yours, has two drawbacks:

  • the code will be quite different between languages, since mod semantics for negative operands vary,
  • this code is slower when the longitude is already close to normalized, which is, I think, the most common case.
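
In Go, for example, math.Mod keeps the sign of its first argument, so a sketch of that one-liner needs a small fix-up for negative inputs:

// fold longitude into [-180, 180) using floating-point mod;
// math.Mod returns a result with the dividend's sign, so
// negative results are shifted back into range first
func normalizeLongitude(longitude float64) float64 {
  m := math.Mod(longitude+180, 360)
  if m < 0 {
    m += 360
  }
  return m - 180
}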

@bocops
Contributor

bocops commented Apr 21, 2021

If this is still an open issue, I suggest creating a project for it so that we can track this across all language versions.

@fulldecent
Contributor

fulldecent commented Apr 25, 2021

Recommended tag: implementation // specification

@fulldecent
Contributor

I am considering adding to the OLC specification that longitude normalization MUST run in O(1) time.

I consider this a security issue.

@fulldecent
Contributor

Related #370

@fulldecent
Contributor

Added test cases at #444 for vulnerable behavior.

Related: #241

@bocops
Contributor

bocops commented Nov 15, 2021

There are still several language versions that normalize via while loop - for example c, cpp, python - so I'd suggest keeping this open and properly listing all the language versions that still need an update, as already requested above.

@kpym

kpym commented Nov 15, 2021

IMO, normalizing with loops is optimal, especially for longitudes close to the normal form. In any case, making this kind of change without benchmarking (against day-to-day cases) is not a good idea, I think.

@fulldecent
Contributor

Benchmarking is not necessary.

Using an input of 100e100 creates a DoS security vulnerability. This approach fixes it.

Using an extra MOD function increases computation cost by one arithmetic operation. We cannot reasonably expect any platform we are targeting to be harmed by a single additional instruction.

@kpym

kpym commented Nov 16, 2021

@fulldecent I see, I wasn't thinking about malicious use but about performance in "standard" cases. Maybe we should keep the best performance in reasonable cases and prevent malicious use by switching to a bounded-time algorithm only when the value is too large?

@bocops
Contributor

bocops commented Nov 16, 2021

Quick back-of-the-envelope calculation:

The old algorithm has one comparison and one addition/subtraction per loop iteration, plus another comparison for the loop it doesn't need to take. That's ~2x+1 operations, with x being the number of "full circles" the non-normalized longitude is off by.

The new algorithm has two comparisons, plus, if necessary, two additions/subtractions and two modulo operations. That's a fixed cost of 6 operations in the worst case.

Looping is strictly worse if x > 2, but adding x = 1, 2 as special cases to the new algorithm would mean we'd still loop once or twice (2*2+1 = 5) after performing four comparisons - two to drop out early, two to loop instead of calculating directly. That means we'd "optimize" a 6-op algorithm by making it a 7- or 9-op algorithm.

@kpym

kpym commented Nov 16, 2021

I benchmarked two versions of normalize in Go:

// the 'loop' version
func normalize(value, max float64) float64 {
  for value < -max {
    value += 2 * max
  }
  for value >= max {
    value -= 2 * max
  }
  return value
}

// the 'mod' version
func normalize2(value, max float64) float64 {
  if value >= -max && value < max {
    return value
  }
  // math.Mod keeps the sign of its first argument, so shift
  // negative results back into [0, 2*max) before recentering
  value = math.Mod(value+max, 2*max)
  if value < 0 {
    value += 2 * max
  }
  return value - max
}

The normalize version is around 4 times faster (3ns vs 14ns) up to value=2000 and continues to be faster up to value=20000. So if we do not want to lose 400% of speed for "normal" values, but still want to bound the normalization time, we can combine both methods to obtain:

// threshold for the normalization method
const toobig float64 = 2e4

func normalize(value, max float64) float64 {
  // value's period
  period := 2 * max
  // if the value is too big, use modulo (constant time);
  // math.Mod keeps the dividend's sign, so fix up negatives
  if value < -toobig || value > toobig {
    value = math.Mod(value+max, period)
    if value < 0 {
      value += period
    }
    return value - max
  }
  // else use only additions to normalize
  for value < -max {
    value += period
  }
  for value >= max {
    value -= period
  }
  return value
}
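
For reference, a harness along these lines (hypothetical, not the exact one used for the numbers above) reproduces the comparison with Go's built-in benchmarking, run via go test -bench=. :

// normalize_test.go - compares the 'loop' and 'mod' versions above
package olc

import "testing"

var sink float64 // prevents the compiler from eliding the calls

func BenchmarkNormalizeLoop(b *testing.B) {
  for i := 0; i < b.N; i++ {
    sink = normalize(2000, 180)
  }
}

func BenchmarkNormalizeMod(b *testing.B) {
  for i := 0; i < b.N; i++ {
    sink = normalize2(2000, 180)
  }
}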

@fulldecent
Contributor

Holding your phone closer to your face will make the calculation appear faster because the photons have less distance to travel to your eyes.

That is roughly the magnitude of the speed difference between these two algorithms in normal operating cases.

In extreme cases (the attack scenario) one algorithm will execute "immediately" and the other will take longer on modern hardware than the lifespan of the Sun.
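
In fact, for a float64 input of that size the looping version never terminates at all: the gap between adjacent representable float64 values near 100e100 is far larger than 360, so the subtraction is a no-op. A quick standalone check:

package main

import "fmt"

func main() {
  value := 100e100
  // at this magnitude the spacing between representable
  // float64 values (~1e86) dwarfs 360, so subtracting it
  // changes nothing and a while-loop would spin forever
  fmt.Println(value-360 == value) // prints true
}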

@bocops
Contributor

bocops commented Nov 16, 2021

I'm not seeing the same when normalizing an array of 1M random entries using the original "non-optimized, looping" algorithm vs. the "non-looping" algorithm in Kotlin/JVM.

If entry values are at least somewhat reasonable (tested with +/- 500, 1000, 10000), looping over the array with either algorithm finishes in about the same time, with negligible difference. If entry values are decidedly not reasonable (tested with +/- 5000000), the looping algorithm takes 50x as long, which is exactly the DoS-style behaviour that prompted this whole issue.

At this point, I wonder what exactly we're optimizing for. If it is individual Plus Code encodings, even a 400% increase might not be noticed by anyone when we're talking about microseconds, so avoiding DoS and keeping the code readable should be the priority. If it is encoding millions of entries we're concerned with, it might be better to sanitize those entries before encoding them.

For what it's worth, consider that applications such as Google Maps simply refuse to deal with non-normalized coordinates as input, so inputs like that are not really a concern in practice.

@fulldecent
Contributor

This is not an optimization. This is a fix for a specific problem.

Anybody can optimize the time it takes to get in their house by removing the door.


Converting 1M Plus Codes is not a normal use case. That is a batch operation, run non-interactively on a server.

If it were run interactively, then again the server would have more resources and parallel task execution, and again the difference would be on the order of the time a photon takes to travel between a phone and an eyeball.

@sonyaa
Contributor

sonyaa commented Nov 18, 2021

Reopening to track fixes for other languages.

@sonyaa sonyaa reopened this Nov 18, 2021