Large numbers in magnitudes cause compiler crashes or long compilation times #217
Very well spotted! This is a known issue, although not previously tracked in this project. We encountered it in the mp-units project after I contributed vector space magnitudes there (mpusz/mp-units#300). The problem is more practical than you might imagine! Once you start using more "extreme" units, such as those found at subatomic or cosmic scales, you find magnitudes whose smallest prime factor is surprisingly large, and you hit this compiler iteration limit.

In mp-units, I assumed this problem was due to naive trial division, so I learned about wheel factorization and implemented it there. Unfortunately, it seems that for wheel factorization, the space requirements grow very rapidly while the iteration count shrinks only slowly. Ultimately, while it might make finding some answers a little faster, it's not a solution to this problem. Our current plan is to write a paper proposing a new standard library function, perhaps something like …. I'm planning to finish this paper before the Feb. 15 mailing deadline, and have it discussed at the Tokyo meeting in March.

Pollard's rho algorithm sounds fascinating and exciting! I'm a little hesitant to code it up in this project just yet, because of its pseudorandom character. It sounds like it can fail, and I'm not confident I could write test cases that correctly exercise those failure cases and cover the selection of a new parameter for the retry. Perhaps we'll see how things develop with the paper. Perhaps we'll bite the bullet and try Pollard's rho in the meantime, ideally with plenty of tests and perhaps also fuzzing to gain enough confidence. Perhaps both!
What piqued my interest the most in your post was the idea of implementing vector space magnitudes in Rust. In C++, Au and mp-units use variadic templates for the implementation. But I had thought Rust didn't support variadic templates! (I've never used Rust.) So how can you implement vector space magnitudes?

The reason I'm familiar with this is because I checked out the iliekturtles/uom project when it briefly ascended to the top of the …. After thinking it through, I speculated that they were more or less forced into this design choice, because Rust's lack of variadic templates prevents them from being able to represent unit magnitudes in the type. (Yes, they could have used some analogue of ….)
That's very interesting to hear, thanks! I was briefly looking at parsec or mole to check if their conversion factors to SI happened to have large prime factors, but then gave up on the search. Out of curiosity, is there a specific one you could point to?
I was initially under the impression that it would always succeed if retried with (a number of) different starting values, but after some more googling I realize now that that assumption is wrong, so you're right. I think for my purposes, I can "simply" fall back to slow trial division (or throw a compiler error) if Pollard's rho doesn't find a factor but the primality test says that the number is composite. That is not exactly a great solution, but it could be an improvement, since the small primes can still be covered by trial division, so everything that worked before will still work.

Note that the normal Miller-Rabin test is also probabilistic, but apparently it's been shown to always give the correct result for 64-bit ints if a large enough set of witnesses is tested. I won't claim to have double-checked that proof, though.

I'd be curious to know if fuzzing can help with this problem. Very handwavy again, but I wouldn't be surprised if the unpredictable nature of integer factorization prevents fuzzing from slowly encroaching on a failing test case.
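For concreteness, here is a minimal Rust sketch of the deterministic Miller-Rabin variant mentioned above. The witness set {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37} is the one commonly reported as sufficient for all n < 2^64; matching the hedge above, I haven't re-verified that proof, and the function names here are illustrative rather than anyone's actual implementation.

```rust
// Deterministic Miller-Rabin primality test for u64.
// Assumes (without re-verifying) the published result that the fixed
// witness set below decides primality correctly for all n < 2^64.

fn mul_mod(a: u64, b: u64, m: u64) -> u64 {
    // Widen to u128 so the intermediate product cannot overflow.
    ((a as u128 * b as u128) % m as u128) as u64
}

fn pow_mod(mut base: u64, mut exp: u64, m: u64) -> u64 {
    let mut acc = 1u64;
    base %= m;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = mul_mod(acc, base, m);
        }
        base = mul_mod(base, base, m);
        exp >>= 1;
    }
    acc
}

fn is_prime(n: u64) -> bool {
    if n < 2 {
        return false;
    }
    const WITNESSES: [u64; 12] = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37];
    for p in WITNESSES {
        if n == p {
            return true;
        }
        if n % p == 0 {
            return false;
        }
    }
    // Write n - 1 = d * 2^s with d odd.
    let s = (n - 1).trailing_zeros();
    let d = (n - 1) >> s;
    'witness: for a in WITNESSES {
        let mut x = pow_mod(a, d, n);
        if x == 1 || x == n - 1 {
            continue;
        }
        for _ in 0..s - 1 {
            x = mul_mod(x, x, n);
            if x == n - 1 {
                continue 'witness;
            }
        }
        return false; // a proves n composite
    }
    true
}

fn main() {
    // 9007199254740881 is the large mantissa-sized prime from the issue text.
    println!("{}", is_prime(9_007_199_254_740_881)); // true
    println!("{}", is_prime(9_007_199_254_740_880)); // false
}
```

Unlike Pollard's rho, this half of the pipeline has no pseudorandom failure mode, which is why the fallback strategy above only needs to guard the factoring step, not the primality check.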
So, I should preface this by saying that I am just in the process of implementing this and haven't seen all the parts come together, but I think I have verified that all the individual components of my implementation should work.

My library uses const generics. The currently stable version of that feature only allows one to write code that is generic over ints/bools/chars. However, there are also unstable features that allow one to use generic const expressions and to have "arbitrary" structs as const generics, which is what my library uses. These still only allow a tiny subset of what C++ templates can do, but they definitely make things a little easier.

Now, real variadics don't exist, so my current plan is to express the vector space as a fixed-size array of at most 15 integer factors (plus special entries for pi/zero/...). 15, since that should make all 64-bit integers representable (the primorial numbers are the worst case here, and 32589158477190044730, the product of the first 16 primes, is already too large for 64 bits).
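The 15-slot bound can be sanity-checked with a few lines of Rust (the function name is mine, purely illustrative): multiply successive primes until the next one would overflow a `u64`.

```rust
// Checks the primorial argument above: how many distinct prime factors
// can a u64 have at most? The worst case is the product of the smallest
// primes, so multiply them until checked_mul reports overflow.
fn max_distinct_prime_factors_u64() -> (u32, u64) {
    let primes: [u64; 16] = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53];
    let mut product: u64 = 1;
    let mut count = 0;
    for p in primes {
        match product.checked_mul(p) {
            Some(next) => {
                product = next;
                count += 1;
            }
            // The 16th prime (53) would push the product past u64::MAX.
            None => break,
        }
    }
    (count, product)
}

fn main() {
    let (count, product) = max_distinct_prime_factors_u64();
    println!("{count} primes fit; their product is {product}");
    // 15 primes fit; their product is 614889782588491410
}
```

Multiplying that 15-prime product by 53 gives exactly the 32589158477190044730 quoted above, which indeed exceeds `u64::MAX` (18446744073709551615).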
I think your assumption is mostly correct. I don't know how exactly the decision was made in ….

However, I'm afraid mine and all other Rust dimensional-analysis libraries that I am aware of currently also represent quantities by converting them down to a base representation. For now, the reason I want the compile-time magnitudes is mainly to make constructing quantities easier (again, inspired by your talk). I want to be able to write ….

I don't think it is necessarily the lack of variadic templates that is keeping us from doing this properly. With const generics, we could choose to make our type generic over the exponent of every available unit (i.e. instead of …).
I'm curious what exactly you mean by this point. My impression was that the main advantage of proper unit safety was to make it possible to simultaneously represent quantities in units that are not easily convertible into one another (for example because their ratio is non-integral, or because it might overflow the underlying storage type), but now I am thinking I might have understood this incorrectly.
Here, have two! I tracked down the motivating units.
Thanks for the interesting explanation! I don't yet have any experience writing Rust, so I appreciate the insights. I wonder if you might want to (conservatively) double that size of 15? After all, you'll very commonly need to represent ratios, and the numerator and denominator won't have any prime factors in common. Of course, even this wouldn't be enough, because you could have fractional exponents, and magnitudes that could be represented in …
Yes --- definitely! The robust unit safety is IMO clearly worth the cost of forcing users to change their computations in order to get it. Of course, it'd be even better if you could get the benefit without the cost, but that seems pretty challenging for Rust at the moment.
Let me elaborate. What I had in mind here was the principle that I referred to in my talk as "same program, only safer". Imagine a user writing a program without a units library. Let's suppose that their preferred choice for some particular length variable is to use ….

Now let's say they want to use a units library so they can prevent unit errors. Some libraries have one unit for each dimension; I believe uom is like this. So in switching to this units library, they would have to convert their value to meters. (Obviously, this is majorly lossy in this case. It's not clear to me whether uom would raise a compiler error, or silently truncate.) Some other libraries have one storage type for all user-facing types; the nholthaus C++ library is like this. So in this case, the user can store their value natively in mils, but they will have to use ….

What "same program, only safer" means is that the quantity type should support both the same unit and the same storage type that the user would have had without any units library. In this case, something like Au's ….
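To make the principle concrete, here is a hypothetical Rust sketch. The names `Quantity` and `Mils` are invented for illustration and are not taken from Au, uom, or any other real library; the point is only that a type generic over both a unit tag and a storage type lets the user's original integer-of-mils survive unchanged.

```rust
use std::marker::PhantomData;

// Hypothetical "same program, only safer" quantity: generic over a
// zero-sized unit tag and over the user's own storage type.
struct Quantity<Unit, Repr> {
    value: Repr,
    _unit: PhantomData<Unit>,
}

// Unit tag: thousandths of an inch.
struct Mils;

impl<Unit, Repr: Copy> Quantity<Unit, Repr> {
    fn new(value: Repr) -> Self {
        Quantity { value, _unit: PhantomData }
    }

    // Read the raw value back out, still in the unit the user chose.
    fn value(&self) -> Repr {
        self.value
    }
}

fn main() {
    // The stored representation is still a u32 counting mils: no forced
    // conversion to meters, no forced switch to floating point.
    let clearance: Quantity<Mils, u32> = Quantity::new(125);
    assert_eq!(clearance.value(), 125u32);
    println!("{} mils", clearance.value());
}
```

The unit tag occupies no space (`PhantomData` is zero-sized), so the safer program has the same memory layout as the original bare `u32`.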
Thanks for the nice examples! I especially like that …! I do have the definition of the proton mass in the code for which I wrote my library, so it is quite likely that I would have run into this issue.
That's true, in order to do ratios I will probably have to go to 30 basis vectors.
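A minimal sketch of what those basis vectors might look like in Rust (illustrative only; the type and names here are my assumptions, not the actual library's API): a magnitude stored as (prime, signed exponent) pairs, where negative exponents encode the denominator, which is why ratios need roughly twice the slots of a single integer.

```rust
// Illustrative magnitude representation: up to 30 (base, exponent) pairs,
// with negative exponents for denominator factors. Unused slots are (0, 0).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Magnitude {
    factors: [(u64, i32); 30],
}

impl Magnitude {
    fn from_ratio(mut num: u64, mut den: u64) -> Magnitude {
        let mut factors = [(0u64, 0i32); 30];
        let mut slot = 0;
        let mut p = 2u64;
        // Naive trial division, just to illustrate the representation
        // (a real implementation would use the faster algorithms
        // discussed in this thread).
        while num > 1 || den > 1 {
            let mut exp = 0i32;
            while num % p == 0 {
                num /= p;
                exp += 1;
            }
            while den % p == 0 {
                den /= p;
                exp -= 1;
            }
            if exp != 0 {
                factors[slot] = (p, exp);
                slot += 1;
            }
            p += 1;
        }
        Magnitude { factors }
    }
}

fn main() {
    // 9/8 = 3^2 * 2^-3
    let m = Magnitude::from_ratio(9, 8);
    assert_eq!(m.factors[0], (2, -3));
    assert_eq!(m.factors[1], (3, 2));
    println!("{:?}", &m.factors[..2]);
}
```

Fractional exponents (e.g. from root units) would need a rational exponent type in place of `i32`, which this sketch deliberately leaves out.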
Thank you, I understand your point now. I believe implementing the "same program, only safer" approach in Rust could be possible, but only at the cost of a lot of convenience (and error-message quality), which could make it a bit harder to convince people to move to a units library. This is definitely something to think about, though.
This is more of a curiosity than an actual issue, so feel free to close this if it doesn't seem relevant.
I was inspired by the nice talk you (@chiphogg) gave at CppCon to implement vector space magnitudes in my Rust compile-time units library (not because the library is mature enough to really need them, but because Rust's const exprs like integers a lot better than floats).
While doing so I stumbled over the problem that factorizing the integers in the magnitude in the standard/naive way via trial division runs into performance problems (especially if the user really wants to run into performance problems). In my library, the magnitudes are currently read from the exponent/mantissa of a float, but the largest primes that fit into the 52+1 bit mantissa of a double (9007199254740881, for example) can already take a really long time.

While implementing some more sophisticated algorithms, I checked what `au` does to solve this issue and realized that it also uses the standard factorization algorithm, so I tried to confirm my suspicion and compiled the following code: …

If compiled with `clang`, this gives …, and with `gcc`, I get …. It eventually compiles with `g++ -fconstexpr-loop-limit=2147483647 -fconstexpr-ops-limit=2147483647`, but takes about 100 s on my machine. Putting in a ~64-bit prime should take about an hour.
This is probably not much of a real-world concern since I maliciously wrote in large prime numbers, but if somebody, for example, were to autogenerate their unit definitions from a bunch of sensor data, they might get unlucky and accidentally stumble into this. Probably not.
There are much faster algorithms that handle 64-bit integers just fine. I am by no means knowledgeable in number theory, but if you're at all interested in fixing this: I implemented the Miller-Rabin test to check for primality first, and then Pollard's rho algorithm to factor the composite numbers, and with some tweaking that seems to work fine.
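For the record, here is a rough Rust sketch of the Pollard's rho step (not my actual implementation; names and structure are illustrative). A single attempt can fail for an unlucky polynomial constant, which is the pseudorandom behavior discussed elsewhere in this thread, so the caller retries with a new constant.

```rust
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let t = a % b;
        a = b;
        b = t;
    }
    a
}

fn mul_mod(a: u64, b: u64, m: u64) -> u64 {
    // Widen to u128 so the product cannot overflow.
    ((a as u128 * b as u128) % m as u128) as u64
}

// One attempt of Pollard's rho with the polynomial x^2 + c (mod n),
// using Floyd cycle detection. Returns Some(d) with 1 < d < n on
// success; None means this c failed and the caller should retry.
// Precondition: n is composite (check with a primality test first).
fn pollard_rho(n: u64, c: u64) -> Option<u64> {
    if n % 2 == 0 {
        return Some(2);
    }
    let f = |x: u64| (mul_mod(x, x, n) + c) % n;
    let (mut x, mut y, mut d) = (2u64, 2u64, 1u64);
    while d == 1 {
        x = f(x);
        y = f(f(y));
        d = gcd(x.abs_diff(y), n);
    }
    if d != n { Some(d) } else { None }
}

fn main() {
    // A composite of two moderately large primes, small enough that the
    // product stays well inside u64.
    let n: u64 = 1_000_003 * 1_000_033;
    let mut c = 1;
    let d = loop {
        if let Some(d) = pollard_rho(n, c) {
            break d;
        }
        c += 1; // retry with a different polynomial constant
    };
    assert!(d > 1 && d < n && n % d == 0);
    println!("{n} = {d} * {}", n / d);
}
```

Running this against a prime `n` would loop forever on retries, which is exactly why the primality test has to come first in the pipeline described above.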