
Move to numpy 2.0 #42

Open
CSchoel opened this issue Aug 6, 2024 · 8 comments · Fixed by #56
@CSchoel
Owner

CSchoel commented Aug 6, 2024

This is just a note to myself: I've seen some tests for the Lorenz system breaking under numpy >= 2.0. I'll need to investigate what has changed there and make sure we can support newer versions of numpy.

@CSchoel CSchoel self-assigned this Aug 6, 2024
@CSchoel CSchoel added this to the Release nolds 1.0 milestone Aug 10, 2024
@CSchoel
Owner Author

CSchoel commented Aug 10, 2024

We need to address this at some point. For now, we should make sure that we have regression tests that run all algorithms with settings that make them deterministic and check for exact result values. This should hopefully make us bulletproof against accidentally changing the output by updating a dependency.

@bramiozo

What errors do you get with numpy>=2?

I was surprised to see my numpy being downgraded during a poetry install :D.

@toni-neurosc

Hi, we're using Nolds in our project (PyNeuromodulation), and I was wondering what the issue with moving to NumPy 2 is. Nolds is the only package downgrading us to NumPy 1.26 right now. It's not a big deal, but I was wondering if there's anything I can do to help with the migration here, or which tests are failing at the moment.

@toni-neurosc

I have done a bit of digging into the issue and I have pinpointed the problem to the function datasets.lorenz_euler. It is essentially a casting problem between the dtypes float32 and float64, more specifically in how the intermediate results of the following calculation are typed:

    return np.array([
      sigma * (y - x), 
      rho * x - y - x * z,
      x * y - beta * z
    ], dtype="float32")

This is what I get in NumPy 1.26:

Input types:
 x=<class 'numpy.float32'>
 y=<class 'numpy.float32'>
 z=<class 'numpy.float32'>
 sigma=<class 'int'>
 rho=<class 'int'>
 beta=<class 'float'>
Values:
 x=1.0
 y=1.0
 z=1.0
 sigma=10
 rho=28
 beta=2.6666666666666665
Intermediate types:
 sigma * (y - x): <class 'numpy.float64'>
 rho * x - y - x * z: <class 'numpy.float64'>
 x * y - beta * z: <class 'numpy.float64'>
Result type: float32

But in NumPy 2.0:

Input types:
 x=<class 'numpy.float32'>
 y=<class 'numpy.float32'>
 z=<class 'numpy.float32'>
 sigma=<class 'int'>
 rho=<class 'int'>
 beta=<class 'float'>
Intermediate types:
 sigma * (y - x): <class 'numpy.float32'>
 rho * x - y - x * z: <class 'numpy.float32'>
 x * y - beta * z: <class 'numpy.float32'>
Result type: float32

It seems that NumPy 1.26 was casting either the inputs or the results of the intermediate calculations to np.float64, while in NumPy 2.0 the input precision is maintained through to the output. This change is documented in the NumPy 2.0 migration guide: https://numpy.org/devdocs/numpy_2_0_migration_guide.html#changes-to-numpy-data-type-promotion
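The promotion change can be seen in a minimal sketch (this snippet is my own illustration, not code from nolds): multiplying a NumPy float32 scalar by a plain Python float gives float64 under the old value-based rules, but keeps float32 under NEP 50.

```python
import numpy as np

x = np.float32(1.5)   # a NumPy float32 scalar, like x, y, z above
beta = 8.0 / 3.0      # a plain Python float, like the beta parameter

result = x * beta
# NumPy 1.x: the Python float promotes the result to float64.
# NumPy 2.0+ (NEP 50): Python scalars are "weak", so float32 is kept.
print(np.__version__, result.dtype)
```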

The floating-point error accumulates over the iterations, producing a different result in each version of NumPy.

I'm not sure whether this is a bug in the lorenz function or in the test's expected output. I would probably just calculate everything in float64 to maintain as much precision as possible, but then the test fails to match the expected result.
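To see how the accumulated rounding error plays out, here is a hypothetical stand-in for the Euler integration (lorenz_deriv and euler_trajectory are my own names, not the actual nolds internals): integrating the same Lorenz system in float32 and float64 produces trajectories that visibly drift apart, because the chaotic dynamics amplify the tiny per-step rounding differences.

```python
import numpy as np

def lorenz_deriv(state, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    # Derivative of the Lorenz system; keeps the dtype of its input.
    x, y, z = state
    return np.array([sigma * (y - x),
                     rho * x - y - x * z,
                     x * y - beta * z], dtype=state.dtype)

def euler_trajectory(dtype, n_steps=2000, dt=0.01):
    # Plain explicit-Euler integration starting from (1, 1, 1).
    state = np.array([1.0, 1.0, 1.0], dtype=dtype)
    for _ in range(n_steps):
        state = (state + dt * lorenz_deriv(state)).astype(dtype)
    return state

final32 = euler_trajectory(np.float32)
final64 = euler_trajectory(np.float64)
# Identical start, same equations: only the working precision differs,
# yet the final states no longer agree.
print(final32, final64)
```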

@CSchoel
Owner Author

CSchoel commented Sep 30, 2024

Nice find! Thank you very much for putting in the work to dig through the code and the numpy changelog. 🙏 👍

I figured it would be something minor like that, since the test results are not far off from the expected value. To safeguard against this, I want to create regression tests in #50. I'll prioritize this issue to double-check that there aren't any other changes introduced with numpy 2.0.

I didn't plan to release another version between 0.6.0 and 1.0.0 (see https://github.com/CSchoel/nolds/milestone/1), but if the downgrade to numpy < 2.0 causes issues, I can try to fit in a 0.6.1 that makes nolds fully compatible with numpy 2.0.

@CSchoel
Owner Author

CSchoel commented Sep 30, 2024

Good news: After implementing the regression tests and checking them against separate versions of numpy and scikit-learn, I can confirm that none of the algorithms behave differently. It's only the code for the Lorenz system itself that seems to be affected, which makes sense, since a chaotic system is by definition sensitive to small changes in its parameters. 😄

I think I should be able to publish version 0.6.1 with relaxed version restrictions without issues.

@toni-neurosc

Hi @CSchoel, glad you figured out that the functionality isn't broken between NumPy versions. Our program doesn't actually break by staying on 1.26, but it makes certain optimizations harder, such as calling internal NumPy functions directly to skip validation checks (our data is already validated) and save time during real-time data processing, because those internal implementations have changed slightly between versions. Thank you for taking the time to fix this, and have a nice day!

@CSchoel CSchoel mentioned this issue Oct 1, 2024
@CSchoel
Owner Author

CSchoel commented Oct 1, 2024

I just released version 0.6.1 in #59. Please let me know if it works. 😄


3 participants