Support np.memmap #150

Open
kwsp opened this issue Sep 5, 2024 · 2 comments

@kwsp

kwsp commented Sep 5, 2024

I'm migrating some large datasets to NRRD (and to pynrrd), and one thing I'd like is for nrrd.read_data to support np.memmap. All my data is saved raw, without compression. When a file holds gigabytes of data, it's much faster to open it with np.memmap and read only the segment I need than to load everything into memory first.
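
To illustrate the access pattern described above, here is a minimal sketch using plain NumPy. The file name, array shape, and dtype are made up for the example, and the file is a headerless raw dump rather than a real NRRD file.

```python
import os
import tempfile

import numpy as np

# Write a raw, uncompressed array to disk (a stand-in for a large data file).
data = np.arange(1_000_000, dtype=np.int32).reshape(100, 100, 100)
path = os.path.join(tempfile.mkdtemp(), "volume.raw")
data.tofile(path)

# Map the file instead of loading it; data is only read when it is indexed.
volume = np.memmap(path, dtype=np.int32, mode="r", shape=(100, 100, 100))

# Copy just the slab we need into ordinary memory.
slab = np.array(volume[40:42])
print(slab.shape)
```

The mapped array behaves like a regular ndarray for slicing, so only the requested slab ends up as an in-memory copy.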

@addisonElliott
Collaborator

Definitely sounds like a useful feature. Some questions/comments.

Sounds like it should only apply when encoding is raw. Should it apply to ASCII/text encodings? Not sure that makes sense.

Should the memmap be read-only? It's definitely odd to call nrrd.read() and get back a mutable object. I'm particularly concerned about what happens if the size changes (is that even possible with memmap?).

I'm leaning towards a separate function like read_memmap instead of adapting read.

I'm also unclear about Fortran vs C-style ordering, but hopefully the order parameter in np.memmap would suffice.
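
As a rough check of the ordering question, the sketch below maps the same bytes twice using the order parameter of np.memmap. It assumes the axis sizes are listed fastest-varying axis first, as in an NRRD "sizes" field; the sizes and file name are invented for the example.

```python
import os
import tempfile

import numpy as np

sizes = (4, 3, 2)  # fastest-varying axis first, as an NRRD "sizes" field lists them
path = os.path.join(tempfile.mkdtemp(), "order.raw")
np.arange(24, dtype=np.uint8).tofile(path)  # bytes 0..23 in file order

# Fortran order: shape equals sizes, indexed as [x, y, z].
f_view = np.memmap(path, dtype=np.uint8, mode="r", shape=sizes, order="F")
# C order: shape is sizes reversed, indexed as [z, y, x].
c_view = np.memmap(path, dtype=np.uint8, mode="r", shape=sizes[::-1], order="C")

# Both views cover the same bytes; one is the transpose of the other.
same = np.array_equal(f_view, c_view.T)
print(same)
```

Under that assumption, order="F" with the sizes as given and order="C" with the sizes reversed describe the same layout, so the order parameter looks sufficient for both index conventions.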

@kwsp
Author

kwsp commented Sep 6, 2024

@addisonElliott I agree:

  • It should only be used on raw. I don't think it's suitable for compressed files, since you need to read everything into memory to decompress before you can access the data, right? Personally I don't use any compression because the compression ratio on my data is negligible and it slows down IO by an order of magnitude.
  • It should be read-only for the file, but we can actually use the "copy on write" mode (np.memmap(..., mode="c")). This way values can be modified in memory but are not written to the file. I think this more closely matches the behavior of nrrd.read, which just reads into memory. Or just use read-only np.memmap(..., mode="r") to reduce the cognitive complexity.
  • It probably warrants a separate function like read_memmap.
  • Alternatively, we could add an optional parameter, nrrd.read(memmap=True), that's only used when the encoding is "raw".
  • I'm pretty sure the order parameter in np.memmap would suffice.
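
The points above could be sketched roughly as follows. This is not pynrrd's API: read_memmap, its mode argument, the hand-rolled header parsing, and the small type table are all hypothetical, and the sketch only handles attached-header files with raw encoding. It also assumes little-endian data when no endian field is present.

```python
import os
import tempfile

import numpy as np

# Hypothetical subset of NRRD "type" values; a real reader would cover all of them.
_DTYPES = {"uint8": "u1", "int16": "i2", "uint16": "u2", "float": "f4", "double": "f8"}


def read_memmap(path, mode="r"):
    """Hypothetical helper: map the data of a raw-encoded NRRD file."""
    if mode not in ("r", "c"):
        raise ValueError("mode must be 'r' (read-only) or 'c' (copy-on-write)")
    header = {}
    with open(path, "rb") as fh:
        fh.readline()  # magic line, e.g. b"NRRD0004"
        for raw in iter(fh.readline, b""):
            line = raw.decode("ascii").strip()
            if not line:  # a blank line ends the header
                break
            if line.startswith("#"):  # comment line
                continue
            key, _, value = line.partition(":")
            header[key.strip()] = value.strip()
        offset = fh.tell()  # the data starts right after the blank line
    if header.get("encoding") != "raw":
        raise ValueError("memory mapping only makes sense for raw encoding")
    # Assumption: treat a missing endian field as little-endian.
    byteorder = ">" if header.get("endian") == "big" else "<"
    dtype = np.dtype(byteorder + _DTYPES[header["type"]])
    sizes = tuple(int(s) for s in header["sizes"].split())
    data = np.memmap(path, dtype=dtype, mode=mode, offset=offset,
                     shape=sizes, order="F")
    return data, header


# Build a tiny raw-encoded NRRD file to try it on.
path = os.path.join(tempfile.mkdtemp(), "demo.nrrd")
voxels = np.arange(24, dtype="<i2")
with open(path, "wb") as fh:
    fh.write(b"NRRD0004\ntype: int16\ndimension: 3\nsizes: 4 3 2\n"
             b"endian: little\nencoding: raw\n\n")
    fh.write(voxels.tobytes())

view, header = read_memmap(path, mode="c")
view[0, 0, 0] = -1  # copy-on-write: the change stays in memory
on_disk, _ = read_memmap(path, mode="r")  # fresh read-only mapping of the file
print(view[0, 0, 0], on_disk[0, 0, 0])
```

With mode="c" the returned array is writable but the file is left untouched, which is the behavior closest to nrrd.read; mode="r" returns an array that rejects writes altogether.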
