Very slow performance indexing into netcdf with a vector #135

Closed
rafaqz opened this issue Jan 21, 2024 · 1 comment

Comments

@rafaqz
Collaborator

rafaqz commented Jan 21, 2024

Indexing with a vector can be incredibly slow with the current implementation of batchgetindex.

From this comment by @Alexander-Barth: #131 (comment)

NetCDF.jl + DiskArrays.jl

julia> using NetCDF

julia> v = NetCDF.open("http://tds.hycom.org/thredds/dodsC/GLBy0.08/expt_93.0","water_temp");

julia> size(v)
(4500, 4251, 40, 14969)

julia> v1 = @time v[1000,1000,1,1:10];
  1.478774 seconds (3.71 M allocations: 250.669 MiB, 3.42% gc time, 59.32% compilation time)

julia> v1 = @time v[1000,1000,1,1:10];
  0.467219 seconds (45 allocations: 1.781 KiB)

julia> v1 = @time v[1000,1000,1,collect(1:10)];
739.847348 seconds (1.32 M allocations: 819.944 MiB, 0.03% gc time, 0.10% compilation time)

From this comment by @Alexander-Barth: https://github.com/meggart/DiskArrays.jl/issues/131#issuecomment-1902759762

NCDatasets 0.12 without DiskArrays.jl

julia> using NCDatasets

julia> ds = NCDataset("http://tds.hycom.org/thredds/dodsC/GLBy0.08/expt_93.0"); v = ds["water_temp"];

julia> v1 = @time v[1000,1000,1,1:10];
  0.726571 seconds (134.15 k allocations: 8.849 MiB, 14.90% compilation time)

julia> v1 = @time v[1000,1000,1,1:10];
  0.218793 seconds (62 allocations: 1.984 KiB)

julia> v1 = @time v[1000,1000,1,collect(1:10)];
  0.879592 seconds (982.55 k allocations: 66.829 MiB, 9.49% gc time, 65.79% compilation time)

So 12 minutes versus about 1 second for this example. Yes, I know that I can bypass DiskArrays by providing my own batchgetindex (roughly as sketched below). But I don't think that DiskArrays should make the current assumptions if it aims to be generic (if this is the goal).
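For context, such a bypass might look roughly like the sketch below. This is a minimal illustration, not working code: DAPVariable and load_selection are hypothetical names, and the exact batchgetindex method signature should be checked against the DiskArrays version in use.

import DiskArrays

# Hypothetical remote variable backed by an OPeNDAP-style server.
struct DAPVariable{T,N} <: DiskArrays.AbstractDiskArray{T,N}
    url::String
    varname::String
    dims::NTuple{N,Int}
end

Base.size(v::DAPVariable) = v.dims

# Stand-in for a real server request; a real implementation would
# encode the whole index selection in a single query.
load_selection(url, varname, i) = fill(0.0f0, length(i))

# Bypass the generic chunk-by-chunk path for vector indices on a
# 1-d variable: one server round trip instead of one read per chunk.
function DiskArrays.batchgetindex(v::DAPVariable{T,1},
                                  i::AbstractVector{<:Integer}) where T
    load_selection(v.url, v.varname, i)
end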

Can we have the current assumptions (i.e. that reading a whole chunk is about as fast as reading a subset of a chunk) documented?

Concerning the API, another thing I am wondering is whether we need a function named batchgetindex at all. Why not have getindex simply pass the indices to DiskArrays.readblock! and let DiskArrays.readblock! figure out how best to load the data? DiskArrays.readblock! is already specific to the storage format.
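To make that concrete, here is a rough sketch of what a generic fallback could look like if readblock! (which currently only receives AbstractUnitRange indices) were also allowed to handle integer vectors. The name readblock_vec! and the whole fallback are hypothetical, shown for a 1-d array; a backend like a DAP client could override it with a single server-side selection instead.

import DiskArrays

# Hypothetical fallback for vector indices on a 1-d disk array:
# read the covering unit range once, then subset in memory.
function readblock_vec!(a, aout, i::AbstractVector{<:Integer})
    r = minimum(i):maximum(i)              # smallest covering range
    tmp = similar(aout, length(r))
    DiskArrays.readblock!(a, tmp, r)       # one contiguous read
    aout .= view(tmp, i .- first(r) .+ 1)  # keep requested elements
    return aout
end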

@rafaqz
Collaborator Author

rafaqz commented Feb 19, 2024

This is fixed in #146. The slowdown was mostly wait time from the server blocking repeated calls.

rafaqz closed this as completed on Feb 19, 2024