Earthmover have released Icechunk, the version-control-of-ARCO aspect of Arraylake. It does not include Arraylake's cataloguing features (those remain something you can only get by paying for Arraylake).
Is there a use for Icechunk in this project?
✔ Setup is simple
✔ All changes to datasets are tracked
✔ N new versions of the dataset don't take up O(N x dataset_size) storage
✔ Could lend itself to provenance tracking
✖ Might be overkill for datasets that are never updated
✖ Another code library for users to get their heads around. Examples would be essential to mitigate this, but it takes us a little away from the "use the data as if it were on disk" idea, which is easier with some boilerplate "load the data and get out of the way" fsspec code
✖ Integrating Icechunk with EIDC could be trickier, given that Icechunk does modify the storage format of the files
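To make the storage point above a bit more concrete, here is a toy sketch of why chunk-level versioning avoids O(N x dataset_size) growth: each version is just a small manifest pointing at chunks, and unchanged chunks are shared between versions. This is not Icechunk's API or on-disk layout. `ToyVersionedStore` and its methods are hypothetical names, and keying chunks by content hash is a simplification chosen for this illustration.

```python
import hashlib


class ToyVersionedStore:
    """Hypothetical toy store (not Icechunk): each unique chunk is kept once."""

    def __init__(self):
        self.chunks = {}     # content hash -> chunk bytes, shared by all versions
        self.snapshots = []  # one manifest per version: chunk index -> content hash

    def commit(self, chunk_list):
        """Record a new version and return its version number."""
        manifest = {}
        for index, data in enumerate(chunk_list):
            key = hashlib.sha256(data).hexdigest()
            # Only chunks with new content are actually stored.
            self.chunks.setdefault(key, data)
            manifest[index] = key
        self.snapshots.append(manifest)
        return len(self.snapshots) - 1

    def read(self, version):
        """Reassemble the chunk list for a given version."""
        manifest = self.snapshots[version]
        return [self.chunks[manifest[i]] for i in range(len(manifest))]

    def stored_bytes(self):
        """Total bytes of chunk data held across all versions."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = ToyVersionedStore()

# Version 0: a "dataset" of 10 chunks, 1000 bytes each.
dataset = [bytes([i]) * 1000 for i in range(10)]
v0 = store.commit(dataset)

# Version 1: the same dataset with a single chunk changed.
updated = list(dataset)
updated[3] = b"\xff" * 1000
v1 = store.commit(updated)

print(store.stored_bytes())
```

Both versions stay readable via `store.read(v0)` and `store.read(v1)`, but the second commit only adds the one changed chunk rather than a full copy of the dataset.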
Originally posted by @mattjbr123 in #19