Add some Zarr-based datatypes #19040
I think the URI datatype is a little wonky also. I don't want to encode remote URIs this way if we can avoid it. Let's focus on a syntax in tool data parameters that allows accepting URIs instead of doing this. The other datatypes and enhancements all look really great to me. I'll create an issue for the data parameter handling.
Thanks again for the great feedback! I've dropped the "wonky" URI datatypes and implemented #19077 as discussed. Once that is merged, I'll rebase here and should be ready to go :)
This one now needs a final rebase :)
- Rename generic to ZarrDirectory
- Detect Zarr version in metadata
- Add zarr.zip datatype
…older and metadata file
…file Instead of the default behavior of downloading an empty file.
Should help when opening the Zarr ZipStore if there is compression involved
Co-authored-by: Björn Grüning <[email protected]>
# This wouldn't be needed if the CompressedFile.extract function didn't
# create an extra folder under the dataset's extra_files_path.
# Maybe this can be avoided somehow?
Can we merge with this or do we need to investigate this more?
I would try to get rid of this. Otherwise, I'd like to see a tool actually use this metadata element.
Let me explain the issue in more detail to see if we can eliminate the need for the `store_root` metadata. This `store_root` isn't true metadata; it's more of a workaround that indicates the folder containing the actual root of the Zarr directory.
Both tools and visualizations require a path to this root directory to access the correct contents.
When we upload a zip file containing a Zarr directory, it's common for the zip to include the parent folder of the Zarr store. Many Zarr zips I've encountered are structured this way. Ideally, the Zarr store would be zipped without this extra parent folder, but even if it isn't, when we extract it using the converter, it creates a new folder (like `dataset_{uuid}`) within `extra_files_path`, resulting in an additional layer.
Currently, to access the Zarr directory correctly, any tool needs to reference it as follows:
input_zarr = zarr.open('$zarrinput.extra_files_path/$zarrinput.metadata.store_root', mode='r')
This approach, however, is not fully reliable: what if the Zarr store is nested deeper within subdirectories? A better solution might be a dedicated converter (rather than `archive_to_directory.xml`) that finds the root store and extracts it directly to `extra_files_path`, without any parent folders. Would this be better?
Another drawback is that tool developers must remember to reference `$zarrinput.extra_files_path`, and even add `/$zarrinput.metadata.store_root`, to reach the actual Zarr store.
Any ideas on how to make this process more elegant and eliminate the `store_root`?
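To make the "nested deeper" case concrete, here is a minimal sketch of how a dedicated converter could locate the store root after extraction. It is not code from this PR: `find_store_root` is a hypothetical helper, and it assumes the root can be recognized by the standard Zarr marker files (`.zgroup` or `.zarray` for Zarr v2, `zarr.json` for Zarr v3).

```python
import os

# Marker files that identify a Zarr node: v2 uses .zgroup/.zarray, v3 uses zarr.json.
ZARR_MARKERS = {".zgroup", ".zarray", "zarr.json"}


def find_store_root(extracted_dir):
    """Return the first directory (relative to extracted_dir) holding a Zarr marker.

    The walk is top-down, so a store is always found before any of its own
    sub-groups or arrays. Returns "." if extracted_dir itself is the store,
    and None if no marker file is found anywhere.
    """
    for current, dirnames, filenames in os.walk(extracted_dir):
        dirnames.sort()  # make the traversal order deterministic
        if ZARR_MARKERS.intersection(filenames):
            return os.path.relpath(current, extracted_dir)
    return None
```

A converter built this way could move the directory it finds straight into `extra_files_path`, which is the behaviour asked about above; whether that fits the existing converter machinery is an open question.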
Requires #17614 and #19077
Includes the following datatypes:
General Zarr datatypes
- `CompressedZarrZipArchive` (`zarr.zip`): represents a Zarr ZipStore. It seems to have some limitations (i.e. it doesn't work with zips containing the Zarr store in a subfolder).
- `ZarrDirectory` (`zarr`): represents a Zarr DirectoryStore. Contains the Zarr structure in the `extra_files_path` of the dataset. I wonder if there is a way to make the "main" dataset point to the actual root folder in extra_files, like a symlink or something similar, but I don't know if this really makes sense. Right now, to access your Zarr store root you need to use the input like this: `'$zarrinput.extra_files_path/$zarrinput.metadata.store_root'` (see the "Handling different input types" section below).
OME-Zarr datatypes
- `CompressedOMEZarrZipArchive` (`ome_zarr.zip`): similar to CompressedZarrZipArchive, but expects to find an `OME/METADATA.ome.xml` file in the store root so it can be easily converted/extracted to an `OMEZarr` directory.
- `OMEZarr` (`ome_zarr`): similar to ZarrDirectory, but identifies this datatype as an OME-Zarr image.
Handling different input types
If your zarr input is defined like this:
You need to access the zarr store in a "slightly" different way depending on how the store is provided as input.
Input as `zarr.zip`
[Video: UploadOMEZarr.zip.mp4]
You can upload a zip file containing the zarr store. In order to open it in your tool you need to do the following:
Please note that the store root must be at the root of the zip (i.e. no subdirectory containing the store).
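The root-of-the-zip requirement can be checked before trying to open the archive. The following is only a sketch using the standard library: `zip_store_prefix` is a hypothetical helper (not part of this PR or of zarr), and it assumes the store root is identified by a `.zgroup`, `.zarray` or `zarr.json` entry.

```python
import zipfile

# Marker files that identify a Zarr node: v2 uses .zgroup/.zarray, v3 uses zarr.json.
ZARR_MARKERS = {".zgroup", ".zarray", "zarr.json"}


def _depth(prefix):
    """Number of directory levels in a zip-internal prefix ("" is depth 0)."""
    return 0 if prefix == "" else prefix.count("/") + 1


def zip_store_prefix(zip_path):
    """Return the shallowest directory inside the zip that holds a Zarr marker.

    "" means the store sits at the zip root (the layout required here);
    any other string is the parent folder that wraps the store;
    None means no marker file was found at all.
    """
    prefixes = []
    with zipfile.ZipFile(zip_path) as archive:
        for name in archive.namelist():
            head, _, tail = name.rpartition("/")
            if tail in ZARR_MARKERS:
                prefixes.append(head)
    if not prefixes:
        return None
    return min(prefixes, key=lambda p: (_depth(p), p))
```

A non-empty return value signals the "store in a subfolder" layout, which would need to be re-zipped or extracted to a directory dataset first.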
Input as `zarr` (Directory)
You cannot directly upload a directory, but you can upload a zarr.zip and then extract it to a "zarr directory dataset".
[Video: ConvertZipToOMEZarr.mp4]
To open this kind of zarr in your tool you must do the following:
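As a rough illustration on the script side (not the original snippet from this description), the two values can be joined into the store path and the declared Zarr version read from the marker file at that location. `resolve_store` and `detect_zarr_format` are hypothetical helper names; the sketch assumes the tool passes `extra_files_path` and `store_root` to the script as plain strings.

```python
import json
import os


def detect_zarr_format(store_dir):
    """Return the zarr_format value declared at the store root, or None.

    Zarr v3 stores declare it in zarr.json; Zarr v2 stores declare it in
    .zgroup (groups) or .zarray (arrays).
    """
    for marker in ("zarr.json", ".zgroup", ".zarray"):
        marker_path = os.path.join(store_dir, marker)
        if os.path.isfile(marker_path):
            with open(marker_path) as handle:
                return json.load(handle).get("zarr_format")
    return None


def resolve_store(extra_files_path, store_root):
    """Join extra_files_path and store_root; return (store_dir, zarr_format)."""
    store_dir = os.path.normpath(os.path.join(extra_files_path, store_root))
    return store_dir, detect_zarr_format(store_dir)
```

A `None` format means no marker file was found at the joined path, which usually indicates that `store_root` does not point at the actual store.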
Note that you need to reference the `extra_files_path` and append `$zarrinput.metadata.store_root` to it. This will point to the right internal directory where the Zarr store has been extracted, and will work even if the uploaded zip had a parent subfolder containing the Zarr store.
Input as a deferred `zarr` (URI to a remote zarr)
[Video: DeferredZarr.mp4]
In your tool, you can use '$zarrinput' directly; it will point to the remote URI. However, you need to check that it is actually a deferred dataset, like this:
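The tool-side check itself is not reproduced here. As a complementary sketch only, a script that receives the value could tell a remote URI from a local path with the standard library. `is_remote_uri` is a hypothetical helper, and the set of schemes is an assumption, not something defined by this PR.

```python
from urllib.parse import urlparse

# Assumed set of remote schemes; adjust to whatever the tool actually accepts.
REMOTE_SCHEMES = {"http", "https", "s3", "gs"}


def is_remote_uri(value):
    """Return True if value looks like a remote URI rather than a local path."""
    return urlparse(value).scheme.lower() in REMOTE_SCHEMES
```

For example, `is_remote_uri("https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr")` is true, while a local dataset path has no scheme and is treated as a file or directory.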
How to test the changes?
Edit Attributes, then select the Datatypes section and convert it to `zarr`.
https://uk1s3.embassy.ebi.ac.uk/idr/zarr/v0.4/idr0062A/6001240.zarr. You can find more examples here.
Defer dataset resolution
License