idr0013-neumann-mitocheck S-BIAD865 #644
Reimport still in progress - cancelled once because of a long wait on FILESET_UPLOAD_PREP.
As discussed today, it is probably worth trying to import without chunks, then adding the chunks back by sym-linking to the full plate from the ManagedRepository. This workflow has allowed me to import big plates from idr0125. In a single-image case, I recently achieved the same thing by making a copy of the NGFF Image, deleting its chunks, then importing the metadata-only Plate. E.g. for idr0125 (384-well plate, 9 fields per Well) this took ~2 hours. Then, try to view images in the Plate - they should appear black. You can then delete the metadata-only plate in the Managed Repo and replace it with a symlink to the full plate.
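A minimal sketch of that workflow on throwaway mock directories (none of these paths are real ManagedRepository locations, and the `omero import` step is only indicated in a comment):

```shell
# Sketch of the "metadata-only import, then symlink" trick, on demo paths.
set -eu
base=$(mktemp -d)
full="$base/full/plate.ome.zarr"            # the complete NGFF plate
mkdir -p "$full/A/1/0/0"
touch "$full/A/1/0/0/.zarray" "$full/A/1/0/0/0"   # one metadata file, one chunk

# 1. Copy the plate and delete only the chunk files (leaf files named 0, 1, ...)
copy="$base/copy/plate.ome.zarr"
mkdir -p "$base/copy"
cp -r "$full" "$copy"
find "$copy" -type f -regex '.*/[0-9]+' -delete

# 2. In reality: `omero import` the chunk-less copy (fast; images appear black)

# 3. Replace the imported copy with a symlink to the full plate
rm -r "$copy"
ln -s "$full" "$copy"
```

After step 3, reading a chunk through the symlinked path resolves into the full plate, so the previously black images render normally.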
But on …
Thanks @will-moore
So would you recommend deleting all the A-P files?
No, those A-P are directories that contain important files etc. You only want to delete the chunks, which are files named 0, 1 etc. You can list them and count them with e.g. `find`.
And only delete the chunks from a copy of the Plate - don't delete the originals.
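For example (a hedged sketch on a throwaway directory, using GNU find; in OME-Zarr the chunks are the leaf files whose names are just digits):

```shell
# Demo plate copy: .zarray is metadata to keep; files named 0, 1 are chunks.
set -eu
plate=$(mktemp -d)/plate_copy.ome.zarr
mkdir -p "$plate/A/1/0/0/0/0"
touch "$plate/A/1/0/0/.zarray" "$plate/A/1/0/0/0/0/0" "$plate/A/1/0/0/0/0/1"

find "$plate" -type f -regex '.*/[0-9]+'          # list the chunks
find "$plate" -type f -regex '.*/[0-9]+' | wc -l  # count them (2 here)
find "$plate" -type f -regex '.*/[0-9]+' -delete  # delete them (copy only!)
```

The `-regex` pattern matches whole paths ending in digits, so `.zarray`, `.zattrs` and the A-P well directories are untouched.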
After having done the workflow suggested by @will-moore, I have …
Then tried …
Both attempts above end in …
@pwalczysko it might be that the plate name has to end with …
Indeed, thank you @will-moore, this did the trick. The data are now imported as http://localhost:1080/webclient/?show=plate-253 (…)
Looks great! I adjusted rendering settings and "Saved to all" so the thumbnails are clearer - they all regenerated fine 👍
Trying to guess how much space is needed for conversion: ScreenA, 1344 x 1024 x 93 x 384 x 510 plates ≈ 25 TB
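Sanity-checking that estimate, assuming the factors are X × Y × planes × wells × plates at one byte per pixel (the 8-bit, uncompressed interpretation is an assumption):

```shell
# 1344 x 1024 pixels, 93 planes, 384 wells, 510 plates, 1 byte/pixel (assumed)
bytes=$((1344 * 1024 * 93 * 384 * 510))
echo "$bytes bytes"
echo "$((bytes / 1000000000000)) TB"   # ~25 TB
```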
On pilot-zarr2-dev: converting one plate takes ~30 min, zipping ~50 min (without compression 7 min!). Converted plate size 36 GB, zipped 28 GB. 7zip (p7zip): 5 min (also 28 GB; without compression 4 min). There are 538 plates in total.
Created batch directories of 10 plates each under /data/ngff/idr0013. Trying to do 10 conversions and 10 zip/upload/deletes at a time, due to the disk space limitation.
For conversion: …
For zipping: …
For upload: …
Then add to files.tsv and delete: …
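The four steps above might be glued together roughly like this. It is a dry-run sketch, not the actual scripts: the batch layout, the `bioformats2raw` invocation and the Aspera target are assumptions, and `run` only prints each step.

```shell
set -eu
DRY_RUN=1
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

BATCH=$(mktemp -d)    # demo stand-in for a batch dir of 10 plates
touch "$BATCH/LT0001_02.screen" "$BATCH/LT0001_09.screen"

for plate in "$BATCH"/*.screen; do
    name=$(basename "$plate" .screen)
    run bioformats2raw "$plate" "$BATCH/$name.ome.zarr"                # convert
    run zip -r -0 "$BATCH/$name.ome.zarr.zip" "$BATCH/$name.ome.zarr"  # -0 = store only (no compression, much faster)
    run ascp "$BATCH/$name.ome.zarr.zip" "upload-target:"              # Aspera upload (placeholder target)
    run rm -r "$BATCH/$name.ome.zarr" "$BATCH/$name.ome.zarr.zip"      # reclaim disk space
done
```

Setting `DRY_RUN=0` would execute the steps instead of echoing them.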
Failing plate: …
I guess there will be more. I'll append them to this list here to keep track: …
Wrapped it all into one script: …
It's running now in three sessions (screens) in /data/ngff/idr0013_new/run_1 / 2 / 3 (there is a run_4 as well, but that might be a bit too much).
This is currently doing 3 conversions in a bit more than an hour, so all ~538 plates should be done in ~8 days.
Finished. Only …
Really finished now; exported the LT0012_29 plate with omero-cli-zarr (LT0012_29.ome.zarr.zip).
Looking into a submission error with file names in … Looks like the problem is that each row doesn't include the directory with … But I also noticed a zip called … To try and make this consistent with the others, I downloaded it (via the web page), renamed it and uploaded it via Aspera...
Checked on https://www.ebi.ac.uk/biostudies/submissions/files?path=%2Fuser%2Fidr0013 that the file size of the renamed file matched the old file, then deleted … Uploaded new idr0013_files.tsv.
Failed with ResourceError. Checked Blitz logs: …
Viewing a different Plate from idr0004 with missing Wells gives the same error: …
To see if a non-sparse Plate would work, updated …
http://localhost:1080/webclient/?show=well-802140 ... but this failed due to goofys: …
Goofys failed again when re-running …
A big problem with goofys failing (twice above) is that we need to restart the server to re-mount, and this means that previously generated … Need to move to a workflow of creating and executing the sql immediately...
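One possible shape for that generate-and-execute-immediately loop. The CSV columns, the `omero mkngff sql` invocation and the psql connection details are all assumptions about the lost commands; `DRY_RUN` only prints each step.

```shell
set -eu
DRY_RUN=1
SECRET="${SECRET:-00000000-demo-secret}"   # in reality the live session UUID
run() { if [ "$DRY_RUN" = 1 ]; then echo "+ $*"; else "$@"; fi; }

CSV=$(mktemp)    # demo rows standing in for idr0013.csv: fileset_id,zarr_path
printf '%s\n' '18568,/path/LT0001_02.ome.zarr' '18569,/path/LT0001_09.ome.zarr' > "$CSV"

while IFS=, read -r fileset zarr; do
    # generate the SQL for this Fileset...
    run omero mkngff sql --secret="$SECRET" "$fileset" "$zarr"
    # ...and execute it right away, before any goofys remount invalidates it
    run psql -U omero -d idr -f "$fileset.sql"
done < "$CSV"
```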
http://localhost:1080/webclient/?show=well-802140 eventually viewable...
663949 ms is ~11 minutes
Got about 40 complete - most others are …
Kinda painful to pick up where we left off with … Updated … Now we just need to update the command to append to the sql file (>>) instead of writing to it (>), to avoid overwriting the existing files. We also want to use the old …
Needed another server restart to re-mount goofys...
Needed another server restart to re-mount goofys...
Since running the mkngff for this and idr0016 at the same time on idr-testing is causing goofys issues, going to pause on this one now until idr0016 is done...
Picking up where we left off...
Kept these 4 rows (no sql exported) and deleted the other completed rows from idr0013.csv on idr-testing: …
Repeated several times, each time processing 20-40 Filesets...
Restarted again... seems to be 39 or 40 each time.
Restarted again after another 39...
Need to fix the naming of the sql files. Using …
Check for …
Edited idr0013.csv to contain just the 163 rows with …
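The row-filtering step might look like this (the real filter criterion wasn't captured above, so `PATTERN` is a hypothetical stand-in):

```shell
set -eu
CSV=$(mktemp)
printf '%s\n' 'LT0001_02,done' 'LT0066_23,todo' 'LT0103_13,todo' > "$CSV"   # demo rows
PATTERN='todo'                     # hypothetical marker for rows still to process
grep "$PATTERN" "$CSV" > "$CSV.filtered"
wc -l < "$CSV.filtered"            # 2 here; the real run kept 163 rows
```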
Since … Still to do "idr0013.csv"... on idr0138-pilot... as wmoore user...
The following Images/Filesets were found to be incomplete when regenerating memo files on idr-testing: …
On pilot-zarr1-dev, screen …
Can't seem to read the data...
EDIT: seems to work when I'm not in that old screen.
Checking that files missing from previous plates are present in newly-generated ones... This was missing: …
Similar checks with the other plates for … Renamed to shorten names...
EDIT: …
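One hedged way to run that check is to diff sorted file listings of the old and new plate copies (the directories below are demo stand-ins):

```shell
set -eu
old=$(mktemp -d); new=$(mktemp -d)
mkdir -p "$old/A/1" "$new/A/1" "$new/A/2"
touch "$old/A/1/.zattrs" "$new/A/1/.zattrs" "$new/A/2/.zattrs"

oldlist=$(mktemp); newlist=$(mktemp)
(cd "$old" && find . | sort) > "$oldlist"
(cd "$new" && find . | sort) > "$newlist"
# '<' lines = only in the old plate (i.e. missing from new); '>' = only in new
diff "$oldlist" "$newlist" || true
```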
Delete these 3 from https://www.ebi.ac.uk/biostudies/submissions/files?path=%2Fuser%2Fidr0013 then upload: …
Checked https://uk1s3.embassy.ebi.ac.uk/bia-integrator-data/pages/S-BIAD865.html again. Resubmitted plates above not updated yet...
LT0066_23--ex2005_08_03--sp2005_06_07--tt17--c3
LT0103_13--ex2006_11_22--sp2005_08_16--tt19--c4
LT0080_37--ex2005_07_20--sp2005_07_04--tt17--c4
Let's host those 3 plates on our s3 for testing mkngff etc.
Looking good: …
On idr0125-pilot...
As idr0013.csv: …
screen -r mkngff
Check sql output - all have …
$ less 18568.sql...
Updated SECRET to 9630ba1e-ed3a-42e3-9296-xxxxxxxx then ran …
Ooops....
Fileset info looks good...
Checking http://localhost:1080/webclient/?show=image-1556033 - view image....
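The SECRET update above is typically a single substitution in the generated SQL. A sketch on a demo file (the `SECRETUUID` placeholder text and the SQL line itself are assumptions about what `omero mkngff sql` emits, not the real template):

```shell
set -eu
sql=$(mktemp)   # demo stand-in for 18568.sql
echo "SELECT mkngff_fileset(18568, 'SECRETUUID');" > "$sql"   # made-up demo SQL
SECRET=9630ba1e-ed3a-42e3-9296-xxxxxxxx
sed -i "s/SECRETUUID/$SECRET/g" "$sql"   # GNU sed in-place substitution
cat "$sql"
```

The substituted file can then be executed with `psql -f` while the session identified by SECRET is still alive.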
Let's check_pixels... …
We have re-submitted data now available on EBI s3... Test on idr-testing, using Fileset IDs from idr-testing! Install IDR/omero-mkngff#14 to create new Filesets without extra …
idr0013.csv: …
Then, update SECRET and... (again using …)
Updated sql scripts to use original Fileset IDs in IDR/mkngff_upgrade_scripts@3f8e169