Pipeline-runnable code to "decollage" the combined FlowCam images #21

Closed
metazool opened this issue Aug 12, 2024 · 3 comments

Comments

@metazool
Collaborator

metazool commented Aug 12, 2024

This code currently exists as a command-line script repurposed by @Kzra (based on https://sarigiering.co/posts/extract-individual-particle-images-from-flowcam/). It is run by hand on a VM, with internal storage available as a volume mount.

#20 (Automating file transfer off the FlowCam) suggests we may be able to change this workflow beneficially and take the VM out of the loop.

This issue covers the setup work to add the code, with tests, to the decollage package, and to preserve more metadata while doing so. It is a step towards replacing #4 (minimal metadata, currently just a file listing!) by recording more detail: coordinates, date, depth, and also image size.

This is not end-to-end as per #11 (diagrams of the instrument-to-storage workflow), but it could be a chance to get a handle on the Luigi package in passing (a Python analogue to R's {targets}, recommended by @albags; rapid prototyping for work with an Airflow destination?).

@metazool
Collaborator Author

See also https://github.com/NERC-CEH/cyto-ML/issues/6 (private to internal contributors)

  • There is a comment there about embedding the metadata in the output image EXIF headers, which makes a lot of sense here: the files are still in transit to cloud storage where they'll get fixed addresses, so it saves rewriting external metadata.

It is also better for compatibility with scivision, if that's still actively supported by the Turing Institute (who does one ask about that?).
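
As a rough illustration of what embedding coordinates in EXIF involves: the EXIF GPS tags store latitude and longitude as three rationals (degrees, minutes, seconds) plus a hemisphere reference letter, so decimal degrees need converting first. The sketch below only does that conversion with the standard library; the function name `to_exif_dms` and the example coordinates are made up for illustration, and which image library actually writes the tags is left open.

```python
from fractions import Fraction
from typing import Tuple

DMS = Tuple[Fraction, Fraction, Fraction]


def to_exif_dms(decimal_degrees: float, is_latitude: bool) -> Tuple[str, DMS]:
    """Convert decimal degrees to an EXIF-style (ref, (deg, min, sec)) pair.

    Hypothetical helper: seconds are kept to millisecond precision.
    """
    if is_latitude:
        ref = "N" if decimal_degrees >= 0 else "S"
    else:
        ref = "E" if decimal_degrees >= 0 else "W"

    # Work in whole milliseconds of arc to avoid float rounding surprises
    total_ms = round(abs(decimal_degrees) * 3600 * 1000)
    degrees, remainder = divmod(total_ms, 3_600_000)
    minutes, ms = divmod(remainder, 60_000)
    return ref, (Fraction(degrees), Fraction(minutes), Fraction(ms, 1000))


# Illustrative coordinates only, not a real sampling site
lat_ref, lat = to_exif_dms(54.5, is_latitude=True)
lon_ref, lon = to_exif_dms(-2.25, is_latitude=False)
```

Date and depth would need their own tags (or a free-text field), which is a design choice this sketch does not make.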

@metazool
Collaborator Author

metazool commented Aug 12, 2024

This is interesting! The decollage script depends on the presence of a .lst file (which we have available). It only uses the pixel coordinates to fish each image out, but all the geometry analytics based on the binary mask are included in this file too.
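
A minimal sketch of the "fish the image out" step, assuming each .lst row gives a bounding box within the collage. The column names `image_x`, `image_y`, `image_w` and `image_h` are assumptions here, and a list of pixel rows stands in for a real image array so the example needs no image library.

```python
from typing import Dict, List

# Rows of grayscale pixel values; a stand-in for a real image array
Image = List[List[int]]


def crop_particle(collage: Image, row: Dict[str, int]) -> Image:
    """Cut one particle out of a collage using its bounding box.

    `row` is one record of per-particle metadata; the key names are assumed.
    """
    x, y = row["image_x"], row["image_y"]
    w, h = row["image_w"], row["image_h"]
    if x < 0 or y < 0 or y + h > len(collage) or x + w > len(collage[0]):
        raise ValueError(f"bounding box {row} falls outside the collage")
    return [line[x:x + w] for line in collage[y:y + h]]


# A 4x6 toy collage where each pixel value encodes its position (10*row + column)
collage = [[10 * r + c for c in range(6)] for r in range(4)]
particle = crop_particle(
    collage, {"image_x": 2, "image_y": 1, "image_w": 3, "image_h": 2}
)
print(particle)  # [[12, 13, 14], [22, 23, 24]]
```

With a real image library the list slicing would become an array slice or a crop call, but the bounds check is worth keeping either way.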

(Screen photo from the FlowCam walkthrough, about 2/3 of the way down: 20240725_161806)

It's an "almost but not quite CSV" with 53 lines of column_name|type headers, and there's bound to be a lot we can get out of these metrics alone. (I collaborated on a paper about this once, on characterisation of grains in sandstone, comparing shape-based to deep learning approaches, but it got boiled down to "does the deep learning approach work" rather than digging into the tradeoffs between the approaches...)
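
A hedged sketch of reading that layout, assuming the first N non-empty lines are `column_name|type` declarations and every later line is a pipe-delimited data row with N values. Real .lst files may carry extra preamble lines or other type names, so `parse_lst`, its `n_fields` parameter and the type mapping are illustrative rather than a description of the actual format.

```python
from typing import Any, Dict, List

# Assumed mapping from declared type names to Python casts; unknown types stay as text
CASTS = {"int": int, "float": float}


def parse_lst(text: str, n_fields: int) -> List[Dict[str, Any]]:
    """Parse header declarations followed by pipe-delimited rows."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header, data = lines[:n_fields], lines[n_fields:]

    # Each header line is "column_name|type"
    fields = [ln.split("|", 1) for ln in header]

    rows = []
    for number, ln in enumerate(data, start=1):
        values = ln.split("|")
        if len(values) != len(fields):
            raise ValueError(f"data row {number} has {len(values)} values, expected {len(fields)}")
        rows.append(
            {name: CASTS.get(kind, str)(value) for (name, kind), value in zip(fields, values)}
        )
    return rows


# Toy input with 3 fields instead of 53
sample = "id|int\nimage_x|int\narea|float\n1|10|52.5\n2|40|13.0\n"
rows = parse_lst(sample, n_fields=3)
print(rows[0])  # {'id': 1, 'image_x': 10, 'area': 52.5}
```

Keeping every column, not just the pixel coordinates, is what would let the shape metrics travel alongside each extracted image.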

@metazool
Collaborator Author

Glad to close this. There's a next step, preserving the analytic metadata in the .lst files off the FlowCam, but that's out of scope here.
