This Luigi pipeline is designed to process large .tif
images generated by a FlowCam device. The pipeline breaks down these large images into smaller "vignette" images, adds metadata (e.g., latitude, longitude, date, and depth) to the resulting images, and then uploads the processed images to a specified destination (e.g., an S3 bucket or an external API).
The pipeline is structured as a series of Luigi tasks, each handling a specific step in the workflow:

- Reading Metadata: Parses `.lst` files to extract metadata.
- Decollaging: Extracts individual images from large `.tif` files.
- Uploading: Uploads processed images to a specified endpoint.
The pipeline consists of the following Luigi tasks:
**ReadMetadata**

- Purpose: Reads the `.lst` file to extract metadata for image slicing.
- Input: The `.lst` file generated by the FlowCam device.
- Output: A `.csv` file (`metadata.csv`) containing the parsed metadata.
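As a rough illustration, a minimal version of this task might look like the sketch below. The parameter names, the `.lst` parsing details (delimiter, header rows), and the file discovery are assumptions, not the repository's actual code.

```python
from pathlib import Path

import luigi
import pandas as pd


class ReadMetadata(luigi.Task):
    """Parse the FlowCam .lst file into metadata.csv (illustrative sketch)."""

    directory = luigi.Parameter()         # folder containing the .lst file
    output_directory = luigi.Parameter()  # where metadata.csv is written

    def output(self):
        return luigi.LocalTarget(f"{self.output_directory}/metadata.csv")

    def run(self):
        # Find the .lst file exported by the FlowCam device. The delimiter and
        # the number of header rows are assumptions about the .lst format.
        lst_file = next(Path(self.directory).glob("*.lst"))
        metadata = pd.read_csv(lst_file, sep="|", skiprows=2)
        with self.output().open("w") as f:
            metadata.to_csv(f, index=False)
```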
**DecollageImages**

- Purpose: Uses the metadata to slice a large `.tif` image into smaller vignette images.
- Input: The `metadata.csv` file generated by `ReadMetadata`.
- Output: Individual vignette images with EXIF metadata, saved in the specified output directory.
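A sketch of the slicing step, assuming `metadata.csv` carries per-vignette bounding boxes and that scikit-image handles the image I/O. The column names (`image_x`, `image_y`, `image_w`, `image_h`), the collage filename, and the marker-file target are all illustrative assumptions.

```python
import luigi
import pandas as pd
from skimage import io


class DecollageImages(luigi.Task):
    """Cut each vignette out of the large .tif collage (illustrative sketch)."""

    directory = luigi.Parameter()
    output_directory = luigi.Parameter()
    experiment_name = luigi.Parameter()

    def requires(self):
        return ReadMetadata(directory=self.directory,
                            output_directory=self.output_directory)

    def output(self):
        # A simple marker file; the real task's Luigi target may differ.
        return luigi.LocalTarget(f"{self.output_directory}/decollage_complete.txt")

    def run(self):
        metadata = pd.read_csv(self.input().path)
        collage = io.imread(f"{self.directory}/collage.tif")  # filename is illustrative

        for i, row in metadata.iterrows():
            x, y = int(row["image_x"]), int(row["image_y"])
            w, h = int(row["image_w"]), int(row["image_h"])
            vignette = collage[y:y + h, x:x + w]
            out_path = f"{self.output_directory}/{self.experiment_name}_{i}.tif"
            io.imsave(out_path, vignette)
            # The real task also embeds EXIF metadata (latitude, longitude,
            # date, depth) into each vignette at this point.

        with self.output().open("w") as f:
            f.write("decollage complete\n")
```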
**Upload task**

- Purpose: Uploads processed vignette images to a specified S3 bucket or an external API.
- Input: Processed vignette images generated by `DecollageImages`.
- Output: A confirmation file (`s3_upload_complete.txt`) indicating successful uploads.
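For the S3 route, the upload might look roughly like the sketch below. The class name is illustrative (the repository's actual task name is not shown here), `requires()` is omitted for brevity, and the environment variables match the `.env` setup described further down.

```python
import os
from pathlib import Path

import boto3
import luigi


class UploadDecollagedImages(luigi.Task):
    """Upload vignette images to an S3 bucket (illustrative sketch)."""

    output_directory = luigi.Parameter()
    s3_bucket = luigi.Parameter()

    def output(self):
        return luigi.LocalTarget(f"{self.output_directory}/s3_upload_complete.txt")

    def run(self):
        # Credentials and endpoint come from the environment / .env file,
        # as described in the setup steps below.
        s3 = boto3.client(
            "s3",
            aws_access_key_id=os.environ["AWS_ACCESS_KEY_ID"],
            aws_secret_access_key=os.environ["AWS_SECRET_ACCESS_KEY"],
            endpoint_url=os.environ["AWS_URL_ENDPOINT"],
        )
        for image in Path(self.output_directory).glob("*.tif"):
            s3.upload_file(str(image), self.s3_bucket, image.name)

        with self.output().open("w") as f:
            f.write("upload complete\n")
```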
**FlowCamPipeline**

- Purpose: A wrapper task that runs all of the above tasks in sequence.
- Dependencies: Manages the dependencies and the order of execution of the entire pipeline.
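A sketch of how the wrapper might be wired up with `luigi.WrapperTask` (an assumption; the actual base class may differ). The parameters mirror the command-line options shown under "Run the Pipeline Script" below, and the upload task it requires is the illustrative one sketched above.

```python
import luigi


class FlowCamPipeline(luigi.WrapperTask):
    """Run the whole chain by requiring the final task (illustrative sketch)."""

    directory = luigi.Parameter()
    output_directory = luigi.Parameter()
    experiment_name = luigi.Parameter()
    s3_bucket = luigi.Parameter()

    def requires(self):
        # Requiring the upload task pulls in DecollageImages and ReadMetadata
        # through their own requires() methods, so Luigi runs them in order.
        yield UploadDecollagedImages(
            output_directory=self.output_directory,
            s3_bucket=self.s3_bucket,
        )
```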
- Python 3.7 or above
- The following Python packages:
  - `luigi`
  - `pandas`
  - `numpy`
  - `scikit-image`
  - `requests`
  - `pytest` (for testing)
  - `boto3` (for S3 interactions)
  - `aioboto3` (for async S3 interactions)
  - `fastapi` and `uvicorn` (for the external API)
- **Clone the Repository**

  ```bash
  git clone https://github.com/your_username/plankton_pipeline_luigi.git
  cd flowcam-pipeline
  ```

- **Set up JASMIN credentials**

  If using S3 for uploading, make sure your AWS credentials are set in a `.env` file in the root directory:

  ```
  AWS_ACCESS_KEY_ID=your_access_key
  AWS_SECRET_ACCESS_KEY=your_secret_key
  AWS_URL_ENDPOINT=your_endpoint_url
  ```

- **Start the Luigi Central Scheduler**

  ```bash
  luigid --background
  ```

- **Run the Pipeline Script**

  ```bash
  python -m luigi --module pipeline.pipeline_decollage FlowCamPipeline \
      --directory /path/to/flowcam/data \
      --output-directory /path/to/output \
      --experiment-name test_experiment \
      --s3-bucket your-s3-bucket-name
  ```
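If you prefer not to run the central scheduler, the same pipeline can also be invoked in-process with Luigi's local scheduler, which is handy for debugging and testing. This sketch simply mirrors the command-line options above; the placeholder paths and bucket name are the same examples.

```python
import luigi

from pipeline.pipeline_decollage import FlowCamPipeline

# Runs the whole pipeline in-process, without luigid.
luigi.build(
    [
        FlowCamPipeline(
            directory="/path/to/flowcam/data",
            output_directory="/path/to/output",
            experiment_name="test_experiment",
            s3_bucket="your-s3-bucket-name",
        )
    ],
    local_scheduler=True,
)
```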