This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

Add a CLI to the data pipeline #145

Open
esheehan-gsl opened this issue Feb 3, 2023 · 0 comments

Problem

It would be useful to be able to run the ETL pipeline locally for generating local data files for development, and possibly for back-filling missing data in AWS.

Solution

Add a command-line interface (CLI) that can download diag files from S3 (or access them locally) and run them through the ETL pipeline. I have been doing this interactively in IPython, and it is fairly simple:

import os

import boto3
from ugdata import aws, diag

# Write the Zarr output to a local directory
os.environ["UG_DIAG_ZARR"] = "./diag.zarr"

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")
bucket = "parallelcluster-rtma-cluster"
prefix = "UDD_3DRTMA_HRRR_DIAG/"

# Collect one S3-event-style record per diag file
records = []

for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
    # "Contents" is absent from a page that has no matching objects
    for obj in page.get("Contents", []):
        # This was just to avoid parsing known bad data
        if diag.parse_diag_filename(obj["Key"]).initialization_time < "2023-02-02T14:00":
            continue
        records.append({"s3": {"bucket": {"name": page["Name"]}, "object": {"key": obj["Key"]}}})

# Invoke the Lambda handler directly with a synthetic S3 event
aws.lambda_handler({"Records": records}, {})

The CLI should support:

  • Setting the Zarr location as a parameter instead of an environment variable
  • Filtering by initialization time
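
As a starting point for discussion, the two requirements above could be sketched with argparse. This is only a rough sketch under assumptions: the program name `ugdata-etl`, the option names (`--zarr`, `--bucket`, `--prefix`, `--since`, `--until`), and the helpers `build_parser`, `in_window`, and `s3_record` are all hypothetical and not part of ugdata. It also assumes initialization times can be compared as `datetime` values, which may not match what `diag.parse_diag_filename` returns.

```python
import argparse
from datetime import datetime


def build_parser() -> argparse.ArgumentParser:
    """Build a hypothetical CLI parser; every option name here is an assumption."""
    parser = argparse.ArgumentParser(
        prog="ugdata-etl",  # hypothetical program name
        description="Run diag files through the ETL pipeline locally.",
    )
    # Zarr location as a parameter instead of the UG_DIAG_ZARR environment variable
    parser.add_argument("--zarr", default="./diag.zarr", help="Zarr store location")
    # Defaults copied from the interactive snippet in this issue
    parser.add_argument("--bucket", default="parallelcluster-rtma-cluster")
    parser.add_argument("--prefix", default="UDD_3DRTMA_HRRR_DIAG/")
    # Initialization-time filter, given as ISO 8601 strings such as 2023-02-02T14:00
    parser.add_argument("--since", type=datetime.fromisoformat, default=None,
                        help="skip files initialized before this time")
    parser.add_argument("--until", type=datetime.fromisoformat, default=None,
                        help="skip files initialized at or after this time")
    return parser


def in_window(init_time, since=None, until=None) -> bool:
    """Return True if since <= init_time < until; a bound of None is open."""
    if since is not None and init_time < since:
        return False
    if until is not None and init_time >= until:
        return False
    return True


def s3_record(bucket: str, key: str) -> dict:
    """Build one S3-event-style record in the shape used in the snippet above."""
    return {"s3": {"bucket": {"name": bucket}, "object": {"key": key}}}
```

For example, `build_parser().parse_args(["--zarr", "./diag.zarr", "--since", "2023-02-02T14:00"])` would give the Zarr location and a lower time bound. The CLI could then set the Zarr location before calling the pipeline, keep only the keys for which `in_window` is true, and pass the resulting records to `aws.lambda_handler`.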

No Gos

Describe any features or behaviors that have been considered and rejected as out of scope for this project.

Rabbit Holes

Describe any solutions to problems that pose a risk to completing this project on time.

esheehan-gsl added the project and shaping labels Feb 3, 2023
esheehan-gsl removed the shaping label Feb 13, 2023
esheehan-gsl modified the milestone: Demo #9 Apr 17, 2023