241 consolidate data and app services #326

esheehan-gsl · 2023-05-05T20:15:34Z

No description provided.

Move the ETL (extract, transform, load) modules from the data service to unified_graphics.etl to consolidate all of our applications. This will make it possible to expose command line tools in the application via the `flask` command to load data (which will be handy for back-filling and for development), will allow us to share SQLAlchemy models when we add a database for metadata (#300), and allows us to eliminate duplicate test fixtures.
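As a rough sketch of what a data-loading command exposed through `flask` could look like: the `etl` group, the `load` command, and the `load_diag_file` helper below are hypothetical names for illustration, not the ones used in this PR.

```python
import click
from flask import Flask
from flask.cli import AppGroup

app = Flask(__name__)

# Hypothetical command group; it would be invoked as `flask etl load <path>`.
etl_cli = AppGroup("etl", help="Load data into the application's store.")


def load_diag_file(path: str) -> int:
    """Placeholder for the real ETL entry point; returns a record count."""
    # Assumption: the real pipeline would parse the file and write it out.
    return 0


@etl_cli.command("load")
@click.argument("path")
def load(path: str) -> None:
    """Run the ETL pipeline on a single file."""
    count = load_diag_file(path)
    click.echo(f"Loaded {count} records from {path}")


# Register the group so it shows up under the `flask` command.
app.cli.add_command(etl_cli)
```

Something like this would be handy for back-filling, since the same command can be run in a loop over older files.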

The tests fail because we have to update the dependencies and the fixtures.

Added missing dependencies needed by the ETL pipeline and modified some of the version requirements for existing dependencies to accommodate botocore and boto3. **Downgraded from Python 3.11 to 3.9.** The available Lambda docker images from Amazon only support Python 3.9, so we are downgrading the application Python version to 3.9 as well. We could, in the future, look into using Nox or Tox to test against multiple versions of Python to allow us to use 3.11 for the application, but I’m not sure we get much of a win, since we still have to target 3.9 which means we miss out on things like the new match statement anyway.
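For illustration, code that might otherwise reach for the `match` statement (Python 3.10+) can be written as a dictionary dispatch that also runs on 3.9. The kinds and handler names below are made up for the example and are not taken from the application.

```python
from typing import Callable, Dict


def describe_scalar(name: str) -> str:
    return f"{name}: scalar field"


def describe_vector(name: str) -> str:
    return f"{name}: vector field"


# Maps a kind to its handler; this plays the role of the `case` arms.
HANDLERS: Dict[str, Callable[[str], str]] = {
    "scalar": describe_scalar,
    "vector": describe_vector,
}


def describe(kind: str, name: str) -> str:
    """Dispatch on `kind` without using `match`, so it runs on Python 3.9."""
    handler = HANDLERS.get(kind)
    if handler is None:
        # Equivalent of a `case _:` fallback arm.
        raise ValueError(f"Unknown kind: {kind!r}")
    return handler(name)
```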

Update the test fixtures in conftest.py to accommodate the ETL tests. Notably, the diag_file fixture was copied over from the data service and updated to generate test data that matches our other test fixtures to get all tests passing. Other than that, the ETL tests were able to make use of the existing fixtures no problem.

Rename the Dockerfile for the Lambda function to Dockerfile-diag-etl in the API service. Update it to run the handler function from the unified_graphics package instead of ugdata. I’m pretty sure this is all we need to do to get this working. It feels a little bad installing the whole unified_graphics package in the container for the Lambda function, since it doesn’t need Flask or anything like that. Maybe we can look into groups in poetry to at least reduce the size of the dependencies.
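A minimal sketch of the kind of handler the container would run, assuming the function is triggered by S3 object-created events (this PR does not spell out the trigger). `lambda_handler` and `process_diag_file` are hypothetical names, not the actual functions in `unified_graphics`.

```python
from typing import Any, Dict, List
from urllib.parse import unquote_plus


def process_diag_file(bucket: str, key: str) -> str:
    """Hypothetical stand-in for the ETL step; returns the object's URI."""
    return f"s3://{bucket}/{key}"


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, List[str]]:
    """Run the ETL step for each S3 object named in the event."""
    processed: List[str] = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        processed.append(process_diag_file(bucket, key))
    return {"processed": processed}
```

Nothing in a handler shaped like this needs Flask, which is why dependency groups could help trim the image.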

Copy over the dockerignore patterns from data/, just to make sure we aren’t wasting a lot of time copying Python bytecode and temporary files over to the docker containers.

Now that we have moved the ETL pipeline into the same package as the Flask application, we have no real need for the services directory. All of the files from services/api have been moved into the root of the project now.

ian-noaa · 2023-05-05T20:29:25Z

This looks like a good change. A couple of thoughts:

- AWS now has a Python 3.10 image available for Lambdas, if we want to bump up to 3.10. https://docs.aws.amazon.com/lambda/latest/dg/python-image.html#python-image-base
- We'll want to downgrade the application's Dockerfile to use Python 3.9/3.10 instead of the current 3.11.
- I think the maximum Lambda image size is 10 GB (uncompressed). If that does become an issue, looking into Poetry's dependency groups would make some sense.

Moving all of the Dockerfiles into a separate directory. This allows us to keep some extra files for things like our nginx config separate from other source files.

Now that the nginx config has been moved into the Docker-specific directory, the COPY command building the image needs to change.

esheehan-gsl · 2023-05-08T14:10:53Z

@ian-noaa How do we distinguish between the container for the Lambda function and the container for the Flask app? It looks like they both get tagged initially with the same tag:

unified-graphics/.github/workflows/data.yaml

Line 115 in 145e807

docker build -t ${{ env.REGISTRY }}:${{ env.BRANCH }} services/data

Are we banking on never making changes to both containers in the same PR?

EDIT: Never mind, I figured it out.

.github/workflows/api.yaml

Having consolidated the code for both the ETL pipeline and the app itself, there’s less value in having two separate workflows. They will trigger on basically the same code changes, and the test suites are combined, so running the tests twice in different pipelines will be gratuitous. I created two build jobs and two scan jobs—one for each container. The deploy job pushes both containers to the AWS registry only if they both pass their scans (so that we don’t deploy one without the other, since they will end up sharing data models).
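The deploy gate amounts to an all-or-nothing check across the two scans. In the workflow itself this would normally be expressed through the deploy job's `needs` list; the Python below is only a sketch of the logic, with hypothetical job names.

```python
from typing import Mapping, Sequence


def should_deploy(
    scan_passed: Mapping[str, bool],
    required: Sequence[str] = ("app", "etl"),
) -> bool:
    """Deploy only when every required container passed its scan.

    A missing result counts as a failure, so one image is never
    pushed without the other.
    """
    return all(scan_passed.get(name, False) for name in required)
```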

Move the ETL pipeline image cleanup into the same workflow that cleans up the application image, since they run on the same triggers now.

Now we’ve got a few more things to ignore because we’re running prettier from the root of the project, not a subdirectory.

Because we’ve moved all of our code to the project root and are now running checks there, we’re catching issues in non-application code, like the import order of our GitHub scripts. I see no harm in applying the same formatting requirements to our tools that we apply to the application, so I went ahead and sorted the imports.

esheehan-gsl · 2023-05-09T13:04:47Z

@ian-noaa I think this is ready for review

github-actions · 2023-05-09T13:59:24Z

| Package | Line Rate | Branch Rate | Health |
| --- | --- | --- | --- |
| unified_graphics | 87% | 77% | ✔ |
| unified_graphics.etl | 95% | 94% | ✔ |
| Summary | 89% (325 / 364) | 83% (80 / 96) | ✔ |

Minimum allowed line rate is 60%

ian-noaa

Looks good to me. Nice catch on the CONTRIBUTING readme. I always forget about that one.

ian-noaa · 2023-05-09T17:53:38Z

.prettierignore

+# FIXME: Maybe we should lint/format the k8s files?
+kubernetes/


I'll spin that out into an issue. It'd be good to do.

We now have an issue for this here: #328

esheehan-gsl added 9 commits May 5, 2023 08:52

Move ETL tests over to the api service

6ff62da

The tests fail because we have to update the dependencies and the fixtures.

Remove the data service

a59d21f

Configure coverage to ignore tests/

d0e0c3b

Copy over the dockerignore patterns from data/

6394725

Just to make sure we aren’t wasting a lot of time copying Python bytecode and temporary files over to the docker containers.

Eliminate the services/ directory

145e807

Now that we have moved the ETL pipeline into the same package as the Flask application, we have no real need for the services directory. All of the files from services/api have been moved into the root of the project now.

esheehan-gsl linked an issue May 5, 2023 that may be closed by this pull request

Consolidate data and app services #241

Closed

esheehan-gsl added 3 commits May 8, 2023 07:20

Move Docker files

9ba23e2

Moving all of the Dockerfiles into a separate directory. This allows us to keep some extra files for things like our nginx config separate from other source files.

Downgrade app container to Python 3.9 for now

fd3d50b

Update the COPY command for the webserver config

a98d6d4

Now that the nginx config has been moved into the Docker-specific directory, the COPY command building the image needs to change.

esheehan-gsl temporarily deployed to vlab May 8, 2023 15:13 — with GitHub Actions Inactive

esheehan-gsl temporarily deployed to vlab May 8, 2023 15:20 — with GitHub Actions Inactive

ian-noaa reviewed May 8, 2023

esheehan-gsl temporarily deployed to vlab May 9, 2023 12:57 — with GitHub Actions Inactive

esheehan-gsl added 6 commits May 9, 2023 06:59

Consolidate the image cleanup workflows

1fa1f4b

Move the ETL pipeline image cleanup into the same workflow that cleans up the application image, since they run on the same triggers now.

Update the nginx container workflow

d74c846

Update triggers for nginx container cleanup

bfdb087

Update prettierignore for new root directory

63243a4

Now we’ve got a few more things to ignore because we’re running prettier from the root of the project, not a subdirectory.

esheehan-gsl force-pushed the 241-consolidate-data-and-app-services branch from 8af0a7a to 10c0540 Compare May 9, 2023 12:59

esheehan-gsl temporarily deployed to vlab May 9, 2023 13:02 — with GitHub Actions Inactive

esheehan-gsl self-assigned this May 9, 2023

esheehan-gsl marked this pull request as ready for review May 9, 2023 13:04

esheehan-gsl temporarily deployed to vlab May 9, 2023 13:06 — with GitHub Actions Inactive

Update CONTRIBUTING docs

17bccb6

esheehan-gsl temporarily deployed to vlab May 9, 2023 14:00 — with GitHub Actions Inactive

esheehan-gsl temporarily deployed to vlab May 9, 2023 14:04 — with GitHub Actions Inactive

ian-noaa approved these changes May 9, 2023

ian-noaa reviewed May 9, 2023

esheehan-gsl merged commit 7258fe2 into main May 9, 2023

esheehan-gsl deleted the 241-consolidate-data-and-app-services branch May 9, 2023 18:57

esheehan-gsl temporarily deployed to vlab May 9, 2023 18:57 — with GitHub Actions Inactive
