Skip to content
This repository has been archived by the owner on Sep 30, 2024. It is now read-only.

241 consolidate data and app services #326

Merged
merged 19 commits into from
May 9, 2023

Conversation

esheehan-gsl
Copy link
Contributor

No description provided.

Move the ETL (extract, transform, load) modules from the data service to
unified_graphics.etl to consolidate all of our applications. This will
make it possible to expose command line tools in the application via the
`flask` command to load data (which will be handy for back-filling and
for development), will allow us to share SQLAlchemy models when we add a
database for metadata (#300), and allows us to eliminate duplicate test
fixtures.
The tests fail because we have to update the dependencies and the
fixtures.
Added missing dependencies needed by the ETL pipeline and modified some
of the version requirements for existing dependencies to accommodate
botocore and boto3.

**Downgraded from Python 3.11 to 3.9.** The available Lambda docker
images from Amazon only support Python 3.9, so we are downgrading the
application Python version to 3.9 as well. We could, in the future, look
into using Nox or Tox to test against multiple versions of Python to
allow us to use 3.11 for the application, but I’m not sure we get much
of a win, since we still have to target 3.9 which means we miss out on
things like the new match statement anyway.
Update the test fixtures in conftest.py to accommodate the ETL tests.
Notably, the diag_file fixture was copied over from the data service and
updated to generate test data that matches our other test fixtures to
get all tests passing. Other than that, the ETL tests were able to make
use of the existing fixtures no problem.
Rename the Dockerfile for the Lambda function to Dockerfile-diag-etl in
the API service. Update it to run the handler function from the
unified_graphics package instead of ugdata.

I’m pretty sure this is all we need to do to get this working. It feels
a little bad installing the whole unified_graphics package in the
container for the Lambda function, since it doesn’t need Flask or
anything like that. Maybe we can look into groups in poetry to at least
reduce the size of the dependencies.
Just to make sure we aren’t wasting a lot of time copying Python bytecode
and temporary files over to the docker containers.
Now that we have moved the ETL pipeline into the same package as the
Flask application, we have no real need for the services directory. All
of the files from services/api have been moved into the root of the
project now.
@esheehan-gsl esheehan-gsl linked an issue May 5, 2023 that may be closed by this pull request
@ian-noaa
Copy link
Collaborator

ian-noaa commented May 5, 2023

This looks like a good change. A couple of thoughts:

  1. AWS now has a Python 3.10 image available for Lambdas if we wanted to bump up to 3.10. https://docs.aws.amazon.com/lambda/latest/dg/python-image.html#python-image-base
  2. We'll want to downgrade the application's Dockerfile to use Python 3.9/3.10 instead of the current 3.11.
  3. I think the maximum Lambda image size is 10 GB (uncompressed) - if that does become an issue, looking into Poetry's dependency groups would make some sense.

Moving all of the Dockerfiles into a separate directory. This allows us
to keep some extra files for things like our nginx config separate from
other source files.
Now that the nginx config has been moved into the Docker-specific
directory, the COPY command building the image needs to change.
@esheehan-gsl
Copy link
Contributor Author

esheehan-gsl commented May 8, 2023

@ian-noaa How do we distinguish between container for the Lambda function and the container for the Flask app? It looks like they both get tagged initially with the same tag:

docker build -t ${{ env.REGISTRY }}:${{ env.BRANCH }} services/data

Are we banking on never making changes to both containers in the same PR?

EDIT: Nevermind, I figured it out.

@esheehan-gsl esheehan-gsl temporarily deployed to vlab May 8, 2023 15:13 — with GitHub Actions Inactive
@esheehan-gsl esheehan-gsl temporarily deployed to vlab May 8, 2023 15:20 — with GitHub Actions Inactive
@esheehan-gsl esheehan-gsl temporarily deployed to vlab May 9, 2023 12:57 — with GitHub Actions Inactive
Having consolidated the code for both the ETL pipeline and the app
itself, there’s less value in having two separate workflows. They will
trigger on basically the same code changes, and the test suites are
combined, so running the tests twice in different pipelines will be
gratuitous.

I created two build jobs and two scan jobs—one for each container. The
deploy job pushes both containers to the AWS registry only if they both
pass their scans (so that we don’t deploy one without the other, since
they will end up sharing data models).
Move the ETL pipeline image cleanup into the same workflow that cleans
up the application image, since they run on the same triggers now.
Now we’ve got a few more things to ignore because we’re running prettier
from the root of the project, not a subdirectory.
Because we’ve moved all of our code to the project root and are now
running checks there, we’re catching issues in non-application code,
like the import order of our GitHub scripts. I see no harm in applying
the same formatting requirements to our tools that we apply to the
application, so I went ahead and sorted the imports.
@esheehan-gsl esheehan-gsl force-pushed the 241-consolidate-data-and-app-services branch from 8af0a7a to 10c0540 Compare May 9, 2023 12:59
@esheehan-gsl esheehan-gsl temporarily deployed to vlab May 9, 2023 13:02 — with GitHub Actions Inactive
@esheehan-gsl esheehan-gsl self-assigned this May 9, 2023
@esheehan-gsl esheehan-gsl marked this pull request as ready for review May 9, 2023 13:04
@esheehan-gsl
Copy link
Contributor Author

@ian-noaa I think this is ready for review

@esheehan-gsl esheehan-gsl temporarily deployed to vlab May 9, 2023 13:06 — with GitHub Actions Inactive
@github-actions
Copy link

github-actions bot commented May 9, 2023

Code Coverage

Package Line Rate Branch Rate Health
unified_graphics 87% 77%
unified_graphics.etl 95% 94%
Summary 89% (325 / 364) 83% (80 / 96)

Minimum allowed line rate is 60%

@esheehan-gsl esheehan-gsl temporarily deployed to vlab May 9, 2023 14:00 — with GitHub Actions Inactive
@esheehan-gsl esheehan-gsl temporarily deployed to vlab May 9, 2023 14:04 — with GitHub Actions Inactive
Copy link
Collaborator

@ian-noaa ian-noaa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me. Nice catch on the CONTRIBUTING readme. I always forget about that one.

Comment on lines +4 to +5
# FIXME: Maybe we should lint/format the k8s files?
kubernetes/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll spin that out into an issue. It'd be good to do.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now have an issue for this here: #328

@esheehan-gsl esheehan-gsl merged commit 7258fe2 into main May 9, 2023
@esheehan-gsl esheehan-gsl deleted the 241-consolidate-data-and-app-services branch May 9, 2023 18:57
@esheehan-gsl esheehan-gsl temporarily deployed to vlab May 9, 2023 18:57 — with GitHub Actions Inactive
@esheehan-gsl esheehan-gsl temporarily deployed to vlab May 9, 2023 18:57 — with GitHub Actions Inactive
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Consolidate data and app services
2 participants