Move the ETL (extract, transform, load) modules from the data service to unified_graphics.etl to consolidate all of our applications. This will make it possible to expose command-line tools in the application via the `flask` command to load data (which will be handy for back-filling and for development), allow us to share SQLAlchemy models when we add a database for metadata (#300), and let us eliminate duplicate test fixtures.
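For concreteness, a loader exposed through the `flask` command could look roughly like the sketch below. Flask's CLI is built on click, so this uses click directly; the command name `load-diag` and its argument are hypothetical, not taken from the actual codebase (in the app this would be registered with `@app.cli.command()`).

```python
import click
from click.testing import CliRunner

# Hypothetical CLI command for back-filling diagnostics data.
# In the Flask app this decorator would be @app.cli.command("load-diag").
@click.command("load-diag")
@click.argument("path")
def load_diag(path):
    """Load diagnostics data from PATH (illustrative only)."""
    click.echo(f"loading {path}")
```

A command like this would then be available as `flask load-diag <path>` during development and for back-fills.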
The tests fail because we have to update the dependencies and the fixtures.
Added missing dependencies needed by the ETL pipeline and modified some of the version requirements for existing dependencies to accommodate botocore and boto3. **Downgraded from Python 3.11 to 3.9.** The available Lambda docker images from Amazon only support Python 3.9, so we are downgrading the application Python version to 3.9 as well. We could, in the future, look into using Nox or Tox to test against multiple versions of Python to allow us to use 3.11 for the application, but I’m not sure we get much of a win, since we still have to target 3.9 which means we miss out on things like the new match statement anyway.
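As a concrete example of what targeting 3.9 costs us: structural pattern matching (the `match` statement, added in 3.10) is unavailable, so dispatch code stays in the `if`/`elif` style. The status codes below are purely illustrative, not application code.

```python
# Python 3.9-compatible dispatch: the `match` statement (3.10+) is
# off the table, so we fall back to an if/elif chain.
def describe_status(code: int) -> str:
    if code == 200:
        return "ok"
    elif code == 404:
        return "not found"
    elif 500 <= code < 600:
        return "server error"
    return "unknown"
```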
Update the test fixtures in conftest.py to accommodate the ETL tests. Notably, the diag_file fixture was copied over from the data service and updated to generate test data that matches our other fixtures, so all tests now pass. Beyond that, the ETL tests were able to use the existing fixtures without issue.
Rename the Dockerfile for the Lambda function to Dockerfile-diag-etl in the API service. Update it to run the handler function from the unified_graphics package instead of ugdata. I’m pretty sure this is all we need to do to get this working. It feels a little bad installing the whole unified_graphics package in the container for the Lambda function, since it doesn’t need Flask or anything like that. Maybe we can look into groups in poetry to at least reduce the size of the dependencies.
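If we do explore Poetry dependency groups, the split might look something like this in `pyproject.toml`. The group name and package list are illustrative, not copied from the actual project file.

```toml
[tool.poetry.dependencies]
python = "^3.9"
boto3 = "*"

# Hypothetical optional group for the web app's dependencies, so the
# Lambda image can skip them.
[tool.poetry.group.web]
optional = true

[tool.poetry.group.web.dependencies]
flask = "*"
```

The Lambda Dockerfile could then run `poetry install --without web` to leave Flask and friends out of that image.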
Just to make sure we aren’t wasting a lot of time copying Python bytecode and temporary files over to the docker containers.
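A minimal `.dockerignore` covering those cases might look like the following; the exact entries are illustrative rather than taken from the repository.

```
# Python bytecode and caches
__pycache__/
*.pyc
.pytest_cache/

# Local environments and temporary files
.venv/
*.tmp
```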
Now that we have moved the ETL pipeline into the same package as the Flask application, we have no real need for the services directory. All of the files from services/api have been moved into the root of the project now.
This looks like a good change. A couple of thoughts:
Moving all of the Dockerfiles into a separate directory. This allows us to keep some extra files for things like our nginx config separate from other source files.
Now that the nginx config has been moved into the Docker-specific directory, the COPY command building the image needs to change.
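Assuming the nginx config now lives under the Docker-specific directory, the change to the COPY instruction is along these lines; the paths are hypothetical, not copied from the actual Dockerfile.

```dockerfile
# Before: nginx.conf lived next to the Dockerfile at the project root
# COPY nginx.conf /etc/nginx/nginx.conf

# After: the config has moved into the Docker-specific directory
COPY docker/nginx.conf /etc/nginx/nginx.conf
```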
@ian-noaa How do we distinguish between the container for the Lambda function and the container for the Flask app? It looks like they both get tagged initially with the same tag (`.github/workflows/data.yaml`, line 115 at 145e807).
Are we banking on never making changes to both containers in the same PR? EDIT: Never mind, I figured it out.
Having consolidated the code for the ETL pipeline and the app itself, there's less value in having two separate workflows: they trigger on essentially the same code changes, and the test suites are combined, so running the tests twice in different pipelines would be redundant. I created two build jobs and two scan jobs, one for each container. The deploy job pushes both containers to the AWS registry only if both pass their scans, so that we don't deploy one without the other, since they will end up sharing data models.
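Sketched as a single workflow, the job graph could look like this. Job and step names are illustrative, not taken from the actual workflow file.

```yaml
jobs:
  build-app:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # ... build and push the Flask app image to a staging tag

  build-etl:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # ... build and push the Lambda ETL image to a staging tag

  scan-app:
    needs: build-app
    runs-on: ubuntu-latest
    steps: []  # ... vulnerability scan of the app image

  scan-etl:
    needs: build-etl
    runs-on: ubuntu-latest
    steps: []  # ... vulnerability scan of the ETL image

  deploy:
    # Both scans must pass before either image is pushed to AWS,
    # so we never deploy one container without the other.
    needs: [scan-app, scan-etl]
    runs-on: ubuntu-latest
    steps: []  # ... push both images to the registry
```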
Move the ETL pipeline image cleanup into the same workflow that cleans up the application image, since they run on the same triggers now.
Now we’ve got a few more things to ignore because we’re running prettier from the root of the project, not a subdirectory.
Because we’ve moved all of our code to the project root and are now running checks there, we’re catching issues in non-application code, like the import order of our GitHub scripts. I see no harm in applying the same formatting requirements to our tools that we apply to the application, so I went ahead and sorted the imports.
@ian-noaa I think this is ready for review
Looks good to me. Nice catch on the CONTRIBUTING readme. I always forget about that one.
```
# FIXME: Maybe we should lint/format the k8s files?
kubernetes/
```
I'll spin that out into an issue. It'd be good to do.
We now have an issue for this here: #328