
feat: add draft next-gen Python workflows #25

Draft · wants to merge 1 commit into base: master
Conversation


@dmyersturnbull dmyersturnbull commented Jul 12, 2024

Python CI/CD overhaul RO-4317

This CommonMark isn't being rendered correctly.
Sorry for the inexplicable line breaks.


This is a draft!
Some of the workflows have been tested (successfully), but others still need work.
These are so time-consuming to test and fix that I'll put them on hold for a few weeks.
If you have time, reviews and/or bugfixes would be helpful -- I'm still very much not a GH workflows expert.
Alternatively, you can wait until I've gotten them mostly working.

Not a great description; will improve.


Scope

Interdependent companion to RO-4311.

Supported by rules in pyproject.toml, .prettierrc, and pre-commit, along with a very modern build.
If pre-commit is installed as instructed, auto-formatting runs on every commit;
it also always runs in the GitHub workflows.
That setup is correct and working fine.
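For context, a minimal `.pre-commit-config.yaml` along these lines would drive the per-commit auto-formatting. The hook set and pinned revisions here are illustrative assumptions, not necessarily what this PR's companion configuration uses:

```yaml
# Illustrative sketch only; the real hook set lives in the companion config (RO-4311).
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.0
    hooks:
      - id: ruff-format   # auto-format Python
      - id: ruff          # lint Python, with autofix
        args: [--fix]
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v3.1.0
    hooks:
      - id: prettier      # format Markdown, YAML, JSON per .prettierrc
```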

These workflows

GH settings:
GitHub settings will need to be adjusted per project (but not in this repo).
There is a GitHub app that can apply those changes via a settings.yaml, which I've tested.
(Alternatively, the changes can be made via calls to the GitHub GraphQL API.)

Workflows and behaviors:

  • Auto-formats Python, Markdown, TOML, and others before merge.
  • Runs tests before PR merge and by /please-test and /please-check-* PR comments.
    These use status checks, which should be marked as required in GitHub settings.
  • Tests over a matrix of Python version and OS (must pass to merge).
  • Tests for security issues (must pass to merge).
  • Requires that PR titles, PR descriptions, and commit messages follow conventions
    (a subset of Conventional Commits); must pass to merge.
  • Lints, generating warnings in status check descriptions for minor problems.
  • Sends coverage reports to coveralls or codecov.io.
  • Deploys versioned documents, including code API docs, to GitHub Pages.
  • Deploys new versions to PyPI, container registries,
    and/or GitHub Releases with elegant auto-generated release notes.
  • Provides a manually triggered "bump" workflow.
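As a rough sketch of the shape these reusable workflows take (file, job, and input names here are illustrative, not the actual ones in this PR), each exposes `workflow_call` so project repos can invoke it:

```yaml
# Illustrative sketch of a reusable workflow; names and steps are assumptions.
name: test
on:
  workflow_call:
    inputs:
      ref:
        type: string
        required: true
jobs:
  pytest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      - name: Run tests
        run: |
          pip install hatch~=1.12
          hatch run test   # hypothetical Hatch script name
```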

@henrychao-rcsb (Contributor) left a comment
Did a quick review of the PR as-is. Overall, it looks like there's a lot of good functionality already built out, but there are also quite a few pieces I'm not sure we'll need, or that would require supporting additional platform integrations (specifically Coveralls).

My suggestion is to build this out incrementally rather than with large commits and review cycles, especially as we'll need to sync on design throughout the process. If you feel that you have enough of the functionality defined to use as a minimum viable workflow, then let's schedule some time to figure out naming and file structures in more detail, to match the rest of the library files.

draft/python/_py-test.yaml

# This workflow is designed to run various types of tests and analyses on the codebase.
# It is triggered from other workflows using `workflow_call`, with specific ref inputs.
# Different jobs cover tests across multiple operating systems (Ubuntu, Windows, macOS) and Python versions.
@henrychao-rcsb (Contributor):
As we do not have CI/CD runners for Windows and macOS environments right now, this may not be possible. Is this a strict need for RCSB Python projects?

@dmyersturnbull (Author):
I definitely think it's ideal for packages we intend for wider audiences. How hard would this be?

@henrychao-rcsb (Contributor):
It's not a concern on difficulty, it's a question of effort versus value. When we get to a point where we want to start distributing and maintaining packages for more environments, we can then evaluate this integration.


preliminary:
name: Start Markdown table
runs-on: ubuntu-latest
@henrychao-rcsb (Contributor):
FYI, the current process is to create and define Docker images for runners within the https://github.com/rcsb/devops-runner-images repo, associate them with a label to runners within our Kubernetes cluster (https://github.com/rcsb/rcsb-kube/blob/master/helm/gh-actions-runner-controller/values/west.yaml#L45), then refer to them in jobs by their label. We'll want to avoid or minimize the use of GitHub hosted runners as much as possible.

@dmyersturnbull (Author):
What would be a good way to generate textual reports (like this Markdown table)?

@henrychao-rcsb (Contributor):
This would probably be the easier option: https://github.blog/news-insights/product-news/supercharging-github-actions-with-job-summaries/. Otherwise, building a Markdown file as a build artifact could work, provided you use an IDE or some other tool to view the Markdown content.
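The job-summaries feature boils down to appending Markdown to the file named by `GITHUB_STEP_SUMMARY`, which the Actions runner sets for each step. A minimal sketch (the table contents are illustrative; the fallback to a local file is only so the snippet can be tried outside of Actions):

```shell
# Append a small Markdown table to the job summary.
# GITHUB_STEP_SUMMARY is provided by the Actions runner; fall back to a
# local file so this can be run outside of a workflow for testing.
SUMMARY="${GITHUB_STEP_SUMMARY:-summary.md}"
{
  echo "### Test results"
  echo ""
  echo "| OS | Python | Result |"
  echo "|----|--------|--------|"
  echo "| ubuntu-latest | 3.13 | pass |"
} >> "$SUMMARY"
cat "$SUMMARY"
```

GitHub renders whatever lands in that file on the workflow run's summary page, so no artifact download or extra viewer is needed.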

- name: Install dependencies and run tests
id: test
run: |
pip install hatch~=1.12
@henrychao-rcsb (Contributor):
Related to another comment in this review: package installs will be handled in the Docker runner image build process. This way, we can avoid running the same package install commands multiple times across different jobs, and ensure that the package versions used are consistent throughout the pipeline.

@dmyersturnbull (Author):
Checking that I understand. Currently, multiple _test.yaml jobs install Python, build the main package (say, pkg), and download and install pkg's dependencies along with some extra packages:

  • pytest (builds pkg; installs dependencies, Hatch, and test frameworks)
  • docker (builds pkg; installs dependencies and Hatch)
  • docs (installs Hatch and doc-building tools)
  • security and codeql also set up Python

The pytest and docker redundancy isn't great. We could run tests inside Docker to avoid it. The downside is with testing non-containerized RCSB Python packages (i.e. libraries, clients, etc.): Currently, excluding a Dockerfile disables that step, while tests are still run. If we move tests to Docker, things are less straightforward.

As for having to build separately for all 4 jobs -- if I understand correctly, you're proposing a solution for this?

@henrychao-rcsb (Contributor):
hatch, test frameworks, and doc-building tools would be installed as part of the Python runner image. When the pipeline runs for pkg, it will pull and use the Python runner image and avoid having to download and install those common build tools. It will only have to install pkg's specific dependencies and build pkg.
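A sketch of what such a runner image might preinstall. The base image, tool choices, and version pins here are all illustrative assumptions, not the contents of the actual devops-runner-images definitions:

```dockerfile
# Illustrative sketch of a Python runner image; base image and pins are assumptions.
FROM python:3.12-slim

# Preinstall common build tools once, so individual jobs don't reinstall them.
RUN pip install --no-cache-dir \
    "hatch~=1.12" \
    "pytest~=8.2" \
    "mkdocs~=1.6"

# Jobs then only install the package under test and its own dependencies.
```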

Comment on lines +69 to +71
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python }}
@henrychao-rcsb (Contributor):
Hmm, do you envision that the pipeline will need to support multiple Python versions within the same run? If so, we'll want to look into something like pyenv within the containerized runner environment, rather than setting up Python on each run.

@dmyersturnbull (Author):
No hard objection, and that would undoubtedly be faster, but pyenv alone is limited, and combining a matrix with pyenv gets complicated pretty quickly. For instance, to test against both FastAPI v0.x.x and v1.x.x, where v1 requires Python > 3.10:

strategy:
  matrix:
    os:
      - ubuntu-latest
      - windows-latest
      - macos-latest
    python:
      - '3.10'
      - '3.13'
    fastapi-major:
      - '0'
      - '1'
    exclude:
      # FastAPI v1 requires Python > 3.10, so drop that combination.
      - python: '3.10'
        fastapi-major: '1'

@henrychao-rcsb (Contributor):
If we've reached this point for the FastAPI example, then more likely than not the matrix would be defined within the workflow of the specific project, which would then call the workflow in this library, passing the appropriate inputs for a build of a specific Python version and dependency set.
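In that arrangement, a project repo might call the reusable workflow roughly like this. The workflow path, input names, and matrix entries are hypothetical, introduced only to illustrate the caller-defined-matrix idea:

```yaml
# Hypothetical caller in a project repo; path and input names are assumptions.
name: test-matrix
on: [pull_request]
jobs:
  test:
    strategy:
      matrix:
        include:
          - python: '3.10'
            fastapi-major: '0'
          - python: '3.13'
            fastapi-major: '1'
    uses: rcsb/devops-cicd-github-actions/.github/workflows/_py-test.yaml@master
    with:
      python-version: ${{ matrix.python }}
      fastapi-major: ${{ matrix['fastapi-major'] }}
```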

hatch publish
env:
HATCH_INDEX_USER: __token__
HATCH_INDEX_AUTH: ${{ secrets.PYPI_TOKEN }}
@henrychao-rcsb (Contributor):
Secrets management is a bit more of a complicated setup. Because we are using the free-tier GitHub organization, we instead pull secret values from our HashiCorp Vault platform and include them in the runner deployment setup in Kubernetes. We can walk through that in more detail once we're ready to start deploying these workflows.

HATCH_INDEX_AUTH: ${{ secrets.PYPI_TOKEN }}
if: secrets.PYPI_TOKEN != ''

publish-docker:
@henrychao-rcsb (Contributor):
We have other library workflows already for building and pushing Docker images to our internal Harbor container registry (https://github.com/rcsb/devops-cicd-github-actions/blob/master/.github/workflows/workflow-docker.yaml), so we'll have to sync and see what can be reused or changed to support the Python workflows.

git commit -m "style: auto-format"
continue-on-error: true

- name: Push changes
@henrychao-rcsb (Contributor):
Leaving this more of a reminder for myself, but we'll definitely want to review auto-commits from the pipeline in more detail.

draft/python/py-update-coverage.yaml
draft/python/py_bump-version.yaml

dmyersturnbull commented Jul 24, 2024

Thanks for reviewing!

I opened this PR to facilitate review, with no plan for it being merged.
For real PRs, I plan to add the files to .github/workflows/ and modify them to use only workflow_call.

My suggestion is to build this out incrementally rather than have large commits

I'll use one PR per workflow. They're all independent, except for test, which update-coverage depends on. (Its other dependent, test-pr, belongs in project repos, not here.)

there's also quite a few which I'm not sure if they will be needed or if we'll have to support additional platform integrations (specifically Coveralls)

Some features are simply retained from old workflows (copied from another repo). Some we can and should drop, but others I'd like to keep. For example, I like Codecov and Coveralls; those tasks are simply skipped unless secrets for them are provided.

  • Container registry targets are also opt-in, though this functionality is a candidate for removal.
  • Documentation publishing is also opt-in: repos that don't want it simply omit the workflow.
  • PR title/label rules are only enforced if the check-pr-metadata workflow is included.
  • deploy-packages creates a GH Release and autogenerates release notes according to release.yaml. If the PR labels aren't used, the generated release notes will be bad, but that's also true without autogenerated release notes.
  • There is currently no way to opt out of GH Release generation and publishing to PyPI in deploy-packages, though I could fix that.

My view is that we should drop anything that won't be used, but use a skip-if-not-configured approach for features that only some repos will use. I prefer this over fine-grained workflows because the logic lives in one place (here) rather than being copied, with modifications, across many project repos. Though I don't object to splitting large workflows within this repo, e.g. test delegating to test-docs, test-..., etc.
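The skip-if-not-configured approach can be sketched like this (secret, job, and step names are illustrative). Mapping the secret into a job-level env var lets the step's `if` check whether it was provided:

```yaml
# Illustrative sketch: the coverage upload runs only when the secret is configured.
jobs:
  coverage:
    runs-on: ubuntu-latest
    env:
      COVERALLS_TOKEN: ${{ secrets.COVERALLS_TOKEN }}
    steps:
      - name: Upload coverage to Coveralls
        if: env.COVERALLS_TOKEN != ''   # skipped, not failed, when unset
        run: echo "Uploading coverage..."  # real upload step would go here
```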
