
feat: add draft next-gen Python workflows #25

Draft · wants to merge 1 commit into base: master
Conversation


@dmyersturnbull dmyersturnbull commented Jul 12, 2024

Python CI/CD overhaul RO-4317

This CommonMark isn't being rendered correctly.
Sorry for the inexplicable line breaks.


This is a draft!
Some of the workflows have been tested (successfully), but others still need work.
These are so time-consuming to test and fix that I'll put them on hold for a few weeks.
If you have time, reviews and/or bugfixes would be helpful -- I'm still very much not a GH workflows expert.
Alternatively, you can wait until I've gotten them mostly working.

Not a great description; will improve.


Scope

Interdependent companion to RO-4311.

Supported by rules in pyproject.toml, .prettierrc, and pre-commit, along with a very modern build.
If pre-commit is installed as instructed, auto-formatting runs on every commit;
it also always runs in the GitHub workflows.
That setup is correct and working fine.
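For context, a minimal `.pre-commit-config.yaml` along these lines would drive the per-commit auto-formatting. The hook set and pinned revisions here are illustrative assumptions, not necessarily what this PR's companion configuration uses:

```yaml
# Illustrative sketch only; the real hook set lives in the companion config (RO-4311).
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.5.0
    hooks:
      - id: ruff-format   # auto-format Python
      - id: ruff          # lint Python, with autofix
        args: [--fix]
  - repo: https://github.com/pre-commit/mirrors-prettier
    rev: v3.1.0
    hooks:
      - id: prettier      # format Markdown, YAML, JSON per .prettierrc
```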

These workflows

GH settings:
GitHub settings will need to be adjusted per project (but not in this repo).
There is a GitHub app that can apply those changes via a settings.yaml, which I've tested.
(Alternatively, the changes can be made via calls to the GitHub GraphQL API.)

Workflows and behaviors:

  • Auto-formats Python, Markdown, TOML, and others before merge.
  • Runs tests before PR merge and by /please-test and /please-check-* PR comments.
    These use status checks, which should be marked as required in GitHub settings.
  • Tests over a matrix of Python version and OS (must pass to merge).
  • Tests for security issues (must pass to merge).
  • Requires that PR titles, PR descriptions, and commit messages follow conventions
    (a subset of Conventional Commits); must pass to merge.
  • Lints, generating warnings in status check descriptions for minor problems.
  • Sends coverage reports to coveralls or codecov.io.
  • Deploys versioned documents, including code API docs, to GitHub Pages.
  • Deploys new versions to PyPI, container registries,
    and/or GitHub Releases with elegant auto-generated release notes.
  • Provides a manually triggered "bump" workflow.
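As a rough sketch of the shape these reusable workflows take (file, job, and input names here are illustrative, not the actual ones in this PR), each exposes `workflow_call` so project repos can invoke it:

```yaml
# Illustrative sketch of a reusable workflow; names and steps are assumptions.
name: test
on:
  workflow_call:
    inputs:
      ref:
        type: string
        required: true
jobs:
  pytest:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ inputs.ref }}
      - uses: actions/setup-python@v5
        with:
          python-version: '3.13'
      - name: Run tests
        run: |
          pip install hatch~=1.12
          hatch run test   # hypothetical Hatch script name
```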

@henrychao-rcsb (Contributor) left a comment
Did a quick review of the PR as-is. Overall, it looks like there's a lot of good functionality already built out, but there are also quite a few pieces I'm not sure we'll need, or that would require supporting additional platform integrations (specifically Coveralls).

My suggestion is to build this out incrementally rather than with large commits and review cycles, especially as we'll need to sync on design throughout the process. If you feel that you have enough of the functionality defined to use as a minimum viable workflow, then let's schedule some time to figure out naming and file structures in more detail, to match the rest of the library files.

draft/python/_py-test.yaml

# This workflow is designed to run various types of tests and analyses on the codebase.
# It is triggered from other workflows using `workflow_call`, with specific ref inputs.
# Different jobs cover tests across multiple operating systems (Ubuntu, Windows, macOS) and Python versions.
@henrychao-rcsb (Contributor):
As we do not have CI/CD runners for Windows and macOS environments right now, this may not be possible. Is this a strict need for RCSB Python projects?

@dmyersturnbull (Author):
I definitely think it's ideal for packages we intend for wider audiences. How hard would this be?

@henrychao-rcsb (Contributor):
It's not a concern on difficulty, it's a question of effort versus value. When we get to a point where we want to start distributing and maintaining packages for more environments, we can then evaluate this integration.


preliminary:
name: Start Markdown table
runs-on: ubuntu-latest
@henrychao-rcsb (Contributor):
FYI, the current process is to create and define Docker images for runners within the https://github.com/rcsb/devops-runner-images repo, associate them with a label to runners within our Kubernetes cluster (https://github.com/rcsb/rcsb-kube/blob/master/helm/gh-actions-runner-controller/values/west.yaml#L45), then refer to them in jobs by their label. We'll want to avoid or minimize the use of GitHub hosted runners as much as possible.

@dmyersturnbull (Author):
What would be a good way to generate textual reports (like this Markdown table)?

@henrychao-rcsb (Contributor):
This would probably be the easier option: https://github.blog/news-insights/product-news/supercharging-github-actions-with-job-summaries/. Otherwise, building a Markdown file as a build artifact could work, provided you use an IDE or some other tool to view the Markdown content.
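The job-summaries feature boils down to appending Markdown to the file named by `GITHUB_STEP_SUMMARY`, which the Actions runner sets for each step. A minimal sketch (the table contents are illustrative; the fallback to a local file is only so the snippet can be tried outside of Actions):

```shell
# Append a small Markdown table to the job summary.
# GITHUB_STEP_SUMMARY is provided by the Actions runner; fall back to a
# local file so this can be run outside of a workflow for testing.
SUMMARY="${GITHUB_STEP_SUMMARY:-summary.md}"
{
  echo "### Test results"
  echo ""
  echo "| OS | Python | Result |"
  echo "|----|--------|--------|"
  echo "| ubuntu-latest | 3.13 | pass |"
} >> "$SUMMARY"
cat "$SUMMARY"
```

GitHub renders whatever lands in that file on the workflow run's summary page, so no artifact download or extra viewer is needed.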

- name: Install dependencies and run tests
id: test
run: |
pip install hatch~=1.12
@henrychao-rcsb (Contributor):
Related to another comment in this review: package installs will be handled in the Docker runner image build process. This way, we can avoid running the same package install commands multiple times across different jobs, and ensure that the package versions used are consistent throughout the pipeline.

@dmyersturnbull (Author):
Checking that I understand. Currently, multiple _test.yaml jobs install Python, build the main package (say, pkg), and download and install pkg's dependencies along with some extra packages:

  • pytest (builds pkg; installs dependencies, Hatch, and test frameworks)
  • docker (builds pkg; installs dependencies and Hatch)
  • docs (installs Hatch and doc-building tools)
  • security and codeql also set up Python

The pytest and docker redundancy isn't great. We could run tests inside Docker to avoid it. The downside is with testing non-containerized RCSB Python packages (i.e. libraries, clients, etc.): Currently, excluding a Dockerfile disables that step, while tests are still run. If we move tests to Docker, things are less straightforward.

As for having to build separately for all 4 jobs -- if I understand correctly, you're proposing a solution for this?

@henrychao-rcsb (Contributor):
hatch, test frameworks, and doc-building tools would be installed as part of the Python runner image. When the pipeline runs for pkg, it will pull and use the Python runner image and avoid having to download and install those common build tools. It will only have to install pkg's specific dependencies and build pkg.
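A sketch of what such a runner image might preinstall. The base image, tool choices, and version pins here are all illustrative assumptions, not the contents of the actual devops-runner-images definitions:

```dockerfile
# Illustrative sketch of a Python runner image; base image and pins are assumptions.
FROM python:3.12-slim

# Preinstall common build tools once, so individual jobs don't reinstall them.
RUN pip install --no-cache-dir \
    "hatch~=1.12" \
    "pytest~=8.2" \
    "mkdocs~=1.6"

# Jobs then only install the package under test and its own dependencies.
```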

Comment on lines +69 to +71
- uses: actions/setup-python@v5
with:
python-version: ${{ matrix.python }}
@henrychao-rcsb (Contributor):
Hmm, do you envision that the pipeline will need to support multiple Python versions within the same run? If so, we'll want to look into something like pyenv within the containerized runner environment, rather than setting up Python on each run.

@dmyersturnbull (Author):
No hard objection, and that would undoubtedly be faster, but pyenv alone is limited, and combining a matrix with pyenv gets complicated pretty quickly. For instance, to test against both FastAPI v0.x.x and v1.x.x, where v1 requires Python > 3.10:

strategy:
  matrix:
    os:
      - ubuntu-latest
      - windows-latest
      - macos-latest
    python:
      - '3.10'
      - '3.13'
    fastapi-major:
      - '0'
      - '1'
    exclude:
      # FastAPI v1 requires Python > 3.10, so drop that combination.
      - python: '3.10'
        fastapi-major: '1'

@henrychao-rcsb (Contributor):
If we've reached this point for the FastAPI example, then more likely than not the matrix would be defined within the workflow of the specific project, which would then call the workflow in this library, passing the appropriate inputs for a build of a specific Python version and dependency set.
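In that arrangement, a project repo might call the reusable workflow roughly like this. The workflow path, input names, and matrix entries are hypothetical, introduced only to illustrate the caller-defined-matrix idea:

```yaml
# Hypothetical caller in a project repo; path and input names are assumptions.
name: test-matrix
on: [pull_request]
jobs:
  test:
    strategy:
      matrix:
        include:
          - python: '3.10'
            fastapi-major: '0'
          - python: '3.13'
            fastapi-major: '1'
    uses: rcsb/devops-cicd-github-actions/.github/workflows/_py-test.yaml@master
    with:
      python-version: ${{ matrix.python }}
      fastapi-major: ${{ matrix['fastapi-major'] }}
```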

hatch publish
env:
HATCH_INDEX_USER: __token__
HATCH_INDEX_AUTH: ${{ secrets.PYPI_TOKEN }}
@henrychao-rcsb (Contributor):
Secrets management is a bit more of a complicated setup. Because we are using the free-tier GitHub organization, we instead pull secret values from our HashiCorp Vault platform and include them in the runner deployment setup in Kubernetes. We can walk through that in more detail once we're ready to start deploying these workflows.

HATCH_INDEX_AUTH: ${{ secrets.PYPI_TOKEN }}
if: secrets.PYPI_TOKEN != ''

publish-docker:
@henrychao-rcsb (Contributor):
We have other library workflows already for building and pushing Docker images to our internal Harbor container registry (https://github.com/rcsb/devops-cicd-github-actions/blob/master/.github/workflows/workflow-docker.yaml), so we'll have to sync and see what can be reused or changed to support the Python workflows.

git commit -m "style: auto-format"
continue-on-error: true

- name: Push changes
@henrychao-rcsb (Contributor):
Leaving this more of a reminder for myself, but we'll definitely want to review auto-commits from the pipeline in more detail.

draft/python/py-update-coverage.yaml
draft/python/py_bump-version.yaml

dmyersturnbull commented Jul 24, 2024

Thanks for reviewing!

I opened this PR to facilitate review, with no plan for it being merged.
For real PRs, I plan to add the files to .github/workflows/ and modify them to use only workflow_call.

My suggestion is to build this out incrementally rather than have large commits

I'll use one PR per workflow. They're all independent, except for test, which update-coverage depends on. (Its other dependent, test-pr, belongs in project repos, not here.)

there's also quite a few which I'm not sure if they will be needed or if we'll have to support additional platform integrations (specifically Coveralls)

Some features are simply retained from old workflows (copied from another repo). Some we can and should drop, but others I'd like to keep. For example, I like Codecov and Coveralls; those tasks are simply skipped unless secrets for them are provided.

  • Container registry targets are also opt-in, though this functionality is a candidate for removal.
  • Documentation publishing is also opt-in: repos that don't want it simply omit the workflow.
  • PR title/label rules are only enforced if the check-pr-metadata workflow is included.
  • deploy-packages creates a GH Release and autogenerates release notes according to release.yaml. If the PR labels aren't used, the generated release notes will be bad, but that's also true without autogenerated release notes.
  • There is currently no way to opt out of GH Release generation and publishing to PyPI in deploy-packages, though I could fix that.

My view is that we should drop anything that won't be used, but use a skip-if-not-configured approach for features that only some repos will use. I prefer this over fine-grained workflows because the logic lives in one place (here) rather than being copied, with modifications, across many project repos. Though I don't object to splitting large workflows within this repo, e.g. test delegating to test-docs, test-..., etc.
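The skip-if-not-configured approach can be sketched like this (secret, job, and step names are illustrative). Mapping the secret into a job-level env var lets the step's `if` check whether it was provided:

```yaml
# Illustrative sketch: the coverage upload runs only when the secret is configured.
jobs:
  coverage:
    runs-on: ubuntu-latest
    env:
      COVERALLS_TOKEN: ${{ secrets.COVERALLS_TOKEN }}
    steps:
      - name: Upload coverage to Coveralls
        if: env.COVERALLS_TOKEN != ''   # skipped, not failed, when unset
        run: echo "Uploading coverage..."  # real upload step would go here
```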
