Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

♻️ 💅 Cleanup/polish the code to automatically cascade images from the server -> server dependencies #1082

Open
14 tasks
shankari opened this issue Aug 12, 2024 · 15 comments

Comments

@shankari
Copy link
Contributor

shankari commented Aug 12, 2024

Copied over pending tasks from:
e-mission/e-mission-server#961 (comment)

  • change all the images to be stored in the emission dockerhub instead of my personal dockerhub account
    • even better, store them in github ghcr.io so that we don't run into issues with usage limits
  • Explore whether we can stop uploading the date of of the run as an artifact since we are using an .env file in the dashboards. Can't this just be an output of the run?
  • Also we might want to just rebuild on releases (marked with a tag) instead of every push anyway.
  • and may also want to make the push conditional on the tests passing
  • Instead of having two separate files for the analysis config (which then have to be kept in sync), just edit the dev in prod mode to turn off the assertions
  • remove the secret auth method since it is unused
  • consider using a similar cascade from e-mission-common to the various repos
    • also, consider pulling out e-mission-core as a separate library that is similar to e-mission-common
    • we will only need to pull e-mission-core into the dashboard repos then; we don't need the full analysis pipeline
  • we currently have two copies of the cascading code (in the two dashboard repos). We should explore unifying them. Maybe nested workflows?
  • Consider switching from $GITHUB_ENV to step outputs. I think that the step outputs are the recommended approach to pass values from one step to another, but we should verify
  • Discuss where to put in the cert configuration. We originally had it in the internal repos, then we moved it to the external repos. But now that we have one internal dockerfile per external dockerfile, maybe we can have it be internal after all. It doesn't actually hurt anything to be external, but it is unnecessary in the external repo.
  • upload/store the correct tag. Right now the tag in dockerhub is of the format <main_branch>_<date> but the uploaded/stored tag is only the date, so we need to prepend the branch externally. And the branch is not always the same.
@MukuFlash03
Copy link
Contributor

MukuFlash03 commented Sep 20, 2024

I have segregated the tasks into two individual groups (CI/CD, Core code) + one combo group(CI/CD + Core)

A. CI/CD

  • 1. change all the images to be stored in the emission dockerhub instead of my personal dockerhub account
    • even better, store them in github ghcr.io so that we don't run into issues with usage limits
  • 2. Explore whether we can stop uploading the date of of the run as an artifact since we are using an .env file in the dashboards. Can't this just be an output of the run?
  • 3. Also we might want to just rebuild on releases (marked with a tag) instead of every push anyway.

  • 4. and may also want to make the push conditional on the tests passing
  • 5. we currently have two copies of the cascading code (in the two dashboard repos). We should explore unifying them. Maybe nested workflows?
  • 6. Consider switching from $GITHUB_ENV to step outputs. I think that the step outputs are the recommended approach to pass values from one step to another, but we should verify

  • 7. Discuss where to put in the cert configuration. We originally had it in the internal repos, then we moved it to the external repos. But now that we have one internal dockerfile per external dockerfile, maybe we can have it be internal after all. It doesn't actually hurt anything to be external, but it is unnecessary in the external repo.
  • 
8. upload/store the correct tag. Right now the tag in dockerhub is of the format <main_branch>_ but the uploaded/stored tag is only the date, so we need to prepend the branch externally. And the branch is not always the same.

B. Core code

  • 1. Instead of having two separate files for the analysis config (which then have to be kept in sync), just edit the dev in prod mode to turn off the assertions
  • 2. remove the secret auth method since it is unused

C. Core Code + CI/CD

  • 1. consider using a similar cascade from e-mission-common to the various repos
  • 2. also, consider pulling out e-mission-core as a separate library that is similar to e-mission-common
    • we will only need to pull e-mission-core into the dashboard repos then; we don't need the full analysis pipeline

@MukuFlash03
Copy link
Contributor

MukuFlash03 commented Sep 20, 2024

The order I plan to work on these:
A. CI/CD: 7, 6, 2, 8, 5, 4, 3, 1
B. Core: 2, 1
C. Core + CI/CD: 1, 2


First round of changes (comment): A-2, 6, 7, 8
Second round of changes (comment): A- 4, 5

@MukuFlash03
Copy link
Contributor

MukuFlash03 commented Sep 20, 2024

Task A-7: Certificates location - External or Internal?

Fixed as a part of the redesign changes itself.

Added certificates externally only in server repo (commit)
Only need them in images that need to connect to the AWS Document DB (comment)

  • Dashboard repos are based on server image so they will have certificates.
  • Node.js based images do not need them.

Since the same base server image is used by the server image dependent containers (webapp, analysis, admin-dash, public-dash-notebook), we have added it right at the source image which is the external server image. This way it is ensured that the certificates are present in all the cascading images.


Based on comment below, added them to internal and removed from external.

@shankari
Copy link
Contributor Author

@MukuFlash03 right we have implemented this. But I am suggesting that we revisit that decision

Discuss where to put in the cert configuration. We originally had it in the internal repos, then we moved it to the external repos. But now that we have one internal dockerfile per external dockerfile, maybe we can have it be internal after all. It doesn't actually hurt anything to be external, but it is unnecessary in the external repo.

@MukuFlash03
Copy link
Contributor

MukuFlash03 commented Sep 20, 2024

Task A-6: Switching from $GITHUB_ENV to step outputs using GITHUB_OUTPUT

Shankari's comments

Consider switching from $GITHUB_ENV to step outputs. I think that the step outputs are the recommended approach to pass values from one step to another, but we should verify

I couldn’t find any official statement on which is the recommended approach.
But I did see this warning that says “set-output” is deprecated, switch to using environment files

Screenshot 2024-09-20 at 11 49 22 AM

Note that this doesn’t say that “steps.output” is deprecated; it says “set-output” is deprecated.
So, step outputs are still valid and we essentially have to choose between: GITHUB_ENV and GITHUB_OUTPUT


One argument in favor of GITHUB_OUTPUT is this:

  • This comment mentions that Environment variables are only available in the same job.
  • Right now we do have just one job in the dashboard build workflows.
  • GITHUB_OUTPUT documentation mentions that job outputs available to all dependent downstream jobs.
  • But in the future if we have more jobs, then GITHUB_OUTPUT could be more suited.

@MukuFlash03
Copy link
Contributor

MukuFlash03 commented Sep 20, 2024

Task A-2: Avoid uploading date as Artifacts -> Use .env file instead?

Shankari's comments

Explore whether we can stop uploading the date of of the run as an artifact since we are using an .env file in the dashboards. Can't this just be an output of the run?


REST API endpoints

I looked at some REST API endpoints to see if we can access outputs of jobs/steps in a run outside of the workflow run.
We can then use them directly in the internal script to pull all tags.

Not Suitable

  1. Get a workflow run
    I did not find an API endpoint to directly reference outputs inside in a job.
    This endpoint gets details about a workflow run but does not have info on outputs.

  2. List jobs for a workflow run
    This lists all jobs and steps but it only lists their completion status, time of execution and not outputs or any details.

  3. Download workflow run logs
    I did however find an API endpoint that allows downloading workflow run logs.
    This does give the outputs but it also gives everything from the logs as seen in the UI.
    This is a lot of redundant information that we will need to parse to only fetch the outputs.
    It downloads logs a zip file which would again be another hassle to: download -> extract -> read / parse -> clean up files.


Optimal Approach

  1. Get repository content
    This endpoint allows to directly read contents of a file, in our case .env files containing tags.
    For instance, this API response is from my forked repo where I've stored the tags in a .env file.
    The API response has the file content in base64 encoded format:
...
 "download_url": "https://raw.githubusercontent.com/MukuFlash03/em-public-dashboard/cleanup-cicd/.env",
  "type": "file",
  "content": "Tk9URUJPT0tfSU1BR0VfVEFHPTIwMjQtMDktMjAtLTEzLTM2CkZST05URU5E\nX0lNQUdFX1RBRz0yMDI0LTA5LTIwLS0wNC0zOApTRVJWRVJfSU1BR0VfVEFH\nPTIwMjQtMDktMjAtLTA5LTEwCg==\n",
...

Decoding this base64 data (for decoding, the newline characters '\n\ need to be removed and combine the entire encoded string)

NOTEBOOK_IMAGE_TAG=2024-09-20--13-36
FRONTEND_IMAGE_TAG=2024-09-20--04-38
SERVER_IMAGE_TAG=2024-09-20--09-10

@MukuFlash03
Copy link
Contributor

MukuFlash03 commented Sep 20, 2024

Task A-2: Avoid uploading date as Artifacts -> Use .env file instead? [Contd.]

With the API endpoint mentioned above, we can directly read contents of files.

Proposed Approach

To have a separate .env file in each of the repositories: server, join-page, admin-dash, public-dash.
This file would contain the image tags of the latest uploaded images.
Server image tag is needed in the dashboard repos' .env files since the Dockerfiles use the server tag as an ARG

In server repo: SERVER_IMAGE_TAG
In join-page repo: JOIN_IMAGE_TAG
In admin-dash repo: ADMIN_DASH_IMAGE_TAG, SERVER_IMAGE_TAG
In public-dash: PUBLIC_DASH_NOTEBOOK_IMAGE_TAG, PUBLIC_DASH_FRONTEND_IMAGE_TAG, SERVER_IMAGE_TAG


Pros of this approach:

  • Greatly reduces code in internal script to pull tags
    • We don't have to fetch latest workflow runs as we can simply read the file contents of a file available in a public repository.
  • Need only one file .env in each repo that can store all image tags relevant to the repository.
    • Currently in public-dash we have two .env files (.env for server tag, .env.repoTags for frontend tag).
  • Will not need to keep track of Server image tag in the dependent dashboard repos since we will be reading the SERVER_IMAGE_TAG from the .env file in server repo directly.
    • Can use this API endpoint in the dashboard workflows as well as internal script.
    • This would mean we would no longer have to check the separate PUSH and WORKFLOW_DISPATCH events related to server tag. For public-dash, still need the checks as frontend tag depends on event type.

@shankari
Copy link
Contributor Author

shankari commented Sep 21, 2024

why use something complicated which retrieves the file as a base64 encoded string instead of just reading the files directly using the raw option?

@MukuFlash03
Copy link
Contributor

MukuFlash03 commented Sep 21, 2024

why use something complicated which retrieves the file as a base64 encoded string instead of just reading the files directly using the raw option?

I just thought of sticking to using Github REST API and was looking for options within that.

But reading the raw contents as text is a much simpler option indeed.
It also does not require any headers, authorization token for the request which is valid since it is publicly available data anyways.
Will switch out the URLs and remove the base64 code.
Thank you for pointing that out!

MukuFlash03 pushed a commit to MukuFlash03/em-public-dashboard that referenced this issue Sep 23, 2024
MukuFlash03 pushed a commit to MukuFlash03/em-public-dashboard that referenced this issue Sep 23, 2024
Refer to details in cleanup issue:
Task A-2: e-mission/e-mission-docs#1082 (comment)

Added .env file initialized with the current latest tag of admin-dash and server image.
Internal script can read docker image tags directly from .env.tags using curl to fetch raw file contents.
Storing server tag as well since admin-dash Dockerfile uses it.

Removed workflow dispatch inputs
No longer need inputs since reading from .env.tags in server repo directly

------

Read raw file contents directly instead of using REST API
REST API endpoint returns base64 encoded data which then needs to be decoded.
Can simply read the Raw file contents from the publicly available file.

------

Also changed tag name to match tag name in internal repository
This way we can directly use the tag name without having extra fields like envVar in the internal script.

-----

For now not removing artifacts until the internal script is updated to handle this change.
MukuFlash03 pushed a commit to MukuFlash03/em-public-dashboard that referenced this issue Sep 23, 2024
…ternal

Task A-8: Prefixing branch name to the docker tag along with the date.
In the internal script we will not need to maintain the different branch lists as the images will be completely tagged in the external workflows themselves.
We can simply use the tags without modifications then.

For now, not prefixing the tag to the artifact since we will be removing the artifact anyways.
And current internal script works with artifacts.
Once I update the internal script, will come back and remove artifacts.
Also removing prefixed branch name from frontend image artifact in case of workflow dispatch event since it uses the existing frontend image tag generated during push event which already has prefixed branch name

In Dockerfile, removing hardcoded branch name, since in this change, we are already included the branch name in image tag.

----------

Task A-7: Certifcates added to internal Dockerfiles.

Refer to issue comment for details:
Task A-7: e-mission/e-mission-docs#1082 (comment)

The certificates are relevant to our internal AWS configuration and not needed externally.
They can be present externally too without having any major effect.
But removing them helps keeping the base image clean.
Additionally, anyone working with the code can customize with their own certificates if needed or adopt an approach which doesn't even need certificates in the first place.
MukuFlash03 pushed a commit to MukuFlash03/nrel-openpath-join-page that referenced this issue Sep 23, 2024
MukuFlash03 pushed a commit to MukuFlash03/nrel-openpath-join-page that referenced this issue Sep 23, 2024
Refer to details in cleanup issue:
Task A-2: e-mission/e-mission-docs#1082 (comment)

Storing latest tag as well so that artifacts are not needed.

For now not removing artifacts until the internal script is updated to handle this change.

----

Task A-8: Prefixing branch name to the docker tag along with the date.
In the internal script we will not need to maintain the different branch lists as the images will be completely tagged in the external workflows themselves.
We can simply use the tags without modifications then.

For now, not prefixing the tag to the artifact since we will be removing the artifact anyways.
And current internal script works with artifacts.
Once I update the internal script, will come back and remove artifacts.
MukuFlash03 pushed a commit to MukuFlash03/op-admin-dashboard that referenced this issue Sep 23, 2024
MukuFlash03 pushed a commit to MukuFlash03/op-admin-dashboard that referenced this issue Sep 23, 2024
Refer to details in cleanup issue:
Task A-2: e-mission/e-mission-docs#1082 (comment)

Added .env file initialized with the current latest tag of admin-dash and server image.
Internal script can read docker image tags directly from .env.tags using curl to fetch raw file contents.
Storing server tag as well since admin-dash Dockerfile uses it.

Removed workflow dispatch inputs
No longer need inputs since reading from .env.tags in server repo directly

------

Read raw file contents directly instead of using REST API
REST API endpoint returns base64 encoded data which then needs to be decoded.
Can simply read the Raw file contents from the publicly available file.

-----

For now not removing artifacts until the internal script is updated to handle this change.
MukuFlash03 pushed a commit to MukuFlash03/op-admin-dashboard that referenced this issue Sep 23, 2024
…ternal

Task A-8: Prefixing branch name to the docker tag along with the date.
In the internal script we will not need to maintain the different branch lists as the images will be completely tagged in the external workflows themselves.
We can simply use the tags without modifications then.

For now, not prefixing the tag to the artifact since we will be removing the artifact anyways.
And current internal script works with artifacts.
Once I update the internal script, will come back and remove artifacts.

In Dockerfile, removing hardcoded branch name, since in this change, we are already included the branch name in image tag.

----------

Task A-7: Certifcates added to internal Dockerfiles.

Refer to issue comment for details:
Task A-7: e-mission/e-mission-docs#1082 (comment)

The certificates are relevant to our internal AWS configuration and not needed externally.
They can be present externally too without having any major effect.
But removing them helps keeping the base image clean.
Additionally, anyone working with the code can customize with their own certificates if needed or adopt an approach which doesn't even need certificates in the first place.
MukuFlash03 pushed a commit to MukuFlash03/e-mission-server that referenced this issue Sep 23, 2024
MukuFlash03 pushed a commit to MukuFlash03/e-mission-server that referenced this issue Sep 23, 2024
Refer to details in cleanup issue:
Task A-2: e-mission/e-mission-docs#1082 (comment)

Storing server tag as well so that artifacts are not needed.
Can also remove image tag passed as input in Workflow dispatch POST request.
Workflow input also removed in dashboard workflows

For now not removing artifacts until the internal script is updated to handle this change.

----

Task A-8: Prefixing branch name to the docker tag along with the date.
In the internal script we will not need to maintain the different branch lists as the images will be completely tagged in the external workflows themselves.
We can simply use the tags without modifications then.

For now, not prefixing the tag to the artifact since we will be removing the artifact anyways.
And current internal script works with artifacts.
Once I update the internal script, will come back and remove artifacts.
MukuFlash03 pushed a commit to MukuFlash03/e-mission-server that referenced this issue Sep 23, 2024
…ally

Refer to issue comment for details:
Task A-7: e-mission/e-mission-docs#1082 (comment)

The certificates are relevant to our internal AWS configuration and not needed externally.
They can be present externally too without having any major effect.
But removing them helps keeping the base image clean.
Additionally, anyone working with the code can customize with their own certificates if needed or adopt an approach which doesn't even need certificates in the first place.
@MukuFlash03
Copy link
Contributor

MukuFlash03 commented Sep 23, 2024

Task A-8: Stored tag format changed to branch_name-timestamp

Currently the docker images are tagged as <branch_name>_ but the artifact tags only had the timestamp.
The branch names are hardcoded in the internal scripts and are prefixed to the retrieved timestamps.

However, this causes a problem if the docker image upload is done from a different branch in the external GitHub workflow, since the complete returned tag in the internal script would still have the default hardcoded branch_name but latest timestamp.

Hence now storing the same tag that the docker image is tagged with including both the branch_name and timestamp.

@MukuFlash03
Copy link
Contributor

MukuFlash03 commented Sep 23, 2024

All PRs tracker (comment)

First set of cleanup PRs added for Tasks A-2, 6, 7, 8.

PRs: e-mission-server, join-page, admin-dash, public-dash

@MukuFlash03
Copy link
Contributor

MukuFlash03 commented Sep 30, 2024

Task A-5: Reusable workflows for unifying workflows to reduce duplicated code

Shankari's comments

we currently have two copies of the cascading code (in the two dashboard repos). We should explore unifying them. Maybe nested workflows?


Initially, I thought that only the dashboard workflows would be affected. But after the first round of changes for this cleanup issue, the join repo workflow is also almost identical to the dashboard repo workflows.

Reusable workflows are the perfect way to achieve DRY.
It essentially works like a function call in normal programming.
I'm storing the reusable workflow in the e-mission-server repo.

I have added conditional checks that check for the repo name in the reusable workflow file that determine which statements to execute depending on for which repo the workflow is running.

This is used for both push events specific to a repo as well as for the workflow dispatch events triggered on pushes to server repo.


Advantages of reusable workflows:

  • No repeated code the image build process. All the other repos (join, admin-dash, public-dash) reuse the same workflow file.
  • For any future changes related to GitHub actions workflow file, will no longer need to have 3 additional PRs for each repo (join, admin-dash, public-dash). Can simply modify the reusable workflow file as this is the core “function” workflow that is being called.

Testing done
Workflow runs: Server, Join, Admin-dash, Public-dash

@MukuFlash03
Copy link
Contributor

MukuFlash03 commented Sep 30, 2024

Highlighting some problems faced while implementing and approaches to resolve them:

1. git push failing in Github Actions workflow file

  • Default GitHub Token is used if no token specified in actions/checkout (documentation)
  • But then the "git push" command was failing as it was unable to access files in other repos. This was because the reusable workflow is located in the server repo but it was trying to make changes to contents of other repos.
  • And the default GITHUB_TOKEN is scoped to the current repository, hence it does not authorize this.

Hence, I passed in the existing fine-grained token while checking out:

      - name: Checkout repository
        uses: actions/checkout@v4
        with:
          repository: e-mission/${{ inputs.repo }}
          ref: ${{ inputs.branch }}
          token: ${{ secrets.GH_FG_PAT_TAGS }}

2. After solving the above problem, adding fine-grained token resulted in infinite workflow runs

  • So we have the initial trigger which is either a normal commit / push OR triggered via workflow dispatch by the server.
  • Now, infinite workflow calls result because the "git push" in the workflow file to update the .env file is also in fact a push event and this triggers yet another workflow, which in turn triggers another workflow and runs infinitely.
    - name: Add, Commit, Push changes to .env file
      run: |
        git config --local user.email "[email protected]"
        git config --local user.name "Github Actions bot to update .env with latest tags"
        if git diff --quiet; then
          echo "Latest timestamp already present in .env file, no changes to commit"
        else
          git add .env
          git commit -m "Updated docker image tag in .env file to the latest timestamp"
          git push origin 
        fi

Now, the question I asked myself was - why isn't this happening in the current codebase where the exact same workflow file was there? We haven't seen infinite workflow runs due to the commits by GitHub actions to update the .env files.

The answer is that currently, in the dashboard workflows that update the .env file, we are using the default GITHUB_TOKEN which has a built-in mechanism to avoid recursive workflow calls documentation.
Confirmed from another developer on stackoverflow (link).


Dilemma

  • Now, due to Problem 1, we need to use fine-grained token with access to write to other repos.
  • But due to Problem 2, default GITHUB_TOKEN is needed to avoid infinite workflow runs.

Resolution

1. Authorize git push from another repo's workflow using fine-grained token with additional permission

  • Can still use Fine-grained token with permission to update repository contents set as Read and Write.
  • These are the permissions for the current fine-grained token in use as noted by Shankari (comment).
  • Need to add the Contents: Write permission as well (documentation)
Screenshot 2024-09-29 at 6 06 37 PM



  • The reason we did not need it currently is because the "git push" to update the .env file is in a workflow file that is in the same repository as the .env file for the dashboard repos.
  • But with the reusable workflow, the workflow is in the server repository while it attempts to update the .env files in the dashboard repositories.

2. Prevent infinite workflow runs by checking for latest commit message details

In the reusable workflow, these are the git user details used:

      - name: Add, Commit, Push changes to .env file
        run: |
          git config --local user.email "[email protected]"
          git config --local user.name "Github Actions bot to update .env with latest tags"

In the caller workflows in the join repository and the dashboard repositories, I am now checking for the username:

jobs:
  build:  
    if: ${{ !contains(github.event.head_commit.author.name, 'Github Actions bot to update .env with latest tags') }}
    uses: e-mission/e-mission-server/.github/workflows/reusable_image_build_push.yml@master

How the process would look now:

  • So, for a normal push or workflow dispatch, the commit would be made by a developer.
  • The workflow would run for this commit as it currently does.
  • Now, this also triggers another workflow due to the push event to update the .env file.
  • However with the check in place, it prevents the workflow from running if the last commit was the automated commit with the predefined username.

@MukuFlash03
Copy link
Contributor

Task A-4: Run image_build_push only after tests pass successfully

Approach:

  • Convert test-with-docker and test-with-manual-install to reusable workflows.
  • Add them as jobs in image_build_push workflow.
  • Build job in this workflow needs these test jobs as dependencies assuring that image build occurs after test jobs are successful.

Test Workflow runs

  • End to end run with tests, server build, dispatch: server, join, admin-dash, public-dash
  • Both tests success (run)
  • Docker success, Manual install failure (run)
  • Docker failure, Manual install success (run)

Details

Github actions didn't have out of the box solution for running a workflow based on results of multiple workflows where ALL workflows must have completed successfully.
We need this since both the test-with-docker and test-with-manual-install must pass.
So this needs an AND logic.

workflow_run is there but this triggers the dependent workflow when either of the workflow dependencies defined as prerequisites are completed.
So this has an OR logic.


Found an alternative suggestion here.
This suggests converting the pre-requisite workflows into reusable workflows.

These workflows can then be called in jobs in the workflow that needs these workflows to be run before.
Finally, these jobs can be added as dependencies for the image build job.

jobs:
  test-with-docker:
    uses: e-mission/e-mission-server/.github/workflows/test-with-docker.yml@master

  test-with-manual-install:
    uses: e-mission/e-mission-server/.github/workflows/test-with-manual-install.yml@master

  build:
    runs-on: ubuntu-latest
    needs: [test-with-docker, test-with-manual-install]

Also, removed the push trigger from the two test workflows since the image_build_push workflow would also be triggered on a push event which in its workflow triggers these two test workflows.
Thus if not removed, each test workflow would run twice.

@MukuFlash03
Copy link
Contributor

MukuFlash03 commented Sep 30, 2024

All PRs tracker (comment)

Second set of cleanup PRs added for Tasks A-4, 5.

PRs: e-mission-server, join-page, admin-dash, public-dash

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants