Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Specs for Continous Integration and Code Reviews #248

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
44 changes: 44 additions & 0 deletions ci.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# OCR-D Recommendations for Using CI in Your Repository

Regarding a fast and safe integration of new changes to your software, the OCR-D Coordination Project (CP) recommends setting up and using a CI/CD tool for automating the software's unit and integration test, its build and (if applicable) deployment.
Ideally, this should be done as one of the first steps after setting up a new repository so that your software evidently build and passes quality assurance.

Depending on the version control server you use there are several options to choose from:

* GitHub
* CircleCI
* Github Actions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Github Actions
* GitHub Actions

* Jenkins
* ...
* GitLab
* GitLab CI

The choice of a CI/CD tools heavily relies on your version control server as well as your project's requirements and your instution's infrastructure.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
The choice of a CI/CD tools heavily relies on your version control server as well as your project's requirements and your instution's infrastructure.
The choice of a CI/CD tools heavily relies on your version control server as well as your project's requirements and your institution's infrastructure.


The OCR-D CP can assist you with the following tools in case you have questions:

* CircleCI
* GitLab CI

## CI/CD Best Practices

- try to keep your pipelines fast to get rapid feedback. some aspects to consider is running the fastest test first (see below), using small images for your Docker containers, and caching your data.
- your CI/CD pipelines should be furnished with proper unit and integration tests. these should run as one of the first stages in your pipelines in order to detect defects early.
- the test stage should run on every push to the remote repository in order to make sure that a commit doesn't break anything.
- commit often in order to detect breaking changes early.
- run your fastest test first. if they fail, they fail early and save a lot of resources compared to running slower tests first.
- run your tests locally before running them via pipelines. this will save a lot of resources as well.
- commit often in order to detect breaking changes early.
- either minimize your branching model or make sure that the pipelines run on feature/bugfix branches as well. otherwise you might not notice breaking changes in time.
- ensure that the main functionality of your software is still up and running after having deployed (smoke testing).
- provide an easy way to rollback in case something goes really wrong, e.g. being able to re-deploy a previous release.
- reuse existing configuration, e.g. by leveraging CircleCI's orbs or GitLab templates.
- secure your pipelines. secret variable should be secret, and not everyone in your organization should have admin rights in your CI/CD environment.
Comment on lines +25 to +36
Copy link
Contributor

@stweil stweil Jul 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- try to keep your pipelines fast to get rapid feedback. some aspects to consider is running the fastest test first (see below), using small images for your Docker containers, and caching your data.
- your CI/CD pipelines should be furnished with proper unit and integration tests. these should run as one of the first stages in your pipelines in order to detect defects early.
- the test stage should run on every push to the remote repository in order to make sure that a commit doesn't break anything.
- commit often in order to detect breaking changes early.
- run your fastest test first. if they fail, they fail early and save a lot of resources compared to running slower tests first.
- run your tests locally before running them via pipelines. this will save a lot of resources as well.
- commit often in order to detect breaking changes early.
- either minimize your branching model or make sure that the pipelines run on feature/bugfix branches as well. otherwise you might not notice breaking changes in time.
- ensure that the main functionality of your software is still up and running after having deployed (smoke testing).
- provide an easy way to rollback in case something goes really wrong, e.g. being able to re-deploy a previous release.
- reuse existing configuration, e.g. by leveraging CircleCI's orbs or GitLab templates.
- secure your pipelines. secret variable should be secret, and not everyone in your organization should have admin rights in your CI/CD environment.
- Try to keep your pipelines fast to get rapid feedback. Some aspects to consider are running the fastest test first (see below), using small images for your Docker containers, and caching your data.
- Your CI/CD pipelines should be furnished with proper unit and integration tests. These should run as one of the first stages in your pipelines in order to detect defects early.
- The test stage should run on every push to the remote repository in order to make sure that a commit doesn't break anything.
- Commit often in order to detect breaking changes early.
- Run your fastest test first. If they fail, they fail early and save a lot of resources compared to running slower tests first.
- Run your tests locally before running them via pipelines. This will save a lot of resources as well.
- Commit often in order to detect breaking changes early.
- Either minimize your branching model or make sure that the pipelines run on feature/bugfix branches as well. Otherwise you might not notice breaking changes in time.
- Ensure that the main functionality of your software is still up and running after having deployed (smoke testing).
- Provide an easy way to rollback in case something goes really wrong, e.g. being able to re-deploy a previous release.
- Reuse existing configuration, e.g. by leveraging CircleCI's orbs or GitLab templates.
- Secure your pipelines. Secret variable should be secret, and not everyone in your organization should have admin rights in your CI/CD environment.



## Sources

- <https://www.digitalocean.com/community/tutorials/an-introduction-to-ci-cd-best-practices>
- <https://www.fpcomplete.com/blog/continuous-integration-delivery-best-practices/>
- <https://circleci.com/blog/top-5-ci-cd-best-practices>
- … and of course our daily work with CI/CD tools.
66 changes: 66 additions & 0 deletions code_review.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,66 @@
# Code Review Guidelines in OCR-D

The OCR-D coordination project (CP) is responsible for ensuring an optimal level of code quality during the third phase of the OCR-D project.
To achieve this, the CP has created a development workflow in GitHub the implementation projects (IP) should stick to.

The main part of this development workflow are GitHub's pull requests (PR).
IPs create pull requests and assign one or more of the CP's developers as reviewers.
After a successful review, the code can be merged into the codebase.

## How the Pull Requests Should Look Like

Since the CP manages the pull requests (PRs) of all IPs, they should follow a certain pattern to make reviewers' lifes easier.

Each pull request should contain only a small feature/increment.
This way it is easier for reviewers to grasp what is happening on the code level, and it shortens the time the review takes.
This also minimizes the time developers have to wait for feedback on their code.

PRs should be provided with appropriate tests (at least unit tests) so that reviewers can check if the functionality the feature provides is fully implemented.

The PR should have a meaningful description; Please keep in mind that reviewers are not necessarily familiar with every aspect and requirement of the respective IP.
Providing a description of the changes and their rationale not only helps reviewers to quickly understand the context of the PR but also serves as a kind of project documentation as well.

Last but not least developers in the IPs should keep the code review checklist below in mind during their work as well and can of course give feedback about it.

## How the Review Process Works

The CP performs informal code reviews.
A survey [1] finds these to be just as effective as more formalized forms of reviews.
Different developers have varying degrees of experience and/or expertise and therefore have different perspectives on code;
Therefore we try to cushion this effect by using an additional checklist the reviewers can consult during the review.
Of course reviewers are not bound to this checklist:
If they spot something interesting while reading the code that's not on the list they are highly encouraged to note it anyway!
Aspects of the list that do not apply in a given PR can be skipped.
The checklist serves as a general guide to streamline the reviews but doesn't have to be worked off meticulously.

### The Code Review Checklist

During the review process reviewers should ask themselves the following questions:

- Does the feature's functionality meet the IP's requirements?
- Is the code readable? Can I understand what it's doing by simply reading it from top to bottom? Are functions and variables properly named? Do they reflect what's going on?
- Does the code stick to the repository's style guidelines?
- Is a dependency introduced that is not necessary?
- Can the code be easily expanded or changed if need be? Is the new feature/class/function coupled loosely (or not at all) to another system? Is the configuration injected instead of hard-coded?
- Does the new feature introduce any security risks? Can the code be exploited in some way?
- Is the code performant? Does it only allocate resources it actually needs? Could it achieve the same result with less disk usage/memory/...? Does the execution take as little time as possible without compromising the result?
- Does the code provide appropriate documentation where necessary, e.g. a README? Does the documentation explain _why_ things happen instead of describing _what_ happens? Do all of the comments help to understand why a certain decision has been made? Could some of the comments be removed by improving the readability of the code? Have all TODOs and commented out code been removed?
- Does the code use language features to the full extent? Can you spot any design patterns? If not, could it be feasible to introduce them?
- Are errors anticipated and handled gracefully? Are the error messages user-friendly? Are errors logged in a way such that users can understand what went wrong?
- Is the feature scalable, i.e. does it handle a lot of input or requests well?
- Is the code reusable? Does it follow the YAGNI and DRY principles?
- Does the feature reuse functionality that already exists (in a similar way) in the codebase?
- Does the feature have at least unit tests? Are the tests sufficient or are there any cases missing? Are the tests executed automatically by a CI/CD runner?
- Is there anything missing in the feature (think of edge cases, unexpected inputs etc.)?
- Is there anything to improve on the feature's architecture?
- Are the changes backwards compatible if necessary (i.e. when parts of the software that are already widely used are changed)?
- Does the CLI stick to the OCR-D guidelines?

Sources for this list:

- <https://www.codementor.io/blog/code-review-checklist-76q7ovkaqj>
- <https://www.michaelagreiler.com/wp-content/uploads/2019/08/Code_Review_Checklist_Greiler.pdf>

General sources:

[1] <https://t2informatik.de/wissen-kompakt/code-review/>