Lots of work
ErikSchierboom committed Aug 6, 2024
1 parent 936b93c commit dd658c3
Showing 6 changed files with 232 additions and 58 deletions.
4 changes: 2 additions & 2 deletions building/tooling/analyzers/creating-from-scratch.md
@@ -4,9 +4,9 @@ Firstly, thank you for your interest in creating an Analyzer!

These are the steps to get going:

1. Check [our repository list for an existing `...-analyzer`](https://github.com/exercism?q=-analyzer) to ensure that one doesn't already exist.
1. Check [our repository list for an existing `...-analyzer`](https://github.com/search?q=org%3Aexercism+analyzer&type=repositories) to ensure that one doesn't already exist.
2. Scan the [contents of this directory](/docs/building/tooling/analyzers) to ensure you are comfortable with the idea of creating an Analyzer.
3. Open an issue at [exercism/exercism][exercism-repo] introducing yourself and telling us which language you'd like to create a Analyzer for.
3. Start a new topic on [the Exercism forum][building-exercism] telling us which language you'd like to create an Analyzer for.
4. Once an Analyzer repo has been created, use [the Analyzer interface document](/docs/building/tooling/analyzers/interface) to help guide your implementation.

We have an incredibly friendly and supportive community who will be happy to help you as you work through this! If you get stuck, please start a new topic on [the Exercism forum][building-exercism] or create new issues at [exercism/exercism][exercism-repo] as needed 🙂
261 changes: 225 additions & 36 deletions building/tooling/best-practices.md
@@ -1,39 +1,18 @@
# Best Practices

## Follow best practices
## Follow official best practices

The official [Dockerfile best practices](https://docs.docker.com/develop/develop-images/dockerfile_best-practices/) have lots of great content on how to improve your Dockerfiles.

## Prefer official images

There are many Docker images on [Docker Hub](https://hub.docker.com/), but try to use [official ones](https://hub.docker.com/search?q=&image_filter=official).

## Pin versions

To ensure that builds are stable (i.e. they don't suddenly break), you should always pin your base images to specific tags.
That means instead of:

```dockerfile
FROM alpine:latest
```

you should use:

```dockerfile
FROM alpine:3.20.2
```

With the latter, builds will always use the same version.

## Performance

You should primarily optimize for performance (especially for test runners).
This will ensure your tooling runs as fast as possible and does not time-out.

### Experiment with different Base images

Try experimenting with different base images (e.g. Ubuntu instead of Alpine), to see if one (significantly) outperforms the other.
If performance is relatively equal,
Try experimenting with different base images (e.g. Alpine instead of Ubuntu), to see if one (significantly) outperforms the other.
If performance is relatively equal, go for the image that is smallest.
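One way to make such experiments cheap is to parameterise the base image. The following is a minimal sketch, not taken from any existing test runner: the `BASE_IMAGE` argument name and the `/opt/test-runner/bin/run.sh` entrypoint are assumptions for illustration.

```dockerfile
# BASE_IMAGE is a hypothetical build argument; an ARG declared before FROM
# can be used in the FROM instruction itself
ARG BASE_IMAGE=alpine:3.20.2
FROM ${BASE_IMAGE}

WORKDIR /opt/test-runner
COPY . .

# Assumed entrypoint script, mirroring the layout used elsewhere in this document
ENTRYPOINT ["/opt/test-runner/bin/run.sh"]
```

Building with `docker build --build-arg BASE_IMAGE=ubuntu:24.10 .` then produces a second image to compare against. Note that any package installation steps will likely differ per distribution (e.g. `apk` versus `apt-get`), so a real Dockerfile usually needs more than a swapped `FROM` line.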

### Try Internal Network

@@ -49,12 +28,29 @@ Tooling runs as one-off, short-lived Docker container:
3. The Docker container is destroyed

Therefore, code that runs in step 2 runs for _every single tooling run_.
For this reason, reducing the amount of code to run in step 2 is a great way to improve performance
For this reason, reducing the amount of code that runs in step 2 is a great way to improve performance.
One way of doing this is to move code from _run-time_ to _build-time_.
Whilst run-time code runs every single tooling run, build_time code only runs once (when the Docker image is built).
Whilst run-time code runs on every single tooling run, build-time code only runs once (when the Docker image is built).

As build-time code runs as part of a GitHub Actions workflow, the student will never notice it.
This also means that the code at build-time could be relatively slow, it's only running once after all!
Build-time code runs once as part of a GitHub Actions workflow.
Therefore, it's fine if the code that runs at build-time is (relatively) slow.

#### Example: pre-compilation

The Haskell test runner requires some base libraries to be compiled before it can run tests.
As each test run happens in a fresh container, this compilation would otherwise be done _in every single test run_!
To circumvent this, the [Haskell test runner's Dockerfile](https://github.com/exercism/haskell-test-runner/blob/5264c460054649fc672c3d5932c2f3cb082e2405/Dockerfile) has the following two commands:

```dockerfile
COPY pre-compiled/ .
RUN stack build --resolver lts-20.18 --no-terminal --test --no-run-tests
```

First, the `pre-compiled` directory is copied into the image.
This directory is set up as a sort of fake exercise and depends on the same base libraries that the actual exercises depend on.
Then we run the tests on that directory, which is similar to how tests are run for an actual exercise.
Running the tests will result in the base libraries being compiled, but the difference is that this happens at _build time_.
The resulting Docker image will thus have its base libraries already compiled, which means that this compilation no longer has to happen at _run time_, resulting in (much) faster execution times.

## Size

@@ -64,9 +60,32 @@ You should try to reduce the image's size, which means that it'll be:
- Reduces costs for us
- Marginally improves startup time of each container

### Try different Base images
### Try different distributions

Different distribution images will have different sizes.
For example, the `alpine:3.20.2` image is **ten times** smaller than the `ubuntu:24.10` image:

```
REPOSITORY   TAG      SIZE
alpine       3.20.2   8.83MB
ubuntu       24.10    101MB
```

In general, Alpine-based images are amongst the smallest images, so many tooling images are based on Alpine.

Some base images are
### Try slimmed-down images

Some images have special "slim" variants, in which some features will have been removed resulting in smaller image sizes.
For example, the `node:20.16.0-slim` image is **five times** smaller than the `node:20.16.0` image:

```
REPOSITORY   TAG            SIZE
node         20.16.0        1.09GB
node         20.16.0-slim   219MB
```

The reason "slim" variants are smaller is that they have fewer features.
If your image does not need the additional features, consider using the "slim" variant.
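A minimal sketch of starting from a "slim" variant and adding back only what turns out to be missing. The `jq` package here is a hypothetical example of such a missing dependency, not something the source says is required.

```dockerfile
FROM node:20.16.0-slim

# Hypothetical: install only the packages the slim variant turned out to lack,
# and remove the package lists afterwards to keep the layer small
RUN apt-get update && \
    apt-get install -y jq && \
    rm -rf /var/lib/apt/lists/*
```

If the list of packages you need to add back grows long, the size advantage over the regular image shrinks, so it is worth comparing the resulting image sizes.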

### Removing unneeded bits

@@ -77,18 +96,188 @@ These can include things like:
- Files targeting different architectures from the Docker image
- Documentation

### Cleanup package manager
#### Remove package manager files

Most Docker images need to install additional packages, which is usually done via a package manager.
These packages must be installed at _build time_ (as no internet connection is available at _run time_).
Therefore, any package manager caching/bookkeeping files should be removed after installing the additional packages.

##### apk

Distributions that use the `apk` package manager (such as Alpine) should use the `--no-cache` flag when using `apk add` to install packages:

```dockerfile
RUN apk add --no-cache curl
```

##### apt-get/apt

Distributions that use the `apt-get`/`apt` package manager (such as Ubuntu) should run the `apt-get autoremove -y` and `rm -rf /var/lib/apt/lists/*` commands _after_ installing the packages:

```dockerfile
RUN apt-get update && \
apt-get install curl -y && \
apt-get autoremove -y && \
rm -rf /var/lib/apt/lists/*
```

### Use multi-stage builds

https://docs.docker.com/build/building/multi-stage/
Docker has a feature called [multi-stage builds](https://docs.docker.com/build/building/multi-stage/).
These allow you to partition your Dockerfile into separate _stages_, with only the last stage ending up in the produced Docker image (the rest is only there to support building the last stage).
You can think of each stage as its own mini Dockerfile; stages can use different base images.

Multi-stage builds are particularly useful when your Dockerfile requires packages to be installed that are _only_ needed at build time.
In this situation, the general structure of your Dockerfile looks like this:

1. Define a new stage (we'll call this the "build" stage).
This stage will _only_ be used at build time.
2. Install the required additional packages (into the "build" stage).
3. Run the commands that require the additional packages (within the "build" stage).
4. Define a new stage (we'll call this the "runtime" stage).
This stage will make up the resulting Docker image and is executed at run time.
5. Copy the result(s) from the commands run in step 3 (in the "build" stage) into this stage (the "runtime" stage).

With this setup, the additional packages are _only_ installed in the "build" stage and _not_ in the "runtime" stage, which means that they won't end up in the Docker image that is produced.
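The five steps above can be sketched as a generic skeleton. This is an illustration under assumptions, not an existing Exercism Dockerfile: `helper.c`, the `/opt/tooling` directory and the `helper` binary are hypothetical names.

```dockerfile
# Steps 1-2: the "build" stage, with packages that are only needed at build time
FROM alpine:3.20.2 AS build
RUN apk add --no-cache build-base

# Step 3: run the command that needs those packages
# (helper.c is a hypothetical source file in the build context)
WORKDIR /opt/tooling
COPY helper.c .
RUN gcc -O2 -o helper helper.c

# Step 4: the "runtime" stage; build-base is not installed here
FROM alpine:3.20.2

# Step 5: copy only the result of step 3 out of the "build" stage
WORKDIR /opt/tooling
COPY --from=build /opt/tooling/helper ./bin/helper

ENTRYPOINT ["/opt/tooling/bin/helper"]
```

Only the final stage ends up in the produced image, so the compiler toolchain installed in the "build" stage adds nothing to its size.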

TODO
#### Example: downloading files

The Fortran test runner requires `curl` to download some files.
However, its run time image does _not_ need `curl`, which makes this a perfect use case for a multi-stage build.

First, its [Dockerfile](https://github.com/exercism/fortran-test-runner/blob/783e228d8449143d2040e68b95128bb791833a27/Dockerfile) defines a stage (named "build") in which the `curl` package is installed.
It then uses `curl` to download files into that stage.

```dockerfile
FROM alpine:3.15 AS build

RUN apk add --no-cache curl

WORKDIR /opt/test-runner
COPY bust_cache .

WORKDIR /opt/test-runner/testlib
RUN curl -R -O https://raw.githubusercontent.com/exercism/fortran/main/testlib/CMakeLists.txt
RUN curl -R -O https://raw.githubusercontent.com/exercism/fortran/main/testlib/TesterMain.f90

WORKDIR /opt/test-runner
RUN curl -R -O https://raw.githubusercontent.com/exercism/fortran/main/config/CMakeLists.txt
```

The second part of the Dockerfile defines a new stage and copies the downloaded files from the "build" stage into its own stage using the `COPY` command:

```dockerfile
FROM alpine:3.15

RUN apk add --no-cache coreutils jq gfortran libc-dev cmake make

WORKDIR /opt/test-runner
COPY --from=build /opt/test-runner/ .

COPY . .
ENTRYPOINT ["/opt/test-runner/bin/run.sh"]
```

#### Example: installing libraries

The Ruby test runner needs the `git`, `openssh`, `build-base`, `gcc` and `wget` packages to be installed before its required libraries (gems) can be installed.
Its [Dockerfile](https://github.com/exercism/ruby-test-runner/blob/e57ed45b553d6c6411faeea55efa3a4754d1cdbf/Dockerfile) starts with a stage (given the name `build`) that installs those packages (via `apk add`) and then installs the libraries (via `bundle install`):

```dockerfile
FROM ruby:3.2.2-alpine3.18 AS build

RUN apk update && apk upgrade && \
apk add --no-cache git openssh build-base gcc wget git

COPY Gemfile Gemfile.lock .

RUN gem install bundler:2.4.18 && \
bundle config set without 'development test' && \
bundle install
```

It then defines the stage that will form the resulting Docker image.
This stage does _not_ install the dependencies the previous stage installed; instead, it uses the `COPY` command to copy the installed libraries from the build stage into its own stage:

```dockerfile
FROM ruby:3.2.2-alpine3.18

RUN apk add --no-cache bash

WORKDIR /opt/test-runner

COPY --from=build /usr/local/bundle /usr/local/bundle

COPY . .

ENTRYPOINT [ "sh", "/opt/test-runner/bin/run.sh" ]
```

```exercism/note
The [C# test runner's Dockerfile](https://github.com/exercism/csharp-test-runner/blob/b54122ef76cbf86eff0691daa33c8e50bc83979f/Dockerfile) does something similar, only in this case the build stage can use an existing Docker image that has pre-installed the additional packages required to install libraries.
```

## Safety

TODO
Safety is one of the main reasons why we're using Docker containers to run our tooling.

### Prefer official images

There are many Docker images on [Docker Hub](https://hub.docker.com/), but try to use [official ones](https://hub.docker.com/search?q=&image_filter=official).
These images are curated and have (far) less chance of being unsafe.

## Support read-only filesystem
### Pin versions

To ensure that builds are stable (i.e. they don't suddenly break), you should always pin your base images to specific tags.
That means instead of:

TODO
```dockerfile
FROM alpine:latest
```

you should use:

```dockerfile
FROM alpine:3.20.2
```

With the latter, builds will always use the same version.

### Run as a non-privileged user

By default, many images will run with a user that has root privileges.
You should consider running as a non-privileged user.

```dockerfile
FROM alpine:3.20.2

# Alpine ships BusyBox's addgroup/adduser rather than groupadd/useradd
RUN addgroup -S myuser && adduser -S -G myuser myuser

# <RUN COMMANDS THAT REQUIRE THE ROOT USER, LIKE INSTALLING PACKAGES ETC.>

USER myuser
```
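A sketch of the same idea on an Ubuntu base image, where the `groupadd` and `useradd` commands are available. The `myuser` name, the `curl` package and the entrypoint path are placeholders chosen for illustration.

```dockerfile
FROM ubuntu:24.10

# Commands that require root (such as installing packages) come first
RUN apt-get update && \
    apt-get install -y curl && \
    rm -rf /var/lib/apt/lists/*

# Create a system group and a system user that belongs to it
RUN groupadd -r myuser && useradd -r -g myuser myuser

WORKDIR /opt/test-runner
COPY . .

# Everything from here on, including the container's process, runs as myuser
USER myuser
ENTRYPOINT ["/opt/test-runner/bin/run.sh"]
```

The `USER` instruction is placed last on purpose: instructions after it no longer have root privileges, so anything that needs them has to come before it.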

### Update package repositories to latest version

It is (almost) always a good idea to install the latest versions of packages, which means refreshing the package repositories before installing:

```dockerfile
RUN apt-get update && \
    apt-get install curl -y
```

### Support read-only filesystem

We encourage Dockerfiles to be written so that the tooling works with a read-only filesystem.
The only directories you should assume to be writeable are:

- The solution dir (passed in as the second argument)
- The output dir (passed in as the third argument)
- The `/tmp` dir

```exercism/caution
Our production environment currently does _not_ enforce a read-only filesystem, but we might in the future.
For this reason, the base template for a new test runner/analyzer/representer starts out with a read-only filesystem.
If you can't get things working on a read-only filesystem, feel free to (for now) assume a writeable filesystem.
```
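One possible way to prepare an image for a read-only filesystem is to point caches and temporary files at `/tmp`. The sketch below is an assumption-heavy illustration: whether a given tool honours `HOME`, `TMPDIR` or `XDG_CACHE_HOME` depends on the tool, and the entrypoint path is a placeholder.

```dockerfile
FROM alpine:3.20.2

# Assumption: the tools in this image honour these variables when choosing
# where to write caches and temporary files
ENV HOME=/tmp \
    TMPDIR=/tmp \
    XDG_CACHE_HOME=/tmp/.cache

WORKDIR /opt/test-runner
COPY . .
ENTRYPOINT ["/opt/test-runner/bin/run.sh"]
```

To check the behaviour locally, you could run the container with Docker's `--read-only` flag together with a writable `/tmp` (for example via `--tmpfs /tmp`), and mount the solution and output directories as writable.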
15 changes: 0 additions & 15 deletions building/tooling/docker.md
@@ -50,21 +50,6 @@ Different languages perform better/worse with different configurations (e.g. Rub
You can experiment locally by using the `--network` flag when running your docker. `--network none` is supported by default.
To use the internal network, first run `docker network create --internal internal` to create the network, then use `--network internal` when running the container.

### Read-only filesystem

We encourage Docker files to be written using a read-only filesystem.
The only directories you should assume to be writeable are:

- The solution dir (passed in as the second argument)
- The output dir (passed in as the third argument)
- The `/tmp` dir

```exercism/caution
Our production environment currently does _not_ enforce a read-only filesystem, but we might in the future.
For this reason, the base template for a new test runner/analyzer/representer starts out with a read-only filesystem.
If you can't get things working on a read-only file, feel free to (for now) assume a writeable file system.
```

### Memory

Languages can set the maximum memory they need to use to run their jobs. Setting this to be as low as possible means that we can run more jobs more quickly in parallel. It also means that people who try and abuse memory will not be able to succeed. Different languages need wildly different maximum memory usage. Benchmarking the execution of a docker run to establish the maximum memory it uses is advised and appreciated.
4 changes: 2 additions & 2 deletions building/tooling/representers/creating-from-scratch.md
@@ -4,9 +4,9 @@ Firstly, thank you for your interest in creating a Representer!

These are the steps to get going:

1. Check [our repository list for an existing `...-representer`](https://github.com/exercism?q=-representer) to ensure that one doesn't already exist.
1. Check [our repository list for an existing `...-representer`](https://github.com/search?q=org%3Aexercism+representer&type=repositories) to ensure that one doesn't already exist.
2. Scan the [contents of this directory](/docs/building/tooling/representers) to ensure you are comfortable with the idea of creating a Representer.
3. Open an issue at [exercism/exercism][exercism-repo] introducing yourself and telling us which language you'd like to create a Representer for.
3. Start a new topic on [the Exercism forum][building-exercism] telling us which language you'd like to create a Representer for.
4. Once a Representer repo has been created, use [the Representer interface document](/docs/building/tooling/representers/interface) to help guide your implementation.

We have an incredibly friendly and supportive community who will be happy to help you as you work through this! If you get stuck, please start a new topic on [the Exercism forum][building-exercism] or create new issues at [exercism/exercism][exercism-repo] as needed 🙂
2 changes: 1 addition & 1 deletion building/tooling/test-runners/README.md
@@ -12,7 +12,7 @@ Test Runners give us two advantages:
Each language has its own Test Runner, written in that language.
The website acts as the orchestrator between the Test Runners and students' submissions.

Each Test Runner lives in the Exercism GitHub organization in a repository named `$LANG-test-runner` (e.g. `ruby-test-runner`).
Each Test Runner lives in the Exercism GitHub organization in a repository named `$LANG-test-runner` (e.g. [`exercism/ruby-test-runner`](https://github.com/exercism/ruby-test-runner)).
You can explore the different Test Runners [here](https://github.com/exercism?q=-test-runner).

If you would like to get involved in helping with an existing Test Runner, please open an issue in its repository asking if there is somewhere you can help.
4 changes: 2 additions & 2 deletions building/tooling/test-runners/creating-from-scratch.md
@@ -4,9 +4,9 @@ Firstly, thank you for your interest in creating a Test Runner!

These are the steps to get going:

1. Check [our repository list for an existing `...-test-runner`](https://github.com/exercism?q=-test-runner) to ensure that one doesn't already exist.
1. Check [our repository list for an existing `...-test-runner`](https://github.com/search?q=org%3Aexercism+test-runner&type=repositories) to ensure that one doesn't already exist.
2. Scan the [contents of this directory](/docs/building/tooling/test-runners) to ensure you are comfortable with the idea of creating a Test Runner.
3. Open an issue at [exercism/exercism][exercism-repo] introducing yourself and telling us which language you'd like to create a Test Runner for.
3. Start a new topic on [the Exercism forum][building-exercism] telling us which language you'd like to create a Test Runner for.
4. Once a Test Runner repo has been created, use [the Test Runner interface document](/docs/building/tooling/test-runners/interface) to help guide your implementation. There is a [generic test runner repository template](https://github.com/exercism/generic-test-runner/) that you can use to kick-start development.

We have an incredibly friendly and supportive community who will be happy to help you as you work through this! If you get stuck, please start a new topic on [the Exercism forum][building-exercism] or create new issues at [exercism/exercism][exercism-repo] as needed 🙂