Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/dynamic upstream config for non resolveable hosts #103

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions Dockerfile
Original file line number Diff line number Diff line change
Expand Up @@ -63,6 +63,8 @@ EXPOSE 8082
ENV REGISTRIES="k8s.gcr.io gcr.io quay.io"
# A space delimited list of registry:user:password to inject authentication for
ENV AUTH_REGISTRIES="some.authenticated.registry:oneuser:onepassword another.registry:user:password"
# A space delimited list of domain=IP1,IP2 to inject upstream -> server mapping for
ENV UPSTREAM_MAPPINGS=""
# Should we verify upstream's certificates? Default to true.
ENV VERIFY_SSL="true"
# Enable debugging mode; this inserts mitmproxy/mitmweb between the CONNECT proxy and the caching layer
Expand Down
38 changes: 20 additions & 18 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,17 +10,17 @@ Caches the potentially huge blob/layer requests (for bandwidth/time savings), an

### NEW: avoiding DockerHub Pull Rate Limits with Caching

Starting November 2nd, 2020, DockerHub will
[supposedly](https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates/)
Starting November 2nd, 2020, DockerHub will
[supposedly](https://www.docker.com/blog/docker-hub-image-retention-policy-delayed-and-subscription-updates/)
[start](https://www.docker.com/blog/scaling-docker-to-serve-millions-more-developers-network-egress/)
[rate-limiting pulls](https://docs.docker.com/docker-hub/download-rate-limit/),
also known as the _Docker Apocalypse_.
[rate-limiting pulls](https://docs.docker.com/docker-hub/download-rate-limit/),
also known as the _Docker Apocalypse_.
The main symptom is `Error response from daemon: toomanyrequests: Too Many Requests. Please see https://docs.docker.com/docker-hub/download-rate-limit/` during pulls.
Many unknowing Kubernetes clusters will hit the limit, and struggle to configure `imagePullSecrets` and `imagePullPolicy`.

Since version `0.6.0`, this proxy can be configured with the env var `ENABLE_MANIFEST_CACHE=true` which provides
Since version `0.6.0`, this proxy can be configured with the env var `ENABLE_MANIFEST_CACHE=true` which provides
configurable caching of the manifest requests that DockerHub throttles. You can then fine-tune other parameters to your needs.
Together with the possibility to centrally inject authentication (since 0.3x), this is probably one of the best ways to bring relief to your distressed cluster, while at the same time saving lots of bandwidth and time.
Together with the possibility to centrally inject authentication (since 0.3x), this is probably one of the best ways to bring relief to your distressed cluster, while at the same time saving lots of bandwidth and time.

Note: enabling manifest caching, in its default config, effectively makes some tags **immutable**. Use with care. The configuration ENVs are explained in the [Dockerfile](./Dockerfile), relevant parts included below.

Expand Down Expand Up @@ -51,13 +51,13 @@ ENV MANIFEST_CACHE_DEFAULT_TIME="1h"

## What?

Essentially, it's a [man in the middle](https://en.wikipedia.org/wiki/Man-in-the-middle_attack): an intercepting proxy based on `nginx`, to which all docker traffic is directed using the `HTTPS_PROXY` mechanism and injected CA root certificates.
Essentially, it's a [man in the middle](https://en.wikipedia.org/wiki/Man-in-the-middle_attack): an intercepting proxy based on `nginx`, to which all docker traffic is directed using the `HTTPS_PROXY` mechanism and injected CA root certificates.

The main feature is Docker layer/image caching, including layers served from S3, Google Storage, etc.
The main feature is Docker layer/image caching, including layers served from S3, Google Storage, etc.

As a bonus it allows for centralized management of Docker registry credentials, which can in itself be the main feature, eg in Kubernetes environments.

You configure the Docker clients (_err... Kubernetes Nodes?_) once, and then all configuration is done on the proxy --
You configure the Docker clients (_err... Kubernetes Nodes?_) once, and then all configuration is done on the proxy --
for this to work it requires inserting a root CA certificate into system trusted root certs.

## master/:latest is unstable/beta
Expand Down Expand Up @@ -91,6 +91,8 @@ for this to work it requires inserting a root CA certificate into system trusted
If you have trouble pushing, set this to `false` first, then fix remainig timeouts.
Default is `true` to not change default behavior.
ENV PROXY_REQUEST_BUFFERING="true"
- Env `UPSTREAM_MAPPINGS` to configure upstream server mappings (similar in functionality to /etc/hosts entries but with round-robin selection).
Useful when configured resolvers are unable to resolve a host. e.g. `UPSTREAM_MAPPINGS="registry1=10.0.1.10:443,10.0.1.11 registry2=5.0.1.10"`
- Timeouts ENVS - all of them can pe specified to control different timeouts, and if not set, the defaults will be the ones from `Dockerfile`. The directives will be added into `http` block.:
- SEND_TIMEOUT : see [send_timeout](http://nginx.org/en/docs/http/ngx_http_core_module.html#send_timeout)
- CLIENT_BODY_TIMEOUT : see [client_body_timeout](http://nginx.org/en/docs/http/ngx_http_core_module.html#client_body_timeout)
Expand Down Expand Up @@ -159,10 +161,10 @@ docker run --rm --name docker_registry_proxy -it \

### Google Container Registry (GCR) auth

For Google Container Registry (GCR), username should be `_json_key` and the password should be the contents of the service account JSON.
Check out [GCR docs](https://cloud.google.com/container-registry/docs/advanced-authentication#json_key_file).
For Google Container Registry (GCR), username should be `_json_key` and the password should be the contents of the service account JSON.
Check out [GCR docs](https://cloud.google.com/container-registry/docs/advanced-authentication#json_key_file).

The service account key is in JSON format, it contains spaces ("` `") and colons ("`:`").
The service account key is in JSON format, it contains spaces ("` `") and colons ("`:`").

To be able to use GCR you should set `AUTH_REGISTRIES_DELIMITER` to something different than space (e.g. `AUTH_REGISTRIES_DELIMITER=";;;"`) and `AUTH_REGISTRY_DELIMITER` to something different than a single colon (e.g. `AUTH_REGISTRY_DELIMITER=":::"`).

Expand Down Expand Up @@ -324,7 +326,7 @@ Since `0.4` there is a separate `-debug` version of the image, which includes `n
This allows very in-depth debugging. Use sparingly, and definitely not in production.

```bash
docker run --rm --name docker_registry_proxy -it
docker run --rm --name docker_registry_proxy -it
-e DEBUG_NGINX=true -e DEBUG=true -e DEBUG_HUB=true -p 0.0.0.0:8081:8081 -p 0.0.0.0:8082:8082 \
-p 0.0.0.0:3128:3128 -e ENABLE_MANIFEST_CACHE=true \
-v $(pwd)/docker_mirror_cache:/docker_mirror_cache \
Expand All @@ -347,15 +349,15 @@ docker run --rm --name docker_registry_proxy -it

### Why not use Docker's own registry, which has a mirror feature?

Yes, Docker offers [Registry as a pull through cache](https://docs.docker.com/registry/recipes/mirror/), *unfortunately*
Yes, Docker offers [Registry as a pull through cache](https://docs.docker.com/registry/recipes/mirror/), *unfortunately*
it only covers the DockerHub case. It won't cache images from `quay.io`, `k8s.gcr.io`, `gcr.io`, or any such, including any private registries.

That means that your shiny new Kubernetes cluster is now a bandwidth hog, since every image will be pulled from the
That means that your shiny new Kubernetes cluster is now a bandwidth hog, since every image will be pulled from the
Internet on every Node it runs on, with no reuse.

This is due to the way the Docker "client" implements `--registry-mirror`, it only ever contacts mirrors for images
This is due to the way the Docker "client" implements `--registry-mirror`, it only ever contacts mirrors for images
with no repository reference (eg, from DockerHub).
When a repository is specified `dockerd` goes directly there, via HTTPS (and also via HTTP if included in a
When a repository is specified `dockerd` goes directly there, via HTTPS (and also via HTTP if included in a
`--insecure-registry` list), thus completely ignoring the configured mirror.

### Docker itself should provide this.
Expand All @@ -365,7 +367,7 @@ Yeah. Docker Inc should do it. So should NPM, Inc. Wonder why they don't. 😼
### TODO:

- [x] Basic Docker-for-Mac set-up instructions
- [x] Basic Docker-for-Windows set-up instructions.
- [x] Basic Docker-for-Windows set-up instructions.
- [ ] Test and make auth work with quay.io, unfortunately I don't have access to it (_hint, hint, quay_)
- [x] Hide the mitmproxy building code under a Docker build ARG.
- [ ] "Developer Office" proxy scenario, where many developers on a fast LAN share a proxy for bandwidth and speed savings (already works for pulls, but messes up pushes, which developers tend to use a lot)
24 changes: 24 additions & 0 deletions entrypoint.sh
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,30 @@ else
echo "Not using resolver config, keep existing '$confpath' -- mounted by user?"
fi

# Generate nginx upstream blocks into file. Function similar to a /etc/hosts file but includes round-robin selection
# e.g when UPSTREAM_MAPPINGS="registry1=10.0.1.10:443,10.0.1.11 registry2=5.0.1.10", the following file is generated
# upstream registry1 {
# server 10.0.1.10:443;
# server 10.0.1.11;
# }
# upstream registry2 {
# server 5.0.1.10;
# }
echo -n "" >> /etc/nginx/upstreams.conf

if [ ! -z "$UPSTREAM_MAPPINGS" ]; then

for UPSTREAM in ${UPSTREAM_MAPPINGS}; do
echo "upstream ${UPSTREAM%=*} {" >> /etc/nginx/upstreams.conf
comma_separated_hosts="${UPSTREAM#*=}"
hosts=`echo $comma_separated_hosts | tr ',' ' '`
for host in ${hosts}; do
echo -e "\tserver $host;" >> /etc/nginx/upstreams.conf
done
echo "}" >> /etc/nginx/upstreams.conf
done
fi

# The list of SAN (Subject Alternative Names) for which we will create a TLS certificate.
ALLDOMAINS=""

Expand Down
3 changes: 3 additions & 0 deletions nginx.conf
Original file line number Diff line number Diff line change
Expand Up @@ -78,6 +78,9 @@ http {

gzip off;

# Entrypoint generates the upstreams.conf config.
include /etc/nginx/upstreams.conf;

# Entrypoint generates the proxy_cache_path here, so it is configurable externally.
include /etc/nginx/conf.d/cache_max_size.conf;

Expand Down