Commit

Preparing v1.5 release
ricsanfre committed Oct 12, 2022
1 parent ad25d56 commit f856835
Showing 3 changed files with 232 additions and 2 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -30,7 +30,7 @@ The following picture shows the set of opensource solutions used so far in the c

## Cluster architecture and hardware

- Home lab architecture, showed in the picture below, consist of a Kubernetes cluster of 4 nodes (1 master and 3 workers) and a firewall, built with another Raspberry PI, to isolate cluster network from your home network.
+ Home lab architecture, showed in the picture below, consist of a Kubernetes cluster of 5 nodes (1 master and 4 workers) and a firewall, built with another Raspberry PI, to isolate cluster network from your home network.


<p align="center">
2 changes: 1 addition & 1 deletion docs/_docs/certmanager.md
@@ -40,7 +40,7 @@ CertManager is configured to deploy in the cluster a private PKI (Public Key Inf

Such private PKI will be used internally by Linkerd to issue certificates to each POD to implement mTLS communications.

- CertManager also is configured to deliver valid certificates, using your own DNS domain, through its integration with Let's Encrypt using ACM DNS challenges. Configuration is provided for using IONOS DNS provider, using developer API available to automate challenge resolution. Similar configuration can be implemented for other supported DNS providers.
+ CertManager also is configured to deliver valid certificates, using your own DNS domain, through its integration with Let's Encrypt using ACME DNS challenges. Configuration is provided for using IONOS DNS provider, using developer API available to automate challenge resolution. Similar configuration can be implemented for other supported DNS providers.

Valid certificates signed by Let's Encrypt will be used for cluster exposed services.

230 changes: 230 additions & 0 deletions docs/_posts/2022-10-12-announcing-relase-1.5.md
@@ -0,0 +1,230 @@
---
layout: post
title: Kubernetes Pi Cluster release v1.5
date: 2022-10-12
author: ricsanfre
---

Today I am pleased to announce the fifth release of the Kubernetes Pi Cluster project (v1.5).

The main features and enhancements of this release are:


## Let's Encrypt certificates integration

Let's Encrypt integration has been added to CertManager to automatically generate valid TLS certificates.

CertManager is configured to deliver valid certificates through its integration with Let's Encrypt using ACME DNS challenges (DNS01). The ACME HTTP challenge (HTTP01), also supported by CertManager and Let's Encrypt, is not configured since it requires exposing the cluster services to the public internet.

Configuration is provided for the IONOS DNS provider, using its developer API to automate challenge resolution together with the [IONOS cert-manager webhook](https://github.com/fabmade/cert-manager-webhook-ionos).

A similar configuration can be implemented for other supported DNS providers. See the supported list and further documentation in [Certmanager documentation: "ACME DNS01"](https://cert-manager.io/docs/configuration/acme/dns01/).
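
As a rough illustration of the DNS01 setup described above, a cert-manager `ClusterIssuer` with a webhook-based solver might look like the sketch below. The issuer name, e-mail address, secret name, `groupName` and `solverName` values are placeholders chosen for this example, not values taken from this project, and the contents of `config` depend entirely on the webhook in use, so the webhook's own documentation should be checked.

```yml
# Hypothetical ClusterIssuer for Let's Encrypt using an ACME DNS01 webhook solver.
# All names below are placeholders, not the project's actual configuration.
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-issuer                # placeholder name
spec:
  acme:
    # Let's Encrypt production ACME endpoint
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com              # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-account-key       # Secret where the ACME account key is stored
    solvers:
      - dns01:
          webhook:
            groupName: acme.example.com   # placeholder; must match the webhook deployment
            solverName: ionos             # placeholder; the name is defined by the webhook
            config: {}                    # provider-specific settings go here
```

A `Certificate` resource, or an Ingress annotated with `cert-manager.io/cluster-issuer`, that references such an issuer would then trigger the DNS challenge.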

Valid certificates signed by Let's Encrypt are used for cluster exposed services. For internal services, like Linkerd, self-signed certificates are used.

[Certbot](https://certbot.eff.org/) and [certbot-dns-ionos plugin](https://github.com/helgeerbe/certbot-dns-ionos) installation details are also provided to generate Let's Encrypt certificates outside the cluster, using the same ACME DNS challenge.


## Adding CSI Snapshot support

The Kubernetes CSI [Volume Snapshots](https://kubernetes.io/docs/concepts/storage/volume-snapshots/) feature is now enabled in the K3S cluster, so that backups can be created programmatically and Velero can orchestrate consistent backups.

CSI Snapshot feature is supported by Longhorn and Velero. See Longhorn documentation: [CSI Snapshot Support](https://longhorn.io/docs/1.2.2/snapshots-and-backups/csi-snapshot-support/create-a-backup-via-csi/) and [Velero CSI Snapshots documentation](https://velero.io/docs/v1.9/csi/).

K3S currently does not come with a pre-integrated Snapshot Controller, which is needed to enable CSI Snapshot functionality. An [external snapshot controller](https://github.com/kubernetes-csi/external-snapshotter) has been deployed.
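
To give an idea of what the feature looks like once the snapshot controller is running, here is a minimal sketch of a snapshot class bound to the Longhorn CSI driver and a snapshot request for a PVC. It assumes the installed snapshot CRDs serve the `snapshot.storage.k8s.io/v1` API; the resource names and the PVC name are placeholders, not names used in this project.

```yml
# Hypothetical snapshot class bound to the Longhorn CSI driver.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshotClass
metadata:
  name: longhorn-snapshot-class        # placeholder name
driver: driver.longhorn.io             # Longhorn CSI driver name
deletionPolicy: Delete
---
# Hypothetical snapshot request for an existing PVC.
apiVersion: snapshot.storage.k8s.io/v1
kind: VolumeSnapshot
metadata:
  name: example-snapshot               # placeholder name
  namespace: default
spec:
  volumeSnapshotClassName: longhorn-snapshot-class
  source:
    persistentVolumeClaimName: example-pvc   # placeholder: the PVC to snapshot
```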

## Prometheus memory footprint optimization

Memory footprint reduction is achieved by removing all duplicate metrics from K3S monitoring. See details in [issue #67](https://github.com/ricsanfre/pi-cluster/issues/67).

Before the optimization, K3S duplicates came from monitoring the kube-proxy, kubelet and apiserver components. Monitoring of kube-controller-manager and kube-scheduler had already been removed earlier. See [issue #22](https://github.com/ricsanfre/pi-cluster/issues/22).
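
One possible way to avoid scraping the same K3S metrics several times is to disable the per-component monitors in the kube-prometheus-stack Helm values and keep a single scrape target. The fragment below is only a sketch under that assumption; it is not the project's actual configuration, which is described in the linked issues.

```yml
# Hypothetical kube-prometheus-stack values fragment (illustrative only).
# K3S runs the control-plane components in a single process, so separate
# monitors for each component can end up collecting the same series.
kubeApiServer:
  enabled: false
kubeProxy:
  enabled: false
kubeControllerManager:
  enabled: false
kubeScheduler:
  enabled: false
# Keep one scrape target for node-level metrics.
kubelet:
  enabled: true
```

Disabling a component monitor can also affect the dashboards and alert rules that rely on its metrics, so those should be reviewed after a change like this.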

**Before removing K3S duplicates**:

| Active Series | Memory Usage |
|:---:|:---:|
| ![Prometheus_Active_series_before](https://user-images.githubusercontent.com/84853324/187235196-15aa874d-7ffe-434e-b14a-1c2a41364b79.png) | ![Prometheus_memory_before](https://user-images.githubusercontent.com/84853324/187235370-75064b56-ce58-4f4a-92a1-5d52d429d58c.png) |


Number of active time series: 157k

Memory usage: 1GB

**After removing K3S duplicates**:

| Active Series | Memory Usage |
|:---:|:---:|
| ![Prometheus_Active_series_after](https://user-images.githubusercontent.com/84853324/187251837-6b49bc30-29a3-436f-9627-a86ecbb48f37.png) | ![Prometheus_memory_after](https://user-images.githubusercontent.com/84853324/187251961-7eae10e5-bc04-4375-94da-49680654e4c9.png) |

Number of active time series: 73k

Memory usage: 550 MB

The number of active time series has been reduced from 157k to 73k (about a 53% reduction) and memory consumption has been reduced from 1GB to 550 MB (about a 45% reduction).


## Upgrade Linkerd to version 2.12

Linkerd has been upgraded to the latest stable version, 2.12, released in August 2022. See the [Linkerd announcement](https://buoyant.io/blog/announcing-linkerd-2-12).

New features of release 2.12:
- Per-route policies
- [Kubernetes Gateway API](https://gateway-api.sigs.k8s.io/) support
- Access logging

The installation procedure for this release is completely different from that of previous releases.


## Ansible Playbooks Improvements

### Encrypt passwords and keys used in playbooks with Ansible Vault

All passwords and keys that were previously stored in plain text in Ansible variables are now encrypted with [Ansible Vault](https://docs.ansible.com/ansible/latest/user_guide/vault.html).


Solution implemented:

- Include all secrets and keys in a specific vars YAML file: `vault.yml`, located in the `vars` directory.

```yml
---
# Encrypted variables - Ansible Vault
vault:
  # SAN
  san:
    iscsi:
      node_pass: s1cret0
      password_mutual: 0tr0s1cret0
  # K3s secrets
  k3s:
    k3s_token: s1cret0
  # traefik secrets
  traefik:
    basic_auth_passwd: s1cret0
  # Minio S3 secrets
  minio:
    root_password: supers1cret0
    longhorn_key: supers1cret0
    velero_key: supers1cret0
    restic_key: supers1cret0
  # elastic search
  elasticsearch:
    admin_password: s1cret0
  # Fluentd
  fluentd:
    shared_key: s1cret0
  # Grafana
  grafana:
    admin_password: s1cret0
```
- Encrypt the file with Ansible vault
```shell
ansible-vault encrypt vault.yml
```

Provide the Ansible Vault password when prompted; it is used to encrypt the file.

The file can be decrypted using the following command:

```shell
ansible-vault decrypt vault.yml
```

- Reference the vault variables in playbooks, group_vars, etc.

For example, in the `k3s_cluster` group variables:

```yml
# k3s shared token
k3s_token: "{{ vault.k3s.k3s_token }}"
```

All referenced variables that are encrypted by Ansible Vault belong to the `vault` YAML dictionary, so they can be clearly identified and their values located in the `vault.yml` file.

- Include a task to load the vault variables file in each playbook's `pre_tasks` section:

```yml
- name: my_playbook
  hosts: my_server
  pre_tasks:
    - name: Include vault variables
      include_vars: "vars/vault.yml"
      tags: ["always"]
  roles:
    ....
```

- Execute Ansible playbooks with the `--ask-vault-pass` argument, so the password used to encrypt the vault file can be provided when starting the playbook.

```shell
ansible-playbook my-playbook.yml --ask-vault-pass
```
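
As an optional complement to the steps above, a playbook could fail early when the vault variables were not loaded or decrypted. The task below is only a sketch of that idea and is not part of the project's playbooks; the task name and message are made up for illustration, and it would be placed right after the `include_vars` pre-task.

```yml
# Hypothetical pre_task: stop the play early if the vault dictionary is missing.
- name: Check vault variables are available
  ansible.builtin.assert:
    that:
      - vault is defined
      - vault.k3s.k3s_token is defined
    fail_msg: "Vault variables not loaded; check vars/vault.yml and the vault password"
    quiet: true
  tags: ["always"]
```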

### Automatic provision of Prometheus Rules from yaml files

The creation of `PrometheusRule` resources, used by Prometheus Operator to configure Prometheus rules, is now automated. Individual rules are defined as YAML files.

This replicates the existing functionality that automatically provisions Grafana dashboards from JSON files located in a directory (`dashboards`): Prometheus rules in YAML format, located in the `rules` directory, are used to create `PrometheusRule` objects.
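
For reference, a `PrometheusRule` object of the kind generated by this automation might look like the sketch below. The resource name, namespace, label, and the alert itself are illustrative assumptions, not rules shipped with this project; the label in particular has to match whatever rule selector the Prometheus instance is configured with.

```yml
# Hypothetical PrometheusRule object (illustrative names and alert).
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules                  # placeholder name
  namespace: monitoring                # placeholder namespace
  labels:
    release: kube-prometheus-stack     # placeholder; must match the Prometheus ruleSelector
spec:
  groups:
    - name: example.rules
      rules:
        - alert: InstanceDown
          expr: up == 0                # fires when a scrape target is unreachable
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "A scrape target has been down for more than 5 minutes"
```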

## Upgrade software components to latest stable version


| Type | Software | Latest Version tested | Notes |
|-----------| ------- |-------|----|
| OS | Ubuntu | 20.04.3 | OS needs to be tweaked for Raspberry PI when booting from external USB |
| Control | Ansible | 2.12.1 | |
| Control | cloud-init | 21.4 | version pre-integrated into Ubuntu 20.04 |
| Kubernetes | K3S | v1.24.6 | K3S version|
| Kubernetes | Helm | v3.6.3 ||
| Metrics | Kubernetes Metrics Server | v0.5.2 | version pre-integrated into K3S |
| Computing | containerd | v1.6.8-k3s1 | version pre-integrated into K3S |
| Networking | Flannel | v0.19.2 | version pre-integrated into K3S |
| Networking | CoreDNS | v1.9.1 | version pre-integrated into K3S |
| Networking | Metal LB | v0.13.5 | Helm chart version: metallb-0.13.5 |
| Service Mesh | Linkerd | v2.12.1 | Helm chart version: linkerd-control-plane-1.9.3 |
| Service Proxy | Traefik | v2.9.1 | Helm chart: traefik-13.0.0 |
| Storage | Longhorn | v1.3.1 | Helm chart version: longhorn-1.3.1 |
| SSL Certificates | Certmanager | v1.9.1 | Helm chart version: cert-manager-v1.9.1 |
| Logging | ECK Operator | 2.4.0 | Helm chart version: eck-operator-2.4.0 |
| Logging | Elastic Search | 8.1.2 | Deployed with ECK Operator |
| Logging | Kibana | 8.1.2 | Deployed with ECK Operator |
| Logging | Fluentbit | 1.9.9 | Helm chart version: fluent-bit-0.20.9 |
| Logging | Fluentd | 1.15.2 | Helm chart version: 0.3.9. [Custom docker image](https://github.com/ricsanfre/fluentd-aggregator) from official v1.15.2|
| Monitoring | Kube Prometheus Stack | 0.60.1 | Helm chart version: kube-prometheus-stack-41.0.0 |
| Monitoring | Prometheus Operator | 0.59.2 | Installed by Kube Prometheus Stack. Helm chart version: kube-prometheus-stack-41.0.0 |
| Monitoring | Prometheus | 2.39 | Installed by Kube Prometheus Stack. Helm chart version: kube-prometheus-stack-41.0.0 |
| Monitoring | AlertManager | 0.24 | Installed by Kube Prometheus Stack. Helm chart version: kube-prometheus-stack-41.0.0 |
| Monitoring | Grafana | 9.1.7 | Helm chart version grafana-6.32.10. Installed as dependency of Kube Prometheus Stack chart. Helm chart version: kube-prometheus-stack-41.0.0 |
| Monitoring | Prometheus Node Exporter | 1.3.1 | Helm chart version: prometheus-node-exporter-3.3.1. Installed as dependency of Kube Prometheus Stack chart. Helm chart version: kube-prometheus-stack-41.0.0 |
| Monitoring | Prometheus Elasticsearch Exporter | 1.5.0 | Helm chart version: prometheus-elasticsearch-exporter-4.15.0 |
| Backup | Minio | RELEASE.2022-09-22T18-57-27Z | |
| Backup | Restic | 0.12.1 | |
| Backup | Velero | 1.9.2 | Helm chart version: velero-2.31.9 |
{: .table }


## Release v1.5.0 Notes

This release upgrades the backup service by adding the Kubernetes CSI Snapshot feature, optimizes Prometheus memory usage by removing K3S duplicate metrics, enables Let's Encrypt TLS certificates, and upgrades Linkerd to release 2.12.

### Release Scope:

- Use of Let's Encrypt TLS certificates
  - Certmanager configuration of Let's Encrypt support. ACME DNS01 challenge provider
  - Certbot deployment
  - IONOS DNS provider integration
- Upgrade backup service adding CSI Snapshot support
  - Enable Kubernetes CSI Snapshot feature, installing external snapshot controller.
  - Configure Longhorn CSI Snapshots support
  - Configure Velero CSI Snapshot support
- Prometheus memory footprint optimization
  - Removal of duplicate metrics coming from K3S endpoints.
- Upgrade Linkerd to version 2.12
- Ansible Playbooks improvements
  - Encrypt passwords and keys used in playbooks with Ansible Vault
  - Automatic provision of Prometheus Rules from yaml files.


