From 4fa1aca99368c4697fe12a051ed9175f5b5ba579 Mon Sep 17 00:00:00 2001
From: clux
Date: Thu, 5 Oct 2023 23:59:09 +0100
Subject: [PATCH] less dense prose for reconciler

Signed-off-by: clux
---
 docs/controllers/reconciler.md | 26 +++++++++-----------------
 includes/links.md              |  1 +
 2 files changed, 10 insertions(+), 17 deletions(-)

diff --git a/docs/controllers/reconciler.md b/docs/controllers/reconciler.md
index 06c7ad5..07beea6 100644
--- a/docs/controllers/reconciler.md
+++ b/docs/controllers/reconciler.md
@@ -64,20 +64,18 @@ Typically, this is accomplished with a call to [Controller::owns] on any owned [
 
 ## Reasons for reconciliation
 
-Notice that the **reason** for why the reconciliation started is **not included** in the signature of `reconcile`; **you only get the object**. The reason for this omission is **fault-tolerance**:
+Notice that the [ReconcileReason] is **not included** in the signature of `reconcile`; **you only get the object**. The reason for this omission is to encourage **fault-tolerance**:
 
 !!! note "Fault-tolerance against missed messages"
 
-    If your controller is down / crashed earlier, you **might have missed messages**. No matter how well you guard against downtime (e.g with multiple replicas, rolling upgrades, pdbs, leases), the Kubernetes watch api is **not sufficiently safe** to guarantee unmissed messages.
+    If your controller has downtime, you can miss messages. Additionally, the Kubernetes watch api does not guarantee delivery. You will likely miss some messages, and when this happens, Kubernetes will coalesce the ones you have not received into one single message.
 
-It is unsafe to give you a reason for why you got a `reconcile` call, because it is sometimes impossible to know.
+As a result, the reason is just a __best-effort property__ (that we only include for telemetry). You should not attempt to write logic against the reason in your reconciler.
 
-We therefore **omit** this information, so you that you write a more **defensive reconciler**.
-
-You should:
+Instead, you must write a **defensive reconciler** and handle missed messages and partial runs:
 
 - assume nothing about why reconciliation started
-- assume the reconciler could have failed at any point during the function
+- assume the reconciler could have failed at any question mark (`?`) in the function
 - check every property independently (you always start at the beginning)
 
 The type of **defensive** function writing described above is intended to grant a formal property called **idempotency**.
@@ -95,7 +93,7 @@ Let us create a reconciler for a custom `PodManager` resource that will:
 - create an associated `Pod` with [ownerReferences]
 - note its `creation` time on the status object of `PodManager`
 
-Both of these operations can be done in isolation in an idempotent manner (we will show below), but it is possible to compose the two operations in an erroneous way.
+Both of these operations are idempotent by themselves, but you must be careful with how you combine them.
 
 ### Combining Idempotent Operations
 
@@ -108,9 +106,7 @@ if pod_missing {
 }
 ```
 
-Is this a good optimization? Can we avoid calling set timestamps over and over again?
-
-Unfortunately, this reasoning is **flawed**; if the timestamp creation fails after the pod got created the first time, the second action will **never** get done!
+Now, what happens if the timestamp creation fails after the pod got created? The second action will **never** get done!
 
warning "Reconciler interruptions" @@ -153,8 +149,6 @@ fn create_owned_pod(source: &PodManager) -> Pod { } ``` - - one approach of achieving idempotency is to check every property carefully:: ```rust @@ -241,13 +235,11 @@ async fn reconcile(object: Arc, ctx: Arc) -> Result