Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(operator): Fix structured logs common fields #11700

Closed
wants to merge 6 commits into from

Conversation

periklis
Copy link
Collaborator

@periklis periklis commented Jan 17, 2024

What this PR does / why we need it:
The Loki Operator is using the logr.Logger interface extensively to emit structured logs per controller and handler. In addition the controller-runtime is leveraging the Builder factory to link reconciliation errors (i.e. any kind of error that is bubbled up and returned by the controllers' Reconcile() method) to the managing controller. However the present implementation makes it hard to link top-bottom logs for a specific combo of a controller, a handler and a LokiStack name due to the following issues:

  • The controller-runtime constructs a custom logger with fields extracted from Builder.For() and Builder.Named methods. This approach works when linking reconciler errors to the orginating controller/LokiStack if and only if the controller is built with Builder.For(lokiv1.LokiStack{} as the source type. However this is not the case for controllers like the AlertingRuleReconciler, RecordingRuleReconciler and more critical for the CertRotationReconciler rendering searching for errors per LokiStack impossible.
  • Although a logger sink per controller is provided in main.go the resulting logger instance (used for info/warning logging/non-reconciliation-error logging) needs to be manually augmented with similar fields (controller, lokistack.name, lokistack.namespace) per controller, handler and incoming request. This manual augmentation happens via logr.Logger.WithValues() and has been not consistent.

With the following PR the implementation is streamlined to:

  1. Use the same set of fields to identify the controller, the handler (called event), the lokistack (name and namespace) whether the structured log emitted is a reconciler error from controller-runtime or an info/warning log from the Loki Operator code base.
  2. Controllers without a Builder.For(lokiv1.LokiStack{}) setup will use consistently the same fields for lokistack (name and namespace) wherever applicable, i.e. AlertingRuleReconciler, RecordingRuleReconciler and RulerConfigReconciler are manually mimicking the same fields to link annotation error with the target LokiStack.

Examples (for the sake of simplicity modeled with jq):

  • Searching for a lokistack's events when rotate certificates:
    jq -c 'select(.lokistack.name=="lokistack-dev" and .event=="createOrRotateCertificates")'
  • Searching for all controller logs (info/error) for a handler:
    jq -c 'select(.controller=="lokistack" and .event=="createOrUpdateLokiStack")'
  • Search for all reconciler errors for a single lokistack for a controller:
    jq -c 'select(.controller=="certrotation" and ._message=="Reconciler error" and .lokistack.name=="lokistack-dev")'

Which issue(s) this PR fixes:
Fixes #

Special notes for your reviewer:
This PR leaves out the duplicates cleanup of usage references of kverrors.Wrap, log.Info and log.Error across the code base where the Lokistack identifiers (name, namespace) might be logger under an additional key (e.g. sometimes key, ref, name, etc.). This is left as future work in a separate PR. ✔️

Checklist

  • Reviewed the CONTRIBUTING.md guide (required)
  • Documentation added
  • Tests updated
  • CHANGELOG.md updated
    • If the change is worth mentioning in the release notes, add add-to-release-notes label
  • Changes that require user attention or interaction to upgrade are documented in docs/sources/setup/upgrade/_index.md
  • For Helm chart changes bump the Helm chart version in production/helm/loki/Chart.yaml and update production/helm/loki/CHANGELOG.md and production/helm/loki/README.md. Example PR
  • If the change is deprecating or removing a configuration option, update the deprecated-config.yaml and deleted-config.yaml files respectively in the tools/deprecated-config-checker directory. Example PR

@periklis periklis self-assigned this Jan 17, 2024
@periklis periklis force-pushed the operator-structured-logs branch 2 times, most recently from ab5f494 to 1c86d62 Compare January 17, 2024 20:31
@periklis periklis marked this pull request as ready for review January 17, 2024 21:13
@periklis periklis requested review from xperimental and a team as code owners January 17, 2024 21:13
@periklis periklis force-pushed the operator-structured-logs branch 5 times, most recently from 1d00997 to 9c236f8 Compare January 18, 2024 09:49
operator/main.go Outdated Show resolved Hide resolved
operator/controllers/loki/controllers.go Show resolved Hide resolved
operator/internal/handlers/dashboards_delete.go Outdated Show resolved Hide resolved
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Food for thought, IIUC our logs will benefit from these event values, but if an error occurs in a handler they will not have event values. Should we add event values when we throw an error?

@periklis periklis force-pushed the operator-structured-logs branch 2 times, most recently from efac682 to 9b460da Compare March 5, 2024 13:48
@periklis periklis changed the title operator: Fix structured logs common fields fix(operator): Fix structured logs common fields Mar 5, 2024
Copy link
Collaborator

@JoaoBraveCoding JoaoBraveCoding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

Comment on lines +13 to +19
AlertingRuleCtrlName = "alertingrule"
CertRotationCtrlName = "certrotation"
DashboardsCtrlName = "dashboard"
LokiStackCtrlName = "lokistack"
LokiStackZoneLabelsCtrlName = "lokistack-zone-labels"
RecordingRuleCtrlName = "recordingrule"
RulerConfigCtrlName = "rulerconfig"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually use prefixes to group similar constants together. Does this make sense here as well? (calling them ControllerNameAlertingRule, etc., for example)

RulerConfigCtrlName = "rulerconfig"
)

type LogContructorType func(request *reconcile.Request) logr.Logger
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small typo and suggestion for renaming (don't know if we did a similar thing already):

Suggested change
type LogContructorType func(request *reconcile.Request) logr.Logger
type LogConstructorFunc func(request *reconcile.Request) logr.Logger

Also this type does not need to be public, as far as I can see...

}
}

func logWithLokiStackRef(log logr.Logger, req ctrl.Request, name string) logr.Logger {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why we need this, when we have the lokiStackLogConstructor above which should already produce a lokistack field. Do we maybe just need to get our logger from the Context (there's a helper ctrl.LoggerFrom(ctx))?

l = l.WithValues(
"controllerGroup", lokiv1.GroupVersion.Group,
"controllerKind", "LokiStack",
"lokistack", klog.KRef(req.Namespace, req.Name),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we can avoid the extra dependency to klog just for getting an ObjectRef to produce a string representation of namespace/name.

--edit: Oh, wait it's not producing namespace/name but an object with namespace and name attributes in it. Is that what we want?

@@ -55,27 +57,26 @@ func (r *CertRotationReconciler) Reconcile(ctx context.Context, req ctrl.Request
}

checkExpiryAfter := expiryRetryAfter(rt.TargetCertRefresh)
r.Log.Info("Checking if LokiStack certificates expired", "name", req.String(), "interval", checkExpiryAfter.String())
ll.Info("checking if certificates expired", "interval", checkExpiryAfter.String())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recently wondered if we should remove a few of these outputs that happen every reconciliation and only print output if there's anything "noteworthy". Don't know if this should also happen in this PR though.

@periklis
Copy link
Collaborator Author

Closing for a later stage to review this

@periklis periklis closed this Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants