Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

flow: separate parent component and controller IDs into new labels #6786

Merged
merged 8 commits into from
Apr 3, 2024

Conversation

tpaschalis
Copy link
Member

@tpaschalis tpaschalis commented Mar 28, 2024

PR Description

This PR amends the registerers and loggers we propagate into the graph to separate the so that the parent component/controller IDs are stored into a separate label.

This makes these metrics easier to read and manage and unlocks other solutions for large names in the case of nested modules in the future.

NOTE: Please review this PR commit-by-commit and look at the extended commit messages. Each contains a snippet of how this change affects the Agent's logs and metrics

The metrics and logs examples are validated using the following example that uses nested modules.

--- entrypoint.river ---
logging {
	format = "json"
}

import.file "a_namespace" {
	filename = "a.river"
}

local.file "default" {
	filename = "entrypoint.river"
}

a_namespace.a "default" {
	a_argument = 47
}

--- a.river ---
declare "a" {
	import.file "b_namespace" {
		filename = "b.river"
	}

	local.file "default" {
		filename = "entrypoint.river"
	}

	argument "a_argument" {
		comment = "Where to send collected metrics."
	}

	prometheus.exporter.self "default" { }

	b_namespace.b "default" {
		b_argument = 147
	}

	export "a_export" {
		value = argument.a_argument.value + b_namespace.b.default.b_export + 1
	}
}

--- b.river ---
declare "b" {
	import.file "c_namespace" {
		filename = "c.river"
	}

	local.file "default" {
		filename = "entrypoint.river"
	}

	argument "b_argument" {
		optional = false
	}

	prometheus.exporter.self "default" { }

	c_namespace.c "default" {
		c_argument = 101
	}

	export "b_export" {
		value = argument.b_argument.value + c_namespace.c.default.c_export + 1
	}
}

--- c.river ---
declare "c" {
	argument "c_argument" {
		optional = false
	}

	local.file "default" {
		filename = "entrypoint.river"
	}

	prometheus.exporter.self "default" { }

	export "c_export" {
		value = argument.c_argument.value + 1
	}
}

Which issue(s) this PR fixes

Closes #6260

Notes to the Reviewer

The tracer implementation still takes the full globals.ControllerID since all it needs is to call path.Ext and get the final component name as the dot-style extension.

One thing to agree upon is whether path.Split makes sense for separating the parent and final IDs, or if we should add a custom helper. Currently, global IDs are currently constructed like a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/local.file.default, i.e. without a leading slash. So for the root controller, we're left with an empty label instead of "/" which was discussed at some point.

TODO:

  • Update mixin

PR Checklist

  • CHANGELOG.md updated (N/A)
  • Documentation added (N/A)
  • Tests updated (N/A)
  • Config converters updated (N/A)

This commit changes the wrapped register passed to components through
getManagedOptions to separate the parent controller ID from the
component ID.

This makes use of path.Split, which results in the following change in
metrics for the local.file component used in the example.

```diff
< agent_local_file_timestamp_last_accessed_unix_seconds{component_id="a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/local.file.default"} 1.7117163012858703e+09
< agent_local_file_timestamp_last_accessed_unix_seconds{component_id="a_namespace.a.default/b_namespace.b.default/local.file.default"} 1.711716301286142e+09
< agent_local_file_timestamp_last_accessed_unix_seconds{component_id="a_namespace.a.default/local.file.default"} 1.7117163012854662e+09
< agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default"} 1.711716301285211e+09
---
> agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path=""} 1.7117162894922016e+09
> agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="a_namespace.a.default/"} 1.7117162894936502e+09
> agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="a_namespace.a.default/b_namespace.b.default/"} 1.7117162894930658e+09
> agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/"} 1.711716289493361e
```

Signed-off-by: Paschalis Tsilias <[email protected]>
This commits changes the newControllerCollector and newControllerMetrics
helpers to split out a new "controller_path" label from the existing
controller_id one.

This also makes use of path.Split, which results in the following change
in metrics for the nested controllers used in the example.

```diff
< agent_component_controller_evaluating{controller_id=""} 0
< agent_component_controller_evaluating{controller_id="a_namespace.a.default"} 0
< agent_component_controller_evaluating{controller_id="a_namespace.a.default/b_namespace.b.default"} 0
< agent_component_controller_evaluating{controller_id="a_namespace.a.default/b_namespace.b.default/c_namespace.c.default"} 0
< agent_component_controller_evaluating{controller_id="remotecfg"} 0
---
> agent_component_controller_evaluating{controller_id="",controller_path=""} 0
> agent_component_controller_evaluating{controller_id="a_namespace.a.default",controller_path=""} 0
> agent_component_controller_evaluating{controller_id="b_namespace.b.default",controller_path="a_namespace.a.default/"} 0
> agent_component_controller_evaluating{controller_id="c_namespace.c.default",controller_path="a_namespace.a.default/b_namespace.b.default/"} 0
> agent_component_controller_evaluating{controller_id="remotecfg",controller_path=""} 0

< agent_component_controller_running_components{controller_id="",health_type="healthy"} 3
< agent_component_controller_running_components{controller_id="a_namespace.a.default",health_type="healthy"} 4
< agent_component_controller_running_components{controller_id="a_namespace.a.default/b_namespace.b.default",health_type="healthy"} 4
< agent_component_controller_running_components{controller_id="a_namespace.a.default/b_namespace.b.default/c_namespace.c.default",health_type="healthy"} 2
---
> agent_component_controller_running_components{controller_id="",controller_path="",health_type="healthy"} 3
> agent_component_controller_running_components{controller_id="a_namespace.a.default",controller_path="",health_type="healthy"} 4
> agent_component_controller_running_components{controller_id="b_namespace.b.default",controller_path="a_namespace.a.default/",health_type="healthy"} 4
> agent_component_controller_running_components{controller_id="c_namespace.c.default",controller_path="a_namespace.a.default/b_namespace.b.default/",health_type="healthy"} 2

< agent_component_evaluation_queue_size{controller_id=""} 1
< agent_component_evaluation_queue_size{controller_id="a_namespace.a.default"} 2
< agent_component_evaluation_queue_size{controller_id="a_namespace.a.default/b_namespace.b.default"} 2
< agent_component_evaluation_queue_size{controller_id="a_namespace.a.default/b_namespace.b.default/c_namespace.c.default"} 2
< agent_component_evaluation_queue_size{controller_id="remotecfg"} 0
---
> agent_component_evaluation_queue_size{controller_id="",controller_path=""} 1
> agent_component_evaluation_queue_size{controller_id="a_namespace.a.default",controller_path=""} 2
> agent_component_evaluation_queue_size{controller_id="b_namespace.b.default",controller_path="a_namespace.a.default/"} 2
> agent_component_evaluation_queue_size{controller_id="c_namespace.c.default",controller_path="a_namespace.a.default/b_namespace.b.default/"} 0
> agent_component_evaluation_queue_size{controller_id="remotecfg",controller_path=""} 0
```

Signed-off-by: Paschalis Tsilias <[email protected]>
This commit reuses the same parent, id values that we propagate to
newControllerCollector and newControllerMetrics as a separate key-value
logger field.

This means that log lines now look like this for a service on the root
controller as well as components in the nested controllers.

```diff
< {"ts":"2024-03-29T14:05:12.629581007Z",level:"info",msg:"finished node evaluation",controller_id:"",trace_id:"7b2e528779dfda54bc075e5d46fa3a19",node_id:"http",duration:6352}
< {"ts":"2024-03-29T14:05:12.631090297Z",level:"info",msg:"finished node evaluation",controller_id:"a_namespace.a.default",trace_id:"2b33effd0d74b6b74c2c55b0961aa281",node_id:"prometheus.exporter.self.default",duration:3737}
< {"ts":"2024-03-29T14:05:12.631227727Z",level:"info",msg:"finished node evaluation",controller_id:"a_namespace.a.default/b_namespace.b.default",trace_id:"1ddffae931697aedcef8fc5510664aee",node_id:"prometheus.exporter.self.default",duration:5240}
< {"ts":"2024-03-29T14:05:12.63149798Z",level:"info",msg:"finished node evaluation",controller_id:"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default",trace_id:"cb4ee700e51886448d4628a6bd78cb2c",node_id:"prometheus.exporter.self.default",duration:5681}
---
> {"ts":"2024-03-29T14:05:01.362283161Z",level:"info",msg:"finished node evaluation",controller_path:"",controller_id:"",trace_id:"06ca5763a8ba421a17459675df5c8fc7",node_id:"http",duration:6212}
> {"ts":"2024-03-29T14:05:01.362659726Z",level:"info",msg:"finished node evaluation",controller_path:"",controller_id:"a_namespace.a.default",trace_id:"4ec9b04a0508d6d46c0c3ba1b7b3c642",node_id:"prometheus.exporter.self.default",duration:4278}
> {"ts":"2024-03-29T14:05:01.363112694Z",level:"info",msg:"finished node evaluation",controller_path:"a_namespace.a.default/",controller_id:"b_namespace.b.default",trace_id:"340525264c0db8529881fc5793cb1ed2",node_id:"prometheus.exporter.self.default",duration:3497}
> {"ts":"2024-03-29T14:05:01.363235206Z",level:"info",msg:"finished node evaluation",controller_path:"a_namespace.a.default/b_namespace.b.default/",controller_id:"c_namespace.c.default",trace_id:"a0608493d9888dbbc2ca22154c18328c",node_id:"prometheus.exporter.self.default",duration:3206}
```

Signed-off-by: Paschalis Tsilias <[email protected]>
@tpaschalis tpaschalis changed the title flow: separate parent controller ID into new label flow: separate parent component and controller IDs into new labels Mar 29, 2024
@tpaschalis tpaschalis marked this pull request as ready for review March 29, 2024 15:22
@tpaschalis tpaschalis requested review from rfratto, wildum and thampiotr and removed request for rfratto and wildum March 29, 2024 15:23
@@ -165,11 +165,13 @@ func NewBuiltinComponentNode(globals ComponentGlobals, reg component.Registratio

func getManagedOptions(globals ComponentGlobals, cn *BuiltinComponentNode) component.Options {
cn.registry = prometheus.NewRegistry()
parent, id := path.Split(cn.globalID)
Copy link
Member

@rfratto rfratto Mar 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that from the internal discussion about this, there was a mention to make component_path rooted to make it clearly indicate a global path:

Global IDs start with a path separator, similarly to absolute file paths, to clearly indicate rooting.

Additionally, if there is no parent, component_path should be /:

component_path: the global ID of the containing subgraph for a component. If there is no containing subgraph, component_path is the rooted path /.

In context:

For the global ID /custom_component.default/prometheus.remote_write, the component_path label is /custom_component.default and the component_id label is prometheus.remote_write.

(This rooting logic applies to the other metrics as well wherever the labels are being split)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like the leading / too.

Registerer: prometheus.WrapRegistererWith(prometheus.Labels{
"component_id": cn.globalID,
"component_path": parent,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the mixin will need to be updated alongside this, since component_id is used in some dashboards (which is why we were doing this prior to 1.0 since it would be a breaking change to split out the labels after-the-fact).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pushed a commit to include references to component_path eg when summing up component-level metrics for the Controller and Prometheus dashboards.

Here's how it looks like in practice. I'll check the dashboards in more environments with nested modules and more data.
Screenshot 2024-04-02 171206

Finally, the OpenTelemetry dashboard doesn't drill down to the component level so it's left as-is for now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, though as a nit I think component_path should before component in the list of variables.

This PR continues using path.Split to achieve two things.

First, simplify the log messages for built-in components by splitting
the generic "component" label into "component_id" and "component_path"
for built-in components as well as the generic "config" label for nested
controllers into "config_id" and "config_path".

Here's the difference in logs when the Agent is quitting for the example
configuration. These logs have been ordered and their timestamp has
been removed so that they're easier to parse.
```diff
< {"ts":"","level":"error","msg":"error running exporter","component_path":"a_namespace.a.default/","component_id":"prometheus.exporter.self.default","err":"context canceled"}
< {"ts":"","level":"error","msg":"error running exporter","component_path":"a_namespace.a.default/b_namespace.b.default/","component_id":"prometheus.exporter.self.default","err":"context canceled"}
< {"ts":"","level":"error","msg":"error running exporter","component_path":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/","component_id":"prometheus.exporter.self.default","err":"context canceled"}
< {"ts":"","level":"info","msg":"component exited","component_path":"","component_id":"local.file.default"}
< {"ts":"","level":"info","msg":"component exited","component_path":"a_namespace.a.default/","component_id":"local.file.default"}
< {"ts":"","level":"info","msg":"component exited","component_path":"a_namespace.a.default/","component_id":"prometheus.exporter.self.default"}
< {"ts":"","level":"info","msg":"component exited","component_path":"a_namespace.a.default/b_namespace.b.default/","component_id":"local.file.default"}
< {"ts":"","level":"info","msg":"component exited","component_path":"a_namespace.a.default/b_namespace.b.default/","component_id":"prometheus.exporter.self.default"}
< {"ts":"","level":"info","msg":"component exited","component_path":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/","component_id":"local.file.default"}
< {"ts":"","level":"info","msg":"component exited","component_path":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/","component_id":"prometheus.exporter.self.default"}
< {"ts":"","level":"info","msg":"import exited","config_path":"","config_id":"import.file.a_namespace"}
< {"ts":"","level":"info","msg":"import exited","config_path":"a_namespace.a.default/","config_id":"import.file.b_namespace"}
< {"ts":"","level":"info","msg":"import exited","config_path":"a_namespace.a.default/b_namespace.b.default/","config_id":"import.file.c_namespace"}
---
> {"ts":"","level":"error","msg":"error running exporter","component":"a_namespace.a.default/prometheus.exporter.self.default","err":"context canceled"}
> {"ts":"","level":"error","msg":"error running exporter","component":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/prometheus.exporter.self.default","err":"context canceled"}
> {"ts":"","level":"error","msg":"error running exporter","component":"a_namespace.a.default/b_namespace.b.default/prometheus.exporter.self.default","err":"context canceled"}
> {"ts":"","level":"info","msg":"component exited","component":"local.file.default"}
> {"ts":"","level":"info","msg":"component exited","component":"a_namespace.a.default/local.file.default"}
> {"ts":"","level":"info","msg":"component exited","component":"a_namespace.a.default/prometheus.exporter.self.default"}
> {"ts":"","level":"info","msg":"component exited","component":"a_namespace.a.default/b_namespace.b.default/local.file.default"}
> {"ts":"","level":"info","msg":"component exited","component":"a_namespace.a.default/b_namespace.b.default/prometheus.exporter.self.default"}
> {"ts":"","level":"info","msg":"component exited","component":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/local.file.default"}
> {"ts":"","level":"info","msg":"component exited","component":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/prometheus.exporter.self.default"}
> {"ts":"","level":"info","msg":"import exited","config":"import.file.a_namespace"}
> {"ts":"","level":"info","msg":"import exited","config":"a_namespace.a.default/import.file.b_namespace"}
> {"ts":"","level":"info","msg":"import exited","config":"a_namespace.a.default/b_namespace.b.default/import.file.c_namespace"}
```

Secondly, this also touches the registerer passed to the import.* blocks
for custom components. This registerer is not used anywhere yet so this
change isn't visible but I temporarily added a metric on import.file's
constructor to showcase the change below.

```diff
< import_file_custom_metric{config_id="import.file.a_namespace",config_path=""} 1.7117248839893064e+09
< import_file_custom_metric{config_id="import.file.b_namespace",config_path="a_namespace.a.default/"} 1.7117248839898353e+09
< import_file_custom_metric{config_id="import.file.c_namespace",config_path="a_namespace.a.default/b_namespace.b.default/"} 1.7117248839900513e+09
---
> import_file_custom_metric{config_id="import.file.a_namespace"} 1.7117249370875866e+09
> import_file_custom_metric{config_id="a_namespace.a.default/import.file.b_namespace"} 1.7117249370882547e+09
> import_file_custom_metric{config_id="a_namespace.a.default/b_namespace.b.default/import.file.c_namespace"} 1.7117249370885952e+09
```

Signed-off-by: Paschalis Tsilias <[email protected]>
This makes sure that a) paths are always rooted, and that b) paths have
no trailing slashes so that we're aligned with our internal discussion.

This means the metrics look like:

```
agent_component_evaluation_queue_size{controller_id="",controller_path="/"} 1
agent_component_evaluation_queue_size{controller_id="a_namespace.a.default",controller_path="/"} 2
agent_component_evaluation_queue_size{controller_id="b_namespace.b.default",controller_path="/a_namespace.a.default/"} 2
agent_component_evaluation_queue_size{controller_id="c_namespace.c.default",controller_path="/a_namespace.a.default/b_namespace.b.default/"} 0
agent_component_evaluation_queue_size{controller_id="remotecfg",controller_path="/"} 0

agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="/"} 1.7119712864340181e+09
agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="/a_namespace.a.default/"} 1.7119712864338827e+09
agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="/a_namespace.a.default/b_namespace.b.default/"} 1.7119712864335563e+09
agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="/a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/"} 1.711971286433792e+09
```

while logs look like

```
{"ts":"2024-04-01T15:58:33.249022375Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"a_namespace.a.default","node_id":"b_namespace.b.default","duration":1056168}
{"ts":"2024-04-01T15:58:33.249143064Z","level":"info","msg":"finished node evaluation","controller_path":"/a_namespace.a.default","controller_id":"b_namespace.b.default","node_id":"export.b_export","duration":28284}
{"ts":"2024-04-01T15:58:33.249196014Z","level":"info","msg":"starting complete graph evaluation","controller_path":"/a_namespace.a.default/b_namespace.b.default","controller_id":"c_namespace.c.default","trace_id":"8cd17ad7fe83062eb212ce2605f53add"}
{"ts":"2024-04-01T15:58:33.249232123Z","level":"info","msg":"finished node evaluation","controller_path":"/a_namespace.a.default/b_namespace.b.default","controller_id":"c_namespace.c.default","trace_id":"8cd17ad7fe83062eb212ce2605f53add","node_id":"prometheus.exporter.self.default","duration":6792}

{"ts":"2024-04-02T08:14:31.079716842Z","level":"info","msg":"import exited","config_path":"/","config_id":"import.file.a_namespace"}
{"ts":"2024-04-02T08:14:31.07969433Z","level":"info","msg":"import exited","config_path":"/a_namespace.a.default","config_id":"import.file.b_namespace"}
{"ts":"2024-04-02T08:14:31.079733964Z","level":"info","msg":"import exited","config_path":"/a_namespace.a.default/b_namespace.b.default","config_id":"import.file.c_namespace"}
{"ts":"2024-04-02T08:14:31.079736279Z","level":"info","msg":"custom component exited","component":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default"}
```

Signed-off-by: Paschalis Tsilias <[email protected]>
Copy link
Contributor

@thampiotr thampiotr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@@ -165,11 +165,13 @@ func NewBuiltinComponentNode(globals ComponentGlobals, reg component.Registratio

func getManagedOptions(globals ComponentGlobals, cn *BuiltinComponentNode) component.Options {
cn.registry = prometheus.NewRegistry()
parent, id := path.Split(cn.globalID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like the leading / too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Accidental commit?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops 🤦

@tpaschalis tpaschalis requested a review from rfratto April 2, 2024 14:59
Signed-off-by: Paschalis Tsilias <[email protected]>
Copy link
Member

@rfratto rfratto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

Comment on lines 409 to 414
dashboard.newMultiTemplateVariable('component', |||
label_values(agent_wal_samples_appended_total{cluster="$cluster", namespace="$namespace", instance=~"$instance", component_id=~"prometheus\\.remote_write\\..*"}, component_id)
|||),
dashboard.newMultiTemplateVariable('component_path', |||
label_values(agent_wal_samples_appended_total{cluster="$cluster", namespace="$namespace", instance=~"$instance", component_id=~"prometheus\\.remote_write\\..*", component_path=~".*"}, component_path)
|||),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I think the order of these should be swapped; the order here is most broad to most specific, so based on that path goes before component ID.

Signed-off-by: Paschalis Tsilias <[email protected]>
@tpaschalis tpaschalis merged commit 1a035e4 into grafana:main Apr 3, 2024
10 checks passed
@github-actions github-actions bot added the frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed. label May 4, 2024
@github-actions github-actions bot locked as resolved and limited conversation to collaborators May 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
frozen-due-to-age Locked due to a period of inactivity. Please open new issues or PRs if more discussion is needed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Less bloated collection of debug metrics from modules
4 participants