-
Notifications
You must be signed in to change notification settings - Fork 487
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
flow: separate parent component and controller IDs into new labels #6786
Conversation
b3f881b
to
8c19814
Compare
This commit changes the wrapped register passed to components through getManagedOptions to separate the parent controller ID from the component ID. This makes use of path.Split, which results in the following change in metrics for the local.file component used in the example. ```diff < agent_local_file_timestamp_last_accessed_unix_seconds{component_id="a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/local.file.default"} 1.7117163012858703e+09 < agent_local_file_timestamp_last_accessed_unix_seconds{component_id="a_namespace.a.default/b_namespace.b.default/local.file.default"} 1.711716301286142e+09 < agent_local_file_timestamp_last_accessed_unix_seconds{component_id="a_namespace.a.default/local.file.default"} 1.7117163012854662e+09 < agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default"} 1.711716301285211e+09 --- > agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path=""} 1.7117162894922016e+09 > agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="a_namespace.a.default/"} 1.7117162894936502e+09 > agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="a_namespace.a.default/b_namespace.b.default/"} 1.7117162894930658e+09 > agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/"} 1.711716289493361e ``` Signed-off-by: Paschalis Tsilias <[email protected]>
This commits changes the newControllerCollector and newControllerMetrics helpers to split out a new "controller_path" label from the existing controller_id one. This also makes use of path.Split, which results in the following change in metrics for the nested controllers used in the example. ```diff < agent_component_controller_evaluating{controller_id=""} 0 < agent_component_controller_evaluating{controller_id="a_namespace.a.default"} 0 < agent_component_controller_evaluating{controller_id="a_namespace.a.default/b_namespace.b.default"} 0 < agent_component_controller_evaluating{controller_id="a_namespace.a.default/b_namespace.b.default/c_namespace.c.default"} 0 < agent_component_controller_evaluating{controller_id="remotecfg"} 0 --- > agent_component_controller_evaluating{controller_id="",controller_path=""} 0 > agent_component_controller_evaluating{controller_id="a_namespace.a.default",controller_path=""} 0 > agent_component_controller_evaluating{controller_id="b_namespace.b.default",controller_path="a_namespace.a.default/"} 0 > agent_component_controller_evaluating{controller_id="c_namespace.c.default",controller_path="a_namespace.a.default/b_namespace.b.default/"} 0 > agent_component_controller_evaluating{controller_id="remotecfg",controller_path=""} 0 < agent_component_controller_running_components{controller_id="",health_type="healthy"} 3 < agent_component_controller_running_components{controller_id="a_namespace.a.default",health_type="healthy"} 4 < agent_component_controller_running_components{controller_id="a_namespace.a.default/b_namespace.b.default",health_type="healthy"} 4 < agent_component_controller_running_components{controller_id="a_namespace.a.default/b_namespace.b.default/c_namespace.c.default",health_type="healthy"} 2 --- > agent_component_controller_running_components{controller_id="",controller_path="",health_type="healthy"} 3 > agent_component_controller_running_components{controller_id="a_namespace.a.default",controller_path="",health_type="healthy"} 4 > agent_component_controller_running_components{controller_id="b_namespace.b.default",controller_path="a_namespace.a.default/",health_type="healthy"} 4 > agent_component_controller_running_components{controller_id="c_namespace.c.default",controller_path="a_namespace.a.default/b_namespace.b.default/",health_type="healthy"} 2 < agent_component_evaluation_queue_size{controller_id=""} 1 < agent_component_evaluation_queue_size{controller_id="a_namespace.a.default"} 2 < agent_component_evaluation_queue_size{controller_id="a_namespace.a.default/b_namespace.b.default"} 2 < agent_component_evaluation_queue_size{controller_id="a_namespace.a.default/b_namespace.b.default/c_namespace.c.default"} 2 < agent_component_evaluation_queue_size{controller_id="remotecfg"} 0 --- > agent_component_evaluation_queue_size{controller_id="",controller_path=""} 1 > agent_component_evaluation_queue_size{controller_id="a_namespace.a.default",controller_path=""} 2 > agent_component_evaluation_queue_size{controller_id="b_namespace.b.default",controller_path="a_namespace.a.default/"} 2 > agent_component_evaluation_queue_size{controller_id="c_namespace.c.default",controller_path="a_namespace.a.default/b_namespace.b.default/"} 0 > agent_component_evaluation_queue_size{controller_id="remotecfg",controller_path=""} 0 ``` Signed-off-by: Paschalis Tsilias <[email protected]>
This commit reuses the same parent, id values that we propagate to newControllerCollector and newControllerMetrics as a separate key-value logger field. This means that log lines now look like this for a service on the root controller as well as components in the nested controllers. ```diff < {"ts":"2024-03-29T14:05:12.629581007Z",level:"info",msg:"finished node evaluation",controller_id:"",trace_id:"7b2e528779dfda54bc075e5d46fa3a19",node_id:"http",duration:6352} < {"ts":"2024-03-29T14:05:12.631090297Z",level:"info",msg:"finished node evaluation",controller_id:"a_namespace.a.default",trace_id:"2b33effd0d74b6b74c2c55b0961aa281",node_id:"prometheus.exporter.self.default",duration:3737} < {"ts":"2024-03-29T14:05:12.631227727Z",level:"info",msg:"finished node evaluation",controller_id:"a_namespace.a.default/b_namespace.b.default",trace_id:"1ddffae931697aedcef8fc5510664aee",node_id:"prometheus.exporter.self.default",duration:5240} < {"ts":"2024-03-29T14:05:12.63149798Z",level:"info",msg:"finished node evaluation",controller_id:"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default",trace_id:"cb4ee700e51886448d4628a6bd78cb2c",node_id:"prometheus.exporter.self.default",duration:5681} --- > {"ts":"2024-03-29T14:05:01.362283161Z",level:"info",msg:"finished node evaluation",controller_path:"",controller_id:"",trace_id:"06ca5763a8ba421a17459675df5c8fc7",node_id:"http",duration:6212} > {"ts":"2024-03-29T14:05:01.362659726Z",level:"info",msg:"finished node evaluation",controller_path:"",controller_id:"a_namespace.a.default",trace_id:"4ec9b04a0508d6d46c0c3ba1b7b3c642",node_id:"prometheus.exporter.self.default",duration:4278} > {"ts":"2024-03-29T14:05:01.363112694Z",level:"info",msg:"finished node evaluation",controller_path:"a_namespace.a.default/",controller_id:"b_namespace.b.default",trace_id:"340525264c0db8529881fc5793cb1ed2",node_id:"prometheus.exporter.self.default",duration:3497} > {"ts":"2024-03-29T14:05:01.363235206Z",level:"info",msg:"finished node evaluation",controller_path:"a_namespace.a.default/b_namespace.b.default/",controller_id:"c_namespace.c.default",trace_id:"a0608493d9888dbbc2ca22154c18328c",node_id:"prometheus.exporter.self.default",duration:3206} ``` Signed-off-by: Paschalis Tsilias <[email protected]>
8c19814
to
00850cb
Compare
@@ -165,11 +165,13 @@ func NewBuiltinComponentNode(globals ComponentGlobals, reg component.Registratio | |||
|
|||
func getManagedOptions(globals ComponentGlobals, cn *BuiltinComponentNode) component.Options { | |||
cn.registry = prometheus.NewRegistry() | |||
parent, id := path.Split(cn.globalID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that from the internal discussion about this, there was a mention to make component_path
rooted to make it clearly indicate a global path:
Global IDs start with a path separator, similarly to absolute file paths, to clearly indicate rooting.
Additionally, if there is no parent, component_path
should be /
:
component_path: the global ID of the containing subgraph for a component. If there is no containing subgraph, component_path is the rooted path /.
In context:
For the global ID
/custom_component.default/prometheus.remote_write
, the component_path label is/custom_component.default
and the component_id label isprometheus.remote_write
.
(This rooting logic applies to the other metrics as well wherever the labels are being split)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like the leading /
too.
Registerer: prometheus.WrapRegistererWith(prometheus.Labels{ | ||
"component_id": cn.globalID, | ||
"component_path": parent, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the mixin will need to be updated alongside this, since component_id
is used in some dashboards (which is why we were doing this prior to 1.0 since it would be a breaking change to split out the labels after-the-fact).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pushed a commit to include references to component_path
eg when summing up component-level metrics for the Controller and Prometheus dashboards.
Here's how it looks like in practice. I'll check the dashboards in more environments with nested modules and more data.
Finally, the OpenTelemetry dashboard doesn't drill down to the component level so it's left as-is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, though as a nit I think component_path should before component in the list of variables.
00850cb
to
ecb1af0
Compare
This PR continues using path.Split to achieve two things. First, simplify the log messages for built-in components by splitting the generic "component" label into "component_id" and "component_path" for built-in components as well as the generic "config" label for nested controllers into "config_id" and "config_path". Here's the difference in logs when the Agent is quitting for the example configuration. These logs have been ordered and their timestamp has been removed so that they're easier to parse. ```diff < {"ts":"","level":"error","msg":"error running exporter","component_path":"a_namespace.a.default/","component_id":"prometheus.exporter.self.default","err":"context canceled"} < {"ts":"","level":"error","msg":"error running exporter","component_path":"a_namespace.a.default/b_namespace.b.default/","component_id":"prometheus.exporter.self.default","err":"context canceled"} < {"ts":"","level":"error","msg":"error running exporter","component_path":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/","component_id":"prometheus.exporter.self.default","err":"context canceled"} < {"ts":"","level":"info","msg":"component exited","component_path":"","component_id":"local.file.default"} < {"ts":"","level":"info","msg":"component exited","component_path":"a_namespace.a.default/","component_id":"local.file.default"} < {"ts":"","level":"info","msg":"component exited","component_path":"a_namespace.a.default/","component_id":"prometheus.exporter.self.default"} < {"ts":"","level":"info","msg":"component exited","component_path":"a_namespace.a.default/b_namespace.b.default/","component_id":"local.file.default"} < {"ts":"","level":"info","msg":"component exited","component_path":"a_namespace.a.default/b_namespace.b.default/","component_id":"prometheus.exporter.self.default"} < {"ts":"","level":"info","msg":"component exited","component_path":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/","component_id":"local.file.default"} < {"ts":"","level":"info","msg":"component exited","component_path":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/","component_id":"prometheus.exporter.self.default"} < {"ts":"","level":"info","msg":"import exited","config_path":"","config_id":"import.file.a_namespace"} < {"ts":"","level":"info","msg":"import exited","config_path":"a_namespace.a.default/","config_id":"import.file.b_namespace"} < {"ts":"","level":"info","msg":"import exited","config_path":"a_namespace.a.default/b_namespace.b.default/","config_id":"import.file.c_namespace"} --- > {"ts":"","level":"error","msg":"error running exporter","component":"a_namespace.a.default/prometheus.exporter.self.default","err":"context canceled"} > {"ts":"","level":"error","msg":"error running exporter","component":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/prometheus.exporter.self.default","err":"context canceled"} > {"ts":"","level":"error","msg":"error running exporter","component":"a_namespace.a.default/b_namespace.b.default/prometheus.exporter.self.default","err":"context canceled"} > {"ts":"","level":"info","msg":"component exited","component":"local.file.default"} > {"ts":"","level":"info","msg":"component exited","component":"a_namespace.a.default/local.file.default"} > {"ts":"","level":"info","msg":"component exited","component":"a_namespace.a.default/prometheus.exporter.self.default"} > {"ts":"","level":"info","msg":"component exited","component":"a_namespace.a.default/b_namespace.b.default/local.file.default"} > {"ts":"","level":"info","msg":"component exited","component":"a_namespace.a.default/b_namespace.b.default/prometheus.exporter.self.default"} > {"ts":"","level":"info","msg":"component exited","component":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/local.file.default"} > {"ts":"","level":"info","msg":"component exited","component":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/prometheus.exporter.self.default"} > {"ts":"","level":"info","msg":"import exited","config":"import.file.a_namespace"} > {"ts":"","level":"info","msg":"import exited","config":"a_namespace.a.default/import.file.b_namespace"} > {"ts":"","level":"info","msg":"import exited","config":"a_namespace.a.default/b_namespace.b.default/import.file.c_namespace"} ``` Secondly, this also touches the registerer passed to the import.* blocks for custom components. This registerer is not used anywhere yet so this change isn't visible but I temporarily added a metric on import.file's constructor to showcase the change below. ```diff < import_file_custom_metric{config_id="import.file.a_namespace",config_path=""} 1.7117248839893064e+09 < import_file_custom_metric{config_id="import.file.b_namespace",config_path="a_namespace.a.default/"} 1.7117248839898353e+09 < import_file_custom_metric{config_id="import.file.c_namespace",config_path="a_namespace.a.default/b_namespace.b.default/"} 1.7117248839900513e+09 --- > import_file_custom_metric{config_id="import.file.a_namespace"} 1.7117249370875866e+09 > import_file_custom_metric{config_id="a_namespace.a.default/import.file.b_namespace"} 1.7117249370882547e+09 > import_file_custom_metric{config_id="a_namespace.a.default/b_namespace.b.default/import.file.c_namespace"} 1.7117249370885952e+09 ``` Signed-off-by: Paschalis Tsilias <[email protected]>
This makes sure that a) paths are always rooted, and that b) paths have no trailing slashes so that we're aligned with our internal discussion. This means the metrics look like: ``` agent_component_evaluation_queue_size{controller_id="",controller_path="/"} 1 agent_component_evaluation_queue_size{controller_id="a_namespace.a.default",controller_path="/"} 2 agent_component_evaluation_queue_size{controller_id="b_namespace.b.default",controller_path="/a_namespace.a.default/"} 2 agent_component_evaluation_queue_size{controller_id="c_namespace.c.default",controller_path="/a_namespace.a.default/b_namespace.b.default/"} 0 agent_component_evaluation_queue_size{controller_id="remotecfg",controller_path="/"} 0 agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="/"} 1.7119712864340181e+09 agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="/a_namespace.a.default/"} 1.7119712864338827e+09 agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="/a_namespace.a.default/b_namespace.b.default/"} 1.7119712864335563e+09 agent_local_file_timestamp_last_accessed_unix_seconds{component_id="local.file.default",component_path="/a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/"} 1.711971286433792e+09 ``` while logs look like ``` {"ts":"2024-04-01T15:58:33.249022375Z","level":"info","msg":"finished node evaluation","controller_path":"/","controller_id":"a_namespace.a.default","node_id":"b_namespace.b.default","duration":1056168} {"ts":"2024-04-01T15:58:33.249143064Z","level":"info","msg":"finished node evaluation","controller_path":"/a_namespace.a.default","controller_id":"b_namespace.b.default","node_id":"export.b_export","duration":28284} {"ts":"2024-04-01T15:58:33.249196014Z","level":"info","msg":"starting complete graph evaluation","controller_path":"/a_namespace.a.default/b_namespace.b.default","controller_id":"c_namespace.c.default","trace_id":"8cd17ad7fe83062eb212ce2605f53add"} {"ts":"2024-04-01T15:58:33.249232123Z","level":"info","msg":"finished node evaluation","controller_path":"/a_namespace.a.default/b_namespace.b.default","controller_id":"c_namespace.c.default","trace_id":"8cd17ad7fe83062eb212ce2605f53add","node_id":"prometheus.exporter.self.default","duration":6792} {"ts":"2024-04-02T08:14:31.079716842Z","level":"info","msg":"import exited","config_path":"/","config_id":"import.file.a_namespace"} {"ts":"2024-04-02T08:14:31.07969433Z","level":"info","msg":"import exited","config_path":"/a_namespace.a.default","config_id":"import.file.b_namespace"} {"ts":"2024-04-02T08:14:31.079733964Z","level":"info","msg":"import exited","config_path":"/a_namespace.a.default/b_namespace.b.default","config_id":"import.file.c_namespace"} {"ts":"2024-04-02T08:14:31.079736279Z","level":"info","msg":"custom component exited","component":"a_namespace.a.default/b_namespace.b.default/c_namespace.c.default"} ``` Signed-off-by: Paschalis Tsilias <[email protected]>
0bb6cf6
to
e18782b
Compare
Signed-off-by: Paschalis Tsilias <[email protected]>
e18782b
to
e966cce
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
@@ -165,11 +165,13 @@ func NewBuiltinComponentNode(globals ComponentGlobals, reg component.Registratio | |||
|
|||
func getManagedOptions(globals ComponentGlobals, cn *BuiltinComponentNode) component.Options { | |||
cn.registry = prometheus.NewRegistry() | |||
parent, id := path.Split(cn.globalID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like the leading /
too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Accidental commit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whoops 🤦
Signed-off-by: Paschalis Tsilias <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
dashboard.newMultiTemplateVariable('component', ||| | ||
label_values(agent_wal_samples_appended_total{cluster="$cluster", namespace="$namespace", instance=~"$instance", component_id=~"prometheus\\.remote_write\\..*"}, component_id) | ||
|||), | ||
dashboard.newMultiTemplateVariable('component_path', ||| | ||
label_values(agent_wal_samples_appended_total{cluster="$cluster", namespace="$namespace", instance=~"$instance", component_id=~"prometheus\\.remote_write\\..*", component_path=~".*"}, component_path) | ||
|||), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I think the order of these should be swapped; the order here is most broad to most specific, so based on that path goes before component ID.
Signed-off-by: Paschalis Tsilias <[email protected]>
PR Description
This PR amends the registerers and loggers we propagate into the graph to separate the so that the parent component/controller IDs are stored into a separate label.
This makes these metrics easier to read and manage and unlocks other solutions for large names in the case of nested modules in the future.
The metrics and logs examples are validated using the following example that uses nested modules.
Which issue(s) this PR fixes
Closes #6260
Notes to the Reviewer
The tracer implementation still takes the full
globals.ControllerID
since all it needs is to call path.Ext and get the final component name as the dot-style extension.One thing to agree upon is whether path.Split makes sense for separating the parent and final IDs, or if we should add a custom helper. Currently, global IDs are currently constructed like
a_namespace.a.default/b_namespace.b.default/c_namespace.c.default/local.file.default
, i.e. without a leading slash. So for the root controller, we're left with an empty label instead of"/"
which was discussed at some point.TODO:
PR Checklist