Propagate apm config #3223

pchila · 2023-08-10T14:31:15Z

What does this PR do?

This change propagates the APM configuration set up for the agent to the components it manages.

We only support Elastic APM at the moment, and at the time of this change it can only be configured in the elastic-agent.yml configuration file.

For fleet-managed agents we include a workaround that injects the APM configuration from the config file into every config change received from Fleet. Hot reloading is not supported in this setup: any change to the APM configuration takes effect only after a restart.

Here's what an example of APM configuration looks like:

agent.monitoring:
  ...
  traces: true
  apm:
    hosts:
      - https://12519f377f2449228f5095a5b549fe49.apm.us-central1.gcp.qa.cld.elstc.co:443
    environment: test-apm-changed
    secret_token: secret
    tls:
      skip_verify: true
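
For component authors, here is a minimal Go sketch of pulling the section above out of an already-decoded config map. It assumes the dotted `agent.monitoring` key has been expanded into nested maps before this code runs. `APMConfig`, `lookup` and `apmFromConfig` are hypothetical names used for illustration, not the agent's real types, and the host URL is a placeholder.

```go
package main

import "fmt"

// APMConfig mirrors the keys in the example above.
// Hypothetical type for illustration; not the agent's real data model.
type APMConfig struct {
	Hosts       []string
	Environment string
	SecretToken string
	SkipVerify  bool
}

// lookup walks a nested map along the given keys.
func lookup(cfg map[string]interface{}, path ...string) (interface{}, bool) {
	var cur interface{} = cfg
	for _, key := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		if cur, ok = m[key]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// apmFromConfig extracts agent.monitoring.apm; ok is false when the section is absent.
func apmFromConfig(cfg map[string]interface{}) (APMConfig, bool) {
	raw, ok := lookup(cfg, "agent", "monitoring", "apm")
	if !ok {
		return APMConfig{}, false
	}
	section, ok := raw.(map[string]interface{})
	if !ok {
		return APMConfig{}, false
	}
	var out APMConfig
	if hosts, ok := section["hosts"].([]interface{}); ok {
		for _, h := range hosts {
			if s, ok := h.(string); ok {
				out.Hosts = append(out.Hosts, s)
			}
		}
	}
	out.Environment, _ = section["environment"].(string)
	out.SecretToken, _ = section["secret_token"].(string)
	if tls, ok := section["tls"].(map[string]interface{}); ok {
		out.SkipVerify, _ = tls["skip_verify"].(bool)
	}
	return out, true
}

func main() {
	cfg := map[string]interface{}{
		"agent": map[string]interface{}{
			"monitoring": map[string]interface{}{
				"traces": true,
				"apm": map[string]interface{}{
					"hosts":        []interface{}{"https://apm.example.invalid:443"},
					"environment":  "test-apm-changed",
					"secret_token": "secret",
					"tls":          map[string]interface{}{"skip_verify": true},
				},
			},
		},
	}
	apm, ok := apmFromConfig(cfg)
	fmt.Println(ok, apm.Environment, apm.Hosts, apm.SkipVerify)
}
```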

Why is it important?

We want to manage the APM configuration from a single place for all components run via Elastic Agent.

Checklist

My code follows the style guidelines of this project
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
I have made corresponding changes to the default configuration files
I have added tests that prove my fix is effective or that my feature works
~~[ ] I have added an entry in ./changelog/fragments using the changelog tool~~
I have added an integration test or an E2E test

Author's Checklist

Add a config injector to restore apm config for fleet-managed agents (fleet policy overwrites whatever we have in the config file)

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

Questions to ask yourself

How are we going to support this in production?
How are we going to measure its adoption?
How are we going to debug this?
What are the metrics I should take care of?
...

mergify · 2023-08-10T14:31:53Z

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

elasticmachine · 2023-08-10T14:38:19Z

💚 Build Succeeded

Build stats

Start Time: 2023-09-28T17:13:44.391+0000
Duration: 27 min 1 sec

Test stats 🧪

Test	Results
Failed	0
Passed	6405
Skipped	59
Total	6464

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

/test : Re-trigger the build.
/package : Generate the packages.
run integration tests : Run the Elastic Agent Integration tests.
run end-to-end tests : Generate the packages and run the E2E Tests.
run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

blakerouse

I know this is in draft, but want to provide some early feedback. I like how this is looking.

internal/pkg/agent/application/application.go

mergify · 2023-08-11T13:01:59Z

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b propagate-apm-config upstream/propagate-apm-config
git merge upstream/main
git push upstream propagate-apm-config

elasticmachine · 2023-08-25T12:41:38Z

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

AndersonQ

Nothing serious, but I believe you still need to update to a released version of the elastic-agent-client, and there was an append that was doing nothing. So either you forgot to add the element to append, or the line should be removed.

go.mod

internal/pkg/agent/application/apm_config_modifier.go

pkg/component/runtime/manager_test.go

mergify · 2023-08-30T14:41:02Z

This pull request is now in conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b propagate-apm-config upstream/propagate-apm-config
git merge upstream/main
git push upstream propagate-apm-config

pkg/component/fake/component/README.md

pchila · 2023-09-04T15:58:25Z

/test

dmathieu

I don't know if that kind of doc already exists for other configs, but how about documenting how services should handle the added config?

dmathieu · 2023-09-05T09:13:57Z

internal/pkg/agent/application/coordinator/config_patcher.go

+type ConfigPatch func(change ConfigChange) ConfigChange
+
+// ConfigPatchManager is a decorator to restore some agent settings from the elastic agent configuration file
+type ConfigPatchManager struct {


It seems this struct is lacking tests?

dmathieu · 2023-09-05T09:14:37Z

internal/pkg/agent/application/apm_config_modifier.go

+	"github.com/elastic/elastic-agent/pkg/utils"
+)
+
+func InjectAPMConfig(comps []component.Component, cfg map[string]interface{}) ([]component.Component, error) {


As someone who doesn't know much/anything about elastic-agent, I find it hard to understand why we have to inject and then patch. How about documenting those two public methods?

pchila · 2023-09-06

I will add some documentation. The short version is:

PatchAPMConfig will patch the agent configuration coming from Fleet (which does not support APM configuration at the moment) by adding the APM config values retrieved from the elastic-agent.yml file.

InjectAPMConfig will add APM configuration to the internal datamodel agent uses to determine which components and units to run
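
To make the two steps concrete, here is a rough Go sketch of the patch-then-inject flow described above. It is a simplified illustration rather than the code in this PR: `Component`, `patchAPM`, `injectAPM` and the helpers are hypothetical stand-ins, and the config is modelled as plain nested maps.

```go
package main

import "fmt"

// Component is a hypothetical stand-in for the agent's component model.
type Component struct {
	ID  string
	APM map[string]interface{} // nil when no APM config is propagated
}

// asMap returns v as a map, or a nil map when v has another type.
func asMap(v interface{}) map[string]interface{} {
	m, _ := v.(map[string]interface{})
	return m
}

// shallowCopy copies one level of m so the caller's map is not mutated.
func shallowCopy(m map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(m)+1)
	for k, v := range m {
		out[k] = v
	}
	return out
}

// patchAPM returns a copy of the Fleet policy with the APM section from the
// local config file written under agent.monitoring.apm.
func patchAPM(policy, fileAPM map[string]interface{}) map[string]interface{} {
	out := shallowCopy(policy)
	if fileAPM == nil {
		return out
	}
	agent := shallowCopy(asMap(out["agent"]))
	monitoring := shallowCopy(asMap(agent["monitoring"]))
	monitoring["apm"] = fileAPM
	agent["monitoring"] = monitoring
	out["agent"] = agent
	return out
}

// injectAPM attaches the agent.monitoring.apm section of cfg to every component.
func injectAPM(comps []Component, cfg map[string]interface{}) []Component {
	apm := asMap(asMap(asMap(cfg["agent"])["monitoring"])["apm"])
	out := make([]Component, len(comps))
	for i, c := range comps {
		c.APM = apm
		out[i] = c
	}
	return out
}

func main() {
	fileAPM := map[string]interface{}{"environment": "test-apm-changed"}
	policy := map[string]interface{}{
		"agent": map[string]interface{}{
			"monitoring": map[string]interface{}{"traces": true},
		},
	}
	patched := patchAPM(policy, fileAPM)
	comps := injectAPM([]Component{{ID: "fake-default"}}, patched)
	fmt.Println(comps[0].ID, comps[0].APM["environment"])
}
```

Keeping the patch step separate means the inject step reads the APM section the same way whether the config came from Fleet or from the local file.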

dmathieu · 2023-09-05T09:19:42Z

internal/pkg/agent/application/coordinator/config_patcher.go

+}
+
+func (c ConfigPatchManager) Run(ctx context.Context) error {
+	go c.patch(c.inner.Watch(), c.outCh)


An unchecked goroutine here feels a bit brittle/unsafe. Could we end up with an explosion of running goroutines? Should there be a timeout?

pchila · 2023-09-06

What exactly do you mean by an unchecked goroutine? Are you referring to the fact that there is no WaitGroup or other synchronization object for the goroutine?

This structure decorates an inner ConfigManager and patches configuration using the goroutine you see here for as long as the inner ConfigManager is running (the lifespan of the goroutine depends on the lifespan of the inner ConfigManager).

dmathieu · 2023-09-06

Yes. If somehow inner.Run() returns right away but inner.Watch() never does, you could end up with a goroutine leak.
I also don't have much context on this code, so maybe that's not really possible.

pchila · 2023-09-06

inner.Watch() returns the channel on which config changes are published. We use that channel to intercept configuration changes that we want to patch.
The basic pattern for a ConfigManager is to start it and then consume all the configuration changes passed on the channel.
The ConfigPatchManager does the same thing with the addition of a goroutine that will read from the inner channel, patch and publish on its own channel

elasticmachine · 2023-09-06T13:15:37Z

🌐 Coverage report

Name	Metrics % (`covered/total`)	Diff
Packages	98.78% (`81/82`)	👍
Files	65.886% (`197/299`)	👎 -0.215
Classes	65.461% (`362/553`)	👎 -0.232
Methods	52.726% (`1141/2164`)	👎 -0.018
Lines	38.346% (`12998/33897`)	👍 0.143
Conditionals	100.0% (`0/0`)	💚

blakerouse

I really looked through the code. It is very well documented and has a good amount of tests. I really like the addition in the fake component to really test this change from Elastic Agent down to the actual running component.

pchila · 2023-09-07T07:20:45Z

/test

pierrehilbert · 2023-09-08T16:42:24Z

buildkite test this

pchila · 2023-09-11T14:46:26Z

@dmathieu

I don't know if that kind of doc already exists for other configs, but how about documenting how services should handle the added config?

This feature is for internal use at the moment, so no official docs are created at this stage.

Regarding how services should handle the added config: it depends :)
The basic idea is that the agent propagates the APM configuration it receives to the components and units it manages, and it expects those components to take appropriate action to reload or re-instantiate their APM instrumentation.
How that happens in detail depends on what sort of APM objects the component uses, for example:

If the component uses a decorated HTTP server, it may need to stop the current server gracefully, recreate it with the new configuration and start the new one.
If it uses a custom Tracer object, it will need to create the new one, close the old one and swap them safely (this is, for example, what happens in the e2e test in this PR).

As of now the agent and the agent client will not set the APM env variables for the component or restart it; the processing is delegated to the component itself. If the component decides that setting the env and re-executing is the easier way to apply the configuration, that is of course an option.
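
For the custom Tracer case, a minimal Go sketch of a safe swap could look like the following. `Tracer` here is a hypothetical one-method interface and `fakeTracer` a test double; a real APM tracer has a richer API, and this is not the code used in the e2e test.

```go
package main

import (
	"fmt"
	"sync"
)

// Tracer is a hypothetical minimal interface for illustration.
type Tracer interface {
	Close()
}

// fakeTracer records whether it was closed.
type fakeTracer struct {
	env    string
	closed bool
}

func (t *fakeTracer) Close() { t.closed = true }

// tracerHolder guards the active tracer behind a lock.
type tracerHolder struct {
	mu      sync.RWMutex
	current Tracer
}

// Get returns the tracer that is currently installed (nil if none).
func (h *tracerHolder) Get() Tracer {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.current
}

// Swap installs next, then closes the previous tracer outside the lock.
func (h *tracerHolder) Swap(next Tracer) {
	h.mu.Lock()
	prev := h.current
	h.current = next
	h.mu.Unlock()
	if prev != nil {
		prev.Close()
	}
}

func main() {
	h := &tracerHolder{}
	old := &fakeTracer{env: "test-apm"}
	h.Swap(old)
	h.Swap(&fakeTracer{env: "test-apm-changed"})
	fmt.Println(old.closed, h.Get().(*fakeTracer).env)
}
```

Callers that fetched the old tracer just before the swap may still hold it after Close; draining in-flight work is left out of this sketch.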

pchila · 2023-09-25T15:51:33Z

buildkite test this

pchila · 2023-09-26T06:25:20Z

buildkite test this

elastic-sonarqube · 2023-09-28T17:25:03Z

SonarQube Quality Gate

0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells

69.1% Coverage
0.0% Duplication

pchila added the enhancement New feature or request label Aug 10, 2023

pchila requested review from ycombinator, AndersonQ and joshdover August 10, 2023 14:31

mergify bot assigned pchila Aug 10, 2023

mergify bot added the backport-skip label Aug 10, 2023

blakerouse reviewed Aug 10, 2023

internal/pkg/agent/application/application.go (outdated)

pchila force-pushed the propagate-apm-config branch from 8e57dc8 to b7960e6 Compare August 21, 2023 15:40

pierrehilbert added the Team:Elastic-Agent Label for the Agent team label Aug 24, 2023

pchila marked this pull request as ready for review August 25, 2023 12:41

pchila requested a review from a team as a code owner August 25, 2023 12:41

pierrehilbert requested a review from a team August 28, 2023 09:26

AndersonQ requested changes Aug 30, 2023

dmathieu reviewed Sep 1, 2023

pkg/component/fake/component/README.md (outdated)

pchila force-pushed the propagate-apm-config branch 2 times, most recently from 62e927c to 9283429 Compare September 1, 2023 17:00

dmathieu reviewed Sep 5, 2023

blakerouse approved these changes Sep 6, 2023

AndersonQ approved these changes Sep 11, 2023

pchila added the skip-changelog label Sep 11, 2023

pchila added 22 commits September 28, 2023 19:03

Update elastic-agent-client version

259744b

Pass apm config to components

af057c4

Add config patcher for apm injection in fleet managed agents

e1848f8

Add tests for APM config injector and patcher

c587bab

Add retrieve apm config action to fake component

cede9b6

Add FakeInput tests for APM config

8954acc

fix make check

fc01b8e

Add global labels to apm config

b7c9192

lint

31129b9

WIP - introduce new apm fake input

dad0749

Add apm fake input to fake component

dc02906

Move and update fake component spec definition

2fee30f

set APM global labels as a map

72abf79

fixup! DO NOT MERGE - debug commit

4d31c5d

Incorporate code review feedback

dd1e775

make check-ci

b495da4

fix sonarqube code smell

0304a14

remove redundant ApmConfig injection

f4d8b6e

Add APM config propagation e2e test

80d804f

lint

6fc4a2a

add documentation for InjectAPMConfig and PatchAPMConfig

edf6084

add docs

fff955e

pchila force-pushed the propagate-apm-config branch from 874af15 to fff955e Compare September 28, 2023 17:13

pchila merged commit 123ba9c into elastic:main Sep 29, 2023
24 checks passed

pchila deleted the propagate-apm-config branch September 29, 2023 07:20

leehinman mentioned this pull request Nov 8, 2023

Linux agents gets unhealthy on enabling/disabling modules for System/Linux integration. #3654

Closed

tetianakravchenko mentioned this pull request Feb 12, 2024

[Kubernetes Integration] Investigate Elastic Agent API calls and check memory consumption #4122

Closed
