Propagate apm config #3223

Merged
merged 22 commits into from
Sep 29, 2023

Conversation

pchila
Member

@pchila pchila commented Aug 10, 2023

What does this PR do?

This change propagates the APM configuration set up for the agent to the components it manages.

We only support Elastic APM at the moment, and at the time of this change it can only be configured in the elastic-agent.yml configuration file.

For Fleet-managed agents we include a workaround: the APM configuration from the config file is injected into every configuration change received from Fleet. Hot reloading is not supported in this setup, so any change to the APM configuration takes effect only after a restart.

Here's what an example of APM configuration looks like:

agent.monitoring:
  ...
  traces: true
  apm:
    hosts:
      - https://12519f377f2449228f5095a5b549fe49.apm.us-central1.gcp.qa.cld.elstc.co:443
    environment: test-apm-changed
    secret_token: secret
    tls:
      skip_verify: true

Why is it important?

We want to manage the apm configuration from a single place for all components run via Elastic Agent.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • I have made corresponding change to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • [ ] I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test

Author's Checklist

  • Add a config injector to restore apm config for fleet-managed agents (fleet policy overwrites whatever we have in the config file)

How to test this PR locally

Related issues

Use cases

Screenshots

Logs

Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@pchila pchila added the enhancement New feature or request label Aug 10, 2023
@mergify mergify bot assigned pchila Aug 10, 2023
@mergify
Contributor

mergify bot commented Aug 10, 2023

This pull request does not have a backport label. Could you fix it @pchila? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@elasticmachine
Contributor

elasticmachine commented Aug 10, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-09-28T17:13:44.391+0000

  • Duration: 27 min 1 sec

Test stats 🧪

Test Results
Failed 0
Passed 6405
Skipped 59
Total 6464

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Contributor

@blakerouse blakerouse left a comment

I know this is in draft, but want to provide some early feedback. I like how this is looking.

internal/pkg/agent/application/application.go Outdated Show resolved Hide resolved
@mergify
Contributor

mergify bot commented Aug 11, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b propagate-apm-config upstream/propagate-apm-config
git merge upstream/main
git push upstream propagate-apm-config

@pierrehilbert pierrehilbert added the Team:Elastic-Agent Label for the Agent team label Aug 24, 2023
@pchila pchila marked this pull request as ready for review August 25, 2023 12:41
@pchila pchila requested a review from a team as a code owner August 25, 2023 12:41
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@pierrehilbert pierrehilbert requested a review from a team August 28, 2023 09:26
Member

@AndersonQ AndersonQ left a comment

Nothing serious, but I believe you still need to update to a released version of the elastic-agent-client, and there was an append that was doing nothing. So either you forgot to add the element to append, or the line should be removed.

go.mod Outdated Show resolved Hide resolved
internal/pkg/agent/application/apm_config_modifier.go Outdated Show resolved Hide resolved
internal/pkg/agent/application/apm_config_modifier.go Outdated Show resolved Hide resolved
internal/pkg/agent/application/apm_config_modifier.go Outdated Show resolved Hide resolved
internal/pkg/agent/application/apm_config_modifier.go Outdated Show resolved Hide resolved
pkg/component/runtime/manager_test.go Outdated Show resolved Hide resolved
pkg/component/runtime/manager_test.go Outdated Show resolved Hide resolved
pkg/component/runtime/manager_test.go Outdated Show resolved Hide resolved
pkg/component/runtime/manager_test.go Outdated Show resolved Hide resolved
pkg/component/runtime/manager_test.go Outdated Show resolved Hide resolved
@mergify
Contributor

mergify bot commented Aug 30, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b propagate-apm-config upstream/propagate-apm-config
git merge upstream/main
git push upstream propagate-apm-config

@pchila pchila force-pushed the propagate-apm-config branch 2 times, most recently from 62e927c to 9283429 Compare September 1, 2023 17:00
@pchila
Member Author

pchila commented Sep 4, 2023

/test

Member

@dmathieu dmathieu left a comment

I don't know if that kind of doc already exists for other configs, but how about documenting how services should handle the added config?

type ConfigPatch func(change ConfigChange) ConfigChange

// ConfigPatchManager is a decorator to restore some agent settings from the elastic agent configuration file
type ConfigPatchManager struct {
Member

It seems this struct is lacking tests?

"github.com/elastic/elastic-agent/pkg/utils"
)

func InjectAPMConfig(comps []component.Component, cfg map[string]interface{}) ([]component.Component, error) {
Member

As someone who doesn't know much/anything about elastic-agent, I find it hard to understand why we have to inject and then patch. How about documenting those two public methods?

Member Author

I will add some documentation. The short version is:

  • PatchAPMConfig patches the agent configuration coming from Fleet (which does not support APM configuration at the moment) by adding the APM config values retrieved from the elastic-agent.yml file
  • InjectAPMConfig adds the APM configuration to the internal data model the agent uses to determine which components and units to run
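
The patching step described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: it assumes configuration is held as nested `map[string]interface{}` values, takes the `agent.monitoring.apm` key path from the YAML example in the PR description, and uses hypothetical helper names (`patchAPMFromFile`, `lookup`, `shallowCopy`, `childCopy`).

```go
package main

// patchAPMFromFile returns a copy of the Fleet policy in which the
// agent.monitoring.apm section is taken from the local file config.
// Hypothetical helper: the name and map layout are assumptions.
func patchAPMFromFile(fleetCfg, fileCfg map[string]interface{}) map[string]interface{} {
	apm, ok := lookup(fileCfg, "agent", "monitoring", "apm")
	if !ok {
		return fleetCfg // nothing to restore from the file
	}
	// Copy only the maps on the path we modify, so the input is not mutated.
	out := shallowCopy(fleetCfg)
	agent := childCopy(out, "agent")
	monitoring := childCopy(agent, "monitoring")
	monitoring["apm"] = apm
	return out
}

// lookup walks nested maps along path and reports whether the value exists.
func lookup(m map[string]interface{}, path ...string) (interface{}, bool) {
	var cur interface{} = m
	for _, key := range path {
		node, ok := cur.(map[string]interface{})
		if !ok {
			return nil, false
		}
		if cur, ok = node[key]; !ok {
			return nil, false
		}
	}
	return cur, true
}

// shallowCopy duplicates the top level of m; a nil map yields an empty map.
func shallowCopy(m map[string]interface{}) map[string]interface{} {
	out := make(map[string]interface{}, len(m))
	for k, v := range m {
		out[k] = v
	}
	return out
}

// childCopy replaces parent[key] with a copy of the nested map stored there
// (or an empty map if the key is missing) and returns that copy.
func childCopy(parent map[string]interface{}, key string) map[string]interface{} {
	child, _ := parent[key].(map[string]interface{})
	cp := shallowCopy(child)
	parent[key] = cp
	return cp
}
```

Because the file values are written last, they win over anything Fleet sends under the same key, which matches the restart-only behaviour described in the PR description: the file is read once and reapplied to every policy change.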

}

func (c ConfigPatchManager) Run(ctx context.Context) error {
	go c.patch(c.inner.Watch(), c.outCh)
Member

An unchecked goroutine here feels a bit brittle/unsafe. Could we end up with an explosion of running goroutines? Should there be a timeout?

Member Author

What exactly do you mean by an unchecked goroutine? Are you referring to the fact that there is no WaitGroup or other synchronization object for the goroutine?

This structure decorates an inner ConfigManager and patches configuration using the goroutine you see here for as long as the inner ConfigManager is running (the lifespan of the goroutine depends on the lifespan of the inner ConfigManager).

Member

Yes. If inner.Run() somehow returns right away but inner.Watch() never does, you could end up with a goroutine leak.
I also don't have much context on this code, so maybe that's not really possible.

Member Author

inner.Watch() returns the channel on which config changes are published. We use that channel to intercept the configuration changes that we want to patch.
The basic pattern for a ConfigManager is to start it and then consume all the configuration changes passed on the channel.
The ConfigPatchManager does the same thing, with the addition of a goroutine that reads from the inner channel, patches each change, and publishes it on its own channel.

@elasticmachine
Contributor

elasticmachine commented Sep 6, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.78% (81/82) 👍
Files 65.886% (197/299) 👎 -0.215
Classes 65.461% (362/553) 👎 -0.232
Methods 52.726% (1141/2164) 👎 -0.018
Lines 38.346% (12998/33897) 👍 0.143
Conditionals 100.0% (0/0) 💚

Contributor

@blakerouse blakerouse left a comment

I really looked through the code. It is very well documented and has a good amount of tests. I really like the addition in the fake component to really test this change from Elastic Agent down to the actual running component.

@pchila
Member Author

pchila commented Sep 7, 2023

/test

@pierrehilbert
Contributor

buildkite test this

@pchila
Member Author

pchila commented Sep 11, 2023

@dmathieu

I don't know if that kind of doc already exists for other configs, but how about documenting how services should handle the added config?

This feature is for internal use at the moment, so no official docs are created at this stage.

Regarding how services should handle the added config: it depends :)
The basic idea is that the agent will propagate the APM configuration it receives to the components and units it manages and it expects that such components take appropriate action to reload/reinstantiate their APM instrumentation.
How that happens in detail depends on what sort of APM objects the component uses, for example:

  • if the component uses a decorated HTTP server, it may need to stop the current server gracefully, recreate it with the new configuration, and start the new one.
  • if it uses a custom Tracer object, it will need to create the new one, close the old one, and swap them safely (this is, for example, what happens in the e2e test in this PR)
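
The second case, swapping a tracer safely, could look roughly like the sketch below. It is not the code from the e2e test in this PR: the `tracer` interface, `tracerHolder`, and `noopTracer` are hypothetical names, and a real APM tracer exposes far more than `Close`.

```go
package main

import "sync"

// tracer is a hypothetical minimal interface used only for this sketch.
type tracer interface {
	Close()
}

// tracerHolder guards the active tracer so it can be replaced at runtime.
type tracerHolder struct {
	mu      sync.RWMutex
	current tracer
}

// Get returns the tracer currently in use (nil before the first Swap).
func (h *tracerHolder) Get() tracer {
	h.mu.RLock()
	defer h.mu.RUnlock()
	return h.current
}

// Swap installs next and then closes the previous tracer. The close runs
// outside the lock so a slow flush does not block readers.
func (h *tracerHolder) Swap(next tracer) {
	h.mu.Lock()
	old := h.current
	h.current = next
	h.mu.Unlock()
	if old != nil {
		old.Close()
	}
}

// noopTracer is a stand-in implementation that records whether it was closed.
type noopTracer struct {
	closed bool
}

func (t *noopTracer) Close() { t.closed = true }
```

Callers fetch the tracer through `Get` for each unit of work instead of caching it, so a new configuration takes effect on the next call after `Swap`.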

As of now, neither the agent nor the agent client sets the APM environment variables for the component or restarts it; the processing is delegated to the component itself. If the component decides that setting the environment and re-executing is the easiest way to apply the configuration, that is of course an option.
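
For a component that chooses the environment-and-re-exec route, a sketch of the mapping might look like this. It assumes the component instruments itself with the Elastic APM Go agent, whose documented environment variables are used below; the `apmSettings` struct and `apmEnv` function are hypothetical and simply mirror the YAML example in the PR description.

```go
package main

import "strconv"

// apmSettings is a hypothetical struct mirroring the YAML example in the
// PR description; it is not a type from the agent or the agent client.
type apmSettings struct {
	Hosts       []string
	Environment string
	SecretToken string
	SkipVerify  bool
}

// apmEnv renders the settings as KEY=VALUE pairs understood by the
// Elastic APM Go agent. Only the first host is used in this sketch.
func apmEnv(s apmSettings) []string {
	var env []string
	if len(s.Hosts) > 0 {
		env = append(env, "ELASTIC_APM_SERVER_URL="+s.Hosts[0])
	}
	if s.Environment != "" {
		env = append(env, "ELASTIC_APM_ENVIRONMENT="+s.Environment)
	}
	if s.SecretToken != "" {
		env = append(env, "ELASTIC_APM_SECRET_TOKEN="+s.SecretToken)
	}
	// skip_verify: true in the YAML means certificate verification is off.
	env = append(env, "ELASTIC_APM_VERIFY_SERVER_CERT="+strconv.FormatBool(!s.SkipVerify))
	return env
}
```

The resulting slice can be appended to `os.Environ()` before the component re-executes itself; how and when to re-execute is left to the component.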

@pchila
Member Author

pchila commented Sep 25, 2023

buildkite test this

@pchila
Member Author

pchila commented Sep 26, 2023

buildkite test this

Labels
backport-skip enhancement New feature or request skip-changelog Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Propagate APM tracing configuration to sub-processes via the control protocol
6 participants