
Support flattened data_stream.* fields #3465

Merged
belimawr merged 11 commits into elastic:main from flattened-datastreams
Nov 8, 2023


belimawr
Contributor

@belimawr belimawr commented Sep 22, 2023

What does this PR do?

Input configurations support flattened fields; however, the 'data_stream' field was not decoded correctly when flattened. This commit fixes that issue.
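
As a rough illustration of what decoding a flattened 'data_stream' field involves, here is a minimal Go sketch. `expandDataStream` is a hypothetical helper written for this example, not the code in pkg/component/config.go, and it assumes that a flattened key wins over a nested duplicate.

```go
package main

import (
	"fmt"
	"strings"
)

// expandDataStream returns a copy of cfg in which flattened keys such as
// "data_stream.namespace" are merged into a nested "data_stream" map.
// Hypothetical helper for illustration; not the function from this PR.
func expandDataStream(cfg map[string]any) map[string]any {
	const prefix = "data_stream."
	out := make(map[string]any, len(cfg))
	nested := map[string]any{}

	// Start from the nested form, if present.
	if ds, ok := cfg["data_stream"].(map[string]any); ok {
		for k, v := range ds {
			nested[k] = v
		}
	}

	for k, v := range cfg {
		switch {
		case k == "data_stream":
			// Map values were copied above; non-map values are ignored in this sketch.
		case strings.HasPrefix(k, prefix):
			// Assumption: a flattened key overrides a nested duplicate.
			nested[strings.TrimPrefix(k, prefix)] = v
		default:
			out[k] = v
		}
	}
	if len(nested) > 0 {
		out["data_stream"] = nested
	}
	return out
}

func main() {
	// A stream that mixes nested and flattened data_stream keys.
	stream := map[string]any{
		"id":                    "filestream-input-id-1",
		"data_stream":           map[string]any{"dataset": "generic"},
		"data_stream.namespace": "flattened-namespace",
		"data_stream.type":      "logs",
	}
	fmt.Println(expandDataStream(stream)["data_stream"])
}
```

The sketch only handles one level of flattening under 'data_stream', which is enough to show the mixed nested/flattened case.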

Why is it important?

It increases consistency between the Elastic-Agent configuration and our other projects, such as Beats, which all support flattened configuration keys.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
    - [ ] I have made corresponding changes to the default configuration files
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an entry in ./changelog/fragments using the changelog tool
  • I have added an integration test or an E2E test


How to test this PR locally

  1. Deploy a standalone Elastic-Agent with a configuration like the following.
outputs:
  default:
    type: elasticsearch
    hosts:
      - "${ELASTICSEARCH_HOST}"
    username: "${ELASTICSEARCH_USERNAME}"
    password: "${ELASTICSEARCH_PASSWORD}"

inputs:
  - type: filestream
    id: elastic-agent-input-id
    streams:
      - id: filestream-input-id-1
        data_stream:
          dataset: generic
        data_stream.namespace: "flattened-namespace"
        data_stream.type: "logs"
        paths:
          - /tmp/log.log
  2. Make sure some fields from data_stream.* are flattened. It is even better if there is a mix of flattened and nested fields.
  3. Look at the ingested data and make sure the fields are set correctly.
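
When checking the ingested data, it can help to know which data stream to look in. The Go sketch below assumes the standard Elastic naming scheme `<type>-<dataset>-<namespace>`; `dataStreamName` is a hypothetical helper for this example, not part of the agent.

```go
package main

import "fmt"

// dataStreamName builds a data stream name following the
// <type>-<dataset>-<namespace> scheme (assumed here, see the note above).
func dataStreamName(dsType, dataset, namespace string) string {
	return fmt.Sprintf("%s-%s-%s", dsType, dataset, namespace)
}

func main() {
	// Values taken from the example configuration in step 1.
	fmt.Println(dataStreamName("logs", "generic", "flattened-namespace"))
}
```

If the flattened fields are decoded correctly, the events from the configuration above should end up in the data stream this prints, with matching data_stream.* fields on each document.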

Related issues


Questions to ask yourself

  • How are we going to support this in production?
  • How are we going to measure its adoption?
  • How are we going to debug this?
  • What are the metrics I should take care of?
  • ...

@belimawr belimawr added the Team:Elastic-Agent Label for the Agent team label Sep 22, 2023
@mergify
Contributor

mergify bot commented Sep 22, 2023

This pull request does not have a backport label. Could you fix it @belimawr? 🙏
To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v./d./d./d is the label to automatically backport to the 8./d branch. /d is the digit

NOTE: backport-skip has been added to this pull request.

@elasticmachine
Contributor

elasticmachine commented Sep 22, 2023

💚 Build Succeeded


Build stats

  • Start Time: 2023-11-08T07:30:24.032+0000

  • Duration: 13 min 7 sec

❕ Flaky test report

No test was executed to be analysed.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Contributor

elasticmachine commented Sep 22, 2023

🌐 Coverage report

Name          Metrics % (covered/total)   Diff
Packages      98.824% (84/85)             👍
Files         66.885% (204/305)           👍
Classes       65.95% (368/558)            👍
Methods       53.105% (1163/2190)         👍 0.064
Lines         39.43% (13708/34765)        👍 0.059
Conditionals  100.0% (0/0)                💚

@belimawr belimawr force-pushed the flattened-datastreams branch 2 times, most recently from 3f81294 to a91af36 on September 22, 2023 16:30
@belimawr belimawr force-pushed the flattened-datastreams branch 5 times, most recently from 07020fb to 2b2bcad on October 6, 2023 14:29
@belimawr belimawr force-pushed the flattened-datastreams branch 2 times, most recently from 9812846 to da565c3 on October 9, 2023 11:07
@belimawr belimawr marked this pull request as ready for review October 9, 2023 11:09
@belimawr belimawr requested a review from a team as a code owner October 9, 2023 11:09
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

info := define.Require(t, define.Requirements{
	Local: false,
	Stack: &define.Stack{
		Version: version.Agent + "-SNAPSHOT",
Contributor Author


CI was failing because the stack was too old, so I "hardcoded" the version, at least to make sure the tests pass.

@belimawr
Contributor Author

belimawr commented Oct 9, 2023

/test

Contributor

@faec faec left a comment


A few nits for code clarity but looks good to me!

pkg/component/config.go (outdated, resolved)
pkg/component/config.go (outdated, resolved)
pkg/component/config.go (resolved)
pkg/component/config.go (outdated, resolved)
@belimawr belimawr requested a review from faec October 10, 2023 08:56
@belimawr
Contributor Author

/test

@belimawr belimawr force-pushed the flattened-datastreams branch 2 times, most recently from 99f9579 to c743823 on October 27, 2023 11:19
@belimawr
Contributor Author

/test

Contributor

mergify bot commented Nov 1, 2023

This pull request now has conflicts. Could you fix it? 🙏
To fix up this pull request, you can check it out locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b flattened-datastreams upstream/flattened-datastreams
git merge upstream/main
git push upstream flattened-datastreams

@belimawr
Contributor Author

belimawr commented Nov 6, 2023

fleet-ci/pr-merge is stuck 🤦‍♂️

belimawr and others added 9 commits November 6, 2023 19:46
An input configuration supports flattened fields, however the
'data_stream' field was not being correctly decoded when
flattened. This commit fixes this issue.

Some small additions and refactoring are also implemented in the
integration test framework as well as some more detailed
documentation.
Fix all failing tests
@belimawr
Contributor Author

belimawr commented Nov 6, 2023

I rebased onto main to re-trigger the tests

Member

@cmacknz cmacknz left a comment


You should update the reference YAML to include an example of this style of configuration; I would recommend modifying the sample log input:

# # Collecting log files
# - type: filestream
#   # Input ID allowing Elastic Agent to track the state of this input. Must be unique.
#   id: your-input-id
#   streams:
#     # Stream ID for this data stream allowing Filebeat to track the state of the ingested files. Must be unique.
#     # Each filestream data stream creates a separate instance of the Filebeat filestream input.
#     - id: your-filestream-stream-id
#       data_stream:
#         dataset: generic
#       paths:
#         - /var/log/*.log

If the documentation isn't updated nobody is going to use this configuration style :)

@belimawr
Contributor Author

belimawr commented Nov 7, 2023

You should update the reference YAML to include an example of this style of configuration, I would recommend modifying the sample log input:

If the documentation isn't updated nobody is going to use this configuration style :)

We have already had examples for about 3 years 😱

data_stream.namespace: default
use_output: default
streams:
  - metricsets:
    - cpu
    # Dataset name must conform to the naming conventions for Elasticsearch indices, cannot contain dashes (-), and cannot exceed 100 bytes
    data_stream.dataset: system.cpu
  - metricsets:
    - memory
    data_stream.dataset: system.memory
  - metricsets:
    - network
    data_stream.dataset: system.network
  - metricsets:
    - filesystem
    data_stream.dataset: system.filesystem

Anyways, I'll edit the filestream example as well.
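
The dataset naming rules mentioned in the YAML comment above can be sketched as a small check. This is only an illustration: `validateDataset` is a hypothetical helper, not something the agent exposes, and beyond the two rules stated in the comment (no dashes, at most 100 bytes) it covers just a subset of the Elasticsearch index naming rules (lowercase only, a few forbidden characters). Rejecting an empty name is an extra assumption.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// validateDataset applies the rules from the reference YAML comment plus an
// illustrative subset of the Elasticsearch index naming rules.
// Hypothetical helper; the empty-name check is an assumption.
func validateDataset(name string) error {
	switch {
	case name == "":
		return errors.New("dataset must not be empty")
	case len(name) > 100: // len counts bytes, not characters
		return errors.New("dataset must not exceed 100 bytes")
	case strings.Contains(name, "-"):
		return errors.New("dataset must not contain dashes")
	case name != strings.ToLower(name):
		return errors.New("dataset must be lowercase")
	case strings.ContainsAny(name, `\/*?"<>| ,#`):
		return errors.New("dataset contains a character not allowed in index names")
	}
	return nil
}

func main() {
	for _, name := range []string{"system.cpu", "system-cpu"} {
		fmt.Printf("%s: %v\n", name, validateDataset(name))
	}
}
```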

Revert my last changes in the `elastic-agent.reference.yml` because
`mage check` does not like them.
@belimawr
Contributor Author

belimawr commented Nov 8, 2023

@cmacknz mage check did not like my last changes in the configuration :/

Aside from the flattened examples I pointed out before we also have it shown in our documentation: https://github.com/elastic/ingest-docs/blob/bab384ba275f4ae34696cedbdc1221383b64e1c9/docs/en/ingest-management/elastic-agent/configuration/elastic-agent-configuration.asciidoc?plain=1#L37-L50

I believe we have a good mix of syntax examples.

I confess I'm puzzled about mage check only editing the commented out section of the reference configuration 🤔


@belimawr belimawr merged commit b0b8e85 into elastic:main Nov 8, 2023
12 checks passed
belimawr added a commit that referenced this pull request Nov 10, 2023
Labels
backport-skip Team:Elastic-Agent Label for the Agent team
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support flattened data_stream fields
6 participants