Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add LogPatternTool #413

Merged
merged 6 commits into from
Sep 29, 2024
Merged

Conversation

qianheng-aws
Copy link
Contributor

@qianheng-aws qianheng-aws commented Sep 26, 2024

Description

Add LogPatternTool to support generate log patterns based on an input DSL.

Background of log pattern

Log pattern is a feature in Observability. It aims to find the common feature among retrieved logs.

The implementation of the log pattern feature in Observability includes 3 steps:

  1. execute a PPL to retrieve the first log
  2. set default pattern field, which is implemented in front-end. The default pattern field is found by scanning over the values of the first sample log. The field with the longest length of value will be the default pattern field. (Code Ref; For alert analysis, it seems meaningless to find pattern on the field with the longest length of value.)
  3. execute a PPL to retrieve data and fill in log pattern table. Assuming field message is the pattern field, an example of such an PPL is:
    source=opensearch_dashboards_sample_data_flights | where timestamp >= ’[2024-09-09 16](tel:2024090916):16:30.321000' and timestamp <= '2024-09-10 07:16:30.322000' | patterns Dest | stats count(), take(Dest, 1) by patterns_field
    By default, the operator pattern in PPL will generate a new field named pattern_field which will ignore characters conatined by [a-zA-Z\d] in the original value(Code Ref; For alert analysis, it also seems less helpful to analysis on such pattern).

However, if customers use DSL to retrieve logs, it doesn't support such feature. Thus we need this tool to achieve this goal.

Implementation of LogPatternTool

This tool supports generating log patterns on the input dsl and index. It's implemented by
several steps:

  1. Retrival [[${DOC_SIZE_FIELD}]] logs from index
  2. Extract patterns for each retrieved log
    2.1 Find Pattern Field: If users provide parameter [[${PATTERN_FIELD}]], use it as the pattern
    field; Otherwise, find the string field with the longest length on the first log.
    2.2 Extract Pattern: If users provide parameter [[${PATTERN}]], compile it as a pattern;
    Otherwise, use [[${DEFAULT_IGNORED_CHARS}]]. It will remove all chars matching the pattern.
  3. Group logs by their extracted patterns.
  4. Find top N patterns with the largest sample log size.
  5. For each found top N patterns, return [[${SAMPLE_LOG_SIZE}]] sample logs.

Example

POST _plugins/_ml/agents/V4pYKJIBf_90oTl_gmnp/_execute
{
    "parameters": {
       "selected_tools": ["LogPatternTool"],
       "top_n_pattern": "3",
       "sample_log_size": "20",
       "doc_size": 100,
        "input":  """
{"query":{"bool":{"filter":[{"range":{"timestamp":{"from":"1727172421713||-1h","to":"1727172421713","include_lower":true,"include_upper":true,"boost":1}}},{"term":{"response":{"value":"200","boost":1}}}],"adjust_pure_negative":true,"boost":1}}}
        """
    }
}

image

Related Issues

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

if (hits != null && hits.length > 0) {
String patternField = parameters.containsKey(PATTERN_FIELD)
? parameters.get(PATTERN_FIELD)
: findLongestField(hits[0].getSourceAsMap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we stop if longestField is fewer than maybe 30 chars or continue to search for 10 more records in case the first one has outlier data?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It just imitates the logic of Discover log pattern.

It sounds good to avoid outliner, but it could also make the logic more complex and inconsistent with Discover.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The above suggestion is not ideal, just want to trigger some discussion. It's ok to leave it as it is if we don't have better idea.

Copy link
Member

@yuye-aws yuye-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can implement the ut and it after resolving my comments

Comment on lines 83 to 88
super(client, xContentRegistry, null, null, docSize);
this.topNPattern = topNPattern;
this.sampleLogSize = sampleLogSize;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are docSize, topNPattern and sampleLogSize tool parameters or run time parameters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

run time parameters I think. But I think users can set them as well when creating this tool, and it should support override them in runtime.

Copy link
Member

@yuye-aws yuye-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did not see the PPL, but it makes sense to use DSL for search. Would you like to check whether the code can be reused with search index tool? In search index tool, it has specialized operations on connector, model and model_group index.

Comment on lines +167 to +179
char[] chars = rawString.toCharArray();
int pos = 0;
for (int i = 0; i < chars.length; i++) {
if (!DEFAULT_IGNORED_CHARS.contains(chars[i])) {
chars[pos++] = chars[i];
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For better readability, please try regex: [a-zA-Z0-9]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this version, I think we just need to copy the code from sql plugin without doing too much change. So developer can better track the source and diff.

What you suggested could be an enhancement in a separate PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure

if (hits != null && hits.length > 0) {
String patternField = parameters.containsKey(PATTERN_FIELD)
? parameters.get(PATTERN_FIELD)
: findLongestField(hits[0].getSourceAsMap());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the longest field is missing in hits[0] ? Shall we iterate though all the hits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's possible. And also, in some cases, the longest field in the first log maybe not the most longest overall.

But I think firstly we just imitate what current log pattern feature do, and do enhancement later. Better to discuss with Observability team as well. You can check the current implementation here Code Ref

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add one more check to see whether parseField is one of the columns inside the index? Sometimes customer could type wrong parse field name, in the PPL, it will raise an error to show the parse field doesn't exist. We could do the same logic.

@qianheng-aws
Copy link
Contributor Author

Did not see the PPL, but it makes sense to use DSL for search. Would you like to check whether the code can be reused with search index tool? In search index tool, it has specialized operations on connector, model and model_group index.

LogPatternTool aims to generate log pattern for DSL. PPL already supports pattern syntax itself.

Copy link
Member

@yuye-aws yuye-aws left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job @qianheng-aws ! You can go on with ut and it after resolving these two mistakes.

Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
Signed-off-by: Heng Qian <[email protected]>
@zane-neo
Copy link
Collaborator

It's better to add ITs to cover the code, is there any concern on adding ITs?

Copy link

codecov bot commented Sep 29, 2024

Codecov Report

Attention: Patch coverage is 40.98361% with 72 lines in your changes missing coverage. Please review.

Project coverage is 77.75%. Comparing base (2b76e3c) to head (6b71d21).
Report is 24 commits behind head on main.

Files with missing lines Patch % Lines
...ava/org/opensearch/agent/tools/LogPatternTool.java 41.52% 62 Missing and 7 partials ⚠️
src/main/java/org/opensearch/agent/ToolPlugin.java 0.00% 3 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main     #413      +/-   ##
============================================
- Coverage     81.78%   77.75%   -4.04%     
- Complexity      193      260      +67     
============================================
  Files            11       15       +4     
  Lines           961     1380     +419     
  Branches        137      196      +59     
============================================
+ Hits            786     1073     +287     
- Misses          121      231     +110     
- Partials         54       76      +22     
Flag Coverage Δ
77.75% <40.98%> (?)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@qianheng-aws
Copy link
Contributor Author

It's better to add ITs to cover the code, is there any concern on adding ITs?

I'm going to add one for this PR.

@zane-neo zane-neo merged commit 7d72612 into opensearch-project:main Sep 29, 2024
8 of 10 checks passed
@opensearch-trigger-bot
Copy link
Contributor

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/skills/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/skills/backport-2.x
# Create a new branch
git switch --create backport/backport-413-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 7d726125bc9704d5a1fa01831f44c7c0d7b4aab5
# Push it to GitHub
git push --set-upstream origin backport/backport-413-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/skills/backport-2.x

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-413-to-2.x.

@yuye-aws
Copy link
Member

yuye-aws commented Sep 29, 2024

@qianheng-aws Please raise another PR with UT and IT.

qianheng-aws added a commit to qianheng-aws/skills that referenced this pull request Sep 29, 2024
* ut fix: construct update in Monitor

Signed-off-by: Heng Qian <[email protected]>

* Add LogPatternTool

Signed-off-by: Heng Qian <[email protected]>

* Address comments

Signed-off-by: Heng Qian <[email protected]>

* Add function doc

Signed-off-by: Heng Qian <[email protected]>

* Address comments

Signed-off-by: Heng Qian <[email protected]>

* Address comments

Signed-off-by: Heng Qian <[email protected]>

---------

Signed-off-by: Heng Qian <[email protected]>
(cherry picked from commit 7d72612)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants