Translate proc optimizations #4824

Merged

Conversation

kkondaka
Collaborator

Description

The translate processor spends a lot of CPU cycles in event.toMap(), even in cases where the call is not necessary.
This change refactors the code so that toMap() is called only when it is actually needed.

Issues Resolved

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • [X] Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

kkondaka and others added 3 commits August 12, 2024 06:27
Signed-off-by: Krishna Kondaka <[email protected]>
@@ -107,14 +106,15 @@ private void translateSource(Object sourceObject, Event recordEvent, TargetsPara
}

String rootField = jsonExtractor.getRootField(commonPath);
if(!recordObject.containsKey(rootField)){
Object rootFieldObject = recordEvent.get(rootField, Object.class);
Member

Is this faster? If so, can we optimize containsKey instead of making this change?

Collaborator Author

The biggest performance issue is calling toMap(). I wanted to avoid calling it until it is actually needed. I worked with a customer-provided 50,000-line YAML file, and this change seems to help.
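A minimal sketch of the "defer toMap() until needed" idea follows. It is not the actual TranslateProcessor code: `SketchEvent`, `process`, and the `toMapCalls` counter are hypothetical names made up for illustration, and the stand-in event is a plain `HashMap` wrapper, not Data Prepper's Event.

```java
import java.util.HashMap;
import java.util.Map;

public class LazyToMapSketch {

    // Hypothetical stand-in for an event; not Data Prepper's Event API.
    static final class SketchEvent {
        private final Map<String, Object> data = new HashMap<>();
        int toMapCalls = 0; // how many times the expensive copy ran

        void put(String key, Object value) {
            data.put(key, value);
        }

        // Cheap: a single lookup, no copy.
        boolean containsKey(String key) {
            return data.containsKey(key);
        }

        // Expensive for large events: copies every entry.
        Map<String, Object> toMap() {
            toMapCalls++;
            return new HashMap<>(data);
        }
    }

    // Returns the root field's value, or null when the field is absent.
    // The full map is built only after the cheap check succeeds.
    static Object process(SketchEvent event, String rootField) {
        if (!event.containsKey(rootField)) {
            return null; // early exit: toMap() is never called
        }
        Map<String, Object> recordObject = event.toMap();
        return recordObject.get(rootField);
    }

    public static void main(String[] args) {
        SketchEvent event = new SketchEvent();
        event.put("status", "ok");

        process(event, "missing");
        System.out.println("toMap calls after miss: " + event.toMapCalls);

        process(event, "status");
        System.out.println("toMap calls after hit: " + event.toMapCalls);
    }
}
```

In this sketch the counter stays at 0 for the missing key and only increments once a key is actually present, which is the effect the refactoring is aiming for.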

Member

So is the change to get doing anything to help with performance? If not, I'd suggest reverting to containsKey. If so, let's fix containsKey in JacksonEvent and then revert to using containsKey here.

Collaborator Author

@dlvenable the profiler showed that a lot of time was spent in event.toMap(). I'm not sure how fixing containsKey would help here.

Member

Data Prepper Event has a containsKey() method.

if(!recordEvent.containsKey(rootField)) {
...

Collaborator Author

Oh, OK. So you are referring to line 110 and not 92. I did not try containsKey. I can try that and get back to you.

Collaborator Author

@dlvenable looking at the code, get() and containsKey() are almost the same. Using containsKey would definitely help even more.
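To illustrate why a get() and a containsKey() can end up nearly identical, here is a rough sketch in which both share one path-walking helper and differ only in what they return. This is an assumption-laden illustration over nested `Map`s, not JacksonEvent's implementation; `PathLookupSketch`, `resolve`, and the "/"-separated path convention are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class PathLookupSketch {

    // Sentinel that distinguishes "key absent" from "key present with null value".
    private static final Object MISSING = new Object();

    // Hypothetical helper: walks a "/"-separated path through nested maps.
    private static Object resolve(Map<String, Object> root, String path) {
        Object current = root;
        for (String part : path.split("/")) {
            if (part.isEmpty()) {
                continue; // tolerate a leading or doubled slash
            }
            if (!(current instanceof Map)) {
                return MISSING; // path goes deeper than the data does
            }
            Map<?, ?> map = (Map<?, ?>) current;
            if (!map.containsKey(part)) {
                return MISSING;
            }
            current = map.get(part);
        }
        return current;
    }

    // Existence check: same walk, but only reports whether it succeeded.
    static boolean containsKey(Map<String, Object> root, String path) {
        return resolve(root, path) != MISSING;
    }

    // Value lookup: same walk, plus a cast to the requested type.
    static <T> T get(Map<String, Object> root, String path, Class<T> type) {
        Object value = resolve(root, path);
        return value == MISSING ? null : type.cast(value);
    }

    public static void main(String[] args) {
        Map<String, Object> inner = new HashMap<>();
        inner.put("code", "404");
        Map<String, Object> root = new HashMap<>();
        root.put("response", inner);

        System.out.println(containsKey(root, "response/code"));
        System.out.println(get(root, "response/code", String.class));
        System.out.println(containsKey(root, "response/body"));
    }
}
```

Neither method copies the whole structure, so in a design like this either one avoids the full-map cost; containsKey additionally skips the type conversion.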

Signed-off-by: Krishna Kondaka <[email protected]>
@kkondaka kkondaka merged commit 72e2db7 into opensearch-project:main Aug 15, 2024
45 of 47 checks passed
3 participants