Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Register Spark tokenizer #3810

Merged
merged 1 commit into from
Nov 24, 2023
Merged

[VL] Register Spark tokenizer #3810

merged 1 commit into from
Nov 24, 2023

Conversation

rui-mo
Copy link
Contributor

@rui-mo rui-mo commented Nov 22, 2023

What changes were proposed in this pull request?

To fix affected unit tests in https://github.com/oap-project/velox, register Spark tokenizer instead of changing Velox one.

Depends on: oap-project/velox#444.

How was this patch tested?

under test

Copy link

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

zhli1142015
zhli1142015 previously approved these changes Nov 22, 2023
Copy link
Contributor

@zhli1142015 zhli1142015 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


private:
const std::string path_;
State state;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: state_ ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed. Thanks.

@rui-mo rui-mo merged commit 4f63ddc into apache:main Nov 24, 2023
6 of 15 checks passed
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_3810_time.csv log/native_master_11_23_2023_df3eb372c_time.csv difference percentage
q1 34.69 34.09 -0.599 98.27%
q2 24.88 24.43 -0.453 98.18%
q3 37.69 37.11 -0.577 98.47%
q4 37.42 35.49 -1.932 94.84%
q5 70.32 70.74 0.425 100.60%
q6 7.11 6.77 -0.337 95.25%
q7 83.59 85.18 1.593 101.91%
q8 86.77 84.73 -2.038 97.65%
q9 125.47 118.73 -6.734 94.63%
q10 45.41 43.59 -1.816 96.00%
q11 19.27 19.61 0.344 101.79%
q12 25.41 23.77 -1.641 93.54%
q13 46.19 45.78 -0.412 99.11%
q14 16.59 17.52 0.929 105.60%
q15 28.46 28.14 -0.327 98.85%
q16 15.78 15.14 -0.636 95.97%
q17 102.98 100.38 -2.593 97.48%
q18 147.63 147.08 -0.559 99.62%
q19 12.94 12.95 0.018 100.14%
q20 27.95 28.07 0.120 100.43%
q21 218.99 218.99 0.008 100.00%
q22 12.87 12.93 0.063 100.49%
total 1228.38 1211.23 -17.152 98.60%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants