[VL] Enable spill-to-disk for partial aggregation #3697

Merged (6 commits) on Nov 18, 2023

@zhztheplayer commented on Nov 13, 2023

Relies on oap-project/velox#439 to add spill support for partial aggregation.

Thanks for opening a pull request!

Could you open an issue for this pull request on GitHub Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

@zhztheplayer

The error in CI log

2023-11-13T12:02:49.3373701Z io.glutenproject.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxRuntimeError
2023-11-13T12:02:49.3375455Z Error Source: RUNTIME
2023-11-13T12:02:49.3376161Z Error Code: INVALID_STATE
2023-11-13T12:02:49.3376933Z Reason: Spiller has been finalized
2023-11-13T12:02:49.3377767Z Retriable: False
2023-11-13T12:02:49.3378412Z Expression: !finalized_
2023-11-13T12:02:49.3379091Z Function: spill
2023-11-13T12:02:49.3379758Z File: ../../velox/exec/Spiller.cpp
2023-11-13T12:02:49.3380561Z Line: 458
2023-11-13T12:02:49.3381104Z Stack trace:
2023-11-13T12:02:49.3381894Z # 0  _ZN8facebook5velox7process10StackTraceC1Ei
2023-11-13T12:02:49.3384118Z # 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
2023-11-13T12:02:49.3386736Z # 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorEPKcEEvRKNS1_18VeloxCheckFailArgsET0_
2023-11-13T12:02:49.3388918Z # 3  _ZN8facebook5velox4exec7Spiller5spillEPKNS1_20RowContainerIteratorE
2023-11-13T12:02:49.3390411Z # 4  _ZN8facebook5velox4exec11GroupingSet5spillEv
2023-11-13T12:02:49.3391687Z # 5  _ZN8facebook5velox4exec11GroupingSet11noMoreInputEv
2023-11-13T12:02:49.3393131Z # 6  _ZN8facebook5velox4exec15HashAggregation11noMoreInputEv
2023-11-13T12:02:49.3395252Z # 7  _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
2023-11-13T12:02:49.3397544Z # 8  _ZN8facebook5velox4exec6Driver4nextERSt10shared_ptrINS1_13BlockingStateEE
2023-11-13T12:02:49.3399403Z # 9  _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
2023-11-13T12:02:49.3400946Z # 10 _ZN6gluten24WholeStageResultIterator4nextEv
2023-11-13T12:02:49.3402421Z # 11 Java_io_glutenproject_vectorized_ColumnarBatchOutIterator_nativeHasNext
2023-11-13T12:02:49.3403696Z # 12 0x00007fcb473c5ef0
2023-11-13T12:02:49.3404155Z 
2023-11-13T12:02:49.3405117Z 	at io.glutenproject.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:39)
2023-11-13T12:02:49.3407282Z 	at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
2023-11-13T12:02:49.3409522Z 	at io.glutenproject.utils.InvocationFlowProtection.hasNext(Iterators.scala:137)
2023-11-13T12:02:49.3411628Z 	at io.glutenproject.utils.IteratorCompleter.hasNext(Iterators.scala:69)
2023-11-13T12:02:49.3413444Z 	at io.glutenproject.utils.PayloadCloser.hasNext(Iterators.scala:35)
2023-11-13T12:02:49.3415360Z 	at io.glutenproject.utils.PipelineTimeAccumulator.hasNext(Iterators.scala:98)
2023-11-13T12:02:49.3417498Z 	at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
2023-11-13T12:02:49.3419357Z 	at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
2023-11-13T12:02:49.3421466Z 	at org.apache.spark.shuffle.ColumnarShuffleWriter.internalWrite(ColumnarShuffleWriter.scala:116)
2023-11-13T12:02:49.3423949Z 	at org.apache.spark.shuffle.ColumnarShuffleWriter.write(ColumnarShuffleWriter.scala:217)
2023-11-13T12:02:49.3426298Z 	at org.apache.spark.shuffle.ShuffleWriteProcessor.write(ShuffleWriteProcessor.scala:59)
2023-11-13T12:02:49.3428495Z 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:99)
2023-11-13T12:02:49.3430522Z 	at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:52)
2023-11-13T12:02:49.3432228Z 	at org.apache.spark.scheduler.Task.run(Task.scala:131)
2023-11-13T12:02:49.3433924Z 	at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
2023-11-13T12:02:49.3435876Z 	at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
2023-11-13T12:02:49.3437724Z 	at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
2023-11-13T12:02:49.3439720Z 	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
2023-11-13T12:02:49.3441901Z 	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
2023-11-13T12:02:49.3443436Z 	at java.lang.Thread.run(Thread.java:750)
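
The trace shows the check `!finalized_` failing inside `Spiller::spill`, reached from `GroupingSet::noMoreInput` through `GroupingSet::spill`. As a rough illustration of that kind of guard, here is a toy model in Python. It is not Velox's implementation: `ToySpiller`, `spill`, and `finalize` are hypothetical names chosen for this sketch, and the only detail taken from the log is the error message and the fact that spilling after finalization is rejected.

```python
class ToySpiller:
    """Toy stand-in for a spiller that refuses new data once finalized."""

    def __init__(self):
        self.finalized = False
        self.runs = []  # each spilled batch is kept as one sorted run

    def spill(self, rows):
        # Mirrors the spirit of the failing check in the log: !finalized_
        if self.finalized:
            raise RuntimeError("Spiller has been finalized")
        self.runs.append(sorted(rows))

    def finalize(self):
        # After this call no further spill is accepted.
        self.finalized = True
        return list(self.runs)


spiller = ToySpiller()
spiller.spill([3, 1, 2])
spiller.finalize()

try:
    # A late spill, e.g. one triggered while handling end-of-input.
    spiller.spill([5, 4])
    late_spill_error = None
except RuntimeError as exc:
    late_spill_error = str(exc)

print(late_spill_error)
```

In this sketch the second `spill` call is rejected because it arrives after `finalize`, which is the same ordering problem the error message in the log describes.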

Run Gluten Clickhouse CI

@zhztheplayer marked this pull request as ready for review on November 17, 2023, 06:08
@zhztheplayer

The error in CI log


Fixed.

Run Gluten Clickhouse CI

@zhztheplayer merged commit 5f5d18a into apache:main on Nov 18, 2023
17 checks passed
@GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_3697_time.csv | log/native_master_11_17_2023_bb88f4f1a_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 33.65 | 34.06 | 0.418 | 101.24% |
| q2 | 22.75 | 24.52 | 1.771 | 107.79% |
| q3 | 37.10 | 37.38 | 0.278 | 100.75% |
| q4 | 36.49 | 38.12 | 1.630 | 104.47% |
| q5 | 70.80 | 71.52 | 0.720 | 101.02% |
| q6 | 7.14 | 6.66 | -0.484 | 93.22% |
| q7 | 84.28 | 84.33 | 0.044 | 100.05% |
| q8 | 85.99 | 87.40 | 1.412 | 101.64% |
| q9 | 120.07 | 121.92 | 1.846 | 101.54% |
| q10 | 45.23 | 46.88 | 1.648 | 103.64% |
| q11 | 20.83 | 19.99 | -0.840 | 95.97% |
| q12 | 24.51 | 23.85 | -0.667 | 97.28% |
| q13 | 46.29 | 45.59 | -0.709 | 98.47% |
| q14 | 16.45 | 16.53 | 0.082 | 100.50% |
| q15 | 28.92 | 28.51 | -0.415 | 98.56% |
| q16 | 15.97 | 16.19 | 0.220 | 101.38% |
| q17 | 102.72 | 100.12 | -2.598 | 97.47% |
| q18 | 149.80 | 147.11 | -2.694 | 98.20% |
| q19 | 13.07 | 13.01 | -0.061 | 99.53% |
| q20 | 29.04 | 27.52 | -1.516 | 94.78% |
| q21 | 220.29 | 223.26 | 2.975 | 101.35% |
| q22 | 12.88 | 12.99 | 0.106 | 100.82% |
| total | 1224.28 | 1227.45 | 3.165 | 100.26% |
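
The report does not say how the last two columns are derived. From the rows above, `difference` appears to be the second timing column minus the first, and `percentage` the second divided by the first, so values above 100% mean the first column is faster. The sketch below works under that assumption; the function and parameter names are hypothetical, and small mismatches against the report are expected because the bot presumably computes from unrounded timings.

```python
def compare(candidate_time, baseline_time):
    """Return (difference, percentage) as the report's columns appear to be derived.

    candidate_time: value from the first timing column
    baseline_time:  value from the second timing column
    """
    difference = baseline_time - candidate_time
    percentage = baseline_time / candidate_time * 100.0
    return difference, percentage


# Totals from the first report: 1224.28 vs 1227.45
total_diff, total_pct = compare(1224.28, 1227.45)
print(f"difference={total_diff:.2f} percentage={total_pct:.2f}%")

# q6 is a row where the first column is slower, so the ratio drops below 100%
q6_diff, q6_pct = compare(7.14, 6.66)
print(f"difference={q6_diff:.2f} percentage={q6_pct:.2f}%")
```

For the totals this lands close to the reported 3.165 and 100.26%. For q6 it gives a negative difference and a percentage just above 93%, in line with the reported -0.484 and 93.22%.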

@GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_master_11_18_2023_time.csv | log/native_master_11_17_2023_bb88f4f1a_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 34.26 | 34.06 | -0.192 | 99.44% |
| q2 | 24.90 | 24.52 | -0.379 | 98.48% |
| q3 | 37.65 | 37.38 | -0.265 | 99.30% |
| q4 | 37.76 | 38.12 | 0.360 | 100.95% |
| q5 | 70.20 | 71.52 | 1.317 | 101.88% |
| q6 | 7.22 | 6.66 | -0.565 | 92.18% |
| q7 | 82.98 | 84.33 | 1.345 | 101.62% |
| q8 | 83.97 | 87.40 | 3.430 | 104.08% |
| q9 | 119.68 | 121.92 | 2.234 | 101.87% |
| q10 | 47.20 | 46.88 | -0.322 | 99.32% |
| q11 | 19.89 | 19.99 | 0.105 | 100.53% |
| q12 | 24.70 | 23.85 | -0.852 | 96.55% |
| q13 | 45.67 | 45.59 | -0.085 | 99.81% |
| q14 | 18.07 | 16.53 | -1.540 | 91.48% |
| q15 | 27.13 | 28.51 | 1.377 | 105.07% |
| q16 | 15.20 | 16.19 | 0.991 | 106.52% |
| q17 | 102.37 | 100.12 | -2.255 | 97.80% |
| q18 | 147.89 | 147.11 | -0.778 | 99.47% |
| q19 | 13.09 | 13.01 | -0.074 | 99.44% |
| q20 | 27.75 | 27.52 | -0.230 | 99.17% |
| q21 | 220.69 | 223.26 | 2.573 | 101.17% |
| q22 | 12.94 | 12.99 | 0.045 | 100.35% |
| total | 1221.20 | 1227.45 | 6.242 | 100.51% |

@GlutenPerfBot

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | log/native_master_11_19_2023_time.csv | log/native_master_11_18_2023_5f5d18abe_time.csv | difference | percentage |
|---|---|---|---|---|
| q1 | 34.50 | 34.26 | -0.246 | 99.29% |
| q2 | 24.57 | 24.90 | 0.329 | 101.34% |
| q3 | 37.14 | 37.65 | 0.507 | 101.37% |
| q4 | 36.74 | 37.76 | 1.017 | 102.77% |
| q5 | 70.93 | 70.20 | -0.723 | 98.98% |
| q6 | 7.09 | 7.22 | 0.134 | 101.89% |
| q7 | 83.62 | 82.98 | -0.634 | 99.24% |
| q8 | 85.87 | 83.97 | -1.895 | 97.79% |
| q9 | 126.22 | 119.68 | -6.532 | 94.82% |
| q10 | 46.75 | 47.20 | 0.444 | 100.95% |
| q11 | 19.85 | 19.89 | 0.041 | 100.21% |
| q12 | 25.57 | 24.70 | -0.873 | 96.59% |
| q13 | 45.26 | 45.67 | 0.408 | 100.90% |
| q14 | 18.39 | 18.07 | -0.322 | 98.25% |
| q15 | 28.36 | 27.13 | -1.227 | 95.67% |
| q16 | 15.85 | 15.20 | -0.648 | 95.91% |
| q17 | 100.27 | 102.37 | 2.099 | 102.09% |
| q18 | 147.67 | 147.89 | 0.219 | 100.15% |
| q19 | 13.00 | 13.09 | 0.090 | 100.70% |
| q20 | 26.75 | 27.75 | 1.004 | 103.75% |
| q21 | 222.79 | 220.69 | -2.105 | 99.06% |
| q22 | 13.20 | 12.94 | -0.253 | 98.08% |
| total | 1230.37 | 1221.20 | -9.165 | 99.26% |
