[VL] Fix failed to get iterator exception #3641

Merged (1 commit), Nov 8, 2023

JkSelf (Contributor) commented Nov 7, 2023

What changes were proposed in this pull request?

The native Parquet writer may call hasNext more than once. In the current code, however, the native resources are released as soon as the first hasNext call returns false, so the next hasNext call fails with the exception below. This PR adds a guard on the Scala side so that hasNext can safely be called multiple times.

io.glutenproject.exception.GlutenException: java.lang.RuntimeException: failed to get batch iterator
        at io.glutenproject.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:39)
        at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
        at io.glutenproject.utils.IteratorCompleter.hasNext(Iterators.scala:68)
        at io.glutenproject.utils.PayloadCloser.hasNext(Iterators.scala:34)
        at io.glutenproject.utils.PipelineTimeAccumulator.hasNext(Iterators.scala:97)
        at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
        at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
        at org.apache.spark.sql.execution.datasources.FileFormatDataWriter.writeWithIterator(FileFormatDataWriter.scala:102)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:414)
        at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1525)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:423)
        at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$17(FileFormatWriter.scala:331)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:506)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1491)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:509)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
Caused by: java.lang.RuntimeException: failed to get batch iterator
        at io.glutenproject.vectorized.ColumnarBatchOutIterator.nativeHasNext(Native Method)
        at io.glutenproject.vectorized.ColumnarBatchOutIterator.hasNextInternal(ColumnarBatchOutIterator.java:65)
        at io.glutenproject.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:37)
        ... 19 more
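The root cause is a hasNext that is not idempotent: once it returns false, the backing native resources are freed, and any later call touches freed state. A minimal sketch of that kind of guard in Java, assuming a hypothetical SafeClosingIterator wrapper and a Runnable release callback standing in for the native cleanup. This is not Gluten's actual code; the PR's change is in its Scala iterator wrappers (such as those in Iterators.scala in the trace above) and may look different.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/**
 * Hypothetical sketch, not Gluten's API: wraps an iterator backed by a
 * native resource so hasNext() stays safe to call after exhaustion.
 */
final class SafeClosingIterator<T> implements Iterator<T> {
    private final Iterator<T> delegate;
    private final Runnable release;   // frees the (hypothetical) native resource
    private boolean finished = false; // true once the delegate has reported no more items

    SafeClosingIterator(Iterator<T> delegate, Runnable release) {
        this.delegate = delegate;
        this.release = release;
    }

    @Override
    public boolean hasNext() {
        if (finished) {
            // Already exhausted and released: never call into the delegate again.
            return false;
        }
        boolean more = delegate.hasNext();
        if (!more) {
            finished = true;
            release.run(); // runs at most once, because finished is now true
        }
        return more;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return delegate.next();
    }
}
```

With this wrapper, the first hasNext that returns false frees the resource exactly once, and every subsequent hasNext simply returns false, which is the repeated-call situation the PR describes.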

How was this patch tested?

Tested locally.

marin-ma (Contributor) commented Nov 8, 2023

Verified. Thanks!

zhztheplayer (Member) left a comment

Thanks!

zhztheplayer merged commit b60fe75 into apache:main on Nov 8, 2023 (15 checks passed)
GlutenPerfBot (Contributor)

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

| query | this PR (log/native_3641_time.csv) | master e3eff1d8f (log/native_master_11_07_2023_e3eff1d8f_time.csv) | difference (master − PR) | percentage (master / PR) |
|---|---|---|---|---|
| q1 | 33.93 | 34.38 | 0.451 | 101.33% |
| q2 | 25.83 | 25.03 | -0.807 | 96.88% |
| q3 | 38.34 | 38.14 | -0.200 | 99.48% |
| q4 | 37.47 | 37.57 | 0.099 | 100.26% |
| q5 | 71.74 | 71.50 | -0.240 | 99.67% |
| q6 | 7.91 | 6.26 | -1.648 | 79.16% |
| q7 | 85.08 | 82.22 | -2.854 | 96.65% |
| q8 | 85.78 | 86.95 | 1.168 | 101.36% |
| q9 | 123.06 | 119.81 | -3.253 | 97.36% |
| q10 | 52.58 | 51.26 | -1.313 | 97.50% |
| q11 | 19.99 | 19.73 | -0.260 | 98.70% |
| q12 | 27.15 | 24.39 | -2.761 | 89.83% |
| q13 | 49.85 | 50.30 | 0.441 | 100.88% |
| q14 | 16.93 | 17.67 | 0.740 | 104.37% |
| q15 | 31.81 | 30.35 | -1.452 | 95.44% |
| q16 | 16.32 | 16.20 | -0.121 | 99.26% |
| q17 | 102.71 | 101.51 | -1.200 | 98.83% |
| q18 | 146.43 | 148.26 | 1.828 | 101.25% |
| q19 | 16.19 | 16.17 | -0.014 | 99.91% |
| q20 | 29.87 | 30.31 | 0.444 | 101.49% |
| q21 | 225.16 | 224.88 | -0.280 | 99.88% |
| q22 | 13.60 | 14.08 | 0.488 | 103.59% |
| total | 1257.72 | 1246.98 | -10.743 | 99.15% |
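Judging from the numbers, the difference column is master minus this PR and the percentage column is master divided by this PR, times 100; for q1, 34.38 − 33.93 = 0.45 and 34.38 / 33.93 ≈ 101.33%. A percentage above 100% therefore means the PR run took less time than master. The last digit can differ slightly (0.450 vs. 0.451 for q1), presumably because the report works from unrounded timings. A small Java sketch that reproduces the two derived columns; the PerfDelta class is hypothetical and not part of the report tooling.

```java
import java.util.Locale;

/** Hypothetical helper reproducing the report's derived columns. */
final class PerfDelta {
    // Matches the "difference" column: master time minus PR time.
    static double difference(double pr, double master) {
        return master - pr;
    }

    // Matches the "percentage" column: master time as a percentage of PR time.
    static double percentage(double pr, double master) {
        return master / pr * 100.0;
    }

    // Formats one row in the same column order as the report.
    static String row(String query, double pr, double master) {
        return String.format(Locale.ROOT, "%s %.2f %.2f %.3f %.2f%%",
                query, pr, master, difference(pr, master), percentage(pr, master));
    }

    public static void main(String[] args) {
        System.out.println(row("q1", 33.93, 34.38));
        System.out.println(row("total", 1257.72, 1246.98));
    }
}
```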
