Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VL] Iterator's and its payloads' lifecycle improvements #3526

Merged
merged 14 commits into from
Oct 30, 2023

Conversation

zhztheplayer
Copy link
Member

@zhztheplayer zhztheplayer commented Oct 25, 2023

One of the purpose of the patch is to close plan iterators (so Velox's task gets stopped) before stopping shuffle writer.

Will do some cleanups against code.

Changes listed as following:

  1. Add utility method Iterators#wrap for miscellaneous operations on iterator, e.g. handle iterator and payload's closing.
  2. Remove CloseableColumnarBatchIterator, use utility method #recyclePayload.
  3. Remove CloseablePairedColumnarBatchIterator, use utility method #recyclePayload.
  4. Refactor code to use utility method #recycleIterator. The method would take early-close into account. Which means, once iterator's hasNext started to return false, iterator will be closed. Previously for Velox's out iterator we only close it when task ends.
  5. Add API NativeMemoryManager#hold() to make memory manager hold references of all created memory pools to avoid them from becoming inaccessible after the task gets early-closed.

This will also fix some metrics issues when SQL limit clause is used since we rely on #hasNext result to collect iterator metrics.

@zhztheplayer zhztheplayer changed the title WIP: [VL] Iterator lifecycle enhancements WIP: [VL] Iterator lifecycle improvements Oct 25, 2023
@github-actions
Copy link

Run Gluten Clickhouse CI

3 similar comments
@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

Run Gluten Clickhouse CI

@apache apache deleted a comment from github-actions bot Oct 25, 2023
@zhztheplayer zhztheplayer changed the title WIP: [VL] Iterator lifecycle improvements [VL] Iterator lifecycle improvements Oct 25, 2023
@github-actions
Copy link

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer marked this pull request as ready for review October 25, 2023 07:37
@github-actions
Copy link

Run Gluten Clickhouse CI

5 similar comments
@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

Run Gluten Clickhouse CI


override def next(): A = {
val a: A = in.next()
closer.synchronized {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused why do we need synchronized here. Does it mean hasNext and next can be called at the same time?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it mean hasNext and next can be called at the same time?

This case is unusual... However sometime another case might be possible that the iterator was read asynchronously by another thread (say, Velox sometime hand over it's sub-tasks to background thread), but the lambda passed into TaskResources.addRecycler will be called from the original Spark task thread.

BTW I didn't have all the cases tested but rather programed defensively... It will be hard to debug if we actually bumped into threading issues caused by lack of locking here.

@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

Run Gluten Clickhouse CI

@zhztheplayer
Copy link
Member Author

@zzcclp

This impacts CH backend's execution either. Would you please help to review? Thanks.

@zhztheplayer
Copy link
Member Author

/Benchmark Velox

class WrapperBuilder[A](in: Iterator[A]) { // FIXME how to make the ctor companion-private?
private var wrapped: Iterator[A] = in

def recyclePayload(closeCallback: (A) => Unit): WrapperBuilder[A] = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you @zhztheplayer for the refactor. Can you explain a bit more why naming a new word payload instead of ColumnarBatch in java ? Do you mean there is something else besides ColumnarBatch will call this method ? I'm wondering if we can make a recycleColumnarBatch method to make it clear.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Iterator[A], A should be type of "payloads" of the iterator. It's also possible that A is not ColumnarBatch, a typical case is C2R that creates a row iterator as output.

Does payload sound very confusing here? element could be another choice but I felt it probably sounds too general for Gluten's use case.

closed = true
}
hasNext
batches.hasNext
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems we didi not call recyclePayload for c2r after refactor since the iterator is row-based. Is there a leak ?

Copy link
Member Author

@zhztheplayer zhztheplayer Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't change any code about payload closing. The deleted lines were for iterator completion so moved torecycleIterator.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it is closed by itself, then it seems c2r did not call recyclePayload ?

Copy link
Member Author

@zhztheplayer zhztheplayer Oct 30, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see it is closed by itself, then it seems c2r did not call recyclePayload ?

Yes as you see the batches are closed manually within some if-else conditions. I didn't how much effort needed to refactor the usages within #recyclePayload() so didn't change that part of code in this patch. Probably we can do that in a separate ticket, I am not sure.

@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_3526_time.csv log/native_master_10_29_2023_a7b68c3dc_time.csv difference percentage
q1 34.70 34.59 -0.109 99.69%
q2 24.51 24.87 0.361 101.47%
q3 39.78 39.78 0.007 100.02%
q4 35.82 36.90 1.077 103.01%
q5 71.47 71.92 0.451 100.63%
q6 8.73 8.87 0.142 101.62%
q7 85.76 88.07 2.315 102.70%
q8 87.67 87.12 -0.545 99.38%
q9 122.32 120.37 -1.947 98.41%
q10 54.60 52.53 -2.068 96.21%
q11 20.00 19.81 -0.186 99.07%
q12 28.59 27.05 -1.544 94.60%
q13 48.30 49.69 1.389 102.88%
q14 21.23 19.38 -1.853 91.27%
q15 34.87 34.35 -0.527 98.49%
q16 16.01 16.06 0.046 100.29%
q17 101.79 102.20 0.412 100.40%
q18 146.84 146.79 -0.048 99.97%
q19 16.81 16.60 -0.211 98.75%
q20 31.21 30.11 -1.096 96.49%
q21 225.65 224.31 -1.339 99.41%
q22 13.26 13.14 -0.117 99.12%
total 1269.93 1264.54 -5.390 99.58%

@github-actions
Copy link

Run Gluten Clickhouse CI

@@ -230,11 +230,12 @@ class CHIteratorApi extends IteratorApi with Logging with LogLevelUtil {
/**
* Generate closeable ColumnBatch iterator.
*
* FIXME: This no longer overrides parent's method
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

keep it in the CHIteratorApi

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggested change? I only removed override modifier. But not sure where to put this method yet.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggested change? I only removed override modifier. But not sure where to put this method yet.

This change is OK to me.

@github-actions
Copy link

Run Gluten Clickhouse CI

@zhztheplayer zhztheplayer changed the title [VL] Iterator lifecycle improvements [VL] Iterator's and its payloads' lifecycle improvements Oct 30, 2023
@zzcclp
Copy link
Contributor

zzcclp commented Oct 30, 2023

Run Gluten Clickhouse CI

1 similar comment
@github-actions
Copy link

Run Gluten Clickhouse CI

@github-actions
Copy link

Run Gluten Clickhouse CI

@marin-ma
Copy link
Contributor

LGTM. Thanks!

@zhztheplayer zhztheplayer merged commit 39a5162 into apache:main Oct 30, 2023
15 checks passed
@GlutenPerfBot
Copy link
Contributor

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query log/native_3526_time.csv log/native_master_10_29_2023_a7b68c3dc_time.csv difference percentage
q1 35.25 34.59 -0.661 98.12%
q2 24.99 24.87 -0.122 99.51%
q3 41.09 39.78 -1.306 96.82%
q4 37.37 36.90 -0.472 98.74%
q5 71.10 71.92 0.823 101.16%
q6 8.90 8.87 -0.034 99.62%
q7 88.14 88.07 -0.062 99.93%
q8 87.04 87.12 0.084 100.10%
q9 121.56 120.37 -1.188 99.02%
q10 54.76 52.53 -2.225 95.94%
q11 19.72 19.81 0.097 100.49%
q12 26.80 27.05 0.246 100.92%
q13 49.34 49.69 0.357 100.72%
q14 18.12 19.38 1.261 106.96%
q15 34.38 34.35 -0.029 99.91%
q16 16.21 16.06 -0.152 99.06%
q17 102.14 102.20 0.064 100.06%
q18 149.08 146.79 -2.285 98.47%
q19 16.70 16.60 -0.100 99.40%
q20 31.09 30.11 -0.974 96.87%
q21 221.48 224.31 2.837 101.28%
q22 13.37 13.14 -0.228 98.30%
total 1268.60 1264.54 -4.068 99.68%

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants