[GLUTEN-3719][VL] Introduce VeloxIntermediateData to adjust velox agg func intermediate data #3721

liujiayi771 · 2023-11-15T07:26:00Z

What changes were proposed in this pull request?

When the order or types of an aggregation function's intermediate data differ between Velox and Spark, special handling is required. This PR introduces a VeloxIntermediateData object to handle these cases. For all other functions, we continue to use the function's aggBufferAttributes without any special matching. Future PRs will move the remaining special-case handling for such aggregation functions into VeloxIntermediateData.

In the applyExtractStruct function, a lot of code was written to match the intermediate data output by Velox with the columns in Spark's agg buffer. This code involved many index reorderings, which made it hard to read and hard to see why a given order was necessary. For example, the significance of the selection order [1, 4, 5, 0, 2, 3] below is difficult to understand:

case _: Corr =>
  // Select count from Velox struct with count casted from LongType into DoubleType.
  expressionNodes.add(
    ExpressionBuilder
      .makeCast(
        ConverterUtils.getTypeNode(DoubleType, nullable = false),
        ExpressionBuilder.makeSelection(colIdx, 1),
        SQLConf.get.ansiEnabled))
  expressionNodes.add(ExpressionBuilder.makeSelection(colIdx, 4))
  expressionNodes.add(ExpressionBuilder.makeSelection(colIdx, 5))
  expressionNodes.add(ExpressionBuilder.makeSelection(colIdx, 0))
  expressionNodes.add(ExpressionBuilder.makeSelection(colIdx, 2))
  expressionNodes.add(ExpressionBuilder.makeSelection(colIdx, 3))

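This PR replaces these hand-written index sequences. The selection order is now derived by matching the names in Spark's aggBufferAttributes against the Velox intermediate data order.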
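The index sequence [1, 4, 5, 0, 2, 3] from the Corr example can be reproduced by looking up each Spark buffer column name in the Velox struct order. The following is a minimal sketch, not the code from this PR: the two name lists are assumptions chosen for illustration, and CorrOrderSketch is a hypothetical name.

```scala
object CorrOrderSketch {
  // Assumed column names of Corr's agg buffer, in Spark order (illustrative only).
  val sparkOrder: Seq[String] = Seq("n", "xAvg", "yAvg", "ck", "xMk", "yMk")

  // Assumed field names of the Velox intermediate struct, in Velox order (illustrative only).
  val veloxOrder: Seq[String] = Seq("ck", "n", "xMk", "yMk", "xAvg", "yAvg")

  // For each Spark buffer column, find the position of the same-named field in the Velox struct.
  val selectionIndices: Seq[Int] = sparkOrder.map(name => veloxOrder.indexOf(name))

  def main(args: Array[String]): Unit =
    println(selectionIndices.mkString("[", ", ", "]")) // prints [1, 4, 5, 0, 2, 3]
}
```

Written this way, the order is a consequence of two named lists rather than a set of literals spread across several makeSelection calls.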

How was this patch tested?

Existing CI.

github-actions · 2023-11-15T07:26:18Z

Thanks for opening a pull request!

Could you open an issue for this pull request on Github Issues?

https://github.com/oap-project/gluten/issues

Then could you also rename commit message and pull request title in the following format?

[GLUTEN-${ISSUES_ID}][COMPONENT]feat/fix: ${detailed message}

See also:

Other pull requests

github-actions · 2023-11-15T10:48:34Z

Run Gluten Clickhouse CI

github-actions · 2023-11-16T03:41:29Z

#3719

liujiayi771 · 2023-11-16T06:06:21Z

@rui-mo @PHILO-HE Could you help review this PR?

rui-mo · 2023-11-21T05:58:53Z

backends-velox/src/main/scala/io/glutenproject/execution/HashAggregateExecTransformer.scala

-          expressionNodes.add(ExpressionBuilder.makeSelection(colIdx, 2))
-          expressionNodes.add(ExpressionBuilder.makeSelection(colIdx, 3))
-          expressionNodes.add(ExpressionBuilder.makeSelection(colIdx, 0))
+        case _ @VeloxIntermediateData.Type(veloxTypes: Seq[DataType]) =>


I'm confused here. Why can we match aggregateFunction against VeloxIntermediateData.Type?

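The match uses the VeloxIntermediateData.Type.unapply method to extract veloxTypes from aggFunc. It is roughly equivalent to calling VeloxIntermediateData.Type.unapply(aggFunc) and binding veloxTypes to the result when the extraction succeeds; otherwise the next case is tried.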
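For readers unfamiliar with Scala extractor objects, here is a minimal, self-contained sketch of the mechanism. Everything in it is hypothetical (AggFunc, Avg, Count, IntermediateTypes, and the type names are made up for illustration); it only shows how a pattern of the form `case Extractor(x)` calls `unapply` and falls through when `unapply` returns None.

```scala
// Hypothetical stand-ins for aggregate functions.
sealed trait AggFunc
case object Avg extends AggFunc
case object Count extends AggFunc

// Hypothetical extractor object: any object with an `unapply` method can be used as a pattern.
object IntermediateTypes {
  // Some(types) for functions that need special handling, None for everything else.
  def unapply(f: AggFunc): Option[Seq[String]] = f match {
    case Avg => Some(Seq("double", "bigint"))
    case _   => None
  }
}

object ExtractorSketch {
  def describe(f: AggFunc): String = f match {
    // The compiler calls IntermediateTypes.unapply(f); on Some(v), `types` is bound to v.
    case IntermediateTypes(types) => s"special: ${types.mkString(", ")}"
    // Reached when unapply returned None.
    case _ => "default"
  }

  def main(args: Array[String]): Unit = {
    println(describe(Avg))   // prints special: double, bigint
    println(describe(Count)) // prints default
  }
}
```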

rui-mo · 2023-11-21T06:02:57Z

backends-velox/src/main/scala/io/glutenproject/execution/HashAggregateExecTransformer.scala

+          val (sparkOrders, sparkTypes) =
+            aggFunc.aggBufferAttributes.map(attr => (attr.name, attr.dataType)).unzip
+          val veloxOrders = VeloxIntermediateData.veloxIntermediateDataOrder(aggFunc)
+          val adjustedOrders = sparkOrders.map(veloxOrders.indexOf(_))


Is it enough to decide the order based on name equality? E.g., if attr.name carried an exprId suffix, would it fail to match the string in veloxOrders?

I think the column names in aggBufferAttributes are fixed and will not contain an exprId suffix. You can check the implementation of each aggregate function. For example:

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala#L82-L83

https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Sum.scala#L84-L86
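The concern raised in this thread can be made concrete: `indexOf` returns -1 when a name is not found, so a name with an unexpected suffix would silently produce an invalid index. The sketch below is an illustration only, not code from this PR; NameMatchSketch, positionsIn, and the two-field order are hypothetical, and the `require` guard is one possible way to fail fast rather than something the PR is known to do.

```scala
import scala.util.Try

object NameMatchSketch {
  // Assumed Velox intermediate field order for an average-like function (illustrative only).
  val veloxOrder: Seq[String] = Seq("sum", "count")

  // Map each buffer column name to its position in veloxOrder, failing fast on a mismatch.
  def positionsIn(names: Seq[String]): Seq[Int] = names.map { name =>
    val idx = veloxOrder.indexOf(name) // -1 when the name is not present
    require(idx >= 0, s"no Velox intermediate field named '$name'")
    idx
  }

  def main(args: Array[String]): Unit = {
    // Plain names match by equality.
    println(positionsIn(Seq("count", "sum"))) // prints List(1, 0)
    // A name carrying a suffix does not match and is rejected by the guard.
    println(Try(positionsIn(Seq("sum#12"))).isFailure) // prints true
  }
}
```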

liujiayi771 · 2023-11-21T06:43:50Z

backends-velox/src/main/scala/io/glutenproject/execution/HashAggregateExecTransformer.scala

@@ -133,56 +123,29 @@ case class HashAggregateExecTransformer(
        case _ =>
          throw new UnsupportedOperationException(s"${expr.mode} not supported.")
      }
+      val aggFunc = expr.aggregateFunction
      expr.aggregateFunction match {


Use aggFunc defined in the previous line.

rui-mo

Thanks for providing these details. Will merge after internal CI passes.

PHILO-HE

Looks good!

GlutenPerfBot · 2023-11-21T09:04:48Z

===== Performance report for TPCH SF2000 with Velox backend, for reference only ====

query	log/native_3721_time.csv	log/native_master_11_20_2023_60fc2a02e_time.csv	difference	percentage
q1	33.26	34.37	1.114	103.35%
q2	24.72	24.70	-0.017	99.93%
q3	37.66	35.40	-2.258	94.00%
q4	36.77	36.40	-0.378	98.97%
q5	70.70	68.86	-1.838	97.40%
q6	6.80	7.01	0.209	103.07%
q7	83.25	85.39	2.142	102.57%
q8	86.21	87.44	1.224	101.42%
q9	124.48	124.93	0.452	100.36%
q10	44.95	46.83	1.877	104.18%
q11	19.35	19.99	0.643	103.32%
q12	24.96	24.57	-0.390	98.44%
q13	46.07	44.85	-1.221	97.35%
q14	15.31	18.77	3.466	122.65%
q15	27.70	27.13	-0.566	97.96%
q16	15.51	15.41	-0.096	99.38%
q17	100.30	102.21	1.913	101.91%
q18	145.89	149.83	3.943	102.70%
q19	13.02	13.48	0.460	103.54%
q20	27.35	28.10	0.756	102.76%
q21	221.79	220.09	-1.694	99.24%
q22	13.23	12.89	-0.341	97.42%
total	1219.25	1228.65	9.399	100.77%

liujiayi771 mentioned this pull request Nov 15, 2023

[VL] Optimize the handling of specific agg func in HashAggregateExecTransformer #3719

Closed

6 tasks

liujiayi771 force-pushed the vl-agg branch from 55ececc to d2bf5fa Compare November 16, 2023 02:42

Introduce VeloxIntermediateData to adjust type and order

b96919d

liujiayi771 force-pushed the vl-agg branch from d2bf5fa to b96919d Compare November 16, 2023 02:49

fix style

ac85135

liujiayi771 changed the title ~~[WIP][GLUTEN-3719][VL] Remove case match in getIntermediateTypeNode~~ [GLUTEN-3719][VL] Introduce VeloxIntermediateData to adjust velox agg func intermediate data Nov 16, 2023

liujiayi771 marked this pull request as ready for review November 16, 2023 06:10

Remove match case in applyExtractStruct

5c17363

Use spark order to extract struct

7fee141

rui-mo reviewed Nov 21, 2023

liujiayi771 commented Nov 21, 2023

rui-mo reviewed Nov 21, 2023

rui-mo approved these changes Nov 21, 2023

PHILO-HE approved these changes Nov 21, 2023

rui-mo merged commit f29077e into apache:main Nov 21, 2023
17 checks passed

liujiayi771 deleted the vl-agg branch November 21, 2023 09:27

liujiayi771 mentioned this pull request Nov 22, 2023

[GLUTEN-3719][VL] Optimize agg func match in getAggRelWithRowConstruct #3819

Merged
