[VL] Doc refresh (#3882)

* update configurations

Signed-off-by: Yuan Zhou <[email protected]>

* update operators/functions

Signed-off-by: Yuan Zhou <[email protected]>

* fix maxBatchSize doc

Signed-off-by: Yuan Zhou <[email protected]>

* fix operator support status

Signed-off-by: Yuan Zhou <[email protected]>

---------

Signed-off-by: Yuan Zhou <[email protected]>
apache · Nov 30, 2023 · 43fdaea
1 parent ed19a36
commit 43fdaea
Showing 2 changed files with 96 additions and 38 deletions.
diff --git a/docs/Configuration.md b/docs/Configuration.md
@@ -20,20 +20,23 @@ You can add these configurations into spark-defaults.conf to enable or disable t
 | spark.plugins                                           | To load Gluten's components by Spark's plug-in loader                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | com.intel.oap.GlutenPlugin                           |
 | spark.shuffle.manager                                   | To turn on Gluten Columnar Shuffle Plugin                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 | org.apache.spark.shuffle.sort.ColumnarShuffleManager |
 | spark.gluten.enabled                                    | Enable Gluten, default is true. Just an experimental property. Recommend to enable/disable Gluten through the setting for `spark.plugins`.                                                                                                                                                                                                                                                                                                                                                                                                | true                                                 |
+| spark.gluten.sql.columnar.maxBatchSize                  | Number of rows to be processed in each batch. Default value is 4096.                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | 4096                                                 |
 | spark.gluten.memory.isolation                           | (Experimental) Enable isolated memory mode. If true, Gluten controls the maximum off-heap memory can be used by each task to X, X = executor memory / max task slots. It's recommended to set true if Gluten serves concurrent queries within a single session, since not all memory Gluten allocated is guaranteed to be spillable. In the case, the feature should be enabled to avoid OOM. Note when true, setting spark.memory.storageFraction to a lower value is suggested since storage memory is considered non-usable by Gluten. | false                                                |
 | spark.gluten.sql.columnar.scanOnly                      | When enabled, this config will overwrite all other operators' enabling, and only Scan and Filter pushdown will be offloaded to native.                                                                                                                                                                                                                                                                                                                                                                                                    | false                                                |
 | spark.gluten.sql.columnar.batchscan                     | Enable or Disable Columnar BatchScan, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     | true                                                 |
 | spark.gluten.sql.columnar.hashagg                       | Enable or Disable Columnar Hash Aggregate, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | true                                                 |
 | spark.gluten.sql.columnar.project                       | Enable or Disable Columnar Project, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | true                                                 |
 | spark.gluten.sql.columnar.filter                        | Enable or Disable Columnar Filter, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | true                                                 |
-| spark.gluten.sql.columnar.codegen.sort                  | Enable or Disable Columnar Sort, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | true                                                 |
+| spark.gluten.sql.columnar.sort                          | Enable or Disable Columnar Sort, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | true                                                 |
 | spark.gluten.sql.columnar.window                        | Enable or Disable Columnar Window, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | true                                                 |
 | spark.gluten.sql.columnar.shuffledHashJoin              | Enable or Disable ShuffledHashJoin, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       | true                                                 |
 | spark.gluten.sql.columnar.forceShuffledHashJoin         | Force to use ShuffledHashJoin over SortMergeJoin, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | true                                                 |
-| spark.gluten.sql.columnar.sort                          | Enable or Disable Columnar Sort, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          | true                                                 |
 | spark.gluten.sql.columnar.sortMergeJoin                 | Enable or Disable Columnar Sort Merge Join, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                               | true                                                 |
 | spark.gluten.sql.columnar.union                         | Enable or Disable Columnar Union, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | true                                                 |
 | spark.gluten.sql.columnar.expand                        | Enable or Disable Columnar Expand, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        | true                                                 |
+| spark.gluten.sql.columnar.generate                      | Enable or Disable Columnar Generate, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      | true                                                 |
+| spark.gluten.sql.columnar.limit                         | Enable or Disable Columnar Limit, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | true                                                 |
+| spark.gluten.sql.columnar.tableCache                    | Enable or Disable Columnar Table Cache, default is false                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  | true                                                 |
 | spark.gluten.sql.columnar.broadcastExchange             | Enable or Disable Columnar Broadcast Exchange, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                            | true                                                 |
 | spark.gluten.sql.columnar.broadcastJoin                 | Enable or Disable Columnar BroadcastHashJoin, default is true                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | true                                                 |
 | spark.gluten.sql.columnar.shuffle.codec                 | Set up the codec to be used for Columnar Shuffle. If this configuration is not set, will check the value of spark.io.compression.codec. By default, Gluten use software compression. Valid options for software compression are lz4, zstd. Valid options for QAT and IAA is gzip.                                                                                                                                                                                                                                                         | lz4                                                  |
@@ -55,12 +58,11 @@ You can add these configurations into spark-defaults.conf to enable or disable t
 | spark.gluten.sql.columnar.backend.velox.bloomFilter.numBits          | The default number of bits to use for the velox bloom filter. | 8388608L                                                |
 | spark.gluten.sql.columnar.backend.velox.bloomFilter.maxNumBits       | The max number of bits to use for the velox bloom filter. | 4194304L                                                |
 
-Below is an example for spark-default.conf, if you are using conda to install OAP project.
+Below is an example for spark-default.conf:
 
 ```
 ##### Columnar Process Configuration
 
-spark.sql.sources.useV1SourceList    avro
 spark.plugins    io.glutenproject.GlutenPlugin
 spark.shuffle.manager    org.apache.spark.shuffle.sort.ColumnarShuffleManager
 spark.driver.extraClassPath    ${GLUTEN_HOME}/package/target/gluten-<>-jar-with-dependencies.jar