[VL] Doc refresh (#3882)
* update configurations

Signed-off-by: Yuan Zhou <[email protected]>

* update operators/functions

Signed-off-by: Yuan Zhou <[email protected]>

* fix maxBatchSize doc

Signed-off-by: Yuan Zhou <[email protected]>

* fix operator support status

Signed-off-by: Yuan Zhou <[email protected]>

---------

Signed-off-by: Yuan Zhou <[email protected]>
zhouyuan authored Nov 30, 2023
1 parent ed19a36 commit 43fdaea
Showing 2 changed files with 96 additions and 38 deletions.
10 changes: 6 additions & 4 deletions docs/Configuration.md
@@ -20,20 +20,23 @@ You can add these configurations into spark-defaults.conf to enable or disable t
| spark.plugins | To load Gluten's components by Spark's plug-in loader | com.intel.oap.GlutenPlugin |
| spark.shuffle.manager | To turn on Gluten Columnar Shuffle Plugin | org.apache.spark.shuffle.sort.ColumnarShuffleManager |
| spark.gluten.enabled | Enable Gluten; default is true. This is an experimental property. It is recommended to enable or disable Gluten through the `spark.plugins` setting instead. | true |
| spark.gluten.sql.columnar.maxBatchSize | Number of rows to be processed in each batch. Default value is 4096. | 4096 |
| spark.gluten.memory.isolation | (Experimental) Enable isolated memory mode. If true, Gluten limits the maximum off-heap memory that each task can use to X, where X = executor memory / max task slots. Setting this to true is recommended if Gluten serves concurrent queries within a single session, since not all memory allocated by Gluten is guaranteed to be spillable; in that case, enabling the feature helps avoid OOM. Note that when true, setting spark.memory.storageFraction to a lower value is suggested, since Gluten considers storage memory non-usable. | false |
| spark.gluten.sql.columnar.scanOnly | When enabled, this config overrides the enable settings of all other operators, and only Scan and Filter pushdown will be offloaded to native. | false |
| spark.gluten.sql.columnar.batchscan | Enable or Disable Columnar BatchScan, default is true | true |
| spark.gluten.sql.columnar.hashagg | Enable or Disable Columnar Hash Aggregate, default is true | true |
| spark.gluten.sql.columnar.project | Enable or Disable Columnar Project, default is true | true |
| spark.gluten.sql.columnar.filter | Enable or Disable Columnar Filter, default is true | true |
| spark.gluten.sql.columnar.codegen.sort | Enable or Disable Columnar Sort, default is true | true |
| spark.gluten.sql.columnar.sort | Enable or Disable Columnar Sort, default is true | true |
| spark.gluten.sql.columnar.window | Enable or Disable Columnar Window, default is true | true |
| spark.gluten.sql.columnar.shuffledHashJoin | Enable or Disable ShuffledHashJoin, default is true | true |
| spark.gluten.sql.columnar.forceShuffledHashJoin | Force to use ShuffledHashJoin over SortMergeJoin, default is true | true |
| spark.gluten.sql.columnar.sort | Enable or Disable Columnar Sort, default is true | true |
| spark.gluten.sql.columnar.sortMergeJoin | Enable or Disable Columnar Sort Merge Join, default is true | true |
| spark.gluten.sql.columnar.union | Enable or Disable Columnar Union, default is true | true |
| spark.gluten.sql.columnar.expand | Enable or Disable Columnar Expand, default is true | true |
| spark.gluten.sql.columnar.generate | Enable or Disable Columnar Generate, default is true | true |
| spark.gluten.sql.columnar.limit | Enable or Disable Columnar Limit, default is true | true |
| spark.gluten.sql.columnar.tableCache | Enable or Disable Columnar Table Cache, default is false | true |
| spark.gluten.sql.columnar.broadcastExchange | Enable or Disable Columnar Broadcast Exchange, default is true | true |
| spark.gluten.sql.columnar.broadcastJoin | Enable or Disable Columnar BroadcastHashJoin, default is true | true |
| spark.gluten.sql.columnar.shuffle.codec | Set the codec to be used for Columnar Shuffle. If this configuration is not set, Gluten checks the value of spark.io.compression.codec. By default, Gluten uses software compression. Valid options for software compression are lz4 and zstd. The valid option for QAT and IAA is gzip. | lz4 |
@@ -55,12 +58,11 @@ You can add these configurations into spark-defaults.conf to enable or disable t
| spark.gluten.sql.columnar.backend.velox.bloomFilter.numBits | The default number of bits to use for the velox bloom filter. | 8388608L |
| spark.gluten.sql.columnar.backend.velox.bloomFilter.maxNumBits | The max number of bits to use for the velox bloom filter. | 4194304L |
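
The per-task cap described for `spark.gluten.memory.isolation` (X = executor memory / max task slots) can be illustrated with a small calculation. This is only a sketch under stated assumptions, not Gluten's implementation: the function names are hypothetical, "max task slots" is assumed to equal `spark.executor.cores` divided by `spark.task.cpus` (the usual Spark rule for concurrent tasks per executor), and the memory figure passed in is whatever budget the table row means by "executor memory".

```python
def max_task_slots(executor_cores: int, task_cpus: int = 1) -> int:
    """Concurrent tasks per executor, assumed to be executor cores // CPUs per task."""
    if executor_cores < 1 or task_cpus < 1:
        raise ValueError("executor_cores and task_cpus must be positive")
    # At least one slot, even if task_cpus exceeds executor_cores.
    return max(1, executor_cores // task_cpus)


def per_task_memory_limit(executor_memory_bytes: int,
                          executor_cores: int,
                          task_cpus: int = 1) -> int:
    """X = executor memory / max task slots, as stated in the table row (hypothetical helper)."""
    if executor_memory_bytes < 0:
        raise ValueError("executor_memory_bytes must be non-negative")
    return executor_memory_bytes // max_task_slots(executor_cores, task_cpus)


# Example inputs (illustrative values only): an 8 GiB budget shared by 4 task slots.
GIB = 1024 ** 3
limit = per_task_memory_limit(8 * GIB, executor_cores=4)
```

With more task slots per executor, each task's cap shrinks proportionally, which is why the row suggests the mode mainly for concurrent queries in a single session.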

Below is an example for spark-defaults.conf, if you are using conda to install the OAP project.
Below is an example for spark-defaults.conf:

```
##### Columnar Process Configuration
spark.sql.sources.useV1SourceList avro
spark.plugins io.glutenproject.GlutenPlugin
spark.shuffle.manager org.apache.spark.shuffle.sort.ColumnarShuffleManager
spark.driver.extraClassPath ${GLUTEN_HOME}/package/target/gluten-<>-jar-with-dependencies.jar