SpannerScanner: add option to disableDataboost

Allows Databoost to be disabled through the "disableDataboost" option. It stays on by default, since taking advantage of Databoost is the point of this connector. There is a compatibility argument for defaulting to off, so that users who haven't enabled Databoost can still use the connector out of the box, but that is left for a later discussion. Fixes #68
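
The default-on behavior described above can be sketched as follows. This is a minimal illustration, assuming the connector options arrive as a Map<String, String>; the class DataboostOption and the helper isDataboostEnabled are hypothetical names used only for this example and are not part of the connector.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class for illustration only; not part of the connector.
final class DataboostOption {
  // Returns true unless the "disableDataboost" option is set to "true".
  // Assumes option values are strings. Boolean.parseBoolean(null) is false,
  // so a missing key leaves Databoost enabled.
  static boolean isDataboostEnabled(Map<String, String> opts) {
    return !Boolean.parseBoolean(opts.get("disableDataboost"));
  }

  public static void main(String[] args) {
    Map<String, String> opts = new HashMap<>();
    System.out.println(isDataboostEnabled(opts)); // option absent: prints true

    opts.put("disableDataboost", "true");
    System.out.println(isDataboostEnabled(opts)); // explicit opt-out: prints false
  }
}
```

Parsing the value this way treats anything other than "true" (case-insensitive) as "not disabled", which keeps the default on when the option is missing or malformed.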
GoogleCloudDataproc · Sep 19, 2023 · b9473c4
1 parent 983ea73
Showing 1 changed file with 8 additions and 1 deletion.
diff --git a/spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerScanner.java b/spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerScanner.java
@@ -89,14 +89,21 @@ public InputPartition[] planInputPartitions() {
     if (filters.length > 0) {
       sqlStmt += " WHERE " + SparkFilterUtils.getCompiledFilter(true, filters);
     }
+
+    // Data Boost is enabled by default, since taking advantage of it is the
+    // main purpose of this integration. Set "disableDataboost" to "true" to opt out.
+    // Please see https://github.com/GoogleCloudDataproc/spark-spanner-connector/issues/68
+    String disableDataboost = this.opts.get("disableDataboost"); // assumes opts is a Map<String, String>
+    boolean enableDataboost = !Boolean.parseBoolean(disableDataboost); // parseBoolean(null) is false
+
     try (BatchReadOnlyTransaction txn =
         batchClient.batchClient.batchReadOnlyTransaction(TimestampBound.strong())) {
       String mapAsJSON = SpannerUtils.serializeMap(this.opts);
       List<com.google.cloud.spanner.Partition> rawPartitions =
           txn.partitionQuery(
               PartitionOptions.getDefaultInstance(),
               Statement.of(sqlStmt),
-              Options.dataBoostEnabled(true));
+              Options.dataBoostEnabled(enableDataboost));
 
       List<Partition> parts =
           Streams.mapWithIndex(