spannerspark: initial code skeleton for read #31

odeke-em · 2023-07-20T13:50:37Z

First phase of the code skeleton that should allow .printSchema() and .show() to work correctly as well as running queries.

odeke-em · 2023-07-20T13:51:12Z

/cc @halio-g

halio-g · 2023-07-21T23:23:08Z

examples/SpannerSpark.java

+        Dataset<Row> df = spark.read()
+            .format("cloud-spanner")
+            .option("table", "people")
+            .option("projectId", "orijtech-161805")


Please leave personal or test information empty here. You can keep it in your local branch, though.

Yeah, I am eventually going to read it from System.getenv.
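A minimal sketch of what reading the value from the environment could look like, so that no project ID is committed in the example. The class name `ExampleConfig`, the method `lookup`, and the variable name `SPANNER_PROJECT_ID` are hypothetical and do not come from this PR.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: resolves example settings from the environment instead of literals. */
final class ExampleConfig {
  private ExampleConfig() {}

  /** Returns the trimmed value for key, or Optional.empty() if it is unset or blank. */
  static Optional<String> lookup(Map<String, String> env, String key) {
    String value = env.get(key);
    if (value == null || value.trim().isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(value.trim());
  }

  public static void main(String[] args) {
    // SPANNER_PROJECT_ID is an assumed variable name; fall back to an empty
    // string so that nothing personal is hard-coded in the example.
    String projectId = lookup(System.getenv(), "SPANNER_PROJECT_ID").orElse("");
    System.out.println("projectId=" + projectId);
  }
}
```

The resolved value would then be passed to `.option("projectId", projectId)` in place of the literal.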

halio-g · 2023-07-21T23:24:46Z

spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerSpark.java

+import java.util.concurrent.TimeUnit;
+
+public class SpannerSpark implements DataSourceRegister, TableProvider {
+    public Dataset<Row> execute(SparkSession spark, String sqlStmt) {


Let's hide the code that is unused by the inherited method.

halio-g · 2023-07-21T23:26:55Z

spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerSpark.java

+        // "projectId": <PROJECT_ID>
+        // "databaseId": <DATABASE_ID>
+        String spannerUri = String.format(
+                "cloudspanner:/projects/%s/instances/%s/databases/%s", 


We probably want to extract the usage of connectionOptions and the Spanner operations into separate classes for easier testing.
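One way this extraction could look, as a sketch only: the URI construction moves into a small class with no Spanner dependency, so it can be unit-tested without a client or emulator. The class name `SpannerUriSketch` and the `instanceId` option key are assumptions; only `projectId`, `databaseId`, and the URI format appear in the diff above.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical extraction of the connection URI logic into a class that is testable on its own. */
final class SpannerUriSketch {
  private SpannerUriSketch() {}

  /** Builds the connection URI from data source options, failing fast on a missing key. */
  static String fromOptions(Map<String, String> options) {
    return String.format(
        "cloudspanner:/projects/%s/instances/%s/databases/%s",
        required(options, "projectId"),
        required(options, "instanceId"), // assumed key name, not shown in the diff
        required(options, "databaseId"));
  }

  private static String required(Map<String, String> options, String key) {
    String value = options.get(key);
    if (value == null || value.isEmpty()) {
      throw new IllegalArgumentException("Option \"" + key + "\" is required");
    }
    return value;
  }

  public static void main(String[] args) {
    Map<String, String> options = new HashMap<>();
    options.put("projectId", "p");
    options.put("instanceId", "i");
    options.put("databaseId", "d");
    // prints cloudspanner:/projects/p/instances/i/databases/d
    System.out.println(fromOptions(options));
  }
}
```

The class that actually opens the connection would take the resulting string as input, which keeps option validation and Spanner calls in separate units.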


halio-g · 2023-07-21T23:30:38Z

spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerSpark.java

+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+
+public class SpannerSpark implements DataSourceRegister, TableProvider {


Please call it SpannerTableProvider or so.

First phase of the code skeleton that should allow .printSchema() and .show() to work correctly as well as running queries.

Updates #43

.github/workflows/test.yml

spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/DefaultSource.java

spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerDataReader_java

spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerDataSourceReader_java

spark-3.1-spanner-lib/src/test/java/com/google/cloud/spark/spanner/SpannerSparkTest.java

halio-g · 2023-08-01T23:29:17Z

spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerSpark.java

+    this.properties = properties;
+  }
+
+  public Dataset<Row> execute(SparkSession spark, String sqlStmt) {


This function is not used anywhere. Let's remove it from this pull request to keep the PR clean.

Implements the DefaultSource class along with SpannerBaseRelation, which is an implementation of BaseRelation.

Fixes #47

Fixes #48

Fixes #46

…Spark

This change implements SpannerPartitionReader, which takes in an InputPartition generated by running a partitionQuery. The returned Partition is serialized over the network to the executor, which, when creating the SpannerPartitionReader, acquires a transaction and then retrieves values from the produced ResultSet. Later on, we shall need to investigate whether the transaction might be held, and sit idle, for longer than 10 seconds.
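To help with the idle-time investigation mentioned above, here is a rough sketch of a reader wrapper that records the longest gap between calls. It is not the SpannerPartitionReader from this change: `RowReaderSketch` is a local stand-in that only mirrors the next()/get()/close() shape of a partition reader, the rows come from an in-memory iterator rather than a ResultSet, and every name here is hypothetical.

```java
import java.io.Closeable;
import java.time.Duration;
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.LongSupplier;

/** Local stand-in with a next()/get()/close() shape; not Spark's real interface. */
interface RowReaderSketch<T> extends Closeable {
  boolean next();

  T get();

  @Override
  void close();
}

/** Hypothetical wrapper that records the longest gap between successive reader calls. */
final class IdleTrackingReader<T> implements RowReaderSketch<T> {
  private final Iterator<T> rows;
  private final LongSupplier clockNanos;
  private long lastActivityNanos;
  private long maxIdleNanos;
  private T current;
  private boolean closed;

  IdleTrackingReader(Iterator<T> rows, LongSupplier clockNanos) {
    this.rows = rows;
    this.clockNanos = clockNanos;
    this.lastActivityNanos = clockNanos.getAsLong();
  }

  @Override
  public boolean next() {
    if (closed) {
      throw new IllegalStateException("reader is closed");
    }
    touch();
    if (!rows.hasNext()) {
      current = null;
      return false;
    }
    current = rows.next();
    return true;
  }

  @Override
  public T get() {
    return current;
  }

  @Override
  public void close() {
    touch();
    closed = true;
  }

  long maxIdleNanos() {
    return maxIdleNanos;
  }

  /** True if any gap between calls was longer than the given limit. */
  boolean exceeded(Duration limit) {
    return maxIdleNanos > limit.toNanos();
  }

  // Updates the longest observed gap, then resets the activity timestamp.
  private void touch() {
    long now = clockNanos.getAsLong();
    maxIdleNanos = Math.max(maxIdleNanos, now - lastActivityNanos);
    lastActivityNanos = now;
  }

  public static void main(String[] args) {
    IdleTrackingReader<String> reader =
        new IdleTrackingReader<>(Arrays.asList("row-1", "row-2").iterator(), System::nanoTime);
    while (reader.next()) {
      System.out.println(reader.get());
    }
    reader.close();
    System.out.println("idle > 10s: " + reader.exceeded(Duration.ofSeconds(10)));
  }
}
```

Injecting the clock as a `LongSupplier` keeps the measurement testable with a fake clock; in a real reader the same bookkeeping would sit around the calls that touch the transaction.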

odeke-em · 2023-08-11T16:24:36Z

I am going to merge the base working version in, then work on the respective issues individually and split up the work.

odeke-em force-pushed the initial-code-wireup branch from 7ed70df to 7991c4f Compare July 20, 2023 14:00

halio-g reviewed Jul 21, 2023


halio-g reviewed Jul 21, 2023


odeke-em force-pushed the initial-code-wireup branch 2 times, most recently from 4e3a0d4 to d308ed6 Compare July 24, 2023 16:48

spannerspark: initial code skeleton for read

52c5e8f

First phase of the code skeleton that should allow .printSchema() and .show() to work correctly as well as running queries.

odeke-em force-pushed the initial-code-wireup branch 2 times, most recently from 3e0f0ac to 531b211 Compare July 28, 2023 02:03

Add dummy Scan + ScanBuilder classes

b3aaa94

Updates #43

odeke-em force-pushed the initial-code-wireup branch 4 times, most recently from d7a9368 to 7325137 Compare July 28, 2023 04:38

extract and plug-in SpannerTable+ScanBuilder

74427e1

odeke-em force-pushed the initial-code-wireup branch from 7325137 to 74427e1 Compare July 28, 2023 04:41

Implement more interfaces

7b7122c

odeke-em force-pushed the initial-code-wireup branch from 88dd79c to 7b7122c Compare July 28, 2023 05:05

odeke-em added 2 commits July 28, 2023 13:54

Start tests

dd56441

README: update with instructions to run emulator locally

05a94db

odeke-em force-pushed the initial-code-wireup branch 7 times, most recently from 47456df to 65729a9 Compare July 29, 2023 00:09