spannerspark: initial code skeleton for read #31
/cc @halio-g
7ed70df to 7991c4f
examples/SpannerSpark.java
Dataset<Row> df = spark.read()
    .format("cloud-spanner")
    .option("table", "people")
    .option("projectId", "orijtech-161805")
Please leave the personal or test information empty. You can keep it in your local branch though.
Yeah I am eventually going to read it from System.getenv
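For reference, reading the project ID from the environment could look roughly like the sketch below. The variable name `SPANNER_PROJECT_ID` and the class name `SpannerEnvConfig` are hypothetical and not taken from this pull request; the lookup takes a map so it can be exercised without touching the real process environment.

```java
import java.util.Map;
import java.util.Optional;

public class SpannerEnvConfig {
  // Hypothetical variable name; the pull request does not specify one.
  public static final String PROJECT_ID_ENV = "SPANNER_PROJECT_ID";

  // Returns the project ID from the given environment map, treating
  // missing or blank values as unset.
  public static Optional<String> projectId(Map<String, String> env) {
    String value = env.get(PROJECT_ID_ENV);
    if (value == null || value.trim().isEmpty()) {
      return Optional.empty();
    }
    return Optional.of(value.trim());
  }

  public static void main(String[] args) {
    // System.getenv() returns an unmodifiable view of the process environment.
    String id = projectId(System.getenv()).orElse("<unset>");
    System.out.println("projectId=" + id);
  }
}
```

The example would then pass the resolved value to `.option("projectId", ...)` instead of a hard-coded literal.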
import java.util.concurrent.TimeUnit;

public class SpannerSpark implements DataSourceRegister, TableProvider {
  public Dataset<Row> execute(SparkSession spark, String sqlStmt) {
Let's hide the code that is unused by the inherited method.
// "projectId": <PROJECT_ID>
// "databaseId": <DATABASE_ID>
String spannerUri = String.format(
    "cloudspanner:/projects/%s/instances/%s/databases/%s",
We probably want to extract the connectionOptions usage and the Spanner operations into separate classes for easier testing.
Indeed!
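One possible first step in that direction, as a sketch only: pull the URI construction into its own small class so it can be unit tested without a Spark session or a Spanner connection. The class name `SpannerUriBuilder` and the `instanceId` option key are assumptions; the snippet under review only shows `projectId` and `databaseId` in its comments.

```java
import java.util.HashMap;
import java.util.Map;

public final class SpannerUriBuilder {
  private SpannerUriBuilder() {}

  // Builds the connection URI from reader options. The key "instanceId" is an
  // assumption; the reviewed snippet only names projectId and databaseId.
  public static String fromOptions(Map<String, String> options) {
    return String.format(
        "cloudspanner:/projects/%s/instances/%s/databases/%s",
        require(options, "projectId"),
        require(options, "instanceId"),
        require(options, "databaseId"));
  }

  // Fails fast with a clear message when an option is absent or empty.
  private static String require(Map<String, String> options, String key) {
    String value = options.get(key);
    if (value == null || value.isEmpty()) {
      throw new IllegalArgumentException("missing option: " + key);
    }
    return value;
  }

  public static void main(String[] args) {
    // Placeholder values for illustration only.
    Map<String, String> options = new HashMap<>();
    options.put("projectId", "my-project");
    options.put("instanceId", "my-instance");
    options.put("databaseId", "my-database");
    System.out.println(fromOptions(options));
  }
}
```

Because the builder only depends on a plain map, a test can cover both the happy path and the missing-option case directly.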
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerSpark.java
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SpannerSpark implements DataSourceRegister, TableProvider {
Please call it SpannerTableProvider or so.
4e3a0d4 to d308ed6
First phase of the code skeleton that should allow .printSchema() and .show() to work correctly as well as running queries.
3e0f0ac to 531b211
d7a9368 to 7325137
7325137 to 74427e1
88dd79c to 7b7122c
47456df to 65729a9
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/DefaultSource.java
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerDataReader_java
spark-3.1-spanner-lib/src/main/java/com/google/cloud/spark/spanner/SpannerDataSourceReader_java
spark-3.1-spanner-lib/src/test/java/com/google/cloud/spark/spanner/SpannerSparkTest.java
    this.properties = properties;
  }

  public Dataset<Row> execute(SparkSession spark, String sqlStmt) {
This function is not used anywhere. Let's remove it from this pull request to keep the PR clean.
Fixes #46
This change implements SpannerPartitionReader, which takes an InputPartition generated by running a partitionQuery. The returned Partition is serialized and sent over the network to the executor, which, when creating the SpannerPartitionReader, acquires a transaction and then retrieves values from the produced ResultSet. Later on, we will need to investigate whether the transaction might be held for longer than 10 seconds and sit idle for that long.
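To illustrate only the serialization hop described above, here is a minimal, library-free sketch of a partition object being turned into bytes on the driver side and restored on the executor side. `TokenPartition` is a hypothetical stand-in and not the class in this change; the real code would use Spark's InputPartition and the Spanner client's types, and the transaction and ResultSet handling is not modelled here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class PartitionRoundTrip {
  // Hypothetical stand-in for the partition handed to each executor.
  public static final class TokenPartition implements Serializable {
    private static final long serialVersionUID = 1L;
    public final String spannerUri;
    public final String token;

    public TokenPartition(String spannerUri, String token) {
      this.spannerUri = spannerUri;
      this.token = token;
    }
  }

  // Driver side: turn the partition into bytes that can be shipped.
  public static byte[] serialize(TokenPartition partition) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
      out.writeObject(partition);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  // Executor side: rebuild the partition before a reader is created from it.
  public static TokenPartition deserialize(byte[] bytes) {
    try (ObjectInputStream in =
        new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      return (TokenPartition) in.readObject();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    // Placeholder values for illustration only.
    TokenPartition sent = new TokenPartition(
        "cloudspanner:/projects/p/instances/i/databases/d", "token-1");
    TokenPartition received = deserialize(serialize(sent));
    System.out.println(received.spannerUri + " " + received.token);
  }
}
```

Everything the reader needs on the executor has to travel inside the serialized partition, which is why connection details such as the URI are carried as plain fields in this sketch.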
I am going to merge the base working version, then work on the respective issues individually and split up the work.