spannerspark: initial code skeleton for read #31

Merged: 31 commits, Aug 11, 2023

Conversation

odeke-em
Collaborator

First phase of the code skeleton that should allow .printSchema() and .show() to work correctly as well as running queries.

@odeke-em
Collaborator Author

/cc @halio-g

Dataset<Row> df = spark.read()
.format("cloud-spanner")
.option("table", "people")
.option("projectId", "orijtech-161805")
Collaborator

Please leave personal or test values (such as the project ID) empty in the committed code. You can keep them in your local branch, though.

Collaborator Author

Yeah, I am eventually going to read it from System.getenv.
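
As an illustration of the System.getenv approach mentioned above, here is a minimal sketch. The class name SpannerEnvConfig and the variable name SPANNER_PROJECT_ID are hypothetical and not part of this PR. The lookup takes the environment as a Map so it can be tested without touching the real process environment.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical helper (not part of this PR): resolves connector settings
// from an environment map instead of hardcoding them in source.
final class SpannerEnvConfig {
    // Assumed variable name, chosen only for this example.
    static final String PROJECT_ID_VAR = "SPANNER_PROJECT_ID";

    private SpannerEnvConfig() {}

    // Returns the trimmed value, or empty if the variable is unset or blank.
    static Optional<String> lookup(Map<String, String> env, String name) {
        String value = env.get(name);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value.trim());
    }

    // Like lookup, but fails with a descriptive message when the value is missing.
    static String require(Map<String, String> env, String name) {
        return lookup(env, name).orElseThrow(
            () -> new IllegalStateException(
                "Environment variable " + name + " is not set"));
    }
}
```

In the read example, the hardcoded value could then become something like `.option("projectId", SpannerEnvConfig.require(System.getenv(), SpannerEnvConfig.PROJECT_ID_VAR))`.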

import java.util.concurrent.TimeUnit;

public class SpannerSpark implements DataSourceRegister, TableProvider {
public Dataset<Row> execute(SparkSession spark, String sqlStmt) {
Collaborator

Let's hide the code that is unused by the inherited method.

// "projectId": <PROJECT_ID>
// "databaseId": <DATABASE_ID>
String spannerUri = String.format(
"cloudspanner:/projects/%s/instances/%s/databases/%s",
Collaborator

We probably want to extract the use of connectionOptions and the Spanner operations into separate classes for easier testing.

Collaborator Author

Indeed!
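
One possible shape for that extraction, as a sketch only: the URI formatting from the snippet under review moves into a small class with no Spanner or Spark dependencies, so it can be unit tested in isolation. The class name SpannerUri and the instanceId parameter name are assumptions made for this example. Only the format string comes from the reviewed code.

```java
import java.util.Objects;

// Hypothetical helper (not part of this PR): builds the connection URI
// separately from any code that opens connections or runs queries.
final class SpannerUri {
    private SpannerUri() {}

    // Formats the URI using the same pattern as the snippet under review.
    static String of(String projectId, String instanceId, String databaseId) {
        return String.format(
            "cloudspanner:/projects/%s/instances/%s/databases/%s",
            requireNonBlank(projectId, "projectId"),
            requireNonBlank(instanceId, "instanceId"),
            requireNonBlank(databaseId, "databaseId"));
    }

    // Rejects null or blank identifiers early, with the option name in the message.
    private static String requireNonBlank(String value, String name) {
        Objects.requireNonNull(value, name + " must not be null");
        if (value.trim().isEmpty()) {
            throw new IllegalArgumentException(name + " must not be empty");
        }
        return value.trim();
    }
}
```

With this split, a test can check the URI string directly, and the class that talks to Spanner only needs to accept the finished URI.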

import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SpannerSpark implements DataSourceRegister, TableProvider {
Collaborator

Please call it SpannerTableProvider or so.

@odeke-em odeke-em force-pushed the initial-code-wireup branch 2 times, most recently from 4e3a0d4 to d308ed6 Compare July 24, 2023 16:48
First phase of the code skeleton that should allow .printSchema()
and .show() to work correctly as well as running queries.
@odeke-em odeke-em force-pushed the initial-code-wireup branch 2 times, most recently from 3e0f0ac to 531b211 Compare July 28, 2023 02:03
@odeke-em odeke-em force-pushed the initial-code-wireup branch 4 times, most recently from d7a9368 to 7325137 Compare July 28, 2023 04:38
@odeke-em odeke-em force-pushed the initial-code-wireup branch 7 times, most recently from 47456df to 65729a9 Compare July 29, 2023 00:09
this.properties = properties;
}

public Dataset<Row> execute(SparkSession spark, String sqlStmt) {
Collaborator

This function is not used anywhere. Let's remove it from this pull request to keep the PR clean.

Implements the DefaultSource class along with
SpannerBaseRelation, which is an implementation of BaseRelation.

Fixes #47
Fixes #48
This change implements SpannerPartitionReader, which takes
in an InputPartition generated by running a partitionQuery.
The returned Partition is serialized over the network to the
executor, which, when creating SpannerPartitionReader,
acquires a transaction and then retrieves values from the
produced ResultSet.
Later on, we shall need to investigate whether the transaction
might be held for longer than 10 seconds and might sit idle
for that long.
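
The reader flow described in this commit message can be sketched in plain Java. This is not the PR's SpannerPartitionReader: RowReader is a local stand-in that mirrors the next()/get()/close() shape of Spark's PartitionReader, the Iterator stands in for the Spanner ResultSet, and the onClose callback stands in for releasing the transaction. All of these names are assumptions made for this example.

```java
import java.io.Closeable;
import java.util.Iterator;

// Local stand-in mirroring the next()/get()/close() shape of a partition reader.
interface RowReader<T> extends Closeable {
    boolean next();   // advance to the next row; false when exhausted
    T get();          // return the row that next() advanced to
    @Override
    void close();     // release underlying resources
}

// Hypothetical reader over an iterator that stands in for a result set.
final class IteratorRowReader<T> implements RowReader<T> {
    private final Iterator<T> rows;
    private final Runnable onClose; // stand-in for ending the transaction
    private T current;
    private boolean hasCurrent;
    private boolean closed;

    IteratorRowReader(Iterator<T> rows, Runnable onClose) {
        this.rows = rows;
        this.onClose = onClose;
    }

    @Override
    public boolean next() {
        if (closed || !rows.hasNext()) {
            hasCurrent = false;
            return false;
        }
        current = rows.next();
        hasCurrent = true;
        return true;
    }

    @Override
    public T get() {
        if (!hasCurrent) {
            throw new IllegalStateException("next() must return true before get()");
        }
        return current;
    }

    @Override
    public void close() {
        // Idempotent: the release callback runs at most once.
        if (!closed) {
            closed = true;
            hasCurrent = false;
            onClose.run();
        }
    }
}
```

Making close() idempotent and explicit is one way to keep the window in which a transaction is held as short as the read itself, which is relevant to the idle-time concern noted above.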
@odeke-em
Collaborator Author

I am going to merge the base working version in, then work on the respective issues individually and split up the remaining work.
