Merge pull request #73 from melburnerodrigues/postgres-plugin
Added CloudSQL PostgreSQL source, sink and action plugins.
Bhooshan Mogal authored Jun 24, 2020
2 parents f30c56b + b9d4628 commit 5607858
Showing 18 changed files with 1,703 additions and 0 deletions.
108 changes: 108 additions & 0 deletions cloudsql-postgresql-plugin/docs/CloudSQLPostgreSQL-action.md
@@ -0,0 +1,108 @@
# PostgreSQL Action


Description
-----------
Action that runs a PostgreSQL command on a CloudSQL PostgreSQL instance.


Use Case
--------
The action can be used whenever you want to run a PostgreSQL command before or after a data pipeline.
For example, you may want to run a SQL update command on a database before the pipeline source pulls data from tables.


Properties
----------
**Driver Name:** Name of the JDBC driver to use.

**Database Command:** Database command to execute.

**Database:** PostgreSQL database name.

**Connection Name:** The CloudSQL instance to connect to, in the format `<PROJECT_ID>:<REGION>:<INSTANCE_NAME>`.
It can be found on the instance overview page.

**CloudSQL Instance Type:** Whether the CloudSQL instance to connect to is private or public. Defaults to 'Public'.

**Username:** User identity for connecting to the specified database.

**Password:** Password to use to connect to the specified database.

**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.

**Connection Timeout:** The timeout value used for socket connect operations. If connecting to the server takes longer
than this value, the connection is broken. The timeout is specified in seconds and a value of zero means that it is
disabled.
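
As a rough illustration of how these properties could fit together, the sketch below assembles a JDBC URL in the form documented by the CloudSQL socket factory. It is an assumption that the plugin builds its URL this way, and that Connection Timeout corresponds to the PostgreSQL JDBC driver's `connectTimeout` parameter. `build_jdbc_url` and the `ApplicationName` argument are hypothetical names chosen for the example. The username and password are left out because they are usually passed to the driver separately.

```python
from urllib.parse import urlencode

# Socket factory class shipped in the CloudSQL socket factory jar for PostgreSQL.
SOCKET_FACTORY = "com.google.cloud.sql.postgres.SocketFactory"


def build_jdbc_url(database, connection_name, connection_args=None, connect_timeout=None):
    """Assemble a JDBC URL for a CloudSQL PostgreSQL instance (illustrative only)."""
    params = {
        "cloudSqlInstance": connection_name,
        "socketFactory": SOCKET_FACTORY,
    }
    # Connection Arguments: arbitrary key/value pairs passed through to the driver.
    params.update(connection_args or {})
    # Assumption: Connection Timeout maps to connectTimeout (seconds, 0 disables it).
    if connect_timeout is not None:
        params["connectTimeout"] = str(connect_timeout)
    # Keep ':' readable in the connection name; other characters are encoded as needed.
    return f"jdbc:postgresql:///{database}?{urlencode(params, safe=':')}"


url = build_jdbc_url(
    "prod",
    "my-project:us-central1:my-instance",  # placeholder connection name
    {"ApplicationName": "pipeline"},  # hypothetical connection argument
    connect_timeout=10,
)
print(url)
```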


Examples
--------
**Connecting to a public CloudSQL PostgreSQL instance**

Suppose you want to execute a query against a CloudSQL PostgreSQL database named "prod" as the "postgres" user with the
"postgres" password. Get the latest version of the CloudSQL socket factory jar with driver and dependencies
[here](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with:


```
Driver Name: "cloudsql-postgresql"
Database Command: "UPDATE table_name SET price = 20 WHERE ID = 6"
Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME]
CloudSQL Instance Type: "Public"
Database: "prod"
Username: "postgres"
Password: "postgres"
```


**Connecting to a private CloudSQL PostgreSQL instance**

If you want to connect to a private CloudSQL PostgreSQL instance, create a Compute Engine VM that runs the CloudSQL Proxy
Docker image using the following command:

```
# Set the environment variables
export PROJECT=[project_id]
export REGION=[vm-region]
export ZONE=$(gcloud compute zones list --filter="name=${REGION}" --limit 1 \
  --uri --project=${PROJECT} | sed 's/.*\///')
export SUBNET=[vpc-subnet-name]
export NAME=[gce-vm-name]
export POSTGRESQL_CONN=[postgresql-instance-connection-name]
# Create a Compute Engine VM
gcloud beta compute --project=${PROJECT} instances create ${NAME} \
  --zone=${ZONE} --machine-type=g1-small --subnet=${SUBNET} --no-address \
  --metadata=startup-script="docker run -d -p 0.0.0.0:3306:3306 \
gcr.io/cloudsql-docker/gce-proxy:1.16 /cloud_sql_proxy \
-instances=${POSTGRESQL_CONN}=tcp:0.0.0.0:3306" --maintenance-policy=MIGRATE \
  --scopes=https://www.googleapis.com/auth/cloud-platform \
  --image=cos-69-10895-385-0 --image-project=cos-cloud
```

Optionally, you can promote the internal IP address of the VM running the Proxy image to a static IP using:

```
# Get the VM internal IP
export IP=$(gcloud compute instances describe ${NAME} --zone ${ZONE} |
  grep "networkIP" | awk '{print $2}')
# Promote the VM internal IP to static IP
gcloud compute addresses create postgresql-proxy --addresses ${IP} \
  --region ${REGION} --subnet ${SUBNET}
```

Get the latest version of the CloudSQL socket factory jar with driver and dependencies from
[here](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with:

```
Driver Name: "cloudsql-postgresql"
Database Command: "UPDATE table_name SET price = 20 WHERE ID = 6"
Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME]
CloudSQL Instance Type: "Private"
Database: "prod"
Username: "postgres"
Password: "postgres"
```
163 changes: 163 additions & 0 deletions cloudsql-postgresql-plugin/docs/CloudSQLPostgreSQL-batchsink.md
@@ -0,0 +1,163 @@
# CloudSQL PostgreSQL Batch Sink


Description
-----------
Writes records to a CloudSQL PostgreSQL table. Each record will be written to a row in the table.


Use Case
--------
This sink is used whenever you need to write to a CloudSQL PostgreSQL table.
Suppose you periodically build a recommendation model for products on your online store.
The model is stored in a GCS bucket and you want to export the contents
of the bucket to a CloudSQL PostgreSQL table where it can be served to your users.

Column names are auto-detected from the input schema.
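
To make the record-to-row idea concrete, here is a minimal sketch of how a parameterized `INSERT` statement could be derived from the field names of an input schema. This is not taken from the plugin's code. `build_insert` is a hypothetical helper, and the column names are invented for the example.

```python
from typing import Sequence


def build_insert(table: str, columns: Sequence[str]) -> str:
    """Build a parameterized INSERT with one '?' placeholder per schema field."""
    if not columns:
        raise ValueError("the input schema must contain at least one field")
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    # Identifiers are used as-is here; a real sink would need to quote or validate them.
    return f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"


# Hypothetical schema fields for a "users" table.
statement = build_insert("users", ["id", "name", "email"])
print(statement)
```

Each record would then supply one value per placeholder, in schema order.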

Properties
----------
**Reference Name:** Name used to uniquely identify this sink for lineage, annotating metadata, etc.

**Driver Name:** Name of the JDBC driver to use.

**Database:** CloudSQL PostgreSQL database name.

**Connection Name:** The CloudSQL instance to connect to, in the format `<PROJECT_ID>:<REGION>:<INSTANCE_NAME>`.
It can be found on the instance overview page.

**CloudSQL Instance Type:** Whether the CloudSQL instance to connect to is private or public. Defaults to 'Public'.

**Table Name:** Name of the table to export to.

**Username:** User identity for connecting to the specified database.

**Password:** Password to use to connect to the specified database.

**Transaction Isolation Level:** Transaction isolation level for queries run by this sink.

**Connection Arguments:** A list of arbitrary string key/value pairs as connection arguments. These arguments
will be passed to the JDBC driver as connection arguments for JDBC drivers that may need additional configurations.

**Connection Timeout:** The timeout value used for socket connect operations. If connecting to the server takes longer
than this value, the connection is broken. The timeout is specified in seconds and a value of zero means that it is
disabled.
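
The Connection Name format described above can be checked before a pipeline is deployed. The following is a minimal sketch, not part of the plugin: `parse_connection_name` and `ConnectionName` are hypothetical names, and splitting from the right is an assumption made so that a project ID containing its own colon would stay intact.

```python
from typing import NamedTuple


class ConnectionName(NamedTuple):
    project_id: str
    region: str
    instance_name: str


def parse_connection_name(value: str) -> ConnectionName:
    """Split 'PROJECT_ID:REGION:INSTANCE_NAME' into its three parts."""
    # Split from the right: the last two fields are region and instance name.
    parts = value.strip().rsplit(":", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected <PROJECT_ID>:<REGION>:<INSTANCE_NAME>, got {value!r}"
        )
    return ConnectionName(*parts)


# Placeholder values, not a real instance.
parsed = parse_connection_name("my-project:us-central1:my-instance")
print(parsed.region)
```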


Examples
--------
**Connecting to a public CloudSQL PostgreSQL instance**

Suppose you want to write output records to the "users" table of a CloudSQL PostgreSQL database named "prod" as the
"postgres" user with the "postgres" password. Get the latest version of the CloudSQL socket factory jar with driver and
dependencies [here](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with:


```
Reference Name: "sink1"
Driver Name: "cloudsql-postgresql"
Database: "prod"
Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME]
CloudSQL Instance Type: "Public"
Table Name: "users"
Username: "postgres"
Password: "postgres"
```


**Connecting to a private CloudSQL PostgreSQL instance**

If you want to connect to a private CloudSQL PostgreSQL instance, create a Compute Engine VM that runs the CloudSQL Proxy
Docker image using the following command:

```
# Set the environment variables
export PROJECT=[project_id]
export REGION=[vm-region]
export ZONE=$(gcloud compute zones list --filter="name=${REGION}" --limit 1 \
  --uri --project=${PROJECT} | sed 's/.*\///')
export SUBNET=[vpc-subnet-name]
export NAME=[gce-vm-name]
export POSTGRESQL_CONN=[postgresql-instance-connection-name]
# Create a Compute Engine VM
gcloud beta compute --project=${PROJECT} instances create ${NAME} \
  --zone=${ZONE} --machine-type=g1-small --subnet=${SUBNET} --no-address \
  --metadata=startup-script="docker run -d -p 0.0.0.0:3306:3306 \
gcr.io/cloudsql-docker/gce-proxy:1.16 /cloud_sql_proxy \
-instances=${POSTGRESQL_CONN}=tcp:0.0.0.0:3306" --maintenance-policy=MIGRATE \
  --scopes=https://www.googleapis.com/auth/cloud-platform \
  --image=cos-69-10895-385-0 --image-project=cos-cloud
```

Optionally, you can promote the internal IP address of the VM running the Proxy image to a static IP using:

```
# Get the VM internal IP
export IP=$(gcloud compute instances describe ${NAME} --zone ${ZONE} |
  grep "networkIP" | awk '{print $2}')
# Promote the VM internal IP to static IP
gcloud compute addresses create postgresql-proxy --addresses ${IP} \
  --region ${REGION} --subnet ${SUBNET}
```

Get the latest version of the CloudSQL socket factory jar with driver and dependencies from
[here](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with:

```
Reference Name: "sink1"
Driver Name: "cloudsql-postgresql"
Database: "prod"
Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME]
CloudSQL Instance Type: "Private"
Table Name: "users"
Username: "postgres"
Password: "postgres"
```


Data Types Mapping
------------------
PostgreSQL-specific data types are mapped to string. They can have multiple input formats and one 'canonical' output format.
Refer to the PostgreSQL data types documentation for the proper formats.

| PostgreSQL Data Type | CDAP Schema Data Type | Comment |
|-----------------------------------------------------|-----------------------|----------------------------------------------|
| bigint | long | |
| bit(n) | string | string of '0' and '1' characters, exactly n long |
| bit varying(n) | string | string of '0' and '1' characters, at most n long |
| boolean | boolean | |
| bytea | bytes | |
| character | string | |
| character varying | string | |
| double precision | double | |
| integer | int | |
| numeric(precision, scale)/decimal(precision, scale) | decimal | |
| real | float | |
| smallint | int | |
| text | string | |
| date | date | |
| time [ (p) ] [ without time zone ] | time | |
| time [ (p) ] with time zone | string | |
| timestamp [ (p) ] [ without time zone ] | timestamp | |
| timestamp [ (p) ] with time zone | timestamp | stored in UTC format in database |
| xml | string | |
| tsquery | string | |
| tsvector | string | |
| uuid | string | |
| box | string | |
| cidr | string | |
| circle | string | |
| inet | string | |
| interval | string | |
| json | string | |
| jsonb | string | |
| line | string | |
| lseg | string | |
| macaddr | string | |
| macaddr8 | string | |
| money | string | |
| path | string | |
| point | string | |
| polygon | string | |
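
For quick checks outside a pipeline, the table above can be expressed as a small lookup. This is a sketch under stated assumptions rather than the plugin's implementation: `cdap_type` and `is_valid_bit_string` are hypothetical helpers, and falling back to `string` for any type not listed in the table is an assumption.

```python
import re

# Non-string mappings copied from the table; all other listed types map to "string".
NON_STRING_TYPES = {
    "bigint": "long",
    "boolean": "boolean",
    "bytea": "bytes",
    "double precision": "double",
    "integer": "int",
    "numeric": "decimal",
    "decimal": "decimal",
    "real": "float",
    "smallint": "int",
    "date": "date",
    "time": "time",
    "time without time zone": "time",
    "timestamp": "timestamp",
    "timestamp without time zone": "timestamp",
    "timestamp with time zone": "timestamp",
}


def cdap_type(pg_type: str) -> str:
    """Return the CDAP schema type for a PostgreSQL type name."""
    # Drop modifiers such as (n) or (precision, scale), then normalise spacing and case.
    base = re.sub(r"\([^)]*\)", "", pg_type)
    base = " ".join(base.lower().split())
    # Assumption: anything not listed above is treated as a string.
    return NON_STRING_TYPES.get(base, "string")


def is_valid_bit_string(value: str, n: int, varying: bool = False) -> bool:
    """Check a string destined for a bit(n) or bit varying(n) column."""
    if not re.fullmatch(r"[01]*", value):
        return False
    return len(value) <= n if varying else len(value) == n


print(cdap_type("numeric(10, 2)"), cdap_type("time (3) with time zone"))
```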