Merge pull request #67 from melburnerodrigues/develop
Added CloudSQL MySQL source, sink and action plugins.
Showing 18 changed files with 1,629 additions and 0 deletions.
@@ -0,0 +1,107 @@
# CloudSQL MySQL Action

Description
-----------
Action that runs a MySQL command on a CloudSQL MySQL instance.

Use Case
--------
The action can be used whenever you want to run a MySQL command before or after a data pipeline.
For example, you may want to run a SQL update command on a database before the pipeline source pulls data from tables.

Properties
----------
**Driver Name:** Name of the JDBC driver to use.

**Database Command:** Database command to execute.

**Database:** MySQL database name.

**Connection Name:** The CloudSQL instance to connect to, in the format `<PROJECT_ID>:<REGION>:<INSTANCE_NAME>`.
It can be found on the instance overview page.

**CloudSQL Instance Type:** Whether the CloudSQL instance to connect to is private or public. Defaults to 'Public'.

**Username:** User identity for connecting to the specified database.

**Password:** Password to use to connect to the specified database.

**Connection Timeout:** The timeout value (in seconds) used for socket connect operations. If connecting to the server
takes longer than this value, the connection attempt is aborted. A value of 0 disables the timeout.

**Connection Arguments:** A list of arbitrary string key/value pairs. These arguments are passed to the JDBC driver
as connection arguments, for JDBC drivers that may need additional configuration.
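
The connection name format described above can be checked before it is entered into the plugin. The following is a minimal sketch and not part of the plugin: `parse_connection_name` is a hypothetical helper, and it assumes that the project ID itself contains no colon (a domain-scoped project ID would need extra handling).

```shell
# Hypothetical helper: split PROJECT_ID:REGION:INSTANCE_NAME into its parts.
# Assumption: none of the three parts contains a colon.
parse_connection_name() {
  conn="$1"
  case "$conn" in
    *:*:*:*) return 1 ;;   # more than three fields
    *:*:*) ;;              # exactly two colons, continue
    *) return 1 ;;         # fewer than three fields
  esac
  project="${conn%%:*}"    # text before the first colon
  rest="${conn#*:}"        # text after the first colon
  region="${rest%%:*}"
  instance="${rest#*:}"
  # Every field must be non-empty.
  [ -n "$project" ] && [ -n "$region" ] && [ -n "$instance" ] || return 1
  printf 'project=%s region=%s instance=%s\n' "$project" "$region" "$instance"
}

# Sample value, made up for illustration.
parse_connection_name "my-project:us-central1:my-instance"
# prints: project=my-project region=us-central1 instance=my-instance
```

The function returns a non-zero status for malformed input, so it can guard a script with `parse_connection_name "$CONN" || exit 1`.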

Examples
--------
**Connecting to a public CloudSQL MySQL instance**

Suppose you want to execute a query against a CloudSQL MySQL database named "prod" as the "root" user with the
password "root". Get the latest version of the CloudSQL socket factory JAR with driver and dependencies from
[here](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with:

```
Driver Name: "cloudsql-mysql"
Database Command: "UPDATE table_name SET price = 20 WHERE ID = 6"
Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME]
CloudSQL Instance Type: "Public"
Database: "prod"
Username: "root"
Password: "root"
```

**Connecting to a private CloudSQL MySQL instance**

If you want to connect to a private CloudSQL MySQL instance, create a Compute Engine VM that runs the CloudSQL Proxy
Docker image using the following commands:

```
# Set the environment variables (replace the bracketed placeholders)
export PROJECT=[project_id]
export REGION=[vm-region]
# Pick one zone in the region; sed keeps only the zone name from the URI
export ZONE=`gcloud compute zones list --filter="name=${REGION}" --limit 1 --uri --project=${PROJECT} | sed 's/.*\///'`
export SUBNET=[vpc-subnet-name]
export NAME=[gce-vm-name]
export MYSQL_CONN=[mysql-instance-connection-name]
# Create a Compute Engine VM that starts the CloudSQL Proxy container on boot
gcloud beta compute --project=${PROJECT} instances create ${NAME} \
  --zone=${ZONE} --machine-type=g1-small --subnet=${SUBNET} --no-address \
  --metadata=startup-script="docker run -d -p 0.0.0.0:3306:3306 \
gcr.io/cloudsql-docker/gce-proxy:1.16 /cloud_sql_proxy \
-instances=${MYSQL_CONN}=tcp:0.0.0.0:3306" --maintenance-policy=MIGRATE \
  --scopes=https://www.googleapis.com/auth/cloud-platform \
  --image=cos-69-10895-385-0 --image-project=cos-cloud
```
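
The `ZONE` assignment above relies on `sed 's/.*\///'` to keep only the text after the last slash of the zone URI. The snippet below shows that step in isolation. The URI is a made-up sample in the usual Compute Engine resource-link shape, not real command output.

```shell
# Made-up sample of a zone URI; the project and zone names are placeholders.
zone_uri="https://www.googleapis.com/compute/v1/projects/my-project/zones/us-central1-a"

# The greedy ".*\/" matches everything up to and including the final slash,
# so only the last path segment remains.
zone=$(printf '%s\n' "$zone_uri" | sed 's/.*\///')

echo "$zone"   # prints: us-central1-a
```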

Optionally, you can promote the internal IP address of the VM running the Proxy image to a static IP using:

```
# Get the VM internal IP
export IP=`gcloud compute instances describe ${NAME} --zone ${ZONE} |
grep "networkIP" | awk '{print $2}'`
# Promote the VM internal IP to a static IP
gcloud compute addresses create mysql-proxy --addresses ${IP} --region ${REGION} \
  --subnet ${SUBNET}
```

Get the latest version of the CloudSQL socket factory JAR with driver and dependencies from
[here](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with:

```
Driver Name: "cloudsql-mysql"
Database Command: "UPDATE table_name SET price = 20 WHERE ID = 6"
Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME]
CloudSQL Instance Type: "Private"
Database: "prod"
Username: "root"
Password: "root"
```
@@ -0,0 +1,153 @@
# CloudSQL MySQL Batch Sink

Description
-----------
Writes records to a CloudSQL MySQL table. Each record is written to a row in the table.

Use Case
--------
This sink is used whenever you need to write to a CloudSQL MySQL table.
Suppose you periodically build a recommendation model for products on your online store.
The model is stored in a GCS bucket and you want to export the contents
of the bucket to a CloudSQL MySQL table where it can be served to your users.

Column names are detected automatically from the input schema.

Properties
----------
**Reference Name:** Name used to uniquely identify this sink for lineage, annotating metadata, etc.

**Driver Name:** Name of the JDBC driver to use.

**Database:** MySQL database name.

**Connection Name:** The CloudSQL instance to connect to, in the format `<PROJECT_ID>:<REGION>:<INSTANCE_NAME>`.
It can be found on the instance overview page.

**CloudSQL Instance Type:** Whether the CloudSQL instance to connect to is private or public. Defaults to 'Public'.

**Table Name:** Name of the table to export to. The table must exist prior to running the pipeline.

**Username:** User identity for connecting to the specified database.

**Password:** Password to use to connect to the specified database.

**Transaction Isolation Level:** Transaction isolation level for queries run by this sink.

**Connection Timeout:** The timeout value (in seconds) used for socket connect operations. If connecting to the server
takes longer than this value, the connection attempt is aborted. A value of 0 disables the timeout.

**Connection Arguments:** A list of arbitrary string key/value pairs. These arguments are passed to the JDBC driver
as connection arguments, for JDBC drivers that may need additional configuration.
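
As an illustration of what passing key/value pairs to the JDBC driver can look like, the sketch below appends them to a MySQL JDBC URL in the layout documented by the Cloud SQL JDBC socket factory. The function name `build_jdbc_url` and the sample arguments (two MySQL Connector/J properties) are assumptions for illustration. How this sink assembles its own URL is not shown in this document, and the sketch does not URL-encode values.

```shell
# Hypothetical sketch, not the plugin's implementation.
# Usage: build_jdbc_url DATABASE CONNECTION_NAME [key=value ...]
build_jdbc_url() {
  database="$1"
  connection_name="$2"
  shift 2
  # URL layout as documented by the Cloud SQL JDBC socket factory.
  url="jdbc:mysql:///${database}?cloudSqlInstance=${connection_name}"
  url="${url}&socketFactory=com.google.cloud.sql.mysql.SocketFactory"
  # Append each remaining key=value pair as one more query parameter.
  for pair in "$@"; do
    url="${url}&${pair}"
  done
  printf '%s\n' "$url"
}

# Sample values, made up for illustration.
build_jdbc_url prod my-project:us-central1:my-instance useSSL=false connectTimeout=10000
```

A driver can also receive such pairs through a properties object instead of the URL; the URL form is used here only because it is easy to print and inspect.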

Data Types Mapping
------------------

| MySQL Data Type | CDAP Schema Data Type |
| --------------- | --------------------- |
| BIT             | boolean               |
| TINYINT         | int                   |
| BOOL, BOOLEAN   | boolean               |
| SMALLINT        | int                   |
| MEDIUMINT       | double                |
| INT, INTEGER    | int                   |
| BIGINT          | long                  |
| FLOAT           | float                 |
| DOUBLE          | double                |
| DECIMAL         | decimal               |
| DATE            | date                  |
| DATETIME        | timestamp             |
| TIMESTAMP       | timestamp             |
| TIME            | time                  |
| YEAR            | date                  |
| CHAR            | string                |
| VARCHAR         | string                |
| BINARY          | bytes                 |
| VARBINARY       | bytes                 |
| TINYBLOB        | bytes                 |
| TINYTEXT        | string                |
| BLOB            | bytes                 |
| TEXT            | string                |
| MEDIUMBLOB      | bytes                 |
| MEDIUMTEXT      | string                |
| LONGBLOB        | bytes                 |
| LONGTEXT        | string                |
| ENUM            | string                |
| SET             | string                |
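
The table above can be read as a simple lookup. The sketch below encodes its rows as a shell function, which can be handy when checking a target table's columns against an input schema by hand. `cdap_type_for` is a hypothetical name used only for illustration and is not part of the plugin.

```shell
# Hypothetical lookup that mirrors the mapping table above.
# Prints the CDAP schema type for a MySQL type name; fails for unknown names.
cdap_type_for() {
  # Upper-case the input so "varchar" and "VARCHAR" are treated the same.
  mysql_type=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  case "$mysql_type" in
    BIT|BOOL|BOOLEAN) echo boolean ;;
    TINYINT|SMALLINT|INT|INTEGER) echo int ;;
    MEDIUMINT|DOUBLE) echo double ;;
    BIGINT) echo long ;;
    FLOAT) echo float ;;
    DECIMAL) echo decimal ;;
    DATE|YEAR) echo date ;;
    DATETIME|TIMESTAMP) echo timestamp ;;
    TIME) echo time ;;
    CHAR|VARCHAR|TINYTEXT|TEXT|MEDIUMTEXT|LONGTEXT|ENUM|SET) echo string ;;
    BINARY|VARBINARY|TINYBLOB|BLOB|MEDIUMBLOB|LONGBLOB) echo bytes ;;
    *) return 1 ;;   # type not listed in the table
  esac
}

cdap_type_for varchar    # prints: string
cdap_type_for DATETIME   # prints: timestamp
```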

Examples
--------
**Connecting to a public CloudSQL MySQL instance**

Suppose you want to write output records to the "users" table of a CloudSQL MySQL database named "prod" as the "root"
user with the password "root". Get the latest version of the CloudSQL socket factory JAR with driver and dependencies
from [here](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with:

```
Reference Name: "sink1"
Driver Name: "cloudsql-mysql"
Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME]
CloudSQL Instance Type: "Public"
Database: "prod"
Table Name: "users"
Username: "root"
Password: "root"
```

**Connecting to a private CloudSQL MySQL instance**

If you want to connect to a private CloudSQL MySQL instance, create a Compute Engine VM that runs the CloudSQL Proxy
Docker image using the following commands:

```
# Set the environment variables (replace the bracketed placeholders)
export PROJECT=[project_id]
export REGION=[vm-region]
# Pick one zone in the region; sed keeps only the zone name from the URI
export ZONE=`gcloud compute zones list --filter="name=${REGION}" --limit 1 --uri --project=${PROJECT} | sed 's/.*\///'`
export SUBNET=[vpc-subnet-name]
export NAME=[gce-vm-name]
export MYSQL_CONN=[mysql-instance-connection-name]
# Create a Compute Engine VM that starts the CloudSQL Proxy container on boot
gcloud beta compute --project=${PROJECT} instances create ${NAME} \
  --zone=${ZONE} --machine-type=g1-small --subnet=${SUBNET} --no-address \
  --metadata=startup-script="docker run -d -p 0.0.0.0:3306:3306 \
gcr.io/cloudsql-docker/gce-proxy:1.16 /cloud_sql_proxy \
-instances=${MYSQL_CONN}=tcp:0.0.0.0:3306" --maintenance-policy=MIGRATE \
  --scopes=https://www.googleapis.com/auth/cloud-platform \
  --image=cos-69-10895-385-0 --image-project=cos-cloud
```

Optionally, you can promote the internal IP address of the VM running the Proxy image to a static IP using:

```
# Get the VM internal IP
export IP=`gcloud compute instances describe ${NAME} --zone ${ZONE} |
grep "networkIP" | awk '{print $2}'`
# Promote the VM internal IP to a static IP
gcloud compute addresses create mysql-proxy --addresses ${IP} --region ${REGION} \
  --subnet ${SUBNET}
```
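
The `IP` assignment above filters the instance description with `grep` and takes the second whitespace-separated field with `awk`. The snippet below runs the same filter on a small sample so the extraction step can be seen on its own. The sample text is a made-up, shortened excerpt in the YAML shape that `gcloud compute instances describe` prints by default, not real output.

```shell
# Made-up, shortened excerpt of an instance description (YAML).
describe_output='networkInterfaces:
- kind: compute#networkInterface
  name: nic0
  networkIP: 10.128.0.5
  subnetwork: https://www.googleapis.com/compute/v1/projects/my-project/regions/us-central1/subnetworks/my-subnet'

# grep keeps the "networkIP: ..." line; awk prints its second field, the address.
ip=$(printf '%s\n' "$describe_output" | grep "networkIP" | awk '{print $2}')

echo "$ip"   # prints: 10.128.0.5
```

If a VM had more than one network interface, this filter would return one address per interface, so the result would need to be narrowed before it is passed to `--addresses`.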

Get the latest version of the CloudSQL socket factory JAR with driver and dependencies from
[here](https://github.com/GoogleCloudPlatform/cloud-sql-jdbc-socket-factory/releases), then configure the plugin with:

```
Reference Name: "sink1"
Driver Name: "cloudsql-mysql"
Connection Name: [PROJECT_ID]:[REGION]:[INSTANCE_NAME]
CloudSQL Instance Type: "Private"
Database: "prod"
Table Name: "users"
Username: "root"
Password: "root"
```