# Google Cloud Storage User Guide

Since Vertica can be deployed on Google Cloud Platform, the Spark Connector can use Google Cloud Storage (GCS) as its intermediary storage.

- Running on Dataproc clusters: If your Spark cluster is deployed on GCP, you need to obtain an HMAC interoperability key, then configure the connector options `gcs_hmac_key_id` and `gcs_hmac_key_secret`. Instructions for obtaining the key can be found here.
- Running outside of Dataproc clusters: In addition to configuring the HMAC key above, you also need a GCS service account key in the form of a JSON service keyfile. Instructions for obtaining one can be found here.

Then specify the connector option `gcs_service_keyfile` with the path to your keyfile JSON. Alternatively, the connector can pick up the keyfile path from the environment variable `GOOGLE_APPLICATION_CREDENTIALS` or from the Spark configuration option `fs.gs.auth.service.account.json.keyfile`.
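For concreteness, here is a minimal read sketch passing these options. The GCS option names are the ones described above; the Vertica connection options (`host`, `user`, `password`, `db`, `table`) and the source name `com.vertica.spark.datasource.VerticaSource` are assumptions based on the connector's usual option set, so adjust them to your setup:

```scala
import org.apache.spark.sql.SparkSession

// A minimal sketch, not a definitive setup. All connection values below
// are placeholders; credentials are read from environment variables.
val spark = SparkSession.builder()
  .appName("gcs-guide-example")
  .getOrCreate()

val df = spark.read
  .format("com.vertica.spark.datasource.VerticaSource")
  .option("host", "vertica.example.com")            // placeholder
  .option("user", "dbadmin")                        // placeholder
  .option("password", sys.env("VERTICA_PASSWORD"))
  .option("db", "testdb")                           // placeholder
  .option("table", "my_table")                      // placeholder
  // HMAC interoperability key (needed on and off Dataproc):
  .option("gcs_hmac_key_id", sys.env("GCS_HMAC_KEY_ID"))
  .option("gcs_hmac_key_secret", sys.env("GCS_HMAC_KEY_SECRET"))
  // Outside Dataproc, also point the connector at your JSON service keyfile:
  .option("gcs_service_keyfile", "/path/to/keyfile.json")
  .load()
```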

Finally, ensure that you include the Google Hadoop Connector dependency in your project. Make sure you select the connector distribution appropriate for your Hadoop version.
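If you build with sbt, the dependency might look like the following. The Maven coordinates `com.google.cloud.bigdataoss:gcs-connector` are the connector's published coordinates, but the version string shown is only illustrative; pick the `hadoop2-` or `hadoop3-` prefixed release matching your cluster:

```scala
// build.sbt -- version is illustrative; use the hadoop2-* or hadoop3-* release
// matching your Hadoop version. The "shaded" classifier bundles dependencies.
libraryDependencies += "com.google.cloud.bigdataoss" % "gcs-connector" % "hadoop3-2.2.11" classifier "shaded"
```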

With the credentials specified, you can now configure the connector option `staging_fs_url` to use a GCS path such as `gs://<bucket-id>/path/to/data`.
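For example, a write sketch using a GCS staging location (the bucket name and Vertica options are placeholders, and `df` is the DataFrame from the earlier sketch):

```scala
// Sketch: writing to Vertica through a GCS staging area. Reuses the
// credential options shown earlier; "my-bucket" is a placeholder.
df.write
  .format("com.vertica.spark.datasource.VerticaSource")
  .option("host", "vertica.example.com")
  .option("user", "dbadmin")
  .option("password", sys.env("VERTICA_PASSWORD"))
  .option("db", "testdb")
  .option("table", "my_table")
  .option("staging_fs_url", "gs://my-bucket/path/to/data")
  .mode("append")
  .save()
```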

As an alternative to specifying the keyfile path, you can set the following connector options:

```
gcs_service_key_id = <field private_key_id in your keyfile json>
gcs_service_key = <field private_key in your keyfile json>
gcs_service_email = <field client_email in your keyfile json>
```
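As a sketch, the same read as above with the keyfile fields passed directly; the placeholder vals stand in for the corresponding fields of your keyfile JSON (parsing the JSON is left out here):

```scala
// Sketch: supplying the service-account fields directly instead of a
// keyfile path. Placeholders stand in for values from the keyfile JSON.
val privateKeyId = sys.env("GCS_PRIVATE_KEY_ID") // "private_key_id" field
val privateKey   = sys.env("GCS_PRIVATE_KEY")    // "private_key" field
val clientEmail  = sys.env("GCS_CLIENT_EMAIL")   // "client_email" field

val df2 = spark.read
  .format("com.vertica.spark.datasource.VerticaSource")
  .option("host", "vertica.example.com")
  .option("user", "dbadmin")
  .option("password", sys.env("VERTICA_PASSWORD"))
  .option("db", "testdb")
  .option("table", "my_table")
  .option("gcs_service_key_id", privateKeyId)
  .option("gcs_service_key", privateKey)
  .option("gcs_service_email", clientEmail)
  .option("staging_fs_url", "gs://my-bucket/path/to/data")
  .load()
```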

## Additional Resources