[GLUTEN-3902][VL] Add documentation to configure the Velox+GCS connector (#3902)
Showing 3 changed files with 43 additions and 4 deletions.
@@ -0,0 +1,39 @@
---
layout: page
title: Using GCS with Gluten
nav_order: 5
parent: Getting-Started
---
Object stores offered by cloud service providers (CSPs), such as Google Cloud Storage (GCS), are an important place for Gluten users to store their data. This doc covers the configs and use cases for using Gluten with object stores. To use a GCS endpoint as your data source, make sure the GCS configs described below are set in your `spark-defaults.conf`. If you experience any issues authenticating to GCS with additional auth mechanisms, please reach out to us using the 'Issues' tab.

# Working with GCS

## Installing the gcloud CLI

To access GCS objects using Gluten and Velox, first [download and install the gcloud CLI](https://cloud.google.com/sdk/docs/install).

## Configuring GCS using a user account

This is the recommended approach for regular users. Follow the [instructions to authorize a user account](https://cloud.google.com/sdk/docs/authorizing#user-account).
After these steps, no specific configuration is required for Gluten, since the authorization is handled entirely by the gcloud tool.

## Configuring GCS using a credential file

For workloads that need to be fully automated, manual authorization can be problematic. In such cases it is better to use a JSON file with the credentials.
This is described in the [instructions to configure a service account](https://cloud.google.com/sdk/docs/authorizing#service-account).

The JSON file with the credentials can then be passed to Gluten:

```sh
spark.hadoop.fs.gs.auth.type                          SERVICE_ACCOUNT_JSON_KEYFILE
# Set the value below to the path to the JSON file with the credentials.
spark.hadoop.fs.gs.auth.service.account.json.keyfile  <path-to-json-keyfile>
```

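When these two properties are generated by automation rather than typed by hand, a small helper can keep the key names in one place. The following is a minimal sketch, not part of Gluten: the function names `gcs_keyfile_conf`, `to_spark_defaults`, and `to_submit_args` are hypothetical, and only the two property names and the `SERVICE_ACCOUNT_JSON_KEYFILE` value come from this document.

```python
def gcs_keyfile_conf(keyfile):
    """Return the two Spark properties for service-account JSON key auth.

    `keyfile` is the path to the JSON credentials file (supplied by the caller).
    """
    return {
        "spark.hadoop.fs.gs.auth.type": "SERVICE_ACCOUNT_JSON_KEYFILE",
        "spark.hadoop.fs.gs.auth.service.account.json.keyfile": str(keyfile),
    }


def to_spark_defaults(conf):
    """Render properties as spark-defaults.conf lines: key, whitespace, value."""
    return "\n".join(f"{key} {value}" for key, value in conf.items())


def to_submit_args(conf):
    """Render properties as a list of spark-submit `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args.extend(["--conf", f"{key}={value}"])
    return args
```

The output of `to_spark_defaults` can be appended to `spark-defaults.conf`, while `to_submit_args` suits jobs that pass settings on the `spark-submit` command line instead.
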
## Configuring GCS endpoints

For cases when a GCS mock is used, an optional endpoint can be provided:

```sh
# Set the value below to the URL of the mock GCS service, including the http:// or https:// scheme.
spark.hadoop.fs.gs.storage.root.url  <url-of-mock-gcs-service>
```
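
Because the endpoint URL has to start with `http` or `https`, it can help to check the value before handing it to Spark. The following is a minimal sketch under that assumption: `gcs_endpoint_conf` is a hypothetical helper, not a Gluten or Velox API, and only the property name comes from this document.

```python
from urllib.parse import urlparse


def gcs_endpoint_conf(root_url):
    """Return the Spark property for a custom (for example, mock) GCS endpoint.

    Raises ValueError if the URL has no http/https scheme or no host part.
    """
    parsed = urlparse(root_url)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(
            f"GCS root URL must start with http:// or https://, got {root_url!r}"
        )
    return {"spark.hadoop.fs.gs.storage.root.url": root_url}
```

For example, a URL such as `http://localhost:4443` (an illustrative address for a locally running mock) passes the check, while a bare `localhost:4443` is rejected.
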