This module allows you to create opinionated Google Cloud Platform BigQuery datasets and tables. It lets the user programmatically create an empty table schema inside a dataset, ready for loading. Additional user accounts and permissions are necessary to begin querying the newly created table(s).
This module is meant for use with Terraform 0.12. If you haven't upgraded and need a Terraform 0.11.x-compatible version of this module, the last released version intended for Terraform 0.11.x is 1.0.0.
The current version is 4.X. The following guides are available to assist with upgrades:
Basic usage of this module is as follows:
```hcl
module "bigquery" {
  source  = "terraform-google-modules/bigquery/google"
  version = "~> 4.4"

  dataset_id                  = "foo"
  dataset_name                = "foo"
  description                 = "some description"
  project_id                  = "<PROJECT ID>"
  location                    = "US"
  default_table_expiration_ms = 3600000

  tables = [
    {
      table_id = "foo",
      schema   = "<PATH TO THE SCHEMA JSON FILE>",
      time_partitioning = {
        type                     = "DAY",
        field                    = null,
        require_partition_filter = false,
        expiration_ms            = null,
      },
      expiration_time = null,
      clustering      = ["fullVisitorId", "visitId"],
      labels = {
        env      = "dev"
        billable = "true"
        owner    = "joedoe"
      },
    },
    {
      table_id          = "bar",
      schema            = "<PATH TO THE SCHEMA JSON FILE>",
      time_partitioning = null,
      expiration_time   = 2524604400000, # 2050/01/01
      clustering        = [],
      labels = {
        env      = "devops"
        billable = "true"
        owner    = "joedoe"
      },
    }
  ]

  views = [
    {
      view_id        = "barview",
      use_legacy_sql = false,
      query          = <<EOF
      SELECT
        column_a,
        column_b
      FROM
        `project_id.dataset_id.table_id`
      WHERE
        approved_user = SESSION_USER()
      EOF
      labels = {
        env      = "devops"
        billable = "true"
        owner    = "joedoe"
      }
    }
  ]

  dataset_labels = {
    env      = "dev"
    billable = "true"
  }
}
```
Functional examples are included in the examples directory.
The `tables` variable should be provided as a list of objects with the following keys:
```hcl
{
  table_id = "some_id"             # Unique table id (will be used as the ID and friendly name for the table).
  schema   = "path/to/schema.json" # Path to the schema JSON file.
  time_partitioning = {            # Set to `null` to omit the partitioning configuration for the table.
    type                     = "DAY", # The only type supported is DAY, which generates one partition per day based on data loading time.
    field                    = null,  # The field used to determine how to create a time-based partition. If time-based partitioning is enabled without this value, the table is partitioned based on the load time. Set to `null` to omit.
    require_partition_filter = false, # If set to true, queries over this table must specify a partition filter that can be used for partition elimination. Set to `null` to omit.
    expiration_ms            = null,  # Number of milliseconds for which to keep the storage for a partition.
  },
  clustering      = ["fullVisitorId", "visitId"] # Column names to use for data clustering. Up to four top-level columns are allowed, specified in descending priority order. Partitioning must be configured in order to use clustering.
  expiration_time = 2524604400000                # The time when this table expires, in milliseconds since the epoch. If set to `null`, the table persists indefinitely.
  labels = {                                     # A mapping of labels to assign to the table.
    env      = "dev"
    billable = "true"
  }
}
```
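The file referenced by `schema` is a standard BigQuery table-schema JSON document: a JSON array of field definitions. A minimal sketch (the column names and descriptions here are hypothetical, matching the clustering columns used in the usage example):

```json
[
  {
    "name": "fullVisitorId",
    "type": "STRING",
    "mode": "REQUIRED",
    "description": "Unique visitor identifier"
  },
  {
    "name": "visitId",
    "type": "INTEGER",
    "mode": "NULLABLE",
    "description": "Identifier for this session"
  }
]
```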
The `views` variable should be provided as a list of objects with the following keys:

```hcl
{
  view_id        = "some_id"                                                  # Unique view id. Will also be used as the friendly name.
  query          = "SELECT user_id, name FROM `project_id.dataset_id.table`"  # The SELECT query that defines the view. Referenced tables must be created beforehand.
  use_legacy_sql = false                                                      # Whether to use legacy SQL or standard SQL.
  labels = {                                                                  # A mapping of labels to assign to the view.
    env      = "dev"
    billable = "true"
  }
}
```
A detailed example with authorized views can be found here.
This module provisions a dataset and a list of tables with associated JSON schemas and views from queries.
The module accepts the following input variables:

| Name | Description | Type | Default | Required |
|---|---|---|---|:---:|
| access | An array of objects that define dataset access for one or more entities. | `any` | `[` | no |
| dataset\_id | Unique ID for the dataset being provisioned. | `string` | n/a | yes |
| dataset\_labels | Key-value pairs in a map for dataset labels. | `map(string)` | `{}` | no |
| dataset\_name | Friendly name for the dataset being provisioned. | `string` | `null` | no |
| default\_table\_expiration\_ms | TTL of tables using the dataset, in milliseconds. | `number` | `null` | no |
| delete\_contents\_on\_destroy | (Optional) If set to true, delete all the tables in the dataset when destroying the resource; otherwise, destroying the resource will fail if tables are present. | `bool` | `null` | no |
| description | Dataset description. | `string` | `null` | no |
| encryption\_key | Default encryption key to apply to the dataset. Defaults to null (Google-managed). | `string` | `null` | no |
| external\_tables | A list of objects which include table\_id, expiration\_time, external\_data\_configuration, and labels. | `list(object({` | `[]` | no |
| location | The regional location for the dataset. Only US and EU are allowed in the module. | `string` | `"US"` | no |
| project\_id | Project where the dataset and table are created. | `string` | n/a | yes |
| tables | A list of objects which include table\_id, schema, clustering, time\_partitioning, expiration\_time and labels. | `list(object({` | `[]` | no |
| views | A list of objects which include table\_id (used as the view id) and the view query. | `list(object({` | `[]` | no |
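As a sketch of the `access` variable, an entry granting a group read access on the dataset might look like the following (the group address is a hypothetical placeholder; the keys mirror the access block of the underlying `google_bigquery_dataset` resource):

```hcl
access = [
  {
    role           = "roles/bigquery.dataViewer"
    group_by_email = "analysts@example.com"
  }
]
```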
The module provides the following outputs:

| Name | Description |
|---|---|
| bigquery\_dataset | BigQuery dataset resource. |
| bigquery\_external\_tables | Map of BigQuery external table resources being provisioned. |
| bigquery\_tables | Map of BigQuery table resources being provisioned. |
| bigquery\_views | Map of BigQuery view resources being provisioned. |
| external\_table\_ids | Unique IDs for any external tables being provisioned. |
| external\_table\_names | Friendly names for any external tables being provisioned. |
| project | Project where the dataset and tables are created. |
| table\_ids | Unique IDs for the tables being provisioned. |
| table\_names | Friendly names for the tables being provisioned. |
| view\_ids | Unique IDs for the views being provisioned. |
| view\_names | Friendly names for the views being provisioned. |
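These outputs can be consumed from the calling configuration. For example, assuming the module block is named `bigquery` as in the usage example above:

```hcl
# Re-export selected module outputs from the root module.
output "dataset" {
  description = "The provisioned BigQuery dataset resource"
  value       = module.bigquery.bigquery_dataset
}

output "table_ids" {
  description = "Unique IDs of the provisioned tables"
  value       = module.bigquery.table_ids
}
```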
These sections describe requirements for using this module.
The following dependencies must be available:
- Terraform v0.12
- Terraform Provider for GCP plugin v3
A service account with the following roles must be used to provision the resources of this module:
- BigQuery Data Owner: `roles/bigquery.dataOwner`
The Project Factory module and the IAM module may be used in combination to provision a service account with the necessary roles applied.
A helper script for configuring a service account is located at `./helpers/setup-sa.sh`.
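If you prefer plain Terraform over the helper modules or script, a minimal sketch of provisioning such a service account might look like the following (the project id and account id are placeholders):

```hcl
resource "google_service_account" "bigquery_sa" {
  project      = "<PROJECT ID>"
  account_id   = "bigquery-provisioner"
  display_name = "BigQuery module provisioner"
}

# Grant the role required by this module.
resource "google_project_iam_member" "bigquery_data_owner" {
  project = "<PROJECT ID>"
  role    = "roles/bigquery.dataOwner"
  member  = "serviceAccount:${google_service_account.bigquery_sa.email}"
}
```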
A project with the following APIs enabled must be used to host the resources of this module:
- BigQuery JSON API: `bigquery-json.googleapis.com`
The Project Factory module can be used to provision a project with the necessary APIs enabled.
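If you are not using the Project Factory, the API can also be enabled directly in Terraform. A minimal sketch (the project id is a placeholder):

```hcl
# Enable the BigQuery API on the host project.
resource "google_project_service" "bigquery" {
  project = "<PROJECT ID>"
  service = "bigquery-json.googleapis.com"

  # Leave the API enabled when this resource is destroyed.
  disable_on_destroy = false
}
```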
Refer to the contribution guidelines for information on contributing to this module.