Terraform module which creates Glue Job resources on AWS.
module "glue_job" {
  source = "vitalibo/glue-job/aws"

  organization    = var.organization
  environment     = var.environment
  name            = "data-cleaning"
  script_location = "s3://${var.bucket}/${var.environment}/driver.py"
  glue_version    = "3.0"
  role_arn        = aws_iam_role.glue_job.arn

  timeout           = 60
  number_of_workers = 6

  job_language                 = "python"
  extra_py_files               = ["s3://${var.bucket}/${var.environment}/sources.zip"]
  job_bookmark_option          = "job-bookmark-enable"
  enable_glue_datacatalog      = true
  enable_metrics               = true
  enable_continuous_log_filter = false

  tags = {
    Maintainer = "Vitaliy Boyarsky"
  }
}
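The example above refers to several values it does not define: `var.organization`, `var.environment`, `var.bucket` and `aws_iam_role.glue_job`. A minimal sketch of those supporting declarations could look like the following. The variable types, the role name and the use of the AWS managed `AWSGlueServiceRole` policy are assumptions made for illustration and are not part of this module; in practice the role will usually also need read access to the script bucket.

```hcl
# Input variables used by the example (types are assumed).
variable "organization" {
  type = string
}

variable "environment" {
  type = string
}

variable "bucket" {
  type = string
}

# Trust policy that lets the AWS Glue service assume the role.
data "aws_iam_policy_document" "glue_assume_role" {
  statement {
    actions = ["sts:AssumeRole"]

    principals {
      type        = "Service"
      identifiers = ["glue.amazonaws.com"]
    }
  }
}

resource "aws_iam_role" "glue_job" {
  name               = "${var.environment}-data-cleaning-glue-job" # hypothetical name
  assume_role_policy = data.aws_iam_policy_document.glue_assume_role.json
}

# AWS managed policy with the baseline permissions Glue jobs need.
resource "aws_iam_role_policy_attachment" "glue_service" {
  role       = aws_iam_role.glue_job.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AWSGlueServiceRole"
}
```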
The following arguments are supported:
organization - (Optional) Organization abbreviation that is prefixed to resource names.
environment - (Required) Environment name.
name - (Required) Name used to identify resources.
tags - (Optional) Key-value map of resource tags.
script_location - (Required) S3 path to the script that the job executes.
python_version - (Optional) Python version used to execute a Python shell job.
connections - (Optional) List of connections used for this job.
description - (Optional) Description of the job.
max_concurrent_runs - (Optional) Maximum number of concurrent runs allowed for the job.
glue_version - (Optional) Version of AWS Glue to use.
max_retries - (Optional) Maximum number of times to retry the job if it fails.
notify_delay_after - (Optional) Number of minutes to wait after a job run starts before sending a job run delay notification.
role_arn - (Optional) ARN of the IAM role associated with this job.
timeout - (Optional) Job timeout in minutes.
worker_type - (Optional) Type of predefined worker that is allocated when a job runs.
number_of_workers - (Optional) Number of workers of the defined worker_type that are allocated when a job runs.
security_configuration - (Optional) Name of the security configuration to associate with the job.
create_security_configuration - (Optional) Whether to create an AWS Glue security configuration associated with the job.
security_configuration_cloudwatch_encryption - (Optional) A cloudwatch_encryption block as described below, which contains encryption configuration for CloudWatch.
security_configuration_job_bookmarks_encryption - (Optional) A job_bookmarks_encryption block as described below, which contains encryption configuration for job bookmarks.
security_configuration_s3_encryption - (Optional) An s3_encryption block as described below, which contains encryption configuration for S3 data.
log_group_retention_in_days - (Optional) Number of days that log events are retained in the Glue job log group.
job_language - (Optional) Script programming language.
class - (Optional) Scala class that serves as the entry point for a Scala script.
extra_py_files - (Optional) Amazon S3 paths to additional Python modules that AWS Glue adds to the Python path before executing the script.
extra_jars - (Optional) Amazon S3 paths to additional Java .jar files that AWS Glue adds to the Java classpath before executing the script.
user_jars_first - (Optional) Prioritizes the customer's extra JAR files in the classpath.
use_postgres_driver - (Optional) Prioritizes the Postgres JDBC driver in the classpath to avoid a conflict with the Amazon Redshift JDBC driver.
extra_files - (Optional) Amazon S3 paths to additional files, such as configuration files, that AWS Glue copies to the working directory of the script before executing it.
job_bookmark_option - (Optional) Controls the behavior of a job bookmark.
temp_dir - (Optional) Amazon S3 path to a bucket that can be used as a temporary directory for the job.
enable_s3_parquet_optimized_committer - (Optional) Enables the EMRFS S3-optimized committer for writing Parquet data into Amazon S3.
enable_rename_algorithm_v2 - (Optional) Sets the EMRFS rename algorithm version to version 2.
enable_glue_datacatalog - (Optional) Enables use of the AWS Glue Data Catalog as an Apache Spark Hive metastore.
enable_metrics - (Optional) Enables the collection of metrics for job profiling for the job run.
enable_continuous_cloudwatch_log - (Optional) Enables real-time continuous logging for AWS Glue jobs.
enable_continuous_log_filter - (Optional) Specifies a standard filter or no filter for a job enabled for continuous logging.
continuous_log_stream_prefix - (Optional) Custom CloudWatch log stream prefix for a job enabled for continuous logging.
continuous_log_conversion_pattern - (Optional) Custom conversion log pattern for a job enabled for continuous logging.
enable_spark_ui - (Optional) Enables the Spark UI to monitor and debug AWS Glue ETL jobs.
spark_event_logs_path - (Optional) Amazon S3 path where Spark event logs are written when the Spark UI monitoring feature is used.
additional_python_modules - (Optional) List of Python modules to install, either to add a new module or to change the version of an existing module.
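As an illustration of how the Scala-related and Spark UI arguments fit together, here is a sketch of a second job definition. It is not taken from the module's documentation: the value `"scala"` for `job_language`, the list type of `extra_jars`, the boolean type of `enable_spark_ui`, and all names and S3 paths are assumptions. It also reuses `var.bucket`, `var.environment` and `aws_iam_role.glue_job` from the usage example.

```hcl
module "glue_job_scala" {
  source = "vitalibo/glue-job/aws"

  environment     = var.environment
  name            = "data-cleaning-scala" # hypothetical job name
  script_location = "s3://${var.bucket}/${var.environment}/Driver.scala" # hypothetical path
  role_arn        = aws_iam_role.glue_job.arn

  # Scala entry point; the accepted job_language value is assumed to be "scala".
  job_language = "scala"
  class        = "com.example.Driver" # hypothetical class name
  extra_jars   = ["s3://${var.bucket}/${var.environment}/sources.jar"] # hypothetical path

  # Spark UI monitoring; event logs are written to the given S3 prefix.
  enable_spark_ui       = true
  spark_event_logs_path = "s3://${var.bucket}/${var.environment}/spark-events/" # hypothetical path
}
```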
The security_configuration_cloudwatch_encryption block supports:
cloudwatch_encryption_mode - (Optional) Encryption mode to use for CloudWatch data.
kms_key_arn - (Optional) Amazon Resource Name (ARN) of the KMS key used to encrypt the data.
The security_configuration_job_bookmarks_encryption block supports:
job_bookmarks_encryption_mode - (Optional) Encryption mode to use for job bookmarks data.
kms_key_arn - (Optional) Amazon Resource Name (ARN) of the KMS key used to encrypt the data.
The security_configuration_s3_encryption block supports:
s3_encryption_mode - (Optional) Encryption mode to use for S3 data.
kms_key_arn - (Optional) Amazon Resource Name (ARN) of the KMS key used to encrypt the data.
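The three encryption arguments can be combined to have the module create a security configuration. The sketch below assumes that each argument is passed as an object with the attributes listed above and that `create_security_configuration` is a boolean; neither is confirmed here. The mode strings are the ones documented for the AWS provider's `aws_glue_security_configuration` resource, and the KMS key, job name and script path are hypothetical.

```hcl
# Hypothetical KMS key shared by all three encryption settings.
resource "aws_kms_key" "glue" {
  description = "Encryption key for the data-cleaning Glue job"
}

module "glue_job_encrypted" {
  source = "vitalibo/glue-job/aws"

  environment     = var.environment
  name            = "data-cleaning-encrypted" # hypothetical job name
  script_location = "s3://${var.bucket}/${var.environment}/driver.py"
  role_arn        = aws_iam_role.glue_job.arn

  # Assumed to be a boolean switch.
  create_security_configuration = true

  # Object syntax is an assumption based on the attribute names listed above.
  security_configuration_cloudwatch_encryption = {
    cloudwatch_encryption_mode = "SSE-KMS"
    kms_key_arn                = aws_kms_key.glue.arn
  }

  security_configuration_job_bookmarks_encryption = {
    job_bookmarks_encryption_mode = "CSE-KMS"
    kms_key_arn                   = aws_kms_key.glue.arn
  }

  security_configuration_s3_encryption = {
    s3_encryption_mode = "SSE-KMS"
    kms_key_arn        = aws_kms_key.glue.arn
  }
}
```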
The following attributes are exported:
job_id - Job name.
job_arn - Amazon Resource Name (ARN) of the Glue job.
job_role_arn - Amazon Resource Name (ARN) specifying the role.
job_security_configuration_id - Glue security configuration name.
job_log_group_arn - Amazon Resource Name (ARN) specifying the log group.
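The exported attributes can be consumed by other resources in the calling configuration. As a sketch, the following schedules the job from the usage example with an `aws_glue_trigger` and re-exports its ARN. Because `job_id` is described above as the job name, it is used for `job_name`; the trigger name and the cron schedule are hypothetical.

```hcl
# Run the job every night at 02:00 UTC (schedule chosen for illustration).
resource "aws_glue_trigger" "nightly" {
  name     = "${var.environment}-data-cleaning-nightly" # hypothetical name
  type     = "SCHEDULED"
  schedule = "cron(0 2 * * ? *)"

  actions {
    job_name = module.glue_job.job_id
  }
}

# Expose the job ARN to callers of this configuration.
output "glue_job_arn" {
  value = module.glue_job.job_arn
}
```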