title | has_children | nav_order | nav_exclude |
---|---|---|---|
Infrastructure Catalog | true | 3 | false |
The Infrastructure Catalog contains ready-to-deploy Terraform modules for a variety of production data project use cases and POCs. For information about the technical building blocks used in these modules, see the catalog components index.
- Azure Catalog
  - (Coming soon)
- GCP Catalog
  - (Coming soon)
Airflow is an open source platform to programmatically author, schedule and monitor workflows. More information here: airflow.apache.org
The `bastion-host` module deploys an ECS-backed container which can be used to remotely test or develop using the native cloud environment.
Applicable use cases include:
- Debugging network firewall and routing rules
- Debugging components which can only be run from whitelisted IP ranges
- Offloading heavy processing from the developer's local laptop
- Mitigating network reliability issues when working from WiFi or home networks
This data lake implementation creates three buckets, one each for data, logging, and metadata. The data lake also supports Lambda functions which can trigger automatically when new content is added.
- Designed to be used in combination with the `aws/data-lake-users` module.
- To add SFTP protocol support, combine this module with the `aws/sftp` module.
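The automatic trigger described above is typically built on S3 event notifications. The following is a minimal sketch of that general pattern using standard AWS provider resources; it is not necessarily how the data lake module implements it. The variable names and resource labels are hypothetical and chosen only for illustration.

```hcl
# Sketch only: variable names and resource labels are hypothetical.
variable "data_bucket_name" {
  description = "Name of an existing S3 data bucket (assumed input)."
  type        = string
}

variable "data_bucket_arn" {
  description = "ARN of the same S3 data bucket (assumed input)."
  type        = string
}

variable "lambda_function_arn" {
  description = "ARN of an existing Lambda function to invoke (assumed input)."
  type        = string
}

# Allow S3 to invoke the Lambda function on behalf of the bucket.
resource "aws_lambda_permission" "allow_s3" {
  statement_id  = "AllowExecutionFromS3"
  action        = "lambda:InvokeFunction"
  function_name = var.lambda_function_arn
  principal     = "s3.amazonaws.com"
  source_arn    = var.data_bucket_arn
}

# Invoke the function whenever a new object is created in the bucket.
resource "aws_s3_bucket_notification" "on_new_object" {
  bucket = var.data_bucket_name

  lambda_function {
    lambda_function_arn = var.lambda_function_arn
    events              = ["s3:ObjectCreated:*"]
  }

  # The permission must exist before S3 will accept the notification.
  depends_on = [aws_lambda_permission.allow_s3]
}
```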
Automates the management of users and groups in an S3 data lake.
- Designed to be used in combination with the `aws/data-lake` module.
dbt (Data Build Tool) is a CI/CD and DevOps-friendly platform for automating data transformations. More info at www.getdbt.com.
The environment module sets up common infrastructure like VPCs and network subnets. The `environment` output from this module is designed to be passed easily to downstream modules, streamlining the reuse of these core components.
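As a rough illustration of how the `environment` output might be handed to a downstream module, the sketch below wires an environment module into a data lake module. The module labels, `source` paths, and the `name_prefix` and `environment` input names are assumptions made for illustration; check each module's own documentation for its actual inputs.

```hcl
# Sketch only: source paths and input names are assumed, not taken from the module docs.
module "env" {
  source      = "../../catalog/aws/environment" # hypothetical path
  name_prefix = "demo-"                         # hypothetical input
}

module "data_lake" {
  source      = "../../catalog/aws/data-lake" # hypothetical path
  name_prefix = "demo-"                       # hypothetical input

  # Reuse the shared VPC and subnet details created by the environment module.
  environment = module.env.environment # hypothetical input name on the downstream module
}
```

Passing the network details as a single output value keeps downstream module blocks short and avoids repeating VPC and subnet IDs in every module call.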
This module automates MLOps tasks associated with training machine learning models.
The module uses Step Functions and Lambda functions. The state machine runs hyperparameter tuning, training, and deployment as needed. Supported deployment options are SageMaker endpoints and/or batch inference.
Deploys a MySQL server running on RDS.
- NOTE: Requires the AWS policy 'AmazonRDSFullAccess' on the Terraform account.
Deploys a Postgres server running on RDS.
- NOTE: Requires the AWS policy 'AmazonRDSFullAccess' on the Terraform account.
Redshift is an AWS database platform which applies MPP (Massively Parallel Processing) principles to big data workloads in the cloud.
Automates the management of the AWS Transfer Service, which provides an SFTP interface on top of existing S3 storage resources.
- Designed to be used in combination with the `aws/data-lake` and `aws/sftp-users` modules.
Automates the management of SFTP user accounts on the AWS Transfer Service. AWS Transfer Service provides an SFTP interface on top of existing S3 storage resources.
- Designed to be used in combination with the `aws/sftp` module.
The Singer Taps platform is the open-source stack which powers the Stitch EL platform. For more information, see singer.io.
This module securely deploys one or more Tableau Servers, which can then be used to host reports in production or POC environments. The module supports both the Linux and Windows versions of the Tableau Server software.
NOTE: This documentation was auto-generated using `terraform-docs`. Please do not attempt to manually update this file.