A Terraform module which deploys the Snowplow Enrich service on an Azure Virtual Machine Scale Set (VMSS).
This module by default collects and forwards telemetry information to Snowplow to understand how our applications are being used. No identifying information about your sub-account or account fingerprints is ever forwarded to us. It is very simple information about which modules and applications are deployed and active.
If you wish to subscribe to our mailing list for updates to these modules or security advisories, please set the `user_provided_id` variable to include a valid email address at which we can reach you.
To disable telemetry, simply set the variable `telemetry_enabled = false`.
For details on what information is collected please see this module: https://github.com/snowplow-devops/terraform-snowplow-telemetry
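As a minimal sketch of the two telemetry settings described above, the block below sets `user_provided_id` and shows, commented out, how `telemetry_enabled` would be switched off. The email address is a placeholder, and the remaining arguments are assumed to be wired up exactly as in the usage example in this README.

```hcl
module "enrich_event_hub" {
  source = "snowplow-devops/enrich-event-hub-vmss/azurerm"

  accept_limited_use_license = true

  name                = "enrich-server"
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id_for_servers

  raw_topic_name            = module.raw_eh_topic.name
  raw_topic_kafka_password  = module.raw_eh_topic.read_only_primary_connection_string
  good_topic_name           = module.enriched_eh_topic.name
  good_topic_kafka_password = module.enriched_eh_topic.read_write_primary_connection_string
  bad_topic_name            = module.bad_1_eh_topic.name
  bad_topic_kafka_password  = module.bad_1_eh_topic.read_write_primary_connection_string

  eh_namespace_name = module.pipeline_eh_namespace.name
  kafka_brokers     = module.pipeline_eh_namespace.broker

  ssh_public_key = "your-public-key-here"

  # Opt in to module updates and security advisories (placeholder address)
  user_provided_id = "ops@example.com"

  # Or opt out of telemetry entirely by uncommenting the next line
  # telemetry_enabled = false
}
```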
Enrich takes data from a raw input topic and pushes validated data to the enriched topic and failed data to the bad topic. As part of this validation process we leverage Iglu, Snowplow's schema repository and the home for event and entity definitions. If you are using custom events that you have defined yourself, you will need to link your own Iglu Registries to this module so that they can be discovered correctly.
By default this module enables 5 enrichments, which you can find in the `templates/enrichments` directory of this module.
module "pipeline_eh_namespace" {
  source  = "snowplow-devops/event-hub-namespace/azurerm"
  version = "0.1.1"

  name                = "snowplow-pipeline"
  resource_group_name = var.resource_group_name
}

module "raw_eh_topic" {
  source  = "snowplow-devops/event-hub/azurerm"
  version = "0.1.1"

  name                = "raw-topic"
  namespace_name      = module.pipeline_eh_namespace.name
  resource_group_name = var.resource_group_name
}

module "bad_1_eh_topic" {
  source  = "snowplow-devops/event-hub/azurerm"
  version = "0.1.1"

  name                = "bad-1-topic"
  namespace_name      = module.pipeline_eh_namespace.name
  resource_group_name = var.resource_group_name
}

module "enriched_eh_topic" {
  source  = "snowplow-devops/event-hub/azurerm"
  version = "0.1.1"

  name                = "enriched-topic"
  namespace_name      = module.pipeline_eh_namespace.name
  resource_group_name = var.resource_group_name
}

module "enrich_event_hub" {
  source = "snowplow-devops/enrich-event-hub-vmss/azurerm"

  accept_limited_use_license = true

  name                = "enrich-server"
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id_for_servers

  raw_topic_name            = module.raw_eh_topic.name
  raw_topic_kafka_password  = module.raw_eh_topic.read_only_primary_connection_string
  good_topic_name           = module.enriched_eh_topic.name
  good_topic_kafka_password = module.enriched_eh_topic.read_write_primary_connection_string
  bad_topic_name            = module.bad_1_eh_topic.name
  bad_topic_kafka_password  = module.bad_1_eh_topic.read_write_primary_connection_string

  eh_namespace_name = module.pipeline_eh_namespace.name
  kafka_brokers     = module.pipeline_eh_namespace.broker

  ssh_public_key   = "your-public-key-here"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]
}
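
Because `custom_iglu_resolvers` is a list, more than one registry can be linked in. The sketch below reuses the field names from the usage example above. The second entry, its endpoint, the `com.acme` vendor prefix and the `secondary_iglu_api_key` variable are all hypothetical and only illustrate the shape of the list.

```hcl
locals {
  # Field names follow the usage example; both endpoints are placeholders.
  iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    },
    {
      # Hypothetical second registry, shown only to illustrate multiple entries
      name            = "Secondary Iglu Server"
      priority        = 1
      uri             = "http://your-second-iglu-server-endpoint/api"
      api_key         = var.secondary_iglu_api_key # hypothetical variable
      vendor_prefixes = ["com.acme"]
    }
  ]
}
```

The list would then be passed to the module as `custom_iglu_resolvers = local.iglu_resolvers`. How `priority` and `vendor_prefixes` influence lookup order is decided by the Iglu resolver itself, so check the Iglu documentation before relying on a particular ordering.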
To define your own enrichment configuration, provide a JSON-encoded string of the enrichment in the corresponding input variable (for example `enrichment_anon_ip`).
locals {
  enrichment_anon_ip = jsonencode(<<EOF
{
  "schema": "iglu:com.snowplowanalytics.snowplow/anon_ip/jsonschema/1-0-1",
  "data": {
    "name": "anon_ip",
    "vendor": "com.snowplowanalytics.snowplow",
    "enabled": true,
    "parameters": {
      "anonOctets": 1,
      "anonSegments": 1
    }
  }
}
EOF
  )
}

module "enrich_event_hub" {
  source = "snowplow-devops/enrich-event-hub-vmss/azurerm"

  accept_limited_use_license = true

  name                = "enrich-server"
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id_for_servers

  raw_topic_name            = module.raw_eh_topic.name
  raw_topic_kafka_password  = module.raw_eh_topic.read_only_primary_connection_string
  good_topic_name           = module.enriched_eh_topic.name
  good_topic_kafka_password = module.enriched_eh_topic.read_write_primary_connection_string
  bad_topic_name            = module.bad_1_eh_topic.name
  bad_topic_kafka_password  = module.bad_1_eh_topic.read_write_primary_connection_string

  eh_namespace_name = module.pipeline_eh_namespace.name
  kafka_brokers     = module.pipeline_eh_namespace.broker

  ssh_public_key   = "your-public-key-here"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]

  # Enable this enrichment
  enrichment_anon_ip = local.enrichment_anon_ip
}
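
If the enrichment JSON is kept in its own file rather than inline, the same string can be produced with Terraform's built-in `file()` function. This is a sketch only: the `enrichments/anon_ip.json` path is hypothetical, and it assumes the module treats the value the same way as the heredoc form above, since both yield a plain string before `jsonencode` is applied.

```hcl
locals {
  # Hypothetical path: a file holding the same JSON document as the heredoc example
  enrichment_anon_ip = jsonencode(file("${path.module}/enrichments/anon_ip.json"))
}
```

The local is then passed to the module as before, with `enrichment_anon_ip = local.enrichment_anon_ip`.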
Disabling a default enrichment follows the same strategy as inserting a custom one: supply a configuration with "enabled" set to false. For example, to disable YAUAA you would do the following.
locals {
  enrichment_yauaa = jsonencode(<<EOF
{
  "schema": "iglu:com.snowplowanalytics.snowplow.enrichments/yauaa_enrichment_config/jsonschema/1-0-0",
  "data": {
    "enabled": false,
    "vendor": "com.snowplowanalytics.snowplow.enrichments",
    "name": "yauaa_enrichment_config"
  }
}
EOF
  )
}

module "enrich_event_hub" {
  source = "snowplow-devops/enrich-event-hub-vmss/azurerm"

  accept_limited_use_license = true

  name                = "enrich-server"
  resource_group_name = var.resource_group_name
  subnet_id           = var.subnet_id_for_servers

  raw_topic_name            = module.raw_eh_topic.name
  raw_topic_kafka_password  = module.raw_eh_topic.read_only_primary_connection_string
  good_topic_name           = module.enriched_eh_topic.name
  good_topic_kafka_password = module.enriched_eh_topic.read_write_primary_connection_string
  bad_topic_name            = module.bad_1_eh_topic.name
  bad_topic_kafka_password  = module.bad_1_eh_topic.read_write_primary_connection_string

  eh_namespace_name = module.pipeline_eh_namespace.name
  kafka_brokers     = module.pipeline_eh_namespace.broker

  ssh_public_key   = "your-public-key-here"
  ssh_ip_allowlist = ["0.0.0.0/0"]

  # Linking in the custom Iglu Server here
  custom_iglu_resolvers = [
    {
      name            = "Iglu Server"
      priority        = 0
      uri             = "http://your-iglu-server-endpoint/api"
      api_key         = var.iglu_super_api_key
      vendor_prefixes = []
    }
  ]

  # Disable this enrichment
  enrichment_yauaa_enrichment_config = local.enrichment_yauaa
}
Name | Version |
---|---|
terraform | >= 1.0.0 |
azurerm | >= 3.58.0 |
Name | Version |
---|---|
azurerm | >= 3.58.0 |
Name | Source | Version |
---|---|---|
service | snowplow-devops/service-vmss/azurerm | 0.1.1 |
telemetry | snowplow-devops/telemetry/snowplow | 0.5.0 |
Name | Type |
---|---|
azurerm_eventhub_consumer_group.raw_topic | resource |
azurerm_network_security_group.nsg | resource |
azurerm_network_security_rule.egress_tcp_443 | resource |
azurerm_network_security_rule.egress_tcp_80 | resource |
azurerm_network_security_rule.egress_tcp_custom | resource |
azurerm_network_security_rule.egress_udp_123 | resource |
azurerm_network_security_rule.ingress_tcp_22 | resource |
azurerm_resource_group.rg | data source |
Name | Description | Type | Default | Required |
---|---|---|---|---|
bad_topic_kafka_password | Password for connection to Kafka cluster under PlainLoginModule (note: as default the EventHubs topic connection string for writing is expected) | string | n/a | yes |
bad_topic_name | The name of the bad Kafka topic that enrichment will insert failed data into | string | n/a | yes |
good_topic_kafka_password | Password for connection to Kafka cluster under PlainLoginModule (note: as default the EventHubs topic connection string for writing is expected) | string | n/a | yes |
good_topic_name | The name of the good Kafka topic that enrichment will insert good data into | string | n/a | yes |
kafka_brokers | The brokers to configure for access to the Kafka Cluster (note: as default the EventHubs namespace broker) | string | n/a | yes |
name | A name which will be pre-pended to the resources created | string | n/a | yes |
raw_topic_kafka_password | Password for connection to Kafka cluster under PlainLoginModule (note: as default the EventHubs topic connection string for reading is expected) | string | n/a | yes |
raw_topic_name | The name of the raw Kafka topic that enrichment will pull data from | string | n/a | yes |
resource_group_name | The name of the resource group to deploy the service into | string | n/a | yes |
ssh_public_key | The SSH public key attached for access to the servers | string | n/a | yes |
subnet_id | The subnet id to deploy the service into | string | n/a | yes |
accept_limited_use_license | Acceptance of the SLULA terms (https://docs.snowplow.io/limited-use-license-1.0/) | bool | false | no |
app_version | App version to use. This variable facilitates dev flow, the modules may not work with anything other than the default value. | string | "3.9.0" | no |
assets_update_period | Period after which enrich assets should be checked for updates (e.g. MaxMind DB) | string | "7 days" | no |
associate_public_ip_address | Whether to assign a public ip address to this instance | bool | true | no |
bad_topic_kafka_username | Username for connection to Kafka cluster under PlainLoginModule (default: '$ConnectionString' which is used for EventHubs) | string | "$ConnectionString" | no |
custom_iglu_resolvers | The custom Iglu Resolvers that will be used by Enrichment to resolve and validate events | list(object({ | [] | no |
custom_tcp_egress_port_list | For opening up TCP ports to access other destinations not served over HTTP(s) (e.g. for SQL / API enrichments) | list(object({ | [] | no |
default_iglu_resolvers | The default Iglu Resolvers that will be used by Enrichment to resolve and validate events | list(object({ | [ | no |
eh_namespace_name | The name of the Event Hubs namespace (note: if you are not using EventHubs leave this blank) | string | "" | no |
enrichment_anon_ip | n/a | string | "" | no |
enrichment_api_request_enrichment_config | n/a | string | "" | no |
enrichment_campaign_attribution | n/a | string | "" | no |
enrichment_cookie_extractor_config | n/a | string | "" | no |
enrichment_currency_conversion_config | n/a | string | "" | no |
enrichment_event_fingerprint_config | n/a | string | "" | no |
enrichment_http_header_extractor_config | n/a | string | "" | no |
enrichment_iab_spiders_and_bots_enrichment | Note: Requires paid database to function | string | "" | no |
enrichment_ip_lookups | Note: Requires free or paid subscription to database to function | string | "" | no |
enrichment_javascript_script_config | n/a | string | "" | no |
enrichment_pii_enrichment_config | n/a | string | "" | no |
enrichment_referer_parser | n/a | string | "" | no |
enrichment_sql_query_enrichment_config | n/a | string | "" | no |
enrichment_ua_parser_config | n/a | string | "" | no |
enrichment_weather_enrichment_config | n/a | string | "" | no |
enrichment_yauaa_enrichment_config | n/a | string | "" | no |
good_topic_kafka_username | Username for connection to Kafka cluster under PlainLoginModule (default: '$ConnectionString' which is used for EventHubs) | string | "$ConnectionString" | no |
java_opts | Custom JAVA Options | string | "-XX:InitialRAMPercentage=75 -XX:MaxRAMPercentage=75" | no |
kafka_source | The source providing the Kafka connectivity (def: azure_event_hubs) | string | "azure_event_hubs" | no |
raw_topic_kafka_username | Username for connection to Kafka cluster under PlainLoginModule (default: '$ConnectionString' which is used for EventHubs) | string | "$ConnectionString" | no |
ssh_ip_allowlist | The comma-separated list of CIDR ranges to allow SSH traffic from | list(string) | [ | no |
tags | The tags to append to this resource | map(string) | {} | no |
telemetry_enabled | Whether or not to send telemetry information back to Snowplow Analytics Ltd | bool | true | no |
user_provided_id | An optional unique identifier to identify the telemetry events emitted by this stack | string | "" | no |
vm_instance_count | The instance count to use | number | 1 | no |
vm_sku | The instance type to use | string | "Standard_B2s" | no |
Name | Description |
---|---|
nsg_id | ID of the network security group attached to the Enrich Server nodes |
vmss_id | ID of the VM scale-set |
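
The two outputs above can be re-exported from a root module so that other tooling can read them. This is a minimal sketch that assumes the module was instantiated as `enrich_event_hub`, as in the usage example. The output names `enrich_nsg_id` and `enrich_vmss_id` are arbitrary.

```hcl
# Re-export the module outputs from the root module
output "enrich_nsg_id" {
  description = "ID of the network security group created by the Enrich module"
  value       = module.enrich_event_hub.nsg_id
}

output "enrich_vmss_id" {
  description = "ID of the VM scale-set running Enrich"
  value       = module.enrich_event_hub.vmss_id
}
```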
Copyright 2023-present Snowplow Analytics Ltd.
Licensed under the Snowplow Limited Use License Agreement. (If you are uncertain how it applies to your use case, check our answers to frequently asked questions.)