Release/snowplow normalize/0.3.3 (#37)
emielver authored Oct 3, 2023
1 parent 37acc17 commit 7bc8a05
Showing 12 changed files with 59 additions and 118 deletions.
2 changes: 1 addition & 1 deletion .github/workflows/integration_tests.yml
@@ -115,7 +115,7 @@ jobs:
dbt run-operation post_ci_cleanup --target ${{ matrix.warehouse }}
- name: Run tests
-run: ./.scripts/integration_test.sh -d ${{ matrix.warehouse }}
+run: ./.scripts/integration_tests.sh -d ${{ matrix.warehouse }}

- name: "Post-test: Drop ci schemas"
run: |
19 changes: 15 additions & 4 deletions CHANGELOG
@@ -1,3 +1,14 @@
Snowplow Normalize 0.3.3 (2023-09-29)
---------------------------------------
## Summary
- Include the new base macro functionality from utils in the package
- Allow users to specify the timestamp used to process events (from the default of `collector_tstamp`)
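The new timestamp variable can be overridden in your project's `dbt_project.yml`. A minimal sketch (the `load_tstamp` value is only an illustrative alternative, not a recommendation):

```yaml
# dbt_project.yml -- illustrative override of the processing timestamp
vars:
  snowplow_normalize:
    # Defaults to 'collector_tstamp'; swap in another timestamp column from your events table
    snowplow__session_timestamp: 'load_tstamp'
```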

## Under the hood
- Simplify the model architecture

## Upgrading
Bump the snowplow-normalize version in your `packages.yml` file.
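For example, your `packages.yml` entry would look something like this (the version range shown is an assumption; pin as your project requires):

```yaml
# packages.yml
packages:
  - package: snowplow/snowplow_normalize
    version: [">=0.3.3", "<0.4.0"]
```

Run `dbt deps` afterwards to pull in the new version.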

Snowplow Normalize 0.3.2 (2023-09-12)
---------------------------------------
## Summary
@@ -19,11 +30,11 @@ To upgrade the package, bump the version number in the `packages.yml` file in yo
Snowplow Normalize 0.3.0 (2023-03-28)
---------------------------------------
## Summary
This version migrates our models away from the `snowplow_incremental_materialization` and instead moves to using the built-in `incremental` with an optimization applied on top.

## 🚨 Breaking Changes 🚨
### Changes to materialization
To take advantage of the optimization we apply to the `incremental` materialization, users will need to add the following to their `dbt_project.yml`:
```yaml
# dbt_project.yml
...
@@ -53,7 +64,7 @@ This release allows users to disable the days late data filter to enable normali
- Allow disabling of days late filter by setting `snowplow__days_late_allowed` to `-1` (#28)
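A minimal sketch of disabling the filter in `dbt_project.yml` (variable name from the changelog; scoping under the package name is the usual dbt convention):

```yaml
# dbt_project.yml
vars:
  snowplow_normalize:
    # -1 disables the days-late filter so late-arriving events are still processed
    snowplow__days_late_allowed: -1
```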

## Upgrading
To upgrade the package, bump the version number in the `packages.yml` file in your project.

Snowplow Normalize 0.2.2 (2023-03-13)
---------------------------------------
@@ -122,7 +133,7 @@ Once you have upgraded your config file, the easiest way to ensure your models m
- Change the `unique_key` in the config section to `unique_id`
- Add a line between the `event_table_name` and `from` lines for each select statement; `, event_id||'-'||'<that_event_table_name>' as unique_id`, with the event table name for that select block.
- For your users table:
- Add 3 new values to the start of the macro call, `'user_id','',''`, before the `user_cols` argument.

### Upgrade your filtered events table
If you use the master filtered events table, you will need to add a new column for the latest version to work. If you have not processed much data yet it may be easier to simply re-run the package from scratch using `dbt run --full-refresh --vars 'snowplow__allow_refresh: true'`, alternatively run the following in your warehouse, replacing the schema/dataset/warehouse and table name for your table:
12 changes: 4 additions & 8 deletions dbt_project.yml
@@ -1,6 +1,6 @@

name: 'snowplow_normalize'
-version: '0.3.2'
+version: '0.3.3'
config-version: 2

require-dbt-version: [">=1.4.0", "<2.0.0"]
@@ -38,6 +38,7 @@ vars:
snowplow__query_tag: "snowplow_dbt"
snowplow__dev_target_name: 'dev'
snowplow__allow_refresh: false
+snowplow__session_timestamp: 'collector_tstamp'
# Variables - Databricks Only
# Add the following variable to your dbt project's dbt_project.yml file
# Depending on the use case it should either be the catalog (for Unity Catalog users from databricks connector 1.1.1 onwards) or the same value as your snowplow__atomic_schema (unless changed it should be 'atomic')
@@ -51,7 +52,7 @@ on-run-start:

# Update manifest table with last event consumed per successfully executed node/model
on-run-end:
-- "{{ snowplow_utils.snowplow_incremental_post_hook('snowplow_normalize') }}"
+- "{{ snowplow_utils.snowplow_incremental_post_hook('snowplow_normalize', 'snowplow_normalize_incremental_manifest', 'snowplow_normalize_base_events_this_run', var('snowplow__session_timestamp')) }}"


# Tag 'snowplow_normalize_incremental' allows snowplow_incremental_post_hook to identify Snowplow models
@@ -67,9 +68,4 @@ models:
scratch:
+schema: "scratch"
+tags: "scratch"
bigquery:
enabled: "{{ target.type == 'bigquery' | as_bool() }}"
databricks:
enabled: "{{ target.type in ['databricks', 'spark'] | as_bool() }}"
snowflake:
enabled: "{{ target.type == 'snowflake' | as_bool() }}"
+enabled: "{{ target.type in ['bigquery', 'databricks', 'spark', 'snowflake'] | as_bool() }}"
2 changes: 1 addition & 1 deletion integration_tests/dbt_project.yml
@@ -1,5 +1,5 @@
name: 'snowplow_normalize_integration_tests'
-version: '0.3.2'
+version: '0.3.3'
config-version: 2

profile: 'integration_tests'
11 changes: 1 addition & 10 deletions models/base/manifest/snowplow_normalize_incremental_manifest.sql
@@ -8,13 +8,4 @@
-- Boilerplate to generate table.
-- Table updated as part of end-run hook

-with prep as (
-select
-cast(null as {{ snowplow_utils.type_max_string() }}) model,
-cast('1970-01-01' as {{ type_timestamp() }}) as last_success
-)
-
-select *
-
-from prep
-where false
+{{ snowplow_utils.base_create_snowplow_incremental_manifest() }}

This file was deleted.

This file was deleted.

This file was deleted.

30 changes: 30 additions & 0 deletions models/base/scratch/snowplow_normalize_base_events_this_run.sql
@@ -0,0 +1,30 @@
{{
config(
tags=["this_run"],
sql_header=snowplow_utils.set_query_tag(var('snowplow__query_tag', 'snowplow_dbt'))
)
}}

{%- set lower_limit, upper_limit, session_start_limit = snowplow_utils.return_base_new_event_limits(ref('snowplow_normalize_base_new_event_limits')) %}

select
a.*

from {{ var('snowplow__events') }} as a

where
{# dvce_sent_tstamp is an optional field and not all trackers/webhooks populate it, this means this filter needs to be optional #}
{% if var("snowplow__days_late_allowed") == -1 %}
1 = 1
{% else %}
a.dvce_sent_tstamp <= {{ snowplow_utils.timestamp_add('day', var("snowplow__days_late_allowed", 3), 'a.dvce_created_tstamp') }}
{% endif %}
and a.{{ var('snowplow__session_timestamp', 'collector_tstamp') }} >= {{ lower_limit }}
and a.{{ var('snowplow__session_timestamp', 'collector_tstamp') }} <= {{ upper_limit }}
{% if var('snowplow__derived_tstamp_partitioned', true) and target.type == 'bigquery' | as_bool() %}
and a.derived_tstamp >= {{ snowplow_utils.timestamp_add('hour', -1, lower_limit) }}
and a.derived_tstamp <= {{ upper_limit }}
{% endif %}
and {{ snowplow_utils.app_id_filter(var("snowplow__app_id",[])) }}

qualify row_number() over (partition by a.event_id order by a.collector_tstamp{% if target.type in ['databricks', 'spark'] -%}, a.etl_tstamp {%- endif %}) = 1
11 changes: 6 additions & 5 deletions models/base/scratch/snowplow_normalize_base_new_event_limits.sql
@@ -10,14 +10,15 @@
{% set min_last_success,
max_last_success,
models_matched_from_manifest,
has_matched_all_models = snowplow_utils.get_incremental_manifest_status(ref('snowplow_normalize_incremental_manifest'),
models_in_run) -%}


{% set run_limits_query = snowplow_utils.get_run_limits(min_last_success,
max_last_success,
models_matched_from_manifest,
has_matched_all_models,
var("snowplow__start_date","2020-01-01")) -%}


{{ run_limits_query }}
2 changes: 1 addition & 1 deletion packages.yml
@@ -1,3 +1,3 @@
packages:
- package: snowplow/snowplow_utils
-version: [">=0.14.0", "<0.16.0"]
+version: [">=0.15.1", "<0.16.0"]
