chore(pipeline): Make the main DAG run hourly
Having two DAGs run the geocodages model at the same time doesn't
bode well for the incremental models.

Let's switch to running the main DAG hourly. This makes sense because
our pipeline is updated more frequently, and it is by far the simpler
option, hence the one with the fewest surprises.
vperron committed Sep 19, 2024
1 parent 1af0ec7 commit 6a9d8d7
Showing 3 changed files with 3 additions and 35 deletions.
29 changes: 0 additions & 29 deletions pipeline/dags/compute_hourly.py

This file was deleted.

7 changes: 2 additions & 5 deletions pipeline/dags/dag_utils/dbt.py
@@ -48,13 +48,10 @@ def dbt_operator_factory(
     )


-def get_staging_tasks(schedule=None):
+def get_staging_tasks():
     task_list = []

-    for source_id, src_meta in sorted(SOURCES_CONFIGS.items()):
-        if schedule and "schedule" in src_meta and src_meta["schedule"] != schedule:
-            continue
-
+    for source_id in sorted(SOURCES_CONFIGS):
         dbt_source_id = source_id.replace("-", "_")

         stg_selector = f"path:models/staging/sources/**/stg_{dbt_source_id}__*.sql"
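
As an illustration of what this hunk removes, here is a minimal, self-contained sketch of the two loop shapes. The `SOURCES_CONFIGS` contents and the two function names are hypothetical; only the filtering condition and the iteration pattern are taken from the diff.

```python
# Hypothetical stand-in for the real SOURCES_CONFIGS mapping.
SOURCES_CONFIGS = {
    "source-b": {"schedule": "@hourly"},
    "source-a": {},
    "source-c": {"schedule": "@daily"},
}


def staging_ids_before(schedule=None):
    """Old loop: skip sources whose own schedule differs from the requested one."""
    selected = []
    for source_id, src_meta in sorted(SOURCES_CONFIGS.items()):
        if schedule and "schedule" in src_meta and src_meta["schedule"] != schedule:
            continue
        selected.append(source_id.replace("-", "_"))
    return selected


def staging_ids_after():
    """New loop: every source is selected, in sorted order."""
    return [source_id.replace("-", "_") for source_id in sorted(SOURCES_CONFIGS)]
```

When called without a `schedule` argument, the old loop already selected every source, so the two functions agree in that case. What disappears is the per-schedule subset.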
2 changes: 1 addition & 1 deletion pipeline/dags/main.py
@@ -15,7 +15,7 @@
     dag_id="main",
     start_date=pendulum.datetime(2022, 1, 1, tz=date.TIME_ZONE),
     default_args=notify_failure_args(),
-    schedule="0 4 * * *",
+    schedule="@hourly",
     catchup=False,
     concurrency=4,
 ) as dag:
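
To make the schedule change concrete, the sketch below counts how often each schedule fires in one day. It assumes Airflow's `@hourly` preset, which is equivalent to the cron expression `0 * * * *`. The helper `fire_times` is hypothetical and models only these two simple cases, not general cron syntax.

```python
from datetime import datetime, timedelta


def fire_times(day_start, hour=None):
    """Return the fire times within the 24 hours starting at day_start.

    hour=4    models "0 4 * * *" (once a day at 04:00).
    hour=None models "0 * * * *" (every hour at minute 0).
    """
    ticks = [day_start + timedelta(hours=h) for h in range(24)]
    return [t for t in ticks if hour is None or t.hour == hour]


day = datetime(2024, 9, 19)
daily = fire_times(day, hour=4)   # old schedule
hourly = fire_times(day)          # new schedule
print(len(daily), len(hourly))
```

The main DAG therefore goes from one run per day to twenty-four, which is what lets it take over the work of the deleted hourly DAG.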