Update the packaging and deployment process for Dagster and bootstrap dbt project #240
Merged
This addresses the work needed for #233
shaidar reviewed Jun 14, 2022
'pants.backend.python.typecheck.mypy',
'pants.backend.shell',
'pants.backend.shell.lint.shellcheck',
'pants.backend.shell.lint.shfmt',
Might be a good idea to add bandit to the list
blarghmatey force-pushed the elt_refactor branch 2 times, most recently from 66462c6 to 7a4f010 (June 15, 2022 19:42)
This addresses the changes needed for #233
We are working on building out a data platform with the core building block being a data lakehouse that is populated and managed with an ELT workflow. To that end this adds additional directory structures, and renames others, to make it clearer which parts of the code are for which task.
- Rename ol_data_pipelines -> ol_orchestrate to make it clear that this code is relevant to data orchestration workflows
- Create an ol_dbt directory for holding dbt model definitions and associated configurations and workflows
- Add Pants to the repository for handling build and packaging of Dagster orchestration pipelines to simplify build and deployment workflows
In order to reduce the overhead of building and deploying the different pipelines this updates the structure of the repository and build flow. The goal is to have BUILD targets defined that specify the Dagster repository to package, build that into a Python distribution, and install that distribution into a Docker image that will get published for deployment. This also moves to a layout with ops/jobs/graphs as the primary concern so that we can abstract the actual tasks across the different business concerns to improve logic reuse.
As part of the updated build/deployment we want to have separate images for the Dagit and dagster-daemon processes, which are also separate from the user pipeline code, so that they can all be built, deployed, and scaled independently. For the user pipelines we also want to ensure that the dbt project is available in the runtime environment. This does the following:
- Copy all files related to the dbt project into user pipeline images by default
- Create a multi-stage build for Dagit/dagster-daemon to avoid duplicate logic
- Move the Dagster-specific workspace and Dagster yaml files into the `ol_orchestrate` directory
- Move the dbt project files to the proper directory level in the repo
- Add the initial work to package up collections of Dagster pipelines based on the 'repository' as the entry-point for the Python distribution
Start iterating on how to run image builds in Concourse pipelines to ensure that build and push workflows are automated to streamline deployment considerations.
- tag images with `docker_image` to allow for filtering in pants command
- package all targets tagged with `docker_image`
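For anyone wiring this into a Concourse task, here is a rough sketch of how the tag-filtered packaging command could be assembled from a script. It assumes the repository uses the `./pants` launcher script and that the images should be selected with Pants' global `--tag` option and the `package` goal over all targets (`::`). The helper name `build_package_command` and its defaults are hypothetical and not part of this PR.

```python
import shlex
from typing import List


def build_package_command(
    tag: str,
    launcher: str = "./pants",  # assumption: launcher script at the repo root
    spec: str = "::",           # assumption: consider every target in the repo
) -> List[str]:
    """Return the argv that packages only the targets carrying `tag`."""
    if not tag or any(ch.isspace() for ch in tag):
        raise ValueError("tag must be a non-empty string without whitespace")
    # --tag is a global Pants option, so it is placed before the goal name.
    return [launcher, f"--tag={tag}", "package", spec]


if __name__ == "__main__":
    # Print the command rather than running it, so the sketch has no side effects.
    print(shlex.join(build_package_command("docker_image")))
```

Returning a list rather than a shell string means the result can be handed directly to `subprocess.run` without extra quoting, and the tag can be swapped if other kinds of artifacts get their own tag later.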
We are no longer relying on Invoke in this repository and the package script that it was managing is not the way that we will be building and deploying pipelines going forward. This removes the `tasks.py` file that was used for those build steps.
blarghmatey force-pushed the elt_refactor branch from 061d599 to f914248 (June 15, 2022 20:08)
…latest and the dagster version.
… the package name
Having too many types for disparate use cases encourages mixing of concerns. This splits the types into more bounded domains.
shaidar reviewed Jun 29, 2022
The PEX binaries that are created for running the dagit and dagster-daemon processes were being defined separately from the locations where they are used. This consolidates them into the same BUILD file for more clarity.
👍
As part of our effort to bootstrap our data platform we need to have a solid foundation to build from. This restructures the repository to be aligned with the purpose of being the central location for all data platform related code.