This repository has been archived by the owner on Jul 13, 2023. It is now read-only.
Hi NYCPlanning! I miss you all! I've been using DBT for my new job and it's amazing! I'm creating this PR just to give you a taste of what things could look like if Pluto adopted DBT.
Things DBT solves in the Pluto context
1. No more `02_build.sh` that manually specifies execution order: DBT infers the order from the dependencies between models.
2. For each table, you also get to see the SQL code that generates it!
3. You can also use the lineage graph to see table lineage.
From a development perspective
1. Instead of a `.env` in each repo, you can use a single `~/.dbt/profiles.yml`! This way, you can switch between projects, or between prod and dev environments, easily from any repo by specifying the `target` -> `dbt run xxxxxx --target devdb-prod`. This is helpful because you can keep the environment setup consistent across team members by maintaining only 1 file instead of 1 file for each repo/project.
2. DBT makes it easy to create a schema, or to create or replace a table / view, when you want to rerun some code. In the Pluto repo, there is a lot of code for `DROP TABLE IF EXISTS` or `CREATE TABLE` / `CREATE SCHEMA`, which adds a lot of bulk. DBT abstracts all that away so you can focus on the business logic -> usually stated in a `SELECT` statement.
3. We tried to start doing this in DevDB: it is recommended to use `SELECT` for business logic because it's more declarative and more transparent compared to `INSERT` or `UPDATE`.
4. Testing is also made easy. Especially in the context of Pluto, we always want to make sure that e.g. there's no duplicated BBL in certain tables vs. another; you can easily do so by using the `dbt_utils` package out of the box. E.g. you could set up the following tests:
   - for column `geo_bbl` of table `stg_geocodes`, check the field is unique and not null
   - for column `borough`, check the field is not null and contains only values in [1, 2, 3, 4, 5]

The `dbt test` command makes it really easy to implement some of the QAQC checks that gave us a lot of headaches.

Not implemented, but might be useful
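Going back to points 1 and 4 of the development list above, here is a rough sketch of what the two config files might look like. Everything not named in this PR description is an assumption: the profile name `pluto`, the `devdb-dev` target, the schema names, the environment variable names, and the use of the Postgres adapter are all hypothetical placeholders.

```yaml
# ~/.dbt/profiles.yml (sketch only; names and connection details are hypothetical)
pluto:                  # assumed profile name; must match `profile:` in dbt_project.yml
  target: devdb-dev     # default target used when --target is not passed
  outputs:
    devdb-dev:          # hypothetical dev target
      type: postgres    # assumes the Postgres adapter
      host: localhost
      port: 5432
      user: "{{ env_var('DEVDB_USER') }}"          # hypothetical env var
      password: "{{ env_var('DEVDB_PASSWORD') }}"  # hypothetical env var
      dbname: devdb
      schema: pluto_dev
      threads: 4
    devdb-prod:         # target name taken from the `dbt run` example above
      type: postgres
      host: "{{ env_var('DEVDB_PROD_HOST') }}"     # hypothetical env var
      port: 5432
      user: "{{ env_var('DEVDB_USER') }}"
      password: "{{ env_var('DEVDB_PASSWORD') }}"
      dbname: devdb
      schema: pluto
      threads: 4
```

With a file like this, `dbt run --target devdb-prod` would pick the second output, and leaving out `--target` would fall back to `devdb-dev`.

The tests listed in point 4 could be declared roughly like this in a model properties file (the path `models/schema.yml` is an assumption). Note that `unique`, `not_null` and `accepted_values` are generic tests that ship with DBT itself; `dbt_utils` adds further ones on top. `quote: false` assumes `borough` is stored as an integer.

```yaml
# models/schema.yml (sketch only; file location is an assumption)
version: 2

models:
  - name: stg_geocodes
    columns:
      - name: geo_bbl
        tests:
          - unique      # no duplicated BBL
          - not_null
      - name: borough
        tests:
          - not_null
          - accepted_values:
              values: [1, 2, 3, 4, 5]
              quote: false   # compare as numbers, not strings (assumes an integer column)
```

Running `dbt test` would then execute each of these checks against the built table.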
Good luck! Lemme know if you have questions! I'm always on GitHub! Say hi to everyone for me, thanks!