Skip to content

Latest commit

 

History

History
178 lines (132 loc) · 6.24 KB

part05-create-fact.md

File metadata and controls

178 lines (132 loc) · 6.24 KB

Part 5: Create the fact table

After we have created all required dimension tables, we can now create the fact table for fct_sales.

Step 1: Create model files

Let’s create the new dbt model files that will contain our transformation code. Under adventureworks/models/marts, create two files:

  • fct_sales.sql : This file will contain our SQL transformation code.
  • fct_sales.yml : This file will contain our documentation and tests for fct_sales .
adventureworks/models/
└── marts
    ├── fct_sales.sql
    ├── fct_sales.yml

Step 2: Fetch data from the upstream tables

To answer the business questions, we need columns from both salesorderheader and salesorderdetail. Let’s reflect that in fct_sales.sql :

with stg_salesorderheader as (
    select
        salesorderid,
        customerid,
        creditcardid,
        shiptoaddressid,
        status as order_status,
        cast(orderdate as date) as orderdate
    from {{ ref('salesorderheader') }}
),

stg_salesorderdetail as (
    select
        salesorderid,
        salesorderdetailid,
        productid,
        orderqty,
        unitprice,
        unitprice * orderqty as revenue
    from {{ ref('salesorderdetail') }}
)

... 

Step 3: Perform joins

The grain of the fct_sales table is one record in the SalesOrderDetail table, which describes the quantity of a product within a SalesOrderHeader. So we perform a join between salesorderheader and salesorderdetail to achieve that grain.

... 

select
    ... 
from stg_salesorderdetail
inner join stg_salesorderheader on stg_salesorderdetail.salesorderid = stg_salesorderheader.salesorderid

Step 4: Create the surrogate key

Next, we create the surrogate key to uniquely identify each row in the fact table. Each row in the fct_sales table can be uniquely identified by the salesorderid and the salesorderdetailid which is why we use both columns in the generate_surrogate_key() macro.

... 

select
    {{ dbt_utils.generate_surrogate_key(['stg_salesorderdetail.salesorderid', 'salesorderdetailid']) }} as sales_key,
		... 
from stg_salesorderdetail
inner join stg_salesorderheader on stg_salesorderdetail.salesorderid = stg_salesorderheader.salesorderid

Step 5: Select fact table columns

You can now select the fact table columns that will help us answer the business questions identified earlier. We want to be able to calculate the amount of revenue, and therefore we include a column revenue per sales order detail which is calculated by unitprice * orderqty as revenue .

...

select
    {{ dbt_utils.generate_surrogate_key(['stg_salesorderdetail.salesorderid', 'salesorderdetailid']) }} as sales_key,
    stg_salesorderdetail.salesorderid,
    stg_salesorderdetail.salesorderdetailid,
    stg_salesorderdetail.unitprice,
    stg_salesorderdetail.orderqty,
    stg_salesorderdetail.revenue
from stg_salesorderdetail
inner join stg_salesorderheader on stg_salesorderdetail.salesorderid = stg_salesorderheader.salesorderid

Step 6: Create foreign surrogate keys

We want to be able to slice and dice our fact table against the dimension tables we have created in the earlier step. So we need to create the foreign surrogate keys that will be used to join the fact table back to the dimension tables.

We achieve this by applying the generate_surrogate_key() macro to the same unique id columns that we had previously used when generating the surrogate keys in the dimension tables.

...

select
    {{ dbt_utils.generate_surrogate_key(['stg_salesorderdetail.salesorderid', 'salesorderdetailid']) }} as sales_key,
    {{ dbt_utils.generate_surrogate_key(['productid']) }} as product_key,
    {{ dbt_utils.generate_surrogate_key(['customerid']) }} as customer_key,
    {{ dbt_utils.generate_surrogate_key(['creditcardid']) }} as creditcard_key,
    {{ dbt_utils.generate_surrogate_key(['shiptoaddressid']) }} as ship_address_key,
    {{ dbt_utils.generate_surrogate_key(['order_status']) }} as order_status_key,
    {{ dbt_utils.generate_surrogate_key(['orderdate']) }} as order_date_key,
    stg_salesorderdetail.salesorderid,
    stg_salesorderdetail.salesorderdetailid,
    stg_salesorderdetail.unitprice,
    stg_salesorderdetail.orderqty,
    stg_salesorderdetail.revenue
from stg_salesorderdetail
inner join stg_salesorderheader on stg_salesorderdetail.salesorderid = stg_salesorderheader.salesorderid

Step 7: Choose a materialization type

You may choose from one of the following materialization types supported by dbt:

  • View
  • Table
  • Incremental

It is common for fact tables to be materialized as incremental or table depending on the data volume size. As a rule of thumb, if you are transforming millions or billions of rows, then you should start using the incremental materialization. In this example, we have chosen to go with table for simplicity.

Step 8: Create model documentation and tests

Alongside our fct_sales.sql model, we can populate the corresponding fct_sales.yml file to document and test our model.

version: 2

models:
  - name: fct_sales
    columns:

      - name: sales_key
        description: The surrogate key of the fct sales
        tests:
          - not_null
          - unique

      - name: product_key
        description: The foreign key of the product
        tests:
          - not_null

      - name: customer_key
        description: The foreign key of the customer
        tests:
          - not_null 
      
      ... 

      - name: orderqty
        description: The quantity of the product 
        tests:
          - not_null

      - name: revenue
        description: The revenue obtained by multiplying unitprice and orderqty

Step 9: Build dbt models

Execute the dbt run and dbt test commands to run and test your dbt models:

dbt run && dbt test 

Great work, you have successfully created your very first fact and dimension tables! Our dimensional model is now complete!! 🎉 

« Previous Next »