
Streaming Change Data Capture using AWS Kinesis and pushing changes to a Data Warehouse


ngngocuong/Streaming-ChangeDataCapture


Streaming Change Data Capture

This project streams change data capture (CDC) events through AWS Kinesis and pushes them into a historical Data Warehouse that keeps every past version of each record.

Objective

From a transactional database:

[Figure: transaction]

we can build a Data Warehouse that stores historical data using Slowly Changing Dimension (SCD) Type 2.
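To make the SCD Type 2 idea concrete, here is a minimal in-memory sketch. It assumes a hypothetical user dimension with name and email attributes plus valid_from, valid_to and is_current bookkeeping columns; these names are illustrative and not taken from the repository. When a tracked attribute changes, the current row is closed instead of overwritten, and a new current row is opened.

```python
from dataclasses import dataclass, replace
from datetime import datetime
from typing import List, Optional


# Hypothetical dimension row; column names are illustrative, not from the repository.
@dataclass(frozen=True)
class UserDimRow:
    user_id: str
    name: str
    email: str
    valid_from: datetime
    valid_to: Optional[datetime] = None  # None means "still valid"
    is_current: bool = True


def apply_scd2(history: List[UserDimRow], user_id: str, name: str, email: str,
               changed_at: datetime) -> List[UserDimRow]:
    """Return a new history list with one change applied using SCD Type 2."""
    result = []
    current = None
    for row in history:
        if row.user_id == user_id and row.is_current:
            current = row
        else:
            result.append(row)
    if current is not None:
        if current.name == name and current.email == email:
            # Nothing tracked has changed: keep the current version untouched.
            result.append(current)
            return result
        # Close the old version rather than overwriting it, so history is kept.
        result.append(replace(current, valid_to=changed_at, is_current=False))
    # Open a new current version starting at the change time.
    result.append(UserDimRow(user_id, name, email, valid_from=changed_at))
    return result
```

After an email change, the history holds two rows for the same user: the old one with valid_to set and is_current false, and the new one marked current.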

Prerequisites

  1. An AWS account to set up the infrastructure
  2. Docker, to build the container image that the Lambda function runs from

Design

Data pipeline

Data

The data is produced by the data generation script generate_data.py.

The script generates user records and writes them to Amazon DynamoDB with Boto3, so the AWS credentials it runs with need permission to write to DynamoDB.
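A minimal sketch of what such a generator could look like, assuming a user_dim table whose partition key is user_id. The field names, sample values and key are assumptions, not the actual contents of generate_data.py. Writing is kept separate from generation so the generator can be tried without AWS access.

```python
import random
import uuid
from datetime import datetime, timezone
from typing import List, Optional

# Hypothetical sample values; the real generate_data.py may use different fields.
FIRST_NAMES = ["An", "Binh", "Chi", "Dung", "Hoa", "Lan", "Minh", "Nam"]
CITIES = ["Hanoi", "Da Nang", "Ho Chi Minh City"]


def generate_users(n: int, seed: Optional[int] = None) -> List[dict]:
    """Build n fake user records; user_id is the assumed partition key."""
    rng = random.Random(seed)
    users = []
    for _ in range(n):
        name = rng.choice(FIRST_NAMES)
        users.append({
            "user_id": str(uuid.UUID(int=rng.getrandbits(128))),
            "name": name,
            "email": f"{name.lower()}{rng.randint(1, 9999)}@example.com",
            "city": rng.choice(CITIES),
            "updated_at": datetime.now(timezone.utc).isoformat(),
        })
    return users


def push_users(table, users: List[dict]) -> None:
    """Write each user to DynamoDB; `table` is a boto3 Table resource."""
    for user in users:
        table.put_item(Item=user)


# Usage (needs boto3 and AWS credentials allowed to call dynamodb:PutItem):
#   import boto3
#   table = boto3.resource("dynamodb").Table("user_dim")
#   push_users(table, generate_users(10))
```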

Setup and Run

  1. Set up an AWS Kinesis data stream

    • Go to the Amazon Kinesis console
    • Choose Create data stream, enter a name for the stream (for example, user_stream), and create it
  2. Set up AWS DynamoDB

    • Go to the AWS DynamoDB console
    • Create the user_dim table
    [Screenshot: create table]
    • Go to user_dim -> Exports and streams
    • Under Amazon Kinesis data stream details, choose Enable, then select the stream you created above
  3. Build your image and push it to Amazon Elastic Container Registry (ECR)

    • Go to the Amazon Elastic Container Registry console -> Repositories, then create a repository to store the image
    • Open View push commands and follow the instructions
  4. Set up AWS Lambda

    • Go to AWS Lambda -> Create function -> choose Container image, then name the Lambda function and choose the image in your repository
    • Enable the Kinesis trigger: choose your Lambda function -> Add trigger -> choose Kinesis as the source and select the Kinesis data stream you created above
    • Give Lambda permission to read from Kinesis: go to Configuration -> Permissions -> Role name, then attach the AmazonKinesisReadOnlyAccess policy to the Lambda function's IAM role
  5. Set up AWS RDS using Postgres
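The Lambda function's job is to turn each Kinesis record into SCD Type 2 changes in Postgres. The sketch below is not the repository's handler. It assumes the records come from DynamoDB's Kinesis streaming destination (base64-encoded JSON carrying eventName, Keys and NewImage), a hypothetical warehouse table named user_dim_history, psycopg2 installed in the container image, and a hypothetical DATABASE_URL environment variable.

```python
import base64
import json
import os
from decimal import Decimal
from typing import List, Tuple


def from_dynamodb_json(attr: dict):
    """Convert one DynamoDB-typed attribute such as {"S": "x"} to a Python value.
    Only the scalar types used in this sketch are handled."""
    type_tag, value = next(iter(attr.items()))
    if type_tag in ("S", "BOOL"):
        return value
    if type_tag == "N":
        return Decimal(value)
    if type_tag == "NULL":
        return None
    raise ValueError(f"unsupported DynamoDB type: {type_tag}")


def parse_changes(event: dict) -> List[dict]:
    """Decode the Kinesis records that DynamoDB wrote to the stream."""
    changes = []
    for record in event["Records"]:
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        ddb = payload["dynamodb"]
        changes.append({
            "event_name": payload["eventName"],  # INSERT, MODIFY or REMOVE
            "keys": {k: from_dynamodb_json(v) for k, v in ddb["Keys"].items()},
            "new_image": {k: from_dynamodb_json(v) for k, v in ddb.get("NewImage", {}).items()},
        })
    return changes


# Hypothetical warehouse table and columns (assumptions, not from the repository).
CLOSE_CURRENT = (
    "UPDATE user_dim_history SET valid_to = now(), is_current = FALSE "
    "WHERE user_id = %s AND is_current"
)
INSERT_VERSION = (
    "INSERT INTO user_dim_history (user_id, name, email, city, valid_from, valid_to, is_current) "
    "VALUES (%s, %s, %s, %s, now(), NULL, TRUE)"
)


def scd2_statements(change: dict) -> List[Tuple[str, tuple]]:
    """SCD Type 2: close the current version, then open a new one unless the item was deleted."""
    user_id = change["keys"]["user_id"]
    statements = [(CLOSE_CURRENT, (user_id,))]
    if change["event_name"] != "REMOVE":
        img = change["new_image"]
        statements.append(
            (INSERT_VERSION, (user_id, img.get("name"), img.get("email"), img.get("city")))
        )
    return statements


def lambda_handler(event, context):
    import psycopg2  # imported lazily; must be installed in the container image

    conn = psycopg2.connect(os.environ["DATABASE_URL"])  # hypothetical env var
    try:
        # `with conn` commits on success and rolls back on error.
        with conn, conn.cursor() as cur:
            for change in parse_changes(event):
                for sql, params in scd2_statements(change):
                    cur.execute(sql, params)
    finally:
        conn.close()
```

Running the close and insert in one transaction avoids leaving a user with zero or two current rows. Two simplifications to be aware of: valid_from uses the database clock at processing time rather than the change time, and DynamoDB's Kinesis destination may deliver a record more than once or out of order, which a production version would guard against, for example by comparing the record's ApproximateCreationDateTime.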

All of these AWS services can also be set up with the AWS CLI or the AWS CDK.
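As a sketch of the scripted route, the same wiring can also be done with Boto3, used here instead of the CLI or CDK only because the project already depends on it. The names user_stream and user_dim come from the steps above; the single shard, on-demand billing, user_id partition key and the function name passed in are assumptions. It does not create the Lambda function, its IAM role, the ECR image, or the RDS instance.

```python
def setup_cdc(kinesis, dynamodb, lambda_client, function_name: str,
              stream_name: str = "user_stream", table_name: str = "user_dim") -> str:
    """Create the stream and table, connect them, and attach the Lambda trigger.

    The first three arguments are boto3 clients, e.g. boto3.client("kinesis"),
    boto3.client("dynamodb") and boto3.client("lambda"). Returns the stream ARN.
    """
    # Step 1: Kinesis data stream (one shard is an assumption; size it to your load).
    kinesis.create_stream(StreamName=stream_name, ShardCount=1)
    kinesis.get_waiter("stream_exists").wait(StreamName=stream_name)
    summary = kinesis.describe_stream_summary(StreamName=stream_name)
    stream_arn = summary["StreamDescriptionSummary"]["StreamARN"]

    # Step 2: DynamoDB table keyed by user_id (assumed key), then stream its changes to Kinesis.
    dynamodb.create_table(
        TableName=table_name,
        KeySchema=[{"AttributeName": "user_id", "KeyType": "HASH"}],
        AttributeDefinitions=[{"AttributeName": "user_id", "AttributeType": "S"}],
        BillingMode="PAY_PER_REQUEST",
    )
    dynamodb.get_waiter("table_exists").wait(TableName=table_name)
    dynamodb.enable_kinesis_streaming_destination(TableName=table_name, StreamArn=stream_arn)

    # Step 4 (trigger only): the function must already exist, built from the ECR image.
    lambda_client.create_event_source_mapping(
        FunctionName=function_name,
        EventSourceArn=stream_arn,
        StartingPosition="LATEST",
    )
    return stream_arn
```

Taking the clients as parameters keeps credentials and region choice outside the function and makes the wiring easy to exercise with stand-in objects.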

Demo

  • First, I run generate_data.py, which creates 10 users and inserts them into DynamoDB

[Screenshot: create_user]

  • Then I update the data

[Screenshot: update_user]

  • Finally, I check the historical data

[Screenshot: scd_type2]
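The update and check steps of the demo could look roughly like this. update_email is a hypothetical helper that modifies one user in DynamoDB, which should produce a MODIFY record on the stream; HISTORY_QUERY is a hypothetical query against the user_dim_history table assumed in the Lambda sketch, not a query taken from the repository.

```python
from datetime import datetime, timezone


def update_email(table, user_id: str, new_email: str) -> None:
    """Change one attribute in DynamoDB; `table` is a boto3 Table resource."""
    table.update_item(
        Key={"user_id": user_id},  # assumed partition key
        UpdateExpression="SET email = :email, updated_at = :ts",
        ExpressionAttributeValues={
            ":email": new_email,
            ":ts": datetime.now(timezone.utc).isoformat(),
        },
    )


# Once the Lambda has processed the change, every version of the user should be
# listed oldest first, with only the latest row marked is_current.
HISTORY_QUERY = (
    "SELECT user_id, email, valid_from, valid_to, is_current "
    "FROM user_dim_history WHERE user_id = %s ORDER BY valid_from"
)
```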

Contact

Facebook: https://www.facebook.com/cuong.nguyenngoc.1612/

Contact me if you have any questions about this project ✈️✈️
