Skip to content

This project creates a serverless data pipeline to extract data from the Colombo Stock Market ASI Index API using AWS Lambda, Kinesis Firehose, and S3. An AWS Glue workflow processes and transforms the data, storing it in an Apache Iceberg table via Athena and Glue ETL jobs.

Notifications You must be signed in to change notification settings

Sanjay-dev-ds/aws-serverless-data-pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

8 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

serverless-data-pipeline

Description

This project implements a serverless data pipeline for extracting data from the Colombo Stock Market ASI Index API. The pipeline uses an AWS Lambda function to fetch the data and load it into an Amazon Kinesis Firehose Delivery Stream. The Firehose stream then writes the data to an Amazon S3 bucket, buffering the data either for 5 minutes or until it reaches 5KB in size.

Once the data is stored in S3, an event notification is sent to an Amazon SQS queue. Then on-demand AWS Glue workflow job is triggered, where an AWS Glue Crawler is executed to extract the data from the S3 bucket and create an Amazon Athena table. After the raw data is processed by the Glue Crawler, a subsequent Glue bash script transforms the data and loads it into a staging table.

Finally, an ETL job creates an Apache Iceberg table from the staging table using a MERGE operation, completing the data processing pipeline.

Technologies Used

  • AWS Glue
  • AWS Lambda
  • Amazon Athena
  • Amazon S3
  • Amazon SQS
  • Amazon Kinesis Firehose

Prerequisites

  • Serverless Framework — Used as Infrastructure as Code
  • Python
  • AWS Cloud account

Architecture

architecture-serverless-etl.png

Steps to Deploy the Project

  1. Clone the repository
  2. Install the Serverless Framework
    npm install -g serverless
  3. Configure the AWS credentials
    serverless config credentials --provider aws --key <AWS_ACCESS_KEY_ID> --secret <AWS_SECRET_ACCESS_KEY>
  4. Deploy the project
    sls deploy
    This will create Lambda Function, S3 Bucket, SQS Queue, and Firehose Delivery Stream.
  5. Create Glue workflow (based on the scripts inside the glue-script folder) and Crawler.
  6. Test the project!

Author

Sanjay Jayakumar

About

This project creates a serverless data pipeline to extract data from the Colombo Stock Market ASI Index API using AWS Lambda, Kinesis Firehose, and S3. An AWS Glue workflow processes and transforms the data, storing it in an Apache Iceberg table via Athena and Glue ETL jobs.

Topics

Resources

Stars

Watchers

Forks

Languages