This project implements a serverless data pipeline that extracts data from the Colombo Stock Exchange (CSE) All Share Index (ASI) API. An AWS Lambda function fetches the data and loads it into an Amazon Kinesis Data Firehose delivery stream, which writes the data to an Amazon S3 bucket, buffering for up to 5 minutes or until 5 KB accumulates, whichever comes first.
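As a rough illustration, the producer Lambda could look like the sketch below. The endpoint URL, environment variable names, and stream name are hypothetical placeholders; the real values come from the Serverless configuration.

```python
# Minimal sketch of the Lambda producer, assuming a JSON endpoint and a
# Firehose stream name supplied through environment variables (both names
# here are illustrative, not values from this repo).
import json
import os
import urllib.request

import boto3

firehose = boto3.client("firehose")

CSE_ASI_URL = os.environ.get("CSE_ASI_URL", "https://example.com/asi")
STREAM_NAME = os.environ.get("STREAM_NAME", "asi-delivery-stream")


def handler(event, context):
    # Fetch the latest ASI snapshot from the API.
    with urllib.request.urlopen(CSE_ASI_URL) as response:
        payload = json.loads(response.read())

    # Newline-delimited JSON keeps the buffered S3 objects easy to crawl later.
    record = (json.dumps(payload) + "\n").encode("utf-8")
    firehose.put_record(
        DeliveryStreamName=STREAM_NAME,
        Record={"Data": record},
    )
    return {"statusCode": 200}
```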
Once the data lands in S3, an event notification is sent to an Amazon SQS queue, which triggers an on-demand AWS Glue workflow. The workflow first runs an AWS Glue Crawler that catalogs the raw data in the S3 bucket and exposes it as a table queryable through Amazon Athena. A subsequent Glue job then transforms the raw data and loads it into a staging table.
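A hedged sketch of the consumer side, assuming a small Lambda drains the SQS notifications and starts the workflow by name (the workflow name is an assumption):

```python
# Consumer Lambda: receives S3 event notifications from SQS and kicks off
# the on-demand Glue workflow. The workflow name is illustrative.
import boto3

glue = boto3.client("glue")

WORKFLOW_NAME = "asi-glue-workflow"


def handler(event, context):
    # One run per batch of notifications is enough; the crawler picks up
    # every new object under the bucket prefix on each run anyway.
    if event.get("Records"):
        glue.start_workflow_run(Name=WORKFLOW_NAME)
```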
Finally, an ETL job creates an Apache Iceberg table from the staging table using a MERGE operation, completing the data processing pipeline.
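The MERGE step might look like the following Spark sketch, assuming a Glue job with the Iceberg connector enabled; the catalog, database, table, and key column names are all illustrative:

```python
# Sketch of the final ETL step: upsert staged rows into the Iceberg table.
# Assumes the Glue job is configured with an Iceberg catalog named
# glue_catalog; database/table/key names below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

spark.sql("""
    MERGE INTO glue_catalog.asi_db.asi_iceberg AS target
    USING glue_catalog.asi_db.asi_staging AS source
    ON target.snapshot_ts = source.snapshot_ts
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```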
- AWS Glue
- AWS Lambda
- Amazon Athena
- Amazon S3
- Amazon SQS
- Amazon Kinesis Firehose
- Serverless Framework (used as Infrastructure as Code)
- Python
- AWS Cloud account
- Clone the repository
- Install the Serverless Framework

  ```bash
  npm install -g serverless
  ```
- Configure the AWS credentials

  ```bash
  serverless config credentials --provider aws --key <AWS_ACCESS_KEY_ID> --secret <AWS_SECRET_ACCESS_KEY>
  ```
- Deploy the project

  ```bash
  sls deploy
  ```

  This creates the Lambda function, S3 bucket, SQS queue, and Firehose delivery stream.
- Create the Glue workflow and crawler based on the scripts inside the glue-script folder; a sketch of the equivalent boto3 calls follows this list.
- Test the project!
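For reference, the workflow, crawler, and trigger can also be created with boto3, as sketched below. Everything here is illustrative: the resource names, IAM role ARN, database, and bucket path are assumptions, not values from this repo.

```python
# Illustrative boto3 sketch of the Glue resources the setup step creates.
import boto3

glue = boto3.client("glue")

glue.create_workflow(Name="asi-glue-workflow")

glue.create_crawler(
    Name="asi-raw-crawler",
    Role="arn:aws:iam::123456789012:role/GlueServiceRole",  # placeholder role
    DatabaseName="asi_db",  # placeholder database
    Targets={"S3Targets": [{"Path": "s3://asi-raw-bucket/data/"}]},
)

# An on-demand trigger ties the crawler into the workflow, so the SQS
# consumer can start the whole chain with a single StartWorkflowRun call.
glue.create_trigger(
    Name="asi-start-trigger",
    WorkflowName="asi-glue-workflow",
    Type="ON_DEMAND",
    Actions=[{"CrawlerName": "asi-raw-crawler"}],
)
```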