
Weather-Data-Analysis-Project


Project Overview

This project implements a Weather Data Analysis Pipeline that extracts live weather data from weather APIs and loads it into Redshift after the required transformations. The project uses AWS services such as S3, CodeBuild, Airflow, Glue, and Redshift.


Architectural Diagram

(diagram: Weather-Data-Analysis)


Key Steps

1. Create an S3 bucket

  • We will create an S3 bucket "airflow-managed-yb" to store the Airflow DAG scripts under the dags folder, along with requirement.txt. (screenshot: S3)

2. Create a CodeBuild project

  • We will create a CodeBuild project "weather-cicd" for the CI/CD setup, which copies the DAG scripts and other files to S3 when a pull request is merged. (screenshot: Codebuild)

3. Create an Airflow environment

  • We will create a managed Airflow environment "airflow-cluster-1" (Amazon MWAA) to orchestrate our pipeline. (screenshot: Airflow)

4. Create a pull request on GitHub

  • We will create a pull request on GitHub to merge the scripts and related files from the test branch into main. When the pull request is merged, CodeBuild is triggered and performs the actions defined in buildspec.yml: (screenshot: codeBuilsSuccess)

    • Copy the DAG Python scripts into the S3 dags folder. (screenshot: S3Dags)

    • Copy the requirement.txt file into the S3 bucket. (screenshot: S3Req)

    • Copy the Glue Python script into the related Glue S3 bucket. (screenshot: S3GlueScript)
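The repository's actual buildspec.yml is not shown here, but a minimal sketch of the copy steps described above might look like the following; the local paths and the Glue bucket name are assumptions, not taken from the repository:

```yaml
version: 0.2

phases:
  build:
    commands:
      # Copy the Airflow DAG scripts into the MWAA bucket's dags folder.
      - aws s3 cp dags/ s3://airflow-managed-yb/dags/ --recursive
      # Copy the Python dependencies file for the Airflow environment.
      - aws s3 cp requirement.txt s3://airflow-managed-yb/requirement.txt
      # Copy the Glue transformation script to the Glue scripts bucket (bucket name assumed).
      - aws s3 cp glue/transform.py s3://<glue-scripts-bucket>/transform.py
```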

5. Check the DAGs in the Airflow UI

  • Both DAGs placed in the S3 dags folder should appear in the Airflow UI: (screenshot: AirflowDags)
    • openweather_api_dag
    • transform_redshift_dag

6. Trigger the DAGs

  • We will trigger the DAG "openweather_api_dag" to start the execution. (screenshot: DagsTrigger)

  • This DAG performs three tasks: (screenshot: dag1)

    • Extract the weather data from the API and store it in XCom. (screenshot: xcom)

    • Upload that data to the S3 bucket "weather-data-yb" as weather_api_data.csv. (screenshot: S3Data)

    • Trigger the transform Redshift DAG "transform_redshift_dag". (screenshot: dag2)

      • A Glue job "glue_transform_task" is created. (screenshot: glueJob)

      • The data is transformed and loaded into the Redshift table "public.weather_data". (screenshot: Redshift)
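As a rough illustration of the extract-and-transform step (not the repository's actual code), the sketch below flattens an OpenWeather-style JSON payload into the kind of flat CSV record that could be written to weather_api_data.csv and later loaded into public.weather_data. All field and column names here are assumptions:

```python
import csv
import io
from datetime import datetime, timezone

def flatten_weather(payload: dict) -> dict:
    """Flatten an OpenWeather-style JSON payload into one flat record.

    The key names ("name", "main", "weather", "dt") follow the public
    OpenWeather current-weather response shape; the output column names
    are illustrative assumptions.
    """
    return {
        "city": payload["name"],
        "temperature": payload["main"]["temp"],
        "humidity": payload["main"]["humidity"],
        "description": payload["weather"][0]["description"],
        "observed_at": datetime.fromtimestamp(
            payload["dt"], tz=timezone.utc
        ).isoformat(),
    }

def to_csv(records: list[dict]) -> str:
    """Serialize flat records to CSV text, e.g. for weather_api_data.csv."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

# Example payload shaped like an OpenWeather current-weather response.
sample = {
    "name": "London",
    "dt": 1700000000,
    "main": {"temp": 10.5, "humidity": 80},
    "weather": [{"description": "light rain"}],
}
print(to_csv([flatten_weather(sample)]))
```

In the real pipeline, the flattened record would travel between tasks via XCom and the CSV would be uploaded to "weather-data-yb" before the Glue job loads it into Redshift.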
