Data Engineer - Technical Assessment

Our tech teams are curious, driven, intelligent, pragmatic, collaborative and open-minded and you should be too.

My Solution

I implemented the solution with PySpark RDDs so that the data extracted from the data set can be changed through simple configuration. Because the brief left the exact data requirements open, I kept the codebase deliberately open-ended, so the set of requested fields is easy to change. Currently the solution retrieves the fullUrl and id of the required data.
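As an illustration of the idea, here is a minimal sketch of configurable field extraction over an RDD. It is not the code in main.py. The names FIELDS, get_path, extract_fields and run_with_spark are hypothetical, and the input layout (JSON files with a top-level "entry" list, with id nested under "resource") is an assumption rather than something stated above.

```python
import csv
import json

# Hypothetical configuration: output column name -> dotted path inside one record.
# Changing the requested data means editing only this mapping.
FIELDS = {
    "fullUrl": "fullUrl",
    "id": "resource.id",  # assumed nesting; adjust to the real data layout
}


def get_path(record, dotted_path):
    """Follow a dotted path through nested dicts; return None if a key is missing."""
    current = record
    for key in dotted_path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def extract_fields(record, fields=FIELDS):
    """Build one flat output row containing only the configured fields."""
    return {column: get_path(record, path) for column, path in fields.items()}


def run_with_spark(input_glob, output_csv):
    """Read JSON files into an RDD, extract the configured fields, write a CSV."""
    # Imported lazily so the helpers above can be used without Spark installed.
    from pyspark import SparkContext

    sc = SparkContext.getOrCreate()
    rows = (
        sc.wholeTextFiles(input_glob)               # RDD of (path, file contents)
        .map(lambda pair: json.loads(pair[1]))      # one parsed document per file
        .flatMap(lambda doc: doc.get("entry", []))  # assumed top-level "entry" list
        .map(extract_fields)
        .collect()                                  # fine for small data sets only
    )
    with open(output_csv, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(FIELDS))
        writer.writeheader()
        writer.writerows(rows)
```

With this shape, requesting a different field would only mean adding an entry to FIELDS, for example "type": "resource.resourceType"; the RDD pipeline itself stays unchanged.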

Technologies Used

Programming Language + Framework - Python - PySpark
Storage Layer - CSV

How To Run

Prerequisites:

Python3
Pip
virtualenv (Python 3 normally ships with the equivalent built-in venv module)

From the top-level directory, run source env/bin/activate to enter the virtual environment. (NOTE: Use deactivate to leave the virtual environment.)

To install the dependencies, run env/bin/python3 -m pip install -r requirements.txt

After installation, run the code with env/bin/python3 main.py
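If it is unclear whether the virtual environment is active before installing or running, a small check like the following can help. This is an optional sketch, not part of the repository; the function name in_virtualenv is hypothetical.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a virtual environment, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; run source env/bin/activate first.")
```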

