README TEAM 1: DATA ANALYST PROJECT

Work on a dataset of ABC Corporation employees.

Data Analysis Project

This project carries out a data analysis process in several phases, each of which is explained below.

INDEX

Introduction
Files
Requirements
The process
Author

INTRODUCTION

Our mission is to identify the key factors that influence job satisfaction and, ultimately, employee retention. To this end, we carried out a data analysis process that includes exploratory data analysis (EDA), data transformation, A/B testing, visualisations, creation of a MySQL database, and an ETL process.

FILES

Files required for project review:

HR RAW EMPLOYEES.csv: contains information about ABC Corporation employees.
HR RAW DATA CLEAN.csv: CSV file created by us after a thorough cleaning of the data from the initial CSV.
BBDD_abc_corp_employees.sql: DB created by us from the CSV we generated after data cleansing.

REQUIREMENTS

Make sure you have the following libraries installed in your Python environment:

pandas
numpy
matplotlib
seaborn
scikit-learn
mysql-connector-python (imported as mysql.connector)
scipy (scipy.stats, including chi2_contingency)

If any of these libraries are missing, you can install them with pip install followed by the package name.
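As a quick way to see which of the libraries above still need installing, the check can be sketched in Python. This is a minimal sketch, not part of the project: the helper `missing_modules` and the list `REQUIRED_MODULES` are hypothetical names introduced here, and the list uses import names, which differ from the pip package names for scikit-learn and the MySQL connector.

```python
import importlib.util

# Import names for the libraries listed above. Note that scikit-learn is
# imported as "sklearn" and the MySQL connector as "mysql.connector".
REQUIRED_MODULES = [
    "pandas", "numpy", "matplotlib", "seaborn",
    "sklearn", "mysql.connector", "scipy.stats",
]


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except ImportError:
            # Raised for dotted names when the parent package is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing modules:", missing_modules(REQUIRED_MODULES))
```

Any name printed by the script corresponds to a package that still has to be installed with pip.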

THE PROCESS

Built with

Technologies used in the project:

Operating system: Windows 10 Home
Development Environment: Jupyter Notebook, Visual Studio Code
Programming Language: Python
Libraries specified above
Version Control: Git, GitHub
Dependency Management: Pip
Database tool: MySQL Workbench

First phase: deep data exploration

Importing libraries and loading data:

Importing pandas and using it to load the CSV files into DataFrames.

General exploration

In-depth review of the data using pandas functions to obtain information about its structure and basic statistics.
Initial exploration of the data to identify potential problems (null or missing values, duplicate values, outliers, etc.).
DataFrame joining
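The loading and first checks described above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the notebook's actual code: the CSV content, the column names and the helper `explore` are all invented, and the real file would be read with `pd.read_csv("HR RAW EMPLOYEES.csv")` instead of the in-memory sample.

```python
import io

import pandas as pd

# Toy CSV standing in for the raw file; columns and values are invented.
SAMPLE_CSV = """employee_id,age,department,monthly_income
1,34,Sales,4200
2,29,Research,5100
2,29,Research,5100
3,,Sales,-3900
"""


def explore(df):
    """Collect basic structure and data-quality counts for a DataFrame."""
    return {
        "shape": df.shape,
        "nulls_per_column": df.isnull().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


# Real use would be: pd.read_csv("HR RAW EMPLOYEES.csv")
df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = explore(df)

print(summary)
print(df.describe())  # basic statistics for the numeric columns
```

The summary only counts nulls and fully duplicated rows. Outliers such as the negative income in the sample still have to be spotted from `describe()` or from plots.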

Second phase: data transformation

Verification of data consistency and correctness.
Removal of unnecessary columns.
Homogenization of titles and values.
Treatment of negative numbers, outliers, null data and duplicated values.
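A minimal sketch of how the transformation steps listed above could look in pandas. The function `clean`, its parameters and the toy data are hypothetical. Two rules in it are assumptions made for the example, not necessarily the team's choices: negative values are treated as sign errors, and numeric nulls are filled with the column median.

```python
import pandas as pd


def clean(df, drop_cols=(), text_cols=(), numeric_cols=()):
    """Apply generic cleaning steps and return a new DataFrame.

    `text_cols` and `numeric_cols` refer to the homogenised column names.
    """
    out = df.drop(columns=list(drop_cols))
    # Homogenise titles: trimmed, lower case, underscores instead of spaces.
    out.columns = [c.strip().lower().replace(" ", "_") for c in out.columns]
    # Homogenise text values: trim whitespace and lower-case.
    for col in text_cols:
        out[col] = out[col].str.strip().str.lower()
    for col in numeric_cols:
        # Assumption: negative values are sign errors.
        out[col] = out[col].abs()
        # Assumption: nulls are filled with the column median.
        out[col] = out[col].fillna(out[col].median())
    # Drop rows that are exact duplicates after the steps above.
    return out.drop_duplicates().reset_index(drop=True)


# Invented sample data, for illustration only.
raw = pd.DataFrame({
    "Employee ID": [1, 2, 2, 3],
    " Department": ["Sales", " research", "Research ", "SALES"],
    "Monthly Income": [4200.0, -5100.0, 5100.0, None],
    "Unused": [0, 0, 0, 0],
})

cleaned = clean(
    raw,
    drop_cols=["Unused"],
    text_cols=["department"],
    numeric_cols=["monthly_income"],
)
print(cleaned)
```

Duplicates are dropped last on purpose: the two rows for employee 2 only become identical once the text and sign issues have been homogenised.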

Third phase: visualization

Study of six real-world questions about the data and their representation through graphs.

Fourth phase: database

Creation of a database (from the clean DataFrame) in MySQL Workbench, defining the tables and their corresponding relationships and constraints. Lastly, creation of the database diagram.

Fifth phase: ETL

Data extraction, transformation and loading (ETL): automation of the transformation process and of data insertion into the database, so that information is updated and inserted in a consistent manner.
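The loading step of such an ETL process could be sketched as below. This is an illustration, not the project's script: `build_insert` and `load_rows` are hypothetical helpers, and the table and column names in the usage comment are invented. The `%s` placeholders are the parameter style used by mysql.connector, and the connection object is assumed to follow its interface (`cursor()`, `commit()`, `rollback()`).

```python
def build_insert(table, columns):
    """Build a parameterised INSERT statement with %s placeholders."""
    cols = ", ".join(f"`{c}`" for c in columns)
    marks = ", ".join(["%s"] * len(columns))
    return f"INSERT INTO `{table}` ({cols}) VALUES ({marks})"


def load_rows(connection, table, columns, rows):
    """Insert `rows` (an iterable of tuples) in one batch and commit.

    Rolls back and re-raises if the insert fails, so the table is not
    left half-loaded.
    """
    sql = build_insert(table, columns)
    cursor = connection.cursor()
    try:
        cursor.executemany(sql, list(rows))
        connection.commit()
    except Exception:
        connection.rollback()
        raise
    finally:
        cursor.close()
    return sql


# Usage sketch (credentials, database, table and columns are placeholders):
#
#   import mysql.connector
#   conn = mysql.connector.connect(
#       host="localhost", user="user", password="secret",
#       database="abc_corp_employees",
#   )
#   rows = df[["employee_id", "age"]].itertuples(index=False, name=None)
#   load_rows(conn, "employees", ["employee_id", "age"], rows)
#   conn.close()
```

Passing the values as parameters rather than formatting them into the SQL string leaves quoting and escaping to the driver, which helps keep repeated loads consistent.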

Author

Made with 💜 by Belén V N (https://github.com/BelenVN), Gloria L C (https://github.com/GloriaLopezChinarro), Viviana V R (https://github.com/Viviana1988) and Cristina R H (https://github.com/cristinarull14)

ENJOY IT 🤩