Algorithmic Methods for Data Mining - Homework 4

This is a GitHub repository created to submit the fourth homework of the Algorithmic Methods for Data Mining (ADM) course for the MSc. in Data Science at the Sapienza University of Rome.

What's inside this repository?

README.md: A markdown file that explains the content of the repository.
main.ipynb: A Jupyter Notebook file containing all the relevant exercises and reports belonging to the homework questions, the Command Line Question, and the Algorithmic Question.
modules/: A folder including 4 Python modules used to solve the exercises in main.ipynb. The files included are:
- __init__.py: An init file that allows us to import the modules into our Jupyter Notebook.
- data_handler.py: A Python file including a DataHandler class designed to handle data cleaning and feature engineering on Kaggle's Netflix Clicks Dataset.
- recommender.py: A Python file including a Recommender class designed to build a Recommendation Engine with LSH using user data obtained from Kaggle's Netflix Clicks Dataset.
- cluster.py: A Python file including three classes (FAMD, KMeans, and KMeans++) designed to perform Factor Analysis of Mixed Data on Kaggle's Netflix Clicks Dataset and then run parallelized k-Means and k-Means++ clustering using PySpark.
- plotter.py: A Python file including a Plotter class designed to build auxiliary plots for the written report on main.ipynb.
commandline.sh: A bash script including the code to solve the Command Line Question.
images/: A folder containing a screenshot of the successful execution of the commandline.sh script.
.gitignore: A predetermined .gitignore file that tells Git which files or folders to ignore in a Python project.
LICENSE: A file containing an MIT permissive license.
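
As a rough illustration of the k-Means++ idea mentioned for cluster.py, the seeding step can be sketched in plain Python. This is not the code in cluster.py (which, per the description above, uses PySpark); the function names `squared_distance` and `kmeans_pp_init`, the `seed` parameter, and the toy points are all hypothetical. The sketch assumes the input contains at least k distinct points, given as tuples.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_init(points, k, seed=0):
    """Pick k initial centers with k-means++ seeding (hypothetical helper).

    Assumes `points` holds at least k distinct tuples.
    """
    rng = random.Random(seed)
    # The first center is drawn uniformly at random.
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest
        # center chosen so far; already-chosen points get weight 0.
        weights = [
            min(squared_distance(p, c) for c in centers) for p in points
        ]
        # Sample the next center proportionally to those weights.
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


if __name__ == "__main__":
    # Toy data: three loose groups of 2-D points (made up for this sketch).
    toy_points = [
        (0.0, 0.0), (0.1, 0.0),
        (5.0, 5.0), (5.1, 5.0),
        (10.0, 0.0), (10.1, 0.0),
    ]
    print(kmeans_pp_init(toy_points, k=3, seed=42))
```

Plain k-Means differs only in this seeding step: it would draw all k initial centers uniformly at random, whereas the distance-weighted sampling above tends to spread the starting centers apart.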

Dataset

In this homework we worked with Kaggle's predefined Netflix Clicks Dataset.

Important Note

If the Notebook doesn't load through GitHub, please try the following steps:

Try opening the Notebook through NBViewer.
Try downloading the Notebook and opening it on your local computer.

Author: Miguel Angel Sanchez Cortes

Email: [email protected]

MSc. in Data Science, Sapienza University of Rome

Repository: msancor/ADM-HW4

Folders and files

Latest commit

History

Repository files navigation

Algorithmic Methods for Data Mining - Homework 4

What's inside this repository?

Dataset

Important Note

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages