The goal of this project is to build a classifier and evaluate how accurately it can predict song genres. Using a dataset from Spotify [Pandya, 2022], a company that already applies machine learning for such purposes, helps assess whether the resulting model would be suitable for a large-scale business or is better suited to a smaller player in the audio streaming market. The following classification algorithms were explored:
- Support Vector Machine (SVM)
- Decision Tree
- Gaussian Naive Bayes
- k-Nearest Neighbors (k-NN)
- Multilayer Perceptron (MLP)
- Multinomial Naive Bayes
- Nearest Centroid
- Random Forest
- XGBoost
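To give a feel for how the simplest of these methods works, here is a minimal plain-Python sketch of a nearest-centroid classifier: each genre is represented by the mean of its training feature vectors, and a song is assigned to the genre whose mean is closest. This is an illustration, not the code in ml_methods/; the function names, the three audio features, and their values are hypothetical.

```python
import math
from collections import defaultdict

def fit_centroids(X, y):
    """Compute the mean feature vector (centroid) of each class."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {label: [s / counts[label] for s in total] for label, total in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))

# Hypothetical toy features: [danceability, energy, acousticness]
X = [[0.80, 0.90, 0.05], [0.75, 0.85, 0.10], [0.20, 0.30, 0.90], [0.25, 0.20, 0.85]]
y = ["dance", "dance", "acoustic", "acoustic"]

centroids = fit_centroids(X, y)
print(predict(centroids, [0.70, 0.80, 0.20]))  # → dance
```

Because training only stores one mean per class, the model is cheap to fit, but it cannot capture genres whose songs form several separate regions of feature space.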
The contents of the repository are the following:
- data/ → datasets used for this project
  - spotify_data: the original Spotify Tracks Dataset
  - spotify_clean: dataset with only one genre assigned to each song (generated by the data-cleaning notebook)
  - spotify_simplified: dataset reduced to 18 unique genres in total (generated by the clustering notebook)
  - data_report: exploratory data analysis of the original dataset
- figures/ → figures for the presentation and report (generated by the plots notebook)
- ml_methods/ → notebooks with different machine learning algorithms explored for the project
- baseline → implement the majority and rule-based baselines
- clustering → reduce the number of genres in the dataset to only 18 via a combination of agglomerative clustering and manual input
- data-cleaning → assign a single genre to every song that appears with multiple genres in the original dataset
- data-exploration → visualize the features of the dataset and propose preprocessing steps
- hyperparemter-optimization → tune the models' hyperparameters using scikit-learn's GridSearchCV
- plots → generate plots for the report and presentation
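The data-cleaning step can be pictured as deduplicating the dataset by track. The sketch below assumes each row is a dict with `track_id` and `track_genre` keys (the column names are assumptions) and applies a hypothetical tie-breaking rule, keeping the first genre encountered for each track; the rule actually used in the notebook may differ.

```python
def assign_single_genre(rows):
    """Return one row per track, keeping the first genre seen (hypothetical rule)."""
    chosen = {}
    for row in rows:
        # setdefault stores the first row for each track_id and ignores later duplicates
        chosen.setdefault(row["track_id"], row)
    # dicts preserve insertion order, so the output follows the input order
    return list(chosen.values())


rows = [
    {"track_id": "t1", "track_genre": "pop"},
    {"track_id": "t2", "track_genre": "rock"},
    {"track_id": "t1", "track_genre": "dance"},  # same song listed under a second genre
]
print(assign_single_genre(rows))
# → [{'track_id': 't1', 'track_genre': 'pop'}, {'track_id': 't2', 'track_genre': 'rock'}]
```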
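The automatic part of the clustering step can be sketched as agglomerative clustering over per-genre feature vectors: start with one cluster per genre and repeatedly merge the two closest clusters until the target count is reached (18 in this project, 2 in the toy example). Average linkage with Euclidean distance, the hypothetical genre vectors, and the function names are assumptions, and the manual adjustments mentioned above are not modeled.

```python
import math
from itertools import combinations

def average_linkage(a, b, points):
    """Mean pairwise Euclidean distance between two clusters of genre names."""
    return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

def agglomerate(points, n_clusters):
    """Merge the two closest clusters until only n_clusters remain."""
    clusters = [[name] for name in points]
    while len(clusters) > n_clusters:
        # combinations yields index pairs with i < j
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]], points),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i stays valid
    return clusters

# Hypothetical mean [energy, danceability] per genre
genre_vectors = {
    "techno": [0.80, 0.90],
    "house": [0.85, 0.85],
    "folk": [0.30, 0.20],
    "acoustic": [0.20, 0.10],
    "edm": [0.90, 0.95],
}
clusters = agglomerate(genre_vectors, n_clusters=2)
print(clusters)  # → [['techno', 'house', 'edm'], ['folk', 'acoustic']]
```

This naive version recomputes every pairwise linkage at each merge, which is fine for about a hundred genres but not for thousands of points.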
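A majority baseline predicts the most frequent training genre for every test song, so its accuracy equals that genre's share of the test set. The sketch below covers only the majority baseline, since the rules behind the rule-based baseline are not described here; the function names and genre labels are hypothetical.

```python
from collections import Counter

def majority_baseline(train_labels, n_predictions):
    """Predict the most frequent training label for every test example."""
    most_common_label = Counter(train_labels).most_common(1)[0][0]
    return [most_common_label] * n_predictions

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

train_genres = ["pop", "pop", "rock", "jazz"]
test_genres = ["pop", "rock", "pop", "jazz"]
predictions = majority_baseline(train_genres, len(test_genres))
print(accuracy(test_genres, predictions))  # → 0.5
```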
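GridSearchCV evaluates every combination in a parameter grid with cross-validation and keeps the best-scoring one. The plain-Python sketch below mimics that idea for a k-NN classifier so it runs without dependencies; it is not scikit-learn's implementation, and the helper names, the contiguous unshuffled folds, and the toy data are assumptions.

```python
import itertools
import math
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training points closest to x (Euclidean distance)."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cross_val_accuracy(X, y, params, n_folds):
    """Mean accuracy over contiguous folds; leftover points are only used for training."""
    fold_size = len(X) // n_folds
    scores = []
    for f in range(n_folds):
        start, stop = f * fold_size, (f + 1) * fold_size
        # Train on everything outside the current fold, test on the fold itself
        train_X, train_y = X[:start] + X[stop:], y[:start] + y[stop:]
        correct = sum(
            knn_predict(train_X, train_y, X[i], **params) == y[i] for i in range(start, stop)
        )
        scores.append(correct / (stop - start))
    return sum(scores) / n_folds

def grid_search(X, y, param_grid, n_folds=3):
    """Score every parameter combination and keep the best (first one wins on ties)."""
    names = list(param_grid)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = cross_val_accuracy(X, y, params, n_folds)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical one-feature toy data with the two genres interleaved
X = [[0.10], [0.90], [0.20], [0.80], [0.15], [0.85]]
y = ["pop", "rock", "pop", "rock", "pop", "rock"]
best_params, best_score = grid_search(X, y, {"k": [1, 3]})
print(best_params, best_score)  # → {'k': 1} 1.0
```

On this toy data both values of `k` score perfectly, so the first entry in the grid is kept; the cost grows with the product of the grid sizes times the number of folds, which is why exhaustive grids get expensive quickly.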
To run the project:
- Activate your virtual environment
- Run the following command to install all the dependencies needed for this project:
pip install -r requirements.txt
- Inspect the code for the different algorithms that were explored (stored under ml_methods/)
Team 1
- Elizaveta Nosova (1983805)
- Miguel Samaniego (1980439)
- Nico Sharei (1986818)
- Julian Ament (1981511)
- Artem Bisliouk (1978986)
- Jannik Kranz (1981766)