Music Similarity Search

Motivation

This is a project for the "Modern Database Systems" lecture held at the Technische Hochschule Köln. The aim of the project is to find a use case in which modern NoSQL databases outperform SQL databases. We decided to build a music similarity search engine: you search for a song and get similar songs back. Similarity is based on the lyrics and the audio features of each song. The project uses the Spotify Dataset from Kaggle.
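
To make the idea of feature-based similarity concrete, here is a minimal sketch that ranks songs by the cosine similarity of their feature vectors. It is an illustration only: the feature values and song names are made up, and whether the project uses exactly this metric is an assumption, since the actual comparison is delegated to the vector database.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def most_similar(query, candidates):
    """Return candidate names ordered from most to least similar to query."""
    return sorted(
        candidates,
        key=lambda name: cosine_similarity(query, candidates[name]),
        reverse=True,
    )


# Hypothetical audio feature vectors (values invented for illustration).
songs = {
    "song_a": [0.80, 0.70, 0.60],
    "song_b": [0.79, 0.72, 0.58],
    "song_c": [0.10, 0.20, 0.90],
}

others = {name: vec for name, vec in songs.items() if name != "song_a"}
ranking = most_similar(songs["song_a"], others)
print(ranking)
```

A vector database performs the same kind of comparison, but with an index that avoids scanning every song for each query.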

Requirements

Node.js
Docker
Python 3.8
Spotify API credentials

Installation

Install the requirements with

pip install -r requirements.txt

Download the dataset from Kaggle
Unzip the dataset and place it in the data directory
Clean the dataset with

python3 clean_data.py

Create a .env file in the webapp directory with the following content. The file is needed to enable the Spotify API, so you first have to set up a Spotify application. See the Spotify for Developers documentation for more information.

SPOTIFY_CLIENT_ID=your_spotify_client_id
SPOTIFY_CLIENT_SECRET=your_spotify_client_secret
PORT=5001
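
A missing or empty key in the .env file is an easy mistake to make. The following sketch checks that the three keys listed above are present. It is a hypothetical helper and not part of the repository; the parser only handles plain KEY=value lines.

```python
from pathlib import Path

# Keys taken from the .env example above.
REQUIRED_KEYS = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET", "PORT")


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or have an empty value."""
    return [key for key in required if not values.get(key)]


if __name__ == "__main__":
    env_path = Path("webapp") / ".env"  # location described in the step above
    if env_path.exists():
        missing = missing_keys(parse_env(env_path.read_text()))
        print("Missing keys:", missing or "none")
    else:
        print(f"{env_path} not found")
```

Run it from the repository root before starting the containers.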

When using the project for the first time, create the Weaviate.io and vectorizer server containers and start the web server with

docker-compose up
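
The containers can take a while to start, and an import attempted too early will fail. The sketch below polls Weaviate's readiness endpoint until it answers. The base URL http://localhost:8080 is an assumption (Weaviate's usual default); check docker-compose.yml for the port that is actually exposed.

```python
import time
import urllib.error
import urllib.request


def is_ready(base_url, timeout=2.0):
    """Return True if the Weaviate readiness endpoint answers with HTTP 200."""
    try:
        url = f"{base_url}/v1/.well-known/ready"
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response: not ready yet.
        return False


def wait_until_ready(base_url, attempts=30, delay=2.0):
    """Poll is_ready up to `attempts` times, sleeping `delay` seconds between tries."""
    for _ in range(attempts):
        if is_ready(base_url):
            return True
        time.sleep(delay)
    return False


if __name__ == "__main__":
    # Assumed address; adjust to the port mapping in docker-compose.yml.
    print("Weaviate ready:", is_ready("http://localhost:8080"))
```

Calling wait_until_ready("http://localhost:8080") before the import step blocks until the database accepts requests or the attempts run out.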

Import the dataset into Weaviate.io with

python weaviate_import.py

Go to localhost:3000 and enjoy listening to songs!

Benchmark

If not already done, repeat all steps mentioned in Installation to initialize the vector database
Clean the dataset with clean_sql_data and create the required .sql script

python3 clean_sql_data.py

Upload the dataset to the SQL server

python3 sql_import.py

Run the benchmark script and see the results in the console or in the benchmark_results.csv file

python3 benchmark.py
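
For readers who want to time additional queries of their own, the following sketch shows one common way to measure and summarize query latency. It is not the project's benchmark.py; the function names and the placeholder query are hypothetical, and a real comparison would replace the placeholder with a call to Weaviate or to the SQL server.

```python
import statistics
import time


def time_query(run_query, repeats=5):
    """Call run_query `repeats` times and return each duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Reduce a list of durations to a few summary statistics."""
    return {
        "runs": len(durations),
        "mean_s": statistics.mean(durations),
        "median_s": statistics.median(durations),
        "min_s": min(durations),
        "max_s": max(durations),
    }


def placeholder_query():
    """Stand-in workload; replace with a real database query."""
    sum(i * i for i in range(10_000))


print(summarize(time_query(placeholder_query)))
```

Repeating each query several times and reporting the median reduces the influence of warm-up effects and one-off delays.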

Dataset

Spotify Dataset

Authors

Max Hammer
Dennis Goessler

