This is a project for the "Modern Database Systems" lecture held at the Technische Hochschule Köln. The aim of the project is to find a use case where modern NoSQL databases outperform SQL databases. We decided to build a music similarity search engine: you search for a song and get similar songs back. The similarity is based on the lyrics and the audio features of each song. The project is based on the Spotify dataset from Kaggle.
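To illustrate what "similar songs" can mean in a vector database, here is a minimal sketch that ranks songs by cosine similarity between their vectors. It assumes cosine similarity as the metric; this README does not state which metric the project configures. The song names and three-dimensional vectors are hypothetical, and real embeddings of lyrics and audio features would have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here: a zero vector matches nothing
    return dot / (norm_a * norm_b)


def most_similar(query, songs, k=2):
    """Rank songs (title -> vector) by similarity to the query vector."""
    scored = [(title, cosine_similarity(query, vec)) for title, vec in songs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical vectors for illustration only.
songs = {
    "song_a": [1.0, 0.0, 0.0],
    "song_b": [0.9, 0.1, 0.0],
    "song_c": [0.0, 1.0, 0.0],
}
top = most_similar([1.0, 0.0, 0.0], songs)
```

A vector database performs this kind of ranking with an approximate index instead of comparing the query against every stored vector, which is where the expected advantage over a plain SQL table scan comes from.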
- Node.js
- Docker
- Python 3.8
- Spotify API credentials
- Install the requirements with
pip install -r requirements.txt
- Download the dataset from Kaggle
- Unzip the dataset and place it in the `data` directory
- Clean the dataset with
python3 clean_data.py
- Create a `.env` file in the `webapp` directory with the following content. It is needed to enable the Spotify API, which requires setting up a Spotify application. See the Spotify developer documentation for more information.
SPOTIFY_CLIENT_ID=your_spotify_client_id
SPOTIFY_CLIENT_SECRET=your_spotify_client_secret
PORT=5001
- When using this project for the first time, create the Weaviate.io and vectorizer server containers and start the web server with
docker-compose up
- Import the dataset into Weaviate.io with
python weaviate_import.py
- Go to `localhost:3000` and enjoy listening to songs!
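Before starting the containers, it can help to confirm that the `.env` file contains the three keys listed in the steps above. The following is a minimal sketch, not part of the project: the helper names are hypothetical, and the default path `webapp/.env` assumes the script is run from the repository root.

```python
from pathlib import Path

# Keys taken from the .env example in the installation steps.
REQUIRED_KEYS = ("SPOTIFY_CLIENT_ID", "SPOTIFY_CLIENT_SECRET", "PORT")


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not values.get(key)]


def check_env_file(path="webapp/.env"):
    """Read the file at path (assumed location) and report missing keys."""
    return missing_keys(parse_env(Path(path).read_text(encoding="utf-8")))


# In-memory sample with an empty secret, so no file is needed to try it out.
sample = "SPOTIFY_CLIENT_ID=abc\nSPOTIFY_CLIENT_SECRET=\nPORT=5001\n"
problems = missing_keys(parse_env(sample))
```

Calling `check_env_file()` on a complete file should return an empty list; any key it returns still needs a value.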
- If not already done, repeat all steps mentioned in Installation to initialize the vector database
- Clean the dataset with `clean_sql_data.py` and create the required `.sql` script
python3 clean_sql_data.py
- Upload the dataset to the SQL server
python3 sql_import.py
- Run the benchmark script and see the results in the console or in the `benchmark_results.csv` file
python3 benchmark.py
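For readers curious how such a comparison can be measured, here is a minimal sketch of a timing harness that writes a CSV summary. It is not the project's `benchmark.py`: the function names, backend labels, and CSV columns are assumptions, and the two lambdas are placeholder workloads standing in for real vector and SQL queries.

```python
import csv
import io
import statistics
import time


def time_query(run_query, repeats=5):
    """Call run_query `repeats` times and return per-call durations in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations


def write_results(rows, out):
    """Write one CSV row per backend with mean and median latency."""
    writer = csv.writer(out)
    writer.writerow(["backend", "runs", "mean_s", "median_s"])
    for backend, durations in rows.items():
        writer.writerow([
            backend,
            len(durations),
            f"{statistics.mean(durations):.6f}",
            f"{statistics.median(durations):.6f}",
        ])


# Placeholder workloads; replace with calls to the real databases.
results = {
    "vector_db": time_query(lambda: sum(range(1_000))),
    "sql_db": time_query(lambda: sorted(range(1_000), reverse=True)),
}
buffer = io.StringIO()
write_results(results, buffer)
csv_text = buffer.getvalue()
```

Reporting the median alongside the mean makes a single slow run (for example a cold cache on the first query) easier to spot.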
- Max Hammer
- Dennis Goessler