Portfolio of data science projects completed by me for academic, self-learning, and hobby purposes.
-
GameGuard: A Machine Learning solution to protect gamers from microtransaction addiction: GameGuard is a machine learning tool that monitors and mitigates addictive behavior in video games. It works on data from loot box purchases, applying an ML model trained on online gambling data. Its goal is to prevent microtransaction addiction and to give regulatory agencies and game developers a way to address the issue. Check the online demo. Tools: Python, Pandas, NumPy, scikit-learn, BeautifulSoup, Flask, Plotly.
-
Film recommender system: An example of a recommender system, which suggests the items that best suit a particular user by predicting their preferences. In this example, the system suggests movies to a given user based on overall movie ratings and the tastes of like-minded users. Tools: Python, Pandas, NumPy, Matplotlib, Jupyter Notebook.
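As a rough illustration of the "like-minded users" idea, here is a minimal sketch of user-based collaborative filtering. It is not the notebook's code: the `ratings` dictionary, the movie names, and the helper functions are hypothetical, and cosine similarity is just one common choice of similarity measure.

```python
from math import sqrt

# Hypothetical ratings (user -> {movie: rating}); not taken from the project.
ratings = {
    "alice": {"Movie A": 5, "Movie B": 4},
    "bob": {"Movie A": 5, "Movie B": 4, "Movie C": 5, "Movie D": 2},
    "carol": {"Movie B": 1, "Movie D": 5, "Movie E": 4},
}


def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity between two users, treating unrated movies as 0."""
    shared = set(ratings_a) & set(ratings_b)
    if not shared:
        return 0.0
    dot = sum(ratings_a[m] * ratings_b[m] for m in shared)
    norm_a = sqrt(sum(r * r for r in ratings_a.values()))
    norm_b = sqrt(sum(r * r for r in ratings_b.values()))
    return dot / (norm_a * norm_b)


def recommend(all_ratings, user, top_n=3):
    """Rank movies the user has not rated by similarity-weighted average rating."""
    weighted_sum = {}
    weight_total = {}
    for other, other_ratings in all_ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(all_ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for movie, rating in other_ratings.items():
            if movie in all_ratings[user]:
                continue  # only suggest unseen movies
            weighted_sum[movie] = weighted_sum.get(movie, 0.0) + sim * rating
            weight_total[movie] = weight_total.get(movie, 0.0) + sim
    predicted = {m: weighted_sum[m] / weight_total[m] for m in weighted_sum}
    return sorted(predicted.items(), key=lambda kv: kv[1], reverse=True)[:top_n]


recommendations = recommend(ratings, "alice")
for movie, score in recommendations:
    print(f"{movie}: predicted rating {score:.2f}")
```

In this toy data, "Movie C" ranks first for `alice` because `bob`, the most similar user, rated it highly. A fuller system would also blend in the overall movie ratings mentioned above, which this sketch leaves out.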
-
K-means chromosome size clustering (chromosome length): Determining the most appropriate way to cluster mouse chromosomes by their relative length in each cell, using K-means clustering, so that the chromosome number can be easily inferred in the absence of other variables. Tools: R, ggplot2, Jupyter Notebook.
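The project itself is written in R. The following Python sketch only illustrates the general idea of K-means on a single feature (relative length). The data values, the evenly spaced initialisation, and the function name `kmeans_1d` are assumptions made for this example.

```python
def kmeans_1d(values, k, n_iter=100):
    """Plain Lloyd's algorithm on one-dimensional data.

    Centroids start at evenly spaced positions in the sorted data (an
    assumption of this sketch; random restarts are more common in practice).
    """
    ordered = sorted(values)
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]

    def nearest(value):
        # Index of the centroid closest to this value.
        return min(range(k), key=lambda i: abs(value - centroids[i]))

    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for value in values:
            clusters[nearest(value)].append(value)
        # Recompute each centroid as the mean of its members; keep empty ones in place.
        updated = [
            sum(members) / len(members) if members else centroids[i]
            for i, members in enumerate(clusters)
        ]
        if updated == centroids:
            break  # assignments are stable
        centroids = updated

    labels = [nearest(value) for value in values]
    return centroids, labels


# Hypothetical relative chromosome lengths within one cell (not project data).
relative_lengths = [0.98, 1.00, 0.97, 0.71, 0.69, 0.73, 0.40, 0.42, 0.38]
centroids, labels = kmeans_1d(relative_lengths, k=3)
print(centroids)
print(labels)
```

Each label is the index of a length cluster, ordered from shortest to longest, which is the kind of grouping from which a chromosome number could then be inferred.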
-
Chromosome distance ratio bootstrapping (chromosome end-to-end distance ratio): Hypothesis testing by comparing features across five chromosome clusters in two groups of mice (wildtype vs mutant). Because the data did not follow a normal distribution, bootstrapping was used to resample the available data and estimate the confidence interval (CI) of the population statistic. Tools: Python, Pandas, Seaborn, NumPy, Matplotlib, scikit-learn.
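A minimal sketch of the resampling idea, assuming a percentile bootstrap of the difference in group means. The statistic, the sample values, and the function name `bootstrap_mean_diff_ci` are illustrative assumptions, not the project's actual analysis.

```python
import random
import statistics


def bootstrap_mean_diff_ci(group_a, group_b, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for mean(group_a) - mean(group_b)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping the original sizes.
        resample_a = rng.choices(group_a, k=len(group_a))
        resample_b = rng.choices(group_b, k=len(group_b))
        diffs.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    diffs.sort()
    # Take the alpha/2 and 1 - alpha/2 percentiles of the bootstrap distribution.
    lower = diffs[round((alpha / 2) * (n_resamples - 1))]
    upper = diffs[round((1 - alpha / 2) * (n_resamples - 1))]
    return lower, upper


# Hypothetical end-to-end distance ratios for one chromosome cluster.
wildtype = [0.82, 0.91, 0.78, 0.88, 0.95, 0.80, 0.85, 0.90]
mutant = [0.60, 0.72, 0.65, 0.58, 0.70, 0.66, 0.62, 0.69]

ci_low, ci_high = bootstrap_mean_diff_ci(wildtype, mutant)
print(f"95% CI for the difference in means: [{ci_low:.3f}, {ci_high:.3f}]")
```

If the interval excludes zero, as it does for this toy data, the two groups differ on that feature at the chosen confidence level. No normality assumption is needed, which is why the method fits data like that described above.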
-
Photo collection social network graph (digiKam's picture database): Social network graph built from the collective metadata stored in a photo collection, using digiKam's SQLite database as the source. It counts the number of times any two people appear in a picture together and renders an interactive social graph showing the relationships between all the people in the photo library (sample 1: [static]; sample 2: [interactive] [static]). It can also focus on a particular person and the relations within their closest circle of acquaintances (sample: [interactive] [static]), or filter people by any existing keyword in the database (sample: pictures labelled with "New York" or "Toronto" [interactive] [static]). Tools: SQL, Python, Pandas, Seaborn, scikit-learn, pyvis.
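The co-occurrence count at the core of the graph can be sketched as follows. This assumes the people tagged in each picture have already been extracted from the database into plain lists. The `photos` data and both function names are hypothetical, and the digiKam schema is not reproduced here.

```python
from collections import Counter
from itertools import combinations


def count_co_occurrences(photos):
    """Count how many pictures each pair of people shares (the edge weights)."""
    pair_counts = Counter()
    for people in photos:
        # Sort so that each pair always has the same key, whatever the tag order.
        for pair in combinations(sorted(set(people)), 2):
            pair_counts[pair] += 1
    return pair_counts


def closest_circle(pair_counts, person):
    """People who appear with `person`, with the number of shared pictures."""
    circle = Counter()
    for (first, second), count in pair_counts.items():
        if first == person:
            circle[second] += count
        elif second == person:
            circle[first] += count
    return circle


# Hypothetical tags: one list of people per picture.
photos = [
    ["Ana", "Ben", "Cleo"],
    ["Ana", "Ben"],
    ["Cleo", "Dan"],
    ["Ana"],
]

edges = count_co_occurrences(photos)
print(edges.most_common())
print(closest_circle(edges, "Ana").most_common())
```

Each `(person, person)` key with its count maps directly onto a weighted edge, which a graph library such as pyvis can then draw.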
-
Barcelona Live commuter rail map: Real-time visualization created from train schedules in the Barcelona metro area. The backend parses GTFS data to recreate the schedules, uses the Tabula Python library to parse tables in PDF files, and uses Pandas to store and clean the data. It is served as a Flask web application that sends the train position data to the front end, written in JavaScript, CSS, and SVG (online demo: [map-view] [line-view]). Tools: Python, Tabula, Pandas, JavaScript, CSS, Flask.
-
Machine learning:
- Linear regression (e-commerce dataset): Using a linear regression model to predict sales from numerical data in a simulated e-commerce setting. Tools: Python, Pandas, Seaborn, NumPy, scikit-learn, and Jupyter Notebook.
- Logistic regression (advertising dataset): Using a logistic regression model to predict whether a user will click on an ad, based on a series of features describing how the user browsed the company website. Tools: Python, Pandas, Seaborn, NumPy, scikit-learn, and Jupyter Notebook.
- K-Nearest-Neighbors (KNN-project-data dataset): Using the K-Nearest-Neighbors algorithm to build a model that predicts a binary categorical variable from a series of cryptic numerical features. Tools: Python, Pandas, Seaborn, NumPy, scikit-learn, and Jupyter Notebook.
- Random Forest Classifier (Lending Club dataset): Predicting whether a borrower will fully repay a loan based on a series of financial features, comparing the performance of a decision tree classifier against a random forest model. Tools: Python, Pandas, Seaborn, NumPy, scikit-learn, and Jupyter Notebook.