Scrape OKCupid data and run data analysis and machine learning to better predict variables.

gkar90/OKCupid-Project

OKCupid Project

Introduction

In this project, we take almost 60,000 OKCupid dating profiles and aim to:

  • run statistical analyses to gain insight into the profiles
  • create data visualizations that make the analysis easier to read and reveal patterns and trends in the data
  • build machine learning models that accurately predict profile variables from a user's basic inputs

Dataset

Scraped from OKCupid, uploaded here.
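
Scraped profile data tends to contain missing or implausible values, so a cleaning pass usually comes before any analysis. The following is a minimal sketch of such a pass, not the project's actual code: the field names (`age`, `height`, `income`), the sample records, and the plausibility ranges are all assumptions made for illustration.

```python
from typing import Optional

# Hypothetical raw records; field names and values are illustrative assumptions,
# not taken from the actual dataset.
RAW_PROFILES = [
    {"age": "27", "height": "70", "income": "-1"},
    {"age": "", "height": "3", "income": "50000"},
    {"age": "110", "height": "68", "income": "80000"},
]

# Assumed plausibility range (low, high) for each numeric field.
VALID_RANGES = {
    "age": (18, 100),
    "height": (48, 90),
    "income": (0, 10_000_000),
}


def to_number(value, low: float, high: float) -> Optional[float]:
    """Parse a value; return None if it is missing, unparseable, or out of range."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if low <= number <= high else None


def clean_profile(raw: dict) -> dict:
    """Keep only the known numeric fields, replacing bad values with None."""
    return {
        field: to_number(raw.get(field), low, high)
        for field, (low, high) in VALID_RANGES.items()
    }


cleaned = [clean_profile(profile) for profile in RAW_PROFILES]
for row in cleaned:
    print(row)
```

Marking bad values as `None` rather than dropping the whole record keeps the decision about imputing or discarding rows separate from the parsing step.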

Model

Three machine learning models were used:

  • Linear Regression
  • K-Nearest Neighbors Classification
  • Decision Tree Classification
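
To illustrate the idea behind one of the classifiers listed above, here is a minimal pure-Python sketch of k-nearest-neighbors classification. It is not the project's implementation (the source does not say which library or features were used); the function name `knn_predict`, the toy feature vectors, and the labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbors."""
    # Pair each training point's Euclidean distance to the query with its label,
    # then sort so the closest points come first.
    distances = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Return the most common label among the k nearest neighbors.
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy two-feature data with two well-separated groups (illustrative only).
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))
print(knn_predict(train_X, train_y, (5.1, 5.0)))
```

In practice the features would need to be on comparable scales before distances are computed, since an unscaled feature with a large range would dominate the vote.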

Summary

This project was heavy on data cleaning, as much of the data was either missing or nonsensical. Automating the cleaning process and the analysis of the newly cleaned data was easily the longest and hardest part. In terms of modeling, our most accurate model was a decision tree, with almost 80% accuracy and a macro-averaged F1 score of about 80%. Overall, the decision tree was the most effective approach for our chosen output variable.
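
For readers unfamiliar with the metric, a macro F1 score is the unweighted mean of the per-class F1 scores, where each class's F1 is 2·TP / (2·TP + FP + FN). The sketch below shows that calculation in plain Python; the function name `macro_f1` and the toy labels are hypothetical and are not the project's results.

```python
def macro_f1(y_true, y_pred):
    """Compute the unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        denominator = 2 * tp + fp + fn
        # Treat a class with no true or predicted samples as scoring 0.
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Toy labels (illustrative only): one of five predictions is wrong.
y_true = ["a", "a", "a", "b", "b"]
y_pred = ["a", "a", "b", "b", "b"]

print(round(macro_f1(y_true, y_pred), 3))
```

Because every class counts equally, macro F1 can differ noticeably from accuracy when classes are imbalanced, which is why reporting both is informative.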
