Movie Classification

This dataset is derived from the Cornell Movie-Dialogs Corpus from Cornell University (http://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html). The Data 8 team transformed the original data (e.g., converting words to lowercase, removing the naughty words, and converting counts to frequencies) to produce a new dataset containing the frequency of 5000 common words in each movie. This project is my attempt to build a classifier that guesses whether a movie is a comedy or a thriller, using only how often words appear in the movie's screenplay. It demonstrates building a k-nearest-neighbors classifier and testing it on data, and it also includes exploratory data analysis using linear regression.
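For readers unfamiliar with the approach, here is a minimal sketch of a k-nearest-neighbors classifier over word-frequency features. It is not the notebook's actual code: it uses plain NumPy rather than whatever table library the course provides, and the two feature columns (frequencies of the words "laugh" and "kill"), the toy values, and the function names `distances`, `classify`, and `accuracy` are all made up for illustration.

```python
from collections import Counter

import numpy as np


def distances(train_features, point):
    """Euclidean distance from `point` to every row of `train_features`."""
    return np.sqrt(((train_features - point) ** 2).sum(axis=1))


def classify(train_features, train_labels, point, k=3):
    """Return the majority label among the k training rows nearest to `point`."""
    nearest = np.argsort(distances(train_features, point))[:k]
    votes = Counter(train_labels[nearest])
    return votes.most_common(1)[0][0]


def accuracy(train_features, train_labels, test_features, test_labels, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    predictions = np.array(
        [classify(train_features, train_labels, row, k) for row in test_features]
    )
    return np.mean(predictions == test_labels)


# Hypothetical toy data: each row is one movie, and the columns are the
# frequencies of two words ("laugh", "kill"). These numbers are invented.
train_features = np.array([
    [0.020, 0.001],
    [0.018, 0.002],
    [0.015, 0.000],
    [0.002, 0.015],
    [0.001, 0.020],
    [0.000, 0.017],
])
train_labels = np.array(
    ["comedy", "comedy", "comedy", "thriller", "thriller", "thriller"]
)

test_features = np.array([[0.017, 0.001], [0.001, 0.016]])
test_labels = np.array(["comedy", "thriller"])

print(classify(train_features, train_labels, test_features[0]))
print(accuracy(train_features, train_labels, test_features, test_labels))
```

With the real dataset, each row would hold the frequencies of the selected words for one movie, and an odd `k` avoids tied votes between the two genres.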

Tools: Jupyter Notebook, Python, NumPy, Matplotlib

Created as part of the Data 8 class at UC Berkeley.