Movie Classification

This dataset is derived from the Cornell Movie-Dialogs Corpus from Cornell University (http://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html). The Data 8 team transformed the original data (e.g., converting words to lowercase, removing the naughty words, and converting counts to frequencies) to create a new dataset containing the frequency of 5000 common words in each movie. This is my attempt to build a classifier that guesses whether a movie is a comedy or a thriller, using only how often words appear in the movie's screenplay. This project shows my ability to build a k-nearest-neighbors classifier and test it on data. It also involves exploratory data analysis using linear regression.
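To illustrate the idea, here is a minimal sketch of a k-nearest-neighbors classifier written with plain NumPy. It is not the code from the project notebook: the function names (`knn_classify`, `accuracy`), the choice of Euclidean distance, and the tiny two-word toy dataset are all assumptions made for illustration. The real dataset has 5000 word-frequency columns per movie.

```python
import numpy as np
from collections import Counter


def knn_classify(train_features, train_labels, query, k=5):
    """Return the majority label among the k training rows closest to `query`."""
    # Euclidean distance from the query to every training row (assumed metric)
    distances = np.sqrt(np.sum((train_features - query) ** 2, axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Majority vote among the labels of the nearest neighbors
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train_features, train_labels, test_features, test_labels, k=5):
    """Proportion of held-out rows whose predicted label matches the true label."""
    predictions = [
        knn_classify(train_features, train_labels, row, k) for row in test_features
    ]
    return np.mean(np.array(predictions) == np.array(test_labels))


# Hypothetical toy data: each row holds the frequencies of two made-up words.
train_features = np.array([
    [0.010, 0.001],
    [0.012, 0.002],
    [0.011, 0.000],
    [0.001, 0.010],
    [0.002, 0.012],
    [0.000, 0.011],
])
train_labels = np.array(
    ["comedy", "comedy", "comedy", "thriller", "thriller", "thriller"]
)

# Held-out rows used only for testing, never for voting
test_features = np.array([[0.009, 0.001], [0.001, 0.009]])
test_labels = np.array(["comedy", "thriller"])

print(accuracy(train_features, train_labels, test_features, test_labels, k=3))
```

Using an odd `k` with two classes avoids tied votes. Measuring accuracy on rows held out from training gives a fairer estimate than scoring the classifier on the movies it already knows.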

Tools: Jupyter Notebook, Python, NumPy, Matplotlib

Created as part of the Data 8 class @ UC Berkeley
