Movie Classification

This dataset is derived from the Cornell Movie-Dialogs Corpus from Cornell University (http://www.cs.cornell.edu/~cristian/Cornell_Movie-Dialogs_Corpus.html). The Data 8 team transformed the original data (e.g., converting the words to lowercase, removing the naughty words, and converting the counts to frequencies) to produce a new dataset containing the frequency of 5000 common words in each movie. This project is my attempt to build a classifier that guesses whether a movie is a comedy or a thriller, using only how often words appear in the movie's screenplay. It shows my ability to build a k-nearest-neighbors classifier and to evaluate a classifier on test data. The project also involves exploratory data analysis using linear regression.
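As a rough illustration of the k-nearest-neighbors idea described above, here is a minimal NumPy sketch. It is not the code from the notebook: the function names (`knn_classify`, `accuracy`), the choice of Euclidean distance with a majority vote, and the tiny two-word frequency table are all assumptions made up for this example.

```python
import numpy as np


def knn_classify(train_features, train_labels, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest training rows.

    Assumption: distance is plain Euclidean distance between word-frequency vectors.
    """
    train_features = np.asarray(train_features, dtype=float)
    train_labels = np.asarray(train_labels)
    query = np.asarray(query, dtype=float)

    # Euclidean distance from the query to every training row.
    distances = np.sqrt(((train_features - query) ** 2).sum(axis=1))

    # Indices of the k closest training rows.
    nearest = np.argsort(distances, kind="stable")[:k]

    # Majority vote among the labels of those rows.
    labels, counts = np.unique(train_labels[nearest], return_counts=True)
    return labels[np.argmax(counts)]


def accuracy(train_features, train_labels, test_features, test_labels, k=3):
    """Fraction of test rows whose predicted label matches the true label."""
    predictions = np.array(
        [knn_classify(train_features, train_labels, row, k) for row in test_features]
    )
    return float(np.mean(predictions == np.asarray(test_labels)))


# Invented toy data: each row holds the frequencies of two hypothetical words.
train_features = np.array([
    [0.020, 0.001],
    [0.018, 0.002],
    [0.015, 0.000],
    [0.001, 0.019],
    [0.002, 0.022],
    [0.000, 0.017],
])
train_labels = np.array(
    ["comedy", "comedy", "comedy", "thriller", "thriller", "thriller"]
)
test_features = np.array([[0.017, 0.001], [0.001, 0.020]])
test_labels = np.array(["comedy", "thriller"])

prediction = knn_classify(train_features, train_labels, [0.016, 0.003], k=3)
test_accuracy = accuracy(train_features, train_labels, test_features, test_labels, k=3)
print(prediction, test_accuracy)
```

An odd `k` avoids tied votes when there are only two genres. With real data, each row would hold the frequencies of the words chosen as features rather than two invented columns.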

Tools: Jupyter Notebook, Python, NumPy, Matplotlib

Created as part of the Data 8 class @ UC Berkeley

Files: Movie_Classification.ipynb, README.md, imdb.csv, movies.csv, proj3_test_set.csv, stem.csv, word_plot.png

Repository: nitanu32/Movie-Classification
