CS435 (Big Data) - Natural Language Processing Term Project

kiram15/bigDataNLP

True to the Score: Amazon Product Rating and Classification using Natural Language Processing

This is a term project for the CS435 (Big Data) course. Our team of four students was asked to formulate a question or problem, clearly define the goals our analytics would accomplish, and explain how the results could benefit specific parties. The objectives of the term project were to perform large-scale data analytics using technologies typically found in modern data centers and to interpret the results to extract insight from the data.

For our project, we used the Amazon product dataset and the Stanford Natural Language Processing library to build a rating scale that determines whether a consumer’s written review matches their star rating.

The project is divided into 3 main parts:

  1. Preprocessing the data.
  2. Assigning a rating to each individual review using 5 sentiment classes (1 to 5), from very negative (1) to very positive (5), by implementing 7 different algorithms in Apache Spark and comparing them.
  3. Calculating an overall adjusted star rating for each product using a Bayesian average.
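
The Bayesian average in step 3 can be illustrated with a minimal sketch. This is not the project's code: the README does not state which prior or weighting the team used, so the function name `bayesian_average`, the `prior_weight` of 5, the use of the global mean as the prior, and the sample data are all assumptions made for illustration. The common form is (C * m + sum of ratings) / (C + n), where m is a prior mean, C is the weight given to that prior, and n is the number of reviews for the product.

```python
from typing import Sequence


def bayesian_average(ratings: Sequence[float], prior_mean: float,
                     prior_weight: float) -> float:
    """Shrink a product's mean rating toward prior_mean.

    prior_weight behaves like a number of "virtual" reviews that are
    all rated at prior_mean (an assumed parameterization).
    """
    n = len(ratings)
    if n + prior_weight == 0:
        raise ValueError("need at least one rating or a positive prior_weight")
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)


# Hypothetical per-review sentiment classes (1 to 5) for two products.
reviews = {
    "product_a": [5, 5],                    # few reviews, all very positive
    "product_b": [4, 5, 4, 5, 4, 5, 4, 5],  # more reviews, slightly lower mean
}

# Use the mean over all reviews as the prior (one common choice).
all_ratings = [r for rs in reviews.values() for r in rs]
global_mean = sum(all_ratings) / len(all_ratings)

adjusted = {
    product: bayesian_average(rs, global_mean, prior_weight=5)
    for product, rs in reviews.items()
}

for product, score in adjusted.items():
    print(f"{product}: {score:.3f}")
```

With so few reviews, `product_a` is pulled noticeably below its raw mean of 5.0, while `product_b`, which has more reviews, moves only slightly from its raw mean of 4.5. A larger `prior_weight` strengthens this pull toward the global mean.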
