N-gram Analysis of Software Reviews

This repository contains Python scripts for conducting N-gram analysis on software reviews.

Dependencies

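The following third-party Python libraries are required:

NLTK
pandas
scikit-learn
openpyxl (used by pandas to read .xlsx files)

The script also uses collections and re, which are part of the Python standard library and need no installation.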

If you are running the script for the first time, uncomment the following lines to download the required NLTK data (the punkt tokenizer models and the stopwords and wordnet corpora):

nltk.download('punkt')
nltk.download('stopwords')
nltk.download('wordnet')

Usage

The script contains two main functions: ngram_analysis and process_date_data.

ngram_analysis

The ngram_analysis function reads an Excel file of software review data and processes the "All NCSS Capterra Cons" column: it tokenizes the text, removes stopwords, lemmatizes the remaining tokens, generates N-grams, and then calculates and prints the frequency distribution of those N-grams.

# Usage
file_path = 'Capterra_Cons_Excel.xlsx'
ngram_analysis(file_path, 3)  # Change 3 to whatever 'n' you want for the N-gram
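
The N-gram generation and counting steps described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in main.py: the helper names (tokenize, make_ngrams, count_ngrams) are hypothetical, the stopword set is a tiny stand-in for NLTK's English list, lemmatization is skipped, and each review is assumed to be processed separately so that no N-gram spans two reviews.

```python
import re
from collections import Counter

# Tiny illustrative stopword set (assumption); the script relies on NLTK's list.
STOPWORDS = {"the", "is", "a", "to", "and", "it", "of", "too"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def make_ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return list(zip(*(tokens[i:] for i in range(n))))


def count_ngrams(reviews, n):
    """Count N-grams across all reviews without crossing review boundaries."""
    counts = Counter()
    for review in reviews:
        tokens = [t for t in tokenize(review) if t not in STOPWORDS]
        counts.update(make_ngrams(tokens, n))
    return counts


# Made-up example reviews, not taken from the data files.
reviews = [
    "The learning curve is steep",
    "Steep learning curve and the price is high",
]
for gram, freq in count_ngrams(reviews, 2).most_common(3):
    print(" ".join(gram), freq)
```

Removing stopwords before building N-grams means that words which were separated by a stopword in the original text become adjacent, so some reported N-grams may not appear verbatim in any review.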

process_date_data

The process_date_data function reads a CSV file, converts the dataframe into a single string, extracts every four-digit number (assumed to be a year), and then uses the N-gram model to count and print the number of occurrences of each year.

# Usage
file_name = 'Review_Dates.csv'
process_date_data(file_name)
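
The year-extraction step can be illustrated in a few lines. This is a minimal sketch, assuming the dataframe has already been flattened to one string; count_years is a hypothetical helper name, and the sample text is made up. Counting single years this way is equivalent to a unigram (n = 1) frequency count, which may or may not match how main.py applies its N-gram model.

```python
import re
from collections import Counter


def count_years(text):
    """Count each standalone four-digit number found in the text."""
    # \b on both sides keeps longer digit runs such as 12345 from matching.
    years = re.findall(r"\b\d{4}\b", text)
    return Counter(years)


# Made-up sample standing in for the flattened CSV contents.
sample = "Reviewed March 3, 2021\nReviewed July 19, 2022\nReviewed May 1, 2021"
for year, freq in sorted(count_years(sample).items()):
    print(year, freq)
```

Note that any four-digit number is treated as a year, so values such as prices or IDs in the CSV would be counted too.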

Please ensure the data files are in the same directory as the script when running it.

kamron-h/REU_Capterra_N-gram_Analysis
