Version 1 update. See NEWS for details
mjockers committed Apr 25, 2016
1 parent 596a27d commit 7e51d03
Showing 17 changed files with 14,401 additions and 369 deletions.
9 changes: 5 additions & 4 deletions DESCRIPTION
@@ -1,18 +1,19 @@
Package: syuzhet
Type: Package
Title: Extracts Sentiment and Sentiment-Derived Plot Arcs from Text
Version: 0.3.0
Date: 2015-01-20
Version: 1.0.0
Date: 2016-04-22
Authors@R: person("Matthew", "Jockers", email = "[email protected]",
role = c("aut", "cre"))
Maintainer: Matthew Jockers <[email protected]>
Description: Extracts sentiment and sentiment-derived plot arcs
from text using three sentiment dictionaries conveniently
packaged for consumption by R users. Implemented dictionaries include
packaged for consumption by R users. Implemented dictionaries include
"syuzhet" (default) developed in the Nebraska Literary Lab
"afinn" developed by Finn {\AA}rup Nielsen, "bing" developed by Minqing Hu
and Bing Liu, and "nrc" developed by Mohammad, Saif M. and Turney, Peter D.
Applicable references are available in README.md and in the documentation
for the "get_sentiment" function. The package also provides a method for
for the "get_sentiment" function. The package also provides a hack for
implementing Stanford's coreNLP sentiment parser. The package provides
several methods for plot arc normalization.
URL: https://github.com/mjockers/syuzhet
2 changes: 1 addition & 1 deletion NAMESPACE
@@ -9,5 +9,5 @@ export(get_text_as_string)
export(get_tokens)
export(get_transformed_values)
export(rescale)
export(rescale2)
export(rescale_x_2)
export(simple_plot)
13 changes: 11 additions & 2 deletions NEWS
@@ -1,10 +1,19 @@
syuzhet 1.0.0
==========
* Added get_dct_transform(), rescale_x_2(), and simple_plot() functions.

* Added new "syuzhet" (default) sentiment dictionary.

* Updated documentation and vignettes.

syuzhet 0.3.0
==========
Added get_tokens for word level tokenization in addition to sentence level. Updated get_transformed_values with changes from Tommy MacGuire that resolve the periodicity artifacts at the beginnings and ends of a transformed signal. Updated documentation.
* Added get_tokens() for word level tokenization in addition to sentence level.
* Updated get_transformed_values() with changes from Tommy MacGuire that resolve the periodicity artifacts at the beginnings and ends of a transformed signal. Updated documentation.

syuzhet 0.2.2
==========
Added quote stripping option to get_sentences() function (Thanks to Annie Swafford)
* Added quote stripping option to get_sentences() function (Thanks to Annie Swafford)

syuzhet 0.2.1
==========
Binary file modified R/sysdata.rda
Binary file not shown.
57 changes: 37 additions & 20 deletions R/syuzhet.R
@@ -46,10 +46,10 @@ get_sentences <- function(text_of_file, strip_quotes = TRUE){

#' Get Sentiment Values for a String
#' @description
#' Iterates over a vector of strings and returns sentiment values based on user supplied method.
#' Iterates over a vector of strings and returns sentiment values based on user supplied method. The default method, "syuzhet", is a custom sentiment dictionary developed in the Nebraska Literary Lab. The default dictionary should be better tuned to fiction as the terms were extracted from a collection of 165,000 human-coded sentences taken from a small corpus of contemporary novels.
#'
#' @param char_v A vector of strings for evaluation.
#' @param method A string indicating which sentiment method to use. Options include "bing", "afinn", "nrc" and "stanford." See references for more detail on methods.
#' @param method A string indicating which sentiment method to use. Options include "syuzhet", "bing", "afinn", "nrc" and "stanford." See references for more detail on methods.
#'
#' @references Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing and Comparing Opinions on the Web." Proceedings of the 14th International World Wide Web conference (WWW-2005), May 10-14, 2005, Chiba, Japan.
#'
@@ -67,10 +67,10 @@ get_sentences <- function(text_of_file, strip_quotes = TRUE){
#' @return Return value is a numeric vector of sentiment values, one value for each input sentence.
#' @export
#'
get_sentiment <- function(char_v, method = c("afinn", "bing", "nrc", "stanford"), path_to_tagger = NULL){
if (!is.character(char_v)) stop("Data must be a character vector.")
method <- match.arg(method)
if(method == "afinn" || method == "bing"){
get_sentiment <- function(char_v, method = "syuzhet", path_to_tagger = NULL){
if(is.na(pmatch(method, c("syuzhet", "afinn", "bing", "nrc", "stanford")))) stop("Invalid Method")
if(!is.character(char_v)) stop("Data must be a character vector.")
if(method == "afinn" || method == "bing" || method == "syuzhet"){
word_l <- strsplit(tolower(char_v), "[^A-Za-z']+")
result <- unlist(lapply(word_l, get_sent_values, method))
}
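A note on the new validation line: `pmatch()` accepts any unambiguous abbreviation, so a value such as `"syu"` passes the check, while the `method == "syuzhet"` comparisons that follow test the string exactly as supplied. The base R sketch below only illustrates that difference. The `methods` vector is copied from the diff; expanding the abbreviation with `match.arg()` is an assumption about one possible remedy, not something this commit does.

```R
methods <- c("syuzhet", "afinn", "bing", "nrc", "stanford")

# pmatch() returns the index of a unique partial match, otherwise NA.
unique_prefix <- pmatch("syu", methods)   # matches only "syuzhet"
ambiguous     <- pmatch("s", methods)     # "syuzhet" and "stanford" both match
no_match      <- pmatch("xyz", methods)

# An abbreviation passes the validation used in the diff ...
abbreviated  <- "syu"
passes_check <- !is.na(pmatch(abbreviated, methods))

# ... but an exact comparison against the full name is FALSE.
equals_full <- abbreviated == "syuzhet"

# match.arg() expands an unambiguous abbreviation to the full choice.
expanded <- match.arg(abbreviated, methods)
```

Calling `get_sentiment()` with the full method name avoids the question entirely.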
@@ -91,20 +91,24 @@ get_sentiment <- function(char_v, method = c("afinn", "bing", "nrc", "stanford")

#' Assigns Sentiment Values
#' @description
#' Assigns sentiment values to words based on preloaded dictionary
#' Assigns sentiment values to words based on preloaded dictionary. The default is the syuzhet dictionary.
#'
#' @param char_v A string
#' @param method A string indicating which sentiment dictionary to use
#' @return A single numerical value (positive or negative)
#' based on the assessed sentiment in the string
#'
get_sent_values<-function(char_v, method = "bing"){
get_sent_values<-function(char_v, method = "syuzhet"){
if(method == "bing") {
result <- sum(bing[which(bing$word %in% char_v), "value"])
}
else if(method == "afinn"){
result <- sum(afinn[which(afinn$word %in% char_v), "value"])
}
else if(method == "syuzhet"){
char_v <- gsub("-", "", char_v)
result <- sum(syuzhet_dict[which(syuzhet_dict$word %in% char_v), "value"])
}
else if(method == "nrc") {
result <- get_nrc_sentiment(char_v)
}
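To make the dictionary lookup concrete, here is a minimal sketch of the `%in%`-based scoring used in the branches above. The dictionary `toy_dict`, its values, and the helper `score_sentence()` are hypothetical and exist only for illustration; the real tables are stored in R/sysdata.rda.

```R
# Hypothetical three-word dictionary (values invented for this example).
toy_dict <- data.frame(
  word  = c("love", "hate", "silly"),
  value = c(0.75, -0.75, -0.25),
  stringsAsFactors = FALSE
)

score_sentence <- function(sentence, dict) {
  # Tokenise as get_sentiment() does: lower-case, then split on non-letters.
  tokens <- strsplit(tolower(sentence), "[^A-Za-z']+")[[1]]
  # Look up as get_sent_values() does: sum the values of the dictionary
  # rows whose word occurs anywhere in the token vector.
  sum(dict[which(dict$word %in% tokens), "value"])
}

once  <- score_sentence("I love dogs", toy_dict)              # 0.75
twice <- score_sentence("I love, love, love dogs", toy_dict)  # also 0.75
mixed <- score_sentence("I hate this silly test", toy_dict)   # -1
```

Because the test is evaluated once per dictionary row, a word that is repeated within a sentence contributes its value only once under this lookup.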
@@ -264,10 +268,11 @@ rescale <- function(x){

#' Bi-Directional x and y axis Rescaling
#' @description
#' Rescale Transformed values from -1 to 1 on the y-axis and scale from zero to 1 on the x-axis
#' Rescales input values to two scales (0 to 1 and -1 to 1) on the y-axis and also creates a scaled vector of x axis values from 0 to 1. This function is useful for plotting and plot comparison.
#' @param v A vector of values
#' @return A list of three vectors (x, y, z). x is a vector of values from 0 to 1 equal in length to the input vector v. y is a scaled (from 0 to 1) vector of the input values equal in length to the input vector v. z is a scaled (from -1 to +1) vector of the input values equal in length to the input vector v.
#' @export
rescale2 <- function(v){
rescale_x_2 <- function(v){
x <- 1:length(v)/length(v)
y <- v/max(v)
z <- 2 * (v - min(v))/(max(v) - min(v)) - 1
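The three assignments above can be tried on a small vector. The sketch below copies them into a standalone function named `rescale_x_2_sketch()` (a hypothetical name); the return statement is not visible in this hunk, so returning `list(x, y, z)` is an assumption based on the `@return` text.

```R
rescale_x_2_sketch <- function(v) {
  x <- 1:length(v) / length(v)                      # positions scaled to (0, 1]
  y <- v / max(v)                                   # values divided by the maximum
  z <- 2 * (v - min(v)) / (max(v) - min(v)) - 1     # values scaled to [-1, 1]
  list(x = x, y = y, z = z)                         # assumed return shape
}

res <- rescale_x_2_sketch(c(-2, 0, 2, 4))
res$x   # 0.25 0.50 0.75 1.00
res$y   # -0.5  0.0  0.5  1.0
res$z   # -1, -1/3, 1/3, 1
```

In this example `y` contains a negative value: dividing by the maximum maps the input into the 0 to 1 range only when every input value is non-negative.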
@@ -283,16 +288,24 @@ rescale2 <- function(v){
#' @export
simple_plot <- function(raw_values, title="Syuzhet Plot", legend_pos="top"){
wdw <- round(length(raw_values)*.1) # wdw = 10% of length
rolled <- zoo::rollmean(raw_values, k = wdw, fill = 0)
trans <- get_transformed_values(raw_values, x_reverse_len = length(raw_values), scale_range = T)
rolled <- rescale(zoo::rollmean(raw_values, k = wdw, fill = 0))
half <- round(wdw/2)
rolled[1:half] <- NA
end <- length(rolled) - half
rolled[end:length(rolled)] <- NA
trans <- get_dct_transform(raw_values, x_reverse_len = length(raw_values), scale_range = T)
x <- 1:length(raw_values)
y <- raw_values
raw_lo <- loess(y ~ x, span=.5)
low_line <- rescale(predict(raw_lo))
plot(low_line, type = "l", ylim = c(-1,1), main = title, xlab = "Narrative Time", ylab = "Scaled Sentiment")
lines(rescale(rolled), col="blue")
par(mfrow=c(2, 1))
plot(low_line, type = "l", ylim = c(-1,1), main = title, xlab = "Full Narrative Time", ylab = "Scaled Sentiment")
lines(rolled, col="blue")
lines(trans, col="red")
legend(legend_pos, c("Loess Smooth", "Rolling Mean", "Simple Syuzhet"), lty=1, lwd=1,col=c('black', 'blue', 'red'), bty='n', cex=.75)
legend(legend_pos, c("Loess Smooth", "Rolling Mean", "Syuzhet DCT"), lty=1, lwd=1,col=c('black', 'blue', 'red'), bty='n', cex=.75)
normed_trans <- get_dct_transform(raw_values, scale_range = T)
plot(normed_trans,type = "l", ylim = c(-1,1), main = "Normalized Simple Shape", xlab = "Normalized Narrative Time", ylab = "Scaled Sentiment", col="red")
par(mfrow=c(1, 1))
}
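The edge masking added to `simple_plot()` can be examined without plotting. The sketch below is a rough base R approximation: `stats::filter()` stands in for `zoo::rollmean(..., fill = 0)` (the two may align even-width windows differently), and `rescale_sketch()` is an assumed -1 to +1 scaling, since the body of `rescale()` is not shown in this diff.

```R
set.seed(1)
raw_values <- rnorm(100)   # stand-in for per-sentence sentiment values

# Assumed behaviour of rescale(): map the range of x onto [-1, 1].
rescale_sketch <- function(x) 2 * (x - min(x)) / (max(x) - min(x)) - 1

wdw <- round(length(raw_values) * 0.1)   # window = 10% of the length

# Centred moving average; positions without a full window become 0,
# mimicking fill = 0.
rolled <- as.numeric(stats::filter(raw_values, rep(1 / wdw, wdw), sides = 2))
rolled[is.na(rolled)] <- 0
rolled <- rescale_sketch(rolled)

# Masking as in the diff: blank out half a window at each end so the
# artificial zeros are not drawn.
half <- round(wdw / 2)
rolled[1:half] <- NA
end <- length(rolled) - half
rolled[end:length(rolled)] <- NA

n_masked <- sum(is.na(rolled))
```

With 100 values the window is 10 and `half` is 5, so positions 1 to 5 and 95 to 100 are masked. The trailing range includes position `end` itself, which makes it one element longer than the leading range.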

#' Discrete Cosine Transformation with Reverse Transform to Time Domain
@@ -303,14 +316,20 @@ simple_plot <- function(raw_values, title="Syuzhet Plot", legend_pos="top"){
#' @param raw_values the raw sentiment values
#' calculated for each sentence
#' @param low_pass_size The number of components
#' to retain in the low pass filtering. Default = 10
#' to retain in the low pass filtering. Default = 5
#' @param x_reverse_len the number of values to return via decimation. Default = 100
#' @param scale_range Logical determines whether or not to scale the values from -1 to +1. Default = FALSE. If set to TRUE, the lowest value in the vector will be set to -1 and the highest values set to +1 and all the values scaled accordingly in between.
#' @param scale_vals Logical determines whether or not to normalize the values using the scale function Default = FALSE. If TRUE, values will be scaled by subtracting the means and scaled by dividing by their standard deviations. See ?scale
#' @return The transformed values
#' @export
#'
get_dct_transform <- function(raw_values, low_pass_size = 10, x_reverse_len = 100, scale_vals = FALSE, scale_range = FALSE){
#' @examples
#' s_v <- get_sentences("I begin this story with a neutral statement.
#' Now I add a statement about how much I despise cats.
#' I am allergic to them. I hate them. Basically this is a very silly test. But I do love dogs!")
#' raw_values <- get_sentiment(s_v, method = "syuzhet")
#' dct_vals <- get_dct_transform(raw_values)
#' plot(dct_vals, type="l", ylim=c(-0.1,.1))
get_dct_transform <- function(raw_values, low_pass_size = 5, x_reverse_len = 100, scale_vals = FALSE, scale_range = FALSE){
if (!is.numeric(raw_values))
stop("Input must be a numeric vector")
if (low_pass_size > length(raw_values))
@@ -334,5 +353,3 @@ get_dct_transform <- function(raw_values, low_pass_size = 10, x_reverse_len = 10
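Since the body of `get_dct_transform()` is collapsed in this view, the following is a self-contained sketch of the general technique its documentation describes: a discrete cosine transform, a low-pass filter that keeps the first few components, and a reverse transform to a fixed output length. It is written in plain base R with an explicit orthonormal DCT-II matrix. The names `dct_matrix()` and `dct_low_pass()`, and the amplitude correction `sqrt(out_len / length(v))`, are assumptions made for illustration and are not taken from the package source.

```R
# Orthonormal DCT-II matrix: row k holds the cosine basis for frequency k.
dct_matrix <- function(n) {
  k <- 0:(n - 1)
  m <- cos(pi * outer(k, k + 0.5) / n)
  # Scale each row so the matrix is orthogonal (its inverse is its transpose).
  m * c(sqrt(1 / n), rep(sqrt(2 / n), n - 1))
}

dct_low_pass <- function(v, keep = 5, out_len = 100) {
  stopifnot(is.numeric(v), keep <= length(v), keep <= out_len)
  coefs <- as.vector(dct_matrix(length(v)) %*% v)            # forward transform
  padded <- c(coefs[seq_len(keep)], rep(0, out_len - keep))  # low-pass + zero pad
  # Inverse transform at the new length, corrected for the length change.
  as.vector(t(dct_matrix(out_len)) %*% padded) * sqrt(out_len / length(v))
}

# Keeping every component at the original length reproduces the input.
v <- c(0, -0.5, -0.25, -0.75, -0.1, 0.8)
round_trip <- dct_low_pass(v, keep = length(v), out_len = length(v))

# A constant signal keeps its level after smoothing and resampling.
flat <- dct_low_pass(rep(2, 20), keep = 1, out_len = 50)

# A smoothed, length-normalised shape from a noisy series.
set.seed(42)
smooth_shape <- dct_low_pass(cumsum(rnorm(300)), keep = 5, out_len = 100)
```

Fixing `out_len` gives every text the same number of points regardless of its sentence count, which is what allows shapes from texts of different lengths to be compared on a normalized time axis.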





18 changes: 9 additions & 9 deletions README.md
@@ -2,34 +2,34 @@
# Syuzhet
An R package for the extraction of sentiment and sentiment-based plot arcs from text.

The name "Syuzhet" comes from the Russian Formalist Vladimir Propp who divided narrative construction into two components, the "fabula" and the "syuzhet." Syuzhet refers to the "device" or technique of a narrative whereas fabula is the chronological order of events. Syuzhet, therefore, is concerned with the manner in which the elements of the story (fabula) are organized.
The name "Syuzhet" comes from the Russian Formalists Victor Shklovsky and Vladimir Propp, who divided narrative into two components, the "fabula" and the "syuzhet." Syuzhet refers to the "device" or technique of a narrative whereas fabula is the chronological order of events. Syuzhet, therefore, is concerned with the manner in which the elements of the story (fabula) are organized (syuzhet).

The Syuzhet package attempts to reveal the latent structure of narrative by means of sentiment analysis. Instead of detecting shifts in the topic or subject matter of the narrative ([as Ben Schmidt has done](http://sappingattention.blogspot.com/2014/12/fundamental-plot-arcs-seen-through.html)), the Syuzhet package reveals the emotional and affectual shifts that serve as proxies for the narrative movement between conflict and conflict resolution. This was an idea explored by the late Kurt Vonnegut in an essay titled "Here's a Lesson in Creative Writing" in his collection *A Man Without A Country* ( Random House, 2007). [A lecture Vonnegut gave on this subject is available via youTube](https://www.youtube.com/watch?v=oP3c1h8v2ZQ)

A deeper discussion and theoretical justification for the approach implemented in this package will be found in Jockers, Matthew L. "Syuzhet: Revealing Plot and Sentiment Arcs." [forthcoming 2015]. Interested readers may also wish to consult [A Novel Method for Detecting Plot](http://www.matthewjockers.net/2014/06/05/a-novel-method-for-detecting-plot/) and [So What?](http://www.matthewjockers.net/2014/05/07/so-what/).
The Syuzhet package attempts to reveal the latent structure of narrative by means of sentiment analysis. Instead of detecting shifts in the topic or subject matter of the narrative ([as Ben Schmidt has done](http://sappingattention.blogspot.com/2014/12/fundamental-plot-arcs-seen-through.html)), the Syuzhet package reveals the emotional shifts that serve as proxies for the narrative movement between conflict and conflict resolution. This was an idea inspired by the late Kurt Vonnegut in an essay titled "Here's a Lesson in Creative Writing" in his collection *A Man Without A Country* (Random House, 2007). [A lecture Vonnegut gave on this subject is available via YouTube](https://www.youtube.com/watch?v=oP3c1h8v2ZQ).

*Thanks to Lincoln Mullen for early feedback on this package (see http://rpubs.com/lmullen/58030)*.

## Installation

This package is now available on CRAN (http://cran.r-project.org/web/packages/syuzhet/), but you can easily install the most current development version from gitHub using the devtools package:
This package is now available on CRAN (http://cran.r-project.org/web/packages/syuzhet/). You can install the most current development version from GitHub using the devtools package:

```R
# install.packages("devtools")
devtools::install_github("mjockers/syuzhet")
```
## References
Syuchet incorporates the work of other scholars as follows:
Syuzhet incorporates four sentiment lexicons:

The default "Syuzhet" lexicon was developed in the Nebraska Literary Lab under the direction of Matthew L. Jockers

Finn {\AA}rup Nielsen - AFINN WORD DATABASE
The "afinn" lexicon was developed by Finn Årup Nielsen as the AFINN WORD DATABASE
See: http://www2.imm.dtu.dk/pubdb/views/publication_details.php?id=6010
The AFINN database of words is copyright protected and distributed under
"Open Database License (ODbL) v1.0" http://www.opendatacommons.org/licenses/odbl/1.0/ or a similar copyleft license.

Minqing Hu and Bing Liu - OPINION LEXICON
The "bing" lexicon was developed by Minqing Hu and Bing Liu as the OPINION LEXICON
See: http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

Mohammad, Saif M. and Turney, Peter D. - NRC EMOTION LEXICON.
The "nrc" lexicon was developed by Mohammad, Saif M. and Turney, Peter D. as the NRC EMOTION LEXICON.
See: http://saifmohammad.com/WebPages/lexicons.html
The NRC EMOTION LEXICON is released under the following terms of use:
Terms of use: