Incorrect values get_sentiment and get_nrc_sentiment with Swedish text #39

lisagy · 2021-08-27T14:19:59Z

Hi! I am very new to R and GitHub and coding overall so I apologize for any following mistakes!

I am trying to do a sentiment analysis of a Swedish novel with the help of the syuzhet package, but I noticed that the get_sentiment and get_nrc_sentiment functions read the values of certain words incorrectly. I first noticed it with my custom lexicon, but then ran a test with the nrc lexicon as well and saw that both give incorrect values for words containing the letters ö, ä and å. Most of the time these words get a value of 0 (when they should get 1 or -1), but I’ve also seen a case where a word was assigned a positive value (1) when it should be negative (-1).

I've changed RStudio's default encoding to UTF-8 and my system's locale to Swedish, but neither has helped.
How could I solve this problem? This is the code I would use to get my results:

# For the nrc lexicon

library(syuzhet)  # get_tokens, get_sentiment, get_nrc_sentiment
library(readr)    # read_file

binas_historia <- read_file(file.choose())
bina_words <- get_tokens(binas_historia, pattern = "\\W")
sentiment_b_nrc <- get_nrc_sentiment(bina_words, language = "swedish")
overzichtje_nrc <- data.frame(bina_words, sentiment_b_nrc)

# For the Swedish (custom) lexicon 

binas_historia <- read_file(file.choose())
bina_words <- get_tokens(binas_historia, pattern = "\\W")
sensaldo_lexicon <- read.table("HP/Thesis/sensaldo-fullform.txt",
                               header = FALSE,
                               col.names = c("word", "category", "value"),
                               colClasses = c("character", "character", "numeric"),
                               encoding = "UTF-8")
sentiment_b_s <- get_sentiment(bina_words, method = "custom", lexicon = sensaldo_lexicon)
overzichtje_sensaldo <- data.frame(bina_words, sentiment_b_s)

mjockers · 2023-02-17T13:14:55Z

Since leaving academia, I rarely find time to work on this package anymore. Support for non-English languages is weak. I encourage you to develop a solution and submit as a PR.
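
One thing worth ruling out for anyone picking this up: in base R's read.table, the encoding argument only marks the strings it reads as UTF-8, while fileEncoding re-encodes the file as it is read. The sketch below is an illustration under stated assumptions, not a confirmed diagnosis of this issue. The two lexicon entries and the tokens are made up, and the lookup uses plain match() rather than syuzhet's own code.

```r
# Hypothetical three-column lexicon (word, category, value), written as raw
# UTF-8 bytes to a temporary file. The entries are illustrative only.
lex_path <- tempfile(fileext = ".txt")
writeLines(enc2utf8(c("k\u00e4rlek pos 1", "\u00e5ngest neg -1")),
           lex_path, useBytes = TRUE)

# fileEncoding declares the encoding of the file so that the character data
# is re-encoded on read; `encoding` would only mark the resulting strings.
lexicon <- read.table(lex_path,
                      header = FALSE,
                      col.names = c("word", "category", "value"),
                      colClasses = c("character", "character", "numeric"),
                      fileEncoding = "UTF-8")

# Made-up tokens: two words containing ä and å, plus one word not in the lexicon.
tokens <- enc2utf8(c("k\u00e4rlek", "\u00e5ngest", "och"))

# Simple lookup: tokens missing from the lexicon get a value of 0.
values <- lexicon$value[match(tokens, lexicon$word)]
values[is.na(values)] <- 0

# Inspect the result, and check that the tokens are valid UTF-8.
print(data.frame(tokens, values))
print(validUTF8(tokens))
```

If the words with ö, ä and å match here but not in the real pipeline, comparing the output of charToRaw() for a token from the novel and for the corresponding lexicon entry may show where the two differ.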
