comparison_2023-03-27.md

Comparing numbers

Summary

| Paper | PolitiFact Acc | GossipCop Acc |
| --- | --- | --- |
| Paper 1 (TBA) | 0.8632** | 0.8388** |
| Paper 2 (2019) | 0.691 | 0.822 |
| Paper 3 (2020) | 0.8 | 0.82 |
| Paper 4 (2023) | 0.584 | - |
| Paper 5 (2020) | 0.846 | 0.86 |
| Paper 6 (2021) | 0.9156* | 0.9156* |

* Combined dataset.
** Not yet reflected in the Paper 1 table below. (FakeNewsNet-politifact_max-vocab=15000_pre=v2_text_domain_T=200_s=15 and FakeNewsNet-gossipcop_max-vocab=15000_pre=v2_domain_T=250_s=5)

Paper 1: This paper

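| Model | PolitiFact Acc | PolitiFact Prec | PolitiFact Rec | PolitiFact F1 | GossipCop Acc | GossipCop Prec | GossipCop Rec | GossipCop F1 | PolitiFact+GossipCop Acc | PolitiFact+GossipCop Prec | PolitiFact+GossipCop Rec | PolitiFact+GossipCop F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| TM+Text | 0.750 | 0.742 | 0.761 | 0.743 | 0.778 | 0.695 | 0.743 | 0.693 | 0.742 | 0.686 | 0.740 | 0.694 |
| TM+Domain | 0.768 | 0.849 | 0.779 | 0.762 | 0.732 | 0.666 | 0.708 | 0.675 | 0.734 | 0.672 | 0.714 | 0.681 |
| TM+Text+Domain | 0.764 | 0.754 | 0.770 | 0.756 | 0.809 | 0.739 | 0.775 | 0.744 | 0.782 | 0.721 | 0.766 | 0.734 |
| TM+Tweet | 0.632 | 0.688 | 0.514 | 0.387 | 0.241 | 0.121 | 0.500 | 0.194 | - | - | - | - |
| TM+Tweet+Text | 0.774 | 0.764 | 0.781 | 0.766 | 0.799 | 0.724 | 0.737 | 0.716 | - | - | - | - |
| TM+Tweet+Domain | 0.750 | 0.840 | 0.764 | 0.745 | 0.749 | 0.672 | 0.702 | 0.681 | - | - | - | - |
| TM+Tweet+Text+Domain | 0.788 | 0.776 | 0.792 | 0.780 | 0.788 | 0.724 | 0.771 | 0.738 | - | - | - | - |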

Method

  • 80/20 split
  • Tokenize
  • Convert text to lowercase
  • Convert text to a binary bag-of-words vector
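
The steps listed above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code used for the reported numbers: the tokenizer regex, the function names, the `max_vocab` cap, and the fixed shuffle seed are all hypothetical choices made for the example.

```python
import random
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed rule)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents, max_vocab=None):
    """Map the most frequent tokens (by document frequency) to column indices."""
    counts = {}
    for doc in documents:
        for token in set(tokenize(doc)):
            counts[token] = counts.get(token, 0) + 1
    # Sort by descending document frequency, then alphabetically for determinism.
    ranked = sorted(counts, key=lambda t: (-counts[t], t))
    if max_vocab is not None:
        ranked = ranked[:max_vocab]
    return {token: i for i, token in enumerate(ranked)}


def binary_bow(text, vocabulary):
    """Return a 0/1 vector marking which vocabulary tokens occur in the text."""
    vector = [0] * len(vocabulary)
    for token in tokenize(text):
        index = vocabulary.get(token)
        if index is not None:
            vector[index] = 1
    return vector


def split_80_20(items, seed=0):
    """Shuffle a copy of the items and split it 80/20 into train and test."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]
```

A binary vector records only whether a token is present, so repeating a word in a document does not change its representation.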

Size

  • PolitiFact has a total of 1056
  • GossipCop has a total of 22140

Paper 2: FakeNewsNet: A Data Repository with News Content, Social Context and Dynamic Information for Studying Fake News on Social Media (2019)

PDF

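| Model | PolitiFact Acc | PolitiFact Prec | PolitiFact Rec | PolitiFact F1 | GossipCop Acc | GossipCop Prec | GossipCop Rec | GossipCop F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SVM | 0.580 | 0.611 | 0.717 | 0.659 | 0.470 | 0.462 | 0.451 | 0.456 |
| Logistic regression | 0.642 | 0.757 | 0.543 | 0.633 | 0.822 | 0.897 | 0.722 | 0.799 |
| Naive Bayes | 0.617 | 0.674 | 0.630 | 0.651 | 0.704 | 0.735 | 0.765 | 0.798 |
| CNN | 0.629 | 0.807 | 0.456 | 0.583 | 0.703 | 0.789 | 0.623 | 0.699 |
| Social Article Fusion /S | 0.654 | 0.600 | 0.789 | 0.681 | 0.741 | 0.709 | 0.761 | 0.734 |
| Social Article Fusion /A | 0.667 | 0.667 | 0.579 | 0.619 | 0.796 | 0.782 | 0.743 | 0.762 |
| Social Article Fusion | 0.691 | 0.638 | 0.789 | 0.706 | 0.796 | 0.820 | 0.753 | 0.785 |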

Method

  • 80/20 split
  • Unknown or "default" settings

Size

  • PolitiFact has a total of 1056 (624 real, 432 fake)
  • GossipCop has a total of 22865 (16817 real, 6048 fake)
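
When reading accuracy figures against these class counts, one reference point that may help is the share of the larger class, which is the accuracy of always predicting that class. The sketch below only does this arithmetic on the counts listed above; the function name is hypothetical and the comparison is a reading aid, not something reported by the paper.

```python
def majority_share(real, fake):
    """Fraction of samples in the larger class (accuracy of always guessing it)."""
    return max(real, fake) / (real + fake)


# Counts as listed above for Paper 2.
politifact = majority_share(real=624, fake=432)
gossipcop = majority_share(real=16817, fake=6048)

print(f"PolitiFact majority share: {politifact:.3f}")
print(f"GossipCop majority share: {gossipcop:.3f}")
```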

Paper 3: Exploring N-gram, Word Embedding and Topic Models for Content-based Fake News Detection in FakeNewsNet Evaluation (2020)

PDF

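| Model | PolitiFact Acc | PolitiFact Prec | PolitiFact Rec | PolitiFact F1 | GossipCop Acc | GossipCop Prec | GossipCop Rec | GossipCop F1 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LogReg | 0.64 | 0.76 | 0.54 | 0.63 | 0.82 | 0.90 | 0.72 | 0.80 |
| Social Article Fusion | 0.69 | 0.64 | 0.79 | 0.71 | 0.80 | 0.82 | 0.75 | 0.79 |
| N-gram | 0.80 | 0.79 | 0.78 | 0.78 | 0.82 | 0.75 | 0.79 | 0.77 |
| Topic | 0.60 | 0.55 | 0.53 | 0.51 | 0.51 | 0.51 | 0.51 | 0.47 |
| Word2Vec | 0.73 | 0.73 | 0.74 | 0.73 | 0.78 | 0.71 | 0.76 | 0.72 |
| N-gram + Topic | 0.77 | 0.76 | 0.76 | 0.76 | 0.82 | 0.75 | 0.78 | 0.76 |
| N-gram + Word2Vec | 0.72 | 0.72 | 0.73 | 0.72 | 0.78 | 0.71 | 0.76 | 0.72 |
| Topic + Word2Vec | 0.42 | 0.49 | 0.49 | 0.39 | 0.63 | 0.60 | 0.64 | 0.60 |
| N-gram + Topic + Word2Vec | 0.40 | 0.45 | 0.48 | 0.36 | 0.58 | 0.57 | 0.60 | 0.54 |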

Method

  • 80/20 split
  • Tokenize
  • Stemming
  • Remove duplicates
  • Remove punctuation
  • Remove special characters and symbols
  • Remove hash from hashtags
  • Remove stop words
  • Convert text to lowercase
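
Most of the cleaning steps above can be approximated with the Python standard library. The sketch below is an assumption-laden illustration, not the paper's pipeline: the stop-word list is a tiny placeholder, the function names are hypothetical, the order of operations is a guess, and stemming is left out because it would need an external stemmer.

```python
import re
import string

# Tiny illustrative stop-word list; an assumption, not the list used in the paper.
STOP_WORDS = {"a", "an", "and", "the", "is", "are", "of", "to", "in", "on"}

# Translation table that turns every ASCII punctuation character into a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text):
    """Lowercase, strip hash signs and punctuation, tokenize, drop stop words."""
    text = text.lower()
    # "#fakenews" becomes "fakenews": the hash is removed, the word is kept.
    text = text.replace("#", "")
    # Replace punctuation with spaces so neighbouring words stay separated.
    text = text.translate(PUNCT_TO_SPACE)
    # Drop any remaining special characters and symbols.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    return [token for token in text.split() if token not in STOP_WORDS]


def remove_duplicates(documents):
    """Keep the first occurrence of each document, preserving order."""
    seen = set()
    unique = []
    for doc in documents:
        if doc not in seen:
            seen.add(doc)
            unique.append(doc)
    return unique
```

Removing the hash before stripping punctuation keeps hashtag words attached to their text instead of leaving a stray space in front of them.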

Size

After preprocessing:

  • PolitiFact has a total of 968 (426 real, 542 fake)
  • GossipCop has a total of 20796 (4804 real, 15965 fake)

Paper 4: Machine Learning vs Deep Learning Models for Detecting Fake News: A Comparative Analysis on Fake-NewsNet Dataset (2023)

PDF

Note: this is an aggregate of the presented data.

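| Model | PolitiFact Acc | PolitiFact Prec | PolitiFact Rec | PolitiFact F1 |
| --- | --- | --- | --- | --- |
| NB | 0.584 | 0.585 | 0.565 | 0.545 |
| SVM | 0.574 | 0.645 | 0.575 | 0.515 |
| LSTM | 0.560 | 0.570 | 0.580 | 0.555 |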

Method

  • 80/10/10 split
  • Tokenize
  • Stemming
  • "Clean" punctuation
  • Remove some punctuation
  • Remove stop words
  • Remove numbers
  • Drop invalid data
  • Convert text to lowercase
  • Convert text to TF-IDF
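
The final step above, converting text to TF-IDF, can be illustrated with a small pure-Python sketch. It assumes the plainest textbook variant (term frequency as count over document length, inverse document frequency as the natural log of N over document frequency); the paper does not state which variant or library it used, so the weights here need not match its features. The function name is hypothetical.

```python
import math
from collections import Counter


def tf_idf(tokenized_docs):
    """Return one {token: weight} dict per document.

    Uses tf = count / document length and idf = log(N / document frequency).
    Other TF-IDF variants (smoothing, normalization) give different numbers.
    """
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each token appears.
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))
    weights = []
    for tokens in tokenized_docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            token: (count / total) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return weights
```

With this variant, a token that appears in every document gets weight zero, which is why frequent but uninformative words contribute little to the representation.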

Size

  • PolitiFact has a total of 1056 (624 real, 432 fake)

Paper 5: SpotFake+: A Multimodal Framework for Fake News Detection via Transfer Learning (Student Abstract) (2020)

PDF GitHub

Text

| Model | PolitiFact Acc | GossipCop Acc |
| --- | --- | --- |
| SVM | 0.58 | 0.497 |
| Logistic Regression | 0.642 | 0.648 |
| Naive Bayes | 0.617 | 0.624 |
| CNN | 0.629 | 0.723 |
| SAF (Social Article Fusion) | 0.691 | 0.689 |
| XLNet + dense layer | 0.74 | 0.836 |
| XLNet + CNN | 0.721 | 0.84 |
| XLNet + LSTM | 0.721 | 0.807 |

Text + Image

| Model | PolitiFact Acc | GossipCop Acc |
| --- | --- | --- |
| EANN | 0.74 | 0.86 |
| MVAE | 0.673 | 0.775 |
| SpotFake | 0.721 | 0.807 |
| SpotFake+ | 0.846 | 0.856 |

Method

  • Remove logos
  • Drop samples without images

Size

Before preprocessing:

  • PolitiFact has a total of 1056 (624 real, 432 fake)
  • GossipCop has a total of 22140 (16817 real, 5323 fake)

After preprocessing:

  • PolitiFact has a total of 485 (321 real, 164 fake)
  • GossipCop has a total of 12840 (10259 real, 2581 fake)

Paper 6: A Heuristic-driven Uncertainty based Ensemble Framework for Fake News Detection in Tweets and News Articles (2021)

PDF

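| Model | PolitiFact + GossipCop Acc | PolitiFact + GossipCop Prec | PolitiFact + GossipCop Rec | PolitiFact + GossipCop F1 |
| --- | --- | --- | --- | --- |
| FakeFlow | 0.82 | 0.82 | 0.82 | 0.82 |
| One-Hot LR | 0.7670 | 0.7670 | 0.7670 | 0.7670 |
| FakeNewsTracker | 0.7186 | 0.7186 | 0.7186 | 0.7186 |
| Ensemble Model + Heuristic Post-Processing | 0.9007 | 0.9007 | 0.9007 | 0.9007 |
| SFFN (with MCDropout) + Heuristic Post-Processing | 0.9156 | 0.9156 | 0.9156 | 0.9156 |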

Method

  • 80/10/10 split
  • For tweets, the Python package tweet-preprocessor was used to filter out usernames, URLs, emojis, etc.
  • For articles, filter out: usernames, URLs from Instagram, Facebook, Twitter, etc.
  • Different tokenizers (from huggingface)
  • Vocabulary trained on a large corpus like GLUE, wikitext-103, CommonCrawl, etc.
  • Transfer learning
  • News body was crawled
  • Ensemble of models used for balancing

Size

  • PolitiFact has a total of 1056 (624 real, 432 fake)
  • GossipCop has a total of 22140 (16817 real, 5323 fake)

Data was augmented by crawling "E! online", PolitiFact, and GossipCop.

After crawling:

  • PolitiFact has a total of 1011 (610 real, 401 fake)
  • GossipCop has a total of 20474 (15151 real, 5323 fake)