D2KLab participation to the MediaEval Fake News Analysis Task (2021-2022)

D2KLab/mediaeval-fakenews

FakeNews: Corona Virus and Conspiracies Multimedia Analysis Task

This repository contains the D2KLab participation in the MediaEval 2021 FakeNews Task and the MediaEval 2022 FakeNews Task.

2022

The 2022 "FakeNews Detection" task aims at detecting 9 named conspiracy theories in tweets, as well as classifying misinformation spreaders in a user interaction graph. The code for each task is available in ./2022/src/.

Approach

To tackle this challenge, we studied text-classification transformer models for tasks 1 and 3, and node-classification models for tasks 2 and 3. Our approach leverages multiple CT-BERT models for text classification, and node2vec in combination with simple classifiers (MLP, RF) for node classification. For task 3, we concatenate the text and graph features and perform classification on the combined representation.
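The feature concatenation used for task 3 can be illustrated with a minimal sketch. It is not the code from ./2022/src/: the feature matrices below are random placeholders standing in for CT-BERT text representations and node2vec embeddings, and the dimensions, labels, variable names and the choice of a random forest with these hyperparameters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder inputs (assumptions): in a real pipeline, text_features would
# come from a text encoder such as CT-BERT and graph_features from node2vec.
n_samples, text_dim, graph_dim = 40, 16, 8
text_features = rng.normal(size=(n_samples, text_dim))
graph_features = rng.normal(size=(n_samples, graph_dim))
labels = rng.integers(0, 2, size=n_samples)  # hypothetical binary labels

# Concatenate both views along the feature axis so that each sample is
# described by a single vector of size text_dim + graph_dim.
fused_features = np.concatenate([text_features, graph_features], axis=1)

# Train a simple classifier on the fused representation.
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(fused_features[:30], labels[:30])
predictions = clf.predict(fused_features[30:])
```

Swapping `RandomForestClassifier` for `sklearn.neural_network.MLPClassifier` would give the MLP variant of the same idea; only the classifier line changes.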

Results (2022)

The results obtained with our approach are summarised in the following figure:

plot

Main Takeaways

  • Even though we had more data this year, we did not see an increase in performance.
  • Some conspiracy theories (Harmful Radiation/Influence or New World Order) are easier to detect than others (Antivax). For example, Antivax has four times more data than Harmful Radiation/Influence, yet the models perform significantly worse on it.
  • Graph-related tasks are challenging and there is room for improvement.
  • Other approaches could have been studied (e.g. GNNs).

2021

We proposed three approaches. Their code is available in ./2021/src/ for those who would like to retrain our models.

An inference notebook is also directly available in ./2021/inference/inference.ipynb. All models are available for download at https://mediaeval-fakenews.tools.eurecom.fr/index.html

The path to the models needs to be specified in the Input cell of the inference notebook.

Citation

Youri Peskine, Giulio Alfarano, Ismail Harrando, Paolo Papotti, Raphaël Troncy.
Detecting COVID-19-Related Conspiracy Theories in Tweets.
In Multimedia Benchmark Workshop (MediaEval 2021), 13-15 December 2021, Online.
https://2021.multimediaeval.com/paper65.pdf

Approach

To tackle this challenge, we studied three different kinds of approaches. The first combines TF-IDF features with machine learning algorithms. The second uses Natural Language Inference (NLI) combined with metadata from Wikipedia. The third fine-tunes transformer-based models. This last approach performed best and obtained the best results on all tasks among all participants.
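As an illustration of the first kind of approach, the following is a minimal sketch of a TF-IDF text classifier built with scikit-learn. It is not the code from ./2021/src/: the example tweets and labels are invented, and the choice of logistic regression and of the vectorizer settings are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented toy data (assumption): 1 = conspiracy-related, 0 = not.
train_texts = [
    "5G towers are spreading the virus",
    "the vaccine contains a tracking microchip",
    "wash your hands and wear a mask",
    "a new vaccination centre opens downtown",
]
train_labels = [1, 1, 0, 0]

# TF-IDF turns each tweet into a sparse weighted bag of uni- and bigrams;
# a linear classifier is then trained on top of these features.
model = make_pipeline(
    TfidfVectorizer(lowercase=True, ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_texts, train_labels)

predicted = model.predict(["do 5G towers spread the virus?"])
probabilities = model.predict_proba(["do 5G towers spread the virus?"])
```

`predict_proba` returns one probability per class, which is the kind of output a late fusion step can combine with the scores of other models.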

Results (2021)

The results for our three approaches on a validation set and on the test set are summarised in the figure below. Our 2021 runs are available in ./2021/runs/.

  • Run 1 is TF-IDF
  • Run 2 is CT-BERT
  • There is no run 3
  • Run 4 is task-3-CT-BERT
  • Run 5 is late fusion ensembling

plot
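Run 5 relies on late fusion ensembling, which combines the outputs of several trained models rather than their input features. The sketch below shows one common variant, averaging class probabilities and taking the most likely class. The README does not state which fusion rule was used, so the averaging rule, the `late_fusion` helper and the probability values are assumptions for illustration only.

```python
import numpy as np


def late_fusion(probabilities):
    """Average per-model class probabilities and pick the best class.

    probabilities: list of arrays, each of shape (n_samples, n_classes).
    Returns (predicted_labels, mean_probabilities).
    """
    stacked = np.stack(probabilities, axis=0)  # (n_models, n_samples, n_classes)
    mean_probs = stacked.mean(axis=0)          # average over the models
    return mean_probs.argmax(axis=1), mean_probs


# Hypothetical outputs of two models on three tweets (two classes).
model_a_probs = np.array([[0.7, 0.3], [0.4, 0.6], [0.2, 0.8]])
model_b_probs = np.array([[0.6, 0.4], [0.7, 0.3], [0.1, 0.9]])

predictions, mean_probs = late_fusion([model_a_probs, model_b_probs])
```

On the second tweet the two models disagree; after averaging, the more confident model decides the final label.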

Requirements

python==3.8
torch==1.6.0
transformers==3.1.0
pandas==1.3.3
numpy==1.22.3
emoji==0.5.3
notebook
scikit-learn
