Using unsupervised NLP topic modelling and clustering methods on a dataset of 36+ million Arabic tweets to build a machine learning classifier that can identify misinformation in short-text documents.
This is my final capstone project, titled "What the Tweet?! Identifying Arabic-Language Political Misinformation on Twitter".
This project was completed as part of the Springboard Data Science Career Track. In it, I present a unique hybrid approach to topic modelling and apply it to a dataset of 36+ million Arabic tweets to isolate the political content. The project also attempts to apply further clustering algorithms to identify which of that political content is misinformation. This step has not yet succeeded because of technical limitations and the size of the dataset.
See the README in the project directory for more information, including the project outline and key findings.
Read the full project report on Towards Data Science.
Watch the project presentation on YouTube (22 min).
Complete Jupyter Notebooks for each section can be found in the 'notebooks' directory.
Using geospatial analysis and machine learning to predict conflict intensity from climate data.
This is my first capstone project, titled "Heated Discussions: Predicting Conflict Intensity Using Climate Data".
This project was completed as part of the Springboard Data Science Career Track. In it, I investigate the extent to which conflict intensity can be predicted from climate data. I carried out the project design, data collection, and complete execution independently.
See the README in the project directory for more information, including the project outline and key findings.
Read the full project report on Towards Data Science.
Watch the project presentation on YouTube (20 min).
Complete Jupyter Notebooks for each section can be found in the 'notebooks' directory.