BioCaster in 2021: Automatic Disease Outbreaks Detection from Global News Media


BioCaster was launched in 2008 to provide an ontology-based text mining system for early disease detection from open news sources. Following a six-year break, we have re-launched the system in 2021. Our goal is to systematically upgrade the methodology using state-of-the-art neural network language models, whilst retaining the original benefits that the system provided in terms of logical reasoning and automated early detection of infectious disease outbreaks. Here we present recent extensions such as neural machine translation in 10 languages, neural classification of disease outbreak reports, and a new cloud-based visualisation dashboard. Furthermore, we discuss our vision for further improvements, including combining risk assessment with event semantics and assessing the risk of outbreaks with multi-granularity. We hope that these efforts will benefit the global public health community.

Please see the BioCaster Tutorial for a video demonstration of how to use the BioCaster system.

Repo Structure

  • med_doc_cls: source code for the relevance classification task.

  • sapbert: source code for the SapBERT entity linking model.

  • biocaster-ontology: the BioCaster ontology against which the patterns are created.

  • srl-editor: the BioCaster rule engine.

BioCaster Structure

img

Huggingface Models

  • Relevance Classification: The [PubMedBERT] model is used for relevance classification.

  • Entity Linking: The [SapBERT] model is used for entity linking.

    Standard SapBERT as described in [Liu et al., NAACL 2021]. Trained with UMLS 2020AA (English only), using microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext as the base model. For [SapBERT], use [CLS] (before pooler) as the representation of the input; for [SapBERT-mean-token], use mean-pooling across all tokens.
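The difference between the two representations can be illustrated with a minimal, dependency-free sketch. It assumes the encoder has already produced one hidden-state vector per token position, with `[CLS]` at position 0 (with Hugging Face `transformers`, this would correspond to one row of `last_hidden_state`, not `pooler_output`). The function names `cls_pool` and `mean_pool` and the toy numbers are hypothetical and are not taken from this repository; excluding padding positions from the mean via the attention mask is also an assumption.

```python
from typing import List, Sequence

Vector = List[float]


def cls_pool(hidden_states: Sequence[Sequence[float]]) -> Vector:
    """Return the last-layer vector at position 0, i.e. the [CLS] token
    taken before any pooler layer is applied."""
    return list(hidden_states[0])


def mean_pool(hidden_states: Sequence[Sequence[float]],
              attention_mask: Sequence[int]) -> Vector:
    """Average the token vectors whose attention-mask entry is 1.

    Assumption: padding positions (mask 0) are left out of the mean.
    """
    kept = [vec for vec, m in zip(hidden_states, attention_mask) if m == 1]
    if not kept:
        raise ValueError("attention_mask selects no tokens")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Toy input: four positions ([CLS], two word pieces, one padding slot),
# hidden size 2. Real SapBERT vectors are much larger.
hidden = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

cls_vec = cls_pool(hidden)          # representation style for [SapBERT]
mean_vec = mean_pool(hidden, mask)  # representation style for [SapBERT-mean-token]
print(cls_vec, mean_vec)
```

In both cases the resulting vector is what would be compared against ontology term embeddings during entity linking; only the way the token vectors are reduced to a single vector differs.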

Acknowledgement

BioCaster is only possible thanks to freely available information sources on the Web. We are particularly grateful to the sources listed below. Whilst we acknowledge these organisations, mention here does not imply any endorsement or affiliation.

Please note that disease news data changes rapidly and differs by location and language, so it may not reflect the pattern of disease outbreaks in some areas; for example, some areas may be over-reported and others under-reported. The numbers of news reports on BioCaster may differ from aggregated data on other disease outbreak monitoring sites because the data is gathered and analysed in different ways. The relationship between news report counts and outbreak cases is complex, and we are currently trying to understand it in our research.