cambridgeltl/biocaster_2021

BioCaster in 2021: Automatic Disease Outbreaks Detection from Global News Media


BioCaster was launched in 2008 to provide an ontology-based text mining system for early disease detection from open news sources. Following a six-year break, we have re-launched the system in 2021. Our goal is to systematically upgrade the methodology using state-of-the-art neural network language models, whilst retaining the original benefits that the system provided in terms of logical reasoning and automated early detection of infectious disease outbreaks. Here we present recent extensions such as neural machine translation in 10 languages, neural classification of disease outbreak reports, and a new cloud-based visualisation dashboard. Furthermore, we discuss our vision for further improvements, including combining risk assessment with event semantics and assessing the risk of outbreaks with multi-granularity. We hope that these efforts will benefit the global public health community.

Please see the BioCaster Tutorial for a video demonstration of how to use the BioCaster system.

Repo Structure

  • med_doc_cls: source code for the relevance classification task.

  • sapbert: source code for the SapBERT entity linking model.

  • biocaster-ontology: the BioCaster ontology against which the patterns are created.

  • srl-editor: the BioCaster rule engine.

BioCaster Structure

[Image: BioCaster structure diagram]

Huggingface Models

  • Relevance Classification: The [PubMedBERT] model is used for relevance classification.

  • Entity Linking: The [SapBERT] model is used for entity linking.

    Standard SapBERT as described in [Liu et al., NAACL 2021]. Trained with UMLS 2020AA (English only), using microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext as the base model. For [SapBERT], use [CLS] (before pooler) as the representation of the input; for [SapBERT-mean-token], use mean-pooling across all tokens.
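The two pooling strategies described above can be sketched as follows. This is a minimal illustration, not the code shipped in the `sapbert` folder: the function names `cls_pool`, `mean_pool`, and `encode` are hypothetical, the Hugging Face identifier `cambridgeltl/SapBERT-from-PubMedBERT-fulltext` is an assumed checkpoint name (substitute the one behind the [SapBERT] link), and `max_length=25` is an arbitrary choice for short entity names.

```python
def cls_pool(last_hidden_state):
    """Return the [CLS] vector (first token) of each sequence, before the pooler."""
    return last_hidden_state[:, 0]


def mean_pool(last_hidden_state, attention_mask):
    """Average the token vectors of each sequence, ignoring padding positions."""
    mask = attention_mask[:, :, None]            # (batch, seq_len, 1)
    summed = (last_hidden_state * mask).sum(1)   # (batch, hidden)
    counts = mask.sum(1)                         # (batch, 1) real tokens per sequence
    return summed / counts


def encode(names,
           model_name="cambridgeltl/SapBERT-from-PubMedBERT-fulltext",  # assumed identifier
           mean_token=False):
    """Embed a list of entity names with a SapBERT checkpoint (hypothetical helper)."""
    # Heavy imports are kept local so the pooling helpers stay usable on their own.
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name).eval()
    batch = tokenizer(names, padding=True, truncation=True,
                      max_length=25, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, seq_len, hidden)
    if mean_token:
        # Use this branch with the mean-token variant of the model.
        return mean_pool(hidden, batch["attention_mask"])
    return cls_pool(hidden)
```

Calling `encode(["covid-19", "coronavirus infection"])` would return one vector per name. Entity linking can then be done by nearest-neighbour search between such vectors and the embeddings of ontology terms.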

Acknowledgement

BioCaster is only possible thanks to freely available information sources on the Web. We are particularly grateful to the sources listed below. Whilst we acknowledge these organisations, mention here does not imply any endorsement or affiliation.

Please note that disease news data changes rapidly and differs by location and language, so it may not reflect the pattern of disease outbreaks in some areas. For example, some areas may be over-reported and others under-reported. The numbers of news reports on BioCaster may differ from aggregated data on other disease outbreak monitoring sites because the data is gathered and analysed in different ways. The relationship between news report counts and outbreak cases is complex, and understanding it is part of our ongoing research.

About

This is the public repository for the code and resources of BioCaster 2021: http://www.biocaster.org
