This repo contains an implementation of leADS (multi-label learning based on Active Dataset Subsampling), which subsamples examples from the data to reduce the negative impact of training loss. Specifically, leADS performs an iterative procedure to: (a) construct an acquisition model in an ensemble framework; (b) select informative examples using an acquisition function (entropy, mutual information, variation ratios, or normalized propensity scored precision at k); and (c) train on the reduced set of selected examples. The ensemble approach is intended to improve the generalization ability of multi-label learning systems by concurrently building and executing a group of multi-label base learners, each assigned a portion of the samples, to ensure proper learning of class labels (e.g. pathways). leADS was evaluated on the pathway prediction task using 10 multi-organism pathway datasets, where it achieved competitive performance against state-of-the-art pathway inference algorithms.
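To make step (b) more concrete, here is a minimal sketch of how two of the listed acquisition functions (entropy and mutual information) could score examples from the predictions of an ensemble, followed by a top-k selection. This is an illustration only and not the leADS implementation: the function names, the nested-list layout `ensemble_probs[i][m][j]` (example `i`, ensemble member `m`, label `j`), and the toy probabilities are all assumptions made for this example.

```python
import math


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli variable with success probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def entropy_score(member_probs):
    """Entropy of the ensemble-averaged prediction, summed over labels.

    member_probs[m][j] is the probability member m assigns to label j
    (hypothetical layout, chosen for this sketch).
    """
    n_members = len(member_probs)
    n_labels = len(member_probs[0])
    total = 0.0
    for j in range(n_labels):
        mean_p = sum(member_probs[m][j] for m in range(n_members)) / n_members
        total += binary_entropy(mean_p)
    return total


def mutual_information_score(member_probs):
    """Entropy of the mean prediction minus the mean entropy of the members.

    High values indicate that the ensemble members disagree with each other.
    """
    n_members = len(member_probs)
    expected = sum(
        sum(binary_entropy(p) for p in probs) for probs in member_probs
    ) / n_members
    return entropy_score(member_probs) - expected


def select_top_k(ensemble_probs, k, score_fn):
    """Return the indices of the k examples with the highest score."""
    scores = [score_fn(example) for example in ensemble_probs]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Toy predictions: 3 examples, 3 ensemble members, 2 labels (made-up values).
ensemble_probs = [
    [[0.9, 0.1], [0.95, 0.05], [0.9, 0.1]],  # members agree and are confident
    [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],    # members disagree
    [[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]],    # members agree but are uncertain
]

selected = select_top_k(ensemble_probs, k=1, score_fn=mutual_information_score)
```

In this toy setup, mutual information favors the second example, where the members disagree, while entropy also scores the third example highly because its averaged prediction is uncertain. The selected indices would then define the reduced training set for step (c).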
See tutorials on the GitHub wiki page for more information and guidelines.
If you find leADS useful in your research, please consider citing the following paper:
- Abdur Rahman M. A. Basher, Aditi N. Nallan, Ryan J. McLaughlin, Julia Anstett, and Steven J. Hallam. "leADS: improved metabolic pathway inference based on active dataset subsampling", bioRxiv (2021).
For any inquiries, please contact: [email protected]