This repository applies a range of machine learning algorithms to bioinformatics tasks, focusing on the ChEMBL dataset, and compares their performance against a lazy regressor baseline.
Objectives:
Evaluate the effectiveness of different machine learning algorithms for bioinformatics applications using the ChEMBL dataset.
Assess the performance of the chosen algorithms compared with a lazy regressor baseline.
Provide insights into which algorithms suit specific bioinformatics prediction tasks.
Clone the Repository:
git clone https://github.com/lala2398/Bioinformatics.git
Install Dependencies:
Navigate to the project directory and install the required Python libraries using a package manager like pip:
cd Bioinformatics
pip install -r requirements.txt
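To confirm the install worked, a small check like the following may help. It is a minimal sketch that assumes requirements.txt holds simple lines such as a bare package name or name>=version. Editable installs, URLs, and other pip options are skipped, so it is not a full requirements parser. It uses only the standard library's importlib.metadata, which looks packages up by distribution name (the name pip installs, for example scikit-learn rather than the import name sklearn). The helper names are hypothetical.

```python
import re
from importlib import metadata
from pathlib import Path

# Matches a distribution name at the start of a requirement line; the match
# stops at extras ("[...]"), version specifiers (">=", "=="), markers (";") or spaces.
NAME_PATTERN = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def parse_requirement_names(text):
    """Return distribution names from simple requirements.txt content."""
    names = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):
            continue  # skip blank lines and pip options such as -r or -e
        match = NAME_PATTERN.match(line)
        if match:
            names.append(match.group(0))
    return names


def find_missing(names):
    """Return the names that importlib.metadata cannot find in this environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


def missing_requirements(path="requirements.txt"):
    """Read a requirements file and list the packages that are not installed."""
    return find_missing(parse_requirement_names(Path(path).read_text()))
```

Run from the project directory, print(missing_requirements()) should print an empty list once the pip install step has succeeded.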
The notebooks directory contains Jupyter notebooks that guide you through data preparation, model training, evaluation, and comparison. Open these notebooks in a Jupyter Notebook environment to explore the analyses.
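The notebooks are the authoritative description of how the data is prepared. As a rough, standard-library-only illustration of a common ChEMBL preparation step, the sketch below filters bioactivity records to IC50 values in nM, converts them to pIC50 = -log10(IC50 in mol/L), which equals 9 - log10(IC50 in nM), and splits the rows into training and test sets. The field names molecule_chembl_id, standard_type, standard_units and standard_value follow ChEMBL activity records. However, several details are assumptions: whether these notebooks use those fields, the rule that keeps only the first measurement per molecule, and the 80/20 split with seed 42. split_train_test is a hypothetical helper, not a scikit-learn function.

```python
import math
import random


def ic50_nm_to_pic50(ic50_nm):
    """Convert IC50 in nM to pIC50: -log10(ic50_nm * 1e-9) == 9 - log10(ic50_nm)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return 9.0 - math.log10(ic50_nm)


def prepare_records(records):
    """Keep nM IC50 rows with a positive value, one per molecule, and add pIC50."""
    seen = set()
    prepared = []
    for rec in records:
        if rec.get("standard_type") != "IC50" or rec.get("standard_units") != "nM":
            continue  # other assay types or units would need their own conversion
        value = rec.get("standard_value")
        if value is None or float(value) <= 0:
            continue  # missing or non-physical measurement
        mol_id = rec["molecule_chembl_id"]
        if mol_id in seen:
            continue  # simplification: keep the first measurement per molecule
        seen.add(mol_id)
        prepared.append({**rec, "pIC50": ic50_nm_to_pic50(float(value))})
    return prepared


def split_train_test(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows with a fixed seed and return (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]
```

A fixed seed keeps the split reproducible between runs. Averaging replicate measurements per molecule, instead of keeping the first, is a common alternative.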
The Jupyter notebooks are designed to deliver the following outcomes:
Preprocessed ChEMBL dataset ready for machine learning tasks.
Trained and evaluated machine learning models using selected algorithms.
Performance comparison showcasing the strengths and weaknesses of each algorithm relative to the lazy regressor baseline.
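Comparing against the baseline comes down to scoring every model's test-set predictions with the same metrics. The sketch below assumes predictions are already available as plain lists keyed by a model name (the names and the "baseline" key are hypothetical). It computes R^2 and RMSE for each model, ranks the models by R^2, and reports each one's R^2 gain over the baseline entry. If the baseline comes from the LazyPredict library's LazyRegressor (an assumption, since the README does not name the library), its results table also reports R-Squared and RMSE.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def compare_to_baseline(y_true, predictions, baseline="baseline"):
    """Return (name, R^2, RMSE, R^2 gain over baseline) rows, best R^2 first."""
    base_r2 = r2_score(y_true, predictions[baseline])
    rows = []
    for name, y_pred in predictions.items():
        r2 = r2_score(y_true, y_pred)
        rows.append((name, r2, rmse(y_true, y_pred), r2 - base_r2))
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

A negative gain means a model did worse than the baseline on the test set. r2_score divides by the spread of y_true, so it raises ZeroDivisionError when all true values are identical.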
We encourage contributions to this project!
This project is licensed under the terms of the MIT License.