
BILL-CHEPTOYEK/MEDICAL-DIAGNOSIS-AI-PROSTATE-CANCER-

Medical Diagnosis Data Analysis and Machine Learning Project

Introduction

This project performs data analysis on the "prostate_dkfz_2018_clinical_data (2).csv" dataset and applies a few machine learning techniques to it. The dataset contains clinical data on prostate cancer patients, with features such as diagnosis age, cancer type, treatment information, and mutation count.

Dataset Description

The dataset "prostate_dkfz_2018_clinical_data (2).csv" consists of the following columns:

  1. Study ID
  2. Patient ID
  3. Sample ID
  4. Diagnosis Age
  5. Age Group at Diagnosis in Years
  6. BCR Status
  7. Cancer Type
  8. Cancer Type Detailed
  9. Clonality
  10. ETS Status
  11. Radical Prostatectomy Gleason Score for Prostate Cancer
  12. Initial Treatment
  13. Localized Tumor
  14. Median Purity
  15. Mono or Multifocal Status
  16. Mutation Count
  17. Oncotree Code
  18. Preop PSA
  19. Sample Class
  20. Number of Samples Per Patient
  21. Sex
  22. Somatic Status
  23. Source
  24. Stage
  25. Time from Surgery to BCR/Last Follow Up
  26. TMB (nonsynonymous)
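
As a quick sanity check after loading the file, the column list above can be compared against what pandas actually read. The following is a minimal sketch, not code from the notebook: the helper names `missing_columns` and `summarize_columns` are hypothetical, and it assumes the CSV headers match the names listed above exactly, which may not hold for every export.

```python
import pandas as pd

# Column names copied from the list above; the real CSV headers are
# assumed (not verified) to match them exactly.
EXPECTED_COLUMNS = [
    "Study ID", "Patient ID", "Sample ID", "Diagnosis Age",
    "Age Group at Diagnosis in Years", "BCR Status", "Cancer Type",
    "Cancer Type Detailed", "Clonality", "ETS Status",
    "Radical Prostatectomy Gleason Score for Prostate Cancer",
    "Initial Treatment", "Localized Tumor", "Median Purity",
    "Mono or Multifocal Status", "Mutation Count", "Oncotree Code",
    "Preop PSA", "Sample Class", "Number of Samples Per Patient", "Sex",
    "Somatic Status", "Source", "Stage",
    "Time from Surgery to BCR/Last Follow Up", "TMB (nonsynonymous)",
]


def missing_columns(df):
    """Return the expected column names that are absent from df."""
    present = set(df.columns)
    return [name for name in EXPECTED_COLUMNS if name not in present]


def summarize_columns(df):
    """Return one row per column with its dtype and missing-value count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "missing": df.isna().sum(),
    })
```

Calling `missing_columns(df)` on the loaded DataFrame should return an empty list if every listed column is present; `summarize_columns(df)` gives a first look at which features need cleaning.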

Project Structure

The project includes the following files:

  1. prostate_dkfz_2018_clinical_data (2).csv: The dataset containing the clinical data for prostate cancer patients.

  2. Explore_Prostate_data.ipynb: A Jupyter notebook that contains the code for data analysis, visualization, and machine learning tasks.

  3. README.md: This file, providing an overview of the project, dataset, and files.

Project Goals

The main objectives of this project are:

  1. Exploratory Data Analysis (EDA): Analyze the dataset in depth to understand the characteristics of the prostate cancer patients and the relationships between features.

  2. Data Preprocessing: Clean the data, handle missing values, and prepare the dataset for machine learning.

  3. Machine Learning: Apply machine learning algorithms to build predictive models for cancer diagnosis or other relevant tasks.

  4. Visualization: Create informative visualizations to present the findings of the data analysis and machine learning models.
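
To make the preprocessing and machine learning goals concrete, here is a minimal sketch of how they could be combined in a single scikit-learn pipeline. It is an illustration under stated assumptions, not the notebook's actual method: the choice of "BCR Status" as the prediction target, the feature subsets, and the use of logistic regression are all assumptions made for this example, and the helper names `build_pipeline` and `fit_on` are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed feature/target split, chosen only for illustration.
NUMERIC = ["Diagnosis Age", "Preop PSA", "Mutation Count"]
CATEGORICAL = ["ETS Status", "Stage"]
TARGET = "BCR Status"


def build_pipeline():
    """Impute, scale, and encode features, then fit a logistic regression."""
    numeric_steps = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical_steps = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    features = ColumnTransformer([
        ("num", numeric_steps, NUMERIC),
        ("cat", categorical_steps, CATEGORICAL),
    ])
    return Pipeline([
        ("features", features),
        ("model", LogisticRegression(max_iter=1000)),
    ])


def fit_on(df):
    """Drop rows without a target label, then fit the pipeline."""
    labelled = df.dropna(subset=[TARGET])
    X = labelled[NUMERIC + CATEGORICAL]
    y = labelled[TARGET]
    return build_pipeline().fit(X, y)
```

Keeping imputation and encoding inside the pipeline means the same transformations are applied at prediction time, and they are fitted only on training data when the pipeline is used with cross-validation.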

Getting Started

To run the code in the Jupyter notebook, you will need the following libraries installed:

  • pandas
  • numpy
  • matplotlib
  • seaborn
  • scikit-learn

You can install these libraries using the following command in your Python environment:

pip install pandas numpy matplotlib seaborn scikit-learn
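
To confirm the installation worked before opening the notebook, a small check like the one below can help. It is a sketch using only the standard library; the helper name `missing_packages` is hypothetical. Note that scikit-learn is installed under the name `scikit-learn` but imported as `sklearn`.

```python
import importlib.util

# Maps the pip package name to the module name used in import statements.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def missing_packages(required=REQUIRED):
    """Return pip names of packages whose module cannot be found."""
    return [
        pip_name
        for pip_name, module_name in required.items()
        if importlib.util.find_spec(module_name) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```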

Usage

  1. Download the "prostate_dkfz_2018_clinical_data (2).csv" dataset and place it in the project folder.

  2. Open the Jupyter notebook "Explore_Prostate_data.ipynb" using Jupyter Notebook or JupyterLab.

  3. Follow the instructions in the notebook to execute the code cells and perform data analysis and machine learning tasks.
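
Because the dataset file name contains a space and parentheses, it is easy to mistype when loading it. The sketch below shows one way to load the file from the project folder with a clear error if it is missing. The helper name `load_clinical_data` is hypothetical, and the code assumes the file is comma-separated, as its extension suggests; if the columns do not parse correctly, try passing `sep="\t"` to `pd.read_csv`.

```python
from pathlib import Path

import pandas as pd

DATASET_NAME = "prostate_dkfz_2018_clinical_data (2).csv"


def load_clinical_data(project_folder="."):
    """Load the dataset from project_folder, failing clearly if it is absent."""
    path = Path(project_folder) / DATASET_NAME
    if not path.is_file():
        raise FileNotFoundError(f"Expected the dataset at {path}")
    # Assumes a comma-separated file; adjust `sep` if the export differs.
    return pd.read_csv(path)
```

With the file in the same folder as the notebook, `df = load_clinical_data()` followed by `df.head()` shows the first few rows.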

Note

This project is intended for educational and research purposes only. It is essential to consult medical professionals and domain experts for accurate medical diagnoses and decisions.

Feel free to explore, analyze, contribute to, and enhance the project to suit your specific requirements or research objectives.

For any questions or inquiries, please contact CHEPTOYEK BILL at @trojan__bill on X (formerly Twitter).

About

This is a model meant to diagnose prostate cancer
