Medical Diagnosis Data Analysis and Machine Learning Project

Introduction

This project performs exploratory data analysis and applies several machine learning techniques to the "prostate_dkfz_2018_clinical_data (2).csv" dataset. The dataset contains clinical data on prostate cancer patients, including features such as diagnosis age, cancer type, treatment information, and mutation count.

Dataset Description

The dataset "prostate_dkfz_2018_clinical_data (2).csv" consists of the following columns:

Study ID
Patient ID
Sample ID
Diagnosis Age
Age Group at Diagnosis in Years
BCR Status
Cancer Type
Cancer Type Detailed
Clonality
ETS Status
Radical Prostatectomy Gleason Score for Prostate Cancer
Initial Treatment
Localized Tumor
Median Purity
Mono or Multifocal Status
Mutation Count
Oncotree Code
Preop PSA
Sample Class
Number of Samples Per Patient
Sex
Somatic Status
Source
Stage
Time from Surgery to BCR/Last Follow Up
TMB (nonsynonymous)
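
As a quick check that a downloaded file matches the column list above, the following minimal sketch compares a loaded DataFrame against that list. It assumes the CSV headers are spelled exactly as written in this README and that the file is comma-separated; `EXPECTED_COLUMNS` and `missing_columns` are illustrative names, not part of the notebook.

```python
import pandas as pd

# Column names copied from this README. Their exact spelling in the CSV
# header is an assumption and may need adjusting.
EXPECTED_COLUMNS = [
    "Study ID", "Patient ID", "Sample ID", "Diagnosis Age",
    "Age Group at Diagnosis in Years", "BCR Status", "Cancer Type",
    "Cancer Type Detailed", "Clonality", "ETS Status",
    "Radical Prostatectomy Gleason Score for Prostate Cancer",
    "Initial Treatment", "Localized Tumor", "Median Purity",
    "Mono or Multifocal Status", "Mutation Count", "Oncotree Code",
    "Preop PSA", "Sample Class", "Number of Samples Per Patient", "Sex",
    "Somatic Status", "Source", "Stage",
    "Time from Surgery to BCR/Last Follow Up", "TMB (nonsynonymous)",
]


def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df."""
    present = set(df.columns)
    return [name for name in expected if name not in present]


# Example usage (assumes the CSV sits in the current folder):
#   df = pd.read_csv("prostate_dkfz_2018_clinical_data (2).csv")
#   print(df.shape)
#   print(missing_columns(df))
```

An empty list means every column named in this README was found in the file.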

Project Structure

The project includes the following files:

prostate_dkfz_2018_clinical_data (2).csv: The dataset containing the clinical data for prostate cancer patients.
Explore_Prostate_data.ipynb: A Jupyter notebook that contains the code for data analysis, visualization, and machine learning tasks.
README.md: This file, providing an overview of the project, dataset, and files.

Project Goals

The main objectives of this project are:

Exploratory Data Analysis (EDA): Perform an in-depth analysis of the dataset to gain insights into the characteristics of the prostate cancer patients and the relationship between different features.
Data Preprocessing: Clean the data, handle missing values, and prepare the dataset for machine learning.
Machine Learning: Apply machine learning algorithms to build predictive models for cancer diagnosis or other relevant tasks.
Visualization: Create informative visualizations to present the findings of the data analysis and machine learning models.
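
The goals above can be sketched roughly as follows: a per-column missing-value summary for the EDA step, and a scikit-learn pipeline that imputes, scales, and encodes features before fitting a classifier. This is only an illustration under stated assumptions. The README does not say which target the notebook predicts or which model it uses, so the logistic regression, the function names, and the column choices in the usage comment are all hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def missing_value_summary(df):
    """Count and fraction of missing values per column, most incomplete first."""
    summary = pd.DataFrame(
        {"n_missing": df.isna().sum(), "frac_missing": df.isna().mean()}
    )
    return summary.sort_values("n_missing", ascending=False)


def build_classifier(numeric_cols, categorical_cols):
    """Preprocessing plus logistic regression (an assumed model choice)."""
    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),   # fill numeric gaps
        ("scale", StandardScaler()),                    # zero mean, unit variance
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    preprocess = ColumnTransformer([
        ("num", numeric, list(numeric_cols)),
        ("cat", categorical, list(categorical_cols)),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])


# Example usage (feature and target columns are assumptions; rows with a
# missing target would need to be dropped first):
#   X = df[["Diagnosis Age", "Preop PSA", "Stage"]]
#   y = df["BCR Status"]
#   model = build_classifier(["Diagnosis Age", "Preop PSA"], ["Stage"]).fit(X, y)
```

Keeping imputation and encoding inside the pipeline means they are fitted only on training data when the model is cross-validated, which avoids leaking information from held-out rows.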

Getting Started

To run the code in the Jupyter notebook, you will need the following libraries installed:

pandas
numpy
matplotlib
seaborn
scikit-learn

You can install these libraries using the following command in your Python environment:

pip install pandas numpy matplotlib seaborn scikit-learn
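
To confirm the installation worked, a small check like the one below can be run in the same environment. It is a sketch, not part of the project: `REQUIRED` and `check_installed` are illustrative names. Note that scikit-learn is installed under the name `scikit-learn` but imported as `sklearn`.

```python
import importlib

# Maps the pip package name to the module name used in `import` statements.
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def check_installed(packages=REQUIRED):
    """Return {pip_name: version string, or None if the import fails}."""
    status = {}
    for pip_name, module_name in packages.items():
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            status[pip_name] = None
        else:
            status[pip_name] = getattr(module, "__version__", "unknown")
    return status


# Example usage:
#   for name, version in check_installed().items():
#       print(name, version if version is not None else "NOT INSTALLED")
```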

Usage

Download the "prostate_dkfz_2018_clinical_data (2).csv" dataset and place it in the project folder.
Open the Jupyter notebook "Explore_Prostate_data.ipynb" using Jupyter Notebook or JupyterLab.
Follow the instructions in the notebook to execute the code cells and perform data analysis and machine learning tasks.
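
Before opening the notebook, it can help to verify that the dataset and notebook are in the expected place. The sketch below assumes both files sit directly in the project folder, as the steps above describe; `PROJECT_FILES` and `missing_files` are hypothetical helper names.

```python
from pathlib import Path

# File names as given in this README.
PROJECT_FILES = [
    "prostate_dkfz_2018_clinical_data (2).csv",
    "Explore_Prostate_data.ipynb",
]


def missing_files(folder="."):
    """Return the project files that are not present in the given folder."""
    root = Path(folder)
    return [name for name in PROJECT_FILES if not (root / name).is_file()]


# Example usage, run from the project folder:
#   print(missing_files())
```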

Note

This project is intended for educational and research purposes only. It is essential to consult medical professionals and domain experts for accurate medical diagnoses and decisions.

Feel free to explore, analyze, and extend the project, or contribute to it, to suit your specific requirements or research objectives.

For any questions or inquiries, please contact CHEPTOYEK BILL (@trojan__bill) on X (formerly Twitter).


BILL-CHEPTOYEK/MEDICAL-DIAGNOSIS-AI-PROSTATE-CANCER-
