This project performs data analysis and applies a few machine learning techniques to the "prostate_dkfz_2018_clinical_data (2).csv" dataset. The dataset contains clinical data on prostate cancer patients, with features such as diagnosis age, cancer type, treatment information, and mutation count.
The dataset "prostate_dkfz_2018_clinical_data (2).csv" consists of the following columns:
- Study ID
- Patient ID
- Sample ID
- Diagnosis Age
- Age Group at Diagnosis in Years
- BCR Status
- Cancer Type
- Cancer Type Detailed
- Clonality
- ETS Status
- Radical Prostatectomy Gleason Score for Prostate Cancer
- Initial Treatment
- Localized Tumor
- Median Purity
- Mono or Multifocal Status
- Mutation Count
- Oncotree Code
- Preop PSA
- Sample Class
- Number of Samples Per Patient
- Sex
- Somatic Status
- Source
- Stage
- Time from Surgery to BCR/Last Follow Up
- TMB (nonsynonymous)
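As a quick sanity check after loading, the column list above can be compared with what pandas actually reads. This is a minimal sketch, assuming the file is comma-delimited, sits in the working directory under the name given above, and has a header row that uses exactly these column names. `EXPECTED_COLUMNS`, `CSV_PATH`, and `missing_columns` are illustrative names, not taken from the notebook.

```python
from pathlib import Path

import pandas as pd

# Column names copied from the list above (assumed to match the CSV header exactly).
EXPECTED_COLUMNS = [
    "Study ID", "Patient ID", "Sample ID", "Diagnosis Age",
    "Age Group at Diagnosis in Years", "BCR Status", "Cancer Type",
    "Cancer Type Detailed", "Clonality", "ETS Status",
    "Radical Prostatectomy Gleason Score for Prostate Cancer",
    "Initial Treatment", "Localized Tumor", "Median Purity",
    "Mono or Multifocal Status", "Mutation Count", "Oncotree Code",
    "Preop PSA", "Sample Class", "Number of Samples Per Patient", "Sex",
    "Somatic Status", "Source", "Stage",
    "Time from Surgery to BCR/Last Follow Up", "TMB (nonsynonymous)",
]

# Assumed location: the project folder is the current working directory.
CSV_PATH = Path("prostate_dkfz_2018_clinical_data (2).csv")


def missing_columns(df, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from df, in list order."""
    present = set(df.columns)
    return [name for name in expected if name not in present]


if CSV_PATH.exists():  # runs only once the dataset has been downloaded
    df = pd.read_csv(CSV_PATH)  # assumes a comma-delimited file
    print(df.shape)
    print("Missing columns:", missing_columns(df))
```

If the printed list is not empty, the header probably differs from the names above (for example in spacing or capitalization), and later column lookups would need to be adjusted.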
The project includes the following files:
- prostate_dkfz_2018_clinical_data (2).csv: The dataset containing the clinical data for prostate cancer patients.
- Explore_Prostate_data.ipynb: A Jupyter notebook that contains the code for data analysis, visualization, and machine learning tasks.
- README.md: This file, providing an overview of the project, dataset, and files.
The main objectives of this project are:
- Exploratory Data Analysis (EDA): Perform an in-depth analysis of the dataset to gain insights into the characteristics of the prostate cancer patients and the relationship between different features.
- Data Preprocessing: Clean the data, handle missing values, and prepare the dataset for machine learning.
- Machine Learning: Apply machine learning algorithms to build predictive models for cancer diagnosis or other relevant tasks.
- Visualization: Create informative visualizations to present the findings of the data analysis and machine learning models.
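The preprocessing and machine learning objectives above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the notebook's actual method: the target ("BCR Status") and the three numeric features are choices made here for the example, and `prepare`, `build_model`, and `evaluate` are hypothetical helper names. The scikit-learn pieces used (`SimpleImputer`, `StandardScaler`, `LogisticRegression`, `cross_val_score`) are standard.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed choices for illustration only; the notebook may use others.
FEATURES = ["Diagnosis Age", "Preop PSA", "Mutation Count"]
TARGET = "BCR Status"


def prepare(df):
    """Keep rows with a known target and coerce the features to numbers."""
    data = df.dropna(subset=[TARGET])
    # Non-numeric entries become NaN and are imputed later in the pipeline.
    X = data[FEATURES].apply(pd.to_numeric, errors="coerce")
    y = data[TARGET].astype(str)
    return X, y


def build_model():
    """Median-impute missing values, standardize, then logistic regression."""
    return make_pipeline(
        SimpleImputer(strategy="median"),
        StandardScaler(),
        LogisticRegression(max_iter=1000),
    )


def evaluate(df, folds=5):
    """Mean cross-validated accuracy of the sketch model on df."""
    X, y = prepare(df)
    return cross_val_score(build_model(), X, y, cv=folds).mean()
```

Keeping imputation and scaling inside the pipeline means they are fitted on the training folds only, which avoids leaking information from the held-out fold into the preprocessing step.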
To run the code in the Jupyter notebook, you will need the following libraries installed:
- pandas
- numpy
- matplotlib
- seaborn
- scikit-learn
You can install these libraries using the following command in your Python environment:
pip install pandas numpy matplotlib seaborn scikit-learn
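To confirm the installation worked before opening the notebook, a small check like the one below can help. It is a sketch using only the standard library; `REQUIRED_MODULES` and `missing_modules` are illustrative names. Note that scikit-learn is installed as `scikit-learn` but imported as `sklearn`.

```python
import importlib.util

# Import names for the libraries listed above; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = ["pandas", "numpy", "matplotlib", "seaborn", "sklearn"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing:", missing)
    else:
        print("All required libraries found.")
```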
- Download the "prostate_dkfz_2018_clinical_data (2).csv" dataset and place it in the project folder.
- Open the Jupyter notebook "Explore_Prostate_data.ipynb" using Jupyter Notebook or JupyterLab.
- Follow the instructions in the notebook to execute the code cells and perform data analysis and machine learning tasks.
This project is intended for educational and research purposes only. It is essential to consult medical professionals and domain experts for accurate medical diagnoses and decisions.
Feel free to explore, analyze, contribute to, and enhance the project to suit your specific requirements or research objectives.
For any questions or inquiries, please contact CHEPTOYEK BILL at @trojan__bill on X (formerly Twitter).