This project develops a machine learning model that classifies a tumor as benign or malignant based on features extracted from breast cancer cell nuclei. The dataset is the Breast Cancer Wisconsin (Diagnostic) Dataset, loaded from the `sklearn.datasets` module. Classification is performed with the K-Nearest Neighbors (KNN) algorithm.
The Breast Cancer Wisconsin (Diagnostic) Dataset contains the following information:
- Number of Instances: 569
- Number of Features: 30 (plus a target variable)
- Feature Information:
  - Mean radius
  - Mean texture
  - Mean perimeter
  - Mean area
  - Mean smoothness
  - And others related to the cell nuclei's properties
- Target Variable:
  - 0: Malignant
  - 1: Benign
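
The figures listed above can be checked directly against the dataset object. The following is a minimal sketch, assuming a standard scikit-learn installation; the variable names (`data`, `class_counts`) are illustrative and not taken from the notebook.

```python
from collections import Counter

from sklearn.datasets import load_breast_cancer

# Load the dataset as a Bunch, a dict-like container with named fields.
data = load_breast_cancer()
X, y = data.data, data.target

# Number of instances and number of features.
print("Samples, features:", X.shape)

# The first few feature names, e.g. mean radius, mean texture, ...
print("First five features:", [str(name) for name in data.feature_names[:5]])

# target_names is ordered by label, so index 0 is malignant and 1 is benign.
print("Target names:", [str(name) for name in data.target_names])

# Count the samples in each class, keyed by class name
# (illustrative helper variable, not part of the original notebook).
class_counts = Counter(str(data.target_names[label]) for label in y)
print("Class counts:", dict(class_counts))
```

The class counts show that the dataset is moderately imbalanced toward benign samples, which is worth keeping in mind when reading a plain accuracy score.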
- Load Dataset: The dataset is loaded using `sklearn.datasets.load_breast_cancer()`.
- Data Preprocessing: The dataset is split into training and test sets using `train_test_split`.
- Model Building: A K-Nearest Neighbors (KNN) model is built using the `KNeighborsClassifier` from `sklearn.neighbors`.
- Model Evaluation: The model's performance is evaluated using accuracy scores and visualized using matplotlib plots.
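
The four steps above can be sketched end to end as follows. This is an illustration rather than the notebook's actual code: the split ratio (`test_size=0.25`), the use of `stratify`, the `random_state`, and the range of neighbor counts are all assumptions chosen for reproducibility.

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# 1. Load the dataset as a feature matrix X and a label vector y.
X, y = load_breast_cancer(return_X_y=True)

# 2. Split into training and test sets.
#    test_size, stratify, and random_state are assumed values, not the notebook's.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

# 3. Build one KNN model per candidate neighbor count (assumed range 1..10).
# 4. Record the accuracy of each model on the training and test sets.
neighbor_counts = list(range(1, 11))
train_accuracy = []
test_accuracy = []
for k in neighbor_counts:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    train_accuracy.append(accuracy_score(y_train, knn.predict(X_train)))
    test_accuracy.append(accuracy_score(y_test, knn.predict(X_test)))

for k, acc in zip(neighbor_counts, test_accuracy):
    print(f"n_neighbors={k:2d}  test accuracy={acc:.3f}")

# Visualize how accuracy changes with the number of neighbors.
fig, ax = plt.subplots()
ax.plot(neighbor_counts, train_accuracy, marker="o", label="Training accuracy")
ax.plot(neighbor_counts, test_accuracy, marker="o", label="Test accuracy")
ax.set_xlabel("n_neighbors")
ax.set_ylabel("Accuracy")
ax.legend()
```

In a notebook the figure renders inline; in a plain script, call `plt.show()` afterwards. The features are left unscaled here because the steps above do not mention scaling, but KNN is distance-based, so standardizing the features (for example with `StandardScaler`) may change the results.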
To run this project, you'll need to install the following Python packages:
- scikit-learn
- matplotlib
- pandas (optional for additional data handling)

You can install them using pip: `pip install scikit-learn matplotlib pandas`
- Clone the repository or download the notebook file.
- Ensure the necessary libraries are installed.
- Open the notebook file (`Breast_Cancer_Classifier.ipynb`) and run the cells step by step.
This project demonstrates a simple approach to classifying breast tumors as benign or malignant using the K-Nearest Neighbors algorithm, and illustrates how machine learning techniques can aid in medical diagnosis tasks.