UfukTanriverdi8/Mushroom-Classification


Mushroom Classification Project

This project aims to classify mushrooms as either edible or poisonous based on various features using different machine learning algorithms and a simple neural network.

You can also access the code through Google Colab or Kaggle.

Dataset

The dataset used in this project originally comes from the UCI Machine Learning Repository, but I use a cleaned version of it that is available on Kaggle. The dataset consists of 61069 hypothetical mushrooms with caps, based on 173 species (353 mushrooms per species). Each mushroom is labeled as definitely edible or definitely poisonous. Using this dataset, we try to predict from its features whether a given mushroom is edible.

Features

  • cap-diameter: Diameter of the mushroom cap in mm.
  • cap-shape: Shape of the mushroom cap (encoded).
  • gill-attachment: Type of gill attachment (encoded).
  • gill-color: Color of the gill (encoded).
  • stem-height: Height of the mushroom stem in cm.
  • stem-width: Width of the mushroom stem in mm.
  • stem-color: Color of the mushroom stem (encoded).
  • season: Season the mushroom was found (encoded).

Target Variable

  • class: Edibility of the mushroom (0 = poisonous, 1 = edible).
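A minimal sketch of how the feature columns and the target listed above could be separated with pandas. The column names come from the lists above. The helper `split_features_target` is a hypothetical name, not taken from the notebook, and the small in-memory frame holds made-up values standing in for mushrooms_cleaned.csv.

```python
import pandas as pd

# Column names as listed in this README.
FEATURES = [
    "cap-diameter", "cap-shape", "gill-attachment", "gill-color",
    "stem-height", "stem-width", "stem-color", "season",
]
TARGET = "class"


def split_features_target(df):
    """Return (X, y); raise KeyError if an expected column is absent."""
    missing = [c for c in FEATURES + [TARGET] if c not in df.columns]
    if missing:
        raise KeyError(f"missing columns: {missing}")
    return df[FEATURES], df[TARGET]


# Made-up rows for illustration only. In the project the frame would
# presumably come from pd.read_csv("mushrooms_cleaned.csv").
demo = pd.DataFrame({
    "cap-diameter": [120, 85, 300],
    "cap-shape": [2, 6, 2],
    "gill-attachment": [0, 1, 3],
    "gill-color": [10, 5, 7],
    "stem-height": [3.5, 1.2, 0.8],
    "stem-width": [150, 90, 400],
    "stem-color": [11, 6, 12],
    "season": [1, 0, 2],
    "class": [0, 1, 0],
})

X, y = split_features_target(demo)
print(X.shape, y.tolist())
```

Checking for missing columns up front makes a mismatch between the CSV and the feature list fail early instead of surfacing later during training.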

Models Used

The project evaluates the following machine learning models:

  • Logistic Regression
  • Random Forest Classifier
  • Support Vector Machine (SVM)
  • K-Nearest Neighbors (KNN)
  • Gaussian Naive Bayes
  • Decision Tree Classifier
  • XGBoost Classifier
  • A simple neural network model
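The comparison of several classifiers can be sketched as a loop over a dictionary of scikit-learn estimators. This is an illustration under assumptions, not the notebook's code: it uses synthetic data from `make_classification` instead of the mushroom dataset, default hyperparameters, and it leaves out XGBoost and the neural network to keep the dependencies to scikit-learn only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in with 8 features, mirroring the feature count above.
X, y = make_classification(
    n_samples=400, n_features=8, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(random_state=0),
    "SVM": SVC(),
    "KNN": KNeighborsClassifier(),
    "Gaussian Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Fit each model on the training split and score it on the held-out split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Keeping the estimators in one dictionary means adding another model, such as an XGBoost classifier, only requires one more entry.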

Files

  • mushrooms_cleaned.csv: Preprocessed dataset after cleaning and feature engineering.
  • mushroom_classification.ipynb: Jupyter Notebook containing the code for data loading, preprocessing, model training, evaluation, and comparison.
  • README.md: This file, providing an overview of the project.
  • output.png: Graph comparing the performance of the models.

Requirements

The project requires the following Python libraries:

  • NumPy
  • Pandas
  • Matplotlib
  • Scikit-learn
  • TensorFlow/Keras
  • XGBoost

You can install the required libraries using the following command:

pip install numpy pandas matplotlib scikit-learn tensorflow xgboost

Usage

  1. Clone the repository:

    git clone https://github.com/UfukTanriverdi8/mushroom-classification.git
    cd mushroom-classification
    

    Run the Jupyter Notebook mushroom_classification.ipynb:

    jupyter notebook mushroom_classification.ipynb
    

    OR

    Just run the notebook from the Colab link.

  2. Follow the instructions in the notebook to execute each cell and run the project.

Results

  • The project evaluates each model's performance using accuracy, precision, recall, and F1-score metrics.
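For reference, the four metrics named above can be computed from the confusion-matrix counts of a binary classifier. The sketch below is a plain-Python illustration with made-up labels. The function name `binary_metrics` is hypothetical, and the notebook may well rely on scikit-learn's metric functions instead. It treats class 1 (edible) as the positive class by default.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from label sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 2 true positives, 1 false positive,
# 1 false negative, 2 true negatives.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Passing `positive=0` instead reports precision and recall for the poisonous class, which may be the more relevant view when missing a poisonous mushroom is the costlier error.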

A comparison of the models

  • Logistic Regression and Gaussian Naive Bayes models performed poorly compared to the other models.
  • Except for these two models, the others achieved high metrics overall. The Random Forest Classifier was the best-performing model, but K-Neighbors Classifier, XGB Classifier, Decision Tree Classifier, and the neural network model also had metrics close to those of the Random Forest Classifier.
  • The validation and test metrics are very close to each other, which suggests that the models generalize well and did not noticeably overfit.
  • With some hyperparameter tuning, the performance of the two underperforming models (Logistic Regression and Gaussian Naive Bayes) could probably be improved.
  • Based on performance, Random Forest Classifier, Decision Tree Classifier, and XGB Classifier are highly recommended for this dataset. The neural network model also shows promise and may improve with further hyperparameter tuning.
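One way the tuning mentioned above could be approached is a cross-validated grid search. The sketch below is an assumption-laden example rather than something taken from the notebook: it tunes only the regularization strength `C` of a logistic regression, uses synthetic data, and the grid values and the F1 scoring choice are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the mushroom features.
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=5, random_state=0
)

# Scaling sits inside the pipeline so each CV fold is scaled using
# statistics from its own training portion only.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Illustrative grid; the values are assumptions, not tuned recommendations.
param_grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X, y)

best_c = search.best_params_["logisticregression__C"]
print(best_c, round(search.best_score_, 3))
```

The same pattern applies to the other models by swapping the estimator and the parameter grid.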
