
Explore survival patterns on the Titanic using logistic regression. This project includes visualizations, data preprocessing, and predictive modeling. Future work will enhance accuracy with Gradient Boosting. Dataset from Kaggle.


alizahir23/LogisticRegression_Titanic


Titanic Survival Analysis

Overview

This repository hosts a data science project analyzing the survival of passengers aboard the RMS Titanic. The analysis investigates which factors influenced survival and uses logistic regression to predict whether a passenger survived. The dataset is sourced from Kaggle.

Analyses Conducted

1. Missing Data Visualization

  • Objective: Identify missing data in the dataset.

  • Method: Used a heatmap to visualize missing values.

  • Plot:

    missing_data_heatmap.png
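The heatmap step can be sketched as follows. This is a minimal illustration, not the repository's code. It assumes the data is already loaded into a pandas DataFrame (in practice, from the Kaggle CSV). The `plot_missing_heatmap` helper and the small `demo` frame are hypothetical. Plotting `df.isnull()` marks every missing cell, and summing the same mask gives the per-column counts as numbers.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns


def plot_missing_heatmap(df: pd.DataFrame, path: str = "missing_data_heatmap.png") -> pd.Series:
    """Hypothetical helper: save a heatmap of missing cells, return missing counts per column."""
    mask = df.isnull()  # True where a value is missing
    fig, ax = plt.subplots(figsize=(8, 5))
    # A color bar adds little for a True/False mask, and row labels are just indices.
    sns.heatmap(mask, cbar=False, yticklabels=False, cmap="viridis", ax=ax)
    ax.set_title("Missing values by column")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
    return mask.sum()


# Made-up stand-in rows that use Kaggle-style column names.
demo = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
    "Fare": [7.25, 71.28, 7.93, 8.05],
})
missing_counts = plot_missing_heatmap(demo)
print(missing_counts)
```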

2. Survival Count by Gender

  • Objective: Explore the survival count based on gender.

  • Method: Generated a count plot comparing the number of survivors and non-survivors for each gender.

  • Plot:

    survival_by_gender.png
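Here is a minimal sketch of the gender count plot. It assumes a DataFrame with Kaggle's `Sex` and `Survived` columns, with `Survived` coded 0/1. The helper name and the six-row `demo` frame are made up for illustration. The `pd.crosstab` call returns the same counts that the bars show.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def plot_survival_by_gender(df: pd.DataFrame, path: str = "survival_by_gender.png") -> pd.DataFrame:
    """Hypothetical helper: save a count plot of Survived split by Sex, return the counts."""
    fig, ax = plt.subplots(figsize=(6, 4))
    # One group of bars per outcome (0 = died, 1 = survived), colored by gender.
    sns.countplot(data=df, x="Survived", hue="Sex", ax=ax)
    ax.set_xlabel("Survived (0 = no, 1 = yes)")
    ax.set_ylabel("Passengers")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)
    # Same numbers as the bars: rows are genders, columns are outcomes.
    return pd.crosstab(df["Sex"], df["Survived"])


demo = pd.DataFrame({
    "Sex": ["male", "female", "female", "male", "male", "female"],
    "Survived": [0, 1, 1, 0, 1, 0],
})
counts = plot_survival_by_gender(demo)
print(counts)
```

Passing `normalize="index"` to `pd.crosstab` converts each gender's row into proportions, which turns the counts into survival rates.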

3. Survival Count by Passenger Class

  • Objective: Analyze how passenger class affects survival rates.

  • Method: Generated a count plot showing survival counts across the three passenger classes.
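The three classes carried different numbers of passengers, so raw counts alone can hide the rate comparison that the objective asks about. A small hypothetical helper can tabulate the rate directly. It assumes Kaggle's `Pclass` and `Survived` columns, and the `demo` rows are invented.

```python
import pandas as pd


def survival_rate_by_class(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: passengers, survivors and survival rate for each Pclass."""
    # Survived is 0/1, so its sum counts survivors and its mean is the survival rate.
    return df.groupby("Pclass")["Survived"].agg(passengers="count", survived="sum", rate="mean")


demo = pd.DataFrame({
    "Pclass": [1, 1, 2, 2, 3, 3, 3, 3],
    "Survived": [1, 1, 1, 0, 0, 0, 1, 0],
})
summary = survival_rate_by_class(demo)
print(summary)
```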

4. Age Distribution Among Passengers

  • Objective: Observe the age distribution of the passengers.

  • Method: Created a distribution plot for the age variable.

5. Logistic Regression Model

  • Objective: Predict survival from variables such as age, sex, and passenger class.

  • Method: A logistic regression model was implemented and trained on the preprocessed data.

  • Results: The model achieved an accuracy score of 0.797752808988764 (about 79.8%). Its predictions are further evaluated with a confusion matrix.

  • Improvement: I plan to return to this project and implement a Gradient Boosting model. Gradient Boosting is an ensemble technique that trains a sequence of weak learners, typically shallow decision trees. Each learner is fitted to correct the errors of the ensemble built so far. Combining them into one strong model may improve accuracy over logistic regression.
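The modelling step and the planned Gradient Boosting follow-up can both be sketched with scikit-learn pipelines. Gradient Boosting is shown here via scikit-learn's `GradientBoostingClassifier`. This is an illustrative sketch, not the repository's code. The following choices are all assumptions:

- the feature list
- the median and most-frequent imputation
- the one-hot encoding
- the stratified 80/20 split
- the hyperparameters

`make_synthetic_titanic` is a hypothetical generator of fake rows that use Kaggle's column names, so the example runs without the CSV. Its accuracies therefore say nothing about the result reported above.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Assumed feature set, using Kaggle's column names.
NUMERIC = ["Age", "Fare", "SibSp", "Parch"]
CATEGORICAL = ["Pclass", "Sex", "Embarked"]


def make_synthetic_titanic(n: int = 600, seed: int = 0) -> pd.DataFrame:
    """Hypothetical stand-in for the Kaggle training CSV: fake rows, real column names."""
    rng = np.random.default_rng(seed)
    df = pd.DataFrame({
        "Pclass": rng.choice([1, 2, 3], size=n, p=[0.25, 0.2, 0.55]),
        "Sex": rng.choice(["male", "female"], size=n, p=[0.65, 0.35]),
        "Age": rng.normal(30, 14, size=n).clip(0.5, 80),
        "SibSp": rng.poisson(0.5, size=n),
        "Parch": rng.poisson(0.4, size=n),
        "Fare": rng.gamma(2.0, 16.0, size=n),
        "Embarked": rng.choice(["S", "C", "Q"], size=n, p=[0.72, 0.19, 0.09]),
    })
    # Invented rule: women and higher classes get better survival odds.
    logit = (-0.5 + 2.5 * (df["Sex"] == "female") - 0.9 * (df["Pclass"] - 2)).to_numpy()
    df["Survived"] = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
    df.loc[rng.random(n) < 0.2, "Age"] = np.nan  # mimic missing ages
    return df


def build_preprocessor() -> ColumnTransformer:
    """Impute, then scale numeric columns and one-hot encode categorical ones."""
    numeric = make_pipeline(SimpleImputer(strategy="median"), StandardScaler())
    categorical = make_pipeline(
        SimpleImputer(strategy="most_frequent"), OneHotEncoder(handle_unknown="ignore")
    )
    return ColumnTransformer([("num", numeric, NUMERIC), ("cat", categorical, CATEGORICAL)])


def evaluate(df: pd.DataFrame, seed: int = 42) -> dict:
    """Fit both models on the same stratified 80/20 split; report accuracy and confusion matrix."""
    X, y = df[NUMERIC + CATEGORICAL], df["Survived"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        pipe = Pipeline([("prep", build_preprocessor()), ("model", model)])
        pipe.fit(X_train, y_train)
        pred = pipe.predict(X_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            # Rows are true labels (0, 1); columns are predicted labels (0, 1).
            "confusion_matrix": confusion_matrix(y_test, pred),
        }
    return results


results = evaluate(make_synthetic_titanic())
for name, res in results.items():
    print(f"{name}: accuracy {res['accuracy']:.3f}")
    print(res["confusion_matrix"])
```

With the real data, the `make_synthetic_titanic()` call would be replaced by a DataFrame read from the Kaggle training file with `pd.read_csv`. In either case, accuracy equals the confusion matrix's diagonal divided by its total, so the two reported numbers can be cross-checked.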

Installation

Ensure you have Python installed, then set up a virtual environment and install the required packages:

pip install pandas numpy matplotlib seaborn scikit-learn
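After installing, you can confirm the environment by importing each package and reporting its version. This is a hypothetical helper, not part of the repository. Note that the pip name `scikit-learn` differs from its import name `sklearn`.

```python
import importlib
import importlib.metadata

# pip package name -> module name used in `import` (scikit-learn imports as sklearn).
REQUIRED = {
    "pandas": "pandas",
    "numpy": "numpy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
}


def check_environment() -> dict:
    """Map each required pip package to its installed version, or None if it won't import."""
    report = {}
    for pip_name, module_name in REQUIRED.items():
        try:
            importlib.import_module(module_name)
        except ImportError:
            report[pip_name] = None
            continue
        try:
            report[pip_name] = importlib.metadata.version(pip_name)
        except importlib.metadata.PackageNotFoundError:
            report[pip_name] = "unknown"  # importable, but no pip metadata found
    return report


for package, version in check_environment().items():
    print(f"{package}: {version or 'MISSING'}")
```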
