Scree Plot for Principal Component Analysis #804

Closed
pswaldia opened this issue Apr 1, 2019 · 6 comments

Comments

@pswaldia
Contributor

pswaldia commented Apr 1, 2019

Describe the solution you'd like
I would love to have a plot that shows the amount of variance explained by the principal components. It can help reduce the dimensionality of the features to the dimensions that explain the most variance in the dataset. It would also help find the optimal number of principal components without extensive searching.

Examples

Screenshot 2019-04-01 at 21 09 36

This plot shows that around 350 components, out of 784 total features, explain about 95% of the variance.
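
The number read off the plot can also be computed directly from the explained variance ratios. The following is a minimal sketch, not part of yellowbrick: the helper name `components_for_variance` and the example ratios are made up for illustration, and the input is assumed to look like scikit-learn's `explained_variance_ratio_` (sorted in decreasing order).

```python
import numpy as np


def components_for_variance(ratios, threshold=0.95):
    """Return the smallest number of components whose cumulative
    explained variance ratio reaches `threshold`.

    `ratios` is assumed to be a 1-D array of per-component explained
    variance ratios, e.g. PCA.explained_variance_ratio_ from scikit-learn.
    """
    cumulative = np.cumsum(ratios)
    reached = cumulative >= threshold
    if not reached.any():
        raise ValueError("threshold is never reached by the given ratios")
    # argmax returns the index of the first True; add 1 to get a count
    return int(np.argmax(reached)) + 1


# Hypothetical ratios, for illustration only
example_ratios = np.array([0.6, 0.25, 0.1, 0.05])
n_needed = components_for_variance(example_ratios, threshold=0.9)
print(n_needed)
```

With a fitted `PCA` object, the same helper could be called as `components_for_variance(pca.explained_variance_ratio_)`.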

@lwgray
Contributor

lwgray commented Apr 1, 2019

@pswaldia Awesome suggestion! We are catching up on issues and PRs since we just returned from a short hiatus. We will respond properly asap.

@lschumm

lschumm commented May 7, 2019

I've done a mockup of some code to make this style of Scree plot:

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.decomposition import PCA

df = pd.read_csv("yellowbrick/examples/data/concrete/concrete.csv")

# Fit PCA with one component per column of the dataset
pca = PCA(n_components=len(df.columns))
pca.fit(df)

# Plot cumulative explained variance against the number of components
n_components = np.arange(1, len(df.columns) + 1)
plt.plot(n_components, np.cumsum(pca.explained_variance_ratio_))
plt.xlabel('number of components')
plt.ylabel('cumulative explained variance')
plt.xticks(n_components)
plt.show()

An example Scree plot
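
As a complement to reading the threshold off the plot, scikit-learn's `PCA` accepts a float `n_components` between 0 and 1 (with `svd_solver="full"`) and keeps just enough components to explain more than that fraction of the variance. A minimal sketch on synthetic data follows; the data and the per-feature scales are invented for illustration and are not the concrete dataset.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data (an assumption for this sketch): 200 samples, 10 features
# whose scales decay, so a few directions carry most of the variance.
rng = np.random.default_rng(0)
scales = np.array([10, 5, 2, 1, 0.5, 0.2, 0.1, 0.1, 0.1, 0.1])
X = rng.normal(size=(200, 10)) * scales

# A float n_components asks PCA to keep enough components to explain
# more than that fraction of the total variance.
pca = PCA(n_components=0.95, svd_solver="full")
pca.fit(X)

print("components kept:", pca.n_components_)
print("variance explained:", pca.explained_variance_ratio_.sum())
```

This picks the count automatically but hides the shape of the curve, so the plot remains useful for judging how sharp the cutoff is.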

@pdamodaran
Contributor

@lschumm - fyi - we have a visualizer to do this - it can be found here:

https://github.com/DistrictDataLabs/yellowbrick/blob/develop/yellowbrick/features/decomposition.py

@pswaldia
Contributor Author

pswaldia commented Jun 10, 2019

As pointed out by @pdamodaran, there's already a scree plot visualizer present in yellowbrick for this purpose. I am closing this issue for now.

@BradKML

BradKML commented Mar 12, 2022

Is there an elbow-like tool for determining the optimal number of PCA components?

@bbengfort
Member

@BrandonKMLee I had been working on an explained variance plot for this - the WIP PR is here: #1037 -- it's been a while since I've taken a look at it, but if you want to help get it across the line or just use the code from that PR, I think that's what you're looking for.
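
For the elbow-style question above, one simple heuristic is to pick the point on the cumulative explained variance curve that lies farthest above the straight line joining its first and last points. The sketch below is an illustration of that heuristic only; it is not the code from the PR mentioned above, and the helper name `elbow_component` and the example ratios are hypothetical.

```python
import numpy as np


def elbow_component(ratios):
    """Return the component count at the elbow of the cumulative
    explained variance curve (max distance above the chord).

    Assumes at least two components and a curve that is not flat.
    """
    cumulative = np.cumsum(ratios)
    k = np.arange(1, len(cumulative) + 1)
    # Rescale both axes to [0, 1] so the chord becomes the line y = x
    x = (k - k[0]) / (k[-1] - k[0])
    y = (cumulative - cumulative[0]) / (cumulative[-1] - cumulative[0])
    # The elbow is where the curve rises farthest above the chord
    return int(k[np.argmax(y - x)])


# Hypothetical ratios, for illustration only
example_ratios = np.array([0.6, 0.25, 0.1, 0.05])
elbow = elbow_component(example_ratios)
print(elbow)
```

This is a rough rule of thumb; on curves without a clear bend the chosen point can be fairly arbitrary, so it is best checked against the plot.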
