`SilhouetteVisualizer` add support for more estimators #1294

stergion · 2023-01-19T00:53:13Z

This PR closes #1182.

It makes three changes:

- adds support for clustering estimators that do not implement a `predict()` method;
- for estimators without an `n_clusters` attribute, infers the number of clusters from the labels;
- uses the estimator's `metric` or `affinity` as the silhouette metric.

I decided not to implement special handling for estimators that produce outlier values, e.g. DBSCAN,
as sklearn doesn't do so either in its examples.
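
As a rough sketch of the cluster-count inference, and of what skipping outlier handling implies: the helper name `infer_n_clusters` and the stand-in classes below are hypothetical and only illustrate the idea; they are not the code in this PR.

```python
def infer_n_clusters(estimator, labels):
    """Hypothetical helper: return the number of clusters for an estimator.

    Prefers the estimator's own ``n_clusters`` attribute and otherwise
    counts the distinct labels.
    """
    n_clusters = getattr(estimator, "n_clusters", None)
    if n_clusters is not None:
        return n_clusters
    # No special handling for outliers: a noise label such as DBSCAN's -1
    # is counted like any other cluster.
    return len(set(labels))


class FakeKMeans:
    """Stand-in for an estimator that exposes n_clusters."""
    n_clusters = 3


class FakeDBSCAN:
    """Stand-in for an estimator without an n_clusters attribute."""


print(infer_n_clusters(FakeKMeans(), [0, 1, 2, 1]))
print(infer_n_clusters(FakeDBSCAN(), [0, 0, 1, -1]))
```

In the second call the noise label `-1` is counted as its own cluster, which is the consequence of not treating outliers specially.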

I wasn't sure whether to use the estimator's `metric` attribute for the silhouette metric
or to add a `metric` parameter to the `SilhouetteVisualizer` constructor. I chose the first option because it didn't
alter the class signature and seemed like the safer option, although I do believe the second option is the better one.

codecov · 2023-01-19T01:04:20Z

Codecov Report

Merging #1294 (e50a829) into develop (7a3c94c) will decrease coverage by 0.19%.
The diff coverage is 58.62%.

@@             Coverage Diff             @@
##           develop    #1294      +/-   ##
===========================================
- Coverage    90.89%   90.70%   -0.19%     
===========================================
  Files           93       93              
  Lines         5303     5327      +24     
===========================================
+ Hits          4820     4832      +12     
- Misses         483      495      +12

Impacted Files	Coverage Δ
yellowbrick/cluster/silhouette.py	`85.22% <58.62%> (-13.22%)`	⬇️

bbengfort · 2023-02-25T18:10:52Z

@stergion thank you so much for your interest in Yellowbrick and for opening this PR; we really appreciate all contributions to Yellowbrick. We'll find a reviewer for this PR as soon as possible so that we can include it in our next release.

lwgray

This is a great PR. I like how you simply added support for more estimators. The only thing left is to write some tests. Let me know if you need help with that. After that I will approve.

lwgray · 2023-02-26T21:34:03Z

yellowbrick/cluster/silhouette.py

+        if check_fitted(self.estimator, is_fitted_by=self.is_fitted) and hasattr(self.estimator, "predict"):
+            labels = self.estimator.predict(X)
+        else:  # if estimator is NOT fitted, OR estimator does NOT implement predict()
+            labels = self.estimator.fit_predict(X, y, **kwargs)


Great 👍 way to cover fit_predict here.
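
For readers skimming the thread, the branch above can be sketched in isolation. This is a minimal illustration under stated assumptions: the `fitted` flag stands in for Yellowbrick's `check_fitted` call, and `resolve_labels` and the two stub classes are hypothetical names, not part of the library.

```python
def resolve_labels(estimator, X, fitted):
    """Hypothetical helper mirroring the predict / fit_predict branch."""
    if fitted and hasattr(estimator, "predict"):
        # Already fitted and able to predict: reuse the fitted model.
        return estimator.predict(X)
    # Not fitted, or no predict() method: fit and label in one step.
    return estimator.fit_predict(X)


class WithPredict:
    """Stub estimator that implements both methods (KMeans-like)."""

    def predict(self, X):
        return [0 for _ in X]

    def fit_predict(self, X):
        return [1 for _ in X]


class WithoutPredict:
    """Stub estimator that only implements fit_predict (DBSCAN-like)."""

    def fit_predict(self, X):
        return [1 for _ in X]


X = [[0.0], [1.0]]
print(resolve_labels(WithPredict(), X, fitted=True))
print(resolve_labels(WithoutPredict(), X, fitted=True))
```

Note that an estimator with neither method, as reported later for FeatureAgglomeration, would still raise an `AttributeError` in the fallback branch.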

stergion · 2023-04-19T12:43:58Z

@lwgray Thanks for the comments and the review. Sorry, I took so long to reply.

If you could help me with the tests, it would be great, since I don't have any experience.

lwgray · 2023-06-25T23:40:26Z

I went back and reviewed this again and realized that all but two clustering algorithms work immediately with this fix. I am unsure why I get the ValueError with SpectralClustering. FeatureAgglomeration raises an AttributeError because it doesn't have a fit_predict method. See the table below:

| Estimator | Works | Error |
| --- | --- | --- |
| `<class 'sklearn.cluster._kmeans.KMeans'>` | Yes | |
| `<class 'sklearn.cluster._kmeans.MiniBatchKMeans'>` | Yes | |
| `<class 'sklearn.cluster._affinity_propagation.AffinityPropagation'>` | Yes | |
| `<class 'sklearn.cluster._mean_shift.MeanShift'>` | Yes | |
| `<class 'sklearn.cluster._spectral.SpectralClustering'>` | NO | `ValueError: Unknown metric rbf. Valid metrics are ['euclidean', 'l2', 'l1', 'manhattan', 'cityblock', 'braycurtis', 'canberra', 'chebyshev', 'correlation', 'cosine', 'dice', 'hamming', 'jaccard', 'kulsinski', 'mahalanobis', 'matching', 'minkowski', 'rogerstanimoto', 'russellrao', 'seuclidean', 'sokalmichener', 'sokalsneath', 'sqeuclidean', 'yule', 'wminkowski', 'nan_euclidean', 'haversine'], or 'precomputed', or a callable` |
| `<class 'sklearn.cluster._dbscan.DBSCAN'>` | Yes | |
| `<class 'sklearn.cluster._optics.OPTICS'>` | Yes | |
| `<class 'sklearn.cluster._agglomerative.AgglomerativeClustering'>` | Yes | |
| `<class 'sklearn.cluster._birch.Birch'>` | Yes | |
| `<class 'sklearn.cluster._agglomerative.FeatureAgglomeration'>` | NO | `AttributeError` (does not have a fit_predict method) |

lwgray · 2023-06-25T23:43:19Z

A couple of questions:

1. What do we do about tests? Do we write an individual test for each clustering algorithm?
2. After the tests have been dealt with, can we write up issues for the two remaining clustering algorithms that aren't fixed by this PR? I suggest we merge this PR after the tests are added.

bbengfort

@lwgray I think it is simple enough to add a test for all the clusterers using pytest's `parametrize` -- we do that for a lot of tests, and I agree it should be part of this PR.

We do need to fix SpectralClustering since the bug was introduced in this PR. FeatureAgglomeration is a transformer, not a model -- so I think we can omit that from the tests.

bbengfort · 2023-06-26T12:40:15Z

yellowbrick/cluster/silhouette.py

+        elif hasattr(self.estimator, "affinity"):
+            metric = self.estimator.affinity


@lwgray this is where the error is occurring for SpectralClustering - SpectralClustering does have an attribute affinity which is used to compute the adjacency matrix between instances. For spectral clustering the attribute affinity is not a distance metric and defaults to "rbf" -- which is why that test is failing.

@stergion what model prompted you to add this metric selector?

It was for AffinityPropagation, AgglomerativeClustering, and FeatureAgglomeration.

Since sklearn version 1.2, affinity is deprecated in AgglomerativeClustering and FeatureAgglomeration
and metric is used instead, like the other clustering algorithms.

AffinityPropagation was not updated; it still uses affinity. Note, though, that AffinityPropagation uses the negative
squared Euclidean distance between points when affinity='euclidean'.
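
One possible way to guard the selector against kernel names such as "rbf" is sketched below. This is only an assumption about how such a check could look, not the condition that was eventually committed: the helper `resolve_silhouette_metric`, the `DISTANCE_NAMES` subset, and the stub classes are all hypothetical.

```python
# A small subset of the distance names listed in the ValueError reported above.
DISTANCE_NAMES = {"euclidean", "l1", "l2", "manhattan", "cityblock", "cosine"}


def resolve_silhouette_metric(estimator, default="euclidean"):
    """Hypothetical helper: choose a metric name for the silhouette score."""
    metric = getattr(estimator, "metric", None)
    if metric is not None:
        return metric
    affinity = getattr(estimator, "affinity", None)
    # Only trust affinity when it names a known distance; SpectralClustering's
    # default "rbf" is a kernel, so fall back to the default instead.
    if isinstance(affinity, str) and affinity in DISTANCE_NAMES:
        return affinity
    return default


class FakeAgglomerative:
    metric = "manhattan"


class FakeAffinityPropagation:
    affinity = "euclidean"


class FakeSpectral:
    affinity = "rbf"


for est in (FakeAgglomerative(), FakeAffinityPropagation(), FakeSpectral()):
    print(type(est).__name__, resolve_silhouette_metric(est))
```

With this guard the spectral stand-in falls back to "euclidean" instead of passing "rbf" on to the silhouette functions.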

lwgray · 2023-06-30T19:50:26Z

@stergion @bbengfort Can you review my changes?

bbengfort

@lwgray thank you for making those changes! I made a few changes of my own to try to help with the CI tests and linting. Would you and @stergion please review those changes?

bbengfort · 2023-07-04T20:43:21Z

@lwgray RE: the conda tests; in a separate PR we're going to have to update our Python versions for testing. The Conda 3.8 and 3.9 test failures can be ignored.

stergion

Looks good to me. Your changes also made the code cleaner and easier to understand.

bbengfort · 2023-07-05T18:14:41Z

@stergion thank you so much for your contribution to Yellowbrick!

stergion added 2 commits January 19, 2023 01:58

Use fit_predict() for estimators that do not implement predict() …

7b485b6

…method.

Use the estimator's distance metric, for the parameter metric in `s…

b1b2e9f

…ilhouette_score()` and `silhouette_samples()`

For estimators without attribute n_clusters infer `self.n_clusters_…

e67f05a

…` from the labels.

lwgray reviewed Feb 26, 2023

Merge branch 'develop' into SilhouetteVisualizer-add-support-for-more…

e6ce343

…-estimators

Thecave3 mentioned this pull request Apr 16, 2023

Unable to use Silhouette Visualizer with Gaussian Mixture Model #1303

Open

Merge branch 'develop' into SilhouetteVisualizer-add-support-for-more…

6be136a

…-estimators

lwgray mentioned this pull request Jun 25, 2023

WIP: Allows Silhouette Visualizer to accept DensityEstimator #1304

Open

1 task

bbengfort reviewed Jun 26, 2023

Added test to verify that cluster estimator without a predict() metho…

36b6e8d

…d are implementing fit_predict(). Added condition to make sure Spectral Clustering metric is not being set to

lwgray requested a review from bbengfort June 30, 2023 19:50

lwgray and others added 3 commits June 30, 2023 13:58

Fix Indentation and Ellipsis Error in Test

1150cc3

More expressive n_clusters computation.

c7490d5

Co-authored-by: stergion <[email protected]> Signed-off-by: Benjamin Bengfort <[email protected]>

review comments

f111a3f

bbengfort approved these changes Jul 4, 2023

appease linter

e50a829

stergion commented Jul 4, 2023

bbengfort merged commit f7a8e95 into DistrictDataLabs:develop Jul 5, 2023
9 of 14 checks passed

stergion deleted the SilhouetteVisualizer-add-support-for-more-estimators branch July 5, 2023 23:47
