Passing cluster_labels broken #49

Open · mdruiter opened this issue Mar 2, 2023 · 2 comments
Labels: bug (Something isn't working), in progress (This issue is being actively worked on)

Comments

mdruiter commented Mar 2, 2023

I think I have found a bug that occurs when passing cluster_labels.

When I reverse the order of all inputs (both data and cluster_labels) and then reverse the result (local_outlier_probabilities), I expect to get the same numbers. This holds as long as all cluster_labels values are equal. Once I have two (well-separated) clusters, the results change when the input is flipped!
Another indication that something goes wrong (IMHO): the neighbor indices reported for points in the second cluster refer to points in the first cluster!

A small reproduction example:

```python
import numpy as np
import matplotlib.pyplot as plt
from PyNomaly import loop

np.random.seed(1)
n = 9
# Two well-separated clusters of n points each, centred around (2, 2) and (8, 8)
data = np.append(np.random.normal(2, 1, [n, 2]), np.random.normal(8, 1, [n, 2]), axis=0)
clus = np.append(np.ones(n), 2 * np.ones(n)).tolist()  # 2 cluster numbers!
model = loop.LocalOutlierProbability(data, n_neighbors=5, cluster_labels=clus)
fit = model.fit()
res = fit.local_outlier_probabilities
print(res)
print(fit.neighbor_matrix)

# Same input in reverse order; flip the output back so its rows line up with `res`
data_flipped = np.flipud(data)
clus_flipped = np.flipud(clus).tolist()
model2 = loop.LocalOutlierProbability(data_flipped, n_neighbors=5, cluster_labels=clus_flipped)
fit2 = model2.fit()
res2 = np.flipud(fit2.local_outlier_probabilities)
print(res2)
# Rows are flipped back, but the index values inside still refer to the flipped input
print(np.flipud(fit2.neighbor_matrix))

# '+' markers: original order, 'x' markers: flipped order; marker size grows with probability
s  = 1 + 100 * res.astype(float)
s2 = 1 + 100 * res2.astype(float)
plt.scatter(data[:, 0], data[:, 1], c=clus, s=s,  marker='+')
plt.scatter(data[:, 0], data[:, 1], c=clus, s=s2, marker='x')
plt.show()
```
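The flip test is one instance of a more general property: reordering the input rows (data and cluster_labels together) should reorder the per-point output the same way and change nothing else. The sketch below states that property as a reusable check. It does not call PyNomaly. `knn_mean_distance` is a hypothetical toy scorer (mean distance to the k nearest neighbors within the same cluster) standing in for `local_outlier_probabilities`, and `is_permutation_equivariant` is likewise an illustrative name, not a library function.

```python
import numpy as np


def knn_mean_distance(data, labels, k):
    """Hypothetical toy scorer: mean distance from each point to its k nearest
    neighbors inside the same cluster. Results are written back by global row."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    scores = np.empty(len(data))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)              # global rows of this cluster
        pts = data[idx]
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)                    # a point is not its own neighbor
        scores[idx] = np.sort(d, axis=1)[:, :k].mean(axis=1)
    return scores


def is_permutation_equivariant(score_fn, data, labels, perm, **kwargs):
    """True if scoring permuted rows and undoing the permutation reproduces
    the scores of the original ordering."""
    data = np.asarray(data)
    labels = np.asarray(labels)
    base = score_fn(data, labels, **kwargs)
    permuted = score_fn(data[perm], labels[perm], **kwargs)
    inverse = np.argsort(perm)                         # inverse[j] = position of row j in perm
    return np.allclose(permuted[inverse], base)


rng = np.random.default_rng(1)
n = 9
data = np.vstack([rng.normal(2, 1, (n, 2)), rng.normal(8, 1, (n, 2))])
labels = np.repeat([1, 2], n)
flip = np.arange(2 * n)[::-1]                          # the same reversal as np.flipud
print(is_permutation_equivariant(knn_mean_distance, data, labels, flip, k=5))  # True
```

A scorer that looked up cluster-local neighbor positions as if they were global rows would be expected to fail this check once there are two or more clusters, which matches the symptom reported here.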

mdruiter (Author) commented Mar 6, 2023

The problem is in how neighbor_matrix is defined: _compute_distance_and_neighbor_matrix returns neighbor indices relative to each cluster (positions within that cluster), but _prob_distances_ev treats them as global row indices into the full dataset.
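A minimal numpy sketch of the mismatch and one possible remapping follows. It is not PyNomaly's code: `cluster_local_neighbors` and `to_global` are hypothetical helpers, and the sketch assumes each cluster's members are gathered with `np.flatnonzero`, i.e. in ascending row order, and that `k` is smaller than every cluster's size.

```python
import numpy as np


def cluster_local_neighbors(data, labels, k):
    """Hypothetical helper: k nearest neighbors of each point within its own
    cluster, given as positions inside that cluster (0..cluster_size-1)."""
    data = np.asarray(data, dtype=float)
    labels = np.asarray(labels)
    local = np.empty((len(data), k), dtype=int)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        pts = data[idx]
        d = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
        np.fill_diagonal(d, np.inf)                 # self sorts last, so it is excluded
        local[idx] = np.argsort(d, axis=1)[:, :k]
    return local


def to_global(local, labels):
    """Hypothetical helper: map cluster-local positions to rows of the full dataset."""
    labels = np.asarray(labels)
    glob = np.empty_like(local)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)           # global row of each cluster member
        glob[idx] = idx[local[idx]]                 # local position p -> global row idx[p]
    return glob


rng = np.random.default_rng(1)
n = 9
data = np.vstack([rng.normal(2, 1, (n, 2)), rng.normal(8, 1, (n, 2))])
labels = np.repeat([1, 2], n)
local = cluster_local_neighbors(data, labels, k=5)
glob = to_global(local, labels)
# Read as global rows, cluster 2's local indices all land in cluster 1 (rows 0..n-1)
print(local[n:].max() < n)                          # True
# After remapping, every neighbor belongs to the same cluster as its point
print(np.all(labels[glob] == labels[:, None]))      # True
```

With a single cluster, `idx` is simply `0..N-1`, so `to_global` returns the local indices unchanged. That is consistent with the report that results are flip-invariant as long as all cluster_labels are equal.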

vc1492a (Owner) commented Mar 20, 2023

Hey @mdruiter - thanks for noting the issue and where it is occurring.

Are you able to submit a fix in a pull request?

vc1492a self-assigned this Mar 20, 2023
vc1492a added the bug (Something isn't working) label Mar 20, 2023
vc1492a added this to the Address Existing Bug Fixes milestone Aug 19, 2024
vc1492a assigned IroNEDR and unassigned vc1492a Aug 19, 2024
IroNEDR added the in progress (This issue is being actively worked on) label Aug 25, 2024