WIP: Add support for NaN and mixed types in infer_labels() #13

Open · wants to merge 2 commits into main

Conversation

hagenw
Member

@hagenw hagenw commented Dec 15, 2021

Closes #12

This adds support for having NaN in truth or prediction and for having mixed types like ['a', 0] in truth and prediction.
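
For illustration, here is a minimal sketch of the intended behavior (the function name comes from the PR title; the actual signature and implementation in audmetric/core/utils.py may differ):

import math

def infer_labels(truth, prediction):
    # Collect unique labels from truth and prediction,
    # skipping NaN entries; NaN is always a float, so the
    # isinstance guard keeps math.isnan() away from strings
    labels = set()
    for value in list(truth) + list(prediction):
        if isinstance(value, float) and math.isnan(value):
            continue
        labels.add(value)
    # Mixed types like ['a', 0] cannot be compared directly,
    # so sort by string representation instead
    return sorted(labels, key=str)

>>> infer_labels(['a', 0], ['a', float('nan')])
[0, 'a']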

@codecov

codecov bot commented Dec 15, 2021

Codecov Report

Merging #13 (513244d) into master (279806b) will not change coverage.
The diff coverage is 100.0%.

Impacted Files Coverage Δ
audmetric/core/utils.py 100.0% <100.0%> (ø)

@frankenjoe
Collaborator

frankenjoe commented Dec 15, 2021

I see the use-case for supporting NaN. But does it make sense to have mixed types? Isn't that rather a sign of misuse? E.g. the user passes strings for truth but IDs for prediction. I wonder if it makes more sense to raise an error in that case.

@frankenjoe
Collaborator

frankenjoe commented Dec 15, 2021

Btw: how do we treat NaN in our metrics? E.g. how do we account for those labels in a confusion matrix?

@hagenw
Member Author

hagenw commented Dec 15, 2021

Btw: how do we treat NaN in our metrics? E.g. how do we account for those labels in a confusion matrix?

We have not really thought much about it. For some metrics it just works:

>>> import numpy as np
>>> import audmetric
>>> audmetric.unweighted_average_recall(['a', 'b'], ['a', 'a'])
0.5
>>> audmetric.unweighted_average_recall(['a', 'b'], ['a', np.NaN])
0.5

Others did strange stuff in master:

>>> audmetric.precision_per_class([0, 0, 2, 1], [0, 1, np.NaN, np.NaN])
{0: 1.0, 1: 0.0, 2: 0.0, nan: 0.0}

Now it would look like this:

>>> audmetric.precision_per_class([0, 0, 2, 1], [0, 1, np.NaN, np.NaN])
{0: 1.0, 1: 0.0, 2: 0.0}
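
A rough sketch of how this new behavior could come about (not the actual diff; it only reproduces the output above, assuming labels are inferred without NaN and a NaN prediction never matches any class):

import math

def precision_per_class(truth, prediction):
    # Hypothetical sketch, not the code from this PR:
    # infer class labels without NaN (mirroring the new
    # infer_labels() behavior), then compute precision per class
    labels = sorted(
        {
            v for v in list(truth) + list(prediction)
            if not (isinstance(v, float) and math.isnan(v))
        },
        key=str,
    )
    precision = {}
    for label in labels:
        # Truth values where this label was predicted;
        # NaN == label is always False, so NaN predictions drop out
        matched = [t for t, p in zip(truth, prediction) if p == label]
        precision[label] = (
            sum(t == label for t in matched) / len(matched)
            if matched else 0.0
        )
    return precision

>>> precision_per_class([0, 0, 2, 1], [0, 1, float('nan'), float('nan')])
{0: 1.0, 1: 0.0, 2: 0.0}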

@hagenw
Member Author

hagenw commented Dec 15, 2021

I see the use-case for supporting NaN. But does it make sense to have mixed types? Isn't that rather a sign of misuse? E.g. the user passes strings for truth but IDs for prediction. I wonder if it makes more sense to raise an error in that case.

I'm not so sure about this. If you have audformat with its fixed types in mind, maybe. But I would find it valid if, for example, the labels are IDs collected from users, and some users enter an integer while others enter a string.

@frankenjoe
Collaborator

frankenjoe commented Dec 15, 2021

I'm not so sure about this. If you have audformat with its fixed types in mind, maybe. But I would find it valid if, for example, the labels are IDs collected from users, and some users enter an integer while others enter a string.

But how do you map between strings and integers? Or do you assume they represent different classes? Why would a user do that? I still see the risk that integers and strings are accidentally mixed and the user will not notice it if we allow that.

@frankenjoe
Collaborator

frankenjoe commented Dec 15, 2021

We have not really thought much about it. For some metrics it just works:

I guess we either have to come up with a proper solution, or it would be safer to raise an error if we encounter NaN.
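
The stricter alternative suggested here could look roughly like this (a sketch only; the helper name is hypothetical):

import math

def assert_no_nan(values, name):
    # Fail fast on NaN instead of silently handling it
    for value in values:
        if isinstance(value, float) and math.isnan(value):
            raise ValueError(f'{name} contains NaN')

>>> assert_no_nan([0, float('nan')], 'prediction')
Traceback (most recent call last):
  ...
ValueError: prediction contains NaN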

@hagenw changed the title from "Add support for NaN and mixed types in infer_labels()" to "WIP: Add support for NaN and mixed types in infer_labels()" on Dec 15, 2021
@hagenw
Member Author

hagenw commented Dec 15, 2021

OK, I created #14 and set this pull request to WIP.

Development

Successfully merging this pull request may close these issues.

audmetric.utils.infer_labels() can fail for NaN