Predicting confidence scores #81

Open
Yasmen-Wahba opened this issue Jan 26, 2023 · 5 comments
Comments

@Yasmen-Wahba

Hello,
Is there a predict_proba() method for the LCPPN pipeline?

@mirand863
Collaborator

mirand863 commented Feb 1, 2023

Hi @Yasmen-Wahba,

Not at the moment, but I can add it shortly. I have not added it yet because the probability scores become skewed, since the parent nodes are trained on subsets of the data. Would that be a problem for your application? There are some methods to calibrate or smooth the probability scores in hierarchical classification, but it might take me a while to find time to code them, since I am currently working on the multi-label problem.

@channeng

Hi, thanks for building this library. It really makes it easy to perform hierarchical classification.

It would certainly be useful to have scores for each node along the category path. Then we could decide whether, instead of predicting a leaf category, to traverse upward and return a parent category.
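
The back-off idea described above could be sketched roughly as follows. This is only an illustration in plain Python: the function name `truncate_path`, the `(label, score)` path format, and the `0.5` threshold are all assumptions made for the example and are not part of the library's API.

```python
from typing import List, Tuple


def truncate_path(path: List[Tuple[str, float]], threshold: float = 0.5) -> List[str]:
    """Keep labels from the root downward until a node's score drops below `threshold`.

    `path` is a hypothetical list of (label, score) pairs ordered root -> leaf.
    """
    kept: List[str] = []
    for label, score in path:
        if score < threshold:
            # Stop here: everything deeper is considered too uncertain.
            break
        kept.append(label)
    return kept


# Hypothetical scores for one sample along the path Animal -> Mammal -> Cat.
example_path = [("Animal", 0.95), ("Mammal", 0.80), ("Cat", 0.40)]
print(truncate_path(example_path))  # ['Animal', 'Mammal']
```

With per-node scores available, the prediction for this sample would stop at the parent category instead of committing to the low-confidence leaf.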

@PRFina

PRFina commented Feb 10, 2024

Hi @mirand863 and thanks for building this library!
We're currently working on a multiclass classification problem and are achieving good performance with hierarchical models.
To evaluate our models we need confidence scores, but as you already mentioned, the API doesn't expose a predict_proba method.

We are thinking of implementing it ourselves by simply traversing the DAG (a tree in our case) and multiplying the scores of the nodes along each path to get the leaf node score. What do you think about this very simple approach? Can you elaborate a little on the "skewness" issue? Can you point us to some literature on calibrating/smoothing the probability scores in hierarchical classification?
If something good comes out, we'll be very happy to contribute with a PR 😃
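
A minimal sketch of the path-multiplication approach described above, for the tree case. It is written in plain Python and does not use the library's API: the `children` and `local_proba` dictionaries, the `"root"` label, and the function name are hypothetical stand-ins for whatever each local classifier returns for one sample. It does nothing about the skew concern raised earlier in the thread.

```python
from typing import Dict, List


def leaf_probabilities(
    children: Dict[str, List[str]],
    local_proba: Dict[str, Dict[str, float]],
    root: str = "root",
) -> Dict[str, float]:
    """Multiply conditional scores down every root-to-leaf path of a tree.

    children[node]           -> list of child labels (missing or empty for leaves)
    local_proba[node][child] -> score the local classifier at `node` gives `child`
    """
    result: Dict[str, float] = {}
    stack = [(root, 1.0)]  # (node, product of scores from the root to this node)
    while stack:
        node, prob = stack.pop()
        kids = children.get(node, [])
        if not kids:
            result[node] = prob  # reached a leaf: store the accumulated product
            continue
        for child in kids:
            stack.append((child, prob * local_proba[node][child]))
    return result


# Hypothetical hierarchy and per-parent scores for a single sample.
children = {"root": ["Mammal", "Bird"], "Mammal": ["Cat", "Dog"], "Bird": ["Eagle"]}
local_proba = {
    "root": {"Mammal": 0.7, "Bird": 0.3},
    "Mammal": {"Cat": 0.6, "Dog": 0.4},
    "Bird": {"Eagle": 1.0},
}
scores = leaf_probabilities(children, local_proba)
print(scores)
```

If every local classifier's scores sum to 1 over its children, the leaf scores also sum to 1, so they can be read as a distribution over leaves. Whether that distribution is well calibrated is a separate question.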

@mirand863
Collaborator

Hi @PRFina,

Glad to hear you are getting good results with hierarchical classifiers.

The problem that I mentioned is that the local classifiers are only trained on subsets of the data. Sometimes even a single data point is used for training leaf nodes. Hence, when you try to return the probabilities for your test data, they become inaccurate. I hope this makes sense.

There is currently a master's student working on this issue for his master's thesis, but it might still take a few months before any code is released. Would it be OK for you to wait a while longer? Otherwise, I think the strategy you describe could work if you have a large amount of data. Another method that comes to mind is shrinkage.

Best regards,
Fabio
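
The comment above does not say which form of shrinkage is meant. One common form, shown here purely as an assumption, pulls a local classifier's score toward a prior, and pulls harder when that classifier saw few training samples. The function name, the prior, and the strength parameter `k` are all hypothetical.

```python
def shrink(p_local: float, p_prior: float, n_train: int, k: float = 10.0) -> float:
    """Blend a local score with a prior, trusting the local score more as n_train grows.

    weight = n_train / (n_train + k); k is an assumed tuning constant, not a library setting.
    """
    weight = n_train / (n_train + k)
    return weight * p_local + (1.0 - weight) * p_prior


# A node trained on a single sample: its overconfident score of 1.0 is pulled toward the prior.
print(round(shrink(1.0, 0.25, n_train=1), 3))     # 0.318
# A node trained on many samples: its score is left almost unchanged.
print(round(shrink(1.0, 0.25, n_train=1000), 3))  # 0.993
```

This targets the case mentioned above where a leaf is trained on a single data point. Choosing the prior (for example, the child's relative frequency under its parent) and `k` would still need validation on held-out data.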

@lukas-kania-ccmlp

lukas-kania-ccmlp commented Jun 28, 2024

Hi @mirand863, I wanted to check in on this work. It would be very useful to have the probabilities as output. Do you have an update on progress?
