Predicting confidence scores #81

Yasmen-Wahba · 2023-01-26T20:17:55Z

Hello,
Is there a predict_proba() method for the LCPPN pipeline?

mirand863 · 2023-02-01T09:33:04Z

Not at the moment, but I can add one shortly. I have not added it yet because the probability scores become skewed, since the classifiers at the parent nodes are trained on subsets of the data. Would that be a problem for your application? There are some methods to calibrate/smooth the probability scores in hierarchical classification, but it might take me a while to find time to code them, since I am currently working on the multi-label problem.

channeng · 2023-09-22T09:22:09Z

Hi, thanks for building this library. It really makes it easy to perform hierarchical classification.

It would certainly be useful to have scores for each node along the category path. Then we could decide whether to keep the leaf category prediction or traverse upward to a parent category instead.
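A minimal sketch of that back-off idea, assuming per-node scores along the predicted path were available (the library does not expose them at this point in the thread). The function name `back_off`, the example labels, and the 0.5 threshold are hypothetical and chosen only for illustration:

```python
from typing import List, Sequence


def back_off(path: Sequence[str],
             node_scores: Sequence[float],
             threshold: float = 0.5) -> List[str]:
    """Return the longest prefix of `path` whose cumulative score
    (product of the node scores so far) stays at or above `threshold`."""
    if len(path) != len(node_scores):
        raise ValueError("path and node_scores must have the same length")
    kept: List[str] = []
    cumulative = 1.0
    for label, score in zip(path, node_scores):
        cumulative *= score
        if cumulative < threshold:
            break  # stop descending; keep the last confident ancestor
        kept.append(label)
    return kept


# Hypothetical predicted path (root to leaf) with made-up scores.
path = ["Animal", "Mammal", "Cat"]
scores = [0.95, 0.80, 0.55]
truncated = back_off(path, scores, threshold=0.5)
print(truncated)
```

Here the leaf is dropped because the cumulative score falls below the threshold at the last level, so the prediction stops at the parent category.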

PRFina · 2024-02-10T08:57:37Z

Hi @mirand863 and thanks for building this library!
We're currently working on a multiclass classification problem achieving good performance with hierarchical models.
To evaluate our models we need to get the confidence score, but as you already mentioned, the API doesn't expose the predict_proba method.

We are thinking of implementing it ourselves by simply traversing the DAG (a tree in our case) and multiplying the scores of the nodes along the path to get the leaf node score. What do you think about this very simple approach? Can you elaborate a little on the "skewness" issue? Can you point us to some literature on calibrating/smoothing the probability scores in hierarchical classification?
If something good comes out, we'll be very happy to contribute with a PR 😃
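The path-multiplication idea could look roughly like the sketch below for a single sample. It assumes each parent node's local classifier yields a probability distribution over its children; the `local_probs` dictionary, the `leaf_probabilities` name, and the example labels are hypothetical and not part of the library's API:

```python
from typing import Dict


def leaf_probabilities(local_probs: Dict[str, Dict[str, float]],
                       root: str = "root") -> Dict[str, float]:
    """Multiply conditional probabilities along each root-to-leaf path.

    `local_probs[parent][child]` is the probability the parent's local
    classifier assigns to `child` for one sample (assumed input).
    """
    result: Dict[str, float] = {}
    stack = [(root, 1.0)]
    while stack:
        node, prob = stack.pop()
        children = local_probs.get(node)
        if not children:
            result[node] = prob  # no outgoing edges: this node is a leaf
            continue
        for child, conditional in children.items():
            stack.append((child, prob * conditional))
    return result


# Made-up outputs of two local classifiers for one sample.
local_probs = {
    "root": {"Animal": 0.9, "Plant": 0.1},
    "Animal": {"Cat": 0.7, "Dog": 0.3},
}
probs = leaf_probabilities(local_probs)
print(probs)
```

If every local distribution sums to 1, the leaf scores also sum to 1. That does not make them well calibrated, which is the concern raised earlier in the thread.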

mirand863 · 2024-02-14T15:16:43Z

Hi @PRFina,

Glad to hear you are getting good results with hierarchical classifiers.

The problem I mentioned is that the local classifiers are trained only on subsets of the data. Sometimes a leaf node is trained on as little as a single data point. Hence, the probabilities returned for your test data can be inaccurate. I hope this makes sense.

There is currently a master's student working on this issue for his thesis, but it might still take a few months before any code is released. Would it be OK for you to wait a while longer? Otherwise, I think the strategy you describe could work if you have a large amount of data. Another method that comes to mind is shrinkage.
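As a rough illustration of what shrinkage can mean here, the sketch below pulls a child node's estimate toward its parent's estimate, with more pull when the child was trained on few samples. This is one simple form of shrinkage chosen for illustration, not necessarily the variant intended above; the `shrink` function and its `strength` parameter are hypothetical:

```python
def shrink(child_estimate: float,
           n_child: int,
           parent_estimate: float,
           strength: float = 10.0) -> float:
    """Interpolate between a child node's estimate and its parent's.

    The child's weight is n_child / (n_child + strength), so a node trained
    on very few samples relies mostly on its parent (assumed scheme).
    """
    if n_child < 0 or strength <= 0:
        raise ValueError("n_child must be >= 0 and strength must be > 0")
    weight = n_child / (n_child + strength)
    return weight * child_estimate + (1.0 - weight) * parent_estimate


# A leaf trained on a single data point reports an overconfident 1.0;
# its parent's estimate for the same class is 0.2 (made-up numbers).
few_samples = shrink(1.0, n_child=1, parent_estimate=0.2)
many_samples = shrink(1.0, n_child=1000, parent_estimate=0.2)
print(few_samples, many_samples)
```

With a single training sample the estimate stays close to the parent's value, while with many samples it stays close to the child's own value.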

Best regards,
Fabio

lukas-kania-ccmlp · 2024-06-28T13:59:43Z

Hi @mirand863, I wanted to check in on this work. It would be very useful to have the probabilities as output. Do you have an update on progress?
