merge_ebm produces broken classifiers #576

Open
DerWeh opened this issue Oct 1, 2024 · 3 comments

Comments

@DerWeh
Contributor

DerWeh commented Oct 1, 2024

The result of merge_ebms is not a valid classifier, as attributes are missing. Calling repr on it raises an AttributeError, which makes it horrible to debug.

The following is a minimal reproducible example:

from sklearn.datasets import load_iris
from interpret.glassbox import ExplainableBoostingClassifier, merge_ebms

X, y = load_iris(return_X_y=True)

# Fit two EBMs on the same data, then merge them.
clf1 = ExplainableBoostingClassifier(interactions=0, outer_bags=2)
clf1.fit(X, y)
clf2 = ExplainableBoostingClassifier(interactions=0, outer_bags=2)
clf2.fit(X, y)

clf = merge_ebms([clf1, clf2])
repr(clf)  # raises AttributeError

This results in the error AttributeError: 'ExplainableBoostingClassifier' object has no attribute 'cyclic_progress'.

A hotfix is simply copying the attributes of the first classifier in the list:

# Copy any hyperparameters missing on the merged model from clf1.
for attr, val in clf1.get_params(deep=False).items():
    if not hasattr(clf, attr):
        setattr(clf, attr, val)

The question is what the desired strategy for fixing this error should be. (I haven't delved into merge_ebms to know what's going on.) Another option would be to set the parameters on which all classifiers agree and set all other parameters to None, as in the sketch below. This avoids immediate AttributeErrors but might cause problems later if None is not a valid parameter value.
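A minimal sketch of that idea, assuming a hypothetical helper consensus_params (not part of interpret):

def consensus_params(models):
    # Hypothetical helper: keep a parameter's value only when every
    # model agrees on it; otherwise fall back to None.
    param_sets = [m.get_params(deep=False) for m in models]
    merged = {}
    for name, value in param_sets[0].items():
        if all(params[name] == value for params in param_sets[1:]):
            merged[name] = value
        else:
            merged[name] = None
    return merged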

@paulbkoch
Collaborator

paulbkoch commented Oct 2, 2024

Ha, this was something I was avoiding because I couldn't think of a great solution, but I guess doing something would be better than nothing.

Some ideas:

  1. We could use the default values for the class.
  2. We could take an average from the models on a parameter-by-parameter basis. For some parameters, like learning_rate, this would make some sense. For others, like interactions, it could be interesting, since some models might use numbers and others could use lists.
  3. Same as item 2, except weighted by the bags. We store the per-outer-bag weights in bag_weights_, so for many merged properties we could weight the contributions by their components (see the sketch after this list).
  4. We could override the scikit-learn function that is throwing the exception and handle it in a way that does not raise.
  5. Use the first model's values, as you mentioned.
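A rough sketch of option 3, assuming a hypothetical merge_params helper; how to treat non-numeric parameters like interactions is exactly the open question, so this falls back to the first model there:

import numbers

import numpy as np

def merge_params(models, param_names):
    # Hypothetical helper: weight each model by the total weight of
    # its outer bags, then average numeric hyperparameters accordingly.
    weights = [np.sum(m.bag_weights_) for m in models]
    merged = {}
    for name in param_names:
        values = [getattr(m, name) for m in models]
        if all(isinstance(v, numbers.Number) for v in values):
            merged[name] = np.average(values, weights=weights)
        else:
            # No obvious average for lists (e.g. "interactions");
            # fall back to the first model's value here.
            merged[name] = values[0]
    return merged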

I don't really have a strong preference, but if I had to rank them I'd probably prefer #3, then #1, then #4, then the rest.

What do you think?

@DerWeh
Contributor Author

DerWeh commented Oct 3, 2024

I agree, option 3 seems reasonable enough. There are a few ugly edge cases, but it should be straightforward (I still have to check out what bag_weights_ is all about). I would add two more options:

  6. Break the API and demand that arguments are passed to merge_ebms, forcing the user to make a choice.
  7. Create a dedicated (prediction-only) class, as in the sketch after this list. Parameters seem to matter mostly for fitting, and as no warm start is supported (so far), I see no point in fitting a merged EBM.
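A very rough sketch of option 7, assuming a hypothetical MergedEBM wrapper class (not part of interpret):

class MergedEBM:
    # Hypothetical prediction-only wrapper: it carries no
    # hyperparameters, only the fitted model, so there is nothing
    # for repr or get_params to trip over.
    def __init__(self, ebm):
        self._ebm = ebm

    def predict(self, X):
        return self._ebm.predict(X)

    def predict_proba(self, X):
        return self._ebm.predict_proba(X)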

I added option 6 mostly for completeness and don't think it is a particularly desirable option. Option 7 seems to me like a very clean approach, but it would probably require considerable work (where should these classes live in the class hierarchy?). I doubt that the benefit is worth the effort.

Should I go ahead and implement option 3?

@paulbkoch
Collaborator

Nice, I hadn't thought of 7, but that would work too. If you want to implement 3, that would be a very nice PR. We'll have warm_start someday, so it's probably a step in the right direction.
