Merge ebm with different subset of variables #564

Open
sadsquirrel369 opened this issue Jul 24, 2024 · 7 comments

@sadsquirrel369

I am fitting a model on a subset of variables with no interactions. I now want to fit interactions on a larger subset of variables and merge that model with the original one.

The merge_ebms method does not allow for this in its current form. Is there a smart way to combine the components of the two underlying models into a new, clean instance?

@paulbkoch
Collaborator

Hi @sadsquirrel369 -- This is supported, but it is currently a bit more complicated than it should be. In the future we want to support scikit-learn's warm_start functionality, which will make this simpler. Today, you need to do the following (see the sketch below the list):

  1. Make a dataframe or numpy array with a superset of all the features you'll need for both mains and interactions.
  2. Set interactions=0 and use exclude to drop any individual features that you don't want considered as mains.
  3. Fit the mains model.
  4. Use exclude to exclude all mains, and optionally any specific pairs you don't want considered. Set interactions to either a number for automatic detection or a list of the specific interactions. Call fit with the init_score parameter set to the mains model so that the pairs are boosted on top of the mains.
  5. Call merge_ebms on the two EBMs. There are more details to this, which are covered in our docs here: https://interpret.ml/docs/python/examples/custom-interactions.html
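
A minimal sketch of these steps, assuming a classifier, a DataFrame `X` holding the superset of columns, automatic detection of 10 pairs, and a hypothetical feature `"unwanted_main"` excluded from the mains; the exact argument values (including `exclude="mains"`) should be checked against the custom-interactions docs linked above:

```python
from interpret.glassbox import ExplainableBoostingClassifier, merge_ebms

# Steps 2-3: mains-only model. "unwanted_main" is a hypothetical feature
# we don't want as a main effect, so it is excluded here.
ebm_mains = ExplainableBoostingClassifier(interactions=0, exclude=["unwanted_main"])
ebm_mains.fit(X, y)

# Step 4: pairs-only model boosted on top of the mains via init_score.
# exclude="mains" drops all main terms; interactions=10 requests automatic
# detection of 10 pairs (a list of tuples works too).
ebm_pairs = ExplainableBoostingClassifier(exclude="mains", interactions=10)
ebm_pairs.fit(X, y, init_score=ebm_mains)

# Step 5: combine the two models into one.
ebm = merge_ebms([ebm_mains, ebm_pairs])
```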

@sadsquirrel369
Author

@paulbkoch Thanks for the prompt reply. So if I exclude variables (via the exclude parameter) when fitting the "mains" model, will all of the feature names still appear in model.feature_names_in_, irrespective of whether they are actually used by the model?

@paulbkoch
Collaborator

Hi @sadsquirrel369 -- Features that are excluded will be recorded in the model.feature_names_in_ attribute, but they will not be used for prediction. Anything that is used for prediction is called a "term" in EBMs. If you print the model.term_names_ you'll see a list of everything that is used for prediction. For some datatypes like numpy arrays there are no column names and features are determined by their index, so it's important in these cases that both the features used in mains and the features used in pairs are all in the same dataset, even if they are not used in the model.
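
As a quick illustration of the difference, a sketch assuming a fitted mains model where a hypothetical feature "weight" was excluded:

```python
from interpret.glassbox import ExplainableBoostingClassifier

# Hypothetical mains-only model with one excluded feature.
ebm_mains = ExplainableBoostingClassifier(interactions=0, exclude=["weight"])
ebm_mains.fit(X, y)

print(ebm_mains.feature_names_in_)  # "weight" is still recorded here
print(ebm_mains.term_names_)        # but there is no "weight" term, so it is not used for prediction
```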

@sadsquirrel369
Author

Thanks for the help!

@sadsquirrel369
Author

Hi @paulbkoch,

When trying to merge the mains model with an interaction model, I get this error:

Inconsistent bin types within a model:

```
---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/var/folders/3b/lp8_hqx917138jd8rxttmzjc0000gn/T/ipykernel_35823/3985698511.py in <module>
----> 1 merge_ebms([loaded_model, loaded_int_model])

/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/site-packages/interpret/glassbox/ebm/merge_ebms.py in merge_ebms(models)
    392     for model in models:
    393         if any(len(set(map(type, bin_levels))) != 1 for bin_levels in model.bins_):
--> 394             raise Exception("Inconsistent bin types within a model.")
    395
    396         feature_bounds = getattr(model, "feature_bounds_", None)

Exception: Inconsistent bin types within a model.
```

It appears the issue stems from some variables used in the interactions not having bin definitions in the mains model, because they were excluded there. The merge works correctly when the variables are present in both the mains and interaction models. However, some variables are only beneficial in an interaction and not on their own (for example, in vehicle classification, the combination of weight and power can help identify different vehicle types).

@paulbkoch
Collaborator

Hi @sadsquirrel369 -- This is really interesting. It appears you have a model where one of the feature mains is considered categorical or continuous, but a pair using the same feature is treated as the opposite. Are you doing any re-merging, where you first merge a set of models and then merge that result again with some other models, or is it happening on the first merge when the mains and interaction models are combined?

You can probably avoid this error by explicitly setting the feature_types parameter on all calls to the ExplainableBoostingClassifier constructor, thereby ensuring they are identical in all models being merged. This is something we could handle better within merge_ebms, though. We can convert a feature from categorical into continuous during merges, but perhaps this isn't completely robust to more complicated scenarios involving pairs.
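
For instance, a minimal sketch of pinning feature_types so every model bins the features identically before merging (the column types below are hypothetical):

```python
from interpret.glassbox import ExplainableBoostingClassifier

# The same explicit list is passed to every constructor so the mains and
# interaction models agree on how each feature is binned.
feature_types = ["continuous", "continuous", "nominal", "ordinal"]

ebm_mains = ExplainableBoostingClassifier(feature_types=feature_types, interactions=0)
ebm_pairs = ExplainableBoostingClassifier(feature_types=feature_types, exclude="mains", interactions=10)
```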

@ANNIKADAHLMANN-8451

ANNIKADAHLMANN-8451 commented Oct 10, 2024

I'm currently encountering this same error when trying to merge two EBMs. I have ~10 features, and I'm wondering if there's a streamlined way to specify all of these feature types. I'm getting the inconsistent bins error on the second merge (basically, I'm trying to batch-train an EBM since my data is larger than what fits in memory). I specify the data types via the feature_types parameter using the snippet below:

dtypes = [
    "continuous" if d == "float64"
    else None if d == "int64"
    else "ordinal" if col in ordinal_types
    else "nominal"
    for d, col in zip(X_trn.dtypes, X_trn.columns)
]

And when I try to implement the workaround suggested in issue #576, I still get the same error:

# Copy any hyperparameters present on clf1 but missing on clf (workaround from #576).
for attr, val in clf1.get_params(deep=False).items():
    if not hasattr(clf, attr):
        setattr(clf, attr, val)
