Feature coefficient signs are inverted between versions 0.7.1 and 1.4.0 #69

anilkumarpanda · 2021-11-03T15:39:31Z

When I run the following code, the model coefficients come out with inverted signs in version 1.4.0 compared with version 0.7.1:

```python
from skorecard.bucketers import DecisionTreeBucketer, OptimalBucketer
from skorecard.pipeline import BucketingProcess
from sklearn.pipeline import make_pipeline
from skorecard.datasets import load_credit_card
from sklearn.model_selection import train_test_split
from skorecard import Skorecard
from sklearn.metrics import roc_auc_score

# Load data
data = load_credit_card(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['y'], axis=1),
    data['y'],
    test_size=0.25,
    random_state=42,
)

# Select features
selected_feats = ['x6', 'x8', 'x10', 'x18', 'x1', 'x19', 'x20', 'x21', 'x23', 'x22', 'x3', 'x17', 'x16']
even_cat_cols = ['x6', 'x8']
odd_cat_cols = ['x19']
num_cols = ['x10', 'x18', 'x1', 'x20', 'x21', 'x23', 'x22', 'x3', 'x17', 'x16']

# Create bucketing process
prebucketing_pipeline = make_pipeline(
    # Loose requirements for the fine-grained pre-bucketing step
    DecisionTreeBucketer(variables=selected_feats, max_n_bins=40, min_bin_size=0.03),
)

bucketing_pipeline = make_pipeline(
    OptimalBucketer(
        variables=selected_feats,
        max_n_bins=6,
        min_bin_size=0.05,
        missing_treatment='most_risky',
    ),
)
bucketing_process = BucketingProcess(
    prebucketing_pipeline=prebucketing_pipeline,
    bucketing_pipeline=bucketing_pipeline,
)

bucketing_process = bucketing_process.fit(X_train[selected_feats], y_train)

# Train skorecard version 0.7.1
# scorecard = Skorecard(
#     bucketing_process,
#     selected_features=selected_feats
# )

# Train skorecard version 1.4.0
scorecard = Skorecard(
    bucketing_process,
    variables=selected_feats,
    calculate_stats=True,
)
scorecard.fit(X_train[selected_feats], y_train)

# Results
proba_train = scorecard.predict_proba(X_train[selected_feats])[:, 1]
proba_test = scorecard.predict_proba(X_test[selected_feats])[:, 1]
print(f"AUC train:{round(roc_auc_score(y_train, proba_train), 4)}")
print(f"AUC test :{round(roc_auc_score(y_test, proba_test), 4)}\n")
print(scorecard.get_stats())
```

Results with version 0.7.1:

```text
AUC train:0.7679
AUC test :0.7626

Features  Coef.       Std.Err     z           P>|z|
const     -1.24442    0.0182161   -68.3145    0
x6        -0.753524   0.0208574   -36.1274    8.4108e-286
x8        -0.277921   0.034096    -8.15111    3.60589e-16
x10       -0.31137    0.0381928   -8.1526     3.56189e-16
x18       -0.277205   0.0510278   -5.43243    5.55932e-08
x1        -0.350234   0.0484375   -7.23063    4.80774e-13
x19       -0.210958   0.0573138   -3.68075    0.000232548
x20       -0.279283   0.0665349   -4.19754    2.69828e-05
x21       -0.138776   0.0806698   -1.7203     0.0853775
x23       -0.176563   0.0754207   -2.34104    0.0192301
x22       -0.146805   0.0807149   -1.81882    0.0689397
x3        -0.0831014  0.136821    -0.607372   0.543604
x17       -0.294193   0.334516    -0.879459   0.379153
x16       0.56809     0.407881    1.39278     0.163685
```

Results with version 1.4.0:

```text
AUC train:0.7679
AUC test :0.7626

Features  Coef.       Std.Err     z           P>|z|
const     -1.24575    0.0182208   -68.3695    0
x6        0.753233    0.0208504   36.1256     8.98644e-286
x8        0.277778    0.0340932   8.14762     3.71169e-16
x10       0.311173    0.0381806   8.15003     3.63845e-16
x18       0.277392    0.0510811   5.43042     5.62211e-08
x1        0.350658    0.0484606   7.23593     4.62344e-13
x19       0.211326    0.057416    3.68061     0.000232678
x20       0.279587    0.0665905   4.1986      2.68564e-05
x21       0.138937    0.0808025   1.71946     0.0855306
x23       0.176326    0.0755183   2.33488     0.0195497
x22       0.146274    0.080873    1.80869     0.0704997
x3        0.0832047   0.136923    0.607673    0.543404
x17       0.295899    0.334592    0.884356    0.376504
x16       -0.562766   0.408206    -1.37863    0.168008
```
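A quick check using the numbers from the two `get_stats()` tables above: every feature coefficient keeps roughly the same magnitude (the largest difference is about 0.005, for x16) but has the opposite sign, while the intercept keeps its sign and the AUCs are identical. That pattern is what you would expect if only the sign convention of the encoded features changed between versions; this is an inference from the numbers, not something confirmed in this thread.

```python
import numpy as np

# Coefficients copied from the two get_stats() tables above (same feature order).
features = ["x6", "x8", "x10", "x18", "x1", "x19", "x20",
            "x21", "x23", "x22", "x3", "x17", "x16"]
coef_071 = np.array([-0.753524, -0.277921, -0.31137, -0.277205, -0.350234,
                     -0.210958, -0.279283, -0.138776, -0.176563, -0.146805,
                     -0.0831014, -0.294193, 0.56809])
coef_140 = np.array([0.753233, 0.277778, 0.311173, 0.277392, 0.350658,
                     0.211326, 0.279587, 0.138937, 0.176326, 0.146274,
                     0.0832047, 0.295899, -0.562766])
const_071, const_140 = -1.24442, -1.24575

# True where the 1.4.0 coefficient has the opposite sign of the 0.7.1 one.
flipped = np.sign(coef_071) == -np.sign(coef_140)

# If the coefficients are exact negatives, old + new is close to zero.
for name, old, new in zip(features, coef_071, coef_140):
    print(f"{name:>4}: 0.7.1={old:+.4f}  1.4.0={new:+.4f}  sum={old + new:+.4f}")
print("all feature signs flipped:", bool(flipped.all()))
print("intercepts:", const_071, const_140)
```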

This may also be related to #68.

timvink · 2021-11-03T17:30:45Z

Yeah, the 1.0 release was a massive change, hence the major version bump (https://github.com/ing-bank/skorecard/releases), so backward compatibility was not a goal.

It should be consistent from now on, though. Are you having trouble upgrading?

idellang · 2021-11-12T02:58:46Z

How were you able to compare outputs between the two versions?

anilkumarpanda · 2021-11-23T13:49:20Z

Shouldn't the coefficients be negative, as in the tutorial?
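For context on the expected sign: with WoE defined as ln(% Goods / % Bads) and the event as the positive class, a higher WoE means lower risk, so a logistic regression on WoE-encoded features should give negative coefficients. If the encoding instead uses ln(% Bads / % Goods), every coefficient flips sign while the intercept and the predicted probabilities (and therefore the AUC) stay the same, because only b0 + b*x enters the model. The sketch below shows this with scikit-learn's `LogisticRegression` on synthetic data with three hypothetical buckets; it does not use skorecard, and it illustrates the pattern in the tables rather than confirming what changed inside the library.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data: one feature already split into three hypothetical buckets.
n = 5000
bucket = rng.integers(0, 3, size=n)
event_rate = np.array([0.05, 0.15, 0.40])              # bucket 2 is the riskiest
y = (rng.random(n) < event_rate[bucket]).astype(int)   # 1 = event ("bad")

# WoE per bucket with the convention ln(% goods / % bads); goods = non-events.
events = np.bincount(bucket, weights=y, minlength=3)
non_events = np.bincount(bucket, weights=1 - y, minlength=3)
woe = np.log((non_events / non_events.sum()) / (events / events.sum()))

x_goods_over_bads = woe[bucket].reshape(-1, 1)  # WoE = ln(%goods / %bads)
x_bads_over_goods = -x_goods_over_bads          # opposite sign convention

lr_a = LogisticRegression().fit(x_goods_over_bads, y)
lr_b = LogisticRegression().fit(x_bads_over_goods, y)

print("ln(goods/bads): coef", lr_a.coef_.ravel(), "intercept", lr_a.intercept_)
print("ln(bads/goods): coef", lr_b.coef_.ravel(), "intercept", lr_b.intercept_)

# Identical predicted probabilities under both conventions, hence identical AUC.
p_a = lr_a.predict_proba(x_goods_over_bads)[:, 1]
p_b = lr_b.predict_proba(x_bads_over_goods)[:, 1]
print("max probability difference:", np.abs(p_a - p_b).max())
```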

anilkumarpanda · 2021-11-23T13:50:15Z

> How were you able to compare outputs between the two versions?

I ran the same model pipeline with two different versions of the package.
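A minimal way to keep such comparisons unambiguous, assuming each version is installed in its own virtual environment (for example `pip install skorecard==0.7.1` in one and `pip install skorecard==1.4.0` in the other), is to have the script record which skorecard version it ran with. The helper below uses only the standard library (`importlib.metadata`, Python 3.8+); the name `installed_version` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print this at the top of the script so each run records which skorecard it used.
print("skorecard version:", installed_version("skorecard"))
```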

orchardbirds · 2021-11-24T14:35:48Z

@anilkumarpanda

I believe the scores are correct. I've just released a new version that also reports % Event and % Non-event, which makes the statistics easier to interpret.

The weight of evidence is calculated as ln(% Goods / % Bads). In our case, the goods are the non-events. There are two outcomes:

- % Goods > % Bads, which means WoE > 0.
- % Bads > % Goods, which means WoE < 0.

The screenshot of the updated bucket table showed that the WoE is negative for the expected cases, i.e. the buckets where % Bads > % Goods.
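To make the definition concrete, here is a small worked example with made-up event and non-event counts for three buckets (the counts are hypothetical, not taken from the credit card data). It applies WoE = ln(% Goods / % Bads) per bucket, so the bucket with relatively more bads gets a negative WoE:

```python
import numpy as np

# Hypothetical per-bucket counts for one binned feature (illustrative only).
non_events = np.array([900, 600, 300])  # "goods"
events = np.array([30, 60, 110])        # "bads"

# Share of all goods / all bads that falls into each bucket.
pct_goods = non_events / non_events.sum()
pct_bads = events / events.sum()

# Weight of evidence per bucket: positive when % goods > % bads.
woe = np.log(pct_goods / pct_bads)

for g, b, w in zip(pct_goods, pct_bads, woe):
    print(f"% goods={g:.3f}  % bads={b:.3f}  WoE={w:+.3f}")
```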

If you agree, please go ahead and close the issue :)

orchardbirds · 2021-12-21T15:30:06Z

Gonna close this. @anilkumarpanda if you think this is wrong, please reopen the issue :)

anilkumarpanda closed this as completed Nov 23, 2021

anilkumarpanda reopened this Nov 23, 2021

orchardbirds closed this as completed Dec 21, 2021
