Feature coefficient signs are inverted between versions 0.7.1 and 1.4.0 #69

Closed
anilkumarpanda opened this issue Nov 3, 2021 · 6 comments

Comments

@anilkumarpanda
Contributor

anilkumarpanda commented Nov 3, 2021

When I run the following code, the signs of the model coefficients are inverted between versions 0.7.1 and 1.4.0:

from skorecard.bucketers import DecisionTreeBucketer, OptimalBucketer
from skorecard.pipeline import BucketingProcess
from sklearn.pipeline import make_pipeline
from skorecard.datasets import load_credit_card
from sklearn.model_selection import train_test_split
from skorecard import Skorecard
from sklearn.metrics import roc_auc_score

#Load data
data = load_credit_card(as_frame=True)
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['y'], axis=1),
    data['y'], 
    test_size=0.25, 
    random_state=42
)
#Select features
selected_feats = ['x6','x8','x10','x18','x1','x19','x20','x21','x23','x22','x3','x17','x16']
even_cat_cols = ['x6','x8']
odd_cat_cols = ['x19']
num_cols = ['x10','x18','x1','x20','x21','x23','x22','x3','x17','x16']
#Create bucketing process
prebucketing_pipeline = make_pipeline(
    DecisionTreeBucketer(
        variables=selected_feats,
        max_n_bins=40,  # loose requirements
        min_bin_size=0.03,
    ),
)

bucketing_pipeline = make_pipeline(
    OptimalBucketer(
        variables=selected_feats,
        max_n_bins=6,
        min_bin_size=0.05,
        missing_treatment='most_risky',
    ),
)
bucketing_process = BucketingProcess(
    prebucketing_pipeline=prebucketing_pipeline,
    bucketing_pipeline=bucketing_pipeline,
)

bucketing_process = bucketing_process.fit(X_train[selected_feats], y_train)

# Train skorecard version 0.7.1
# scorecard = Skorecard(
#     bucketing_process,
#     selected_features=selected_feats
# )

# Train skorecard version 1.4.0
scorecard = Skorecard(
    bucketing_process,
    variables=selected_feats,
    calculate_stats=True,
)
scorecard.fit(X_train[selected_feats], y_train)

# Results
proba_train = scorecard.predict_proba(X_train[selected_feats])[:,1]
proba_test = scorecard.predict_proba(X_test[selected_feats])[:,1]
print(f"AUC train:{round(roc_auc_score(y_train, proba_train),4)}")
print(f"AUC test :{round(roc_auc_score(y_test, proba_test),4)}\n")
print(scorecard.get_stats())

Results version 0.7.1

AUC train:0.7679
AUC test :0.7626

Features  Coef.       Std.Err    z          P>z
const     -1.24442    0.0182161  -68.3145   0
x6        -0.753524   0.0208574  -36.1274   8.4108e-286
x8        -0.277921   0.034096   -8.15111   3.60589e-16
x10       -0.31137    0.0381928  -8.1526    3.56189e-16
x18       -0.277205   0.0510278  -5.43243   5.55932e-08
x1        -0.350234   0.0484375  -7.23063   4.80774e-13
x19       -0.210958   0.0573138  -3.68075   0.000232548
x20       -0.279283   0.0665349  -4.19754   2.69828e-05
x21       -0.138776   0.0806698  -1.7203    0.0853775
x23       -0.176563   0.0754207  -2.34104   0.0192301
x22       -0.146805   0.0807149  -1.81882   0.0689397
x3        -0.0831014  0.136821   -0.607372  0.543604
x17       -0.294193   0.334516   -0.879459  0.379153
x16       0.56809     0.407881   1.39278    0.163685

Results version 1.4.0

AUC train:0.7679
AUC test :0.7626

Features  Coef.      Std.Err    z         P>z
const     -1.24575   0.0182208  -68.3695  0
x6        0.753233   0.0208504  36.1256   8.98644e-286
x8        0.277778   0.0340932  8.14762   3.71169e-16
x10       0.311173   0.0381806  8.15003   3.63845e-16
x18       0.277392   0.0510811  5.43042   5.62211e-08
x1        0.350658   0.0484606  7.23593   4.62344e-13
x19       0.211326   0.057416   3.68061   0.000232678
x20       0.279587   0.0665905  4.1986    2.68564e-05
x21       0.138937   0.0808025  1.71946   0.0855306
x23       0.176326   0.0755183  2.33488   0.0195497
x22       0.146274   0.080873   1.80869   0.0704997
x3        0.0832047  0.136923   0.607673  0.543404
x17       0.295899   0.334592   0.884356  0.376504
x16       -0.562766  0.408206   -1.37863  0.168008

Maybe this is also related to #68.
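To make the comparison concrete, here is a small sketch that checks which coefficients changed sign between the two tables. The values are copied from the two result tables in this issue; the helper name `sign_flipped` and the tolerance of 0.01 are illustrative choices, not part of skorecard.

```python
import math

# Coefficients copied from the "Results version 0.7.1" table.
coef_071 = {
    "const": -1.24442, "x6": -0.753524, "x8": -0.277921, "x10": -0.31137,
    "x18": -0.277205, "x1": -0.350234, "x19": -0.210958, "x20": -0.279283,
    "x21": -0.138776, "x23": -0.176563, "x22": -0.146805, "x3": -0.0831014,
    "x17": -0.294193, "x16": 0.56809,
}

# Coefficients copied from the "Results version 1.4.0" table.
coef_140 = {
    "const": -1.24575, "x6": 0.753233, "x8": 0.277778, "x10": 0.311173,
    "x18": 0.277392, "x1": 0.350658, "x19": 0.211326, "x20": 0.279587,
    "x21": 0.138937, "x23": 0.176326, "x22": 0.146274, "x3": 0.0832047,
    "x17": 0.295899, "x16": -0.562766,
}


def sign_flipped(old, new, tol=0.01):
    """True if the sign differs while the magnitude is nearly the same."""
    opposite_sign = math.copysign(1, old) != math.copysign(1, new)
    similar_size = abs(abs(old) - abs(new)) <= tol
    return opposite_sign and similar_size


flipped = [name for name in coef_071 if sign_flipped(coef_071[name], coef_140[name])]
unchanged = [name for name in coef_071 if name not in flipped]

print("sign flipped:", flipped)
print("same sign:   ", unchanged)
```

With these numbers, every feature coefficient flips sign with nearly the same magnitude, and only the intercept (`const`) keeps its sign.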

@timvink
Contributor

timvink commented Nov 3, 2021

Yeah, the 1.0 upgrade was massive, hence the major version upgrade (https://github.com/ing-bank/skorecard/releases), so backward compatibility is not something to strive for.

It should be consistent from now on, however. Are you having trouble upgrading?

@idellang

How were you able to test outputs between two versions?

@anilkumarpanda
Contributor Author

Shouldn't the coefficients be negative, as per the tutorial?

@anilkumarpanda
Contributor Author

How were you able to test outputs between two versions?

I tested the same model pipeline, using 2 different versions of the package.

@orchardbirds
Contributor

@anilkumarpanda

I believe the scores are correct. I've just released a new version that also includes the % Event and Non-event, so we can better understand the statistics:

[Image: Screenshot 2021-11-24 at 15 29 15]

The weight of evidence (WoE) is calculated as ln(% Goods / % Bads). In our case, the goods are the non-events. There are 2 outcomes:

  • % Goods > % Bads. This means WoE > 0
  • % Bads > % Goods. This means WoE < 0

You can see in the above image that the WoE is negative for the expected cases.
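The two outcomes above can be sketched in a few lines of Python. This is a minimal illustration of the log-ratio definition, not skorecard's actual implementation; the function name `weight_of_evidence` and all counts are hypothetical and not taken from the credit card dataset.

```python
import math


def weight_of_evidence(goods_in_bucket, bads_in_bucket, total_goods, total_bads):
    """WoE of one bucket, defined here as ln(% Goods / % Bads)."""
    pct_goods = goods_in_bucket / total_goods  # share of all goods in this bucket
    pct_bads = bads_in_bucket / total_bads     # share of all bads in this bucket
    return math.log(pct_goods / pct_bads)


# Hypothetical bucket where % Goods (0.3) > % Bads (0.1): WoE is positive.
woe_mostly_goods = weight_of_evidence(300, 50, total_goods=1000, total_bads=500)

# Hypothetical bucket where % Bads (0.4) > % Goods (0.1): WoE is negative.
woe_mostly_bads = weight_of_evidence(100, 200, total_goods=1000, total_bads=500)

print(woe_mostly_goods > 0)
print(woe_mostly_bads < 0)

# Negating both a WoE value and its coefficient leaves the contribution
# to the linear predictor unchanged (the coefficient here is illustrative).
coef = -0.75
term_original = coef * woe_mostly_bads
term_negated = (-coef) * (-woe_mostly_bads)
print(math.isclose(term_original, term_negated))
```

The last lines show why a flipped sign convention would not change predictions: if a version used ln(% Bads / % Goods) instead (an assumption, not confirmed in this thread), every WoE value and every fitted feature coefficient would change sign together, and metrics such as AUC would stay the same.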

If you agree, please go ahead and close the issue :)

@orchardbirds
Contributor

Gonna close this. @anilkumarpanda if you think this is wrong, please reopen the issue :)
