Tangram crashes on failure to train binary classification model #81

Open
isabella opened this issue Dec 30, 2021 · 1 comment
Labels
bug Something isn't working

Comments

@isabella
Contributor

When training a binary classification model, the CLI crashes with the following output:

error: panicked at 'called `Option::unwrap()` on a `None` value', crates/core/train.rs:1729:22
   0: backtrace::capture::Backtrace::new
   1: tangram::train::train::{{closure}}
   2: std::panicking::rust_panic_with_hook
   3: std::panicking::begin_panic_handler::{{closure}}
   4: std::sys_common::backtrace::__rust_end_short_backtrace
   5: _rust_begin_unwind
   6: core::panicking::panic_fmt
   7: core::panicking::panic
   8: tangram_core::train::Trainer::test_and_assemble_model
   9: tangram::main
  10: std::sys_common::backtrace::__rust_begin_short_backtrace
  11: _main

The failing unwrap is here: https://github.com/tangramdotdev/tangram/blob/2e51ef1ae3c7ec1e65b9232945d5cfb6d99d52ef/crates/core/train.rs#L1882
We handled this in the regression case here: https://github.com/tangramdotdev/tangram/blob/2e51ef1ae3c7ec1e65b9232945d5cfb6d99d52ef/crates/core/train.rs#L1857
by removing the unwrap and outputting a friendly error message. We need to do the same for binary classification and multiclass classification.
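The code behind the linked unwrap is not reproduced in this issue, so the following is only a sketch under one assumption: that the `None` being unwrapped comes from picking the best model when every model's comparison metric is NaN. It illustrates the pattern described above (replace the unwrap with a friendly error). `ModelResult` and `choose_best_model` are hypothetical names, not tangram's API.

```rust
/// Hypothetical stand-in for the result of training one model; these names
/// are illustrative and are not tangram's actual types.
#[derive(Debug)]
struct ModelResult {
    model_number: usize,
    auc_roc: f32,
}

/// Choose the model with the highest AUC ROC. Instead of unwrapping, return a
/// friendly error when no model produced a valid (non-NaN) comparison metric.
fn choose_best_model(results: &[ModelResult]) -> Result<&ModelResult, String> {
    results
        .iter()
        // Skip models whose metric is NaN so they can never be chosen.
        .filter(|r| !r.auc_roc.is_nan())
        // total_cmp gives a total order on f32, so no unwrap is needed here either.
        .max_by(|a, b| a.auc_roc.total_cmp(&b.auc_roc))
        .ok_or_else(|| {
            "Failed to train a binary classification model: every model's AUC ROC was NaN. \
             Make sure the comparison dataset contains both positive and negative examples."
                .to_string()
        })
}

fn main() {
    // Mirrors the log in this issue: all eight models report an AUC ROC of NaN.
    let results: Vec<ModelResult> = (1..=8)
        .map(|model_number| ModelResult { model_number, auc_roc: f32::NAN })
        .collect();
    match choose_best_model(&results) {
        Ok(best) => println!("best model: {}", best.model_number),
        Err(message) => eprintln!("error: {}", message),
    }
}
```

Returning a `Result` lets the CLI print the message and exit cleanly instead of panicking with a backtrace.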

@isabella isabella added the bug Something isn't working label Dec 30, 2021
@isabella
Contributor Author

isabella commented Dec 30, 2021

Training output on the current release:

✅ Inferring train table columns. 0ms
✅ Loading train table. 0ms
✅ Shuffling. 0ms
warning: The train dataset is very small. It has only 7 row(s).
warning: The comparison dataset is very small. It has only 1 row(s).
warning: The test dataset is very small. It has only 2 row(s).
✅ Computing train stats. 0ms
✅ Computing test stats. 0ms
✅ Finalizing stats. 0ms
✅ Computing baseline metrics. 0ms
info: Press ctrl-c to stop early and save the best model trained so far.
✅ Computing features. 0ms
✅ Training model 1 of 8: Linear. 0s 2ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 1 AUC ROC: NaN
✅ Computing features. 0ms
✅ Training model 2 of 8: Linear. 0s 2ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 2 AUC ROC: NaN
✅ Computing features. 0ms
✅ Training model 3 of 8: Linear. 0s 2ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 3 AUC ROC: NaN
✅ Computing features. 0ms
✅ Training model 4 of 8: Linear. 0s 2ms
✅ Computing model comparison features. 0s 7ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 4 AUC ROC: NaN
✅ Computing features. 0ms
✅ Preparing model 5 of 8: Tree. 0ms
✅ Training model 5 of 8. 0s 88ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 5 AUC ROC: NaN
✅ Computing features. 0ms
✅ Preparing model 6 of 8: Tree. 0ms
✅ Training model 6 of 8. 0s 95ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 6 AUC ROC: NaN
✅ Computing features. 0ms
✅ Preparing model 7 of 8: Tree. 0ms
✅ Training model 7 of 8. 0s 107ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 7 AUC ROC: NaN
✅ Computing features. 0ms
✅ Preparing model 8 of 8: Tree. 0ms
✅ Training model 8 of 8. 0s 127ms
✅ Computing model comparison features. 0ms
✅ Computing comparison metric. 0ms
info: 🎯 Model 8 AUC ROC: NaN
error: panicked at 'called `Option::unwrap()` on a `None` value', crates/core/train.rs:1881:22
   0: backtrace::backtrace::trace
   1: backtrace::capture::Backtrace::new
   2: tangram::train::train::{{closure}}
   3: std::panicking::rust_panic_with_hook
   4: std::panicking::begin_panic_handler::{{closure}}
   5: std::sys_common::backtrace::__rust_end_short_backtrace
   6: _rust_begin_unwind
   7: core::panicking::panic_fmt
   8: core::panicking::panic
   9: tangram::train::train::{{closure}}
  10: tangram::main
  11: std::sys_common::backtrace::__rust_begin_short_backtrace
  12: _main

CSV:

is_fraud,account.state,account.credit_score,account.account_age_days,account.has_2fa_installed,transaction_stats.transaction_count_7d,transaction_stats.transaction_count_30d
Positive,Arizona,685,1547,0,9,41
Negative,Hawaii,625,861,1,11,36
Negative,Arkansas,730,958,0,0,16
Positive,Louisiana,610,1570,0,12,26
Negative,South Dakota,635,1953,0,7,30
Negative,Louisiana,710,32,0,8,22
Positive,New Mexico,645,37,1,5,40
Negative,Nevada,735,1627,0,12,51
Negative,Kentucky,650,88,1,11,23
Negative,Delaware,680,1687,0,2,39

The warning shows that the comparison dataset has only a single row, which is not enough. We need to enforce a reasonable minimum training dataset size. A valid AUC only exists if there is at least one example whose true value is positive and at least one example whose true value is negative; otherwise either the TPR or the FPR used to compute the AUC will be NaN, because its denominator is zero (0/0).
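A minimal sketch of why a one-row comparison set yields NaN, plus a hypothetical pre-training check that rejects a comparison split missing either class. The function names, the threshold, and the error text are illustrative assumptions, not tangram's implementation; the single negative label below is made up, since the issue does not say which row landed in the comparison split.

```rust
/// True positive rate and false positive rate at one threshold.
/// With no positive labels, TPR = 0/0 = NaN; with no negative labels, FPR = 0/0 = NaN.
fn roc_point(labels: &[bool], probabilities: &[f32], threshold: f32) -> (f32, f32) {
    let (mut tp, mut fneg, mut fp, mut tn) = (0u32, 0u32, 0u32, 0u32);
    for (&label, &p) in labels.iter().zip(probabilities) {
        let predicted_positive = p >= threshold;
        match (label, predicted_positive) {
            (true, true) => tp += 1,
            (true, false) => fneg += 1,
            (false, true) => fp += 1,
            (false, false) => tn += 1,
        }
    }
    let tpr = tp as f32 / (tp + fneg) as f32;
    let fpr = fp as f32 / (fp + tn) as f32;
    (tpr, fpr)
}

/// Hypothetical check run before training: refuse to continue when the
/// comparison split lacks either class, since every model's AUC ROC would be NaN.
fn check_comparison_labels(labels: &[bool]) -> Result<(), String> {
    let positives = labels.iter().filter(|&&label| label).count();
    let negatives = labels.len() - positives;
    if positives == 0 || negatives == 0 {
        return Err(format!(
            "The comparison dataset has {} positive and {} negative row(s). \
             At least one of each is required to compute AUC ROC.",
            positives, negatives
        ));
    }
    Ok(())
}

fn main() {
    // Hypothetical one-row comparison split whose only label is negative.
    let labels = [false];
    let probabilities = [0.3];
    let (tpr, fpr) = roc_point(&labels, &probabilities, 0.5);
    println!("tpr = {}, fpr = {}", tpr, fpr); // prints "tpr = NaN, fpr = 0"
    if let Err(message) = check_comparison_labels(&labels) {
        eprintln!("error: {}", message);
    }
}
```

A check like this could run right after shuffling and splitting, so the user gets an actionable message before any models are trained rather than a panic after all eight.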
