Various improvements to predictive chapters #314

Merged
merged 12 commits into main
Nov 15, 2023

Conversation

@trevorcampbell (Contributor) commented Nov 13, 2023


Hello! I've built a preview of your PR so that you can compare it to the current main branch.

  • PR deploy preview available here
  • Current main deploy preview available here
  • Public production build available here

@trevorcampbell marked this pull request as ready for review November 14, 2023 02:13
@joelostblom (Collaborator)

Nice! I think the added section is clear and will be helpful for students. One thing I would add is a comment on recall and precision compared to the un-tuned classifier, which are actually slightly different (in an undesirable way for recall). I also noticed that we made an error in our computation of recall further up, and that we never show how to use sklearn to get these numbers; we only do it manually via the confusion matrix. Details with screenshots:

  1. Mix-up of the positive label. In 6.3, we correctly define "Malignant" as our positive label, since that is what we are looking for:
    [screenshot: Section 6.3 defining "Malignant" as the positive label]

    However, in 6.5.5, we use "Benign" as the positive label when we manually compute precision and recall:
    [screenshot: Section 6.5.5 computing precision and recall manually with "Benign"]

  2. Not showing how to compute precision and recall with sklearn. In 6.5.5, we show how to compute accuracy via .score, but we never show how to compute recall and precision; instead, we compute these manually from the confusion matrix:
    [screenshot: Section 6.5.5 computing accuracy via .score]

    I think we could consider changing that paragraph to:

    The output shows that the estimated accuracy of the classifier on the test data was 88%. To compute the precision and recall, we can use the following functions from scikit-learn:
    [code cell with precision/recall computation]
    We can see that our precision was ... and our recall was ... . Finally, we can also look at the confusion matrix for the classifier using the crosstab function from pandas. The crosstab function takes two arguments: the actual labels first, then the predicted labels second. The columns and rows are ordered alphabetically, but our positive label is still "Malignant", even if it is not in the top left corner as in the general confusion matrix above.
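
    As a concrete sketch of what that code cell could contain (the names knn and X_test below are assumptions following the chapter's conventions, and the labels are toy stand-ins, not the book's data):

    ```python
    import pandas as pd
    from sklearn.metrics import precision_score, recall_score

    # Toy stand-ins for the chapter's test labels and predictions; in the
    # book these would come from the fitted classifier, e.g.
    # predictions = knn.predict(X_test).
    y_test = pd.Series(["Malignant", "Benign", "Malignant", "Benign", "Malignant"])
    predictions = pd.Series(["Malignant", "Benign", "Benign", "Benign", "Malignant"])

    # "Malignant" is the positive label in this chapter, so pass it explicitly;
    # otherwise sklearn cannot know which of the two string labels is positive.
    print(precision_score(y_test, predictions, pos_label="Malignant"))  # 1.0
    print(recall_score(y_test, predictions, pos_label="Malignant"))     # 0.666...

    # Confusion matrix via crosstab: actual labels first, predicted labels second.
    print(pd.crosstab(y_test, predictions))
    ```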

  3. Comment on precision and/or recall after tuning.

    Before tuning:
    [screenshot: precision and recall before tuning]

    After tuning:
    [screenshot: precision and recall after tuning]

    I think we can add a comment that although accuracy remains similar, our classifier now has slightly worse recall and misses more malignant samples (true positives), and briefly remind students why true positives are important in our context and that we might think more carefully about how to choose our optimal hyperparameters. Optionally, we can also show the sklearn functions to compute the scores here in addition to doing it manually; a rough sketch of tuning on recall follows below.
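
To make that last suggestion concrete, here is a rough sketch (not from the book) of tuning the number of neighbors on recall instead of accuracy. It uses scikit-learn's built-in breast cancer data, relabeled to match the chapter's "Malignant"/"Benign" labels; all names here are assumptions, not the book's code:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: sklearn's breast cancer dataset, relabeled to match the
# chapter's labels (the book loads a similar dataset from a CSV file).
X, y_num = load_breast_cancer(return_X_y=True, as_frame=True)
y = y_num.map({0: "Malignant", 1: "Benign"})
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Score candidate K values by recall on the positive ("Malignant") label
# rather than by accuracy, since missing malignant tumors is costly.
recall_scorer = make_scorer(recall_score, pos_label="Malignant")
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 31))},
    scoring=recall_scorer,
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_)
```

The exact code matters less than the scoring argument: it lets the tuning procedure optimize the metric we actually care about rather than plain accuracy.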

@trevorcampbell (Contributor, Author)

@joelostblom all great comments (and I'm not sure how I missed that benign vs. malignant issue in the equations... probably when we transposed the matrix, I forgot to adjust those glues too)

Thanks a lot -- will fix those and merge.

@trevorcampbell merged commit 2a9814e into main Nov 15, 2023
0 of 2 checks passed