Various improvements to predictive chapters #314

Merged
merged 12 commits into main
Nov 15, 2023

Conversation

@trevorcampbell (Contributor) commented Nov 13, 2023


Hello! I've built a preview of your PR so that you can compare it to the current main branch.

  • PR deploy preview available here
  • Current main deploy preview available here
  • Public production build available here

@trevorcampbell marked this pull request as ready for review November 14, 2023 02:13
@joelostblom (Collaborator)

Nice! I think the added section is clear and will be helpful for students. One thing I would add is a comment on recall and precision compared to the un-tuned classifier, which are actually slightly different (in an undesirable way for recall). I also noticed that we made an error in our computation of recall further up, and that we never show how to use sklearn to get these numbers; we only do it manually via the confusion matrix. Details with screenshots:

  1. Mix-up of the positive label. In 6.3, we correctly define "Malignant" as our positive label, since that is what we are looking for:
    [screenshot: Section 6.3 defining "Malignant" as the positive label]

    However, in 6.5.5, we use "Benign" as the positive label when we manually compute precision and recall:
    [screenshot: Section 6.5.5 computing precision and recall manually with "Benign"]

  2. Not showing how to compute precision and recall with sklearn. In 6.5.5, we show how to compute accuracy via .score, but we never show how to compute recall and precision; instead, we compute these manually from the confusion matrix:
    [screenshot: Section 6.5.5 computing accuracy via .score]

    I think we could consider changing that paragraph to:

    The output shows that the estimated accuracy of the classifier on the test data was 88%. To compute the precision and recall, we can use the following functions from scikit-learn:
    [code cell with precision/recall computation]
    We can see that our precision was ... and our recall was ... . Finally, we can also look at the confusion matrix for the classifier using the crosstab function from pandas. The crosstab function takes two arguments: the actual labels first, then the predicted labels second. The columns and rows are ordered alphabetically, but our positive label is still "Malignant", even if it is not in the top left corner as in the general confusion matrix above.
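
    As a concrete sketch of what that code cell could contain (the names knn and X_test below are assumptions following the chapter's conventions, and the labels are toy stand-ins, not the book's data):

    ```python
    import pandas as pd
    from sklearn.metrics import precision_score, recall_score

    # Toy stand-ins for the chapter's test labels and predictions; in the
    # book these would come from the fitted classifier, e.g.
    # predictions = knn.predict(X_test).
    y_test = pd.Series(["Malignant", "Benign", "Malignant", "Benign", "Malignant"])
    predictions = pd.Series(["Malignant", "Benign", "Benign", "Benign", "Malignant"])

    # "Malignant" is the positive label in this chapter, so pass it explicitly;
    # otherwise sklearn cannot know which of the two string labels is positive.
    print(precision_score(y_test, predictions, pos_label="Malignant"))  # 1.0
    print(recall_score(y_test, predictions, pos_label="Malignant"))     # 0.666...

    # Confusion matrix via crosstab: actual labels first, predicted labels second.
    print(pd.crosstab(y_test, predictions))
    ```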

  3. Comment on precision and/or recall after tuning.

    Before tuning:
    [screenshot: precision and recall before tuning]

    After tuning:
    [screenshot: precision and recall after tuning]

    I think we can add a comment that although accuracy remains similar, our classifier now has slightly worse recall and misses more malignant samples (true positives), and briefly remind students why true positives are important in our context and that we might think more carefully about how to choose our optimal hyperparameters. Optionally, we can also show the sklearn functions to compute the scores here in addition to doing it manually; a rough sketch of tuning on recall follows below.
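
To make that last suggestion concrete, here is a rough sketch (not from the book) of tuning the number of neighbors on recall instead of accuracy. It uses scikit-learn's built-in breast cancer data, relabeled to match the chapter's "Malignant"/"Benign" labels; all names here are assumptions, not the book's code:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import make_scorer, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Stand-in data: sklearn's breast cancer dataset, relabeled to match the
# chapter's labels (the book loads a similar dataset from a CSV file).
X, y_num = load_breast_cancer(return_X_y=True, as_frame=True)
y = y_num.map({0: "Malignant", 1: "Benign"})
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

# Score candidate K values by recall on the positive ("Malignant") label
# rather than by accuracy, since missing malignant tumors is costly.
recall_scorer = make_scorer(recall_score, pos_label="Malignant")
grid = GridSearchCV(
    KNeighborsClassifier(),
    param_grid={"n_neighbors": list(range(1, 31))},
    scoring=recall_scorer,
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_)
```

The exact code matters less than the scoring argument: it lets the tuning procedure optimize the metric we actually care about rather than plain accuracy.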

@trevorcampbell (Contributor, Author)

@joelostblom all great comments (and I'm not sure how I missed that benign vs. malignant issue in the equations... probably when we transposed the matrix, I forgot to adjust those glues too)

Thanks a lot -- will fix those and merge.

@trevorcampbell merged commit 2a9814e into main Nov 15, 2023
0 of 2 checks passed