Accuracy measures the fraction of correct predictions. Specifically, it is the number of correct predictions divided by the total number of predictions.
We can change the decision threshold, it should not be always 0.5. But, in this particular problem, the best decision cutoff, associated with the hightest accuracy (80%), was indeed 0.5.
Note that if we build a dummy model in which the decision cutoff is 1, so the algorithm predicts that no clients will churn, the accuracy would be 73%. Thus, we can see that the improvement of the original model with respect to the dummy model is not as high as we would expect.
Therefore, in this problem accuracy can not tell us how good is the model because the dataset is unbalanced, which means that there are more instances from one category than the other. This is also known as class imbalance.
Classes and methods:
np.linspace(x,y,z)
- returns a numpy array starting atx
untily
withz
evenly spaced samplesCounter(x)
- collection class that counts the number of instances that satisfy thex
conditionaccuracy_score(x, y)
- sklearn.metrics class for calculating the accuracy of a model, given a predictedx
dataset and a targety
dataset.
The entire code of this project is available in this jupyter notebook.
Add notes from the video (PRs are welcome)
The notes are written by the community. If you see an error here, please create a PR with a fix. |