
[Bug] Incorrect implementation of conditional probability in Naive Bayes classifier #129

Open
lokesh-vr-17773 opened this issue Jul 13, 2024 · 1 comment

Comments

@lokesh-vr-17773

In the _probabilities method, the computed probabilities can exceed 1 in the following case.

Suppose there are three messages in our training dataset: one ham and two spam.
The spam messages contain 'bitcoin' multiple times; say the word 'bitcoin' appears 10 times in total across the spam messages.
In brief,

ham messages = 1
spam messages = 2
count of bitcoin tokens in spam messages = 10

then,

p_token_spam = (spam + self.k) / (self.spam_messages + 2 * self.k) # k -> smoothing factor = 0.5
p_token_spam = (10 + 0.5) / (2 + 2 * 0.5) = 10.5 / 3 = 3.5
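A minimal Python sketch reproducing the arithmetic above, assuming (as this issue does) that spam holds the total number of times the token occurs across spam messages, while self.spam_messages holds the number of spam messages. The function name p_token_spam_raw is hypothetical and only stands in for the relevant line of _probabilities.

```python
def p_token_spam_raw(token_occurrences: float, spam_messages: int, k: float = 0.5) -> float:
    """Smoothed estimate in the shape quoted in the issue:
    (count + k) / (spam_messages + 2 * k).

    Assumption: the numerator is the raw number of token occurrences,
    while the denominator is based on the number of spam messages.
    """
    return (token_occurrences + k) / (spam_messages + 2 * k)


# Numerator counts token occurrences (10), denominator counts messages (2).
p = p_token_spam_raw(token_occurrences=10, spam_messages=2, k=0.5)
print(p)  # 3.5 (not a valid probability)
```

The mismatch in units (occurrences on top, messages on the bottom) is what lets the value climb past 1.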

Since probabilities cannot go above 1, how should we interpret 3.5 in this case?
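One way to read the 3.5: the formula has the shape of a Bernoulli Naive Bayes estimate, where the numerator counts spam messages that contain the token at least once. Under that reading the numerator can never exceed the number of spam messages, so the result stays below 1 for any k > 0. A minimal sketch, assuming a tokenizer that deduplicates words within each message (an assumption; the issue does not say how spam is counted). The helper names are hypothetical, and the messages are the two spam messages used later in this thread.

```python
def p_token_spam_bernoulli(messages_with_token: int, spam_messages: int, k: float = 0.5) -> float:
    """Same smoothed formula as quoted in the issue, but the numerator
    counts spam messages containing the token, not total occurrences."""
    return (messages_with_token + k) / (spam_messages + 2 * k)


def tokenize(message: str) -> set:
    """Hypothetical tokenizer: lowercase, split on whitespace, deduplicate."""
    return set(message.lower().split())


spam_messages = [
    "bitcoin bitcoin bitcoin bitcoin bitcoin testing",
    "bitcoin bitcoin bitcoin bitcoin bitcoin testing",
]

# Each message contributes at most 1, so this is 2 rather than 10.
messages_with_bitcoin = sum("bitcoin" in tokenize(m) for m in spam_messages)
p = p_token_spam_bernoulli(messages_with_bitcoin, len(spam_messages))
print(messages_with_bitcoin)  # 2
print(p <= 1)                 # True
```

If the counts feeding the formula are raw occurrences instead, the Bernoulli interpretation no longer holds, which matches the behavior reported above.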

@lokesh-vr-17773
Author

lokesh-vr-17773 commented Jul 13, 2024

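The correct way to calculate P(token | spam) is to divide the token's count in spam messages by the total number of tokens in spam messages. For example: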

Message - 1 (spam message)

bitcoin bitcoin bitcoin bitcoin bitcoin testing

Message - 2 (spam message)

bitcoin bitcoin bitcoin bitcoin bitcoin testing

Message - 3 (ham message)

A genuine mail

In brief,
spam messages = 2
ham messages = 1

total spam tokens = 12
total ham tokens = 3

count of tokens in spam messages = { 'bitcoin': 10, 'testing': 2 }
count of tokens in ham messages = { 'a': 1, 'genuine': 1, 'mail': 1 }
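The counts above can be reproduced with collections.Counter. This is a sketch that assumes lowercase whitespace tokenization and counts every occurrence (an assumption; the repository's tokenizer may differ, for instance by deduplicating words per message). count_tokens is a hypothetical helper.

```python
from collections import Counter

spam_messages = [
    "bitcoin bitcoin bitcoin bitcoin bitcoin testing",
    "bitcoin bitcoin bitcoin bitcoin bitcoin testing",
]
ham_messages = ["A genuine mail"]


def count_tokens(messages):
    """Count every token occurrence (not just presence) across messages."""
    counts = Counter()
    for message in messages:
        counts.update(message.lower().split())
    return counts


spam_counts = count_tokens(spam_messages)
ham_counts = count_tokens(ham_messages)

print(dict(spam_counts))          # {'bitcoin': 10, 'testing': 2}
print(dict(ham_counts))           # {'a': 1, 'genuine': 1, 'mail': 1}
print(sum(spam_counts.values()))  # 12
print(sum(ham_counts.values()))   # 3
```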

P(bitcoin | spam):
=> Count of bitcoin in spam messages / total count of spam tokens
=> 10 / 12
=> 0.833
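A minimal sketch of the estimate proposed above, which is the multinomial Naive Bayes likelihood: occurrences of the token in spam divided by the total number of spam tokens. The optional additive (Laplace) smoothing term k is an addition that does not appear in this comment: with k > 0 every vocabulary word gets k pseudo-counts, so the denominator grows by k * vocab_size. The function and parameter names are hypothetical.

```python
from collections import Counter


def p_token_given_class(token: str, class_counts: Counter, vocab_size: int, k: float = 0.0) -> float:
    """Multinomial estimate of P(token | class).

    With k == 0 this is count(token) / total tokens in the class, as proposed
    in the issue. With k > 0 (additive smoothing, an assumption here), the
    estimates over the whole vocabulary still sum to 1.
    """
    total = sum(class_counts.values())
    return (class_counts[token] + k) / (total + k * vocab_size)


spam_counts = Counter({"bitcoin": 10, "testing": 2})
ham_counts = Counter({"a": 1, "genuine": 1, "mail": 1})
vocab = set(spam_counts) | set(ham_counts)  # 5 distinct words

print(round(p_token_given_class("bitcoin", spam_counts, len(vocab)), 3))  # 0.833

# With smoothing, the probabilities over the vocabulary form a distribution.
total_prob = sum(p_token_given_class(w, spam_counts, len(vocab), k=0.5) for w in vocab)
print(round(total_prob, 10))  # 1.0
```

With k = 0, a word never seen in spam (for example 'genuine') gets probability 0, which is why a smoothing term is usually kept in this form as well.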

lokesh-vr-17773 changed the title [Bug] Probabilities in naive bayes is greater than 1 [Bug] Incorrect implementation of conditional probability in Navie Bayes classifier Jul 13, 2024
lokesh-vr-17773 changed the title [Bug] Incorrect implementation of conditional probability in Navie Bayes classifier [Bug] Incorrect implementation of conditional probability in Naive Bayes classifier Jul 13, 2024
Labels: None yet
Projects: None yet
Development: No branches or pull requests
1 participant