Blog: Bayes’ Theorem 101
The Opex 101 series is an introduction to the tools, trends, and techniques that will help you make the most of your organization’s data. Intended for business leaders and practitioners alike, these posts can help guide your analytics journey by demystifying essential topics in data science and operations research.
It’s 9 a.m. on Monday, and you receive an email from your boss. You notice that it seems a little different from her usual notes: the message contains several grammatical errors, and ends by asking you to provide your social security number. Though you first assumed it was a legitimate email, the grammar mistakes and suspicious request convince you to send it right to the spam folder.
When making that quick decision to ignore the email from your “boss,” you unconsciously estimated several different probabilities. First, you judged the likelihood of a work email’s legitimacy to be fairly high. But then you assessed the probability that such a weird email could come from your boss to be low. You also have some general sense that phishing emails tend to be weird in a few specific ways, and you know that phishing scams are common enough that this particular email could plausibly be harmful.
With all this information swirling around in your head, you decide that the email is most likely spam. That’s pretty much all conditional probability is: determining the probability of an event given the other information that you know.
You may not realize it, but you unconsciously use conditional probability every single day. Though this spam example was far from a formal proof, you can actually represent all the beliefs you had about this email mathematically.
The notation for conditional probability is 𝗣(A|B), which means “the probability of A given B,” where A and B are events (i.e., things or circumstances that could happen). Conditional probability is therefore the probability of some event happening given some other condition (aka event).
The figures below are visual representations of the major concepts at play here.
To calculate the conditional probability of A given B, you must know two other relevant numbers: the probability of A and B both happening, 𝗣(A∩B), and the probability of B happening, 𝗣(B). It can be written as follows:
𝗣(A|B) = 𝗣(A∩B) / 𝗣(B)
Check out the figures below for an illustration of this formula.
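The same formula can be sketched in a few lines of Python. The inbox counts here are invented purely for illustration, and `conditional_probability` is a hypothetical helper, not part of any library.

```python
def conditional_probability(p_a_and_b: float, p_b: float) -> float:
    """Return P(A|B) = P(A and B) / P(B)."""
    if p_b <= 0:
        raise ValueError("P(B) must be positive to condition on B")
    return p_a_and_b / p_b


# Hypothetical inbox (made-up numbers): 1,000 emails, 50 of them marked
# urgent (event B), and 20 that are both urgent and from your boss (A and B).
total_emails = 1000
urgent = 50
urgent_and_from_boss = 20

p_b = urgent / total_emails
p_a_and_b = urgent_and_from_boss / total_emails

p_a_given_b = conditional_probability(p_a_and_b, p_b)
print(f"P(from boss | urgent) = {p_a_given_b:.2f}")
```

Dividing by 𝗣(B) simply narrows your attention to the cases where B happened, then asks how often A also happened among them.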
However, there are multiple ways to represent this conditional probability. Bayes’ Theorem is an important mathematical tool for calculating the conditional probability of an event using the probabilities of other related events. The basic formula for Bayes’ Theorem is a slightly modified version of the previous definition of conditional probability:
𝗣(A|B) = 𝗣(A) × 𝗣(B|A) / 𝗣(B)
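One way to see where this comes from, starting from the definition 𝗣(A|B) = 𝗣(A∩B) / 𝗣(B): write the joint probability two ways, then substitute. This is a standard two-step sketch, assuming 𝗣(A) and 𝗣(B) are both greater than zero.

```latex
\begin{align*}
P(A \cap B) &= P(A \mid B)\,P(B) = P(B \mid A)\,P(A) \\
P(A \mid B) &= \frac{P(A)\,P(B \mid A)}{P(B)}
\end{align*}
```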
Let’s try an example. Pretend you could assign actual probabilities to the different email-related events that you unconsciously estimated earlier. The three probabilities you’re interested in are:
- The probability that any email is a phishing scam, 𝗣(phish)
- The probability that an email is “weird” given that it is a phishing attempt, 𝗣(weird | phish)
- The total probability that an email is “weird”, 𝗣(weird)
(“Weird”, in this case, is shorthand for “resembles spam” — such an email might tell you to click a suspicious link, have “CONGRATULATIONS” in all caps, or strongly encourage you to act now lest something bad happen.)
How could we calculate the conditional probability that the email you received is a phishing scam given that it’s weird, using just these values?
Our Bayes’ Theorem formulation would be:
𝗣(phish | weird) = 𝗣(phish) × 𝗣(weird | phish) / 𝗣(weird)
Let’s start to solve this equation by assuming that 1 in every 100 emails is a phishing attempt, and that 80% of all phishing emails are clearly “weird.” Therefore:
𝗣(phish) = 0.01; 𝗣(weird | phish) = 0.8
𝗣(phish | weird) = 0.01 × 0.8 / 𝗣(weird)
Though the denominator is unresolved, we can calculate it using the Law of Total Probability, which says that the total probability of an event is equal to the sum of its joint probabilities with each member of an exhaustive set of mutually exclusive events.
In this example, we can use this law to frame the total probability that an email is “weird” as the sum of (a) the probability that an email is weird and a phishing scam, and (b) the probability that an email is weird and not a phishing scam. This approach works because each email must be either a phishing scam or legitimate: it cannot be both, and it cannot be neither.
The formula for the total probability that an email is “weird” is:
𝗣(weird) = 𝗣(weird ∩ phish) + 𝗣(weird ∩ not phish)
The first of those two pieces can be found by rearranging an earlier conditional probability statement:
𝗣(weird ∩ phish) = 𝗣(weird | phish) × 𝗣(phish) = 0.8 × 0.01 = 0.008
For the second, let’s say that one of every 1,000 real emails is “weird,” so 𝗣(weird | not phish) = 0.001. We also established that 𝗣(phish) = 0.01, meaning 𝗣(not phish) = 0.99. Therefore:
𝗣(weird ∩ not phish) = 𝗣(weird | not phish) × 𝗣(not phish)
𝗣(weird ∩ not phish) = 0.001 × 0.99 = 0.00099
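The two pieces can be added up in a short Python sketch. The helper name `total_probability` is hypothetical, and the inputs are the assumed values from this example (1 in 100 emails is phishing, 80% of phishing emails are weird, 1 in 1,000 legitimate emails is weird).

```python
def total_probability(conditionals, weights):
    """Law of Total Probability: sum of P(E | H_i) * P(H_i) over a partition."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("partition probabilities must sum to 1")
    return sum(c * w for c, w in zip(conditionals, weights))


# Assumed inputs from the worked example.
p_phish = 0.01
p_weird_given_phish = 0.8
p_weird_given_not_phish = 0.001

# The partition is {phish, not phish}: every email is exactly one of the two.
p_weird = total_probability(
    [p_weird_given_phish, p_weird_given_not_phish],
    [p_phish, 1 - p_phish],
)
print(f"P(weird) = {p_weird:.5f}")  # 0.008 + 0.00099 = 0.00899
```

The check that the weights sum to 1 is a small guard against forgetting a case, which is the most common way to misapply this law.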
So, using both the Law of Total Probability and Bayes’ Theorem, we have:
𝗣(phish | weird) = 𝗣(phish) × 𝗣(weird | phish) / 𝗣(weird)
𝗣(phish | weird) = 0.01 × 0.8 / [𝗣(weird ∩ phish) + 𝗣(weird ∩ not phish)]
𝗣(phish | weird) = 0.01 × 0.8 / [0.008 + 0.00099] = 0.008 / 0.00899 ≈ 0.8899
If your input probabilities are correct, you can be about 89% confident that the email you received was, in fact, a phishing scam.
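If the fractions feel abstract, the same result can be reached by counting emails. The sketch below imagines an inbox of 100,000 messages (a hypothetical size, chosen only so that every count is a whole number) and applies the assumed rates from this example.

```python
from fractions import Fraction

total = 100_000  # hypothetical inbox size, chosen so all counts are integers

phish = total // 100             # 1 in 100 emails is phishing       -> 1,000
legit = total - phish            # everything else is legitimate     -> 99,000
weird_phish = phish * 8 // 10    # 80% of phishing emails are weird  -> 800
weird_legit = legit // 1000      # 1 in 1,000 legit emails is weird  -> 99

# Among all weird emails, what share are phishing?
posterior = Fraction(weird_phish, weird_phish + weird_legit)
print(f"P(phish | weird) = {posterior} = {float(posterior):.2f}")
```

Of the 899 weird emails in this imagined inbox, 800 are phishing attempts, which is roughly 89%. Phishing emails are rare overall, but they make up most of the weird ones.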
Whether you know it or not, you routinely use conditional probability in everyday life. Bayes’ Theorem is a useful tool to help you navigate such situations — especially when the required judgments aren’t as immediate or intuitive. What are the chances my customer will want to buy a beach towel given that they put a swimsuit in their cart? How likely is it to rain if I see dark clouds in the sky?
Whether creating targeted advertising, estimating your daily commute time, or spotting a spam email, Bayes’ Theorem gives you a principled way to weigh the evidence you have and make the best decision.
If there’s a topic you’d like us to cover as part of Opex 101, let us know in the comments below!