Wikipedia’s article on Bayesian search theory describes an interesting approach to rescue missions using Bayes’ theorem. One of the mathematical results it quotes is the “update rule” by which revised probabilities are calculated. Although the equations are simple, I wanted to check them over for myself. I want to present that calculation here, along with a really neat way of remembering the Bayes equation without having to look anything up; using this ‘visualization’ method, I was able to write out the basic Bayes equation spontaneously even though I hadn’t seen it in at least a year.

First, we start with this simple Venn diagram:

Amazingly, this is all you really need to memorize. The rectangle “P” represents the entire probability space, and the circles represent conditions “A” and “B”. P(A) is the probability that condition “A” occurs and P(B) is the probability that condition “B” occurs (obviously both P(A) and P(B) are at most 1). The intersection of the two circles, which is shaded, represents the condition “both A and B occur”. The magic of Bayes theorem is to look at this situation in two equivalent ways and to deduce an equation:

We start with either A or B first – that is, either we first consider condition “A” and then ask whether condition “B” holds at the same time, OR we first consider condition “B” and then ask whether condition “A” holds at the same time. The magic of Bayes rule is that these two routes always give the same answer, because the probability that the two conditions coexist is the same regardless of which one we “consider” first. The probability along the first route is P(A)*P(B|A), and along the second route it is P(B)*P(A|B) – where P(X|Y) denotes “the probability that condition X occurs given that condition Y already holds”. So the first step in deriving Bayes rule is noting, as we did above, that these two quantities are equal: P(A)*P(B|A) = P(B)*P(A|B). Notice that the shaded area is not necessarily the same as P(A)*P(B); that would be true only if A and B were independent events, which in a Bayesian sense holds iff P(A|B) = P(A) and P(B|A) = P(B) – that is, if the ratio of the shaded area to P(B) were equal to P(A) itself (and similarly with the roles of A and B swapped).

So by the above argument we get P(A)*P(B|A) = P(B)*P(A|B), since the probability of the shaded region is the same regardless of which of the two conditions we considered first. This is actually a remarkable formula, which is seen more dramatically in the following diagram:

Here P(A) is much smaller than P(B), and we see that P(A|B) is much smaller than P(B|A). Although we may know almost nothing about the size of the intersection area, we can always know that P(A)*P(B|A) = P(B)*P(A|B). Somehow that is just awesome.
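This identity is easy to sanity-check numerically. Here is a minimal Python sketch; the three probabilities are made up purely for illustration:

```python
# Numeric check of the identity P(A)*P(B|A) = P(B)*P(A|B).
# The values below are made-up illustrative probabilities.
p_a = 0.3     # P(A)
p_b = 0.5     # P(B)
p_ab = 0.12   # P(A and B), the shaded intersection

p_b_given_a = p_ab / p_a   # P(B|A) = P(A and B) / P(A)
p_a_given_b = p_ab / p_b   # P(A|B) = P(A and B) / P(B)

lhs = p_a * p_b_given_a    # P(A)*P(B|A)
rhs = p_b * p_a_given_b    # P(B)*P(A|B)
print(lhs, rhs)            # both recover P(A and B)
```

Note that here P(A|B) = 0.24 while P(A) = 0.3, so A and B are not independent – exactly the situation the text describes.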

From the equation P(A)*P(B|A) = P(B)*P(A|B), it’s relatively easy to calculate Bayes rule. Divide both sides by P(A) to get:

P(B|A) = P(A|B)*P(B) / P(A)
You remember, of course, that the denominator in Bayes rule is usually written as a longer expression. So what should we do to P(A) to expand it? Just look back at the Venn diagram and it’s obvious that ‘A’ can happen either with ‘B’ or without ‘B’ (in the Venn diagram the shaded area is ‘A’ happening with ‘B’). Remembering that the symbol ~ means ‘not’, this reads P(A) = P(A|B)*P(B) + P(A|~B)*P(~B). Just to be clear, in English this translates to ‘the probability of condition A is equal to the probability of A occurring along with B plus the probability of A occurring without B’. (Either A happens with B or A happens without B; that covers all possibilities.) Now, finally, dividing by this quantity we get a common form of Bayes rule:

P(B|A) = P(A|B)*P(B) / [P(A|B)*P(B) + P(A|~B)*P(~B)]
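As a sanity check, this expanded form can be evaluated numerically. A minimal Python sketch, with hypothetical numbers (think of B as a rare condition and A as a noisy test for it):

```python
# Bayes rule with the expanded denominator:
#   P(B|A) = P(A|B)*P(B) / (P(A|B)*P(B) + P(A|~B)*P(~B))
def bayes(p_a_given_b, p_b, p_a_given_not_b):
    """Posterior P(B|A), expanding P(A) over the two cases 'with B' / 'without B'."""
    p_a = p_a_given_b * p_b + p_a_given_not_b * (1 - p_b)  # total probability of A
    return p_a_given_b * p_b / p_a

# Hypothetical numbers: P(A|B) = 0.99, P(B) = 0.01, P(A|~B) = 0.05
print(bayes(0.99, 0.01, 0.05))   # ≈ 0.1667
```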
Terrific! Now for the Wikipedia example, restated: if the probability of finding a missing airplane is P(found), and the probability of its being there is P(there), then we can rephrase the question “What is the likelihood of the airplane being there if it was not found?” as the probability statement “What is P(there|~found)?”. Using simple substitution from Bayes rule above we therefore have:

P(there|~found) = P(~found|there)*P(there) / [P(~found|there)*P(there) + P(~found|~there)*P(~there)]

The Wikipedia article gives probabilities p and q such that “probability p = a given area contains the wreck” and “probability q = successfully detecting the wreck if it is there” (don’t get confused by small p and capital P). In our notation, p = P(there) and q = P(found|there). It follows that 1-q = P(~found|there). Also, it’s obvious that you can’t find something that’s not there, so P(~found|~there) = 1. So by making these substitutions we get:

P(there|~found) = (1-q)*p / [(1-q)*p + 1*(1-p)] = p*(1-q) / (1 - p*q), which is what Wikipedia states; the likelihood of the plane being there has obviously dropped after the unsuccessful search. The chance that the plane is actually located somewhere else, or in other words the chance that the plane is actually NOT there, however, has increased, and is given precisely by

P(~ there | ~ found) = 1 – P(there | ~found).

Doing the simple calculation this gives us:

P(~there | ~found) = 1 - p*(1-q)/(1 - p*q) = (1-p)/(1 - p*q)

QED!
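The whole update can be played with numerically. A short Python sketch; the values of p and q are made up, since the Wikipedia article works purely in symbols:

```python
# The search-theory update derived above:
#   P(there|~found)  = p*(1-q) / (1 - p*q)
#   P(~there|~found) = (1-p)   / (1 - p*q)
def update_after_failed_search(p, q):
    """Return (P(there|~found), P(~there|~found)) given prior p and detection prob. q."""
    p_there = p * (1 - q) / (1 - p * q)
    return p_there, 1 - p_there

p, q = 0.4, 0.9   # illustrative prior and detection probability
p_there, p_not_there = update_after_failed_search(p, q)
print(p_there)      # ≈ 0.0625, down from the prior 0.4
print(p_not_there)  # ≈ 0.9375, which equals (1-p)/(1-p*q), up from 0.6
```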
