
In this post, we will explore conditional probability and how it relates to Bayes’ Theorem.

Conditional Probability

Conditional probability is the likelihood of an event or outcome occurring given that another event or outcome has already occurred. Below is the equation for conditional probability.

$P(A|B) = \frac{P(A \cap B) }{P(B)}$

where:

P(A ∩ B) is the probability that event A and event B both occur.

P(B) is the probability that event B occurs.

For example:

  • Event A is a person completing their projects on time in a year. There is an 85% chance that this person will complete their work on time.
  • Event B is this person receiving a raise at the end of the year. Raises are given to only 45% of those who complete their work on time in a year.
  • P(complete work and raise) = P(raise|complete work) P(complete work) = 0.45 * 0.85 = 0.3825

It looks like working hard does pay off. Conditional probability implies there is a relationship between two events, here completing your work on time in a year and receiving a raise. In other words, a conditional probability depends on a previous result. In this case, the probability of receiving a raise depends on completing your work on time.
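The arithmetic in the example can be checked in a few lines of R. This is a minimal sketch; the variable names are illustrative and do not appear elsewhere in this post.

```r
# Probabilities taken from the example above
p_complete <- 0.85              # P(A): work completed on time
p_raise_given_complete <- 0.45  # P(B|A): raise, given work completed on time

# Multiplication rule: P(A and B) = P(B|A) * P(A)
p_complete_and_raise <- p_raise_given_complete * p_complete
p_complete_and_raise

# Dividing the joint probability by P(A) recovers P(B|A)
p_complete_and_raise / p_complete
```

The first value should be 0.3825, and dividing it by P(A) should return the 0.45 we started with, which is the conditional probability formula applied with the roles of A and B swapped.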

Let’s take a look at a survey that asked male and female students what their favorite pastimes are. Below, I prepared a function rng_survey() to generate the survey answers.

#helper function for our survey
rng_survey <- function(max, n){ 
  #max is the total number of surveys for a participant group (e.g., Male)
  #n is the number of possible answers
  total = 0
  while (total != max){
    x <- sample(1:max, n, replace = TRUE)
    total = sum(x)
  }
  return(x)
}
#male = rng_survey(150,4)
#female = rng_survey(150,4)
#cat("male answers:", male, "\n") 
#cat("female answers:", female, "\n") 

#male answers: 7 19 26 98 
#female answers: 7 69 40 34 
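The loop in rng_survey() keeps resampling until the counts happen to sum to max, which can take many tries. As a hedged alternative, the sketch below draws the counts in one step with base R's rmultinom(). The function name rng_survey_multinom is hypothetical, equal answer probabilities are an assumption, and the resulting distribution is not the same as the one produced by rng_survey() (for example, a count of zero is possible here), so it will not reproduce the numbers used in this post.

```r
# Hypothetical alternative to rng_survey(): counts that sum to `max` by construction
rng_survey_multinom <- function(max, n) {
  # rmultinom() returns an n x 1 matrix; as.vector() flattens it to a plain vector
  # Assumption: every answer is equally likely (prob = 1/n each)
  as.vector(rmultinom(1, size = max, prob = rep(1 / n, n)))
}

set.seed(1)  # arbitrary seed, only so the draw is reproducible
male_alt <- rng_survey_multinom(150, 4)
male_alt
sum(male_alt)  # equals 150 for every draw
```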

Using the function, we create a data frame with the survey answers.

#create data frame for the survey responses
df <- data.frame(gender=rep(c('Male', 'Female'), each=150),
                 sport=rep(c('Exercise', 'Cooking', 'Reading', 'Television',
                             'Exercise', 'Cooking', 'Reading', 'Television'),
                              times=c(7, 19, 26, 98, 7, 69, 40, 34 )))
df
##     gender      sport
## 1     Male   Exercise
## 2     Male   Exercise
## 3     Male   Exercise
## 4     Male   Exercise
## 5     Male   Exercise
## 6     Male   Exercise
## 7     Male   Exercise
## 8     Male    Cooking
## 9     Male    Cooking
## 10    Male    Cooking
## 11    Male    Cooking
## 12    Male    Cooking
## 13    Male    Cooking
## 14    Male    Cooking
## 15    Male    Cooking
## 16    Male    Cooking
## 17    Male    Cooking
## 18    Male    Cooking
## 19    Male    Cooking
## 20    Male    Cooking
## 21    Male    Cooking
## 22    Male    Cooking
## 23    Male    Cooking
## 24    Male    Cooking
## 25    Male    Cooking
## 26    Male    Cooking
## 27    Male    Reading
## 28    Male    Reading
## 29    Male    Reading
## 30    Male    Reading
## 31    Male    Reading
## 32    Male    Reading
## 33    Male    Reading
## 34    Male    Reading
## 35    Male    Reading
## 36    Male    Reading
## 37    Male    Reading
## 38    Male    Reading
## 39    Male    Reading
## 40    Male    Reading
## 41    Male    Reading
## 42    Male    Reading
## 43    Male    Reading
## 44    Male    Reading
## 45    Male    Reading
## 46    Male    Reading
## 47    Male    Reading
## 48    Male    Reading
## 49    Male    Reading
## 50    Male    Reading
## 51    Male    Reading
## 52    Male    Reading
## 53    Male Television
## 54    Male Television
## 55    Male Television
## 56    Male Television
## 57    Male Television
## 58    Male Television
## 59    Male Television
## 60    Male Television
## 61    Male Television
## 62    Male Television
## 63    Male Television
## 64    Male Television
## 65    Male Television
## 66    Male Television
## 67    Male Television
## 68    Male Television
## 69    Male Television
## 70    Male Television
## 71    Male Television
## 72    Male Television
## 73    Male Television
## 74    Male Television
## 75    Male Television
## 76    Male Television
## 77    Male Television
## 78    Male Television
## 79    Male Television
## 80    Male Television
## 81    Male Television
## 82    Male Television
## 83    Male Television
## 84    Male Television
## 85    Male Television
## 86    Male Television
## 87    Male Television
## 88    Male Television
## 89    Male Television
## 90    Male Television
## 91    Male Television
## 92    Male Television
## 93    Male Television
## 94    Male Television
## 95    Male Television
## 96    Male Television
## 97    Male Television
## 98    Male Television
## 99    Male Television
## 100   Male Television
## 101   Male Television
## 102   Male Television
## 103   Male Television
## 104   Male Television
## 105   Male Television
## 106   Male Television
## 107   Male Television
## 108   Male Television
## 109   Male Television
## 110   Male Television
## 111   Male Television
## 112   Male Television
## 113   Male Television
## 114   Male Television
## 115   Male Television
## 116   Male Television
## 117   Male Television
## 118   Male Television
## 119   Male Television
## 120   Male Television
## 121   Male Television
## 122   Male Television
## 123   Male Television
## 124   Male Television
## 125   Male Television
## 126   Male Television
## 127   Male Television
## 128   Male Television
## 129   Male Television
## 130   Male Television
## 131   Male Television
## 132   Male Television
## 133   Male Television
## 134   Male Television
## 135   Male Television
## 136   Male Television
## 137   Male Television
## 138   Male Television
## 139   Male Television
## 140   Male Television
## 141   Male Television
## 142   Male Television
## 143   Male Television
## 144   Male Television
## 145   Male Television
## 146   Male Television
## 147   Male Television
## 148   Male Television
## 149   Male Television
## 150   Male Television
## 151 Female   Exercise
## 152 Female   Exercise
## 153 Female   Exercise
## 154 Female   Exercise
## 155 Female   Exercise
## 156 Female   Exercise
## 157 Female   Exercise
## 158 Female    Cooking
## 159 Female    Cooking
## 160 Female    Cooking
## 161 Female    Cooking
## 162 Female    Cooking
## 163 Female    Cooking
## 164 Female    Cooking
## 165 Female    Cooking
## 166 Female    Cooking
## 167 Female    Cooking
## 168 Female    Cooking
## 169 Female    Cooking
## 170 Female    Cooking
## 171 Female    Cooking
## 172 Female    Cooking
## 173 Female    Cooking
## 174 Female    Cooking
## 175 Female    Cooking
## 176 Female    Cooking
## 177 Female    Cooking
## 178 Female    Cooking
## 179 Female    Cooking
## 180 Female    Cooking
## 181 Female    Cooking
## 182 Female    Cooking
## 183 Female    Cooking
## 184 Female    Cooking
## 185 Female    Cooking
## 186 Female    Cooking
## 187 Female    Cooking
## 188 Female    Cooking
## 189 Female    Cooking
## 190 Female    Cooking
## 191 Female    Cooking
## 192 Female    Cooking
## 193 Female    Cooking
## 194 Female    Cooking
## 195 Female    Cooking
## 196 Female    Cooking
## 197 Female    Cooking
## 198 Female    Cooking
## 199 Female    Cooking
## 200 Female    Cooking
## 201 Female    Cooking
## 202 Female    Cooking
## 203 Female    Cooking
## 204 Female    Cooking
## 205 Female    Cooking
## 206 Female    Cooking
## 207 Female    Cooking
## 208 Female    Cooking
## 209 Female    Cooking
## 210 Female    Cooking
## 211 Female    Cooking
## 212 Female    Cooking
## 213 Female    Cooking
## 214 Female    Cooking
## 215 Female    Cooking
## 216 Female    Cooking
## 217 Female    Cooking
## 218 Female    Cooking
## 219 Female    Cooking
## 220 Female    Cooking
## 221 Female    Cooking
## 222 Female    Cooking
## 223 Female    Cooking
## 224 Female    Cooking
## 225 Female    Cooking
## 226 Female    Cooking
## 227 Female    Reading
## 228 Female    Reading
## 229 Female    Reading
## 230 Female    Reading
## 231 Female    Reading
## 232 Female    Reading
## 233 Female    Reading
## 234 Female    Reading
## 235 Female    Reading
## 236 Female    Reading
## 237 Female    Reading
## 238 Female    Reading
## 239 Female    Reading
## 240 Female    Reading
## 241 Female    Reading
## 242 Female    Reading
## 243 Female    Reading
## 244 Female    Reading
## 245 Female    Reading
## 246 Female    Reading
## 247 Female    Reading
## 248 Female    Reading
## 249 Female    Reading
## 250 Female    Reading
## 251 Female    Reading
## 252 Female    Reading
## 253 Female    Reading
## 254 Female    Reading
## 255 Female    Reading
## 256 Female    Reading
## 257 Female    Reading
## 258 Female    Reading
## 259 Female    Reading
## 260 Female    Reading
## 261 Female    Reading
## 262 Female    Reading
## 263 Female    Reading
## 264 Female    Reading
## 265 Female    Reading
## 266 Female    Reading
## 267 Female Television
## 268 Female Television
## 269 Female Television
## 270 Female Television
## 271 Female Television
## 272 Female Television
## 273 Female Television
## 274 Female Television
## 275 Female Television
## 276 Female Television
## 277 Female Television
## 278 Female Television
## 279 Female Television
## 280 Female Television
## 281 Female Television
## 282 Female Television
## 283 Female Television
## 284 Female Television
## 285 Female Television
## 286 Female Television
## 287 Female Television
## 288 Female Television
## 289 Female Television
## 290 Female Television
## 291 Female Television
## 292 Female Television
## 293 Female Television
## 294 Female Television
## 295 Female Television
## 296 Female Television
## 297 Female Television
## 298 Female Television
## 299 Female Television
## 300 Female Television

We convert the data frame to a table.

#create two-way table from data frame
survey_data <- addmargins(table(df$gender, df$sport))

survey_data
##         
##          Cooking Exercise Reading Television Sum
##   Female      69        7      40         34 150
##   Male        19        7      26         98 150
##   Sum         88       14      66        132 300

We can extract information from our table by calling a row and a column. For instance, let’s ask for the number of males that prefer cooking.

survey_data['Male', 'Cooking']
## [1] 19

Now we can ask the probability of being male given that they prefer cooking. We know that the probability of being male is 0.5. We can calculate the rest from the table.

P_male = 0.5
P_cooking_male = survey_data['Male', 'Cooking'] / survey_data['Male', 'Sum'] #P(cooking | male): share of males who prefer cooking

P_male_P_cooking_male = P_male * P_cooking_male #P(male and cooking)
P_cooking = survey_data['Sum', 'Cooking'] / survey_data['Sum', 'Sum'] #P(cooking): share of all respondents who prefer cooking

P_male_cooking =  P_male_P_cooking_male / P_cooking

P_male_cooking #probability of being male given they prefer cooking
## [1] 0.2159091

Alternatively, we can use the table to easily answer the same problem.

survey_data['Male', 'Cooking'] / survey_data['Sum', 'Cooking'] 
## [1] 0.2159091

Next, we can ask the probability of being female given that they prefer reading.

survey_data['Female', 'Reading'] / survey_data['Sum', 'Reading']
## [1] 0.6060606
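Rather than dividing cell by cell, all of the "gender given pastime" probabilities can be computed at once. The sketch below rebuilds the counts from the two-way table as a matrix (the names counts and p_gender_given_pastime are illustrative) and uses base R's prop.table() with margin = 2 so that each column sums to 1.

```r
# Counts copied from the two-way table above (without the Sum margins)
counts <- matrix(c(69, 7, 40, 34,
                   19, 7, 26, 98),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(gender = c("Female", "Male"),
                                 pastime = c("Cooking", "Exercise",
                                             "Reading", "Television")))

# margin = 2 divides each cell by its column total: P(gender | pastime)
p_gender_given_pastime <- prop.table(counts, margin = 2)

p_gender_given_pastime["Male", "Cooking"]    # 19 / 88
p_gender_given_pastime["Female", "Reading"]  # 40 / 66
```

Using margin = 1 instead would divide by the row totals and give the probability of each pastime given gender.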

Bayes’ Theorem

Suppose that we have the same survey, but gender was accidentally not recorded. How could we find the probability of being male given that they prefer cooking? Let us assume that from a previous survey we know that 12.67% of males prefer to cook; that is, the probability of preferring to cook given that they are male is 0.1267. We can use something called Bayes’ Theorem,

$P(A|B) = \frac{P(B|A) P(A) }{P(B)}$.

Bayes’ Theorem describes the probability of an event based on prior knowledge of conditions that may be related to the event.

Knowing that the equation for conditional probability is

$P(A|B) = \frac{P(A \cap B) }{P(B)}$

then

P(A ∩ B) = P(A|B)P(B)

and

P(A ∩ B) = P(B|A)P(A).

Setting these two expressions for P(A ∩ B) equal to each other and dividing both sides by P(B) yields

$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$.
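As a quick numerical check of the derivation, the sketch below takes A as "male" and B as "prefers cooking" and computes P(A ∩ B) both ways from the survey counts. The variable names are illustrative only.

```r
# Counts copied from the survey table in this post
n_male_cooking <- 19
n_male <- 150
n_cooking <- 88
n_total <- 300

p_male <- n_male / n_total                        # P(A)
p_cooking <- n_cooking / n_total                  # P(B)
p_male_given_cooking <- n_male_cooking / n_cooking  # P(A|B)
p_cooking_given_male <- n_male_cooking / n_male     # P(B|A)

# Both factorizations should give the same joint probability, 19 / 300
joint_via_cooking <- p_male_given_cooking * p_cooking  # P(A|B) P(B)
joint_via_male <- p_cooking_given_male * p_male        # P(B|A) P(A)

all.equal(joint_via_cooking, joint_via_male)
all.equal(joint_via_male, n_male_cooking / n_total)
```

Both comparisons should return TRUE, which is the equality that Bayes’ Theorem rearranges.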

#view table
survey_data
##         
##          Cooking Exercise Reading Television Sum
##   Female      69        7      40         34 150
##   Male        19        7      26         98 150
##   Sum         88       14      66        132 300
P_male = 0.5
P_cooking_male = 0.1267 #P(cooking | male): probability of preferring to cook given that they are male
P_cooking = survey_data['Sum', 'Cooking'] / survey_data['Sum', 'Sum'] #P(cooking): share of all respondents who prefer cooking

P_male_cooking =  (P_cooking_male * P_male) / P_cooking

P_male_cooking #probability of being male given they prefer cooking
## [1] 0.2159659

Let’s try to solve for the reverse, the probability of preferring to cook given that they are male, using Bayes’ Theorem.

P_male_cooking * P_cooking / P_male
## [1] 0.1267

Is there a way to solve for the probability of preferring to cook given that they are female using Bayes’ Theorem?

P_female = 1 - P_male #We have a binary choice and the probabilities sum up to 1
P_female_cooking = 1 - P_male_cooking
P_female_cooking * P_cooking / P_female
## [1] 0.4599667

Looking back at the table, we can check this result directly.

survey_data['Female', 'Cooking'] / survey_data['Female', 'Sum']
## [1] 0.46
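The Bayes’ Theorem result (0.4599667) and the table result (0.46) differ slightly because 0.1267 is a rounded value of 19/150. The sketch below repeats the calculation with the unrounded fraction; the variable names are illustrative and the counts are copied from the table.

```r
p_male <- 0.5
p_cooking <- 88 / 300
p_cooking_given_male <- 19 / 150  # unrounded version of 0.1267

# Bayes' Theorem: P(male | cooking)
p_male_given_cooking <- p_cooking_given_male * p_male / p_cooking

# Gender is binary here, so P(female | cooking) is the complement
p_female_given_cooking <- 1 - p_male_given_cooking

# Bayes' Theorem again, solved for P(cooking | female)
p_cooking_given_female <- p_female_given_cooking * p_cooking / (1 - p_male)

p_male_given_cooking    # matches 19 / 88 from the table
p_cooking_given_female  # matches 69 / 150 from the table
```

With the exact fraction, both values agree with the ones read directly from the table, up to floating-point precision.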

Additional Resources