Today, when we were trying to calculate P(M is Spam|F_i), whether the actual numbers of spam and ham messages matter (we used 4000 and 6000, if I remember right) depends on how the messages were selected. If we obtained a completely random sample of messages and categorised them into ham and spam ourselves, the class proportions in the sample already reflect P(M is Spam), so P(M is Spam|F_i) is simply the fraction of messages containing F_i that are spam, and the class totals never enter the calculation separately. But if we select 4000 messages known to be spam and 6000 messages known to be ham a priori, and then create the table (as seemed to be the case in our lecture), the table only gives us P(F_i|M is Spam) and P(F_i|M is Ham). In that case we would also need to be given P(M is Spam), the absolute probability of a random message being spam.
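To make the difference concrete, here is a minimal Python sketch of the two cases. The feature counts (800 and 300) and the "true" prior of 0.1 are made-up numbers for illustration only, not values from the lecture, and `posterior_spam` is just a hypothetical helper name.

```python
def posterior_spam(spam_with_f, n_spam, ham_with_f, n_ham, prior_spam):
    """Bayes' rule: P(M is Spam | F_i) from per-class counts and an explicit prior."""
    p_f_given_spam = spam_with_f / n_spam   # P(F_i | M is Spam)
    p_f_given_ham = ham_with_f / n_ham      # P(F_i | M is Ham)
    numerator = p_f_given_spam * prior_spam
    evidence = numerator + p_f_given_ham * (1.0 - prior_spam)  # P(F_i)
    return numerator / evidence


# Class sizes as recalled from the lecture; feature counts are ASSUMED.
n_spam, n_ham = 4000, 6000
spam_with_f, ham_with_f = 800, 300

# Case 1: random sample. The sample proportions are the prior (4000/10000),
# so Bayes' rule collapses to the direct ratio read off the table.
random_sample = posterior_spam(spam_with_f, n_spam, ham_with_f, n_ham,
                               prior_spam=n_spam / (n_spam + n_ham))
direct = spam_with_f / (spam_with_f + ham_with_f)

# Case 2: classes selected a priori. The 4000/6000 split says nothing about
# real mail, so P(M is Spam) must be supplied (0.1 here is an ASSUMED value).
preselected = posterior_spam(spam_with_f, n_spam, ham_with_f, n_ham,
                             prior_spam=0.1)

print(f"random sample: {random_sample:.3f} (direct ratio: {direct:.3f})")
print(f"pre-selected, prior 0.1: {preselected:.3f}")
```

With these assumed numbers the random-sample case gives about 0.727 either way, while the pre-selected case with a prior of 0.1 gives about 0.308, so the same table leads to quite different answers depending on how the messages were collected.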