# The Birthday Problem: What are the chances?

When writing a birthday card for one of my friends, I noticed a small coincidence: this person had the same birthday as someone else I knew. Since leaving the online world and returning to a slowly-adjusting post-covid University life, I was looking forward to my first night back out being a friend’s birthday party. But what were the odds of someone else also having theirs, too? Lets take a look at the Maths…

## The “Birthday Problem”

In a group of 23 people, the same number of people on a Football pitch in a match, the chances of 2 people having the same birthday exceeds 50%. At first glance, it may be surprising to hear this, since our intuition would tell us that the number of people required is much larger.

Using Combinatorics, the probability is calculated as follows:

By letting: $P(A)$=probability of shared birthday, and $P’(A)$=probability of no shared birthday.

Starting with person 1, the probability of a birthday is 1, or $\dfrac{365}{365}$.

Person 2 has a probability of not having the same birthday as person 1 of $\dfrac{365}{365} \times \dfrac{364}{365} =0.997$ (since one of the days is already taken), a very high chance!

Person 3 has a probability of not having the same birthday as neither person 1 or 2 of $\dfrac{365}{365} \times \dfrac{364}{365} \times \dfrac{363}{365}$, since 2 days are now taken.

Repeating this for 23 people, we have:

$$P'(A) = \dfrac{365}{365} \times \dfrac{364}{365} \times \cdots \times \dfrac{343}{365} = 0.497 = 49.7\%$$

This is the probability of 2 people not having the same birthday, and since the 2 events are mutually exclusive, the probability of 2 people who do have the same birthday is:

$$P(A) = 1-0.497=0.503 = 50.3\%$$

Note: this does not include certain variables such as the probability of a group including twins, birthdays on leap years, as well as one birthday being more probable than another (In practice, September is the most popular month for birthdays).

For more information on popular birthdays, check out this interactive birthday heat map.

## A generalised version

With $n$ people and $365$ days in a year, the probability $P'(A)$ then becomes:

$$365 \times \dfrac{364}{365} \times \dfrac{363}{365} \cdots \dfrac{365-(n-1)}{365} = \dfrac{365!}{((365-n)!\times 365^{n-1})},$$

$$P'(A) = 1 – \frac{38 093 904 702 297 390 785 243 708 291 056 390 518 886 454 060 947 061}{75 091 883 268 515 350 125 426 207 425 223 147 563 269 805 908 203 125} \approx 0.503297,$$ for $n = 23$ people.

When plugged into a calculator, we receive a math error, because this is the sum it is calculating!

## More than 2 people?

With the same conditions as before, how many people are required in a group for there to be a 50% chance of 3 people sharing a birthday? Or 4 people?

Where $i$ is the number of people sharing the same birthday, and $n$ is the minimal number of people required, this forms the beginnings of the sequence:

$$\begin{array} {|r|r|r|r|} \hline \\ i & 2 & 3 & 4 \\ \hline \\ n & 23 & 88 & 187 \\ \hline \end{array}$$

This can be read as: The minimal number of people required in a group for there to be a 50% chance of 3 people sharing the same birthday is 88, and so on.

## Same birthday as you?

When first asked the question: “What are the chances?”, we normally think about whether we have been in a situation where someone has had the same birthday as us, which doesn’t happen often. In fact, using the same number of people $n=23$, the chances of someone having a specific match with your birthday is only around $6.1\%$! (The answer usually expected in the beginning!)

Comparing this to the probability above:

The reason why $p(n)$ is so much higher than $q(n)$ comes down to combinatorics. When looking at matching pairs, the total number of pairs needed to be checked is:

$$\begin{pmatrix} 23 \\ 2 \end{pmatrix} = \dfrac{23\times22}{2}=253$$

Or, 23 people, choosing 2 for a match. 253 pairs to check is much greater than just checking the birthdays of 23 different people, so a roughly 50% chance for any of the 253 pairs to be a match now seems more reasonable.

Here are some complete graphs, where the lines represent all the possible pairings that we need to check, for different values of $n$:

## The Birthday Attack

This funny coincidence of birthdays may seem harmless, however when 2 items in a set are run through an algorithm and produce the same output, this can cause a “collision”. This is a problem in the cryptographic world and is related to the “Birthday attack”.

A function $f$ is non-injective when there is an example of $x_1,x_2$ that are unique, but $f(x_1)=f(x_2)$.

Documents, passwords and many other online texts are coded using a hash function. When you login to a website, and type your password: “Password12345”, the website does not store this version of your password. Rather, it is put through a function and can be saved as something that looks like: b39c765a983c947b293aa894 – not a very fun password to remember!

This is meant to be an irreversible function. Given the hash code it is impossible to find the original plaintext. However, since there are only a limited amount of hash codes, more than one input can have the same value, making it non-injective. If such a collision is found, then we could potentially use this to create a new input (like a password for example) that has the same hash. Luckily, this cannot be done just by hand!

The problem is that the number of documents needed for a collision is fewer than what we would expect, because the number of possible pairs of documents that could have the same hash is much larger than the actual number of documents, just like the fewer number of people for a chance of having the same birthday! Producing a number of documents and checking all the pairs of their hash codes is known as a Birthday attack.

## An example applied to the real world

Say a person signs a digital document, $x_1$ that is transformed into a certain hash key. Then, this document is tweaked by someone continuously to make it look like something completely different than it was initially. If it is changed enough times, it can be rerun through the hashing algorithm and produce the same hash output, being our $x_2$ such that $f(x_1)=f(x_2)$. The person can be made to look like they have signed something different!

So how many people do you know that have the same birthday as you? I only know one person. What about the same day, and year too? Exponents can drastically change our intuition, where one probability may seem too low or much higher than originally estimated. If you do however come across matching birthdays in a small group, just think – what are the chances?

Graph of percentage clashes to 3dp: https://datagenetics.com/blog/february72019/index.html

More Maths behind the Birthday Problem: https://mathworld.wolfram.com/BirthdayProblem.html