Celebrate the year of the dog

A little over one year ago, this blog marked the beginning of the year of the rooster in this blog post. Now it’s time to celebrate the year of the dog.

2018 is the year of the dog

The Lunar New Year actually falls on February 16 this year. To be precise, the latest year of the dog will span from February 16, 2018 to February 4, 2019. We mark the New Year by highlighting a math problem that is related to the Chinese zodiac signs. In fact, this problem was also discussed in the blog post on the year of the rooster.

The Problem

The problem is this. There are 12 animal signs in the Chinese zodiac – Rat, Ox, Tiger, Rabbit, Dragon, Snake, Horse, Goat, Monkey, Rooster, Dog and Pig. You repeatedly ask people for their animal signs until all 12 signs have been represented. How many people do you have to ask in order to have encountered all 12 animal signs? We assume that the 12 animal signs are equally represented. This assumption makes the math more tractable.

The number of people you ask is a random quantity. Obviously the number of people you have to ask is at least 12. It is likely that you have to ask many more than 12 people. In the previous post, we discussed this problem in two different ways – through simulations and using a math formula based on the coupon collector problem. In this post we view this problem as an occupancy problem and solve it using the method of Markov chains.

The Occupancy Problem

Imagine there are 12 cells (e.g. boxes or urns). Balls are thrown into the cells one at a time at random (assuming that each ball always lands in one of the 12 cells). On average, how many balls do we need to throw so that each cell has at least one ball? In other words, how many balls do we need to throw so that there is no empty cell? This is one form of the occupancy problem. The occupancy problem being described here is a natural reformulation of the problem about asking people until all 12 animal signs are represented.

The problem can also be described as sampling with replacement. Put 12 balls labeled 1 through 12 in an urn. Draw balls at random one at a time with replacement. How many selections do we need to make so that each of the 12 numbers is drawn at least once?

Of course, there is nothing special about 12 cells. The number of cells in the occupancy problem can vary. Consider the problem of 6 cells. Balls are thrown until all 6 cells are occupied. Or an urn has 6 balls labeled 1 through 6, and balls are drawn from the urn with replacement until each of the 6 numbers has been chosen at least once. There is another way of looking at the 6-cell occupancy problem – rolling a die repeatedly until each face has appeared at least once.
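The occupancy problem lends itself to a quick simulation. The following is a minimal sketch in Python, assuming equally likely cells; the function names, the seed and the number of trials are illustrative choices and not part of the original discussion. It throws balls into the cells until none is empty and averages the number of throws over many trials.

```python
import random


def balls_until_full(cells, rng):
    """Throw balls at random until every cell is occupied; return the number of throws."""
    occupied = set()
    throws = 0
    while len(occupied) < cells:
        occupied.add(rng.randrange(cells))  # each cell is equally likely
        throws += 1
    return throws


def average_throws(cells, trials, seed=2018):
    """Estimate the average number of throws needed over many simulated trials."""
    rng = random.Random(seed)  # fixed seed (an arbitrary choice) for repeatable runs
    return sum(balls_until_full(cells, rng) for _ in range(trials)) / trials


estimate_12 = average_throws(12, 20_000)  # 12 animal signs
estimate_6 = average_throws(6, 20_000)    # rolling a die until every face appears
print(estimate_12, estimate_6)
```

The estimates vary a little with the seed and the number of trials, so they should be read as approximations only.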

Markov Chain Approach

There is more than one way to solve the occupancy problem (two ways are discussed in the previous posts).

The method to highlight here is to use Markov chains. A Markov chain is a series of random steps. Each random step is identified by a state. Each state is dependent only on the preceding state and not on the states prior to the preceding state. We use X_n to denote the state after the nth random step or at time n.

In the 12-cell occupancy problem, a state is the number of occupied cells after a ball is thrown. For example, X_0=0, which is the initial state (we assume that before throwing the first ball, the cells are empty). Then X_1=1 (after throwing one ball, one cell is occupied). After throwing the second ball, the number of occupied cells is random. For example, it could be that X_2=1 (if the second ball goes into the occupied cell holding the first ball) or it could be that X_2=2 (if the second ball goes into an empty cell). The first scenario has probability 1/12 and the second scenario has probability 11/12. Similarly, we can list out all the scenarios for X_3 (the number of occupied cells after throwing the third ball) and their probabilities. In fact, it will be much more efficient if we describe all these probabilities in a matrix.

\mathbf{P} = \bordermatrix{ & 0 & 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 \cr
  0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \cr
  1 & 0 & 1/12 & 11/12 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \cr
  2 & 0 & 0 & 2/12 & 10/12 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \cr
  3 & 0 & 0 & 0 & 3/12 & 9/12 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \cr
  4 & 0 & 0 & 0 & 0 & 4/12 & 8/12 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \cr
  5 & 0 & 0 & 0 & 0 & 0 & 5/12 & 7/12 & 0 & 0 & 0 & 0 & 0 & 0 \cr
  6 & 0 & 0 & 0 & 0 & 0 & 0 & 6/12 & 6/12 & 0 & 0 & 0 & 0 & 0 \cr
  7 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 7/12 & 5/12 & 0 & 0 & 0 & 0 \cr
  8 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 8/12 & 4/12 & 0 & 0 & 0 \cr
  9 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 9/12 & 3/12 & 0 & 0 \cr
  10 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 10/12 & 2/12 & 0 \cr
  11 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 11/12 & 1/12 \cr
  12 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \cr
}

The above matrix looks kind of huge. But there is an order behind it. The Markov chain of the 12-cell occupancy problem has 13 states (0, 1, 2, …, 12). In this case, a state refers to the number of occupied cells after a ball is thrown into the cells. So X_n is the number of occupied cells after the nth ball is thrown. For convenience, we can also think of the subscripts n as time. The type of Markov chains discussed here would be called discrete-time Markov chains.

The above matrix \mathbf{P} is called the transition probability matrix. It contains the probabilities of transitioning from one state into various states. Here’s how to read the matrix. Pick a row, say the row labeled 2. This refers to the situation of the process being at a point when 2 of the 12 cells are occupied. The next state can be 2, meaning that the next ball goes into one of the two occupied cells. Since there are two occupied cells, the probability is 2/12. The next state can be 3, meaning that the next ball goes into one of the empty cells. Since there are 10 cells not occupied, the probability is 10/12.

In general, this matrix tells us that if the process is at state i, the process will remain at state i in the next time period with probability i/12 and will transition to state i+1 with probability 1-i/12.
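The rule just described is enough to build the whole matrix programmatically. Here is a small Python sketch, assuming nested lists of exact fractions as the representation; the function name is a hypothetical choice made for illustration.

```python
from fractions import Fraction


def occupancy_transition_matrix(cells=12):
    """Return the one-step transition matrix for states 0, 1, ..., cells."""
    size = cells + 1
    P = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        P[i][i] = Fraction(i, cells)  # next ball lands in an occupied cell
        if i < cells:
            P[i][i + 1] = Fraction(cells - i, cells)  # next ball lands in an empty cell
    return P


P = occupancy_transition_matrix(12)
print(P[2][2], P[2][3])  # prints 1/6 5/6, i.e. 2/12 and 10/12 in lowest terms
```

Using exact fractions rather than floating point numbers keeps the entries in the same form as the matrix shown above, apart from reduction to lowest terms.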

The state 12 is an interesting one. When all the cells are occupied, any additional balls thrown at them will still mean that the cells are occupied. Thus when the process reaches state 12, the process remains at 12 indefinitely. A state is called an absorbing state if the process does not leave that state once that state is entered. Thus state 12 is an absorbing state in the 12-cell occupancy problem. The other states (0 to 11) are called transient states.

The probabilities in the matrix \mathbf{P} are called one-step transition probabilities because these probabilities tell us how the process transitions from a given step to the next step. These probabilities are notated by P_{ij}, signifying the probability that the state in the next period is j given that the process begins in state i. For example, P_{5,6}=7/12 and P_{10,11}=2/12. What about the two-step transition probabilities P_{ij}^2? The quantity P_{ij}^2 is the probability that the process will be in state j after two steps given that the process is currently in state i. The idea generalizes quite naturally. The quantity P_{ij}^k, a k-step transition probability, is the probability that the process will be in state j after k steps are taken given that the process is currently in state i.

To compute the k-step transition probabilities, all we need to do is raise the matrix \mathbf{P} to the kth power. A matrix calculator will be helpful. Using this online matrix calculator, we raise \mathbf{P} to the 5th power. The following shows the non-zero elements in the first row (the row for state 0).

    \displaystyle P_{0,1}^5=\frac{1}{20736}=0.00005

    \displaystyle P_{0,2}^5=\frac{165}{20736}=0.00796

    \displaystyle P_{0,3}^5=\frac{2750}{20736}=0.13262

    \displaystyle P_{0,4}^5=\frac{9900}{20736}=0.47743

    \displaystyle P_{0,5}^5=\frac{7920}{20736}=0.38194

After throwing 5 balls into 12 empty cells, the above probabilities tell us the likelihoods of the number of occupied cells. It is very unlikely to have 1 or 2 occupied cells. The most likely outcome is 4 or 5 occupied cells. To determine the likelihoods after throwing 6 balls into 12 empty cells, raise \mathbf{P} to the 6th power, and so on.
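For readers who prefer code to an online matrix calculator, the following Python sketch computes the distribution of the number of occupied cells after 5 balls, which is the row for state 0 in the 5th power of the matrix. It is only an illustration; the helper names are made up here and the matrix product is written out by hand so that no external library is assumed.

```python
from fractions import Fraction

CELLS = 12


def transition_matrix(cells):
    """One-step transition matrix for the occupancy chain (states 0..cells)."""
    size = cells + 1
    P = [[Fraction(0)] * size for _ in range(size)]
    for i in range(size):
        P[i][i] = Fraction(i, cells)
        if i < cells:
            P[i][i + 1] = Fraction(cells - i, cells)
    return P


def multiply(A, B):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def power(A, k):
    """Raise a square matrix to the kth power by repeated multiplication."""
    n = len(A)
    result = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = multiply(result, A)
    return result


# Row for state 0 of the 5th power: the distribution after 5 balls.
after_five = power(transition_matrix(CELLS), 5)[0]
for state, prob in enumerate(after_five):
    if prob:
        print(state, prob, round(float(prob), 5))
```

The fractions are printed in lowest terms, so they may look different from the ones displayed with the common denominator 20736.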

For a detailed discussion on calculation of k-step transition probabilities, see this blog post.

Solving the Occupancy Problem

We now give an indication of how to solve the 12-cell occupancy problem (or the 12 animal signs problem). Recall that state 12 is an absorbing state. Now consider the matrix obtained by removing the row and the column for state 12 from \mathbf{P}. Call this new matrix Q. It is a 12 x 12 matrix. Let I be the 12 x 12 identity matrix. Compute the matrix I-Q. Then use an online matrix calculator to find the inverse matrix of I-Q. Let (I-Q)^{-1} be the inverse. The sum of all the elements in the first row of (I-Q)^{-1} is the average number of steps that the Markov process takes to reach state 12 starting from state 0. The following is the sum of the elements in the first row of (I-Q)^{-1}.

    \displaystyle 1+\frac{12}{11}+\frac{6}{5}+\frac{4}{3}+\frac{3}{2}+\frac{12}{7}+2+\frac{12}{5}+3+4+6+12=37.2385

On average, we need to throw 37.24 balls into the 12 cells to have no empty cells. With respect to the 12 animal signs, on average, it requires sampling 37.24 people to have encountered all 12 signs.
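The same average can be checked without inverting a matrix. The row sums of (I-Q)^{-1} form the vector t that solves (I-Q)t = 1, and because I-Q has non-zero entries only on the diagonal and just above it, the system can be solved from the bottom row up. The Python sketch below takes that shortcut; it is a swapped-in technique for illustration (the function name is hypothetical), not the matrix calculator route described above.

```python
from fractions import Fraction


def expected_throws(cells):
    """Expected number of throws to fill all cells, starting with every cell empty.

    Row i of (I - Q) t = 1 reads (1 - i/cells) * (t_i - t_{i+1}) = 1,
    so t_i = cells / (cells - i) + t_{i+1}, with t_cells = 0.
    """
    t = Fraction(0)  # no further throws are needed once every cell is occupied
    for i in range(cells - 1, -1, -1):
        t = Fraction(cells, cells - i) + t
    return t


expected_12 = expected_throws(12)
print(expected_12, float(expected_12))
```

The terms added in the loop are 12, 6, 4, 3, and so on up to 1, the same terms that appear in the displayed sum, in reverse order.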

The matrix (I-Q)^{-1} is called the fundamental matrix of the Markov chain represented by the matrix \mathbf{P}. The method is detailed in this blog post. That blog post also discusses the occupancy problem in some detail.

The discussion here is a roundabout and long-winded way to solve the coupon collector problem, which is discussed in this blog post from a year ago. However, the example discussed here is an excellent introduction to Markov chains. Further information can be found in the following blog posts.

Dan Ma math blog
Daniel Ma math blogs

Dan Ma math
Daniel Ma math

Dan Ma mathematics
Daniel Ma mathematics

\copyright 2018 – Dan Ma

Using a 4000-year old clay tablet to solve math problems

This post discusses the clay tablet that is known as Plimpton 322. This tablet gives us a glimpse of the powerful mathematics practiced in ancient Babylonia almost 4000 years ago. We aim to give a sense of why Plimpton 322 is fascinating. The math in the tablet informs us of the past as well as informing us of the present. We also work some trigonometry problems using Plimpton 322.

What is Plimpton 322?

Figure 1 – Plimpton 322 (Credit: UNSW/Andrew Kelly)

The dimensions of the tablet are 8.8 cm by 12.7 cm (3.5 inches by 5 inches), about the size of a small pocket calculator. It was purchased by the New York publisher George Arthur Plimpton in 1923 from Edgar J. Banks and was donated to Columbia University upon Plimpton’s death in 1936. Since then, the tablet has been known as Plimpton 322, signifying that it is the 322nd item in Plimpton’s collection.

According to Edgar J. Banks, the tablet came from a location near the ancient city of Larsa (modern Tell Senkereh) in Southern Iraq. Eleanor Robson, an Oriental scholar at the University of Oxford, estimated that Plimpton 322 was created around 1800 BC in Babylonia, more specifically about six decades before Larsa fell to Hammurabi of Babylon in 1762 BC. Thus Plimpton 322 dates back to the Old Babylonian period in Mesopotamia about 4,000 years ago.

What is in Plimpton 322?

Plimpton 322 was written in cuneiform script. The numbers contained in the tablet are sexagesimal numbers (base 60). It was at first assumed to be just another Babylonian ledger. In the 1940s, Otto Neugebauer, a historian of ancient science at Brown University, and his assistant Abraham Sachs found that Plimpton 322 actually contains interesting mathematical content. The entries in the tablet are essentially Pythagorean triples, i.e. the integer solutions to the equation a^2+b^2=c^2.

In order to appreciate Plimpton 322, let’s look at how the contents of the tablet are structured. The front side of Plimpton 322 has 15 lines of numbers displayed in four columns. The line at the top above the numbers contains some labels. The rightmost column contains the row numbers (or line numbers) from 1 to 15. The middle two columns contain the short side s and the hypotenuse d of 15 right triangles. In other words, the second and third columns of Plimpton 322 are two sides of a right triangle such as the one shown below.

Figure 2 – A right triangle

The third column of the tablet shows d, the hypotenuse of a right triangle (or diagonal). The second column shows s, the short side of a right triangle. The long side l of a right triangle is not shown. The first column in Plimpton 322 is the square of a ratio, which can be interpreted in one of two ways, either as the square of \frac{d}{l} (diagonal over long side) or as the square of \frac{s}{l} (short side over long side). The following diagram shows the descriptions of the four columns.

Figure 3 – The structure of Plimpton 322

What is Special about Plimpton 322?

The discovery made by Otto Neugebauer and his assistant in the 1940s was an important one. The numbers in Plimpton 322 are what are now called Pythagorean triples. The tablet gives the short side and the diagonal (hypotenuse) of 15 right triangles. The long sides of the right triangles are not shown. As we will see below, the 15 right triangles have steadily decreasing slopes. The Babylonians in the Old Babylonian period knew about the Pythagorean theorem over 1,000 years before the time of Pythagoras!

Since the discovery made by Otto Neugebauer, Plimpton 322 has been a subject of extensive research by mathematicians. Obviously mathematicians are intrigued by the connection of a 4000-year-old tablet with modern mathematics. Because of the intricate mathematical interpretations they made of the tablet, many mathematicians thought highly of it. For example, some held that the author of the tablet must have been a mathematical prodigy or a professional mathematician, doing high level research in the Old Babylonian period.

However, there are opposing views. Eleanor Robson does not view Plimpton 322 as the work of a math prodigy or professional mathematician. Her view of Plimpton 322 is more mundane. She believes that Plimpton 322 was created as a teaching aid with the purpose of generating problems involving right triangles and reciprocal pairs. Links are provided below for research stating these different points of view.

Looking at Plimpton 322 in Decimal Numbers

Whether it is high level math research or merely a teaching aid, the fact that the tablet contains Pythagorean triples is fascinating from a mathematical point of view. Let’s continue to examine the tablet. We give a small demonstration that it can be used for working trigonometry problems. The numbers in the tablet are sexagesimal numbers (base 60). To make things easy for us, the following table shows the decimal conversion of Plimpton 322, taken from the Wikipedia entry on Plimpton 322.

Table 1 – Decimal Conversion of Plimpton 322

Squared Ratios   Short Side   Diagonal   Row
(1).9834028      119          169        1
(1).9491586      3367         4825       2
(1).9188021      4601         6649       3
(1).8862479      12709        18541      4
(1).8150077      65           97         5
(1).7851929      319          481        6
(1).7199837      2291         3541       7
(1).6927094      799          1249       8
(1).6426694      481          769        9
(1).5861226      4961         8161       10
(1).5625         45           75         11
(1).489417       1679         2929       12
(1).4500174      161          289        13
(1).430289       1771         3229       14
(1).3871605      56           106        15

The first column is either the square of the diagonal over the long side or the square of the short side over the long side. For example, the long side in Row 1 is 120. The square of 169/120 is 1.9834. The square of 119/120 is 0.9834. To help us work problems, we expand the table with three more columns.

Table 2 – Decimal Conversion of Plimpton 322 (Expanded)

Squared Ratios   Short Side   Diagonal   Row   Long Side   S/L       D/L
(1).9834028      119          169        1     120         0.99167   1.40832
(1).9491586      3367         4825       2     3456        0.97425   1.39612
(1).9188021      4601         6649       3     4800        0.95854   1.38521
(1).8862479      12709        18541      4     13500       0.94141   1.37341
(1).8150077      65           97         5     72          0.90278   1.34722
(1).7851929      319          481        6     360         0.88611   1.33611
(1).7199837      2291         3541       7     2700        0.84852   1.31148
(1).6927094      799          1249       8     960         0.83229   1.30104
(1).6426694      481          769        9     600         0.80167   1.28167
(1).5861226      4961         8161       10    6480        0.76559   1.25941
(1).5625         45           75         11    60          0.75      1.25
(1).489417       1679         2929       12    2400        0.69958   1.22042
(1).4500174      161          289        13    240         0.67083   1.20417
(1).430289       1771         3229       14    2700        0.65593   1.19593
(1).3871605      56           106        15    90          0.62222   1.17778

The three additional columns are the long side and the ratios of Short over Long and Diagonal over Long. The squares of these two ratios give the first column of the table (without and with the leading 1, respectively). The 6th column (S/L) is the slope of the right triangle in Figure 2. So the 15 right triangles in the table have steadily decreasing slopes. The angle between the diagonal and the long side goes from 44.76 degrees (in Row 1) to 31.89 degrees (in Row 15).

How did the creator of Plimpton 322 calculate the long side l in Table 2? For example, in Row 2, the short side is 3367 and the diagonal is 4825. The modern calculation for the long side would be the square root \sqrt{4825^2-3367^2}=\sqrt{11943936}=3456. How was 3456 obtained by the creator of Plimpton 322? It turns out that the triples (s, l, d) in the tables are of a special form. The three sides can be expressed as:

    s=p^2-q^2

    d=p^2+q^2

    l=2 \times p \times q

such that p and q are integers. The ingenuity is that the special right triangles obtained in this fashion can be used for solving trigonometric problems as demonstrated below.
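The three formulas are easy to try out. The Python sketch below generates a triple from a pair (p, q) and checks it against the Pythagorean relation. The specific pairs are not listed in this post; they are values found here by arithmetic that happen to reproduce Rows 1, 2, 5 and 15 of Table 2, and the function name is a hypothetical choice.

```python
def triple_from_pair(p, q):
    """Return (short side, long side, diagonal) from the formulas above."""
    s = p * p - q * q
    l = 2 * p * q
    d = p * p + q * q
    return s, l, d


# Row number in Table 2 -> a pair (p, q) that reproduces that row (found by hand).
pairs = {1: (12, 5), 2: (64, 27), 5: (9, 4), 15: (9, 5)}

triples = {row: triple_from_pair(p, q) for row, (p, q) in pairs.items()}
for row, (s, l, d) in triples.items():
    assert s * s + l * l == d * d  # every generated triple is a right triangle
    print(row, s, l, d)
```

For Row 2 the pair (64, 27) gives the long side 3456 directly, with no square root taken.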

Working Examples

Solve for the unknown side for each of the following two right triangles A and B.

Figure 4 – Solve for the unknown sides using Plimpton 322

Obviously the triangles are not drawn to scale. They are only meant to convey the problems. We show how to use Plimpton 322 to estimate x and y.

In triangle A, we are given the diagonal and the long side. Immediately we can compute the ratio D/L=190/145=1.310344828. This ratio is closest to the D/L ratio in Row 7 in Table 2. We use the right triangle in Row 7 as the reference triangle, i.e. the triangle in Row 7 and triangle A are (approximately) similar. The ratios of the short side to the diagonal for both triangles should be approximately identical.

    \displaystyle \frac{x}{190}=\frac{2291}{3541}

Solving for x gives 122.9285513. The modern approach would be to use the Pythagorean theorem with the help of a calculator in taking the square root. Thus the exact answer is x=\sqrt{190^2-145^2}=\sqrt{15075}=122.7802916. Of course, this calculator-based approach was not available in the Old Babylonian period.

In triangle B, the short side and the long side are given. It follows that the square of the diagonal equals the square of the short side plus the square of the long side (this is known as the Pythagorean theorem to us but the relationship was also known to Babylonian users of Plimpton 322). Compute the following ratio.

    \displaystyle \frac{56^2+79^2}{79^2}=\frac{9377}{6241}=1.502483576

The above result is identical to the square of the ratio of the diagonal over the long side. Then compare this result to the first column of Table 2 (or Table 1). The closest is the number 1.489417 in Row 12. Then use the right triangle in Row 12 as the reference triangle. Thus the two triangles are approximately similar.

    \displaystyle \frac{y}{79}=\frac{2929}{2400}

Solving for y gives 96.4129167. The answer from a modern approach would be y=\sqrt{56^2+79^2}=\sqrt{9377}=96.83491106.
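The table lookup used in the two examples can be sketched in a few lines of Python. This is only an illustration of the procedure followed in this post, using the decimal values of Table 2; the function names are hypothetical and nothing here is meant to suggest how the tablet was actually used.

```python
# (short side, long side, diagonal) for Rows 1 to 15 of Table 2.
TABLE = [
    (119, 120, 169), (3367, 3456, 4825), (4601, 4800, 6649),
    (12709, 13500, 18541), (65, 72, 97), (319, 360, 481),
    (2291, 2700, 3541), (799, 960, 1249), (481, 600, 769),
    (4961, 6480, 8161), (45, 60, 75), (1679, 2400, 2929),
    (161, 240, 289), (1771, 2700, 3229), (56, 90, 106),
]


def estimate_short_side(diagonal, long_side):
    """Triangle A: pick the row whose D/L is closest, then scale its S/D ratio."""
    target = diagonal / long_side
    index = min(range(len(TABLE)),
                key=lambda i: abs(TABLE[i][2] / TABLE[i][1] - target))
    s, l, d = TABLE[index]
    return index + 1, diagonal * s / d


def estimate_diagonal(short_side, long_side):
    """Triangle B: pick the row whose squared D/L is closest, then scale its D/L ratio."""
    target = (short_side ** 2 + long_side ** 2) / long_side ** 2
    index = min(range(len(TABLE)),
                key=lambda i: abs((TABLE[i][2] / TABLE[i][1]) ** 2 - target))
    s, l, d = TABLE[index]
    return index + 1, long_side * d / l


row_a, x = estimate_short_side(190, 145)
row_b, y = estimate_diagonal(56, 79)
print(row_a, x)
print(row_b, y)
```

The first value printed on each line is the row of the reference triangle and the second is the estimate for the unknown side.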

Why the Examples are Special

Both answers using Plimpton 322 are reasonably close to the exact answers, though the discrepancies are noticeable (0.148 for x and 0.422 for y). However, the problem solving using Plimpton 322 is indeed special. Essentially we are solving trigonometric problems without using trigonometry, i.e. without using angles or the sine and cosine functions. In the Old Babylonian period, there was no concept of angles and there certainly was no trigonometry as we know it today. Hipparchus (190-120 BC), a Greek astronomer, geographer, and mathematician, is considered to be the father of trigonometry. He lived 1,600 years after the creation of Plimpton 322!

Note that the modern answer for x is \sqrt{15075} and for y is \sqrt{9377}. These two square roots are the results of applying the Pythagorean theorem. Plimpton 322 allows taking square root without using square root! Pythagoras (570-495 BC) lived more than a thousand years after the creation of Plimpton 322. So using a cheat sheet from 1800 BC, we can solve trigonometric problems without using methods that only came a thousand or more years later.

Recent research by Daniel Mansfield and N. J. Wildberger compared the methods of using Plimpton 322 to using the well-known sine table created by the Indian astronomer-mathematician Madhava (1340–1425 AD), over 3,000 years after Plimpton 322. Their problems are similar to the examples given here. The approach using Plimpton 322 produces much more accurate answers than the approach of using the sine table of Madhava. It is amazing that an 1800 BC “trigonometric” table beats a trigonometric table that came 3,000 years later! If the Babylonians were indeed using Plimpton 322 as a trigonometric table, then it preceded Hipparchus’ table of chords by about 1,600 years.

The accuracy of using Plimpton 322 is highly dependent on the given sides of the right triangles, i.e. the approach is accurate only if the squared ratios are close to the ratios in Plimpton 322. However, Mansfield and Wildberger showed that the usefulness of Plimpton 322 can be extended using interpolation (the ancient Babylonians were big on interpolation).

The calculation demonstrated here is only scratching the surface. What we have shown is just a small demonstration of the mathematical specialness of Plimpton 322. The examples we work are in no way a suggestion that this is how the clay tablet was used in the Old Babylonian period.

The math in Plimpton 322 is wonderful and exciting. There is actually quite a bit of controversy about the tablet. What purpose did the tablet serve in its time? There are divergent views on this question alone.

According to Mansfield and Wildberger, Plimpton 322 not only replaces Hipparchus’ table of chords as the world’s oldest trigonometric table, it is the world’s only completely accurate trigonometric table. For more information, see the article by Mansfield and Wildberger. Mansfield and Wildberger believe that Plimpton 322 would have been used in engineering calculations for the construction of palaces, canals or perhaps the Hanging Gardens of Babylon.

According to Eleanor Robson, Plimpton 322 was merely a teaching aid for problems involving right triangles (article). According to Robson, ancient mathematical texts and artifacts such as Plimpton 322 must be viewed in light of their historical and cultural contexts (in addition to the mathematical). The mathematics contained in Plimpton 322 should not be examined in isolation.

Though the modern mathematical interpretations of Plimpton 322 may have no relation to its original use, it is undeniable that the mathematics in Plimpton 322 is fascinating. It makes for good material for any math teacher’s lesson plans. It ought to be reassuring to students that the math topics that they deal with were also practiced by students 4,000 years ago. The proof is in the tablet called Plimpton 322.

Reporting on Plimpton 322 is easily found on the Internet (examples: here and here). The Wikipedia entry on Plimpton 322 is a good source of information, as is the entry on Edgar J. Banks.

\copyright 2017 – Dan Ma

Should dictionary words be used in passwords?

It is commonly advised that dictionary words should not be used when forming passwords. We would like to make the case that dictionary words can be used as long as the words are randomly chosen. This post illustrates how this may be done.

We pick 5 words at random from the following dictionary.

The idea is to choose 5 pages at random. Then choose a word at random from each page. There are 1,317 pages. We calculate the Excel function =RANDBETWEEN(1, 1317) 5 times to generate the following random numbers, each of which is considered a page number in the dictionary.

    562, 1292, 397, 857, 1171

Assuming that there are around 50 words in a page, calculate the Excel function =RANDBETWEEN(1, 50) and generate the following random numbers.

    40, 8, 19, 13, 29

Thus the first random word is the 40th word in the 562nd page in the dictionary, the second word is the 8th word in the 1292nd page in the dictionary and so on. The 5 random words are:

    idiotic, wideopen, evulsion, pinhead, theodolite
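The page and word positions can also be drawn in code instead of Excel. The following Python sketch uses the standard `secrets` module, which is a different random source from the RANDBETWEEN function used above; the function name is hypothetical, and the 1,317 pages and roughly 50 words per page are the figures quoted in this post.

```python
import secrets

PAGES = 1317          # number of pages in the dictionary used in this post
WORDS_PER_PAGE = 50   # rough assumption made in this post


def random_page_and_position():
    """Pick a page number and a word position on that page, both starting at 1."""
    page = secrets.randbelow(PAGES) + 1
    position = secrets.randbelow(WORDS_PER_PAGE) + 1
    return page, position


picks = [random_page_and_position() for _ in range(5)]
print(picks)
```

Each run gives different picks. Looking up the words on the chosen pages still has to be done by hand.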

Putting these 5 words in a string produces the following password, which is 40 characters long.

    idioticwideopenevulsionpinheadtheodolite

How secure is this password? The 5 words are selected at random from a fairly large dictionary. It has 1,317 pages. Assuming 50 words per page, the dictionary would have around 65,000 words. According to the multiplication principle, there would be 65000^5=1.16 \times 10^{24} ways to choose a sequence of 5 words from this dictionary. This is roughly 1 followed by 24 zeros, which is 1 septillion. When 1 is followed by 12 zeros, the result is 1 trillion. So 1 followed by 24 zeros is the same as 1 trillion times 1 trillion.

So a brute force dictionary attack would have to cover the universe of these 1 septillion 5-word strings. To get a sense of how big 1 septillion is, try this scenario. For a computer that can check 1,000 5-word strings per second, it would take more than 30 trillion years to exhaust all the 1 septillion 5-word strings (10^{24} strings at 1,000 per second is 10^{21} seconds). Such a brute force attack may be more suitable for a parallel computing project that involves a massive number of computers than for a cyber criminal who has only a limited number of computers. Examples of parallel computing projects include the ones for searching for the largest known prime number (one example is GIMPS – Great Internet Mersenne Prime Search).
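The size of this universe and the time needed to search it can be worked out directly. The Python sketch below uses the figures quoted in this post (about 65,000 words and 1,000 guesses per second); the variable names and the 365.25-day year are illustrative assumptions.

```python
WORDS = 65_000                 # approximate dictionary size from this post
GUESSES_PER_SECOND = 1_000     # checking rate used in the scenario above
SECONDS_PER_YEAR = 365.25 * 24 * 3600

total_strings = WORDS ** 5     # multiplication principle: 5 independent choices
years_to_exhaust = total_strings / GUESSES_PER_SECOND / SECONDS_PER_YEAR

print(f"{total_strings:.3e} strings")
print(f"{years_to_exhaust:.3e} years to try them all")
```

A faster attacker shortens the time proportionally, so the checking rate is the number to change when trying other scenarios.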

The words have to be chosen at random for this approach to work. If the words are based on movie titles, sport team names, names of celebrities and other types of familiar proper nouns as well as idiomatic phrases, then the universe of the word strings would be much smaller, maybe 20,000 or 30,000. In relation to 1 septillion, 30,000 is in effect zero. The word strings from this tiny universe would be vulnerable to a brute force attack.

Of course, the security of the random 5-word strings can be further enhanced. Use more random words, for example. Another possibility is to make them case sensitive. The above 40-character string can become the following:

    iDIoTIcwideoPenevuLsIonpiNhEADthEodoLItE

Another possibility is to add numeric characters and special characters ($, *, # etc).

Of course, the password will be harder to remember if it is made case sensitive (especially if the upper case letters are chosen at random). So a possible compromise is to make the first letter of a word upper case just to satisfy the case sensitivity requirement of many systems and websites, along with throwing in some numbers and special characters. Simply add more random words for enhanced security.

In general, the approach of using a multi-word phrase should be taken with care. The 5-word string that is demonstrated above requires some effort to produce – randomly selecting pages in the dictionary and randomly selecting one word from each selected page. I actually used a function in Excel to generate the random numbers to locate the pages. Instead, I could randomly flip through the pages. For some, that may still be too much effort. The danger is that someone may get lazy and simply use familiar proper nouns like favorite movies and sport teams such as the following:

    PirateoftheCaribbeanLALakers

Instead, the following is a better alternative.

    MfmiPotCaIaadhLALf

The above string is taken from the first letters of the sentence “My favorite movie is Pirate of the Caribbean and I am a die hard LA Lakers fan” (with the abbreviation LA kept whole). It is an 18-character password that is taken from a memorable phrase. The resulting password is definitely much more secure than stringing the movie title and the basketball team name together. See here for information on the approach of using a memorable phrase or several phrases.
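The first-letter construction is mechanical enough to write down as code. The Python sketch below is one possible reading of the rule used for the example: take the first letter of each word but keep an all-capital word such as LA whole. The function name and that capitalization rule are assumptions made for illustration.

```python
def first_letter_password(phrase):
    """Join the first letter of each word, keeping all-capital words whole."""
    pieces = []
    for word in phrase.split():
        pieces.append(word if word.isupper() else word[0])
    return "".join(pieces)


phrase = ("My favorite movie is Pirate of the Caribbean "
          "and I am a die hard LA Lakers fan")
password = first_letter_password(phrase)
print(password, len(password))  # prints MfmiPotCaIaadhLALf 18
```

The same function works on any other memorable sentence, which is the point of the approach.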

\copyright 2017 – Dan Ma

Strength in numbers

It is commonplace for frequent users of the Internet to have multiple online accounts that require passwords. Advice and guidelines abound on the Internet on picking safe and strong passwords. They are usually common sense guidelines – e.g. it has to be long enough and it should not be easily guessed. Sometimes a long enough password may not be safe and secure. The word password is an 8-letter word but sits on top of just about every list of no-no passwords. On the other hand, an 8-character string such as kHgfDsAa is a reasonable choice.

Recently it was reported in Cnet.com that security researcher Troy Hunt has released a searchable tool that taps the database of previously compromised passwords (here’s the Cnet.com article and here’s the tool). There are 306 million of them. Supposedly the utility of the tool is that no one should use the passwords in this list. The thinking is that 306 million is a large number, and since it is such a large number, the search database would be a public service.

A counting exercise is sometimes needed in order to get a proper perspective on password strength. How big is 306 million? Consider the total number of 8-letter strings using only lower case letters. It would be 26^8, which is 208,827,064,576. This is 208 billion or 208,827 million, about 682 times 306 million. The universe of 8-letter passwords is 682 times bigger than this searchable database. This universe of 208 billion 8-letter passwords is only for lower case passwords. If case sensitive passwords are included, the universe would be much bigger. It can be even bigger by adding the possibility of numeric characters and special symbols. The following table gives the numbers of possible passwords for lengths of 8 to 12 (just letters, case insensitive).

Number of case insensitive passwords

Length      Expression   Total
8-letter    26^8         208,827,064,576
9-letter    26^9         5,429,503,678,976
10-letter   26^{10}      141,167,095,653,376
11-letter   26^{11}      3,670,344,486,987,776
12-letter   26^{12}      95,428,956,661,682,176
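The counts in the table are exact integer powers, so they can be reproduced with integer arithmetic. The short Python sketch below does that and also compares each count with the 306 million compromised passwords mentioned above; the variable names are illustrative.

```python
COMPROMISED = 306_000_000  # size of the searchable database cited above

# Number of letter-only, case insensitive passwords of each length.
counts = {length: 26 ** length for length in range(8, 13)}

for length, total in counts.items():
    ratio = total // COMPROMISED  # how many times bigger than the database
    print(f"{length}-letter  26^{length}  {total:,}  (about {ratio:,} times 306 million)")
```

Python integers have arbitrary precision, so the last digits of the larger counts are not rounded the way they can be in a spreadsheet.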

It is a good idea to use the searchable tool. But bear in mind that any password that is reasonably long and reasonably strong is likely not to be in this list of 306 million. Note that the number of 12-letter case insensitive (letter only) passwords is 95,428 trillion! The passwords represented in this table alone would dwarf the 306 million in this searchable database.

Here’s a peculiar way to find strong passwords. This scheme is to produce 26-letter passwords such that every letter is known and is fixed! In fact, the first letter of the password is the first letter of the English alphabet, the second letter of the password is the second letter of the English alphabet and so on. The length of the password is long but every letter is fixed. This scheme is discussed in this blog post. This universe of passwords is not as big as the ones in the above table. But it is a big enough collection of possibilities that it is all but impossible to hack without computer help. There are 67,108,864 different possibilities (over 67 million). How does this scheme work? Why is it that every letter is known but the passwords can be strong?

Curious? Think about it or go to this blog post. This particular scheme is a way to learn the concept of binomial distribution. Any one who understands this scheme understands binomial distribution.

Having to come up with multiple passwords for multiple online accounts is a fact of life in the age of the Internet. Having a good way to generate secure passwords is critical. Keeping track of all the passwords is definitely a challenge. Often times, what is overlooked is that thinking about passwords is a good way to get close to the mathematics of counting.

\copyright 2017 – Dan Ma

Tower of Hanoi

The following shows a set for the game of Tower of Hanoi.

8-disk Tower of Hanoi (Wikipedia)

The game is discussed in a companion blog. The first paragraph in that blog post:

    The tower of Hanoi is a game that works on multiple levels. It is a challenging game that test the agility and organization skills of the player. It is also a game mathematicians would love since the game is an excellent illustration of math concepts such as mathematical induction and exponential growth. It is also a concrete illustration of a recursive algorithm. To read more …

The game was also discussed previously in a post in this blog, which is more detailed in terms of the strategy for playing the game.

\copyright 2017 – Dan Ma

The probabilities of poker hands

This post works with 5-card Poker hands drawn from a standard deck of 52 cards. The discussion is mostly mathematical, using the Poker hands to illustrate counting techniques and the calculation of probabilities.

Working with poker hands is an excellent way to illustrate the counting techniques covered previously in this blog – multiplication principle, permutation and combination (also covered here). There are 2,598,960 many possible 5-card Poker hands. Thus the probability of obtaining any one specific hand is 1 in 2,598,960 (roughly 1 in 2.6 million). The probability of obtaining a given type of hands (e.g. three of a kind) is the number of possible hands for that type over 2,598,960. Thus this is primarily a counting exercise.

___________________________________________________________________________

Preliminary Calculation

Usually the order in which the cards are dealt is not important (except in the case of stud poker). Thus the following three examples point to the same poker hand. The only difference is the order in which the cards are dealt.

These are the same hand. Order is not important.

The number of possible 5-card poker hands would then be the same as the number of 5-element subsets of 52 objects. The following is the total number of 5-card poker hands drawn from a standard deck of 52 cards.

    \displaystyle \binom{52}{5}=\frac{52!}{5! \ 47!}=2,598,960

The notation \binom{n}{r} is called the binomial coefficient and is pronounced “n choose r”, which is identical to the number of r-element subsets of a set with n objects. Other notations for \binom{n}{r} are _nC_r, C_{n,r} and C(n,r). Many calculators have a function for _nC_r. Of course the calculation can also be done by definition by first calculating factorials.
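
As a quick check of the count above, here is a small sketch that computes the binomial coefficient from the factorial definition and compares it with the built-in `math.comb` (available in Python 3.8 and later). The helper name `choose` is just an illustrative choice.

```python
import math

def choose(n, r):
    """n choose r from the definition n! / (r! (n-r)!)."""
    return math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

total_hands = choose(52, 5)

# The built-in function should agree with the factorial definition.
assert total_hands == math.comb(52, 5)
print(f"{total_hands:,}")
```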

Thus the probability of obtaining a specific hand (say, 2, 6, 10, K, A, all diamond) would be 1 in 2,598,960. If 5 cards are randomly drawn, what is the probability of getting a 5-card hand consisting of all diamond cards? It is

    \displaystyle P(\text{all diamond})=\frac{1287}{2598960}=0.000495198=0.0495198 \%

This is definitely a very rare event (less than 0.05% chance of happening). The numerator 1,287 is the number of hands consisting of all diamond cards, which is obtained by the following calculation.

    \displaystyle \binom{13}{5} \times \binom{39}{0}=\frac{13!}{5! \ 8!} \times 1=1,287

The reasoning for the above calculation is that to draw a 5-card hand consisting of all diamond, we are drawing 5 cards from the 13 diamond cards and drawing zero cards from the other 39 cards. Since \binom{39}{0}=1 (there is only one way to draw nothing), \binom{13}{5}=1287 is the number of hands with all diamonds.

If 5 cards are randomly drawn, what is the probability of getting a 5-card hand consisting of cards in one suit? The probability of getting all 5 cards in another suit (say heart) would also be 1287/2598960. So we have the following derivation.

    \displaystyle \begin{aligned} P(\text{one suit})&=P(\text{all diamond})+P(\text{all heart})+P(\text{all club})+P(\text{all spade}) \\&=\frac{1287}{2598960}+\frac{1287}{2598960}+\frac{1287}{2598960}+\frac{1287}{2598960} \\&=\frac{5148}{2598960} \\&=0.001980792 \\&=0.198 \% \end{aligned}
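
The same derivation can be sketched in a few lines of Python. The variable names are chosen for this illustration only.

```python
from math import comb

total_hands = comb(52, 5)

# All five cards from the 13 diamonds, none from the other 39 cards.
all_diamond = comb(13, 5) * comb(39, 0)
p_all_diamond = all_diamond / total_hands

# The four suits are mutually exclusive cases with equal counts, so the counts add.
one_suit = 4 * all_diamond
p_one_suit = one_suit / total_hands

print(all_diamond, f"{p_all_diamond:.9f}")
print(one_suit, f"{p_one_suit:.9f}")
```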

Thus getting a hand with all cards in one suit is 4 times more likely than getting one with all diamond, but is still a rare event (with about a 0.2% chance of happening). Some of the higher ranked poker hands are in one suit but with additional strict requirements. They will be further discussed below.

Another example. What is the probability of obtaining a hand that has 3 diamonds and 2 hearts? The answer is 22308/2598960 = 0.008583433. The number of “3 diamond, 2 heart” hands is calculated as follows:

    \displaystyle \binom{13}{3} \times \binom{13}{2} \times \binom{26}{0}=\frac{13!}{3! \ 10!} \times \frac{13!}{2! \ 11!} \times 1=286 \times 78=22308

One theme that emerges is that the multiplication principle is behind the numerator of a poker hand probability. For example, we can think of the process to get a 5-card hand with 3 diamonds and 2 hearts in three steps. The first is to draw 3 cards from the 13 diamond cards, the second is to draw 2 cards from the 13 heart cards, and the third is to draw zero from the remaining 26 cards. The third step can be omitted since the number of ways of choosing zero is 1. In any case, the number of possible ways to carry out that 2-step (or 3-step) process is to multiply all the possibilities together.
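
One way to gain confidence in the multiplication principle is to compare it with a brute-force count. The sketch below is only an illustration: the card representation is an assumption of this example, and the enumeration is restricted to the 26 diamond and heart cards, since a "3 diamond, 2 heart" hand cannot contain any club or spade.

```python
from itertools import combinations
from math import comb

# Represent each card as a (suit, rank) pair; only the two relevant suits are needed.
diamonds = [("D", rank) for rank in range(13)]
hearts = [("H", rank) for rank in range(13)]

# Enumerate every 5-card subset of the 26 cards and keep those with exactly 3 diamonds.
brute_force = sum(
    1
    for hand in combinations(diamonds + hearts, 5)
    if sum(1 for suit, _ in hand if suit == "D") == 3
)

# Multiplication principle: 3 of 13 diamonds, 2 of 13 hearts, 0 of the other 26 cards.
by_formula = comb(13, 3) * comb(13, 2) * comb(26, 0)

print(brute_force, by_formula)
```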

___________________________________________________________________________

The Poker Hands

Here’s a ranking chart of the Poker hands.

The chart lists the rankings with an example for each ranking. The examples are a good reminder of the definitions. The highest ranking of them all is the royal flush, which consists of 5 consecutive cards in one suit with the highest card being Ace. There is only one such hand in each suit. Thus the chance for getting a royal flush is 4 in 2,598,960.

Royal flush is a specific example of a straight flush, which consists of 5 consecutive cards in one suit. There are 10 such hands in one suit. So there are 40 hands for straight flush in total. A flush is a hand with 5 cards in the same suit but not in consecutive order (or not in sequence). Thus the requirement for flush is considerably more relaxed than a straight flush. A straight is like a straight flush in that the 5 cards are in sequence but the 5 cards in a straight are not of the same suit. For a more in depth discussion on Poker hands, see the Wikipedia entry on Poker hands.

The counting for some of these hands is done in the next section. The definition of the hands can be inferred from the above chart. For the sake of completeness, the following table lists out the definition.


Definitions of Poker Hands

Rank Poker Hand Definition
1 Royal Flush A, K, Q, J, 10, all in the same suit
2 Straight Flush Five consecutive cards, all in the same suit
3 Four of a Kind Four cards of the same rank, one card of another rank
4 Full House Three of a kind with a pair
5 Flush Five cards of the same suit, not in consecutive order
6 Straight Five consecutive cards, not all of the same suit
7 Three of a Kind Three cards of the same rank, 2 cards of two other ranks
8 Two Pair Two cards of the same rank, two cards of another rank, one card of a third rank
9 One Pair Two cards of the same rank, 3 cards of three other ranks
10 High Card If no one has any of the above hands, the player with the highest card wins

___________________________________________________________________________

Counting Poker Hands

Straight Flush
Counting from A-K-Q-J-10, K-Q-J-10-9, Q-J-10-9-8, …, 6-5-4-3-2 to 5-4-3-2-A, there are 10 hands that are in sequence in a given suit. So there are 40 straight flush hands all together.

Four of a Kind
There is only one way to have a four of a kind for a given rank. The fifth card can be any one of the remaining 48 cards. Thus there are 48 possibilities of a four of a kind in one rank. Thus there are 13 x 48 = 624 many four of a kind in total.

Full House
Let’s fix two ranks, say 2 and 8. How many ways can we have three of 2 and two of 8? We are choosing 3 cards out of the four 2’s and choosing 2 cards out of the four 8’s. That would be \binom{4}{3} \times \binom{4}{2} = 4 x 6 = 24. But the two ranks can be other ranks too. How many ways can we pick two ranks out of 13? That would be 13 x 12 = 156. So the total number of possibilities for Full House is

    \displaystyle 13 \times 12 \times \binom{4}{3} \times \binom{4}{2}= 13 \times 12 \times 4 \times 6 =3744

Note that the multiplication principle is at work here. When we pick two ranks, the number of ways is 13 x 12 = 156. Why did we not use \binom{13}{2} = 78?

Flush
There are \binom{13}{5} = 1,287 possible hands with all cards in the same suit. Recall that there are only 10 straight flush on a given suit. Thus of all the 5-card hands with all cards in a given suit, there are 1,287-10 = 1,277 hands that are not straight flush. Thus the total number of flush hands is 4 x 1277 = 5,108.

Straight
There are 10 sequences of five consecutive ranks among the 13 ranks (as shown in the explanation for straight flush in this section). In each such sequence, there are 4 choices for each card (one for each suit). Thus the number of 5-card hands with 5 cards in sequence is 10 \times 4^5= 10240. Then we need to subtract the number of straight flushes (40) from this number. Thus the number of straights is 10240 - 40 = 10,200.

Three of a Kind
There are 13 ranks (from A, K, …, to 2). We choose one of them to have 3 cards in that rank and two other ranks to have one card in each of those ranks. The following derivation reflects all the choosing in this process.

    \displaystyle \binom{13}{1} \times \binom{4}{3} \times \binom{12}{2} \times \binom{4}{1} \times \binom{4}{1}=13 \times 4 \times 66 \times 4 \times 4 = 54912
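
The counts derived so far in this section can be collected in a short script. This is a sketch that mirrors the reasoning above step by step; the variable names are illustrative.

```python
from math import comb

# 10 runs of five consecutive ranks (A-high down to 5-high) in each of 4 suits.
straight_flush = 10 * 4

# 13 choices of rank for the four cards, 48 choices for the fifth card.
four_of_a_kind = 13 * 48

# 13 * 12 ordered choices of (rank of the triple, rank of the pair).
full_house = 13 * 12 * comb(4, 3) * comb(4, 2)

# All same-suit hands in a suit minus the 10 straight flushes, times 4 suits.
flush = 4 * (comb(13, 5) - 10)

# 10 rank sequences with 4 suit choices per card, minus the straight flushes.
straight = 10 * 4**5 - straight_flush

# Rank of the triple, its 3 suits, two other ranks, one suit for each of them.
three_of_a_kind = comb(13, 1) * comb(4, 3) * comb(12, 2) * comb(4, 1) * comb(4, 1)

for name, count in [
    ("Straight Flush", straight_flush),
    ("Four of a Kind", four_of_a_kind),
    ("Full House", full_house),
    ("Flush", flush),
    ("Straight", straight),
    ("Three of a Kind", three_of_a_kind),
]:
    print(f"{name:<16}{count:>8,}")
```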

Two Pair and One Pair
These two are left as exercises.

High Card
The count is the complement that makes up 2,598,960.

The following table gives the counts of all the poker hands. The probability is the fraction of the 2,598,960 hands that meet the requirement of the type of hand in question. Note that royal flush is not listed. This is because it is included in the count for straight flush. Royal flush is omitted so that the counts add up to 2,598,960.


Probabilities of Poker Hands

Rank Poker Hand Count Probability
2 Straight Flush 40 0.0000154
3 Four of a Kind 624 0.0002401
4 Full House 3,744 0.0014406
5 Flush 5,108 0.0019654
6 Straight 10,200 0.0039246
7 Three of a Kind 54,912 0.0211285
8 Two Pair 123,552 0.0475390
9 One Pair 1,098,240 0.4225690
10 High Card 1,302,540 0.5011774
Total 2,598,960 1.0000000
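
The probability column can be reproduced from the counts. In the sketch below, the counts for Two Pair and One Pair are simply copied from the table (their derivations are the exercises mentioned above), and the High Card count is computed as the complement.

```python
from math import comb

TOTAL = comb(52, 5)

# Counts taken from the table; Two Pair and One Pair are copied, not derived here.
counts = {
    "Straight Flush": 40,
    "Four of a Kind": 624,
    "Full House": 3_744,
    "Flush": 5_108,
    "Straight": 10_200,
    "Three of a Kind": 54_912,
    "Two Pair": 123_552,
    "One Pair": 1_098_240,
}

# High Card is whatever remains once every other type is accounted for.
counts["High Card"] = TOTAL - sum(counts.values())

for name, count in counts.items():
    print(f"{name:<16}{count:>12,}{count / TOTAL:>12.7f}")
print(f"{'Total':<16}{sum(counts.values()):>12,}{1.0:>12.7f}")
```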

___________________________________________________________________________
\copyright 2017 – Dan Ma

How to calculate winning odds in Powerball

Powerball is a lottery game available for play in 44 states in the United States and in the District of Columbia, Puerto Rico and the US Virgin Islands. It is a “mega” lottery because of the usually huge jackpot. As of the writing of this post, the estimated jackpot is $113 million (see the picture below). The largest Powerball jackpot was $1.59 billion on January 13, 2016 (this was also the largest in US lottery history). The average Powerball jackpot is $62.71 million. Recent jackpots have been in the range of hundreds of millions of dollars.

With such large jackpots, it is not surprising that Powerball is a popular lottery game, especially when the impending jackpot is in the hundreds of millions of dollars. Bear in mind that the odds of winning the jackpot are slim: 1 in 292,201,338, i.e. one in over 292 million. The first objective of this post is to demonstrate how to mathematically derive the winning odds of the jackpot and the other smaller prizes of Powerball. Calculating the odds is a great math lesson. It is equally interesting and equally important to make sense of the large numbers in the small winning odds. The second objective is to focus on the dim prospect of Powerball players as told by the mathematical odds.

Figure 1 – Powerball Results on April 26, 2017

___________________________________________________________________________

The Game of Powerball

This is how the game is played. In a Powerball playslip, a player picks 5 numbers from 1 through 69 and 1 number from 1 through 26 (this is the Powerball number). Drawings are held Wednesday and Saturday evenings at 10:59 p.m. Eastern Time. In each drawing, five balls are drawn from a drum with 69 white balls (labeled 1 through 69) and one ball is drawn from a drum with 26 red balls (labeled 1 through 26). Figure 1 above shows that the winning numbers on April 26, 2017 are 1, 15, 18, 26 and 51 (white) and 26 (red). The winnings are determined by whether the numbers in the playslip match the balls drawn at the Powerball drawing.

The grand prize is awarded to the player (or players) whose ticket matches all of the numbers on the five chosen white balls and the one chosen red ball. Smaller prizes are awarded to players whose tickets match fewer of the white balls, with or without the red ball. The following table shows the levels of prizes and the odds. These Powerball payout rules have been in place since October 7, 2015.

Figure 2 – Powerball Prizes and Odds

Source: www.powerball.com

As shown by the table, the prizes are for scenarios ranging from matching zero white balls to all 5 of the white balls (with or without the match of the red ball). Of course, matching all 6 balls would lead to the grand prize. Matching just the 5 white balls without the red ball would be the fixed prize of $1 million. Matching any 4 of the white balls plus the red ball would lead to the prize of $50,000. The rest of the prizes are for practically insignificant amounts.

Each Powerball ticket is $2. For an additional $1 per game, a player may activate the Power Play option. When this option is activated, the lower prizes that are $50,000 or less are multiplied by a factor from 1 up to 5 or 10 (10x is available when the jackpot is under $150 million). With the Power Play Option, the prize for the “5 white balls and no red ball” match is increased to $2 million. That is, the “5 + 0” prize is doubled under the Power Play Option. For further information on Powerball, see the Wikipedia entry.

The odds represented in the above table are essentially winning probabilities for a Powerball player (per $2 bet). The odds for the grand prize are 1 in 292,201,338. So the probability of winning the grand prize with one ticket is 1 / 292,201,338, which of course is essentially zero. This probability amounts to the following statement: out of 292,201,338 different possible Powerball combinations, only one is the winning combination.

The odds for the $1 million prize are 1 in 11,688,053.52, about 1 in 11.7 million (still very slim odds). So the probability of winning $1 million with one ticket is 1 / 11,688,053.52, which is also essentially zero. Note that 1 / 11,688,053.52 is the same as 25 / 292,201,338. This last probability is the statement: out of 292,201,338 different possible Powerball combinations, only 25 of them are winning (i.e. matching all 5 white balls without matching the red Powerball number).

Thus calculating the odds for Powerball (or any other lottery game for that matter) is about correctly counting the number of possible lottery ticket combinations (the denominator) and counting the number of winning tickets (the numerator). So this is a counting exercise. It belongs to the area of mathematics called combinatorics, which at the elementary level is concerned with counting the number of ways objects can be chosen from a set (a set of balls in the case of a lottery). Several previous posts of this blog are devoted to this subject. We will reference them when necessary. However, we strive to make the discussion here as self-contained as possible.

___________________________________________________________________________

Using a Toy Lottery

As indicated above, there are 292,201,338 different possible Powerball tickets. Out of these, there are 25 possible winning tickets for the $1 million prize. Of the 292,201,338 possible tickets, there are 320 possible winning tickets for the $50,000 prize. Instead of attacking these problems head on, we use a “toy” lottery to illustrate the idea. Any reader who feels that such an introduction is not needed can skip to the next section.

Based on the last paragraph in the preceding section, the winning probability is the number of possible ways to win over the number of possible “tickets.”

    Probability of winning = \displaystyle \frac{\text{the number of winning numbers}}{\text{the total number of possible lottery numbers}}

For example, if 40 tickets are sold in a raffle and only one of them is the winner, then the odds of winning would be 1 in 40. The probability of winning would be 1 / 40 = 0.025 (2.5% chance). If two of the 40 tickets are winners, then the odds are 1 in 20. The probability of winning would then be 2 / 40 = 0.05 (5% chance). Though the Powerball calculation is based on this simple idea, there are subtle points that need to be addressed in order to fully understand the calculation. So we take the approach of using a toy lottery.

Let’s call the game “toy Powerball.” The player picks two numbers from 5 white balls (labeled 1 through 5) and picks one number from 4 red balls (labeled 1 through 4). The prizes are:

  • grand prize (match all 2 white balls and 1 red ball),
  • $100 prize (match 2 white balls and not matching the red ball),
  • $10 prize (match 1 white ball and the red ball).

How many different possible tickets are there? Since there are two drawings (one for the white ball and one for the red ball), we need to count them separately and multiply the results. Since there are only 5 white balls, we can actually count the number of ways to choose 2 balls out of 5 white balls.

    1, 2
    1, 3
    1, 4
    1, 5
    2, 3
    2, 4
    2, 5
    3, 4
    3, 5
    4, 5

There are obviously 4 ways to choose one ball out of 4 red balls. So the total number of different possible tickets is 10 x 4 = 40. This is based on the so-called multiplication principle: if one event can occur in M ways and another event can occur, independent of the first event, in N ways, then the two events together can occur in M \times N ways. The multiplication principle, discussed here, will be a great help in calculating the odds. In this example, the first event is the choosing of the white balls and the second event is the choosing of the red ball. This idea can be used to find the number of possibilities in both the denominator and the numerator.
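
The toy lottery is small enough to list every ticket by machine. The following sketch uses `itertools` to generate the white pairs, the red choices and all tickets; the variable names are illustrative.

```python
from itertools import combinations, product

# Every way to pick 2 white balls out of the balls labeled 1 through 5.
white_pairs = list(combinations(range(1, 6), 2))

# Every way to pick 1 red ball out of the balls labeled 1 through 4.
red_choices = list(range(1, 5))

# A ticket is one white pair together with one red number.
tickets = list(product(white_pairs, red_choices))

print(len(white_pairs), len(red_choices), len(tickets))
```

The length of `tickets` equals the product of the other two lengths, which is the multiplication principle at work.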

There is only one possible winning ticket for the grand prize (of course, if two people pick the same winning combination, they will split the prize). So the odds for winning the grand prize is 1 in 40. The probability of winning the grand prize is 1 / 40 = 0.025 (2.5%).

Is there a way to calculate the number of ways to choose two balls out of 5 white balls (labeled 1 through 5) without writing down all possibilities? This will be key to the Powerball calculation. There are two ways. One is an intuitive approach using the multiplication principle and the other is to use a formula for combination.

The first idea. Choosing two numbers is like filling two spots with numbers. In this case, the first spot has 5 choices and the second spot has 4 choices after the first spot is filled. This gives a total of 5 x 4 = 20. But 20 is an over count. For example, the result 3-2 (first number is 3 and the second number is 2) is actually the same as 2-3 as far as the lottery is concerned. So each 2-number combination appears twice in the count of 20. Thus dividing by 2 gives 20 / 2 = 10.

The other way is via a formula for combination. In the toy Powerball example, we need to choose 2 balls out of 5 balls. The following is the calculation.

    \displaystyle \frac{5!}{2! \ 3!}=\frac{120}{2  \times 6}=10

Let’s unpack this calculation. The notation 5!, read 5 factorial, is 5 x 4 x 3 x 2 x 1 = 120. In other words, it is obtained by multiplying together 5 and all the positive integers below 5. Likewise 6! = 6 x 5 x 4 x 3 x 2 x 1 = 6 x 120 = 720. In general, whenever n is a positive integer, n!, read n factorial, is obtained by multiplying n and all the positive integers below n. To make the calculations work out, we define 0! = 1.

Is there any natural interpretation of factorial? For example, 5! is the number of ways to arrange 5 people in a row for a group photo. Five people are to be assigned to 5 spots. There are 5 possibilities for the first spot, 4 possibilities for the second spot after the first person is chosen and so on. By the multiplication principle, the total number of ways to do this would be 5 x 4 x 3 x 2 x 1. In general n! is the number of ordered arrangements of n objects.

Thus the number of ways to choose 2 balls out of 5 balls is 10. This is also the number of ways to choose two people out of 5 to form a committee of 2, the number of ways to choose 2 students out of 5 for awards, or the number of ways to choose 2 ice cream flavors out of 5. The possibilities are endless. The notation for “choose 2 from 5” is \binom{5}{2}. The top number in this notation is the total number of objects to choose from and the bottom number is the number of objects to be chosen. In general, the number of ways to choose r objects out of n objects is

    \displaystyle \binom{n}{r}=\frac{n!}{r! \ (n-r)!}

The number \binom{n}{r} is called the number of possible combinations of n objects chosen r at a time. Other notations include _nC_r and C(n,r). Regardless of the notation, it is the number of ways to choose r objects from a set of n objects, or the number of different groups of size r that can be chosen from a set of n distinct objects. Many calculators have a function for the number \binom{n}{r} (in a calculator the notation is probably _nC_r). If it is to be calculated by hand, bear in mind that n!, the factorial of the total number of objects, is in the numerator and the two factorials in the denominator are r! and (n-r)!, where r is the number of objects to be chosen. Note that the sum of r and n-r is n.
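
Both ways of counting "choose 2 from 5" can be sketched in Python: first count ordered picks and remove the over count, then apply the factorial formula. The names below are illustrative only.

```python
from itertools import permutations
from math import factorial

# First idea: fill two spots in order, then merge picks that differ only in order.
ordered_picks = list(permutations(range(1, 6), 2))     # (3, 2) and (2, 3) both appear
distinct_sets = {frozenset(pick) for pick in ordered_picks}

# Each 2-number set shows up 2! = 2 times among the ordered picks.
by_division = len(ordered_picks) // factorial(2)

# Second idea: the combination formula n! / (r! (n-r)!).
by_formula = factorial(5) // (factorial(2) * factorial(3))

print(len(ordered_picks), len(distinct_sets), by_division, by_formula)
```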

For more information about how to calculate combination, see here and the summary section here.

Let’s look at the second prize in toy Powerball: match 2 white balls and no match for the red ball. As an example, let’s say the winning numbers are 1 and 3 (white) and 2 (red). How many 3-number combinations satisfy the winning criteria for this prize, i.e. matching the two white numbers and not the red number? There is still only one way to match the two white numbers. But there are 3 ways to not match the winning red number (the non-winning red numbers would be 1, 3, and 4). The number of possible winning tickets would be 1 x 3 = 3. So the odds for winning the $100 prize are 3 in 40. The probability of winning the $100 prize is 3 / 40 = 0.075 (7.5%).

The third prize is $10, won by matching 1 white ball and the red ball. Again, using the example of winning numbers of 1 and 3 (white) and 2 (red), how many 3-number combinations satisfy the winning criteria for this prize, i.e. matching 1 white number and the red number? Remember that the player of the game chooses 2 white numbers and 1 red number. Since there is only 1 correct match for the white numbers, there is 1 winning white number and 1 non-winning white number on the ticket. There are two ways to pick 1 winning white number (1 or 3) and there are 3 ways to pick 1 non-winning white number (2, 4 and 5). Of course, there is only one way to match the winning red number. The number of possible winning tickets would be 2 x 3 x 1 = 6 (by the multiplication principle again). This observation will be crucial in understanding the Powerball calculation below.

So the odds for winning the $10 prize are 6 in 40. The probability of winning would be 6 / 40 = 0.15 (15%).
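
The three prize counts of toy Powerball can be confirmed by checking all 40 tickets against the example winning numbers 1 and 3 (white) and 2 (red). This is a minimal sketch; `tally` and the constant names are illustrative.

```python
from collections import Counter
from itertools import combinations

WINNING_WHITE = {1, 3}   # example winning white numbers used in the text
WINNING_RED = 2          # example winning red number used in the text

# Key: (number of white matches, whether the red number matches).
tally = Counter()
for white in combinations(range(1, 6), 2):
    for red in range(1, 5):
        white_matches = len(WINNING_WHITE & set(white))
        tally[(white_matches, red == WINNING_RED)] += 1

total_tickets = sum(tally.values())
print("grand prize:", tally[(2, True)], "in", total_tickets)
print("$100 prize: ", tally[(2, False)], "in", total_tickets)
print("$10 prize:  ", tally[(1, True)], "in", total_tickets)
```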

___________________________________________________________________________

Back to Powerball

The winning probability of a prize in a lottery would be the fraction of all of the possible lottery numbers which count as winning. In other words, the winning probability would be a fraction with the denominator being the total number of possible number combinations that can be picked by the lottery players and the numerator being the number of possible winning combinations. So the denominator is always fixed for a given lottery. The numerator would vary depending on the prize. The bigger the prize, the smaller the numerator (the harder it is to win).

The Powerball game uses a 6-number combination (5 white numbers + 1 red number). The 5 white numbers are chosen out of 69 numbers and the 1 red number is chosen out of 26 numbers. So based on the discussion in the preceding section, the total number of possible 6-number combinations would be:

Total Number of Possible Powerball Combinations

    \displaystyle \binom{69}{5} \cdot \binom{26}{1}=\text{11,238,513} \times 26=\text{292,201,338}

There are \binom{69}{5} possible 5-number sets that can be picked from 69 numbers. That is over 11 million ways to pick the 5 white numbers, only one of which matches the 5 winning white balls. There are \binom{26}{1} = 26 possible ways to pick one number from 26 numbers. By the multiplication principle, the product of these numbers would be the total number of possible Powerball combinations.
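
The denominator can be verified with `math.comb` (Python 3.8 or later). The variable names in this small sketch are illustrative.

```python
from math import comb

white_sets = comb(69, 5)     # ways to pick 5 white numbers out of 69
red_choices = comb(26, 1)    # ways to pick 1 red number out of 26

# Multiplication principle: every white set can be paired with every red number.
total_combinations = white_sets * red_choices

print(f"{white_sets:,} x {red_choices} = {total_combinations:,}")
```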

For the winning probability of any Powerball prize, the denominator would be 292,201,338, which is a staggering number. To help with understanding what should go into the numerator, let’s use the latest winning combination: 1, 15, 18, 26 and 51 (white balls) and 26 (red ball), drawn on April 26, 2017 (shown in Figure 1 above). So all the winning calculations below are based on this example. The number that goes into the numerator would be the number of tickets matching some or all of the white numbers with or without matching the red number depending on the rules. Let’s consider the prizes one by one.

Match 5 white balls + 1 red ball (grand prize)
There is only one winning combination. So the winning probability is 1 over 292,201,338. The winning odds would be 1 in 292,201,338.

Match 5 white balls + no red ball ($1,000,000)

    Probability of winning = \displaystyle \frac{\displaystyle \binom{5}{5} \cdot \binom{1}{0} \cdot \binom{25}{1}}{292201338}=\frac{1 \cdot 25}{292201338}=\frac{25}{292201338}

The relevant question here is: how many ticket combinations are there that match all 5 winning white numbers and that do not match the 1 winning red number? There is only one way to match all 5 white numbers 1, 15, 18, 26 and 51 (as in the grand prize). There are 25 ways to not match the winning red number 26 (1 through 25). So the numerator is 25. So the odds for winning $1 million are 25 in 292,201,338 or 1 in 11,688,053.52 (1 in over 11 million).

Match 4 white balls + 1 red ball ($50,000)

    Probability of winning = \displaystyle \frac{\displaystyle \binom{5}{4} \cdot \binom{64}{1} \cdot \binom{1}{1} \cdot \binom{25}{0}}{292201338}=\frac{5 \cdot 64}{292201338}=\frac{320}{292201338}

To win, a player needs to match 4 of the white numbers. To count all the winning combinations, choose 4 numbers from the 5 winning white numbers and choose 1 number from the 64 non-winning white numbers. Recall that the ticket has 5 white numbers. If only 4 of them match the winning numbers, then one of the white numbers on the ticket must come from the non-winning white numbers.

Furthermore, choose 1 number from the 1 winning red number and 0 numbers from the 25 non-winning red numbers. This observation explains what appears in the numerator. Then the odds for winning the $50,000 prize are 320 in 292,201,338 or 1 in 913,129.18.

Match 4 white balls + no red ball ($100)

    Probability of winning = \displaystyle \frac{\displaystyle \binom{5}{4} \cdot \binom{64}{1} \cdot \binom{1}{0} \cdot \binom{25}{1}}{292201338}=\frac{5 \cdot 64 \cdot 25}{292201338}=\frac{8000}{292201338}

Without the red ball match, there are more ways to win (25 times as many, to be precise, since any of the 25 non-winning red numbers will do). To count the number of winning tickets, choose 4 from the 5 winning white numbers, choose 1 from the 64 non-winning white numbers, choose 0 from the 1 winning red number and choose 1 from the 25 non-winning red numbers. Doing this gives 8,000. Then the odds for winning the $100 prize are 8,000 in 292,201,338 or 1 in 36,525.17.

Match 3 white balls + 1 red ball ($100)

    Probability of winning = \displaystyle \frac{\displaystyle \binom{5}{3} \cdot \binom{64}{2} \cdot \binom{1}{1} \cdot \binom{25}{0}}{292201338}=\frac{10 \cdot 2016 \cdot 1}{292201338}=\frac{20160}{292201338}

To count the number of winning tickets, choose 3 from the 5 winning white numbers, choose 2 from the 64 non-winning white numbers, choose 1 from the 1 winning red number and choose 0 from the 25 non-winning red numbers. Doing this gives 20,160. Then the odds for winning the second $100 prize are 20,160 in 292,201,338 or 1 in 14,494.11.

Match 3 white balls + no red ball ($7)

    Probability of winning = \displaystyle \frac{\displaystyle \binom{5}{3} \cdot \binom{64}{2} \cdot \binom{1}{0} \cdot \binom{25}{1}}{292201338}=\frac{10 \cdot 2016 \cdot 25}{292201338}=\frac{504000}{292201338}

Let’s use the latest winning numbers 1, 15, 18, 26 and 51 (white balls) and 26 (red ball) to illustrate. How many ticket combinations would satisfy this criterion: match 3 numbers from 1, 15, 18, 26 and 51 (white balls) and not match the winning red number 26? The answer is the number of ways to choose 3 from 1, 15, 18, 26 and 51, choose 2 from the other 64 white numbers, choose 0 from the 1 winning red number (26) and choose 1 from the 25 non-winning red numbers. This explains what is in the numerator, which is 504,000. Then the odds for winning the first $7 prize are 504,000 in 292,201,338 or 1 in 579.76.

Match 2 white balls + 1 red ball ($7)

    Probability of winning = \displaystyle \frac{\displaystyle \binom{5}{2} \cdot \binom{64}{3} \cdot \binom{1}{1} \cdot \binom{25}{0}}{292201338}=\frac{10 \cdot 41664 \cdot 1}{292201338}=\frac{416640}{292201338}

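Setting up the numerator in the same way (choose 2 from the 5 winning white numbers, 3 from the 64 non-winning white numbers and the 1 winning red number) gives 416,640. Then the odds for winning the second $7 prize are 416,640 in 292,201,338 or 1 in 701.33.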

Match 1 white ball + 1 red ball ($4)

    Probability of winning = \displaystyle \frac{\displaystyle \binom{5}{1} \cdot \binom{64}{4} \cdot \binom{1}{1} \cdot \binom{25}{0}}{292201338}=\frac{5 \cdot 635376 \cdot 1}{292201338}=\frac{3176880}{292201338}

As discussed earlier, when the match of the white balls is partial, the white numbers that do not match the winning white numbers must come from the 64 non-winning numbers. So we choose 1 from 5 winning white numbers and choose 4 from 64 non-winning white numbers. There is only one way to pick the winning red number. The numerator is 3,176,880. Then the odds for winning the first $4 prize are 3,176,880 in 292,201,338 or 1 in 91.98.

Match no white ball + 1 red ball ($4)

    Probability of winning = \displaystyle \frac{\displaystyle \binom{5}{0} \cdot \binom{64}{5} \cdot \binom{1}{1} \cdot \binom{25}{0}}{292201338}=\frac{1 \cdot 7624512 \cdot 1}{292201338}=\frac{7624512}{292201338}

The numerator is 7,624,512. Then the odds for winning the second $4 prize are 7,624,512 in 292,201,338 or 1 in 38.32.

The overall odds of winning a prize
The total number of possible 6-number combinations that result in a prize is obtained by summing all the numerators in the above calculation of odds. There is a total of 11,750,538 possible prize-winning combinations. The following table shows the breakdown.

Number of Possible Prizes
\begin{array}{rrrrrrr}      \text{Match} & \text{ } & \text{Prize} & \text{ } & \text{Odds} & \text{ } & \text{Number of Prizes}\\      \text{ } & \text{ } \\      \text{5 white, 1 red} & \text{ } & \text{Grand Prize} & \text{ } & \text{1 in 292,201,338} & \text{ } & 1\\      \text{5 white, no red}  & \text{ } & \$ \text{1 million} & \text{ } & \text{1 in 11,688,053.52} & \text{ } & 25\\      \text{4 white, 1 red}  & \text{ } & \$ \text{50,000} & \text{ } & \text{1 in 913,129.18} & \text{ } & 320\\      \text{4 white, no red}  & \text{ } & \$ 100 & \text{ } & \text{1 in 36,525.17}  & \text{ } & \text{8,000}\\  \text{3 white, 1 red}  & \text{ } & \$ 100 & \text{ } & \text{1 in 14,494.11}  & \text{ } & \text{20,160}\\  \text{3 white, no red}  & \text{ } & \$ 7 & \text{ } & \text{1 in 579.76}  & \text{ } & \text{504,000}\\  \text{2 white, 1 red}  & \text{ } & \$ 7 & \text{ } & \text{1 in 701.33}  & \text{ } & \text{416,640}\\  \text{1 white, 1 red}  & \text{ } & \$ 4 & \text{ } & \text{1 in 91.98}  & \text{ } & \text{3,176,880}\\  \text{no white, 1 red}  & \text{ } & \$ 4 & \text{ } & \text{1 in 38.32}  & \text{ } & \text{7,624,512}\\      \text{ }  & \text{ } & \text{ } & \text{ } & \text{ } & \text{ } & \text{ }\\      \text{ }  & \text{ } & \text{Any Prize} & \text{ } & \text{1 in 24.87} & \text{ } & 11,750,538    \end{array}

The odds of winning any prize would be 11,750,538 in 292,201,338, or 1 in 24.87. It ought to be mentioned that the last column in the table only gives the number of potential prizes in each category (that is, the number of possible winning combinations). A prize is awarded only when someone has purchased a winning combination. For example, on April 26, 2017, no one won the grand prize since no one had purchased the winning numbers of 1, 15, 18, 26 and 51 (white balls) and 26 (red ball). No one won the $2 million prize for match 5 with power play either. However, there were three winners for the match 5 (they did not pay for the extra power play option). There are 25 possible winning combinations in that category but there were only three winners. Figure 1 also shows that there were 592,253 winners on that day. Potentially, there could be 11,750,538 (or more) winners in each drawing.
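As a cross-check on the total, the prize-winning combinations can also be counted directly from the prize rule rather than by adding up the table. The sketch below is an illustration only. The function wins_prize is a made-up name that encodes the rule read off the table: a ticket wins something if it matches the red ball, or if it matches at least 3 white balls.

```python
from math import comb

total_tickets = comb(69, 5) * 26  # 292,201,338


def wins_prize(white_matched, red_matched):
    # Rule read off the table: any red match wins; otherwise 3+ white needed
    return red_matched or white_matched >= 3


prize_combinations = 0
for white in range(6):
    # Ways to hit exactly `white` of the 5 winning white numbers
    white_ways = comb(5, white) * comb(64, 5 - white)
    # 1 way to match the red ball, 25 ways to miss it
    for red_matched, red_ways in ((True, 1), (False, 25)):
        if wins_prize(white, red_matched):
            prize_combinations += white_ways * red_ways

print(f"{prize_combinations:,} prize-winning combinations")
print(f"1 in {total_tickets / prize_combinations:.2f}")
```

The count should agree with the 11,750,538 in the last row of the table.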

Remarks on Calculations
First, we emphasize again the point that has been made several times already. The smaller prizes only require partial matching of white numbers with or without matching of the red number. The matching of white numbers must account for winning and non-winning numbers. For example, if the prize requires the matching of 3 white numbers, then we need to count 3 numbers from the 5 winning white numbers and 2 numbers from 64 non-winning white numbers. Hence \binom{5}{3} \times \binom{64}{2}. Recall that to play Powerball, the player chooses 5 numbers from the 69 white balls. If the player matches only 3 winning white balls, the other two white numbers chosen by the player must come from the 64 non-winning white balls.

The odds for the last prize (matching the red number only) are not 1 in 26. This is a common misconception. The reasoning goes like this: there are only 26 red balls, so matching the red ball must have a 1 in 26 chance. However, we cannot ignore the white balls! The fact that only the red ball is matching means that the 5 white numbers chosen by the player must all come from the 64 non-winning white numbers. So the correct value to go into the numerator is \binom{5}{0} \times \binom{64}{5}.
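One way to see where the 1 in 26 figure does apply is to split the tickets that match the red ball according to how many white numbers they also match. The following sketch (variable names are illustrative) shows that all red-matching tickets together give exactly 1 in 26, while the red-only tickets are a smaller subset of them.

```python
from math import comb

total_tickets = comb(69, 5) * 26  # 292,201,338

# Tickets that match the red ball, grouped by number of white matches (0 to 5)
red_match_by_white = {w: comb(5, w) * comb(64, 5 - w) for w in range(6)}

all_red_matches = sum(red_match_by_white.values())  # equals comb(69, 5)
red_only = red_match_by_white[0]                    # no white numbers matched

print(f"any ticket matching red: 1 in {total_tickets / all_red_matches:.2f}")
print(f"matching red only:       1 in {total_tickets / red_only:.2f}")
```

The first line comes out to 1 in 26 because the six groups add up to \binom{69}{5}, the total number of white-ball selections. The second line is the 1 in 38.32 from the table, since only \binom{64}{5} of those selections miss every winning white number.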

___________________________________________________________________________

Do You Feel Lucky Today?

The odds for winning the Powerball jackpot are practically zero (being 1 in 292 million). The entire population of the United States was 321.4 million in 2015, with 248 million of them age 18 or over. If every adult in the U.S. purchased a Powerball ticket, it would still be possible that there is no winner of the grand prize (but there could be some $1 million winners). To get a sense of how big the number 292 million is, look at this piece from WSJ. The piece strives to illustrate how vast a quantity 292,201,338 is. Just scrolling the page past that many dots is a nearly impossible task.
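To put a rough number on the every-adult scenario, here is a back-of-the-envelope sketch. It rests on an assumption that is not in the discussion above: each of the 248 million adults buys exactly one ticket with numbers picked independently and uniformly at random (as in a quick pick). Real ticket purchases need not behave this way, so the result is only indicative.

```python
from math import exp, log1p

TOTAL_TICKETS = 292_201_338
ADULTS = 248_000_000  # adult population figure quoted in the text


def prob_no_jackpot(tickets_sold, total=TOTAL_TICKETS):
    """P(no ticket hits the jackpot) = (1 - 1/total) ** tickets_sold,
    assuming independent, uniformly random tickets (an assumption)."""
    # log1p keeps precision when 1/total is tiny
    return exp(tickets_sold * log1p(-1 / total))


p_none = prob_no_jackpot(ADULTS)
# Expected number of tickets matching 5 white balls but not the red ball
expected_million_winners = ADULTS * 25 / TOTAL_TICKETS

print(f"P(no jackpot winner) is about {p_none:.3f}")
print(f"expected $1 million winners: about {expected_million_winners:.1f}")
```

Under these assumptions the chance of no jackpot winner is a little over 40%, so a drawing with no grand prize winner would not be surprising even with that many tickets in play.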

One thing is for sure. With no jackpot winners in several drawings in a row, the excitement for Powerball turns into a frenzy. In fact, the last revision of the Powerball rules in 2015 made winning the jackpot much less likely. As a result the jackpot usually keeps building until it reaches the hundreds of millions range or the 1 billion range. The rules were designed to ratchet up the excitement and, as a result, drive up sales (discussed in this piece from Washington Post).

Playing Powerball is excellent entertainment. With a $2 admission price, you can dream and fantasize for a few days. A once-a-week habit would cost about $100 a year in entertainment. Regular customers of Starbucks would spend more than that amount. Of the regular lottery players, roughly the top 5% spend a few thousand dollars a year (a hundred thousand dollars on average in their lifetime, assuming no interest) for an elusive chance to win $1 million or more. What if these regular players invested the money elsewhere?

If they instead invested the money in a conservative fund earning 2% a year, they would accumulate a substantial nest egg. Assuming a rate of return of 2% a year, investing $3,000 a year would yield approximately $185,000. In fact, if they invested in a broad-based stock market index (e.g. the S&P 500 index), they would do even better in the long run. The long run historical S&P 500 returns are around 10% a year (7% after inflation). Assuming a rate of return of 7% a year, investing $3,000 a year would yield around $640,000!
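The investment horizon behind these figures is not stated. As a rough check, the sketch below assumes 40 years of $3,000 deposits made at the start of each year with annual compounding. Both the 40-year horizon and the timing of deposits are assumptions chosen here because they land close to the quoted amounts. They are not necessarily the inputs behind the original figures.

```python
def future_value(annual_deposit, rate, years):
    """Future value of equal deposits made at the start of each year,
    compounded once a year. Horizon and timing are assumptions."""
    balance = 0.0
    for _ in range(years):
        # Deposit at the start of the year, then earn a full year of interest
        balance = (balance + annual_deposit) * (1 + rate)
    return balance


YEARS = 40  # assumed horizon, not stated in the text

for rate in (0.02, 0.07):
    fv = future_value(3000, rate, YEARS)
    print(f"{rate:.0%} a year for {YEARS} years: about ${fv:,.0f}")
```

With these assumptions the 2% case comes out near $185,000 and the 7% case near $640,000. A different horizon or end-of-year deposits would shift both numbers.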

Even in the rare event that someone wins $1 million, the winnings are taxed and would be greatly reduced, e.g. the winnings could be $500,000 after tax. Thus the rate of return on a lifetime investment in playing the lottery is not as great as people imagine. Without winning, the lifetime lottery investment of a few thousand dollars a year would be money going down the drain. For a more detailed discussion, see this piece from Washington Post.

The calculation of the Powerball winning odds is a great math lesson. Rather than looking at the winning odds, perhaps it is more instructive to look at the losing odds. The odds of not winning the Powerball grand prize (per $2 bet) are 292,201,337 in 292,201,338, which is essentially 1 in 1.

___________________________________________________________________________
\copyright 2017 – Dan Ma