According to the Chinese zodiac, the year 2017 is the year of the rooster. In fact, today (January 28, 2017) is the start lunar New Year, which is the first day of the year of the rooster. What better way to celebrate the year of the rooster than working a related math puzzle, or to perform a related random experiment!

At the office yesterday, conversation at the beginning of a meeting, before discussing the main topics, centered on the Chinese zodiac animal signs (the Chinese zodiac system is a 12-year cycle with a different animal representing each year). One coworker mentioned he is a tiger. Another coworker did not know his sign and Googled to find out that he is a rat! Another coworker is a rooster. It turns out that a pig is also represented. Imagine that you keep picking people at random and ascertain his/her animal sign. How many people do you have to ask in order to have met all 12 animal signs?

The number of people you have to ask is a random quantity. If you are very lucky, you may encounter all 12 signs by asking just 12 people (the minimum number of people you have to ask). The outcome of twelve people is not likely (it has only about 0.005% chance of happening). It is almost certain that you have to ask many more people. It turns out that the first few animals are easy to obtain (just like in the meeting with 4 different animal signs for 4 people). As you ask more people, the animal signs tend to repeat. So you have to keep asking more people. As you encounter more animal signs, it takes longer to get a new animal sign. For example, if you have encountered 11 animal signs already, you will have to ask on average 12 more people to find the last animal sign. So the overall average is not a number near 12. On average, you will have to ask about 37 people to get the entire set of 12 animals.

The random experiment that has been described is this. Put 12 slips of papers numbered 1 through 12 in a hat. Randomly draw a piece of paper and note the number and then put it back into the hat. Keep drawing until all 12 numbers have been chosen. Let be the number of selections that are required to perform this random experiment. Of course, you can expand the sample space to include more slips of papers (i.e. with more numbers). But the context will not be picking animal signs.

There are two ways to get a handle on the random variable as described above. One is through simulation and the other is through math.

Before discussing the simulation or the math, let’s point out that the problem discussed here is a classic problem in probability that goes by the name “the coupon collector problem”. The numbers 1 to 12 in a hat are like coupon (prizes) that are randomly given out when purchasing a product (e.g. a box of cereal). The problem discussed here is that the coupon collector desires to collect the entire set of coupons.

___________________________________________________________________________

**Simulation**

To get a sense of how long it will take, simulate random numbers from 1 through 12 until all numbers have appeared. The following is 5 iterations of the experiment.

9, 2, 6, 8, 9, 10, 2, 9, 8, 1, 5, 11, 1, 1, 2, 10, 9, 8, 8, 9, 5, 11, 9, 3, 7, 9, 8, 8, 4, 3, 1, 4, 3, 12 (34 draws)

7, 7, 1, 10, 11, 11, 10, 4, 5, 8, 8, 2, 6, 4, 6, 2, 12, 6, 6, 12, 9, 5, 8, 10, 1, 5, 10, 4, 9, 4, 1, 11, 11, 6, 2, 1, 6, 6, 3 (39 draws)

9, 5, 2, 2, 1, 5, 6, 11, 7, 11, 4, 6, 1, 12, 3, 7, 8, 3, 3, 2, 2, 3, 5, 6, 2, 5, 1, 6, 8, 5, 4, 10 (32 draws)

1, 5, 5, 4, 5, 12, 10, 1, 8, 1, 3, 9, 1, 3, 11, 9, 10, 3, 9, 11, 4, 4, 4, 7, 7, 3, 1, 11, 11, 4, 10, 6, 3, 2 (34 draws)

6, 7, 6, 1, 12, 6, 1, 1, 7, 1, 11, 10, 3, 3, 9, 6, 9, 4, 2, 6, 11, 7, 7, 11, 2, 6, 2, 1, 7, 2, 5, 9, 6, 12, 6, 11, 1, 11, 11, 2, 5, 6, 7, 5, 2, 11, 2, 2, 6, 2, 12, 5, 5, 5, 12, 10, 3, 11, 1, 10, 10, 6, 9, 11, 10, 7, 11, 5, 1, 9, 11, 9, 8 (73 draws)

Each of the number is generated by using the =RANDBETWEEN(1, 12) in Excel. In each iteration, the numbers are generated until all 12 numbers have been generated.

There is considerable fluctuation in this 5 iterations of the experiment. With the 5th one being exceptionally long, it is possible that it takes a long time to find all 12 animal signs. The average of the first iteration is obviously 34. The average of the first two iteration is 36.5. The averages of the first 3, 4, and 5 iterations are 35, 34.75, and 42.4, respectively.The last average of 42.4 is quite different from the average of 37 indicated earlier.

What if we continue to run the experiment, say, for 10,000 times? What would the long run averages look like? The following graph shows the averages from first 100 runs of the experiment. It plots the average of the iterations from to .

Figure 1 shows that there is quite a bit of fluctuation in the averages in the first 25 runs or so. Eventually, the aberages settle around 37 but still retain noticeable fluctuation. The following graph shows the averages from first 1000 runs of the experiment.

The graph is Figure is smoother as it moves toward 1000, but still has noticeable fluctuation from 37 (in fact the graph is mostly below 37). The following graph shows the averages from first 10000 runs of the experiment.

The graph in Figure 3 shows the average of the first iterations with goes from 1 to 10,000. The graph is for the most parts a horizontal line slightly above 37, especially after . In fact the average of all 10,000 iterations is 37.3381, which is close to the theoretical average of 37.2385.

The simulation is an illustration of the law of larger numbers. The essence of the law of large numbers is that the short run results of a random experiment are unpredictable while the long run results are stable and predictable and eventually settle around the theoretical average.

The first 5 runs of the experiment (as shown above) are certainly unpredictable. It may take 34 draws or may take 73 draws. The first 100 simulations also have plenty of ups and downs, even though graph in Figure 1 shows a movement toward 37. The first 1000 simulations display more stable results but are below average as the graph move toward 1000 (Figure 2). In simulating the experiment 10,000 times (Figure 3), the long run averages settle around the theoretical average of 37.2385.

So if you survey people their animal signs, the time it takes has a great deal of random fluctuations. It may take 34 asks for 73 asks (as shown in the first 5 simulations). If the experiment is done repeatedly, the results are predictable, i.e. the average is around 37.

The long run results of a gambling game are predictable too and will settle around the theoretical average. The theoretical average of a gambling game is usually referred to as the house edge. For example, for the game of roulette, the house edge is 5.06%. For each bet of $1, the gambler is expected to lose 5.26 cents. In playing a few games, the gambler may win big. In the long run, the house is expected to lose 5.26 cents per one dollar bet. Thus the law of large numbers can mean financial ruin for the gambler (or profits for the casino). For an illustration of the law of large numbers in the context of the game of Chuck-a-Luck, see here. For an illustration in the context of the roulette wheel, see here.

Another piece of useful information from the 10,000 simulated runs of the experiment is the frequency distribution.

*Table 1*

*Frequency Distribution of the 10,000 Simulated Runs*

Figures 1 to 3 tell us the long run behavior of the simulations (e.g. the long run average is 37). Table 1 gives the counts of the simulations that fall into each interval and the corresponding relative frequency (the percentage). Table 1 tells us how often or how likely a given possibility occurs. The total number of simulations that fall within the range 20 to 49 is 8015. So about 80% of the time, the experiment ends in 20 to 49 draws. Furthermore, 92.95% of the simulations fall into the interval 20 to 69. This really tells us what the likely results would be if we perform the experiment. The frequency distribution also tells us what is unlikely. There is only 3.75% chance that the experiment can be completed with less than 20 draws. In the simulations, there are two that are above 140 (they are 141 and 142). These extreme results can happen but are extremely rare. They only happened about 2 times per 10,000 simulations.

___________________________________________________________________________

**The math angle**

There is also a mathematical description of the random experiment of survey people until all 12 animal signs are obtained. For example, there is a formula for calculating mean, and there is also a formula for calculating the variance. There is also a probability function, i.e. a formula for calculating probabilities (akin to Table 1). The formula for the mean is actually simple to describe.

Let be the number of draws from the set with replacement such that each number in the set is picked at least once. The expectation of is the following.

The 37.2385 theoretical average discussed above comes from this formula. The the case of , the mean would be

For more details of the math discussion, see this previous post. For a more in-depth discussion including the probability function, see this post in a companion blog on probability.

___________________________________________________________________________

Pingback: How long does it take to collect all coupons? | A Blog on Probability and Statistics