The coupon collector problem

The coupon collector problem is a classic problem in probability. It is usually described in this way: a person (called the coupon collector) is trying to collect all coupons (promotional gift items) when purchasing a certain type of products (e.g. boxes of breakfast cereal). Suppose that there are n different types of coupons. Suppose that when someone buys a unit of the product, he or she does not know what type of coupons that is inside the box. How many units of the product does the coupon collector have to buy in order to collect the entire set of coupons?

Note that the number of units to buy is a random quantity. To see it for yourself, roll a fair die until each of the six faces appears at least once. You can try this with your own die and record the numbers as you go. We rolled our own die and the following shows three such experiments.

    3, 4, 6, 2, 6, 1, 3, 2, 1, 4, 6, 2, 6, 1, 6, 1, 2, 3, 5

    6, 6, 4, 1, 3, 6, 6, 4, 1, 5, 4, 2

    6, 2, 1, 1, 5, 5, 1, 1, 1, 3, 6, 1, 3, 5, 4

The first experiment has 19 trials, the second one has 12 trials and the third one has 15 trials. If we run the random experiment another time, we would get another sequence. Also note that the first 5 of the faces (coupons) show up fairly quickly. It takes a few more trials just to get the last coupon.

Rolling a dice is simply a way to generate random numbers. Thus we can describe the coupon collector problem as a random sampling problem. Draw numbers repeatedly from the sample space S=\left\{1,2,3,\cdots n \right\} with replacement until each of the n numbers has been chosen at least once. Let X_n be the number of trials that are required to perform this experiment. As demonstrated above (or as confirmed by your rolling of a die), X_n is a random variable. The smallest value it can take on is n. Theoretically, the random variable X_n could be n or some other integer greater than n. The goal is to describe the probability distribution of the random variable X_n.

Some interesting things to know about this probability distribution are the expected value E[X_n] (the mean or the average number of trials that are needed to complete the experiment). Another one is the probability function (or probability mass function) so that we can determine how likely it is to complete the experiment in k trials. For a more detailed discussion, go to this previous post. In the remainder of the post, we discuss the expected value.

The expected value E[X_n] has a clear formula and is easily computed.

    \displaystyle E[X_n]=n \biggl[ 1 + \frac{1}{2}+\frac{1}{3}+ \cdots+\frac{1}{n-1}+\frac{1}{n} \biggr]

For the 6-coupon case (or the rolling of a die), it would take on average 15 trials to get all 6 faces.

    \displaystyle E[X_6]=6 \biggl[ 1 + \frac{1}{2}+\frac{1}{3}+\frac{1}{4}+\frac{1}{5}+\frac{1}{6} \biggr]=14.7 \approx 15

Of course, 15 is simply the average. The actual results may be more or less than 15 as the above illustrations show (or your own rolling of a die). To find out how likely it is to have a result around 15, it is necessary to have the probability function of X_6. See the previous post for more information on that.

Consider this example. Suppose that you want to throw candies at random to a group of 10 kids. Each candy is assigned to each child at random. You want to throw candies until each of the children has at least one candy. Of course some kids would receive a few candies. But the goal is that no one would come away empty handed. On average, you would need to throw about 30 candies.

    \displaystyle E[X_{10}]=10 \biggl[ 1 + \frac{1}{2}+\frac{1}{3}+ \cdots+\frac{1}{9}+\frac{1}{10} \biggr]=29.29 \approx 30

If there are 50 coupons, it would take on average 225 trials to get all 50 coupons. One interesting thing is that it is relatively easy to get the first few coupons. To get the last coupon, it would take a quite a few trials. In fact, it would take on average n trials to get the last coupon (if n is the number of coupon types).

Another point we would like to make is that the coupon collector problem requires an understanding of the geometric distribution. A random experiment that can modeled by a geometric distribution is the performance of a series of independent trials until some criterion is satisfied. The random variable X_n described here has some similarity with a geometric model (but not totally similar). The similarity: we keep drawing numbers with replacement until all numbers are drawn.

Let’s look at the third experiment 6, 2, 1, 1, 5, 5, 1, 1, 1, 3, 6, 1, 3, 5, 4 listed above. We draw the first number (happens to be 6). Next we wish to draw another number different from 6. Luckily it takes only one trail to get 2. Now we wish to draw a number different from 6 and 2. It also takes one trial (to get 1). Next, continue to draw until we get a number different from 6, 2 and 1. This time it takes two trials to get a 5. Then continue to draw until we get a number different from 6, 2, 1, and 5. It takes 5 trials to get a 3. Lastly it takes 5 more trials to get the last number 4. What this shows is that the random variable X_6 is a sum of 6 geometric random variables.

    X_6=C_1+C_2+C_3+C_4+C_5+C_6

Each of the C_i is a geometric random variable. The first one C_1 is kind of special. The value of C_1 is always 1 (it takes one trial to get the first number). The second variable C_2 is a geometric random variable (the number of trials to get a number different from the first chosen number. Then C_3 is the number of trials to get a number different from the first two chosen numbers and so on. If there are n coupons, then X_n is the sum of n geometric random variables.

    X_n=C_1+C_2+C_3+\cdots +C_{n-1}+C_n

The formula for the expected value E[X_n] is derived through this connection with the geometric distribution. See the previous post for a more in-depth discussion.

___________________________________________________________________________
\copyright \ 2017 \text{ by Dan Ma}

Advertisements

3 thoughts on “The coupon collector problem

  1. Pingback: How long does it take to collect all coupons? | A Blog on Probability and Statistics

  2. Pingback: Celebrate the year of the rooster | All Math Considered

  3. Pingback: Five probability problems to help us think better | Math in the Spotlight

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s