Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.

Suppose we want to create a random sample of the set {1, 2, 3, ..., n }, that is

ID: 3624146 • Letter: S

Question

Suppose we want to create a random sample of the set {1, 2, 3, ..., n}, that is, an m-element subset S, where 0 m n, such that each m-subset is equally likely to be created. One way would be to set A[i]=i for i=1, 2, 3, ..., n, call RANDOMIZED-IN-PLACE(A), and then take just the first m array elements. This method would make n calls to the RANDOM procedure. If n is much larger than m, we can create a random sample with fewer calls to RANDOM. Show that the following recursive procedure returns a random m-subset S of {1, 2, 3, ..., n}, in which each m-subset is equally likely, while making only m calls to RANDOM:

RANDOM-SAMPLE (m, n)

1 if m == 0

2 return

3 else S = RANDOM-SAMPLE (m-1, n-1)

4 i = RANDOM (1, n)

8 return S

[REF.]

RANDOMIZED-IN-PLACE (A)

1 n = A.length

2 for i = 1 to n

3 swap A[i] with A[ RANDOM (i, n) ]

Explanation / Answer

Proof by induction on m. We'll prove the following statement holds for every m:
For every n >= m, RANDOM-SAMPLE(m,n) produces a random subset of {1,...,n} of size m.

Base case:
If m=0 then there is only one possible subset of size m, namely the empty set, which the algorithm returns.

Inductive case:

Suppose for every n >= m-1, RANDOM-SAMPLE(m-1,n) produces a random subset of {1,...,n} of size m.

Let p >= m be given.
Let T be a random subset of {1,...,p}.
We want to show that P(T) = 1/(p choose m)
(where p choose m = p! / (m! (p-m)!) is the number of possible subsets of {1,...,p} of size m)
(If this is true for every T then it means every T is equally likely.)
Let i be the number that was chosen in the top-level call to RANDOM-SAMPLE.
Let S = T - {i}.
By the inductive hypothesis,
P(S) = 1 / ((p-1)!/[(m-1)![(p-1)-(m-1)]!] = (m-1)!(p-m)! / (p-1)!

Case 1: T contains p
This means that either i was in S, or i = p.
P(T) = P(i being in S) P(S) + P(i = p) P(S)
       = P(S)(P(i in S) + P(i = p))
       = P(S)((m-1)/p) + 1/p)
       = P(S)(m/p)
       = [(m-1)!(p-m)! / (p-1)!] (m/p)
       = [m(m-1)!(p-m)!] / (p(p-1)!)
       = (m!(p-m)!] / (p!)
       = 1 / (p choose m)

Case 2: T does not contain p
Then i could have been any element of T.
P(T) = sum over i in T of P(i)P(S)
       = sum over i in T of (1/p)((m-1)!(p-m)! / (p-1)!)
       = m[(1/p)((m-1)!(p-m)!/(p-1)!)]
       = (m(m-1)!(p-m)!) / (p (p-1)!)
       = (m!(p-m)!) / p!
       = 1 / (p choose m)

This proves the inductive case.