
Question

Matlab

The histogram of N random realizations of a random variable is essentially an approximation of the probability distribution for that random variable, up to some scaling factor. Try to figure out what that scaling factor is, for the specific case of the normal distribution. Generate random samples by using z = randn(N,1);, and then create a histogram with nbins bins (without actually plotting it) by using [n,zbins] = hist(z,nbins);. Now try plotting that histogram (with different scaling factors for n) in the same plot as the exact probability density function, using something like Z = -5:0.01:5; Zpdf = normpdf(Z); plot(Z,Zpdf,'k','linewidth',2);. Can you figure out what the correct scaling factor is (i.e., what factor makes the histogram approximate the probability distribution)? Hint: try using different values of N and nbins.

Explanation / Answer

Let $X_1, \dots, X_N$ be i.i.d. random variables from some distribution, for which we have a cumulative distribution function (CDF) $P(X_i \le a) : \mathbb{R} \to [0,1]$. For some semi-closed interval $[a,b)$, let $\mathcal{X} = \{X_i : X_i \in [a,b)\}$ and further, let $M = |\mathcal{X}| \le N$ be the number of draws which happen to fall in our interval $[a,b)$. Note that $M$ is a random variable.

We can see intuitively that

$$\frac{M}{N} \;\xrightarrow{\;p\;}\; P(X \in [a,b)) \quad \text{as } N \to \infty,$$

that is, if we have infinitely many samples, the fraction falling in our interval is exactly equal to the probability mass between $a$ and $b$. Further, one can show that even for finite $N$ we have

$$\mathbb{E}\!\left[\frac{M}{N}\right] = P(X \in [a,b)).$$

To show these facts, we need to know the distribution of the random variable $M$. Because $M$ is discrete, it has a probability mass function (not a density) $P(M = m)$.

For any one draw $X_i$, the probability of being in our interval is just $P(X \in [a,b)) := p$, and the probability of being outside our interval is $(1-p)$; i.e., the indicator of this event is a Bernoulli random variable. Thus for $N$ draws, the probability of having $m$ of them land in our interval is given by

$$P(M = m) = \binom{N}{m} p^m (1-p)^{N-m},$$

which should be recognizable as the binomial distribution. The binomial distribution has mean (expectation) $Np$, and thus we get the second equation from above directly:

$$\mathbb{E}\!\left[\frac{M}{N}\right] = \frac{1}{N}\,\mathbb{E}[M] = \frac{Np}{N} = p.$$
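This expectation is easy to check numerically. A minimal MATLAB sketch (the interval endpoints a, b and the sample size N below are illustrative choices, not part of the original problem):

```matlab
% Empirical check that E[M/N] = p for a standard normal sample
a = 0; b = 1;                    % illustrative interval [a, b)
N = 1e6;                         % number of draws
z = randn(N, 1);
p_hat = mean(z >= a & z < b);    % M/N: fraction of draws falling in [a, b)
p = normcdf(b) - normcdf(a);     % exact probability mass of [a, b)
fprintf('p_hat = %.4f, p = %.4f\n', p_hat, p);
```

For large N the two printed values agree to a few decimal places, consistent with the variance calculation below.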

This tells us that the expected value of $M/N$ is indeed the probability mass associated with our interval. To show that the limit above holds, we just need to show that the variance goes to zero as $N \to \infty$. This also follows easily from the fact that $M$ is binomially distributed:

$$\mathrm{Var}\!\left(\frac{M}{N}\right) = \frac{1}{N^2}\,\mathrm{Var}(M) = \frac{Np(1-p)}{N^2} = \frac{p(1-p)}{N},$$

which vanishes as $N \to \infty$.
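Putting this together answers the original question. For a bin of width $\Delta$ centered at $x$, the probability mass is $P(X \in \text{bin}) \approx f(x)\,\Delta$ where $f$ is the density, so the bin count satisfies $n(x)/N \approx f(x)\,\Delta$. The scaling factor is therefore $1/(N\Delta)$: divide the counts by $N$ times the bin width. A minimal MATLAB sketch (the particular values of N and nbins are illustrative):

```matlab
N = 1e5; nbins = 50;
z = randn(N, 1);
[n, zbins] = hist(z, nbins);        % bin counts n at bin centers zbins
binwidth = zbins(2) - zbins(1);     % hist uses equal-width bins
bar(zbins, n / (N * binwidth), 1);  % scaled histogram approximates the pdf
hold on
Z = -5:0.01:5;
plot(Z, normpdf(Z), 'k', 'linewidth', 2);
hold off
```

Increasing N makes the scaled histogram hug the curve more tightly (the variance of each bar shrinks like $1/N$, as derived above), while changing nbins changes $\Delta$ but not the agreement, since the $1/(N\Delta)$ scaling adapts to the bin width.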