


Question

Please answer number 4 using R.

This is the 2nd R project. This problem is about simulating a sampling distribution in R and calculating the so-called p-value.

For the Bernoulli distribution $X \sim \mathrm{Ber}(p) = \mathrm{Bin}(1, p)$, figure out the following problems:

1. Sample 50 points from $X \sim \mathrm{Ber}(p_0 = 0.2)$.

2. Calculate the sample proportion $\hat{p}^{*}$ of “1”s in the sample you obtained above.

3. Do you think it’s reasonably good enough? Explain why.

4. Pretend that you forgot the probability of success $p_0$ you used to generate the above sample of size 50. Your guess now is 0.4, and you want to figure out whether it really is 0.4. Here are two ways to decide; which one do you think is more reasonable?

(a) Compare the sample proportion you got in part 2 above with your guess 0.4. If they are reasonably close, you will probably adopt your guess 0.4. Think about how you could judge the closeness.

(b) Here is the other procedure. The logic behind it is to ask: if $p_0 = 0.4$, what will happen? If something strange happened, then you should doubt your guess; if everything that happened is reasonable under $p_0 = 0.4$, then there seems to be no reason to reject your guess. Here is the implementation. Use $p_0 = 0.4$ to generate $N = 10000$ samples from $X \sim \mathrm{Ber}(p_0 = 0.4)$, each with sample size $n = 50$. Calculate the sample proportion for each of these $N = 10000$ samples, denoted by $\{\hat{p}^{k}_{50},\ k = 1, 2, \cdots, N = 10000\}$. Then plot the histogram of $\{\hat{p}^{k}_{50},\ k = 1, 2, \cdots, N = 10000\}$ to see the distribution of the random variable sample proportion $\hat{p}_{50}$ under the assumption that the true $p_0 = 0.4$. If the observed $\hat{p}^{*}$ is not in the extreme region of the distribution, you will probably adopt your guess 0.4. In particular, calculate the probability that $\hat{p}_{50} < \hat{p}^{*}$ through the frequency of $\{\hat{p}^{k}_{50} < \hat{p}^{*},\ k = 1, 2, \cdots, N = 10000\}$. This probability is actually related to an important concept in statistics, the p-value!

5. For part (b) above, we are actually using simulation to approximate the probability of $\hat{p}_{50} = \frac{X_1 + X_2 + \cdots + X_{50}}{50} < \hat{p}^{*}$ given that $\{X_i,\ i = 1, 2, \cdots, 50\}$ are independent and identically distributed as $X \sim \mathrm{Ber}(p_0 = 0.4)$. Could you figure the probability out exactly, without any approximation? What is the exact probability?

Explanation / Answer

#a
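obs <- rbinom(50, size = 1, prob = 0.2)  # 50 Bernoulli draws with success probability 0.2
(prop1 <- sum(obs) / 50)  # sample proportion: sum() counts the 1s because every other draw is 0

# If p0 = 0.4 were true, prop1 should be close to 0.4: by the law of large numbers the sample
# proportion concentrates around p0, although with n = 50 some sampling error remains.
# So choose an interval around 0.4; if prop1 lies inside it, we keep our belief that p0 = 0.4.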
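
One possible way to make "close" in part (a) concrete is sketched below. The rule used here (accept the guess if the observed proportion lies within two standard errors of 0.4) is an assumption chosen for illustration, not something the problem statement prescribes; the seed and the names p_guess, se, lower, upper and is_close are likewise hypothetical.

```r
# A minimal sketch, assuming "close" means "within 2 standard errors of the guess".
set.seed(1)       # hypothetical seed, only so the run is reproducible
n <- 50
p_guess <- 0.4

obs <- rbinom(n, size = 1, prob = 0.2)   # same sampling step as in the answer
prop1 <- sum(obs) / n                    # observed sample proportion

# Standard error of a sample proportion of size n if p0 = 0.4 were true
se <- sqrt(p_guess * (1 - p_guess) / n)

# Interval around the guess; its width (2 standard errors) is the assumed rule
lower <- p_guess - 2 * se
upper <- p_guess + 2 * se

is_close <- prop1 >= lower && prop1 <= upper
cat("interval: [", round(lower, 3), ",", round(upper, 3), "], prop1 =", prop1,
    ", close:", is_close, "\n")
```

The interval depends only on the guess and on n, so the weakness of method (a) is that the cutoff (two standard errors here) still has to be justified somehow, which is what the simulation in part (b) addresses.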

#b

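N <- 10000  # number of simulated samples

# Sample proportion of each of the N samples of size 50 drawn under the guess p0 = 0.4
prop <- replicate(N, sum(rbinom(50, size = 1, prob = 0.4)) / 50)

hist(prop)  # simulated sampling distribution of the sample proportion under p0 = 0.4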

# Fraction of simulated proportions strictly below the observed prop1:
# an estimate of P(p_hat_50 < prop1) under p0 = 0.4, used here as a one-sided p-value
(pvalue <- mean(prop < prop1))
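
The simulated frequency can be checked against an exact value, which is what part 5 of the question asks about. Since the sum of 50 independent Ber(0.4) draws is Bin(50, 0.4), the event "sample proportion strictly below the observed one" is the same as "count at most k_obs - 1", where k_obs is the observed number of 1s. The sketch below assumes that reading; the seed and the names k_obs, exact, sim_counts and approx are illustrative only.

```r
# A minimal sketch comparing the simulated probability with the exact binomial one.
set.seed(1)       # hypothetical seed, only so the run is reproducible
n <- 50
p_guess <- 0.4
N <- 10000

obs <- rbinom(n, size = 1, prob = 0.2)   # the "observed" sample, as in the answer
k_obs <- sum(obs)                        # observed number of 1s, so prop1 = k_obs / n

# Exact: P(p_hat_50 < k_obs / n) = P(S <= k_obs - 1) with S ~ Bin(n, p_guess)
exact <- pbinom(k_obs - 1, size = n, prob = p_guess)

# Simulation: draw the N counts directly and compare counts instead of proportions,
# which avoids any floating-point issue in the strict "<" comparison
sim_counts <- rbinom(N, size = n, prob = p_guess)
approx <- mean(sim_counts < k_obs)

cat("exact =", exact, " simulated =", approx, "\n")
```

With N = 10000 the two numbers should agree to roughly two decimal places; a larger N tightens the simulated value further.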