Question

Coding Required. In class, we have seen that the confidence interval (CI) we derived to estimate the mean $z = E[Y]$ of a distribution $Y$ holds exactly when $Y$ is normally distributed and $\sigma^2$ is known. For other distributions, we can only ensure asymptotic validity (i.e., as $r \to \infty$) via the CLT. Moreover, the unknown variance $\sigma^2$ is estimated via the zero-bias estimator $S^2 = \frac{1}{r-1} \sum_{i=1}^{r} (Y_i - Z_r)^2$. In this exercise, we will use simulation experiments to quantify the true coverage due to these approximations. We will assume the following distributions for the output $Y$:

i. Standard Normal: $Y \sim N(0, 1)$.
ii. Exponential: $Y \sim \exp(1)$, that is, with rate = 1.
iii. Lognormal: $Y \sim LN(b = 1, a^2 = 5)$, where $Y$ is such that $Y \sim e^{b + aN}$ with $N \sim N(0, 1)$. In other words, $\log Y \sim N(b, a^2)$, and hence the name. (Note: $b$ and $a^2$ are not the mean and variance of $Y$; you need to calculate these.)

Explanation / Answer

i) Y ~ N(0,1)

For the normal distribution, the coverage is more or less consistent between the two cases, known variance and estimated variance. Increasing the sample size appears to have little effect, since the coverage is already satisfactory at small sample sizes.
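
As a cross-check on the simulation, the normal case can also be computed exactly. The following is a minimal sketch, assuming the interval is the z-based one used in the R code (a qnorm quantile combined with the sample standard deviation); the name exact_cov is introduced here only for illustration. For normal data, (mean(y) - mu) / (sd(y) / sqrt(r)) follows a t distribution with r - 1 degrees of freedom, so the coverage of that interval is P(|T| <= z).

```r
# Exact coverage of the z-based CI with estimated sd, for normal data only.
# Assumption: the interval is mean(y) +/- qnorm(1 - a/2) * sd(y) / sqrt(r).
a <- 0.05
r <- seq(5, 50, by = 5)            # same sample sizes as in the simulation
z <- qnorm(1 - a / 2)              # standard normal quantile used by the CI
exact_cov <- 2 * pt(z, df = r - 1) - 1   # P(|T_{r-1}| <= z)
print(round(cbind(r, exact_cov), 4))
```

Under this assumption the exact coverage stays below the nominal 0.95 for every finite r and rises toward it as r grows, so the simulated values for the estimated-variance case can be compared against exact_cov.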

ii) Y ~ exp(1)

For the exponential distribution, the impact of sample size is clearly visible in the estimated-variance case: the coverage increases as the sample size increases.

iii) Y ~ LN(1, 5)

Here too, for the lognormal distribution, the coverage increases with the sample size, as can be seen directly from the estimated-variance case. The known-variance case has far better coverage than the estimated-variance case.
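
The question notes that b and a^2 are not the mean and variance of Y. As a worked step, the standard lognormal moment formulas with b = 1 and a^2 = 5 give the true mean and standard deviation needed for the known-variance interval:

```latex
\begin{aligned}
\mathbb{E}[Y] &= e^{b + a^2/2} = e^{1 + 5/2} = e^{3.5} \approx 33.12, \\
\operatorname{Var}(Y) &= \left(e^{a^2} - 1\right) e^{2b + a^2}
                       = \left(e^{5} - 1\right) e^{7} \approx 1.617 \times 10^{5}, \\
\sigma &= \sqrt{\operatorname{Var}(Y)} \approx 402.1 .
\end{aligned}
```

The standard deviation is roughly twelve times the mean, which reflects how heavily skewed this distribution is.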

In all three cases, the coverage is better with known variance than with estimated variance.

R code is given below:

#---Normal Distribution-----#

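R=1000 ## number of simulated samples per sample size
mean=0;sig=1 ## true mean and sd of the given dist.
r=seq(5,50,by=5) ## sample sizes
z_cov_1=NULL;z_cov_2=NULL ## coverage estimates: known / estimated variance
del_cov_1=NULL;del_cov_2=NULL ## CI half-widths of the coverage estimates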
for(j in 1:length(r))
{
Y_1=NULL; Y_2=NULL
for(i in 1:R)
{
y=rnorm(r[j]) ## sampling from normal(0,1)
Z=mean(y)
s=sd(y)
a=.05
del1=sig*qnorm(1-a/2)/sqrt(r[j]) ### for known case
UCI1=Z+del1
LCI1=Z-del1
if(mean>=LCI1 & mean<=UCI1)
{
Y_1[i]=1
}
else{Y_1[i]=0}

del2=s*qnorm(1-a/2)/sqrt(r[j]) #### for estimated case
UCI2=Z+del2
LCI2=Z-del2
if(mean>=LCI2 & mean<=UCI2)
{
Y_2[i]=1
}
else{Y_2[i]=0}
}
z_cov_1[j]=sum(Y_1)/length(Y_1) ### est. coverage=sample proportion of 1 known case
del_cov_1[j]=sd(Y_1)*qnorm(1-a/2)/sqrt(length(Y_1))
z_cov_2[j]=sum(Y_2)/length(Y_2) ### est. coverage=sample proportion of 1 est. var case
del_cov_2[j]=sd(Y_2)*qnorm(1-a/2)/sqrt(length(Y_2))
}
result=cbind(r,z_cov_1,del_cov_1,z_cov_2,del_cov_2)
colnames(result)=c("r", "z_known_var", "del_known_var","z_est_var","del_est_var")
write.csv(result,"Normal.csv")
#------------------
#---Exponential Distribution-----#

R=1000
mean=1;sig=1
r=seq(5,50,by=5)
z_cov_1=NULL;z_cov_2=NULL
del_cov_1=NULL;del_cov_2=NULL
for(j in 1:length(r))
{
Y_1=NULL; Y_2=NULL
for(i in 1:R)
{
y=rexp(r[j]) ##--sampling from exp(1)
Z=mean(y)
s=sd(y)
a=.05
del1=sig*qnorm(1-a/2)/sqrt(r[j])
UCI1=Z+del1
LCI1=Z-del1
if(mean>=LCI1 & mean<=UCI1)
{
Y_1[i]=1
}
else{Y_1[i]=0}

del2=s*qnorm(1-a/2)/sqrt(r[j])
UCI2=Z+del2
LCI2=Z-del2
if(mean>=LCI2 & mean<=UCI2)
{
Y_2[i]=1
}
else{Y_2[i]=0}
}
z_cov_1[j]=sum(Y_1)/length(Y_1)
del_cov_1[j]=sd(Y_1)*qnorm(1-a/2)/sqrt(length(Y_1))
z_cov_2[j]=sum(Y_2)/length(Y_2)
del_cov_2[j]=sd(Y_2)*qnorm(1-a/2)/sqrt(length(Y_2))
}
result=cbind(r,z_cov_1,del_cov_1,z_cov_2,del_cov_2)
colnames(result)=c("r", "z_known_var", "del_known_var","z_est_var","del_est_var")
write.csv(result,"exp.csv")

#---------------Log-Normal------

R=1000
mean=exp(1+5/2);sig=sqrt((exp(5)-1)*exp(2*1+5))
###-- mean of lognormal(b,a^2) is exp(b+a^2/2) & variance is (exp(a^2)-1)*exp(2*b+a^2); here b=1, a^2=5
r=seq(5,50,by=5)
z_cov_1=NULL;z_cov_2=NULL
del_cov_1=NULL;del_cov_2=NULL
for(j in 1:length(r))
{
Y_1=NULL; Y_2=NULL
for(i in 1:R)
{
y=rlnorm(r[j],1,sqrt(5)) ##sampling from lognormal(1,5)
Z=mean(y)
s=sd(y)
a=.05
del1=sig*qnorm(1-a/2)/sqrt(r[j])
UCI1=Z+del1
LCI1=Z-del1
if(mean>=LCI1 & mean<=UCI1)
{
Y_1[i]=1
}
else{Y_1[i]=0}

del2=s*qnorm(1-a/2)/sqrt(r[j])
UCI2=Z+del2
LCI2=Z-del2
if(mean>=LCI2 & mean<=UCI2)
{
Y_2[i]=1
}
else{Y_2[i]=0}
}
z_cov_1[j]=sum(Y_1)/length(Y_1)
del_cov_1[j]=sd(Y_1)*qnorm(1-a/2)/sqrt(length(Y_1))
z_cov_2[j]=sum(Y_2)/length(Y_2)
del_cov_2[j]=sd(Y_2)*qnorm(1-a/2)/sqrt(length(Y_2))
}
result=cbind(r,z_cov_1,del_cov_1,z_cov_2,del_cov_2)
colnames(result)=c("r", "z_known_var", "del_known_var","z_est_var","del_est_var")
write.csv(result,"lm.csv")
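
The three blocks above repeat the same loop with different distributions. A compact sketch of the same experiment is given here, assuming the same z-based intervals; coverage_sim and its argument names are hypothetical helpers introduced for illustration and are not part of the original answer.

```r
# Hypothetical helper: estimates the coverage of the z-based 95% CI
# for one distribution and one sample size r, using R simulated samples.
coverage_sim <- function(rdist, mu, sigma, r, R = 1000, a = 0.05) {
  z <- qnorm(1 - a / 2)
  y <- matrix(rdist(R * r), nrow = R)   # each row is one sample of size r
  ybar <- rowMeans(y)                   # sample means
  s <- apply(y, 1, sd)                  # sample standard deviations
  # 1 if the true mean lies inside the interval, 0 otherwise
  hit_known <- as.numeric(abs(ybar - mu) <= z * sigma / sqrt(r))
  hit_est <- as.numeric(abs(ybar - mu) <= z * s / sqrt(r))
  # half-width of the CI for the coverage estimate itself
  half_width <- function(h) z * sd(h) / sqrt(length(h))
  c(r = r,
    z_known_var = mean(hit_known), del_known_var = half_width(hit_known),
    z_est_var = mean(hit_est), del_est_var = half_width(hit_est))
}

set.seed(1)  # arbitrary seed, only so that this sketch is reproducible
r_values <- seq(5, 50, by = 5)
normal_res <- t(sapply(r_values, function(r) coverage_sim(rnorm, 0, 1, r)))
exp_res <- t(sapply(r_values, function(r) coverage_sim(rexp, 1, 1, r)))
lnorm_res <- t(sapply(r_values, function(r)
  coverage_sim(function(n) rlnorm(n, 1, sqrt(5)),
               exp(1 + 5 / 2), sqrt((exp(5) - 1) * exp(2 * 1 + 5)), r)))
print(normal_res)
```

Each result is a matrix with one row per sample size and the same five columns as the CSV files written above. Because of the seed and the vectorized sampling, the numbers will not match the original runs exactly.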

        Known Variance          Estimated Variance
r       Zcov,R    δcov,R        Zcov,R    δcov,R
5       0.946     0.014015      0.876     0.020438
10      0.951     0.013386      0.931     0.015717
15      0.939     0.014841      0.916     0.017201
20      0.956     0.012718      0.946     0.014015
25      0.944     0.014258      0.936     0.015177
30      0.956     0.012718      0.94      0.014727
35      0.944     0.014258      0.937     0.015066
40      0.948     0.013768      0.935     0.015287
45      0.964     0.011552      0.947     0.013892
50      0.94      0.014727      0.934     0.015396