Solve using R programming language: Statistical inference depends on using a sam
Question
Solve using R programming language:
Statistical inference depends on using a sampling distribution for a statistic in order to make confidence statements about unknown population parameters. The Central Limit Theorem is used to justify use of the normal distribution as a sampling distribution for statistical inference. Using flow data for the Nile River from 1871 to 1970, this problem demonstrates sampling distribution convergence to normality. Use the code below to prepare the data.
# Mean and standard deviation of the flows; both are needed for the normal curve
m <- mean(Nile)
std <- sd(Nile)
# Create sequential vector, x; used here to plot normal curve to histogram
x <- seq(from = 400, to = 1400, by = 1)
hist(Nile, freq = FALSE, col = "darkblue", xlab = "Flow", main = "Histogram of Nile River Flows, 1871 to 1970")
curve(dnorm(x, mean = m, sd = std), col = "orange", lwd = 2, add = TRUE)
(a) (3 points) Using the Nile River flow data and the “moments” package, calculate skewness and kurtosis. Present side-by-side displays using qqnorm(), qqline() and boxplot(); i.e., use par(mfrow = c(1, 2)). Add features to these displays as you choose.
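One possible approach to part (a) is sketched below. It is not an official solution. The variable names (flow, nile_skew, nile_kurt) are illustrative, and the fallback branch assumes that the moment-based definitions (central moments divided by n, with kurtosis equal to 3 for a normal distribution) are the ones wanted when the "moments" package is not installed.

```r
# Sketch for part (a): skewness, kurtosis, and side-by-side normality displays.
flow <- as.numeric(Nile)   # drop the time-series attributes
n <- length(flow)
dev <- flow - mean(flow)

# Moment-based formulas written out by hand (assumed fallback definitions)
skew_manual <- (sum(dev^3) / n) / (sum(dev^2) / n)^1.5
kurt_manual <- (sum(dev^4) / n) / (sum(dev^2) / n)^2   # normal distribution = 3

if (requireNamespace("moments", quietly = TRUE)) {
  nile_skew <- moments::skewness(flow)
  nile_kurt <- moments::kurtosis(flow)
} else {
  nile_skew <- skew_manual
  nile_kurt <- kurt_manual
}
print(c(skewness = nile_skew, kurtosis = nile_kurt))

# Side-by-side Q-Q plot and boxplot
par(mfrow = c(1, 2))
qqnorm(flow, main = "Normal Q-Q Plot of Nile Flows", col = "darkblue")
qqline(flow, col = "orange", lwd = 2)
boxplot(flow, main = "Boxplot of Nile Flows", ylab = "Flow", col = "lightblue")
par(mfrow = c(1, 1))   # reset the plotting layout
```

Points that follow the reference line in the Q-Q plot, together with skewness near 0 and kurtosis near 3, would suggest the flows are roughly normal.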
# load "moments" package
library(moments) # to install: install.packages("moments")
(b) (3 points) Using set.seed(123) and the Nile data, generate 1000 random samples of size n = 25, with replacement. For each sample drawn, calculate and store the sample mean. This will require a for-loop and use of the sample() function. Label the resulting 1000 mean values as “sample1”. Repeat these steps using set.seed(127) - a different “seed” - and samples of size n = 64. Label these 1000 mean values as “sample2”. Compute and present the mean value and standard deviation for “sample1” and “sample2”.
set.seed(123)
# Define an empty or 1000-element vector, "sample1," to write sample means to
set.seed(127)
# Define an empty or 1000-element vector, "sample2," to write sample means to
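The skeleton above could be filled in along the following lines. This is a sketch, not a definitive solution: draw_means() is a hypothetical helper introduced here to avoid writing the loop twice, and the theoretical_se column is an added comparison that the assignment does not ask for.

```r
# Sketch for part (b): sample means from repeated sampling with replacement.
flow <- as.numeric(Nile)

# Hypothetical helper: draw `reps` samples of size n and return their means
draw_means <- function(data, n, reps = 1000) {
  means <- numeric(reps)            # pre-allocated vector for the sample means
  for (i in seq_len(reps)) {
    means[i] <- mean(sample(data, size = n, replace = TRUE))
  }
  means
}

set.seed(123)
sample1 <- draw_means(flow, n = 25)

set.seed(127)
sample2 <- draw_means(flow, n = 64)

# The means should sit near mean(Nile); the standard deviations should be
# close to sd(Nile) / sqrt(n), shrinking as the sample size grows
results <- data.frame(
  sample = c("sample1", "sample2"),
  n = c(25, 64),
  mean = c(mean(sample1), mean(sample2)),
  sd = c(sd(sample1), sd(sample2)),
  theoretical_se = sd(flow) / sqrt(c(25, 64))
)
print(results)
```

Comparing the sd column with theoretical_se shows how the spread of the sample means narrows as n increases from 25 to 64.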
(c) (4 points) Using “sample1” and “sample2”, present separate histograms with the normal density curve superimposed (use par(mfrow = c(2, 1))). To prepare comparable histograms it will be necessary to use “freq = FALSE” and to maintain the same x-axis with “xlim = c(800, 1050)”, and the same y-axis with “ylim = c(0, 0.025).” To superimpose separate density functions, you will need to use the mean and standard deviation for each “sample” - each histogram - separately. Include the relevant mean and standard deviation in a legend for each histogram.
par(mfrow = c(2, 1))
par(mfrow = c(1, 1)) # good practice to "reset" after multi-figure plotting
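A sketch of how part (c) might be completed follows. To keep the block runnable on its own it regenerates "sample1" and "sample2" with replicate(), used here as shorthand for the required for-loop. plot_means() is a hypothetical helper, and the titles and colours are arbitrary choices.

```r
# Sketch for part (c): comparable histograms with normal curves superimposed.
flow <- as.numeric(Nile)

# Regenerate the sample means (shorthand for the for-loop in part (b))
set.seed(123)
sample1 <- replicate(1000, mean(sample(flow, size = 25, replace = TRUE)))
set.seed(127)
sample2 <- replicate(1000, mean(sample(flow, size = 64, replace = TRUE)))

# Hypothetical helper: histogram, normal curve, and legend for one set of means
plot_means <- function(means, title) {
  m <- mean(means)
  s <- sd(means)
  hist(means, freq = FALSE, col = "darkblue", xlab = "Sample mean flow",
       main = title, xlim = c(800, 1050), ylim = c(0, 0.025))
  # Normal density using this sample's own mean and standard deviation
  curve(dnorm(x, mean = m, sd = s), col = "orange", lwd = 2, add = TRUE)
  legend("topright", bty = "n",
         legend = c(paste("mean =", round(m, 1)), paste("sd =", round(s, 1))))
  invisible(c(mean = m, sd = s))
}

par(mfrow = c(2, 1))
stats1 <- plot_means(sample1, "Sample means, n = 25")
stats2 <- plot_means(sample2, "Sample means, n = 64")
par(mfrow = c(1, 1))   # reset the plotting layout
```

Because both panels share the same axes, the narrower histogram for n = 64 is directly comparable with the wider one for n = 25.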
> show(Nile)
Time Series:
Start = 1871
End = 1970
Frequency = 1
  [1] 1120 1160  963 1210 1160 1160  813 1230 1370 1140  995  935 1110  994 1020  960 1180  799  958 1140 1100 1210 1150 1250 1260 1220 1030 1100  774  840
 [31]  874  694  940  833  701  916  692 1020 1050  969  831  726  456  824  702 1120 1100  832  764  821  768  845  864  862  698  845  744  796 1040  759
 [61]  781  865  845  944  984  897  822 1010  771  676  649  846  812  742  801 1040  860  874  848  890  744  749  838 1050  918  986  797  923  975  815
 [91] 1020  906  901 1170  912  746  919  718  714  740
> summary(Nile)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  456.0   798.5   893.5   919.4  1032.0  1370.0
Explanation / Answer
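# Basic identities of the standard normal density; dnorm() and pi are the base R names
# all.equal() compares with a numerical tolerance instead of exact floating-point equality
all.equal(dnorm(0), 1/sqrt(2*pi))
all.equal(dnorm(1), exp(-1/2)/sqrt(2*pi))
all.equal(dnorm(1), 1/sqrt(2*pi*exp(1)))
# Using log = TRUE gives accurate log-densities over an extended range
par(mfrow = c(2,1))
plot(function(x) dnorm(x, log = TRUE), -60, 50, main = "log { Normal density }")
curve(log(dnorm(x)), add = TRUE, col = "red", lwd = 2)
mtext("dnorm(x, log=TRUE)", adj = 0)
mtext("log(dnorm(x))", col = "red", adj = 1)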
plot(function(x) pnorm(x, log.p = TRUE), -50, 10,
main = "log { Normal Cumulative }")
curve(log(pnorm(x)), add = TRUE, col = "red", lwd = 2)
mtext("pnorm(x, log=TRUE)", adj = 0)
mtext("log(pnorm(x))", col = "red", adj = 1)
# Error function and related functions expressed through pnorm() and qnorm()
erf <- function(x) 2 * pnorm(x * sqrt(2)) - 1
erfc <- function(x) 2 * pnorm(x * sqrt(2), lower.tail = FALSE)
erfinv <- function (x) qnorm((1 + x)/2)/sqrt(2)
erfcinv <- function (x) qnorm(x/2, lower.tail = FALSE)/sqrt(2)