Question

Statistics using R

We will use the data set described below repeatedly throughout the course. I recommend you save your work in an R script file each time you work with this data.

A data set describing the sale of individual residential property in Ames, Iowa from 2006 to 2010 was obtained by Dean De Cock, a statistics professor at Truman State University. The data set contains 2930 observations and a large number of explanatory variables involved in assessing home values. Source: http://www.amstat.org/publications/jse/v19n3/decock.pdf

This semester we will look at a sample of 200 homes from this data set. These homes are all located in the Sawyer neighborhood of the city. Observations include the following eight variables:

• lot_shape: Lot Shape

o Reg = Regular

o IRR = Irregular

• lot_config: Lot configuration

o Inside = Inside lot

o Corner = Corner lot

• Style

o Yes = Home has one story

o No = Home has more than one story

• roof_style: Type of Roof

o Gable = Gable

o Hip = Hip

• garage_area: Size of garage in square feet

• lot_area: Lot size in square feet

• living_area: Total home living area in square feet (including unfinished square footage)

• sale_price: Sale price in dollars

Access the data for this problem using the command

sawyer<-read.csv("http://www.math.usu.edu/cfairbourn/Stat2300/RStudioFiles/data/sawyer.csv")

Instructions: Watch the video demonstrating how to calculate confidence intervals in RStudio. For each question below, include your R code and the output. NOTE: You must have the mosaic package loaded in your R session for the prop.test command to work as shown in the videos.

3. [1] Calculate a 95% confidence interval for the proportion of homes in Sawyer that have hip style roofs.

4. [1] Calculate a 99% confidence interval for the proportion of homes in Sawyer that are on inside lots.

5. [1] Calculate a 95% confidence interval for the mean garage area of homes in Sawyer.

6. [1] Calculate a 90% confidence interval for the mean sale price of homes in Sawyer.

Explanation / Answer

The complete R snippet is as follows:

sawyer<-read.csv("http://www.math.usu.edu/cfairbourn/Stat2300/RStudioFiles/data/sawyer.csv")

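#a) Question 3: 95% confidence interval for the proportion of hip roofs

roof = na.omit(sawyer$roof_style)
n = length(roof)
k = sum(roof == "Hip")

pbar = k/n; pbar # sample proportion

SE = sqrt(pbar*(1-pbar)/n); SE # standard error
E = qnorm(.975)*SE; E # margin of error

pbar + c(-E, E) # lower and upper limits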

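#b) Question 4: 99% confidence interval for the proportion of inside lots

lot = na.omit(sawyer$lot_config)
n = length(lot)
k = sum(lot == "Inside")

pbar = k/n; pbar # sample proportion

SE = sqrt(pbar*(1-pbar)/n); SE # standard error
E = qnorm(1-0.01/2)*SE; E # margin of error

pbar + c(-E, E) # lower and upper limits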

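#c) Question 5: 95% confidence interval for the mean garage area
# uses the normal critical value qnorm(); with about 200 homes, qt() gives nearly the same limits

mean(sawyer$garage_area) + qnorm(1-0.05/2)*sd(sawyer$garage_area)/sqrt(length(sawyer$garage_area))
mean(sawyer$garage_area) - qnorm(1-0.05/2)*sd(sawyer$garage_area)/sqrt(length(sawyer$garage_area))

#d) Question 6: 90% confidence interval for the mean sale price

mean(sawyer$sale_price) + qnorm(1-0.1/2)*sd(sawyer$sale_price)/sqrt(length(sawyer$sale_price))
mean(sawyer$sale_price) - qnorm(1-0.1/2)*sd(sawyer$sale_price)/sqrt(length(sawyer$sale_price))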

###################################
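For reference, the calculations in the snippet are the standard normal-approximation (Wald) intervals. Written out, with the notation assumed here rather than taken from the post ($\hat{p}$ the sample proportion, $\bar{x}$ the sample mean, $s$ the sample standard deviation, $n$ the sample size, and $1-\alpha$ the confidence level):

```latex
\hat{p} \;\pm\; z_{1-\alpha/2}\,\sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
\qquad\text{and}\qquad
\bar{x} \;\pm\; z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}
```

The critical value $z_{1-\alpha/2}$ is what `qnorm(1 - alpha/2)` returns: roughly 1.645 for 90%, 1.960 for 95%, and 2.576 for 99% confidence.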

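The results are

> pbar + c(-E, E)
[1] 0.7045497 0.8554503
> pbar + c(-E, E)
[1] 0.7045497 0.8554503

Both proportion lines print the same interval. It matches the 99% calculation in #b) (pbar = 0.78, n = 200), apparently because pbar and E already held the #b) values when the lines were run. Run the #a) block by itself to get the hip-roof interval.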
> mean(sawyer$garage_area) + qnorm(1-0.05/2)*sd(sawyer$garage_area)/sqrt(length(sawyer$garage_area))
[1] 461.9328
> mean(sawyer$garage_area) - qnorm(1-0.05/2)*sd(sawyer$garage_area)/sqrt(length(sawyer$garage_area))
[1] 414.4972
> mean(sawyer$sale_price) + qnorm(1-0.1/2)*sd(sawyer$sale_price)/sqrt(length(sawyer$sale_price))
[1] 164883.1
> mean(sawyer$sale_price) - qnorm(1-0.1/2)*sd(sawyer$sale_price)/sqrt(length(sawyer$sale_price))
[1] 154374.1
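As a cross-check on the hand calculations, here is a minimal sketch that wraps the proportion formula in a helper and compares it with base R's built-in tests. Everything in it is illustrative: `wald_ci` is a hypothetical helper name, the counts 156 out of 200 are inferred from the printed interval rather than read from sawyer.csv, and the numeric vector `x` is simulated stand-in data, not the real garage areas. It uses `prop.test` from base R's stats package, not the mosaic version the instructions mention.

```r
# Hypothetical helper: Wald (normal-approximation) interval for a proportion,
# phat +/- z * sqrt(phat * (1 - phat) / n)
wald_ci <- function(k, n, level = 0.95) {
  phat <- k / n
  z <- qnorm(1 - (1 - level) / 2)            # two-sided critical value
  margin <- z * sqrt(phat * (1 - phat) / n)  # margin of error
  c(lower = phat - margin, upper = phat + margin)
}

# Assumed counts (inferred from the printed interval, not taken from the data)
ci_wald <- wald_ci(156, 200, level = 0.99)
print(round(ci_wald, 7))

# Base R's prop.test() reports a score (Wilson) interval, so its limits
# differ slightly from the Wald limits; correct = FALSE turns off the
# continuity correction
ci_score <- prop.test(156, 200, conf.level = 0.99, correct = FALSE)$conf.int
print(ci_score)

# Simulated stand-in for a numeric column such as garage_area
set.seed(1)
x <- rnorm(200, mean = 440, sd = 170)

# z-based interval, as in the snippet above
ci_z <- mean(x) + c(-1, 1) * qnorm(0.975) * sd(x) / sqrt(length(x))

# t-based interval from t.test(); slightly wider because qt() > qnorm()
ci_t <- t.test(x, conf.level = 0.95)$conf.int
print(ci_z)
print(ci_t)
```

With the assumed counts, `wald_ci` reproduces the interval printed for the inside-lot proportion. On the real data, replace the counts with `sum(sawyer$lot_config == "Inside")` and `nrow(sawyer)`, and `x` with the column of interest.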