Academic Integrity: tutoring, explanations, and feedback — we don’t complete graded work or submit on a student’s behalf.


ID: 3043124 • Letter: P

Question

Please include R code in the answers! Thank you.

data for hw_3_dat1.txt:

data for hw_3_dat2.txt:

For each of the data sets a) hw_3_dat1.txt and b) hw_3_dat2.txt, find the "best" (OLS) fit, and report R-squared and the standard deviation of the errors. Do not use some ad hoc criterion (like maximum R-squared) to determine what is the "best" model. Instead, use your knowledge of regression to find the best model, and explain in words why you think you have the best model. Specifically, make sure you address 1) collinearity, 2) interaction, and 3) nonlinearity.

Explanation / Answer

############ hw_3_dat1.txt

## Command

data1 <- read.csv("hw_3_dat1.csv", header = TRUE)  # hw_3_dat1.txt, saved here in CSV form
head(data1)                                        # inspect the first rows
model1 <- lm(y ~ x1 + x2, data = data1)            # additive OLS fit
summary(model1)

## Output

> data1=read.csv("hw_3_dat1.csv", header = TRUE)
> head(data1)
x1 x2 y
1 -0.4127341 -0.5164471 2.1740530
2 -0.0629520 0.3353391 3.9773150
3 -1.4134620 0.1740243 -4.5222660
4 1.1906670 1.1755170 15.7272400
5 0.9095557 -0.4208167 -0.6010847
6 0.4607902 -1.6777420 -11.0754600
> model1=lm(y~x1+x2, data=data1)
> summary(model1)

Call:
lm(formula = y ~ x1 + x2, data = data1)

Residuals:
Min 1Q Median 3Q Max
-27.574 -2.896 -0.216 2.827 49.911

Coefficients:
Estimate Std. Error t value Pr(>|t|)   
(Intercept) 1.3649 0.3110 4.389 1.39e-05 ***
x1 2.3026 0.3080 7.475 3.51e-13 ***
x2 2.7800 0.2956 9.404 < 2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.946 on 497 degrees of freedom
Multiple R-squared: 0.2351, Adjusted R-squared: 0.232
F-statistic: 76.37 on 2 and 497 DF, p-value: < 2.2e-16
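The two quantities the question asks for (R-squared and the standard deviation of the errors) can also be pulled directly from the summary object rather than read off the printed table; a minimal sketch:

```r
s1 <- summary(model1)
s1$r.squared  # R-squared (0.2351 in the printout above)
s1$sigma      # residual standard error, i.e. the SD of the errors (6.946)
```

The `sigma` component is what the printout labels "Residual standard error"; reporting it programmatically avoids transcription errors.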

###########

## Command


data2 <- read.csv("hw_3_dat2.csv", header = TRUE)  # hw_3_dat2.txt, saved here in CSV form
head(data2)                                        # inspect the first rows
model2 <- lm(y ~ x1 + x2, data = data2)            # additive OLS fit
summary(model2)

## Output

> data2=read.csv("hw_3_dat2.csv", header = TRUE)
> head(data2)
x1 x2 y
1 -0.5933061 -0.62787710 2.4348270
2 0.1126115 0.24537520 4.3510050
3 -1.0790510 -0.54988900 -0.9891745
4 1.5574130 1.55236200 23.6993900
5 0.5428932 0.09943572 2.8467830
6 -0.4432717 -1.15611600 -3.6053480
> model2=lm(y~x1+x2, data=data2)
> summary(model2)

Call:
lm(formula = y ~ x1 + x2, data = data2)

Residuals:
Min 1Q Median 3Q Max
-11.465 -5.256 -2.880 2.187 78.367

Coefficients:
Estimate Std. Error t value Pr(>|t|)   
(Intercept) 6.4788 0.3990 16.238 < 2e-16 ***
x1 2.2861 0.8764 2.608 0.00937 **
x2 3.1395 0.8602 3.650 0.00029 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.911 on 497 degrees of freedom
Multiple R-squared: 0.2667, Adjusted R-squared: 0.2637
F-statistic: 90.38 on 2 and 497 DF, p-value: < 2.2e-16

## To check the correlation between the predictors (collinearity)

> cor(data1$x1, data1$x2)
[1] 0.05645841
> cor(data2$x1, data2$x2)
[1] 0.8914847
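With only two predictors, the variance inflation factor can be computed by hand as VIF = 1/(1 − r²), where r is the correlation between x1 and x2 (the `car::vif()` function gives the same number if that package is available). A quick sketch for the second data set:

```r
# VIF for a two-predictor model: 1 / (1 - r^2)
r2 <- cor(data2$x1, data2$x2)^2
1 / (1 - r2)  # about 4.9: noticeable collinearity in hw_3_dat2
```

A VIF near 5 is consistent with the inflated coefficient standard errors in the second summary (about 0.88, versus about 0.31 in the first model).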

From the two summaries above, hw_3_dat1.txt gives R-squared 0.2351 with a residual standard error (the SD of the errors) of 6.946, while hw_3_dat2.txt gives R-squared 0.2667 with a residual standard error of 8.911. However, the correlation between x1 and x2 is 0.0565 in hw_3_dat1.txt but 0.8915 in hw_3_dat2.txt. The high correlation in the second data set indicates collinearity, which inflates the coefficient standard errors (about 0.88 there versus about 0.31 in the first model) and makes the individual slope estimates unreliable. Hence, despite its slightly higher R-squared, the regression on hw_3_dat1.txt is better than the one on hw_3_dat2.txt.
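The question also asks that interaction and nonlinearity be addressed explicitly. A sketch of the checks one could run for the first data set (no output is shown here, so these are illustrative candidate models rather than reported results; the same checks apply to data2/model2):

```r
# Interaction: y ~ x1 + x2 + x1:x2
model1_int <- lm(y ~ x1 * x2, data = data1)
summary(model1_int)

# Nonlinearity: add quadratic terms via I()
model1_quad <- lm(y ~ x1 + x2 + I(x1^2) + I(x2^2), data = data1)
summary(model1_quad)

# Residuals vs. fitted values: curvature here suggests nonlinearity
plot(fitted(model1), resid(model1))
```

If the interaction or quadratic coefficients are significant and the residual standard error drops, the extended model is preferred; if not, the additive fit reported above stands.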