I ran two linear regressions in R on a data set for the LA Dodgers in regards to
Question
I ran two linear regressions in R on a data set for the LA Dodgers. The first models attendance as a function of temperature, skies (cloudy or clear) and bobblehead giveaways. The second models attendance as a function of day of the week, month and bobblehead giveaways.
I need some help comparing and contrasting the resulting statistics: whether one model is better than the other, and anything else of significance. Here are the results:
temp, skies, bobblehead:
Call:
lm(formula = my.model, data = dodgers)
Residuals:
     Min        1Q    Median        3Q       Max
-15142.7   -4561.8     209.9    3871.7   16545.3
Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)   37103.65    7234.10   5.129 2.12e-06 ***
temp             35.09      96.34   0.364    0.717
skiesCloudy   -2169.30    1879.36  -1.154    0.252
bobbleheadYES 13832.32    2207.69   6.266 1.97e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6794 on 77 degrees of freedom
Multiple R-squared: 0.3547, Adjusted R-squared: 0.3296
F-statistic: 14.11 on 3 and 77 DF, p-value: 2.042e-07
Analysis of Variance Table
Response: attend
           Df     Sum Sq    Mean Sq F value    Pr(>F)
temp        1   53929532   53929532  1.1683    0.2831
skies       1   87616870   87616870  1.8981    0.1723
bobblehead  1 1812086953 1812086953 39.2569 1.968e-08 ***
Residuals  77 3554299532   46159734
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> residualPlots(my.model.fit)
           Test stat Pr(>|t|)
temp          -2.034    0.045
skies             NA       NA
bobblehead        NA       NA
Tukey test    -0.758    0.449
> print(outlierTest(my.model.fit))
No Studentized residuals with Bonferonni p < 0.05
Largest |rstudent|:
  rstudent unadjusted p-value Bonferonni p
1 2.559671           0.012463           NA
day, month, bobblehead:
Call:
lm(formula = my.model, data = dodgers)
Residuals:
     Min        1Q    Median        3Q       Max
-10786.5   -3628.1    -516.1    2230.2   14351.0
Coefficients:
                        Estimate Std. Error t value Pr(>|t|)
(Intercept)             33909.16    2521.81  13.446  < 2e-16 ***
ordered_monthMay        -2385.62    2291.22  -1.041  0.30152
ordered_monthJune        7163.23    2732.72   2.621  0.01083 *
ordered_monthJuly        2849.83    2578.60   1.105  0.27303
ordered_monthAug         2377.92    2402.91   0.990  0.32593
ordered_monthSept          29.03    2521.25   0.012  0.99085
ordered_monthOct         -662.67    4046.45  -0.164  0.87041
ordered_day_of_weekTue   7911.49    2702.21   2.928  0.00466 **
ordered_day_of_weekWed   2460.02    2514.03   0.979  0.33134
ordered_day_of_weekThur   775.36    3486.15   0.222  0.82467
ordered_day_of_weekFri   4883.82    2504.65   1.950  0.05537 .
ordered_day_of_weekSat   6372.06    2552.08   2.497  0.01500 *
ordered_day_of_weekSun   6724.00    2506.72   2.682  0.00920 **
bobbleheadYES           10714.90    2419.52   4.429 3.59e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6120 on 67 degrees of freedom
Multiple R-squared: 0.5444, Adjusted R-squared: 0.456
F-statistic: 6.158 on 13 and 67 DF, p-value: 2.083e-07
Analysis of Variance Table
Response: attend
                    Df     Sum Sq   Mean Sq F value    Pr(>F)
ordered_month        6  948958117 158159686  4.2225  0.001158 **
ordered_day_of_week  6 1314813030 219135505  5.8504 6.002e-05 ***
bobblehead           1  734587177 734587177 19.6118 3.590e-05 ***
Residuals           67 2509574563  37456337
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
> residualPlots(my.model.fit)
                    Test stat Pr(>|t|)
ordered_month              NA       NA
ordered_day_of_week        NA       NA
bobblehead                 NA       NA
Tukey test             -1.123    0.261
> print(outlierTest(my.model.fit))
No Studentized residuals with Bonferonni p < 0.05
Largest |rstudent|:
   rstudent unadjusted p-value Bonferonni p
43 2.668406          0.0095805      0.77602
Explanation / Answer
The output contains two fitted regression models.
1) temp, skies, bobblehead:
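The fitted regression equation is,
Y = 37103.65 + 35.09*temp - 2169.30*skiesCloudy + 13832.32*bobbleheadYES
where Y is the predicted attendance, skiesCloudy = 1 for a cloudy game (0 for clear) and bobbleheadYES = 1 for a bobblehead giveaway game (0 otherwise).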
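The fitted equation can be evaluated directly. A minimal R sketch follows, assuming the coefficients are copied by hand from the first output; `predict_attend` is a hypothetical helper and the 75-degree clear-sky game is an illustrative input, not a row from the data set.

```r
# Coefficients copied from the first summary() output above.
coefs <- c(intercept = 37103.65, temp = 35.09,
           skiesCloudy = -2169.30, bobbleheadYES = 13832.32)

# Hypothetical helper: predicted attendance for one game.
# cloudy and bobblehead are TRUE/FALSE and enter the equation as 1/0 dummies.
predict_attend <- function(temp, cloudy, bobblehead) {
  unname(coefs["intercept"] +
    coefs["temp"] * temp +
    coefs["skiesCloudy"] * as.numeric(cloudy) +
    coefs["bobbleheadYES"] * as.numeric(bobblehead))
}

# Illustrative inputs only (not taken from the data set).
with_bobble    <- predict_attend(temp = 75, cloudy = FALSE, bobblehead = TRUE)
without_bobble <- predict_attend(temp = 75, cloudy = FALSE, bobblehead = FALSE)

# Holding temp and skies fixed, the gap is the bobbleheadYES coefficient.
with_bobble - without_bobble
```

With the original data frame available, `predict()` on the fitted `lm` object would be the usual route; the hand-coded version only makes the role of the dummy variables visible.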
Here we test the overall significance of the model,
H0 : B1 = B2 = B3 = 0 Vs H1 : At least one of the B's differs from 0.
Take the level of significance as alpha = 5% = 0.05
From the output,
F-statistic: 14.11 on 3 and 77 DF, p-value: 2.042e-07
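P-value < alpha
Reject H0 at the 5% level of significance.
Conclusion : At least one of the B's differs from 0. From the coefficient table, only bobbleheadYES is individually significant (p = 1.97e-08); temp (p = 0.717) and skiesCloudy (p = 0.252) are not.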
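The overall F test above can be checked by hand. The R sketch below, assuming only the numbers printed in the first output, recomputes the p-value from the F distribution and rebuilds the F statistic from R-squared; the variable names are illustrative.

```r
# Values copied from the first summary() output.
f_stat <- 14.11
df1 <- 3     # number of predictors
df2 <- 77    # residual degrees of freedom
alpha <- 0.05

# The p-value is the upper-tail probability of the F(df1, df2) distribution.
p_value <- pf(f_stat, df1, df2, lower.tail = FALSE)

# The same F statistic follows from R-squared and the degrees of freedom.
r2 <- 0.3547
f_from_r2 <- (r2 / df1) / ((1 - r2) / df2)

p_value < alpha   # TRUE means H0 is rejected at the 5% level
```

Because the inputs are rounded as printed, the recomputed values agree with the output only up to rounding.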
Multiple R-squared: 0.3547
Interpretation : R-squared is the proportion of the variation in y (attendance) that is explained by the predictors. Here temp, skies and bobblehead together explain about 35.5% of the variation in attendance.
2) day, month, bobblehead:
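In the second output we test the same kind of hypothesis, H0 : all 13 slope coefficients are 0 Vs H1 : At least one of them differs from 0.
Test statistic F = 6.158 on 13 and 67 DF, p-value: 2.083e-07
P-value < alpha
Reject H0 at the 5% level of significance.
Conclusion : At least one of the B's differs from 0. From the coefficient table, June, Tuesday, Saturday, Sunday and bobbleheadYES are individually significant at the 5% level, each relative to the omitted baseline month or day; Friday is borderline (p = 0.05537).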
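Multiple R-squared: 0.5444
Interpretation : month, day of the week and bobblehead together explain about 54.4% of the variation in attendance.
Comparison : The two models have different numbers of predictors (3 vs 13), so compare adjusted R-squared rather than multiple R-squared. The second model has the higher adjusted R-squared (0.456 vs 0.3296) and the smaller residual standard error (6120 vs 6794), so it fits the attendance data better. The bobblehead giveaway is highly significant in both models, with an estimated effect of about 13,800 extra fans in the first model and about 10,700 in the second.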
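Adjusted R-squared penalizes the number of predictors, which is why it is the fairer figure when the two models differ in size. A short R sketch, assuming n = 81 games (77 residual degrees of freedom plus 4 estimated coefficients in the first model); `adj_r2` is a hypothetical helper that applies the standard formula.

```r
# Adjusted R-squared from multiple R-squared, sample size n and
# number of predictors p (intercept not counted in p).
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 81  # assumed: 77 residual df + 3 predictors + 1 intercept

adj1 <- adj_r2(0.3547, n, 3)    # temp, skies, bobblehead
adj2 <- adj_r2(0.5444, n, 13)   # month, day of week, bobblehead

# Compare the two models on the penalized scale.
c(model1 = adj1, model2 = adj2)
```

The recomputed values should match the Adjusted R-squared lines in the two outputs up to rounding.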