I ran two linear regressions in R on a data set for the LA Dodgers in regards to
Question
I ran two linear regressions in R on a data set for the LA Dodgers: one on how attendance is affected by temperature, skies (cloudy, clear), and bobblehead giveaways, and one on how the day of the week, month, and bobblehead giveaway affect attendance.
I need some help comparing and contrasting the resulting statistics: whether one model is better than the other, or anything else of significance. Here are the results:
temp, skies, bobblehead:
Call:
lm(formula = my.model, data = dodgers)
Residuals:
     Min       1Q   Median       3Q      Max
-15142.7  -4561.8    209.9   3871.7  16545.3
Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)    37103.65    7234.10   5.129 2.12e-06 ***
temp              35.09      96.34   0.364    0.717
skiesCloudy    -2169.30    1879.36  -1.154    0.252
bobbleheadYES  13832.32    2207.69   6.266 1.97e-08 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6794 on 77 degrees of freedom
Multiple R-squared: 0.3547, Adjusted R-squared: 0.3296
F-statistic: 14.11 on 3 and 77 DF, p-value: 2.042e-07
Analysis of Variance Table
Response: attend
           Df     Sum Sq    Mean Sq F value    Pr(>F)
temp        1   53929532   53929532  1.1683    0.2831
skies       1   87616870   87616870  1.8981    0.1723
bobblehead  1 1812086953 1812086953 39.2569 1.968e-08 ***
Residuals  77 3554299532   46159734
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
residualPlots(my.model.fit)
           Test stat Pr(>|t|)
temp          -2.034    0.045
skies             NA       NA
bobblehead        NA       NA
Tukey test    -0.758    0.449
> print(outlierTest(my.model.fit))
No Studentized residuals with Bonferonni p < 0.05
Largest |rstudent|:
  rstudent unadjusted p-value Bonferonni p
1 2.559671           0.012463           NA
day, month, bobblehead:
Call:
lm(formula = my.model, data = dodgers)
Residuals:
     Min       1Q   Median       3Q      Max
-10786.5  -3628.1   -516.1   2230.2  14351.0
Coefficients:
                         Estimate Std. Error t value Pr(>|t|)
(Intercept)              33909.16    2521.81  13.446  < 2e-16 ***
ordered_monthMay         -2385.62    2291.22  -1.041  0.30152
ordered_monthJune         7163.23    2732.72   2.621  0.01083 *
ordered_monthJuly         2849.83    2578.60   1.105  0.27303
ordered_monthAug          2377.92    2402.91   0.990  0.32593
ordered_monthSept           29.03    2521.25   0.012  0.99085
ordered_monthOct          -662.67    4046.45  -0.164  0.87041
ordered_day_of_weekTue    7911.49    2702.21   2.928  0.00466 **
ordered_day_of_weekWed    2460.02    2514.03   0.979  0.33134
ordered_day_of_weekThur    775.36    3486.15   0.222  0.82467
ordered_day_of_weekFri    4883.82    2504.65   1.950  0.05537 .
ordered_day_of_weekSat    6372.06    2552.08   2.497  0.01500 *
ordered_day_of_weekSun    6724.00    2506.72   2.682  0.00920 **
bobbleheadYES            10714.90    2419.52   4.429 3.59e-05 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6120 on 67 degrees of freedom
Multiple R-squared: 0.5444, Adjusted R-squared: 0.456
F-statistic: 6.158 on 13 and 67 DF, p-value: 2.083e-07
Analysis of Variance Table
Response: attend
                    Df     Sum Sq    Mean Sq F value    Pr(>F)
ordered_month        6  948958117  158159686  4.2225  0.001158 **
ordered_day_of_week  6 1314813030  219135505  5.8504 6.002e-05 ***
bobblehead           1  734587177  734587177 19.6118 3.590e-05 ***
Residuals           67 2509574563   37456337
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
residualPlots(my.model.fit)
                    Test stat Pr(>|t|)
ordered_month              NA       NA
ordered_day_of_week        NA       NA
bobblehead                 NA       NA
Tukey test             -1.123    0.261
> print(outlierTest(my.model.fit))
No Studentized residuals with Bonferonni p < 0.05
Largest |rstudent|:
   rstudent unadjusted p-value Bonferonni p
43 2.668406          0.0095805      0.77602
Explanation / Answer
temp, skies, bobblehead:
In this regression, the p-value of the model's F-statistic (2.042e-07) is less than alpha = 0.05, so the model as a whole explains a significant share of the variation in attendance.
However, the t-statistics for the temp and skies coefficients are not significant, since their p-values (0.717 and 0.252) exceed alpha = 0.05, while the t-statistic for the bobblehead coefficient is significant (p = 1.97e-08).
So only bobblehead is significant; the other two independent variables, temp and skies, may be dropped from the model and the data reanalyzed.
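As a rough check on the figures quoted for this model, the overall F-statistic and the bobblehead t-test can be recomputed from the printed summary alone. The following is a minimal sketch in base R; the variable names are made up for illustration, and every input is copied from the output in the question rather than from the data set itself.

```r
# Values copied from the first model's summary() output in the question
r2     <- 0.3547   # Multiple R-squared
n_pred <- 3        # predictors: temp, skiesCloudy, bobbleheadYES
df_res <- 77       # residual degrees of freedom

# Overall F-statistic: explained variation per predictor
# divided by unexplained variation per residual degree of freedom
f_stat <- (r2 / n_pred) / ((1 - r2) / df_res)
f_pval <- pf(f_stat, n_pred, df_res, lower.tail = FALSE)

# t-statistic and two-sided p-value for the bobbleheadYES coefficient
t_bobble <- 13832.32 / 2207.69
p_bobble <- 2 * pt(abs(t_bobble), df_res, lower.tail = FALSE)

print(c(F = f_stat, F_p = f_pval, t = t_bobble, t_p = p_bobble))
```

Because the inputs are rounded, the recomputed values should agree with the printed ones only to about three significant figures.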
day, month, bobblehead:
In this regression, the p-value of the model's F-statistic (2.083e-07) is also less than alpha = 0.05, so the model as a whole explains a significant share of the variation in attendance.
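Here ordered_monthJune, ordered_day_of_weekTue, ordered_day_of_weekSat, ordered_day_of_weekSun, and bobbleheadYES are statistically significant at the 5% level of significance. The remaining coefficients are not individually significant and may be dropped from the model and the data reanalyzed. Note that these are levels of the month and day-of-week factors, and the ANOVA table shows both factors are significant overall (p = 0.001158 and p = 6.002e-05), so dropping them means merging those levels with the baseline rather than removing the factors.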
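The 5% screening of the second model's coefficients can be reproduced from the estimates and standard errors printed in the question. This is only a sketch: the data frame and its column names are hypothetical, the numbers are typed in from the summary output, and the intercept is left out because it is not a candidate for dropping.

```r
# Estimates and standard errors copied from the second model's summary()
coefs <- data.frame(
  term = c("ordered_monthMay", "ordered_monthJune", "ordered_monthJuly",
           "ordered_monthAug", "ordered_monthSept", "ordered_monthOct",
           "ordered_day_of_weekTue", "ordered_day_of_weekWed",
           "ordered_day_of_weekThur", "ordered_day_of_weekFri",
           "ordered_day_of_weekSat", "ordered_day_of_weekSun",
           "bobbleheadYES"),
  estimate = c(-2385.62, 7163.23, 2849.83, 2377.92, 29.03, -662.67,
               7911.49, 2460.02, 775.36, 4883.82, 6372.06, 6724.00,
               10714.90),
  std_error = c(2291.22, 2732.72, 2578.60, 2402.91, 2521.25, 4046.45,
                2702.21, 2514.03, 3486.15, 2504.65, 2552.08, 2506.72,
                2419.52),
  stringsAsFactors = FALSE
)
df_res <- 67  # residual degrees of freedom from the summary

# t = estimate / standard error; two-sided p-value from the t distribution
coefs$t_value <- coefs$estimate / coefs$std_error
coefs$p_value <- 2 * pt(abs(coefs$t_value), df_res, lower.tail = FALSE)

# Terms that pass the 5% threshold
significant <- coefs$term[coefs$p_value < 0.05]
print(significant)
```

Friday sits just above the threshold (printed p = 0.05537), so its classification is sensitive to the choice of alpha.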
The R-squared and adjusted R-squared of the first model (0.3547 and 0.3296) are lower than those of the second model (0.5444 and 0.456), and the residual standard error of the first model is higher (6794 vs. 6120), so the second model is better than the first if we compare the two.
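Since the two models are not nested, they cannot be compared directly with an F-test via anova(); penalized fit measures are one alternative. The sketch below recomputes adjusted R-squared and derives AIC and BIC from the residual sums of squares in the two ANOVA tables. It assumes n = 81 games (77 residual df + 4 coefficients, equivalently 67 + 14) and Gaussian errors; the data frame and its column names are illustrative only.

```r
n <- 81  # assumed sample size: residual df + number of coefficients

models <- data.frame(
  model = c("temp + skies + bobblehead", "month + day + bobblehead"),
  r2    = c(0.3547, 0.5444),          # Multiple R-squared
  rss   = c(3554299532, 2509574563),  # residual sum of squares (ANOVA tables)
  k     = c(4, 14)                    # coefficients including the intercept
)

# Adjusted R-squared penalizes R-squared for the number of coefficients
models$adj_r2 <- 1 - (1 - models$r2) * (n - 1) / (n - models$k)

# Gaussian log-likelihood at the maximum-likelihood variance estimate
loglik <- -n / 2 * (log(2 * pi) + log(models$rss / n) + 1)

# Parameter count is k + 1 because the error variance is also estimated
models$aic <- -2 * loglik + 2 * (models$k + 1)
models$bic <- -2 * loglik + log(n) * (models$k + 1)

print(models)
```

On these numbers AIC is lower for the second model, while BIC, which penalizes its ten extra coefficients more heavily, is lower for the first, so the ranking depends on which criterion is used.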