Question

I've attempted to work through the problems and will continue to do so, but I wanted to see if you had any direction. Thanks!

On the last page of this file you will find the Excel output for a multiple linear regression model. The model was built in an attempt to better understand why students at area high schools perform differently on the state high school mathematics exam. The average test score for a class of students is what we are trying to predict. In our attempt to understand why these exam scores differ, we use 3 independent variables: a rating (0-100) for the quality of the math degree obtained by the instructor, the age of the instructor, and the salary (in thousands) of the instructor. You are to address the following questions based on the output. Worth 25 points total.

Estimate the average math score for a class of students whose instructor is 52 years old, earns $48,000, and got her degree in a math program rated 72.

Y = 35.68 + b1(x1) + b2(x2) + b3(x3)

Y=35.67+ 72(.25)+ 52(.24)+48,000(.13)

Y=6306.15 (Income swayed this too much???)

What percentage of the variations in math scores can be explained by this model?

~35.70% (R Squared %)

Conduct a test to determine if the model, taken as a whole, provided us with any significant explanation of the differences in math scores. That is, should the model be retained for further analysis?

Which of the independent variables appear to be significant to the model? Which appear to be insignificant? What leads you to these conclusions?

The Math Degree is the only significant variable within this scenario as the “T Stat” is greater than 2. The other T Stats are so small in value that they are insignificant to the dependent variable.

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.597512233
R Square             0.357020869
Adjusted R Square    0.303439274
Standard Error       7.724526046
Observations         40

ANOVA
              df    SS             MS          F           Significance F
Regression    3     1192.732105    397.5774    6.663125    0.001076925
Residual      36    2148.058895    59.6683
Total         39    3340.791

              Coefficients    Standard Error    t Stat      P-value     Lower 95%
Intercept     35.67761801     7.278849159       4.901547    2.03E-05    20.9154278
Math Degree   0.247481581     0.069845662       3.543263    0.001115    0.105828014
Age           0.244830604     0.185213036       1.321886    0.194545    -0.130798841
Income        0.133296712     0.152818937       0.872253    0.388851    -0.176634456

Explanation / Answer

(1)

Y = 35.68 + 0.247*Math Degree + 0.245*Age + 0.133*Income

Given: Math Degree = 72; Age = 52; Income = 48 (the model measures salary in thousands of dollars, so $48,000 enters as 48, not 48,000)

Y = 35.68 + 0.247*72 + 0.245*52 + 0.133*48 = 72.59
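To check the arithmetic, here is a minimal Python sketch of the same prediction. The names `COEFFICIENTS` and `predict_score` are made up for illustration; the coefficient values are copied from the Excel output. It uses the unrounded coefficients, so the result differs slightly from the hand calculation with rounded ones.

```python
# Coefficients copied from the Excel output.
# Income is measured in thousands of dollars, so $48,000 is entered as 48.
COEFFICIENTS = {
    "Intercept": 35.67761801,
    "Math Degree": 0.247481581,
    "Age": 0.244830604,
    "Income": 0.133296712,
}


def predict_score(math_degree, age, income_thousands):
    """Return the predicted class-average math score from the fitted model."""
    return (
        COEFFICIENTS["Intercept"]
        + COEFFICIENTS["Math Degree"] * math_degree
        + COEFFICIENTS["Age"] * age
        + COEFFICIENTS["Income"] * income_thousands
    )


predicted = predict_score(math_degree=72, age=52, income_thousands=48)
print(round(predicted, 2))  # 72.63 with full-precision coefficients
```

Passing 48000 instead of 48 for the income reproduces the kind of inflated result seen in the original attempt, which is why the units matter.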

(2)

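About 35.7% of the variation in math scores is explained by the model. This is the R Square value (0.357) from the regression statistics.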
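As a cross-check, R Square can be recomputed from the sums of squares in the ANOVA table. The sketch below is illustrative only; the variable names are assumptions, and the numbers are copied from the output.

```python
# Sums of squares and sample size copied from the Excel ANOVA table.
ss_regression = 1192.732105
ss_total = 3340.791
n_observations = 40
n_predictors = 3

# R Square = share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R Square penalizes for the number of predictors.
adjusted_r_squared = 1 - (1 - r_squared) * (n_observations - 1) / (
    n_observations - n_predictors - 1
)

print(f"R Square:          {r_squared:.4f}")
print(f"Adjusted R Square: {adjusted_r_squared:.4f}")
```

Both values should agree with the 'R Square' and 'Adjusted R Square' rows of the regression statistics.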

(3)

Note the 'Significance F' value in the ANOVA table. It is 0.00107, which is less than 0.05. F is calculated as MSM/MSE, so an F of 6.66 means the Mean Square of the Model (MSM) is much larger than the Mean Square Error (MSE), and a ratio that large would be very unlikely if none of the predictors had any effect. We reject the null hypothesis that all slope coefficients are zero: the model as a whole is significant at the 5% level and should be retained for further analysis.
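The F statistic itself can be rebuilt from the ANOVA rows, which makes the MSM/MSE relationship concrete. This is a small sketch with assumed variable names; the p-value is taken from the 'Significance F' cell rather than recomputed.

```python
# Values copied from the Excel ANOVA table.
ss_regression, df_regression = 1192.732105, 3
ss_residual, df_residual = 2148.058895, 36
significance_f = 0.001076925  # p-value reported by Excel
alpha = 0.05

ms_regression = ss_regression / df_regression  # MSM
ms_residual = ss_residual / df_residual        # MSE
f_statistic = ms_regression / ms_residual

model_is_significant = significance_f < alpha

print(f"MSM = {ms_regression:.4f}, MSE = {ms_residual:.4f}")
print(f"F = {f_statistic:.4f}")
print("Retain model" if model_is_significant else "Do not retain model")
```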

(4)

n = no. of samples = 40
k = regression df = 3

Critical value = t(0.05/2, n - (k+1)) = t(0.025, 36) = 2.03

The 't Stat' values exceed this critical value in only two rows: the 'Intercept' (4.90) and the coefficient of 'Math Degree' (3.54). For these two we reject the null hypothesis that the coefficient equals zero, so they are statistically significant at the 5% level. Of the three independent variables, only Math Degree is significant; Age (t = 1.32) and Income (t = 0.87) fall below the critical value and are not.

An easier way to reach the same conclusion is to look at the P-values. When a P-value is less than 0.05, the coefficient is statistically significant at the 5% level. Here that holds for Math Degree (0.0011) but not for Age (0.195) or Income (0.389).
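Both checks can be run side by side. The sketch below recomputes each t statistic as coefficient / standard error and compares it with the critical value of 2.03 quoted above, then applies the P-value rule. The names `TERMS` and `classify_terms` are hypothetical; all numbers come from the Excel output.

```python
# (coefficient, standard error, p-value) copied from the Excel output.
TERMS = {
    "Intercept": (35.67761801, 7.278849159, 2.03e-05),
    "Math Degree": (0.247481581, 0.069845662, 0.001115),
    "Age": (0.244830604, 0.185213036, 0.194545),
    "Income": (0.133296712, 0.152818937, 0.388851),
}
T_CRITICAL = 2.03  # two-tailed, alpha = 0.05, 36 df (value quoted in the answer)
ALPHA = 0.05


def classify_terms(terms, t_critical=T_CRITICAL, alpha=ALPHA):
    """Apply the t-statistic rule and the p-value rule to every term."""
    results = {}
    for name, (coefficient, std_error, p_value) in terms.items():
        t_stat = coefficient / std_error
        results[name] = {
            "t_stat": t_stat,
            "significant_by_t": abs(t_stat) > t_critical,
            "significant_by_p": p_value < alpha,
        }
    return results


results = classify_terms(TERMS)
for name, row in results.items():
    print(f"{name:12s} t = {row['t_stat']:.3f}  significant: {row['significant_by_t']}")
```

The two rules agree for every row, which is expected because the P-value is derived from the same t statistic.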