Instructions By using the forward stepwise method, develop a multiple regression
Question
Instructions
By using the forward stepwise method, develop a multiple regression model to predict the birthweight.
Step 1: Gestation only
Step 2: Gestation and Smoke
Step 3: Gestation, Smoke and Pre-pregnancy Weight
Step 4: Gestation, Smoke, Pre-pregnancy Weight and Height
Step 5: Gestation, Smoke, Pre-pregnancy Weight, Height and Status
Step 6: Gestation, Smoke, Pre-pregnancy Weight, Height, Status and Age
a.) Interpret the regression coefficients of all six (6) independent variables in the model obtained in Step 6, and comment on the statistical significance of each.
b.) Use Excel to obtain the correlation matrix for the following variables: Gestation, Pre-pregnancy Weight, Height, Age and Birthweight. Do you think multi-collinearity is a problem in the regression model? Are the correlation coefficients consistent with the regression coefficients obtained in the model in Step 6? Discuss briefly.
c.) Focusing on Steps 3 and 4, discuss fully how the introduction of Height in Step 4 affects the regression coefficient of Pre-pregnancy Weight.
SUMMARY OUTPUT - STEP 4
d.) Based on the results in (a) to (c), explain which independent variables should be included or excluded to formulate the final model. State the final model.
e.) Comment on the overall adequacy of the final model.
SUMMARY OUTPUT

Regression Statistics
Multiple R           0.391776439
R Square             0.153488779
Adjusted R Square    0.148373907
Standard Error       500.7747979
Observations         1000

ANOVA
              df     SS             MS             F              Significance F
Regression    6      45152113.9     7525352.317    30.00833563    3.50635E-33
Residual      993    249019970.4    250775.3982
Total         999    294172084.3

                        Coefficients     Standard Error   t Stat          P-value        Lower 95%       Upper 95%
Intercept               -1483.025412     562.6691435      -2.635697069    0.008527373    -2587.182494    -378.8683302
Gestation               9.670187253      1.365749902      7.08049639      2.71499E-12    6.990099951     12.35027456
Smoke                   -232.2779335     32.95588738      -7.04814684     3.38914E-12    -296.9491117    -167.6067553
Pre-pregnancy weight    1.889045785      1.872550217      1.008809146     0.313311925    -1.785564076    5.563655646
Height                  14.1439232       2.754137681      5.135517841     3.38544E-07    8.739325027     19.54852137
Status                  -180.2979259     73.96014857      -2.437771278    0.014952672    -325.4340557    -35.16179616
Age                     1.120907347      2.782524229      0.402838306     0.687153889    -4.339395333    6.581210028

Explanation / Answer
b.) Use Excel to obtain the correlation matrix for the following variables: Gestation, Pre-pregnancy Weight, Height, Age and Birthweight. Do you think multi-collinearity is a problem in the regression model? Are the correlation coefficients consistent with the regression coefficients obtained in the model in Step 6? Discuss briefly.
There is no data in the question from which to compute the correlation matrix. However, to check for multicollinearity we can also look at the Significance F of the model and the p-values of the individual coefficients. If the model is significant overall but the independent variables are individually not significant, the model is said to have multicollinearity. Here we do not see that happening: the overall model is significant, and four of the six independent variables in Step 6 (Gestation, Smoke, Height and Status) are significant at the 5% level. Only Pre-pregnancy Weight (p = 0.313) and Age (p = 0.687) are not.
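Since the data set is not included in the question, the correlation matrix itself cannot be produced here. As a rough sketch of what Excel's correlation tool computes, the pairwise Pearson coefficients could be obtained as follows. The function names and the `sample` values are hypothetical and purely illustrative; they are not the assignment's data.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlation_matrix(columns):
    """Return {(name_i, name_j): r} for every ordered pair of named columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}

# Hypothetical values for illustration only; NOT the assignment's data.
sample = {
    "Height": [62, 64, 65, 66, 68, 70],
    "Pre-pregnancy Weight": [110, 125, 120, 140, 150, 165],
    "Age": [31, 22, 27, 35, 24, 29],
}

corr = correlation_matrix(sample)
for (a, b), r in corr.items():
    if a < b:  # print each unordered pair once
        print(f"{a} vs {b}: r = {r:.3f}")
```

A large correlation between two predictors (as between Height and Pre-pregnancy Weight in this made-up sample) would be a warning sign for multicollinearity; on the real data the same calculation would be run on the five columns named in part (b).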
c.) Focusing on Steps 3 and 4, discuss fully how the introduction of Height in Step 4 affects the regression coefficient of Pre-pregnancy Weight.
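In Step 4, when the new variable Height is introduced, the coefficient of Pre-pregnancy Weight falls from 6.048 to 2.073, and the variable goes from being significant to being insignificant in terms of its p-value (0.2622). This suggests that Pre-pregnancy Weight and Height are correlated, so part of the effect attributed to weight in Step 3 was really picking up the effect of height.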
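The mechanism behind this drop can be illustrated with a small simulation. The sketch below uses synthetic data (all numbers, including the "true" coefficients 14 and 2, are assumptions chosen for illustration, and Gestation and Smoke are left out for simplicity). It fits birthweight on weight alone and then on weight and height together, using the normal equations on centered data.

```python
import random

def cross_dev(x, y):
    """Sum of cross-products of deviations from the means."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y))

def slope_simple(x, y):
    """OLS slope of y on x alone."""
    return cross_dev(x, y) / cross_dev(x, x)

def slopes_two(x1, x2, y):
    """OLS slopes of y on x1 and x2 jointly (normal equations, centered data)."""
    s11, s22, s12 = cross_dev(x1, x1), cross_dev(x2, x2), cross_dev(x1, x2)
    s1y, s2y = cross_dev(x1, y), cross_dev(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Synthetic data: weight is built to be correlated with height.
random.seed(0)
n = 500
height = [random.gauss(64, 2.5) for _ in range(n)]
weight = [4 * h + random.gauss(0, 15) for h in height]
birthweight = [14 * h + 2 * w + random.gauss(0, 50) for h, w in zip(height, weight)]

b_weight_alone = slope_simple(weight, birthweight)
b_weight_joint, b_height_joint = slopes_two(weight, height, birthweight)

print(f"weight coefficient without height: {b_weight_alone:.3f}")
print(f"weight coefficient with height:    {b_weight_joint:.3f}")
print(f"height coefficient:                {b_height_joint:.3f}")
```

When height is omitted, the weight coefficient absorbs part of height's effect; once height enters the model, the weight coefficient shrinks toward its own direct effect, which is the same pattern seen between Steps 3 and 4.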
d.) Based on the results in (a) to (c), explain which independent variables should be included or excluded to formulate the final model. State the final model.
Consider the coefficient table for Step 4 (Gestation, Smoke, Pre-pregnancy Weight and Height).
Now look at the P-value column.
All the variables with p-values less than 0.05 are considered significant and can be included in the model. Pre-pregnancy Weight is not statistically significant, as its p-value of 0.2622 is greater than 0.05, so it should be excluded.
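The screening rule described here can be written out as a short check. The p-values are copied from the Step 4 coefficient table in this answer; the variable names `ALPHA`, `keep` and `drop` are illustrative.

```python
ALPHA = 0.05  # 5% significance level used in the answer

# P-values copied from the Step 4 coefficient table.
step4_p_values = {
    "Gestation": 7.7938e-14,
    "Smoke": 1.3397e-12,
    "Pre-pregnancy weight": 0.262246153,
    "Height": 4.01551e-07,
}

keep = [name for name, p in step4_p_values.items() if p < ALPHA]
drop = [name for name, p in step4_p_values.items() if p >= ALPHA]

print("keep:", keep)  # ['Gestation', 'Smoke', 'Height']
print("drop:", drop)  # ['Pre-pregnancy weight']
```

Note that after a variable is dropped the model has to be refitted, since the remaining coefficients will generally change.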
e.) Comment on the overall adequacy of the final model.
In the last step the Significance F value is 1.54078E-33, which is less than the 5% level of significance. Hence the model is significant overall, and the fit is unlikely to have arisen by chance. However, the R² value is only 0.1448, which means that the model explains only 14.48% of the variation in birthweight.
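The adequacy measures can be recomputed from an ANOVA table. The only full ANOVA table given in the question is the Step 6 output, so the sketch below uses those values; it therefore reproduces the Step 6 figures (R² of about 0.1535), not the 0.1448 quoted above, which comes from a different output.

```python
from math import sqrt

# Values copied from the Step 6 SUMMARY OUTPUT in the question.
ss_regression = 45152113.9
ss_residual = 249019970.4
n_obs = 1000
n_predictors = 6

ss_total = ss_regression + ss_residual
df_residual = n_obs - n_predictors - 1  # 993

# Share of the variation in birthweight explained by the model.
r_squared = ss_regression / ss_total
# Penalises R^2 for the number of predictors.
adj_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual
# Overall F statistic: mean square regression over mean square residual.
f_stat = (ss_regression / n_predictors) / (ss_residual / df_residual)
# Standard error of the regression (root mean square residual).
std_error = sqrt(ss_residual / df_residual)

print(f"R^2 = {r_squared:.6f}, adjusted R^2 = {adj_r_squared:.6f}")
print(f"F = {f_stat:.4f}, standard error = {std_error:.4f}")
```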
Please note that we can answer only four subparts of a question at a time, as per the answering guidelines.
However, to answer the full question, I will interpret one of the coefficients. The interpretation is analogous for all the independent variables.
a.) Interpret the regression coefficients of all six (6) independent variables in the model obtained in Step 6, and comment on the statistical significance of each.
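Consider the coefficient for Gestation, which is 10.22 in the output below (9.67 in the Step 6 output).
This means that for a 1-unit increase in Gestation, the predicted birthweight increases by 10.22 units, holding the other independent variables constant.
The same goes for the other independent variables: each coefficient is the change in predicted birthweight per 1-unit change in that variable, with the sign giving the direction, and a p-value below 0.05 indicates that the coefficient is statistically significant.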
                        Coefficients     Standard Error   t Stat          P-value        Lower 95%       Upper 95%
Intercept               -1606.187104     550.1922646      -2.919319677    0.003587021    -2685.857461    -526.516747
Gestation               10.2209845       1.348089051      7.581831849     7.7938E-14     7.575560562     12.86640844
Smoke                   -236.5983421     32.94098001      -7.182492509    1.3397E-12     -301.2401082    -171.9565761
Pre-pregnancy weight    2.073703427      1.848658668      1.121734078     0.262246153    -1.554013816    5.701420669
Height                  14.03805243      2.751262592      5.102403698     4.01551E-07    8.639109443     19.43699541
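As a sanity check on the interpretation in part (a), the t statistic and an approximate 95% confidence interval for Gestation can be recomputed from the coefficient and standard error in the table above. The sketch uses a normal approximation to the t critical value (an assumption that is reasonable with about 1000 observations), so the interval differs slightly from Excel's. The 7-unit increase is a hypothetical example.

```python
from statistics import NormalDist

# Gestation row of the coefficient table above.
coef = 10.2209845
std_err = 1.348089051

# t statistic: the coefficient measured in units of its standard error.
t_stat = coef / std_err

# Normal approximation to the two-sided 95% critical value (about 1.96).
z = NormalDist().inv_cdf(0.975)
ci_low, ci_high = coef - z * std_err, coef + z * std_err

# Change in predicted birthweight for a hypothetical 7-unit increase
# in Gestation, with the other predictors held fixed.
delta_gestation = 7
delta_birthweight = coef * delta_gestation

print(f"t = {t_stat:.4f}")
print(f"approx. 95% CI = ({ci_low:.3f}, {ci_high:.3f})")
print(f"predicted change for +{delta_gestation} units = {delta_birthweight:.2f}")
```

Because the interval lies entirely above zero, the Gestation coefficient is significantly different from zero at the 5% level, in agreement with its p-value.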