Consider the data of Bethel et al. (1985), discussed in Problem 19 in Chapter 14

Question

Consider the data of Bethel et al. (1985), discussed in Problem 19 in Chapter 14. Delete the three female subjects, leaving 16 observations. Use FEV1 as the response and AGE, WEIGHT, and HEIGHT as predictors.

a. Use the all possible regressions procedure to suggest a best model.

b. Consider a model with centered AGE, WEIGHT, HEIGHT, and their squares as predictors. Suggest a plausible forward chunkwise strategy for choosing a model, and implement it.

c. Use the all possible regressions procedure for the expanded model to choose a best model.

d. Compare results from parts (a), (b), and (c). What model seems most plausible? How do the data limit your conclusions?

SUBJECT  AGE  SEX  HEIGHT  WEIGHT  FEV
1        24   M    175     78.0    4.7
2        36   M    172     67.6    4.3
3        28   F    171     98.0    3.5
4        25   M    166     65.5    4.0
5        26   F    166     65.0    3.2
6        22   M    176     65.5    4.7
7        27   M    185     85.5    4.3
8        27   M    171     76.3    4.7
9        36   M    185     79.0    5.2
10       24   M    182     88.2    4.2
11       26   M    180     70.5    3.5
12       29   M    163     75.0    3.2
13       33   F    180     68.0    2.6
14       31   M    180     65.0    2.0
15       30   M    180     70.4    4.0
16       22   M    168     63.0    3.9
17       27   M    168     91.2    3.0
18       46   M    178     67.0    4.5
19       36   M    173     62.0    2.4

Explanation / Answer

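a)

We regressed FEV1 on AGE, HEIGHT, and WEIGHT together, and then on each of the three predictors individually. The four summary outputs follow.

None of the predictor coefficients has a significant p-value (the smallest is 0.336, for HEIGHT alone), so none of these models is significant. The three-predictor model has the largest R Square (0.0939), but R Square cannot decrease when predictors are added. Its Adjusted R Square (-0.13263) is the lowest of the four, while the HEIGHT-only model has the highest (-0.00043). The three two-predictor models that a complete all possible regressions comparison would include are not shown.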

SUMMARY OUTPUT (FEV1 on AGE, HEIGHT, WEIGHT)

Regression Statistics
Multiple R         0.30643
R Square           0.0939
Adjusted R Square  -0.13263
Standard Error     0.940622
Observations       16

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  -2.28401      6.31618         -0.36161  0.723929  -16.0458   11.47777
AGE        -0.00654      0.040067        -0.16324  0.873047  -0.09384   0.080757
HEIGHT     0.030437      0.038239        0.79595   0.441523  -0.05288   0.113753
WEIGHT     0.014467      0.028279        0.511572  0.618233  -0.04715   0.076081

SUMMARY OUTPUT (FEV1 on WEIGHT)

Regression Statistics
Multiple R         0.214198
R Square           0.045881
Adjusted R Square  -0.02227
Standard Error     0.893624
Observations       16

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  2.408318      1.846818        1.304037  0.213255  -1.55271   6.369348
WEIGHT     0.020575      0.025077        0.820498  0.425683  -0.03321   0.074359

SUMMARY OUTPUT (FEV1 on HEIGHT)

Regression Statistics
Multiple R         0.257419
R Square           0.066265
Adjusted R Square  -0.00043
Standard Error     0.884027
Observations       16

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  -1.98033      5.916077        -0.33474  0.742786  -14.6691   10.70839
HEIGHT     0.033649      0.033758        0.996767  0.335795  -0.03876   0.106054

SUMMARY OUTPUT (FEV1 on AGE)

Regression Statistics
Multiple R         0.03564
R Square           0.00127
Adjusted R Square  -0.07007
Standard Error     0.914276
Observations       16

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  4.05524       1.09385         3.707307  0.002343  1.709164   6.401316
AGE        -0.00488      0.036571        -0.13344  0.895745  -0.08332   0.073557
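For readers who want to reproduce part (a) outside a spreadsheet, the following is a minimal sketch of an all possible regressions run. It assumes the 16 male rows of the data table and uses NumPy least squares. The function names `fit_r2` and `all_possible_regressions` are illustrative, not part of the original answer, and ranking by adjusted R-square is one common criterion rather than the only one.

```python
from itertools import combinations

import numpy as np

# Male subjects only (SEX == "M") from the data table.
# Columns: AGE, HEIGHT, WEIGHT, FEV1.
DATA = np.array([
    [24, 175, 78.0, 4.7],
    [36, 172, 67.6, 4.3],
    [25, 166, 65.5, 4.0],
    [22, 176, 65.5, 4.7],
    [27, 185, 85.5, 4.3],
    [27, 171, 76.3, 4.7],
    [36, 185, 79.0, 5.2],
    [24, 182, 88.2, 4.2],
    [26, 180, 70.5, 3.5],
    [29, 163, 75.0, 3.2],
    [31, 180, 65.0, 2.0],
    [30, 180, 70.4, 4.0],
    [22, 168, 63.0, 3.9],
    [27, 168, 91.2, 3.0],
    [46, 178, 67.0, 4.5],
    [36, 173, 62.0, 2.4],
])
NAMES = ["AGE", "HEIGHT", "WEIGHT"]


def fit_r2(X, y):
    """Return (R^2, adjusted R^2) for an OLS fit of y on X plus an intercept."""
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sse = float(resid @ resid)
    sst = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - sse / sst
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return r2, adj


def all_possible_regressions(data=DATA, names=NAMES):
    """Fit every non-empty predictor subset; sort by adjusted R^2, best first."""
    y = data[:, -1]
    rows = []
    for k in range(1, len(names) + 1):
        for idx in combinations(range(len(names)), k):
            r2, adj = fit_r2(data[:, list(idx)], y)
            rows.append((tuple(names[i] for i in idx), r2, adj))
    return sorted(rows, key=lambda row: row[2], reverse=True)


if __name__ == "__main__":
    for subset, r2, adj in all_possible_regressions():
        print(f"{' + '.join(subset):<24} R2={r2:.5f}  adjR2={adj:.5f}")
```

With three predictors there are seven non-empty subsets, so the run also covers the two-predictor models.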

b)

SUMMARY OUTPUT (FEV1 on AGE^2, HEIGHT^2, WEIGHT^2)

Regression Statistics
Multiple R         0.290389
R Square           0.084326
Adjusted R Square  -0.14459
Standard Error     0.945578
Observations       16

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  0.830412      3.193595        0.260024  0.799253  -6.12783   7.788657
AGE^2      1.08E-06      0.000613        0.001763  0.998622  -0.00133   0.001337
HEIGHT^2   8.51E-05      0.00011         0.773636  0.454118  -0.00015   0.000325
WEIGHT^2   8.63E-05      0.000188        0.459652  0.653982  -0.00032   0.000496
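The question in part (b) asks for centered predictors and a forward chunkwise strategy, which the output above does not show directly. One plausible strategy, sketched here under stated assumptions, is to test the three centered linear terms as a first chunk and then the three squared terms as a second chunk given the linear terms, using a partial F statistic each time. The chunk order and the names `sse`, `partial_f`, and `forward_chunkwise` are assumptions for illustration.

```python
import numpy as np

# Male subjects only (SEX == "M") from the data table.
# Columns: AGE, HEIGHT, WEIGHT, FEV1.
DATA = np.array([
    [24, 175, 78.0, 4.7], [36, 172, 67.6, 4.3], [25, 166, 65.5, 4.0],
    [22, 176, 65.5, 4.7], [27, 185, 85.5, 4.3], [27, 171, 76.3, 4.7],
    [36, 185, 79.0, 5.2], [24, 182, 88.2, 4.2], [26, 180, 70.5, 3.5],
    [29, 163, 75.0, 3.2], [31, 180, 65.0, 2.0], [30, 180, 70.4, 4.0],
    [22, 168, 63.0, 3.9], [27, 168, 91.2, 3.0], [46, 178, 67.0, 4.5],
    [36, 173, 62.0, 2.4],
])


def sse(X, y):
    """Residual sum of squares for an OLS fit of y on X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return float(resid @ resid)


def partial_f(X_reduced, X_full, y):
    """Partial F statistic for the columns added in X_full over X_reduced.

    Returns (F, numerator df, denominator df).
    """
    q = X_full.shape[1] - X_reduced.shape[1]
    df_resid = len(y) - X_full.shape[1] - 1
    sse_reduced, sse_full = sse(X_reduced, y), sse(X_full, y)
    f_stat = ((sse_reduced - sse_full) / q) / (sse_full / df_resid)
    return f_stat, q, df_resid


def forward_chunkwise(data=DATA):
    """Test the linear chunk first, then the squared chunk given the linear one."""
    y = data[:, -1]
    centered = data[:, :3] - data[:, :3].mean(axis=0)  # centered AGE, HEIGHT, WEIGHT
    squares = centered ** 2                             # squares of centered predictors
    intercept_only = np.empty((len(y), 0))
    return {
        "linear": partial_f(intercept_only, centered, y),
        "squares | linear": partial_f(centered, np.column_stack([centered, squares]), y),
    }


if __name__ == "__main__":
    for chunk, (f_stat, df1, df2) in forward_chunkwise().items():
        print(f"{chunk:<18} F={f_stat:.4f} on ({df1}, {df2}) df")
```

Each F statistic would be compared with the critical value of the F distribution on the printed degrees of freedom. A chunk is kept only if it is significant, and the procedure stops at the first chunk that is not.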

c)

SUMMARY OUTPUT (FEV1 on AGE^2)

Regression Statistics
Multiple R         0.010064
R Square           0.000101
Adjusted R Square  -0.07132
Standard Error     0.914811
Observations       16

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  3.893831      0.545952        7.132179  5.07E-06  2.722879   5.064782
AGE^2      2.09E-05      0.000554        0.037659  0.970491  -0.00117   0.001209

SUMMARY OUTPUT (FEV1 on HEIGHT^2)

Regression Statistics
Multiple R         0.258033
R Square           0.066581
Adjusted R Square  -9.2E-05
Standard Error     0.883877
Observations       16

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  0.946108      2.976649        0.317843  0.755295  -5.43817   7.330386
HEIGHT^2   9.66E-05      9.67E-05        0.999311  0.334604  -0.00011   0.000304

SUMMARY OUTPUT (FEV1 on WEIGHT^2)

Regression Statistics
Multiple R         0.187632
R Square           0.035206
Adjusted R Square  -0.03371
Standard Error     0.898609
Observations       16

           Coefficients  Standard Error  t Stat    P-value   Lower 95%  Upper 95%
Intercept  3.270335      0.926107        3.53127   0.003322  1.284032   5.256638
WEIGHT^2   0.000118      0.000166        0.71475   0.486512  -0.00024   0.000474

d)

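Across parts (a), (b), and (c), none of the predictors has a significant p-value (every one exceeds 0.33), so none of the models is significant. By R Square alone, the model with AGE, HEIGHT, and WEIGHT ranks highest (0.0939), but R Square always favors the model with the most predictors. Every model has a negative Adjusted R Square, meaning its residual mean square exceeds the sample variance of FEV1, so no model clearly improves on the mean of FEV1. With only 16 observations, the confidence intervals for the coefficients are wide, and the data cannot support a firm choice among the models.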
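The Adjusted R Square values in the summary outputs can be checked by hand from R Square, the number of observations n, and the number of predictors p. As a worked example for the three-predictor model in part (a), with n = 16 and p = 3 (the symbols n and p are notation introduced here, not taken from the output):

```latex
\[
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
               = 1 - (1 - 0.0939)\,\frac{15}{12}
               = 1 - 1.1326
               \approx -0.1326
\]
```

This agrees with the reported value of -0.13263. The factor (n - 1)/(n - p - 1) grows with p, so each added predictor must raise R Square enough to offset it.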