Match each situation described to the most appropriate caution about the coeffic
Question
Match each situation described to the most appropriate caution about the coefficient of determination, r², for simple linear regression (SLR):
(a) An SLR model fit to data with a curved relationship has r² = 88%.
(b) The r² for an SLR model changes from 25% to 8% when a data point is removed.
(c) An SLR model for variables y and x has an r² of 70%, but in reality y and x have little to do with one another.
(d) The r² for an SLR model fit to a large dataset is statistically significant, but the estimated slope is not meaningfully different from 0.
Select one caution for each situation from the following list:
Association does not imply causation.
One data point can greatly affect the r² value.
A large r² value does not imply that the estimated regression line fits the data well.
Statistical significance does not imply practical significance.
Explanation / Answer
For convenience, the four cautions are numbered 1 to 4 in the order in which they are listed in the question. With this numbering, the correct matching is:
(a)(3)
(b)(2)
(c)(1)
(d)(4)
Explanations
(a)(3): SLR assumes a linear relationship, but a high r² can also arise when the true relationship is curved. The fitted line then misses the systematic curvature even though r² is large. [That is why the very first step in a correlation-regression study is to make a scatter plot, which immediately shows whether the relationship is linear or not.]
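To make point (a) concrete, here is a minimal sketch using made-up data (y = x^2 for x = 1 to 10, an assumption chosen only for illustration; the helper name fit_line is also hypothetical). It fits a least-squares line and prints r² together with the residuals.

```python
# Hypothetical curved data: y = x^2 for x = 1..10 (illustrative only).
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]


def fit_line(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept, sxy ** 2 / (sxx * syy)


slope, intercept, r2 = fit_line(xs, ys)

# Residual = observed y minus the value predicted by the fitted line.
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

print(f"r^2 = {r2:.3f}")
print("residuals:", [round(e, 1) for e in residuals])
```

For this data r² comes out at about 0.95, yet the residuals are positive at both ends and negative in the middle. That systematic pattern is exactly what a scatter plot or residual plot would reveal and what r² alone hides.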
(b)(2): r² is very sensitive to outliers, which can have a great impact on its value. Thus, removing a single extreme point can change the value of r² appreciably.
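The effect in (b) can be sketched with a small invented dataset (the numbers and the helper name r_squared are assumptions, not taken from the question): five loosely related points plus one extreme point far from the rest.

```python
def r_squared(xs, ys):
    """Coefficient of determination for a simple linear regression of y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy ** 2 / (sxx * syy)


# Hypothetical data: the last point (15, 15) lies far from the others.
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 3, 5, 3, 15]

r2_with_outlier = r_squared(xs, ys)
r2_without_outlier = r_squared(xs[:-1], ys[:-1])  # drop the extreme point

print(f"r^2 with the extreme point:    {r2_with_outlier:.2f}")
print(f"r^2 without the extreme point: {r2_without_outlier:.2f}")
```

With these numbers r² falls from about 0.93 to about 0.17 when the single extreme point is dropped. In the question the change happens to go from 25% to 8%, but the mechanism is the same.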
(c)(1): Two variables, each of which is closely related to a third variable, can show a very high r² even when the two variables have nothing to do with each other. For example, per capita income in an economy may fall over a period of time because of various economic factors, while the literacy percentage in the same country improves considerably over the same period because of governmental policies. The two series would show a very high negative r. That does not mean we should infer that literacy leads to poverty; that would be a ridiculous abuse of Statistics!
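The income and literacy example can be sketched with two invented series that both depend only on time (all numbers, the small "wiggle" terms, and the helper name pearson_r are hypothetical). Neither series is computed from the other, yet they correlate strongly.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)


years = list(range(10))

# Hypothetical series: each is a time trend plus a small unrelated wiggle.
income_wiggle = [0.5, -0.5] * 5
literacy_wiggle = [-0.4, 0.2, 0.3, -0.1, 0.0] * 2
income = [100 - 2.0 * t + w for t, w in zip(years, income_wiggle)]
literacy = [50 + 1.5 * t + w for t, w in zip(years, literacy_wiggle)]

r = pearson_r(income, literacy)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}")
```

The strong negative r here comes entirely from the shared dependence on time, which plays the role of the third variable in the explanation above.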
(d)(4): Statistical significance depends on factors such as sample size and variability. With a large enough sample, even a very weak relationship can be statistically significant, so a statistically significant result may not be practically significant. For example, an r of 0.4 can quite easily turn out to be statistically significant, yet r = 0.4 => r² = 0.16, which implies that only 16% of the variation in the dependent variable is explained by the independent variable.
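A rough sketch of why sample size matters: the usual test statistic for a correlation is t = r * sqrt(n - 2) / sqrt(1 - r²), with n - 2 degrees of freedom. The sample sizes and r values below are assumptions chosen for illustration, and the helper name t_statistic is hypothetical.

```python
import math


def t_statistic(r, n):
    """t statistic for testing zero correlation, given sample r and size n."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


# Case 1 (hypothetical): r = 0.4 from n = 30 observations.
# The two-sided 5% critical value for 28 degrees of freedom is about 2.05.
t_small_sample = t_statistic(0.4, 30)

# Case 2 (hypothetical): r = 0.03 from n = 10,000 observations.
# For a sample this large the 5% critical value is about 1.96.
t_large_sample = t_statistic(0.03, 10_000)

print(f"n = 30,     r = 0.40: t = {t_small_sample:.2f}, r^2 = {0.4 ** 2:.4f}")
print(f"n = 10000,  r = 0.03: t = {t_large_sample:.2f}, r^2 = {0.03 ** 2:.4f}")
```

Both statistics exceed their critical values, so both results are statistically significant at the 5% level. In the second case, however, x explains less than 0.1% of the variation in y, which is the situation described in (d).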