
CW6.1 We have looked several times at the Olympic data, first introduced in Quest


Question

CW6.1 We have looked several times at the Olympic data, first introduced in Question Sheet 3. This data set contains winning high jump heights, long jump distances and discus throws at a sample of ten Olympics. The aim was to see which of long jump (x1i) and discus throw (x2i) could be used to predict high jump heights (Yi). Consider the three possible models, Models 7, 8 and 9 [model equations not reproduced].

(a) Define what is meant by the 'sum of squares' of a fitted model. The sums of squares for these models are 74.18, 52.36 and 50.10 respectively.

(b) Use the F-test to compare (i) Models 7 and 9 [4] (ii) Models 8 and 9 [4]. Which is your preferred model in each case?

(c) Why is this test not appropriate for a comparison of Models 7 and 8?

Explanation / Answer

a. The sum of squares of a fitted model is the residual sum of squares: the sum of the squared differences between the observed values of the response variable and the values fitted by the model. In a standard regression model, for example yi = a + b1x1i + b2x2i + ... + ei, where yi is the i-th observation of the response variable, xji is the i-th observation of the j-th explanatory variable, a and bj are coefficients, i indexes the observations from 1 to n, and ei is the i-th value of the error term, it is the sum over i of (yi - fitted yi)^2. For a given data set, the smaller the sum of squares, the more closely the model fits the data, although adding an explanatory variable can never increase it, so a smaller value alone does not justify a larger model.
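Taking the sum of squares of a fitted model to mean the residual sum of squares (observed minus fitted, squared and summed), which is the reading the F-test in part (b) relies on, the definition can be sketched in a few lines of Python. The function name and the three toy observations are made up for illustration; they are not the Olympic data.

```python
def residual_sum_of_squares(observed, fitted):
    """Sum of squared differences between observed and fitted responses."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    return sum((y - f) ** 2 for y, f in zip(observed, fitted))


# Toy numbers, purely illustrative (NOT the Olympic data).
observed = [2.0, 3.0, 5.0]
fitted = [2.5, 3.0, 4.5]

# Residuals are -0.5, 0.0 and 0.5, so the sum of squares is 0.25 + 0 + 0.25.
print(residual_sum_of_squares(observed, fitted))  # 0.5
```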

b. The F-test for comparing two nested models compares the relative reduction in the sum of squares with the relative reduction in residual degrees of freedom when moving from a simpler model to a more complicated one.

If the two relative reductions are about the same, the extra term is not earning its place and the simpler model is preferred. If the relative reduction in the sum of squares is much greater than the relative reduction in degrees of freedom, the more complicated model is preferred.

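F = [(SS1 - SS2) / (DF1 - DF2)] / (SS2 / DF2)

where SS1 and DF1 are the sum of squares and residual degrees of freedom of the simpler model, and SS2 and DF2 are those of the more complicated model.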

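The residual degrees of freedom of a model equal the number of observations, which in this case is 10, minus the number of coefficients estimated in that model. Using the sums of squares given in the question and these degrees of freedom, the F value can be calculated for each pair and compared with the critical value of the F distribution on (DF1 - DF2, DF2) degrees of freedom. If F exceeds the critical value, the more complicated model is preferred; otherwise the simpler model is preferred.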
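The calculation for part (b) can be sketched as below. The model equations are not reproduced in the question, so this rests on a stated assumption: Models 7 and 8 each have an intercept and one explanatory variable (10 - 2 = 8 residual degrees of freedom), and Model 9 has an intercept and both explanatory variables (10 - 3 = 7). The function name is illustrative, and 5.59 is the tabulated upper 5% point of the F distribution on (1, 7) degrees of freedom.

```python
def f_statistic(ss_simple, df_simple, ss_complex, df_complex):
    """F statistic for comparing a simpler model with a nested, larger one."""
    if df_simple <= df_complex:
        raise ValueError("the simpler model must have more residual degrees of freedom")
    numerator = (ss_simple - ss_complex) / (df_simple - df_complex)
    denominator = ss_complex / df_complex
    return numerator / denominator


N_OBS = 10
# Assumption: intercept + one slope in Models 7 and 8, intercept + two slopes in Model 9.
DF_ONE_PREDICTOR = N_OBS - 2   # 8
DF_TWO_PREDICTORS = N_OBS - 3  # 7

# Sums of squares quoted in the question for Models 7, 8 and 9.
SS = {7: 74.18, 8: 52.36, 9: 50.10}

F_CRIT_5PCT = 5.59  # tabulated upper 5% point of F(1, 7)

f_7_vs_9 = f_statistic(SS[7], DF_ONE_PREDICTOR, SS[9], DF_TWO_PREDICTORS)
f_8_vs_9 = f_statistic(SS[8], DF_ONE_PREDICTOR, SS[9], DF_TWO_PREDICTORS)

print(round(f_7_vs_9, 2), f_7_vs_9 > F_CRIT_5PCT)  # 3.36 False
print(round(f_8_vs_9, 2), f_8_vs_9 > F_CRIT_5PCT)  # 0.32 False
```

Under these assumptions neither F value exceeds 5.59, so at the 5% level the simpler model would be preferred in each comparison: Model 7 over Model 9, and Model 8 over Model 9. If the actual models have a different number of coefficients, the degrees of freedom and the critical value change accordingly.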

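c. The F-test compares a simpler model with a more complicated model that contains it as a special case (nested models). It is not appropriate for comparing Models 7 and 8 because both are of the same simplicity: neither is a special case of the other, and with equal degrees of freedom the difference DF1 - DF2 is zero, so the F statistic is undefined.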