Simpson's Paradox, Derek -vs- David: Averaging across categories can be misleading
ID: 3353063 • Letter: S
Question
Simpson's Paradox, Derek -vs- David: Averaging across categories can be misleading but this can be resolved with weighted averages.
In baseball, the batting average is defined as the number of hits divided by the number of times at bat. Below is a table of batting averages for two players in two different years.
The number in parentheses gives the number of times at bat for each player for each year.
Batting Average (# of times at bat)

          1995                        1996
Derek     0.249 (45 times at bat)     0.313 (575 times at bat)
David     0.252 (415 times at bat)    0.322 (145 times at bat)

(a) What are the averages of the two batting averages for Derek (xDerek) and David (xDavid)?
Do NOT use a weighted average; just take the mean of the 1995 and 1996 batting averages. Round your answers to 3 decimal places.
xDerek =
xDavid =
(b) Who had the higher average batting average using the non-weighted average?
(c) Using a weighted average, calculate the average batting averages for Derek (xDerek) and David (xDavid).
Round your answers to 3 decimal places.
xDerek=
xDavid=
(d) Who had the higher average batting average using the weighted average?
(e) What caused the discrepancy in average batting averages?
Derek's higher average occurred with more times at bat (575).
David's higher average occurred with fewer times at bat (145).
Derek's lower batting average was based on a small number of times at bat (45).
All of these contributed to the discrepancy.
Explanation / Answer
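Data, with total times at bat in the last column:

          1995 avg   1995 at bats   1996 avg   1996 at bats   Total at bats
Derek     0.249      45             0.313      575            620
David     0.252      415            0.322      145            560

(a) Non-weighted average: xDerek = (0.249 + 0.313)/2 = 0.281, xDavid = (0.252 + 0.322)/2 = 0.287
(b) David has the higher average using the non-weighted average.
(c) Weighted average: xDerek = (0.249*45 + 0.313*575)/620 = 0.308355, which rounds to 0.308. xDavid = (0.252*415 + 0.322*145)/560 = 0.270125, which rounds to 0.270.
(d) Derek has the higher average using the weighted average.
(e) All of these contributed to the discrepancy. Derek's weighted average is dominated by his 1996 season (0.313 over 575 times at bat), while David's is dominated by his 1995 season (0.252 over 415 times at bat), so the ranking reverses once times at bat are taken into account.

Spreadsheet formulas used (as the formulas imply, batting averages are in columns B and E, times at bat in columns C and F, totals in column G; row 4 is Derek, row 5 is David):
Total at bats: Derek =SUM(C4,F4), David =SUM(C5,F5)
(a) Derek =AVERAGE(B4,E4), David =AVERAGE(B5,E5)
(c) Derek =(B4*C4+E4*F4)/G4, David =(B5*C5+E5*F5)/G5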
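For anyone who would rather check the arithmetic outside a spreadsheet, here is a minimal Python sketch of the same two calculations. The names `stats`, `unweighted_mean`, and `weighted_mean` are made up for this illustration and are not part of the original answer. The only inputs are the batting averages and times at bat from the table in the question.

```python
# Batting averages and times at bat copied from the table in the question.
# Layout (an assumption for this sketch): player -> list of
# (batting_average, times_at_bat), one tuple per year (1995, then 1996).
stats = {
    "Derek": [(0.249, 45), (0.313, 575)],
    "David": [(0.252, 415), (0.322, 145)],
}


def unweighted_mean(seasons):
    """Plain mean of the yearly batting averages, ignoring times at bat."""
    return sum(avg for avg, _ in seasons) / len(seasons)


def weighted_mean(seasons):
    """Mean of the yearly batting averages, weighted by times at bat."""
    total_at_bats = sum(n for _, n in seasons)
    return sum(avg * n for avg, n in seasons) / total_at_bats


for player, seasons in stats.items():
    print(
        f"{player}: unweighted = {unweighted_mean(seasons):.3f}, "
        f"weighted = {weighted_mean(seasons):.3f}"
    )
```

The weighted version multiplies each yearly average by its times at bat before dividing by the total, which is the same thing the `=(B4*C4+E4*F4)/G4` formula does. Running it should show David ahead on the unweighted mean and Derek ahead on the weighted mean, which is the reversal the question is about.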