In 1972-74 a one-in-six survey of the electoral roll was conducted in Whickham, U.K.
Question
In 1972-74 a one-in-six survey of the electoral roll was conducted in Whickham, U.K. Twenty years later a follow-up survey was conducted. The following table reports data on 1314 women on their smoking status and age at the first survey, and whether they were dead or alive at the follow-up survey.
37. D. R. Appleton, J. M. French, and M. P. J. Vanderpump (1997), "Ignoring a covariate: An example of Simpson's paradox," The American Statistician, 50, pp. 340-341.
(a) Aggregate the data over age groups and calculate the overall death rates for smokers and nonsmokers. Which group has the higher death rate?
(b) Now calculate the adjusted (for age) death rates for smokers and nonsmokers. Which group has the higher death rate?
(c) What does the fact that there were far fewer smokers in higher age groups suggest about the effect of smoking?
This exercise illustrates what can go wrong if a covariate (age) is correlated with the risk factor (smoking) and outcome variable (survival).
Explanation / Answer
(a) R-Script: the script is listed after the output for part (c).
Output:
> print("Death rate for smokers is")
[1] "Death rate for smokers is"
> sum(deadsmokers)/sum(deadsmokers+livesmokers)
[1] 0.2388316
> print("Death rate for non smokers is")
[1] "Death rate for non smokers is"
> sum(deadnonsmokers)/sum(deadnonsmokers+livenonsmokers)
[1] 0.3142077
Aggregated over age groups, nonsmokers have the higher death rate: about 31.4% versus 23.9% for smokers.
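(b) R-Script: not listed separately; the commands appear in the output.
Output:
> print("Adjusted death rate for smokers is")
[1] "Adjusted death rate for smokers is"
> sum(deadsmokers/(deadsmokers+livesmokers))
[1] 2.645724
> print("Adjusted death rate for non smokers is")
[1] "Adjusted death rate for non smokers is"
> sum(deadnonsmokers/(deadnonsmokers+livenonsmokers))
[1] 2.373198
These values are sums of the seven age-specific death rates, so they exceed 1 and are not rates themselves. Dividing each by 7 gives the equally weighted average over the age groups: about 0.378 for smokers and 0.339 for nonsmokers. Adjusted for age in this way, smokers have the higher death rate, the reverse of the aggregated comparison in (a).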
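A more conventional age adjustment is direct standardization, which weights each age-specific rate by that age group's share of a standard population. The sketch below is one way to do it, not the method used in the answer above. It assumes the combined sample of 1314 women as the standard population, and it assumes the deadsmokers counts, which the script does not list; they are reconstructed here from the printed age-specific rates and livesmokers. The names rate_smokers, rate_nonsmokers, n_age, weights, adj_smokers and adj_nonsmokers are illustrative.

```r
# Counts per age group, youngest to oldest.
# deadsmokers is an assumption: reconstructed from the printed
# age-specific rates together with livesmokers.
deadsmokers    <- c(2, 3, 14, 27, 51, 29, 13)
livesmokers    <- c(53, 121, 95, 103, 64, 7, 0)
deadnonsmokers <- c(1, 5, 7, 12, 40, 101, 64)
livenonsmokers <- c(61, 152, 114, 66, 81, 28, 0)

# Age-specific death rates within each smoking group
rate_smokers    <- deadsmokers / (deadsmokers + livesmokers)
rate_nonsmokers <- deadnonsmokers / (deadnonsmokers + livenonsmokers)

# Standard population (assumed choice): all women in the survey, by age group
n_age   <- deadsmokers + livesmokers + deadnonsmokers + livenonsmokers
weights <- n_age / sum(n_age)

# Directly standardized (age-adjusted) death rates
adj_smokers    <- sum(weights * rate_smokers)
adj_nonsmokers <- sum(weights * rate_nonsmokers)
print(round(c(smokers = adj_smokers, nonsmokers = adj_nonsmokers), 3))
```

Under these assumptions the standardized rates come out at about 0.303 for smokers and 0.259 for nonsmokers, so smokers again have the higher age-adjusted death rate. A different standard population would change the numbers but is unlikely to change the direction, since smokers have the higher rate in most age groups.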
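(c) In almost every age group the death rate of smokers is higher than that of nonsmokers; the exceptions are the second group (about 0.024 versus 0.032) and the oldest group, where both rates are 1. The scarcity of smokers in the higher age groups at the first survey is consistent with smokers being less likely to survive to those ages, which suggests that smoking is harmful even though the aggregated rates in (a) point the other way.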
Output:
> print("Age-specific death rates for smokers are")
[1] "Age-specific death rates for smokers are"
> (deadsmokers/(deadsmokers+livesmokers))
[1] 0.03636364 0.02419355 0.12844037
[4] 0.20769231 0.44347826 0.80555556
[7] 1.00000000
> print("Age-specific death rates for non smokers are")
[1] "Age-specific death rates for non smokers are"
> (deadnonsmokers/(deadnonsmokers+livenonsmokers))
[1] 0.01612903 0.03184713 0.05785124
[4] 0.15384615 0.33057851 0.78294574
[7] 1.00000000
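The premise of part (c), that there were far fewer smokers in the higher age groups, can be checked from the same counts. The sketch below computes the share of smokers within each age group. As before, deadsmokers is an assumption, reconstructed from the printed rates and livesmokers because the script does not list it, and the names n_smokers, n_nonsmokers and share_smokers are illustrative.

```r
# Counts per age group, youngest to oldest.
# deadsmokers is an assumption: reconstructed from the printed
# age-specific rates together with livesmokers.
deadsmokers    <- c(2, 3, 14, 27, 51, 29, 13)
livesmokers    <- c(53, 121, 95, 103, 64, 7, 0)
deadnonsmokers <- c(1, 5, 7, 12, 40, 101, 64)
livenonsmokers <- c(61, 152, 114, 66, 81, 28, 0)

# Number of women in each age group, by smoking status
n_smokers    <- deadsmokers + livesmokers
n_nonsmokers <- deadnonsmokers + livenonsmokers

# Share of smokers within each age group
share_smokers <- n_smokers / (n_smokers + n_nonsmokers)
print(round(share_smokers, 3))
```

With these counts the share of smokers lies between roughly 44% and 63% in the five youngest groups and falls to about 22% and 17% in the two oldest. Nonsmokers are therefore concentrated in the age groups where nearly everyone had died by the follow-up, which is what inflates their aggregated death rate in (a).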
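R-Script for part (a):
# Counts per age group, youngest to oldest.
# deadsmokers is missing from the original listing; the values below are
# reconstructed from the age-specific rates in the output and livesmokers.
deadsmokers=c(2,3,14,27,51,29,13)
livesmokers=c(53,121,95,103,64,7,0)
deadnonsmokers=c(1,5,7,12,40,101,64)
livenonsmokers=c(61,152,114,66,81,28,0)
# Overall (aggregated) death rate = total deaths / total women in the group
print("Death rate for smokers is")
sum(deadsmokers)/sum(deadsmokers+livesmokers)
print("Death rate for non smokers is")
sum(deadnonsmokers)/sum(deadnonsmokers+livenonsmokers)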