Question

Logistic regression. The data file harrell.csv contains data on 40 people. The variables are
age in years, gender, a categorical variable with two levels, female and male, and response, a 0/1
indicator variable for whether the person responded to a medical treatment (1 means that the person
responded). Fit a logistic regression model with response as the dependent variable and age and
gender as the independent variables.
(a) How does the probability of response change for a 42-year-old male compared to a 52-year-old
male?
(b) Which gender has a higher probability of response to the medical treatment?
(c) What is the effect on the odds of response for a one-year increase in age?
(d) Make a plot of the probability of response as a function of age, with one curve for females and
one curve for males.

age gender response
37 female 0
39 female 0
39 female 0
42 female 0
47 female 0
48 female 0
48 female 1
52 female 0
53 female 0
55 female 0
56 female 0
57 female 0
58 female 0
58 female 1
60 female 0
64 female 0
65 female 1
68 female 1
68 female 1
70 female 1
34 male 1
38 male 1
40 male 0
40 male 0
41 male 0
43 male 1
43 male 1
43 male 1
44 male 0
46 male 0
47 male 1
48 male 1
48 male 1
50 male 0
50 male 1
52 male 1
55 male 1
61 male 1
61 male 1
61 male 1

Explanation / Answer

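RStudio code

# Read the data; use a forward slash, since "\d" is not a valid escape in an R string
data=read.table("D:/data.txt",header=TRUE)
data
X1=data[,1]              # age in years
X1
X2=as.factor(data[,2])   # gender, coded 0 = female, 1 = male
X2
Y=data[,3]               # response: 1 = responded, 0 = did not respond
Y
# Logistic regression of response on age and gender
lm.fit=glm(Y~X1+X2,family=binomial(link="logit"))
lm.fit
beta=lm.fit$coefficients
beta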
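
For readers who do not have data.txt, the same model can probably be refitted straight from the 40 rows listed in the question. The following is a minimal sketch, not the original poster's code: the object name harrell and the column names age, gender and response are assumptions made for illustration, and female is set as the reference level so that the gender coefficient is reported for males.

```r
# Rebuild the data set from the values listed in the question.
# Object and column names are assumptions made for this illustration.
harrell <- data.frame(
  age = c(37, 39, 39, 42, 47, 48, 48, 52, 53, 55,
          56, 57, 58, 58, 60, 64, 65, 68, 68, 70,
          34, 38, 40, 40, 41, 43, 43, 43, 44, 46,
          47, 48, 48, 50, 50, 52, 55, 61, 61, 61),
  gender = factor(rep(c("female", "male"), each = 20),
                  levels = c("female", "male")),
  response = c(0, 0, 0, 0, 0, 0, 1, 0, 0, 0,
               0, 0, 0, 1, 0, 0, 1, 1, 1, 1,
               1, 1, 0, 0, 0, 1, 1, 1, 0, 0,
               1, 1, 1, 0, 1, 1, 1, 1, 1, 1)
)

# Logistic regression of response on age and gender (female = reference level)
fit <- glm(response ~ age + gender,
           family = binomial(link = "logit"), data = harrell)
print(coef(fit))
```

The printed coefficients should agree, up to rounding, with the values reported in the R output; the gender term appears as gendermale rather than X21 because of the assumed column name.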

R output

> data=read.table("D:/data.txt",header=TRUE)

> data
   age gender response
1   37      0        0
2   39      0        0
3   39      0        0
4   42      0        0
5   47      0        0
6   48      0        0
7   48      0        1
8   52      0        0
9   53      0        0
10  55      0        0
11  56      0        0
12  57      0        0
13  58      0        0
14  58      0        1
15  60      0        0
16  64      0        0
17  65      0        1
18  68      0        1
19  68      0        1
20  70      0        1
21  34      1        1
22  38      1        1
23  40      1        0
24  40      1        0
25  41      1        0
26  43      1        1
27  43      1        1
28  43      1        1
29  44      1        0
30  46      1        0
31  47      1        1
32  48      1        1
33  48      1        1
34  50      1        0
35  50      1        1
36  52      1        1
37  55      1        1
38  61      1        1
39  61      1        1
40  61      1        1

> X1=data[,1]
> X1
 [1] 37 39 39 42 47 48 48 52 53 55 56 57 58 58 60 64 65 68 68 70 34 38 40 40 41 43 43
[28] 43 44 46 47 48 48 50 50 52 55 61 61 61
> X2=as.factor(data[,2])
> X2
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Levels: 0 1
> Y=data[,3]
> lm.fit=glm(Y~X1+X2,family=binomial(link="logit"))
> lm.fit

Call: glm(formula = Y ~ X1 + X2, family = binomial(link = "logit"))

Coefficients:
(Intercept)           X1          X21
    -9.8302       0.1578       3.4849

Degrees of Freedom: 39 Total (i.e. Null); 37 Residual
Null Deviance: 55.45
Residual Deviance: 38.9 AIC: 44.9
> beta=lm.fit$coefficients
> beta
(Intercept)          X1         X21
 -9.8301685   0.1578414   3.4849454

(a)

The fitted model is

log(p/(1-p)) = -9.8301685 + 0.1578414*X1 + 3.4849454*X2 = eta

where p is the probability of response, X1 is age in years, and X2 = 1 if male and 0 if female.

Solving for p:

p/(1-p) = exp(eta)

(1-p)/p = 1/exp(eta)

1/p = (1/exp(eta)) + 1

1/p = (1+exp(eta))/exp(eta)

p = exp(eta)/(1+exp(eta))

That is,

p = exp(-9.8301685 + 0.1578414*X1 + 3.4849454*X2)/(1 + exp(-9.8301685 + 0.1578414*X1 + 3.4849454*X2))

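Hence, for a male (X2 = 1):

at age 42, eta = 0.2841157 and p = 0.5705549

at age 52, eta = 1.8625297 and p = 0.865591

The predicted probability of response is therefore about 0.295 higher for a 52-year-old male than for a 42-year-old male. Because the age coefficient is positive, the probability of response increases with age.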
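
The two probabilities for part (a) can be checked numerically. The sketch below hard-codes the coefficients from the R output; the helper name prob_response is hypothetical, and plogis is base R's inverse logit, which computes exp(eta)/(1+exp(eta)).

```r
# Coefficients copied from the R output
b0 <- -9.8301685   # intercept
b1 <- 0.1578414    # age (X1)
b2 <- 3.4849454    # gender (X2 = 1 for male, 0 for female)

# Hypothetical helper: predicted probability of response
prob_response <- function(age, male) {
  eta <- b0 + b1 * age + b2 * male
  plogis(eta)      # same as exp(eta) / (1 + exp(eta))
}

p42 <- prob_response(42, male = 1)
p52 <- prob_response(52, male = 1)
print(c(age42 = p42, age52 = p52, difference = p52 - p42))
```

The difference p52 - p42 is the change in probability asked for in part (a).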

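(b)

p = exp(eta)/(1+exp(eta)), with eta = -9.8301685 + 0.1578414*X1 + 3.4849454*X2

where X2 = 1 if male and 0 if female.

The coefficient of X2 (3.4849454) is positive, so eta is larger for a male than for a female of the same age, and p increases with eta. Hence males have a higher predicted probability of response than females (the reference level) at every age.

In terms of odds, the odds of response for a male are exp(3.4849454), about 32.6 times the odds for a female of the same age.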
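
As a rough numerical check of part (b), the fitted probabilities for the two genders can be compared over a grid of ages. This is a sketch using the coefficients from the R output; the age grid 34 to 70 is simply the range observed in the data, and the variable names are illustrative.

```r
# Coefficients copied from the R output
b0 <- -9.8301685
b1 <- 0.1578414
b2 <- 3.4849454   # added to eta for males (X2 = 1)

ages <- 34:70                             # ages observed in the data
p_female <- plogis(b0 + b1 * ages)        # X2 = 0
p_male   <- plogis(b0 + b1 * ages + b2)   # X2 = 1

comparison <- data.frame(age = ages, female = p_female, male = p_male)
print(comparison[comparison$age %in% c(40, 50, 60), ], digits = 3)

# Males are higher at every age because b2 > 0
print(all(p_male > p_female))

# Odds ratio, male versus female, at any fixed age
odds_ratio_gender <- exp(b2)
print(odds_ratio_gender)
```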

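(c)

odds = p/(1-p) = exp(-9.8301685 + 0.1578414*X1 + 3.4849454*X2) = exp(eta)

A one-year increase in age adds 0.1578414 to eta, so it multiplies the odds by exp(0.1578414), about 1.171, for either gender.

The odds ratio for one additional year of age is therefore about 1.171: the odds of response increase by about 17.1% per year.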
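
The odds ratio in part (c) can be verified directly from the coefficients. In this sketch the function name odds is hypothetical; it shows that the ratio of odds one year apart equals exp(b1) whatever the starting age or gender.

```r
# Coefficients copied from the R output
b0 <- -9.8301685
b1 <- 0.1578414
b2 <- 3.4849454

# Hypothetical helper: odds of response, p / (1 - p) = exp(eta)
odds <- function(age, male) exp(b0 + b1 * age + b2 * male)

or_age  <- exp(b1)                       # odds ratio per extra year of age
ratio_m <- odds(53, 1) / odds(52, 1)     # male, 52 -> 53
ratio_f <- odds(43, 0) / odds(42, 0)     # female, 42 -> 43

print(c(or_age = or_age, ratio_m = ratio_m, ratio_f = ratio_f))
print(100 * (or_age - 1))                # percentage increase in odds per year
```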

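(d)

Predict the response probabilities with the predict command (type="response") over a range of ages, separately for X2 = 0 (female) and X2 = 1 (male), and plot the two curves against age on the same graph.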
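
One possible way to draw the plot for part (d) is sketched below. It computes the curves from the coefficients in the R output rather than from the fitted object, so it runs on its own; the age grid 30 to 75, the line types and the labels are assumptions chosen for illustration.

```r
# Coefficients copied from the R output
b0 <- -9.8301685
b1 <- 0.1578414
b2 <- 3.4849454

ages <- seq(30, 75, by = 0.5)             # plotting grid (assumed range)
p_female <- plogis(b0 + b1 * ages)        # X2 = 0
p_male   <- plogis(b0 + b1 * ages + b2)   # X2 = 1

# One curve per gender on the same axes
plot(ages, p_female, type = "l", lty = 2, ylim = c(0, 1),
     xlab = "Age (years)", ylab = "Probability of response",
     main = "Fitted probability of response by age and gender")
lines(ages, p_male, lty = 1)
legend("bottomright", legend = c("male", "female"), lty = c(1, 2))
```

With the fitted object available, the male curve could instead be obtained with predict(lm.fit, newdata = data.frame(X1 = ages, X2 = factor(rep(1, length(ages)), levels = c(0, 1))), type = "response"), and the female curve likewise with rep(0, length(ages)).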