ID: 3226855 • Letter: A

Question

I only need question 4 answered; question 3 is included only for reference.
Use the R programming software to answer the question. The teengamb dataset is found in the downloadable 'faraway' package for R. You must install the package to access the dataset. Please answer the question and include the code used to find the answers.
Please do not respond with 'data not found'; as stated, the dataset is in the downloadable faraway package for R.
Thanks in advance!

3. Use the prostate data with lpsa as the response and the other variables as predictors. Implement the following variable selection criteria to determine the "best" model using the forward selection procedure:
AIC
BIC
Adjusted R^2
Mallows' Cp

4. Using the teengamb dataset with gamble as the response and the other variables as predictors, repeat the work of the previous question, but use backward elimination.

Explanation / Answer

All-subsets selection using Mallows' Cp (leaps() searches every subset rather than stepping forward)

library(faraway)

## Warning: package 'faraway' was built under R version 3.2.5

library(leaps)

## Warning: package 'leaps' was built under R version 3.2.5

leaps( x=teengamb[,1:4], y=teengamb[,5], names=names(teengamb)[1:4], method="Cp")

## $which
##     sex status income verbal
## 1 FALSE FALSE   TRUE FALSE
## 1 TRUE FALSE FALSE FALSE
## 1 FALSE FALSE FALSE   TRUE
## 1 FALSE   TRUE FALSE FALSE
## 2 TRUE FALSE   TRUE FALSE
## 2 FALSE   TRUE   TRUE FALSE
## 2 FALSE FALSE   TRUE   TRUE
## 2 TRUE   TRUE FALSE FALSE
## 2 TRUE FALSE FALSE   TRUE
## 2 FALSE   TRUE FALSE   TRUE
## 3 TRUE FALSE   TRUE   TRUE
## 3 TRUE   TRUE   TRUE FALSE
## 3 FALSE   TRUE   TRUE   TRUE
## 3 TRUE   TRUE FALSE   TRUE
## 4 TRUE   TRUE   TRUE   TRUE
##
## $label
## [1] "(Intercept)" "sex"         "status"      "income"      "verbal"    
##
## $size
## [1] 2 2 2 2 3 3 3 3 3 3 4 4 4 4 5
##
## $Cp
## [1] 11.401283 30.984606 41.445676 45.517426 3.248323 12.003293 12.276400
## [8] 25.967108 26.743051 42.897591 3.034526 4.856329 10.256053 26.416920
## [15] 5.000000

The first part of the output, denoted $which, lists the 15 possible sub-models in 15 rows (every non-empty subset of the four predictors). The first column indicates the number of predictors in the sub-model for each row. The variables in each sub-model are those designated TRUE in each row.
The next two parts add little: $label names the terms, and $size gives p (the number of parameters in the model, including the intercept) for each sub-model. The last part, designated $Cp, gives the value of the Mallows' Cp criterion for each sub-model, in the same order. A good sub-model has a small Cp that is close to or below p. For the full model, we always have Cp = p, so the idea is to find a suitable reduced model, if possible. Here the best reduced model is the eleventh one (sex, income, verbal), for which Cp = 3.034526 and p = 4. The fifth model (sex, income), with Cp = 3.248323 and p = 3, is a close competitor.
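The reading of the $Cp vector described above can also be done programmatically. The following is a minimal sketch, not part of the original answer; the object names (fit_cp, cp_table, reduced, best_by_cp) are illustrative, and it assumes the faraway and leaps packages are installed.

```r
# Sketch: rank the leaps() sub-models by Mallows' Cp.
# Assumes the 'faraway' and 'leaps' packages are installed.
library(faraway)
library(leaps)

data(teengamb)
predictors <- setdiff(names(teengamb), "gamble")

fit_cp <- leaps(x = teengamb[, predictors], y = teengamb$gamble,
                names = predictors, method = "Cp")

# One row per sub-model: its terms, parameter count p, Cp, and |Cp - p|.
cp_table <- data.frame(
  model = vapply(seq_len(nrow(fit_cp$which)),
                 function(i) paste(predictors[fit_cp$which[i, ]], collapse = " + "),
                 character(1)),
  p     = fit_cp$size,
  Cp    = fit_cp$Cp,
  gap   = abs(fit_cp$Cp - fit_cp$size)
)

# Drop the full model (Cp = p by construction) before ranking.
reduced <- cp_table[cp_table$p < max(cp_table$p), ]
best_by_cp <- reduced[which.min(reduced$Cp), ]

print(reduced[order(reduced$Cp), ])
print(best_by_cp)
```

The gap column is included so that the "Cp close to p" rule and the "smallest Cp" rule can be compared side by side; they need not pick the same sub-model.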

All-subsets selection using adjusted R-squared

leaps( x=teengamb[,1:4], y=teengamb[,5], names=names(teengamb)[1:4], method="adjr2")

## $which
##     sex status income verbal
## 1 FALSE FALSE   TRUE FALSE
## 1 TRUE FALSE FALSE FALSE
## 1 FALSE FALSE FALSE   TRUE
## 1 FALSE   TRUE FALSE FALSE
## 2 TRUE FALSE   TRUE FALSE
## 2 FALSE   TRUE   TRUE FALSE
## 2 FALSE FALSE   TRUE   TRUE
## 2 TRUE   TRUE FALSE FALSE
## 2 TRUE FALSE FALSE   TRUE
## 2 FALSE   TRUE FALSE   TRUE
## 3 TRUE FALSE   TRUE   TRUE
## 3 TRUE   TRUE   TRUE FALSE
## 3 FALSE   TRUE   TRUE   TRUE
## 3 TRUE   TRUE FALSE   TRUE
## 4 TRUE   TRUE   TRUE   TRUE
##
## $label
## [1] "(Intercept)" "sex"         "status"      "income"      "verbal"    
##
## $size
## [1] 2 2 2 2 3 3 3 3 3 3 4 4 4 4 5
##
## $adjr2
## [1] 0.37335700 0.14777864 0.02727861 -0.01962347 0.47872403
## [6] 0.37558442 0.37236702 0.21108098 0.20193983 0.01162814
## [11] 0.49328792 0.47132669 0.40623483 0.21142102 0.48164945

The highest adjusted R-squared indicates the best sub-model.

Here that is the eleventh sub-model (sex, income, verbal), with adjr2 = 0.493 and p = 4.
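As a rough sketch of how the best row could be pulled out of the $adjr2 vector and cross-checked against an ordinary lm() fit: the names fit_adj, best_row, best_vars, and best_fit are illustrative and not from the original answer, and the faraway and leaps packages are assumed to be installed.

```r
# Sketch: select the sub-model with the largest adjusted R-squared
# and refit it with lm() as a sanity check.
library(faraway)
library(leaps)

data(teengamb)
predictors <- setdiff(names(teengamb), "gamble")

fit_adj <- leaps(x = teengamb[, predictors], y = teengamb$gamble,
                 names = predictors, method = "adjr2")

# Index of the sub-model with the highest adjusted R-squared.
best_row  <- which.max(fit_adj$adjr2)
best_vars <- predictors[fit_adj$which[best_row, ]]

# Refit that sub-model; its adjusted R-squared should agree with leaps().
best_fit <- lm(reformulate(best_vars, response = "gamble"), data = teengamb)
print(best_vars)
print(summary(best_fit)$adj.r.squared)
```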

Forward selection using AIC and BIC

nothing<-lm(gamble~1,data=teengamb)
fullmode<-lm(gamble~.,data=teengamb)

forwards = step(nothing,scope=list(lower=formula(nothing),upper=formula(fullmode)),direction="forward")

## Start: AIC=325.34
## gamble ~ 1
##
##          Df Sum of Sq   RSS    AIC
## + income 1   17680.9 28009 304.34
## + sex     1    7598.4 38091 318.79
## + verbal 1    2212.5 43477 325.00
## <none>                45689 325.34
## + status 1     116.2 45573 327.22
##
## Step: AIC=304.34
## gamble ~ income
##
##          Df Sum of Sq   RSS    AIC
## + sex     1    5227.3 22781 296.63
## <none>                28009 304.34
## + status 1     719.8 27289 305.11
## + verbal 1     579.1 27429 305.35
##
## Step: AIC=296.63
## gamble ~ income + sex
##
##          Df Sum of Sq   RSS    AIC
## + verbal 1   1139.78 21642 296.21
## <none>                22781 296.63
## + status 1    201.82 22580 298.21
##
## Step: AIC=296.21
## gamble ~ income + sex + verbal
##
##          Df Sum of Sq   RSS    AIC
## <none>                21642 296.21
## + status 1    17.776 21624 298.18

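The lowest AIC indicates the best sub-model. Forward selection stops at gamble ~ income + sex + verbal with AIC = 296.21, because adding status would raise the AIC to 298.18. The trace above covers AIC only; for BIC, step() accepts a penalty argument k, and setting k = log(n) (with n the number of observations) gives the BIC version of the same search.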
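Question 4 asks for backward elimination rather than forward selection. The following is a minimal sketch of how that could be run with step() for both AIC and BIC; it is not part of the original answer, the names full, back_aic, and back_bic are illustrative, and the faraway package is assumed to be installed.

```r
# Sketch: backward elimination on teengamb, by AIC and by BIC.
library(faraway)

data(teengamb)

# Start from the full model with all four predictors.
full <- lm(gamble ~ ., data = teengamb)
n <- nrow(teengamb)

# Backward elimination by AIC (penalty k = 2 is the default in step()).
back_aic <- step(full, direction = "backward", trace = 0)

# Backward elimination by BIC: same search, but penalty k = log(n).
back_bic <- step(full, direction = "backward", k = log(n), trace = 0)

print(formula(back_aic))
print(formula(back_bic))
```

Set trace = 1 to print each elimination step in the same layout as the forward trace. Because log(n) exceeds 2 for any n of 8 or more, the BIC run penalizes extra terms more heavily and may prune further than the AIC run.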