Again i only need question 4 answered, question 3 is only for the reference. Use
ID: 3226855 • Letter: A
Question
Again, I only need question 4 answered; question 3 is included only for reference. Use the R programming software to answer the question. The teengamb dataset is found in the downloadable 'faraway' package for R. You must install the package to access the dataset. Please answer the question and include the code used to find the answers.
Please do not respond saying 'data not found'; as stated, the data is in the downloadable faraway package for R.
Thanks in advance!
3. Use the prostate data with lpsa as the response and the other variables as predictors. Implement the following variable selection criteria to determine the "best" model using the forward selection procedure: AIC, BIC, adjusted R², Mallows' Cp.
4. Using the teengamb dataset with gamble as the response and the other variables as predictors, repeat the work of the previous question, but use backward elimination.
Explanation / Answer
Selection using Mallows' Cp (leaps() runs an all-subsets search)
library(faraway)
## Warning: package 'faraway' was built under R version 3.2.5
library(leaps)
## Warning: package 'leaps' was built under R version 3.2.5
leaps( x=teengamb[,1:4], y=teengamb[,5], names=names(teengamb)[1:4], method="Cp")
## $which
## sex status income verbal
## 1 FALSE FALSE TRUE FALSE
## 1 TRUE FALSE FALSE FALSE
## 1 FALSE FALSE FALSE TRUE
## 1 FALSE TRUE FALSE FALSE
## 2 TRUE FALSE TRUE FALSE
## 2 FALSE TRUE TRUE FALSE
## 2 FALSE FALSE TRUE TRUE
## 2 TRUE TRUE FALSE FALSE
## 2 TRUE FALSE FALSE TRUE
## 2 FALSE TRUE FALSE TRUE
## 3 TRUE FALSE TRUE TRUE
## 3 TRUE TRUE TRUE FALSE
## 3 FALSE TRUE TRUE TRUE
## 3 TRUE TRUE FALSE TRUE
## 4 TRUE TRUE TRUE TRUE
##
## $label
## [1] "(Intercept)" "sex" "status" "income" "verbal"
##
## $size
## [1] 2 2 2 2 3 3 3 3 3 3 4 4 4 4 5
##
## $Cp
## [1] 11.401283 30.984606 41.445676 45.517426 3.248323 12.003293 12.276400
## [8] 25.967108 26.743051 42.897591 3.034526 4.856329 10.256053 26.416920
## [15] 5.000000
The first part of the output, denoted $which, lists the 15 possible sub-models in 15 rows. The first column indicates the number of predictors in the sub-model for each row. The variables in each sub-model are those designated TRUE in each row.
The next two parts of the output don't give us any new information, but the last part, designated $Cp, gives the value of Mallows' Cp criterion for each sub-model, in the same order. A good sub-model has a small Cp that is close to or below p (the number of parameters in the model, including the intercept). For the full model, we always have Cp = p, so the criterion cannot judge it; the idea is to find a suitable reduced model, if possible. Here the best reduced model is the eleventh one (sex, income and verbal), for which Cp = 3.034526 and p = 4.
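As a check on the $Cp values, Mallows' Cp can be recomputed by hand from Cp = RSS_p / s² + 2p − n, where s² is the residual mean square of the full model. The sketch below is only an illustration: it uses the rounded RSS values printed in the step() output later in this answer, and it assumes n = 47 rows in teengamb (consistent with the AIC values printed there). The variable names are made up for this example.

```r
# Mallows' Cp by hand: Cp = RSS_p / s2_full + 2p - n
n        <- 47      # assumed number of rows in teengamb
rss_sub  <- 21642   # RSS of gamble ~ income + sex + verbal (rounded, from the step() output)
rss_full <- 21624   # RSS of the full four-predictor model (rounded, from the step() output)
p_sub    <- 4       # parameters in the sub-model, including the intercept
p_full   <- 5       # parameters in the full model, including the intercept

s2_full <- rss_full / (n - p_full)            # residual mean square of the full model
cp_sub  <- rss_sub / s2_full + 2 * p_sub - n  # Cp of the three-predictor sub-model
cp_sub
```

The result should agree with the 3.034526 reported by leaps() for the sex, income and verbal sub-model, up to the rounding of the RSS values.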
Selection using adjusted R² (leaps() runs an all-subsets search)
leaps( x=teengamb[,1:4], y=teengamb[,5], names=names(teengamb)[1:4], method="adjr2")
## $which
## sex status income verbal
## 1 FALSE FALSE TRUE FALSE
## 1 TRUE FALSE FALSE FALSE
## 1 FALSE FALSE FALSE TRUE
## 1 FALSE TRUE FALSE FALSE
## 2 TRUE FALSE TRUE FALSE
## 2 FALSE TRUE TRUE FALSE
## 2 FALSE FALSE TRUE TRUE
## 2 TRUE TRUE FALSE FALSE
## 2 TRUE FALSE FALSE TRUE
## 2 FALSE TRUE FALSE TRUE
## 3 TRUE FALSE TRUE TRUE
## 3 TRUE TRUE TRUE FALSE
## 3 FALSE TRUE TRUE TRUE
## 3 TRUE TRUE FALSE TRUE
## 4 TRUE TRUE TRUE TRUE
##
## $label
## [1] "(Intercept)" "sex" "status" "income" "verbal"
##
## $size
## [1] 2 2 2 2 3 3 3 3 3 3 4 4 4 4 5
##
## $adjr2
## [1] 0.37335700 0.14777864 0.02727861 -0.01962347 0.47872403
## [6] 0.37558442 0.37236702 0.21108098 0.20193983 0.01162814
## [11] 0.49328792 0.47132669 0.40623483 0.21142102 0.48164945
The highest adjusted R² indicates the best sub-model. Here that is the eleventh sub-model (sex, income and verbal), with adjr2 = 0.49328792 and p = 4.
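Question 4 asks for backward elimination, while leaps() compares every subset. One possible way to get Cp and adjusted R² along a backward path is regsubsets() from the same leaps package with method = "backward". This is a sketch, assuming the faraway and leaps packages are installed; the object names back and s are illustrative, and no output is shown because it was not part of the original answer.

```r
library(faraway)  # provides the teengamb data
library(leaps)    # provides regsubsets()

# Backward elimination: start from all four predictors and drop one at a time
back <- regsubsets(gamble ~ ., data = teengamb, method = "backward")
s <- summary(back)

s$which   # predictors retained at each model size
s$cp      # Mallows' Cp for each model on the backward path
s$adjr2   # adjusted R-squared for each model on the backward path

# Models picked by each criterion
s$which[which.min(s$cp), ]     # smallest Cp
s$which[which.max(s$adjr2), ]  # largest adjusted R-squared
```

With only four predictors, the backward path contains one model of each size, so it can be compared line by line with the all-subsets table above.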
Forward selection using AIC and BIC
nothing<-lm(gamble~1,data=teengamb)
fullmode<-lm(gamble~.,data=teengamb)
forwards = step(nothing,scope=list(lower=formula(nothing),upper=formula(fullmode)),direction="forward")
## Start: AIC=325.34
## gamble ~ 1
##
## Df Sum of Sq RSS AIC
## + income 1 17680.9 28009 304.34
## + sex 1 7598.4 38091 318.79
## + verbal 1 2212.5 43477 325.00
## <none> 45689 325.34
## + status 1 116.2 45573 327.22
##
## Step: AIC=304.34
## gamble ~ income
##
## Df Sum of Sq RSS AIC
## + sex 1 5227.3 22781 296.63
## <none> 28009 304.34
## + status 1 719.8 27289 305.11
## + verbal 1 579.1 27429 305.35
##
## Step: AIC=296.63
## gamble ~ income + sex
##
## Df Sum of Sq RSS AIC
## + verbal 1 1139.78 21642 296.21
## <none> 22781 296.63
## + status 1 201.82 22580 298.21
##
## Step: AIC=296.21
## gamble ~ income + sex + verbal
##
## Df Sum of Sq RSS AIC
## <none> 21642 296.21
## + status 1 17.776 21624 298.18
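The lowest AIC indicates the best sub-model. Forward selection stops at gamble ~ income + sex + verbal (AIC = 296.21), because adding status would raise the AIC to 298.18. This run uses step()'s default penalty k = 2, which is AIC; passing k = log(n), with n the number of observations, gives BIC instead.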
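Question 4 asks for backward elimination, whereas the run above steps forward. A minimal sketch of the backward version with both AIC and BIC follows. It assumes the faraway package is installed; the object names full, back_aic and back_bic are illustrative, and trace = 0 only suppresses the step-by-step printout.

```r
library(faraway)  # provides the teengamb data

# Backward elimination starts from the full model and drops terms
full <- lm(gamble ~ ., data = teengamb)

# AIC: step() uses the penalty k = 2 by default
back_aic <- step(full, direction = "backward", trace = 0)

# BIC: same search, but with the penalty k = log(n)
n <- nrow(teengamb)
back_bic <- step(full, direction = "backward", k = log(n), trace = 0)

formula(back_aic)  # model retained under AIC
formula(back_bic)  # model retained under BIC
```

Because log(n) exceeds 2 for any sample of more than 7 observations, the BIC run penalises extra terms more heavily and can end with a smaller model than the AIC run.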