
Question

Choose a different base classification method in Bagging and AdaBoostM1 for classifying titanic.

Please tell me what I need to change and how.

```{r Bagging from RWeka}
# Assumes the RWeka (Bagging, J48, Weka_control) and rminer (mmetric)
# packages are already loaded, e.g. library(RWeka); library(rminer)

bagging_model <- Bagging(Survived~., data = train)

# The following is the same as the default setting above.
# bagging_model <- Bagging(Survived~., data = train, control = Weka_control(W = "weka.classifiers.trees.REPTree"))
str(bagging_model)
bagging_model
summary(bagging_model)

# performance of bagging model in train
bagging_predict_train <- predict(bagging_model,train)
mmetric(train$Survived,bagging_predict_train, metric = c("ACC","TPR","PRECISION","F1"))

# Testing performance of bagging model
bagging_predict_test <- predict(bagging_model,test)
mmetric(test$Survived,bagging_predict_test, metric = c("ACC","TPR","PRECISION","F1"))

# Changing the base classifier to J48 (use the fully qualified Weka class name)
bagging_model <- Bagging(Survived~., data = train, control = Weka_control(W = "weka.classifiers.trees.J48"))
# str(bagging_model)
bagging_model
# summary(bagging_model)

# performance of bagging model in train
bagging_predict_train <- predict(bagging_model,train)
mmetric(train$Survived,bagging_predict_train, metric = c("ACC","TPR","PRECISION","F1"))

# Testing performance of bagging model
bagging_predict_test <- predict(bagging_model,test)
mmetric(test$Survived,bagging_predict_test, metric = c("ACC","TPR","PRECISION","F1"))

# Changing the base classifier to J48 and setting the minimum number of instances per leaf (M) to 30
bagging_model <- Bagging(Survived~., data = train, control = Weka_control(W = list(J48, M = 30)))
# bagging_model
# summary(bagging_model)

# performance of bagging model in train
bagging_predict_train <- predict(bagging_model,train)
mmetric(train$Survived,bagging_predict_train, metric = c("ACC","TPR","PRECISION","F1"))

# Testing performance of bagging model
bagging_predict_test <- predict(bagging_model,test)
mmetric(test$Survived,bagging_predict_test, metric = c("ACC","TPR","PRECISION","F1"))
```

```{r Build and evaluate Boosting models}

M1_model <- AdaBoostM1(Survived~., data = train)

# The following is the same as the default setting above.
# M1_model <- AdaBoostM1(Survived~., data = train, control = Weka_control(W = "weka.classifiers.trees.DecisionStump"))

str(M1_model)
M1_model
summary(M1_model)

# performance of M1 model in train
M1_predict_train <- predict(M1_model,train)
mmetric(train$Survived,M1_predict_train, metric = c("ACC","TPR","PRECISION","F1"))

# Testing performance of M1 model
M1_predict_test <- predict(M1_model,test)
mmetric(test$Survived,M1_predict_test, metric = c("ACC","TPR","PRECISION","F1"))

# Changing the base classifier to J48 (fully qualified class name)
M1_model <- AdaBoostM1(Survived~., data = train, control = Weka_control(W = "weka.classifiers.trees.J48"))
# str(M1_model)
# M1_model
# summary(M1_model)

# performance of M1 model in train
M1_predict_train <- predict(M1_model,train)
mmetric(train$Survived,M1_predict_train, metric = c("ACC","TPR","PRECISION","F1"))

# Testing performance of M1 model
M1_predict_test <- predict(M1_model,test)
mmetric(test$Survived,M1_predict_test, metric = c("ACC","TPR","PRECISION","F1"))

# Changing the base classifier to J48 and setting the minimum number of instances per leaf (M) to 30
M1_model <- AdaBoostM1(Survived~., data = train, control = Weka_control(W = list(J48, M = 30)))
# str(M1_model)
# M1_model
# summary(M1_model)

# performance of M1 model in train
M1_predict_train <- predict(M1_model,train)
mmetric(train$Survived,M1_predict_train, metric = c("ACC","TPR","PRECISION","F1"))

# Testing performance of M1 model
M1_predict_test <- predict(M1_model,test)
mmetric(test$Survived,M1_predict_test, metric = c("ACC","TPR","PRECISION","F1"))
```

Explanation / Answer

AdaBoostM1

AdaBoostM1 is a very popular boosting algorithm for binary classification. The algorithm trains learners sequentially: for every learner with index $t$, AdaBoostM1 computes the weighted classification error

$$\varepsilon_t = \sum_{n=1}^{N} d_n^{(t)}\, I\big(y_n \ne h_t(x_n)\big),$$

where

- $x_n$ is a vector of predictor values for observation $n$.
- $y_n$ is the true class label.
- $h_t$ is the prediction of the learner (hypothesis) with index $t$.
- $I$ is the indicator function.
- $d_n^{(t)}$ is the weight of observation $n$ at step $t$.

AdaBoostM1 then increases the weights of observations misclassified by learner $t$ and reduces the weights of observations it classified correctly. The next learner $t+1$ is then trained on the data with the updated weights $d_n^{(t+1)}$.
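Concretely, a standard way to write this update is the following (the exact normalization Weka's implementation uses may differ slightly; this form is given for intuition):

$$\alpha_t = \frac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}, \qquad d_n^{(t+1)} = \frac{d_n^{(t)}\,\exp\big(\alpha_t\, I(y_n \ne h_t(x_n))\big)}{\sum_{m=1}^{N} d_m^{(t)}\,\exp\big(\alpha_t\, I(y_m \ne h_t(x_m))\big)}.$$

Misclassified observations are multiplied by $e^{\alpha_t} > 1$ (assuming $\varepsilon_t < 1/2$), and the renormalization shrinks the weights of the correctly classified ones.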

After training finishes, AdaBoostM1 computes the prediction for new data as a weighted vote of the trained learners,

$$f(x) = \sum_{t=1}^{T} \alpha_t\, h_t(x),$$

and predicts the class with the larger weighted vote (equivalently, $\mathrm{sign}(f(x))$ when labels are coded as $\pm 1$).
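To answer the "what do I need to change" part directly: the `W` option in `Weka_control` selects the base classifier, so you can swap in any Weka classifier by its fully qualified class name, exactly as your J48 examples do. A minimal sketch using IBk (k-nearest neighbours) as the base learner, reusing the same `train`/`test` split and assuming RWeka and rminer are loaded:

```{r Bagging with a non-tree base learner}
# Sketch: swap the base learner via the W option; here IBk (k-NN)
bagging_ibk <- Bagging(Survived~., data = train,
                       control = Weka_control(W = "weka.classifiers.lazy.IBk"))
bagging_ibk_predict <- predict(bagging_ibk, test)
mmetric(test$Survived, bagging_ibk_predict, metric = c("ACC","TPR","PRECISION","F1"))
```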

Finally, we can wrap the repeated train-and-evaluate pattern into a function. Below is a minimal sketch (the name `fit_and_eval` is illustrative; it assumes RWeka and rminer are loaded and that both data frames contain the factor column `Survived`):
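```{r Wrap training and evaluation into a function}
# Sketch: fit any RWeka ensemble (e.g. Bagging, AdaBoostM1) with a given
# Weka_control and report the same metrics on the train and test sets.
fit_and_eval <- function(learner, train, test, control = Weka_control()) {
  model <- learner(Survived~., data = train, control = control)
  list(
    train = mmetric(train$Survived, predict(model, train),
                    metric = c("ACC","TPR","PRECISION","F1")),
    test  = mmetric(test$Survived, predict(model, test),
                    metric = c("ACC","TPR","PRECISION","F1"))
  )
}

# Example calls reproducing the models above:
# fit_and_eval(Bagging, train, test)
# fit_and_eval(Bagging, train, test, Weka_control(W = list(J48, M = 30)))
# fit_and_eval(AdaBoostM1, train, test, Weka_control(W = "weka.classifiers.trees.J48"))
```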