Choose a different base classification method in Bagging and AdaBoostM1 for classifying titanic
Question
Choose a different base classification method in Bagging and AdaBoostM1 for classifying titanic.
Please tell me what I need to change and how.
```{r Bagging from RWeka}
library(RWeka)   # provides Bagging(), AdaBoostM1(), Weka_control(), J48
library(rminer)  # provides mmetric()

bagging_model <- Bagging(Survived ~ ., data = train)
# The following is equivalent to the default setting above.
# bagging_model <- Bagging(Survived ~ ., data = train, control = Weka_control(W = "weka.classifiers.trees.REPTree"))
str(bagging_model)
bagging_model
summary(bagging_model)
# Training performance of the bagging model
bagging_predict_train <- predict(bagging_model, train)
mmetric(train$Survived, bagging_predict_train, metric = c("ACC", "TPR", "PRECISION", "F1"))
# Test performance of the bagging model
bagging_predict_test <- predict(bagging_model, test)
mmetric(test$Survived, bagging_predict_test, metric = c("ACC", "TPR", "PRECISION", "F1"))
# Changing the base classifier to J48
bagging_model <- Bagging(Survived ~ ., data = train, control = Weka_control(W = "J48"))
# str(bagging_model)
bagging_model
# summary(bagging_model)
# Training performance of the bagging model
bagging_predict_train <- predict(bagging_model, train)
mmetric(train$Survived, bagging_predict_train, metric = c("ACC", "TPR", "PRECISION", "F1"))
# Test performance of the bagging model
bagging_predict_test <- predict(bagging_model, test)
mmetric(test$Survived, bagging_predict_test, metric = c("ACC", "TPR", "PRECISION", "F1"))
# Changing the base classifier to J48 and raising its minimum number of instances per leaf (M) to 30
bagging_model <- Bagging(Survived ~ ., data = train, control = Weka_control(W = list(J48, M = 30)))
# bagging_model
# summary(bagging_model)
# Training performance of the bagging model
bagging_predict_train <- predict(bagging_model, train)
mmetric(train$Survived, bagging_predict_train, metric = c("ACC", "TPR", "PRECISION", "F1"))
# Test performance of the bagging model
bagging_predict_test <- predict(bagging_model, test)
mmetric(test$Survived, bagging_predict_test, metric = c("ACC", "TPR", "PRECISION", "F1"))
```
```{r Build examine and evaluate boosting models}
M1_model <- AdaBoostM1(Survived ~ ., data = train)
# The following is equivalent to the default setting above.
# M1_model <- AdaBoostM1(Survived ~ ., data = train, control = Weka_control(W = "DecisionStump"))
str(M1_model)
M1_model
summary(M1_model)
# Training performance of the M1 model
M1_predict_train <- predict(M1_model, train)
mmetric(train$Survived, M1_predict_train, metric = c("ACC", "TPR", "PRECISION", "F1"))
# Test performance of the M1 model
M1_predict_test <- predict(M1_model, test)
mmetric(test$Survived, M1_predict_test, metric = c("ACC", "TPR", "PRECISION", "F1"))
# Changing the base classifier to J48
M1_model <- AdaBoostM1(Survived ~ ., data = train, control = Weka_control(W = "J48"))
# str(M1_model)
# M1_model
# summary(M1_model)
# Training performance of the M1 model
M1_predict_train <- predict(M1_model, train)
mmetric(train$Survived, M1_predict_train, metric = c("ACC", "TPR", "PRECISION", "F1"))
# Test performance of the M1 model
M1_predict_test <- predict(M1_model, test)
mmetric(test$Survived, M1_predict_test, metric = c("ACC", "TPR", "PRECISION", "F1"))
# Changing the base classifier to J48 and raising its minimum number of instances per leaf (M) to 30
M1_model <- AdaBoostM1(Survived ~ ., data = train, control = Weka_control(W = list(J48, M = 30)))
# str(M1_model)
# M1_model
# summary(M1_model)
# Training performance of the M1 model
M1_predict_train <- predict(M1_model, train)
mmetric(train$Survived, M1_predict_train, metric = c("ACC", "TPR", "PRECISION", "F1"))
# Test performance of the M1 model
M1_predict_test <- predict(M1_model, test)
mmetric(test$Survived, M1_predict_test, metric = c("ACC", "TPR", "PRECISION", "F1"))
```
Explanation / Answer
In RWeka, the base classifier of a meta learner is selected through the W option of Weka_control(). Bagging defaults to weka.classifiers.trees.REPTree and AdaBoostM1 defaults to DecisionStump, so the only change needed is to pass a different classifier, e.g. W = "J48", or W = list(J48, M = 30) when you also want to set options of the base learner, exactly as the chunks above do.
AdaBoostM1
AdaBoostM1 is a very popular boosting algorithm for binary classification. The algorithm trains learners sequentially. For every learner with index t, AdaBoostM1 computes the weighted classification error
$$\varepsilon_t = \sum_{n=1}^{N} d_n^{(t)}\, I\!\left(y_n \neq h_t(x_n)\right),$$
where
$x_n$ is a vector of predictor values for observation n.
$y_n$ is the true class label.
$h_t$ is the prediction of the learner (hypothesis) with index t.
$I$ is the indicator function.
$d_n^{(t)}$ is the weight of observation n at step t.
AdaBoostM1 then increases weights for observations misclassified by learner t and reduces weights for observations correctly classified by learner t. The next learner t + 1 is then trained on the data with the updated weights $d_n^{(t+1)}$.
After training finishes, AdaBoostM1 computes the prediction for new data using the weighted vote
$$f(x) = \sum_{t=1}^{T} \alpha_t h_t(x),$$
where $\alpha_t = \frac{1}{2}\log\frac{1-\varepsilon_t}{\varepsilon_t}$ are the weights of the weak hypotheses in the ensemble, and the predicted class is the one favored by the sign of $f(x)$.
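To make the reweighting loop concrete, below is a minimal from-scratch sketch of AdaBoostM1 in R. It is purely illustrative, not RWeka's implementation: it assumes the train and test data frames from the chunks above, a two-level factor Survived, and uses depth-1 rpart trees (stumps) as weak learners. All names (stumps, alphas, boost_pred) are invented for this example.

```r
library(rpart)

n <- nrow(train)
d <- rep(1 / n, n)                 # d_n^(1): start from uniform weights
pos <- levels(train$Survived)[2]   # the second factor level plays the role of +1
stumps <- list()
alphas <- numeric(0)

for (t in 1:10) {
  # Weak learner h_t: a depth-1 tree fit on the current weights
  stump <- rpart(Survived ~ ., data = train, weights = d,
                 control = rpart.control(maxdepth = 1))
  pred  <- predict(stump, train, type = "class")
  eps   <- sum(d * (pred != train$Survived))  # weighted error epsilon_t
  if (eps == 0 || eps >= 0.5) break           # perfect fit or no better than chance
  alpha <- 0.5 * log((1 - eps) / eps)         # hypothesis weight alpha_t
  # Increase weights of misclassified observations, decrease the rest, renormalize
  d <- d * exp(ifelse(pred != train$Survived, alpha, -alpha))
  d <- d / sum(d)
  stumps[[t]] <- stump
  alphas[t]   <- alpha
}

# f(x): alpha-weighted vote of the stumps, mapped back to the factor levels
score <- rep(0, nrow(test))
for (t in seq_along(stumps)) {
  p     <- predict(stumps[[t]], test, type = "class")
  score <- score + alphas[t] * ifelse(p == pos, 1, -1)
}
boost_pred <- factor(ifelse(score > 0, pos, levels(train$Survived)[1]),
                     levels = levels(train$Survived))
mmetric(test$Survived, boost_pred, metric = c("ACC", "TPR", "PRECISION", "F1"))
```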
Finally, the fit-and-evaluate pattern repeated in the chunks above can be placed into a small helper function to wrap it up nicely:
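The sketch below is one way to do it; the name fit_and_eval and its arguments are invented for this answer (not part of RWeka), and it assumes RWeka and rminer are loaded and that train and test are the data frames used above.

```r
# Hypothetical helper: fit an RWeka meta learner and report train/test metrics.
# `learner` is an RWeka interface function such as Bagging or AdaBoostM1;
# `control` is passed straight through to the underlying Weka classifier.
fit_and_eval <- function(learner, formula, train, test,
                         control = Weka_control(),
                         metrics = c("ACC", "TPR", "PRECISION", "F1")) {
  model  <- learner(formula, data = train, control = control)
  target <- all.vars(formula)[1]  # response variable name, e.g. "Survived"
  list(model = model,
       train = mmetric(train[[target]], predict(model, train), metric = metrics),
       test  = mmetric(test[[target]],  predict(model, test),  metric = metrics))
}

# Example: bagged J48 trees with at least 30 instances per leaf
res <- fit_and_eval(Bagging, Survived ~ ., train, test,
                    control = Weka_control(W = list(J48, M = 30)))
res$test

# Example: boosted J48 via AdaBoostM1
res_boost <- fit_and_eval(AdaBoostM1, Survived ~ ., train, test,
                          control = Weka_control(W = "J48"))
res_boost$test
```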