Question

A dataset of 800 cases was partitioned into a training set of 650 cases and a validation set of 150 cases. Classifier A predicts the class label of each validation case from its closest training case (a 1-nearest-neighbor rule). Classifier A has a misclassification error rate of 4% on the validation data. It was later discovered that the partitioning had been done incorrectly: 80 cases from the training set had been accidentally duplicated and had overwritten 80 cases in the validation set. What is the true misclassification error rate for the validation data?

Explanation / Answer

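The 80 duplicated cases appear in both the training set and the validation set. Because Classifier A assigns each validation case the label of its closest training case, every duplicated case is matched to its own copy in the training set (distance zero) and is classified correctly, assuming no identical training cases carry conflicting labels. All of the reported errors must therefore come from the 70 genuine validation cases.

Misclassified cases = 4% of 150 = 6

Genuine validation cases = 150 - 80 = 70

True misclassification error rate for the validation data = Misclassified cases / Genuine validation cases

= 6 / 70 ≈ 8.6% (about 9%)

Partition                                    Training Set      Validation Set    Total
As recorded (incorrect)                      650               150               800
Effective (duplicates counted as training)   650 + 80 = 730    150 - 80 = 70     730 + 70 = 800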
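The arithmetic can be checked with a short Python sketch. It assumes that every leaked case is classified correctly, since a closest-training-case classifier finds an identical copy at distance zero, so all reported errors fall on the genuine validation cases. The function name and parameters are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def true_error_rate(n_validation, reported_error, n_leaked):
    """Error rate on the genuine validation cases only.

    Assumption: each leaked (duplicated) case is classified correctly,
    because its identical copy sits in the training set.
    """
    n_errors = reported_error * n_validation   # 4% of 150 = 6 errors
    n_genuine = n_validation - n_leaked        # 150 - 80 = 70 genuine cases
    return n_errors / n_genuine


# Exact fractions avoid floating-point noise in the intermediate steps.
rate = true_error_rate(150, Fraction(4, 100), 80)
print(rate)                        # 3/35
print(f"{float(rate) * 100:.1f}%")  # 8.6%
```

If some duplicated cases could still be misclassified (for example, identical training cases with different labels), the 6 errors would not all belong to the 70 genuine cases and this figure would be an upper bound rather than an exact rate.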