Question
Suppose we have a random dataset, i.e. the attribute values are generated independently of the class label, and it contains data points that belong to either the POSITIVE or the NEGATIVE class. We need to build a classifier for this dataset, using half of the dataset for training and the remaining half for testing. Please answer the following questions and provide a brief explanation for each answer:
1) Suppose there are an equal number of positive and negative records in the data and the decision tree classifier predicts every test record to be positive. What is the expected error rate of the classifier on the test data?
2) Repeat the previous analysis assuming that the classifier predicts each test record to be positive class with probability 0.8 and negative class with probability 0.2.
3) Suppose two-thirds of the data belong to the positive class and the remaining one-third belong to the negative class. What is the expected error of a classifier that predicts every test record to be positive?
4) Repeat the previous analysis assuming that the classifier predicts each test record to be positive class with probability 2/3 and negative class with probability 1/3.
Explanation / Answer
1. An error occurs when a negative record is predicted positive or a positive record is predicted negative. Half of the records are positive and half are negative.
When every record is predicted positive, all positive records are classified correctly and all negative records (half of the data) are misclassified. Error rate = 0.5 = 50%
2. The classifier predicts positive for 80% of the records and negative for 20%, independently of the true class, so both kinds of mistake occur. Positive records predicted negative: 0.5 × 0.2 = 0.1. Negative records predicted positive: 0.5 × 0.8 = 0.4. So error rate = 0.1 + 0.4 = 0.5 = 50%
3. One-third of the data is negative and is wrongly predicted as positive, whereas there is no error on the positive records. So, error rate = 1/3 = 33.33%
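4. 2/3 of the data is positive and 2/3 of the records are predicted positive, but the predictions are made independently of the true class, so matching the proportions does not remove the error. Positive records predicted negative: 2/3 × 1/3 = 2/9. Negative records predicted positive: 1/3 × 2/3 = 2/9. So error rate = 2/9 + 2/9 = 4/9 = 44.44%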
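All four answers follow from one formula: error = P(actual positive) × P(predicted negative) + P(actual negative) × P(predicted positive). The sketch below is only an illustration of that arithmetic. It assumes predictions are independent of the true class, as the random dataset implies, and the function name expected_error and its parameters p_pos and q_pos are made up for this example.

```python
from fractions import Fraction


def expected_error(p_pos, q_pos):
    """Expected error rate of a classifier whose predictions ignore the data.

    p_pos: fraction of records that are actually positive (hypothetical name).
    q_pos: probability that the classifier predicts positive (hypothetical name).
    Assumes the prediction is independent of the true class.
    """
    p_pos, q_pos = Fraction(p_pos), Fraction(q_pos)
    # positives predicted negative + negatives predicted positive
    return p_pos * (1 - q_pos) + (1 - p_pos) * q_pos


# (fraction of positive records, probability of predicting positive)
cases = {
    1: (Fraction(1, 2), Fraction(1)),
    2: (Fraction(1, 2), Fraction(4, 5)),
    3: (Fraction(2, 3), Fraction(1)),
    4: (Fraction(2, 3), Fraction(2, 3)),
}

results = {k: expected_error(p, q) for k, (p, q) in cases.items()}

for k, err in results.items():
    print(f"Question {k}: error rate = {err} = {float(err):.2%}")
```

Exact fractions are used instead of floats so that 4/9 and 1/3 are shown without rounding noise.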