
Question

Problem 2 (16 pts): Answer the following questions by circling the most likely choice:

[TRUE / FALSE] Decision tree algorithm (without pruning) is sensitive to noisy data.

[TRUE / FALSE] Naïve Bayes method is intolerant of irrelevant attributes.

[TRUE / FALSE] Data mining is non-trivial extraction of implicit, previously unknown and potentially useful information from data.

[TRUE / FALSE] F-measure value is closer to max(Precision, Recall) than to min(Precision, Recall).

[TRUE / FALSE] Naïve Bayes Classifier assumes features are independent.

[TRUE / FALSE] Holdout method tends to produce a more stable estimate of model performance accuracy than the cross-validation method.

[TRUE / FALSE] Convenience sampling is one type of probability sampling method.

[TRUE / FALSE] The ROC (Receiver Operating Characteristic) curve is generated by plotting sensitivity (y-axis) against specificity (x-axis).

Explanation / Answer

TRUE. A decision tree grown without pruning keeps splitting until it fits individual training records, so noisy examples get branches of their own and the tree overfits the noise; pruning is the standard remedy.
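
A minimal sketch of how this sensitivity can be observed, assuming scikit-learn is available (the dataset, noise level, and depth limit below are illustrative choices, not part of the original question): an unpruned tree typically fits the noisy training labels almost perfectly but scores worse on held-out data than a size-limited tree.

```python
# Illustrative only: compare an unpruned tree with a depth-limited one on data
# whose labels are partially flipped (simulated noise).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# flip_y=0.2 randomly flips about 20% of the labels to simulate label noise.
X, y = make_classification(n_samples=1000, n_features=20, flip_y=0.2, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

unpruned = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)              # grows until leaves are pure
limited = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_tr, y_tr)  # crude stand-in for pruning

print("unpruned    train/test:", unpruned.score(X_tr, y_tr), unpruned.score(X_te, y_te))
print("max_depth=4 train/test:", limited.score(X_tr, y_tr), limited.score(X_te, y_te))
```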

FALSE. Naïve Bayes is relatively robust to (tolerant of) irrelevant attributes: for an attribute that carries no information about the class, P(xi | c) is roughly the same for every class, so it has little influence on which class receives the highest posterior.
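
One way to see this, again assuming scikit-learn (the dataset and the number of noise columns below are made up for illustration): append pure-noise columns to a dataset and compare cross-validated Naïve Bayes accuracy with and without them; the drop is usually small, which is the usual basis for calling the method robust to irrelevant attributes.

```python
# Illustrative only: Gaussian Naive Bayes with and without 20 irrelevant
# (pure-noise) attributes appended to the feature matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=10, n_informative=10,
                           n_redundant=0, random_state=0)
rng = np.random.default_rng(0)
X_noisy = np.hstack([X, rng.normal(size=(X.shape[0], 20))])  # add 20 irrelevant columns

print("original   :", cross_val_score(GaussianNB(), X, y, cv=5).mean())
print("with noise :", cross_val_score(GaussianNB(), X_noisy, y, cv=5).mean())
```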

TRUE. Data mining has been defined as "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data."

FALSE. The F-measure is the harmonic mean of precision and recall, and the harmonic mean always lies closer to the smaller of the two values, i.e., closer to min(Precision, Recall) than to max(Precision, Recall).
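
A quick numeric check (the precision and recall values here are arbitrary, chosen only to make the effect obvious): because F1 is the harmonic mean, it is pulled toward the smaller of the two values.

```python
# F1 with very unbalanced precision/recall: the result sits near the minimum.
precision, recall = 0.9, 0.1
f1 = 2 * precision * recall / (precision + recall)
print(f1)                                # ~0.18
print(abs(f1 - min(precision, recall)))  # ~0.08 -> much closer to min(P, R)
print(abs(f1 - max(precision, recall)))  # ~0.72
```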

TRUE. The Naïve Bayes classifier rests on a conditional independence assumption: given the class, each feature is assumed to be independent of every other feature. The assumption is called "naive" because it rarely holds exactly in real data (for example, in natural language the words of a document are clearly not independent).
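
For concreteness, the "naive" part is that the class-conditional likelihood is modeled as a product of per-feature likelihoods. A toy sketch with made-up numbers:

```python
# P(x1, x2, x3 | c) is approximated as P(x1 | c) * P(x2 | c) * P(x3 | c).
p_x_given_c = {"x1": 0.30, "x2": 0.50, "x3": 0.20}  # hypothetical per-feature likelihoods for one class
joint = 1.0
for p in p_x_given_c.values():
    joint *= p  # independence assumption: multiply the per-feature terms
print(joint)    # ~0.03
```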