

Question

Part 1: Definition (20 Points) Define the following terms and, where appropriate, compare and contrast with the alternative method. For example, the survivor function helps describe the distribution of event times, but so do the cumulative distribution function, density function, and hazard function. Consider, in your definition of the survivor function, including how it differs from other functions that describe the distribution of event times.

a) The Trade-Off Between Prediction Accuracy and Model Interpretability
b) Supervised Versus Unsupervised Learning
c) The Bias-Variance Trade-Off
d) Linear Regression versus K-Nearest Neighbors
e) Logistic Regression versus LDA versus QDA

Explanation / Answer

a)

Model performance is estimated by its accuracy in predicting outcomes on unseen data. A more accurate model is generally seen as a more valuable model.

Model interpretability provides insight into the relationship between the inputs and the output. An interpretable model can answer questions about why the independent features predict the dependent attribute.

Linear models have good interpretability but limited predictive ability.

Deep neural networks have good predictive ability but little interpretability.
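To make the interpretability side concrete, here is a minimal numpy sketch (the data and coefficient values are synthetic, purely for illustration): a linear model fit by ordinary least squares yields coefficients that can be read off and explained directly, which is exactly what a deep network does not offer.

```python
import numpy as np

# Synthetic data where y = 3*x1 - 2*x2 + noise. An ordinary-least-squares
# fit recovers weights we can inspect and interpret one feature at a time.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=200)

# Add an intercept column and solve the least-squares problem.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

# Each weight has a clear meaning: the expected change in y per unit
# change in that feature, holding the others fixed.
print(coef)  # roughly [0, 3, -2]
```

The fitted coefficients recover the true generating weights, so the model's prediction for any input can be explained as a weighted sum of its features.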

b)

Supervised learning is where you have input variables (X) and an output variable (Y), and you use an algorithm to learn the mapping function from the input to the output:

Y = f(X)

The goal is to approximate the mapping function so well that when you have new input data (X), you can predict the output variable (Y) for that data.

Unsupervised learning is where you only have input data (X) and no corresponding output variables.

The goal of unsupervised learning is to model the underlying structure or distribution in the data in order to learn more about the data.

c)

When the hypothesis set is large (a complex model class), you have less bias but a lot of variance.

Take a 100th-order polynomial versus a 2nd-order polynomial.

The 100th-order hypothesis set is large, so it has low bias but high variance (it needs a lot of data to fit).

The 2nd-order hypothesis set is smaller, so it has more bias but less variance.

So there is a trade-off between bias and variance.
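The polynomial comparison above can be simulated directly (a numpy sketch; the true curve, noise level, degrees, and test point are arbitrary illustrative choices): fit a low- and a high-order polynomial to many noisy resamples of the same curve, then look at the spread of their predictions at one fixed point.

```python
import numpy as np

# Bias-variance sketch: resample noisy data from a fixed true curve,
# refit both polynomial degrees each time, and record the prediction
# at one test point x0 so we can measure bias^2 and variance there.
rng = np.random.default_rng(2)

def true_f(x):
    return np.sin(2 * x)

x_train = np.linspace(0, 3, 15)
x0 = 1.5  # fixed test point

preds = {2: [], 9: []}  # degree -> predictions at x0 over resamples
for _ in range(200):
    y_noisy = true_f(x_train) + rng.normal(0, 0.3, x_train.size)
    for degree in preds:
        coefs = np.polyfit(x_train, y_noisy, degree)
        preds[degree].append(np.polyval(coefs, x0))

for degree, p in preds.items():
    p = np.array(p)
    bias2 = (p.mean() - true_f(x0)) ** 2
    var = p.var()
    print(f"degree {degree}: bias^2 = {bias2:.4f}, variance = {var:.4f}")
# Typically the higher-order fit shows lower bias but larger variance.
```

The flexible degree-9 fit tracks the true curve closely on average (low bias) but its predictions swing from resample to resample (high variance), while the rigid degree-2 fit is stable but systematically off.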