Question
Neural networks are a biologically-inspired family of data mining techniques for solving data mining tasks. One type of neural network is the multi-layer perceptron, which can solve supervised learning problems. With this in mind, please answer the following questions:

a) Considering that the logistic activation function only outputs values from 0 to 1, how do neural networks deal with regression problems?
b) In your opinion, if we have the universal approximation theorem, why do researchers even bother constructing deep neural networks?
c) Your neural network converges at a training score that is sub-optimal. Please describe why gradient descent can sometimes discover a poor choice of weights in a neural network.
d) How should you decide the number of hidden neurons in a single-layer multilayer perceptron?
e) Please provide two reasons that the logistic function is used in neural networks as opposed to the stepwise function.
f) How does the autoencoder neural network differ from a standard multi-layer perceptron?
g) What problem (out of the four data mining problems we've discussed) does the autoencoder neural network solve?

Explanation / Answer
a) For regression, the logistic activation is kept in the hidden layers, but the output node uses a linear (identity) activation, so predictions are not confined to the interval [0, 1]. (An alternative is to rescale the target variable into [0, 1] during training and invert the scaling at prediction time.) The "classic" application of the logistic function is binary classification, and "flavors" of it extend to multi-class problems, e.g., the One-vs-All or One-vs-One approaches, or softmax / multinomial logistic regression; although kernelized variants of logistic regression exist, the standard model is a linear classifier.
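A minimal NumPy sketch of the linear-output idea: logistic units in the hidden layer, identity activation at the output, so predictions are unbounded. The weight values here are arbitrary random draws, not from any trained model.

```python
import numpy as np

def logistic(z):
    # Logistic (sigmoid) activation: squashes values into (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def mlp_regression_forward(x, W1, b1, W2, b2):
    """Forward pass of a single-hidden-layer MLP for regression.

    Hidden units use the logistic activation; the output unit uses
    the identity (linear) activation, so predictions can be any real
    number rather than being confined to (0, 1).
    """
    h = logistic(x @ W1 + b1)   # hidden layer: values in (0, 1)
    return h @ W2 + b2          # output layer: linear, unbounded

# Tiny demo with arbitrary (untrained) weights.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 5)); b1 = np.zeros(5)
W2 = rng.normal(size=(5, 1)); b2 = np.zeros(1)
x = rng.normal(size=(4, 3))          # 4 samples, 3 features
y_hat = mlp_regression_forward(x, W1, b1, W2, b2)
print(y_hat.shape)                   # (4, 1)
```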
b) The universal approximation theorem guarantees that a single hidden layer can compute any continuous function: suppose we have a wiggly function f(x); no matter what the function, there is guaranteed to be a neural network such that for every possible input x, the value f(x) (or some close approximation) is output from the network. However, the theorem is an existence result: it says nothing about how many hidden neurons are needed or how to find the weights. A shallow network may require exponentially many neurons to match a function that a deep network represents compactly by composing simple features into progressively more abstract ones, and deep architectures often train and generalize better in practice.
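A quick numerical sketch of the intuition behind the theorem: very steep logistic units act as near-step functions, and a weighted sum of such steps can trace out an arbitrary wiggly f(x). The knot count, steepness, and target function below are all arbitrary choices, illustrating (not proving) the theorem.

```python
import numpy as np

def logistic(z):
    # Clip to avoid overflow in exp() for very steep units.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))

def sigmoid_sum_approx(f, xs, n_units=200, steep=2000.0):
    """Approximate f on [0, 1] with one hidden layer of logistic units.

    Each unit contributes a near-step jump at a knot; the weighted sum
    of jumps traces f piecewise-constantly.
    """
    knots = np.linspace(0.0, 1.0, n_units + 1)
    vals = f(knots)
    jumps = np.diff(vals)                  # one output weight per unit
    H = logistic(steep * (xs[:, None] - knots[1:][None, :]))
    return vals[0] + H @ jumps

f = lambda x: np.sin(2 * np.pi * x) + x ** 2   # an arbitrary "wiggly" target
xs = np.linspace(0.05, 0.95, 50)
approx = sigmoid_sum_approx(f, xs)
print(np.max(np.abs(approx - f(xs))))          # small worst-case error
```

Making the approximation tighter simply means more units with closer knots, which is exactly the "exponential width" cost a deep network can avoid.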
c) The error surface of a neural network is non-convex, and gradient descent is a local optimization method: it only follows the downhill direction from wherever the weights were randomly initialized. If the initialization lands in the basin of a local minimum (or the trajectory stalls on a saddle point or plateau), gradient descent converges there and returns a sub-optimal choice of weights. There is no efficient way to guarantee the global minimum; in the worst case the number of restarts needed grows exponentially, so in practice we rerun training from several random initializations and keep the best result.
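The effect is visible even in one dimension. This sketch (plain Python, with a made-up quartic "loss") runs the identical gradient-descent update from two starting points: one converges to a local minimum, the other to the global one.

```python
def grad_descent(w, lr=0.01, steps=2000):
    # Minimise f(w) = w**4 - 3*w**2 + w, which has two minima:
    # a global one near w ≈ -1.30 and a local one near w ≈ 1.13.
    for _ in range(steps):
        w -= lr * (4 * w ** 3 - 6 * w + 1)   # gradient f'(w)
    return w

f = lambda w: w ** 4 - 3 * w ** 2 + w

w_bad = grad_descent(2.0)    # starts in the basin of the local minimum
w_good = grad_descent(-2.0)  # starts in the basin of the global minimum
print(round(w_bad, 2), round(w_good, 2))   # different answers, same update rule
```

Only the initialization differs, yet `f(w_bad) > f(w_good)`: the first run is stuck at a sub-optimal weight, which is exactly what happens (in millions of dimensions) when a network converges at a sub-optimal training score.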
d) The sizes of two of the three layers are fixed by the problem:
input layer: the size of the feature vector (the number of features in my model) + 1 for the bias node, and not including the response variable, of course;
output layer: solely determined by my model: regression (one node) versus classification (number of nodes equal to the number of classes, assuming softmax).
The number of hidden neurons, by contrast, must be found empirically: train networks with different hidden-layer sizes and compare them on held-out data (or by cross-validation), choosing the smallest size that achieves good validation performance. Rules of thumb (e.g., somewhere between the input and output sizes) only give a starting point for this search.
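A sketch of that empirical search, using a hypothetical NumPy trainer for a one-hidden-layer MLP and a held-out validation split. The candidate sizes, learning rate, and synthetic data are all arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_mlp(X, y, n_hidden, lr=0.1, epochs=2000):
    """Train a 1-hidden-layer MLP (logistic hidden, linear output) by
    full-batch gradient descent on squared error. Minimal sketch only."""
    W1 = rng.normal(scale=0.5, size=(X.shape[1], n_hidden)); b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1));          b2 = np.zeros(1)
    for _ in range(epochs):
        H = 1 / (1 + np.exp(-(X @ W1 + b1)))
        err = H @ W2 + b2 - y                    # prediction error, (n, 1)
        dW2 = H.T @ err / len(X);  db2 = err.mean(0)
        dH = err @ W2.T * H * (1 - H)            # backprop through logistic
        dW1 = X.T @ dH / len(X);   db1 = dH.mean(0)
        W1 -= lr * dW1; b1 -= lr * db1; W2 -= lr * dW2; b2 -= lr * db2
    return W1, b1, W2, b2

def mse(params, X, y):
    W1, b1, W2, b2 = params
    H = 1 / (1 + np.exp(-(X @ W1 + b1)))
    return float(np.mean((H @ W2 + b2 - y) ** 2))

# Synthetic 1-D regression task split into train / validation halves.
X = np.linspace(-2, 2, 80)[:, None]
y = np.sin(X) + rng.normal(scale=0.05, size=X.shape)
Xtr, ytr, Xva, yva = X[::2], y[::2], X[1::2], y[1::2]

# Sweep candidate hidden sizes; keep the one with the lowest validation MSE.
scores = {h: mse(train_mlp(Xtr, ytr, h), Xva, yva) for h in (1, 2, 4, 8)}
best = min(scores, key=scores.get)
print(scores, best)
```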