Machine Learning Questions
1. Back Propagation
Also known as backprop, this is the process of backtracking errors through the weights of the network after forward propagating inputs through the network. It is applied using the chain rule from calculus.

2. Batch Normalization
When networks have many deep layers, an issue of internal covariate shift arises. The shift is "the change in the distribution of network activations due to the change in network parameters during training" (Szegedy). If we can reduce internal covariate shift, we can train faster and better. Batch Normalization solves this problem by normalizing each batch entering the network by both mean and variance.

5. F1/F Score
A measure of how accurate a model is, combining precision and recall with the formula:

F1 = 2 (Precision × Recall) / (Precision + Recall)

Precision: of every positive prediction, which ones are actually positive?

Precision = True Positives / (True Positives + False Positives)

Recall: of all instances that are actually positive, what fraction were predicted positive?

Recall = True Positives / (True Positives + False Negatives)
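The precision, recall, and F1 formulas above can be sketched directly from confusion-matrix counts; the counts in the usage line are hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall, and F1 from raw confusion-matrix counts."""
    precision = tp / (tp + fp)  # of predicted positives, how many are truly positive
    recall = tp / (tp + fn)     # of actual positives, how many were found
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts: 8 true positives, 2 false positives, 2 false negatives.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)  # each value ≈ 0.8
```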
3. Cross Entropy
Used to calculate how far off your label prediction is; sometimes denoted CE. Cross entropy is a loss function related to the thermodynamic concept of entropy. It is used in multi-class classification to find the error in the prediction.

6. Gradient (Nabla)
The gradient is the vector of partial derivatives of a function that takes in multiple inputs and outputs a single value (e.g., our cost functions in neural networks). The gradient tells us which direction to move on the graph to increase the output. We compute the gradient and go in the opposite direction, since we want to decrease our loss.
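The two ideas above can be illustrated together in a minimal sketch with hypothetical logits: compute a multi-class cross-entropy loss, then step in the direction opposite the gradient. For a softmax followed by cross entropy, the gradient with respect to the logits works out to (probabilities − one-hot target).

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_idx):
    """Multi-class cross entropy: -log of the probability of the true class."""
    return -math.log(probs[true_idx])

# Hypothetical 3-class example; the true class is index 0.
logits = [2.0, 1.0, 0.1]
lr = 0.5  # learning rate (placeholder value)

probs = softmax(logits)
loss = cross_entropy(probs, 0)

# Gradient of the loss w.r.t. the logits: (probs - one_hot).
one_hot = [1.0, 0.0, 0.0]
grad = [p - y for p, y in zip(probs, one_hot)]

# Step OPPOSITE the gradient to decrease the loss.
logits = [z - lr * g for z, g in zip(logits, grad)]
new_loss = cross_entropy(softmax(logits), 0)
```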
4. Dropout
"It prevents overfitting and provides a way of approximately combining exponentially many different neural network architectures efficiently" (Hinton). This method randomly picks visible and hidden units to drop from the network. This is usually determined by picking a per-layer dropout percentage.

7. L1 and L2 Regularization
These regularization methods prevent overfitting by imposing a penalty on the coefficients. L1 can yield sparse models while L2 cannot. Regularization is used to control model complexity. This is important because it allows your model to generalize better and not overfit to the training data.

8. Learning Rate
The learning rate is the magnitude at which you adjust the weights of the network during optimization after back propagation. The learning rate is a hyperparameter that will differ across problems; it should be tuned by cross-validation.
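The dropout behaviour described in entry 4 can be sketched with a simple random mask. This sketch uses the common "inverted dropout" variant (an assumption, since the text does not specify scaling), where surviving activations are scaled by 1/(1 − drop rate) so the expected activation is unchanged at test time.

```python
import random

def dropout(activations, drop_prob, training=True):
    """Zero each unit with probability drop_prob; scale survivors by
    1/(1 - drop_prob) so the expected activation is unchanged (inverted dropout)."""
    if not training or drop_prob == 0.0:
        return list(activations)  # at test time, use the full network
    keep = 1.0 - drop_prob
    return [a / keep if random.random() < keep else 0.0
            for a in activations]

random.seed(0)
layer = [0.5, -1.2, 3.3, 0.9, -0.4]
dropped = dropout(layer, drop_prob=0.5)  # roughly half the units become 0.0
```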