Machine Learning by Tom Mitchell - Definitions

1. The document provides information about machine learning concepts like perceptrons, backpropagation algorithm, regularization, and gradient descent. 2. It includes questions and answers on topics such as the perceptron training rule, differences between L1 and L2 regularization, early stopping, and characteristics of problems solved using neural networks. 3. Regularization techniques like L1 and L2 regularization help reduce overfitting when dealing with large number of features in a dataset.

Uploaded by

Ponambalam Vilashini


SASTRA

School of Computing

Part A Collections : Machine Learning

1. A neuron with 4 inputs has the weight vector w = [1, 2, 3, 4]^T and a bias of 0 (zero). The activation
function is linear, where the constant of proportionality equals 2; that is, the activation function is
given by f(net) = 2 × net. If the input vector is x = [4, 8, 5, 6]^T, calculate the output of the neuron.

net = 1×4 + 2×8 + 3×5 + 4×6 = 59, so the output is f(net) = 2 × 59 = 118.
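
The arithmetic can be checked with a few lines of Python. This is only a sketch of the calculation in the question; the function name `neuron_output` and its `slope` parameter are illustrative and do not come from the original notes.

```python
# Weighted-sum neuron with a linear activation f(net) = 2 * net.
w = [1, 2, 3, 4]   # weight vector from the question
x = [4, 8, 5, 6]   # input vector from the question
bias = 0

def neuron_output(weights, inputs, bias, slope=2):
    """Return slope * (w . x + bias) for a linear activation."""
    net = sum(wi * xi for wi, xi in zip(weights, inputs)) + bias
    return slope * net

print(neuron_output(w, x, bias))  # net = 59, so this prints 118
```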

2. What is the biggest difference between Widrow & Hoff’s Delta Rule and the Perceptron Learning
Rule for learning in a single-layer feed forward network?

The Delta Rule is defined for linear activation functions, but the Perceptron Learning Rule is defined
for step activation functions.

3. What are merits and demerits of Back Propagation Algorithm?

Merits:
1. The mathematical formula present here can be applied to any network and does not require any
special mention of the features of the function to be learnt.
2. The computing time is reduced if the weights chosen are small at the beginning.
Demerits:
1. The number of learning steps may be high, and also the learning phase has intensive calculations.
2. The training may cause temporal instability to the system.

4. What are the applications of back propagation algorithm?

1. Optical character recognition
2. Image compression
3. Data compression
4. Control problems

5. What are the four main steps in back propagation algorithm?

1. Initialization of weights
2. Feed forward function
3. Back propagation
4. Termination

6. Give some applications of ANN

• Function approximation, or regression analysis, including time series prediction, fitness
approximation and modeling.
• Classification, including pattern and sequence recognition, novelty detection and sequential decision
making.
• Data processing, including filtering, clustering, blind source separation and compression.
• Robotics, including directing manipulators and computer numerical control.

7. What are the uses of Regularization?

Regularization is used to create a less complex (parsimonious) model when the dataset has a large number of
features. Two regularization techniques used to address over-fitting and perform feature selection are L1
regularization and L2 regularization.

8. List the difference between L1 Regularization & L2 Regularization.

A regression model that uses the L1 regularization technique is called Lasso Regression, and a model that
uses L2 is called Ridge Regression.

The key difference between these two is the penalty term.

Ridge regression adds the “squared magnitude” of the coefficients as a penalty term to the loss function. In
the cost function, the final term, $\lambda \sum_j \beta_j^2$, is the L2 regularization element.

Cost function:

$$\sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

Here, if lambda is zero we get back ordinary least squares (OLS). However, if lambda is very large, the penalty
carries too much weight and leads to under-fitting, so how lambda is chosen matters.
With a well-chosen lambda, this technique works very well to avoid over-fitting.

Lasso Regression (Least Absolute Shrinkage and Selection Operator) adds the “absolute value of magnitude” of
the coefficients as a penalty term to the loss function.

Cost function:

$$\sum_{i=1}^{n} \Bigl( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \Bigr)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

Again, if lambda is zero we get back OLS, whereas a very large value will make the coefficients zero, so
the model will under-fit.

The key difference between these techniques is that Lasso shrinks the coefficients of the less important
features to zero, thus removing some features altogether. So, it works well for feature selection when we
have a huge number of features.

Traditional methods for handling overfitting and performing feature selection, such as cross-validation and
stepwise regression, work well with a small set of features, but these regularization techniques are a
great alternative when we are dealing with a large set of features.
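
To make the two penalty terms concrete, here is a minimal sketch that evaluates both cost functions for a fixed coefficient vector. It assumes a linear model without an intercept; the helper names (`rss`, `ridge_cost`, `lasso_cost`) and the toy data are illustrative and do not come from the original notes.

```python
def rss(X, y, beta):
    """Residual sum of squares for a linear model without intercept."""
    return sum((yi - sum(xij * bj for xij, bj in zip(xi, beta))) ** 2
               for xi, yi in zip(X, y))

def ridge_cost(X, y, beta, lam):
    """RSS plus the L2 penalty: lambda * sum of squared coefficients."""
    return rss(X, y, beta) + lam * sum(b ** 2 for b in beta)

def lasso_cost(X, y, beta, lam):
    """RSS plus the L1 penalty: lambda * sum of absolute coefficients."""
    return rss(X, y, beta) + lam * sum(abs(b) for b in beta)

# Toy data that beta = [1, 2] fits exactly, so the RSS is 0 and
# only the penalty term contributes to the cost.
X = [[1.0, 2.0], [2.0, 0.0], [3.0, 1.0]]
y = [5.0, 2.0, 5.0]
beta = [1.0, 2.0]

print(ridge_cost(X, y, beta, lam=0.5))  # 0.5 * (1 + 4) = 2.5
print(lasso_cost(X, y, beta, lam=0.5))  # 0.5 * (1 + 2) = 1.5
```

With `lam=0` both functions reduce to the plain residual sum of squares, which matches the remark that a zero lambda gives back OLS.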

9. Define Early Stopping:

One method for improving network generalization ability is to use a network that is just large enough
to provide an adequate fit to the target function. But sometimes it is hard to know beforehand how large a
network should be for a specific application. One commonly used technique for improving network
generalization is early stopping. This technique monitors the error on a subset of the data (validation
data) that does not actually take part in the training. The training stops when the error on the validation
data increases for a certain number of iterations.
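
The stopping criterion can be sketched as follows. This is one possible reading of "a certain number of iterations": training stops once the validation error has failed to improve on its best value for `patience` consecutive epochs. The function name and the `patience` parameter are assumptions, not part of the original notes.

```python
def early_stopping_epoch(val_errors, patience=3):
    """Return the index of the epoch at which training would stop,
    or None if the stopping condition is never met."""
    best = float("inf")
    bad_epochs = 0
    for epoch, err in enumerate(val_errors):
        if err < best:
            best = err          # new best validation error
            bad_epochs = 0
        else:
            bad_epochs += 1     # no improvement this epoch
            if bad_epochs >= patience:
                return epoch
    return None

# Hypothetical validation errors recorded after each epoch.
val_errors = [0.50, 0.40, 0.35, 0.36, 0.37, 0.39, 0.45]
print(early_stopping_epoch(val_errors, patience=3))  # stops at epoch index 5
```

In practice the weights saved at the best validation error (epoch index 2 in this example) would be the ones kept.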

10. What is generalization?

The ability of a pattern recognition system to approximate the desired output values for pattern vectors
which are not in the training set.

11. Define Regularization.

Regularization is defined as “any modification we make to a learning algorithm that is intended to reduce its
generalization error but not its training error.”

12. What are the other names of L2 regularization?

It is also known as weight decay, ridge regression or Tikhonov regularization.

13. What is Regularization?

Regularization is a way to avoid overfitting by penalizing high-valued regression coefficients. In
simple terms, it reduces parameters and shrinks (simplifies) the model. This more streamlined, more
parsimonious model will likely perform better at predictions. Regularization adds penalties to more
complex models and then sorts potential models from least overfit to greatest. The model with the
lowest “overfitting” score is usually the best choice for predictive power.

14. Why is Regularization Necessary?

Regularization is necessary because least squares regression methods, where the residual sum of
squares is minimized, can be unstable. This is especially true if there is multicollinearity in the model.
However, the mere practice of model fitting comes with a major pitfall: any set of data can be fitted
to a model, even if that model is ridiculously complex.

15. What does Regularization achieve?

A standard least squares model tends to have some variance in it, i.e., it won’t generalize well to
a data set different from its training data. Regularization significantly reduces the variance of the model
without a substantial increase in its bias. The tuning parameter λ, used in the regularization techniques,
controls the impact on bias and variance. As the value of λ rises, it reduces the value of the coefficients, thus
reducing the variance. Up to a point, this increase in λ is beneficial, as it only reduces the variance (hence
avoiding overfitting) without losing any important properties in the data. But after a certain value, the model
starts losing important properties, giving rise to bias in the model and thus underfitting. Therefore, the
value of λ should be carefully selected.

16. What are the characteristics of problems that can be solved by neural network learning?

Instances are represented by many attribute-value pairs. The target function output may be
discrete-valued, real-valued, or a vector of several real- or discrete-valued attributes. The training
examples may contain errors. Long training times are acceptable. Fast evaluation of the learned
target function may be required. The ability of humans to understand the learned target function
is not important.

17. State the Equation of Perceptron Training rule.

$$w_i \leftarrow w_i + \Delta w_i, \qquad \Delta w_i = \eta \, (t - o) \, x_i$$

w_i = weight, x_i = input,
η (eta) = learning rate, t = target output, o = generated output
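
As an illustration of the rule, here is a minimal sketch that trains a perceptron on the Boolean AND function. The choices made here are assumptions for the example only: outputs are encoded as +1/-1, a constant input x0 = 1 plays the role of the bias, weights start at zero, and the learning rate is 0.5.

```python
def predict(w, x):
    """Thresholded perceptron output: +1 if w . x > 0, else -1."""
    net = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if net > 0 else -1

def train_perceptron(examples, eta=0.5, max_epochs=100):
    """Apply w_i <- w_i + eta * (t - o) * x_i until no example is misclassified."""
    w = [0.0] * len(examples[0][0])
    for _ in range(max_epochs):
        errors = 0
        for x, t in examples:
            o = predict(w, x)
            if o != t:
                errors += 1
                w = [wi + eta * (t - o) * xi for wi, xi in zip(w, x)]
        if errors == 0:
            break
    return w

# Each example is ((x0, x1, x2), target); x0 = 1 is the constant bias input.
and_examples = [((1, 0, 0), -1), ((1, 0, 1), -1), ((1, 1, 0), -1), ((1, 1, 1), 1)]
w = train_perceptron(and_examples)
print(w, [predict(w, x) for x, _ in and_examples])
```

Because AND is linearly separable, the loop terminates with every training example classified correctly.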

18. Drawback of Perceptron Training Rule.

It can be applied only to training samples that are linearly separable. If the data are not linearly
separable, convergence is not assured.

19. When can gradient descent be applied?

It is a strategy for searching through a large or infinite hypothesis space that can be applied
whenever
(1) the hypothesis space contains continuously parameterized hypotheses (e.g., the weights in a linear
unit),
(2) the error can be differentiated with respect to these hypothesis parameters.

20. What are the difficulties in applying gradient descent?

(1) converging to a local minimum can sometimes be quite slow (i.e., it can require many thousands
of gradient descent steps),
(2) if there are multiple local minima in the error surface, then there is no guarantee that the
procedure will find the global minimum.

21. Distinguish between standard gradient descent and stochastic gradient descent.

In standard gradient descent, the error is summed over all examples before updating weights,
whereas in stochastic gradient descent weights are updated upon examining each training example.
Summing over multiple examples in standard gradient descent requires more computation per
weight update step. On the other hand, because it uses the true gradient, standard gradient descent is
often used with a larger step size per weight update than stochastic gradient descent.
In cases where there are multiple local minima, stochastic gradient descent can sometimes avoid
falling into these local minima, because each update follows the gradient of a single example's error
rather than the true gradient.
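
The difference in when the weights are updated can be sketched on a toy problem. The example below assumes a single-weight linear unit o = w * x and made-up data generated by t = 2x; the function names and the learning rate are illustrative only.

```python
# Toy data generated by t = 2 * x, fitted with a linear unit o = w * x.
xs = [1.0, 2.0, 3.0]
ts = [2.0, 4.0, 6.0]

def batch_gradient_descent(xs, ts, eta=0.05, epochs=50):
    w = 0.0
    for _ in range(epochs):
        # Sum the error gradient over all examples, then update once.
        delta = sum((t - w * x) * x for x, t in zip(xs, ts))
        w += eta * delta
    return w

def stochastic_gradient_descent(xs, ts, eta=0.05, epochs=50):
    w = 0.0
    for _ in range(epochs):
        for x, t in zip(xs, ts):
            # Update immediately after examining each example.
            w += eta * (t - w * x) * x
    return w

print(batch_gradient_descent(xs, ts))       # close to 2.0
print(stochastic_gradient_descent(xs, ts))  # close to 2.0
```

This error surface has a single minimum, so both variants reach the same weight; the sketch shows only the update schedule, not the behaviour around local minima.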

22. Distinguish between Perceptron Training Rule and Delta Rule.

The difference between these two training rules is reflected in different convergence properties. The
perceptron training rule converges after a finite number of iterations to a hypothesis that perfectly
classifies the training data, provided the training examples are linearly separable. The delta rule
converges only asymptotically toward the minimum error hypothesis, possibly requiring
unbounded time, but converges regardless of whether the training data are linearly separable.

23. How can the problem of local minima be alleviated?

• Adding a momentum term to the weight-update rule.
• Using stochastic gradient descent rather than true gradient descent.
• Training multiple networks using the same data, but initializing each network
with different random weights.

24. What set of functions can be represented by feed forward networks?

• Boolean functions.
• Continuous functions.
• Arbitrary functions.

25. What is sample error?

26. What is true error?

27. Mention Central Limit Theorem.

28. Write down the formula of the general two-sided confidence interval for estimating the difference
between errors of two hypotheses.

29. What is the generic four-step procedure used to derive a confidence interval?

30. If a random variable Y obeys a Normal distribution with
then write down the confidence interval.

31. What is a dropout?

Dropout is a regularization technique for reducing overfitting in neural networks. At each training step we
randomly drop out (set to zero) a set of nodes, thus creating a different model for each training case; all of
these models share weights. It’s a form of model averaging.
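
A minimal sketch of the masking step is shown below. It assumes each node is dropped independently with probability `drop_prob`; the function name is illustrative, and the rescaling of surviving activations that many implementations apply is left out because the notes do not mention it.

```python
import random

def dropout(activations, drop_prob, rng):
    """Zero each activation independently with probability drop_prob."""
    return [0.0 if rng.random() < drop_prob else a for a in activations]

rng = random.Random(0)            # seeded only to make the demo repeatable
hidden = [0.7, 1.2, 0.3, 0.9]     # hypothetical hidden-layer activations
print(dropout(hidden, drop_prob=0.5, rng=rng))
```

Calling `dropout` again draws a fresh mask, which is what gives each training case its own thinned network.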

32. What are hyperparameters? Provide some examples.

Hyperparameters, as opposed to model parameters, can’t be learned from the data; they are set before the training
phase.

Learning rate
It determines how fast we update the weights during optimization. If the learning rate is too small,
gradient descent can be slow to find the minimum, and if it’s too large, gradient descent may not converge (it
can overshoot the minimum). It’s often considered the most important hyperparameter.

Number of epochs
Epoch is defined as one forward pass and one backward pass of all training data.

Batch size
The number of training examples in one forward/backward pass.

33. What is the role of the activation function?

The goal of an activation function is to introduce non-linearity into the neural network so that it can learn
more complex functions. Without it, the neural network would only be able to learn functions that are linear
combinations of its input data.

34. What is a cost function?

A cost function tells us how well the neural network is performing. Our goal during training is to find
parameters that minimize the cost function. An example of a cost function is the Mean Squared Error
function.
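
As a small illustration, the Mean Squared Error can be computed as below. The function name and the sample values are made up for the example.

```python
def mean_squared_error(targets, predictions):
    """Average of the squared differences between targets and predictions."""
    n = len(targets)
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / n

targets = [1.0, 2.0, 3.0]
predictions = [1.0, 2.0, 5.0]
print(mean_squared_error(targets, predictions))  # (0 + 0 + 4) / 3
```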

35. What is gradient descent?

Gradient descent is an optimization algorithm used in machine learning to learn the values of parameters that
minimize the cost function. It’s an iterative algorithm: in every iteration, we compute the gradient of the cost
function with respect to each parameter and update the parameters of the function via the following rule:

$$\theta \leftarrow \theta - \alpha \, \frac{\partial J(\theta)}{\partial \theta}$$

where J is the cost function and α is the learning rate.
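
The iterative update can be sketched on a one-parameter problem. The cost J(θ) = (θ − 3)², the learning rate, and the function names below are assumptions chosen for illustration; they are not taken from the original notes.

```python
def gradient(theta):
    """Derivative of the illustrative cost J(theta) = (theta - 3) ** 2."""
    return 2.0 * (theta - 3.0)

def gradient_descent(theta0, learning_rate=0.1, steps=100):
    theta = theta0
    for _ in range(steps):
        # Move against the gradient, scaled by the learning rate.
        theta -= learning_rate * gradient(theta)
    return theta

print(gradient_descent(theta0=0.0))  # approaches the minimizer theta = 3
```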

36. What is data augmentation? List some examples.

Data augmentation is a technique for synthesizing new data by modifying existing data in such a way that the
target is not changed, or it is changed in a known way.
Computer vision is one of the fields where data augmentation is very useful. There are many modifications that
we can do to images:
• Resize
• Horizontal or vertical flip
• Rotate
• Add noise
• Deform
• Modify colors

37. Why do we need a validation set and test set? What is the difference between them?

When training a model, we divide the available data into three separate sets:
• The training dataset is used for fitting the model’s parameters. However, the accuracy that we
achieve on the training set is not reliable for predicting if the model will be accurate on new samples.
• The validation dataset is used to measure how well the model does on examples that weren’t part of
the training dataset. The metrics computed on the validation data can be used to tune the
hyperparameters of the model. However, every time we evaluate the validation data and we make
decisions based on those scores, we are leaking information from the validation data into our model.
The more evaluations, the more information is leaked. So we can end up overfitting to the validation
data, and once again the validation score won’t be reliable for predicting the behaviour of the model
in the real world.
• The test dataset is used to measure how well the model does on previously unseen examples. It should
only be used once we have tuned the parameters using the validation set.
So if we omit the test set and only use a validation set, the validation score won’t be a good estimate of the
generalization of the model.

38. Implement an AND function to a single neuron.

Below is a tabular representation of an AND function:
X1 X2 X1 AND X2
0 0 0
0 1 0
1 0 0
1 1 1
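The activation function of our neuron is a step function: f(a) = 1 if a ≥ 0, and f(a) = 0 otherwise.
Bias = -1.5, w1 = 1, w2 = 1.
With these values, x1 + x2 - 1.5 is non-negative only when x1 = x2 = 1.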
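
The weights can be checked against the truth table with a short sketch. It assumes a step activation that outputs 1 when the weighted sum is non-negative and 0 otherwise; the function names are illustrative.

```python
def step(a):
    """Step activation: 1 if a >= 0, else 0 (assumed for this sketch)."""
    return 1 if a >= 0 else 0

def and_neuron(x1, x2, w1=1, w2=1, bias=-1.5):
    """Single neuron with the weights and bias given in the answer."""
    return step(w1 * x1 + w2 * x2 + bias)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, and_neuron(x1, x2))
```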

39. Which technique performs operations similar to dropout in a neural network?

Bagging. Dropout can be seen as an extreme form of bagging in which each model is trained on
a single case and each parameter of the model is very strongly regularized by sharing it with the
corresponding parameter in all the other models.

40. In a neural network, which of the techniques are used to deal with overfitting?

Dropout, Regularization, Batch Normalization

41. Which of the following statements is the best description of early stopping?

A. Train the network until a local minimum in the error function is reached

B. Simulate the network on a test dataset after every epoch of training. Stop training when the
generalization error starts to increase

C. Add a momentum term to the weight update in the Generalized Delta Rule, so that training
converges more quickly

D. A faster version of backpropagation, such as the ‘Quickprop’ algorithm

Solution: (B)

42. What is parameter sharing?

While a parameter norm penalty is one way to regularize parameters to be close to one
another, the more popular way is to use constraints: to force sets of parameters to be
equal. This method of regularization is often referred to as parameter sharing, where we
interpret the various models or model components as sharing a unique set of parameters.
A significant advantage of parameter sharing over regularizing the parameters to be close
(via a norm penalty) is that only a subset of the parameters (the unique set) need to be
stored in memory.

43. What is Bagging?

Bagging (Bootstrap aggregating) is a technique for reducing generalization error by
combining several models. The idea is to train several different models separately, then
have all of the models vote on the output for test examples. This is an example of a
general strategy in machine learning called model averaging. Techniques employing this
strategy are known as ensemble methods.

44. List the advantages of Dropout.

• It is computationally very cheap.
• It does not significantly limit the type of model or training procedure that
can be used. It works well with nearly any model that uses a distributed
representation and can be trained with stochastic gradient descent.

45. Suppose you test a hypothesis h and find that it commits r = 300 errors on a sample S of n =
1000 randomly drawn test examples.

What is the standard deviation in error_S(h)?

error_S(h) = r / n
= 300 / 1000
= 0.3
The variance in this estimate arises completely from the variance in r.
Because r is Binomially distributed,
variance(r) = np(1 - p)
Since p is unknown, substitute the estimate r / n = 0.3:
= 1000 (0.3)(1 - 0.3)
= 210
standard deviation(r)
= square root(variance(r))
= square root(210)
= 14.49
standard deviation(error_S(h))
= standard deviation(r) / n
= 14.49 / 1000
= 0.01449
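
The same calculation in Python, as a quick check of the numbers. The variable names are illustrative.

```python
import math

r, n = 300, 1000
p_hat = r / n                                # sample error, used in place of the unknown p
sd_r = math.sqrt(n * p_hat * (1 - p_hat))    # standard deviation of the error count r
sd_error = sd_r / n                          # standard deviation of the error rate

print(round(sd_r, 2), round(sd_error, 5))
```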

46. Suppose hypothesis h commits r = 10 errors over a sample of n = 65 independently drawn
examples. What is the 90% confidence interval (two-sided) for the true error rate?

error_S(h) = 10 / 65 ≈ 0.15
90% interval = 0.15 ± 1.64 × square root[0.15 × (1 - 0.15) / 65]
= 0.15 ± 0.073
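
The interval can be reproduced with a small helper. This sketch uses the unrounded sample error 10/65 rather than 0.15, so the half-width differs slightly in the later decimals; the function name is illustrative.

```python
import math

def confidence_interval(r, n, z):
    """Two-sided interval p +/- z * sqrt(p * (1 - p) / n), with p = r / n."""
    p = r / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

low, high = confidence_interval(r=10, n=65, z=1.64)
print(round(low, 3), round(high, 3))
```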

47. What is the minimum number of examples (n) you must collect to assure that the width of
the two-sided 95% confidence interval will be smaller than 0.1?

Let E(error_D(h)) = (0.2 + 0.6) / 2
= 0.4
95% interval width = 2 × (1.96 × x)
x = square root[0.4 × (1 - 0.4) / n]
For width < 0.1:
x < 0.1 / (1.96 × 2) ≈ 0.02551
square root[0.4 × (1 - 0.4) / n] < 0.02551
(0.4 × 0.6) / n < 0.0006508
0.24 / n < 0.0006508
n > 0.24 / 0.0006508 ≈ 368.8

n = 369
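
The bound can be computed directly without rounding the intermediate value. Solving 2 × z × sqrt(p(1 − p)/n) < width for n gives n > p(1 − p)(2z/width)², which is about 368.8 here, so the smallest integer that satisfies it is 369; rounding the intermediate value to 0.0255 pushes the result up to 370. The function name is illustrative.

```python
import math

def min_samples(p, z, max_width):
    """Smallest integer n with 2 * z * sqrt(p * (1 - p) / n) < max_width."""
    bound = p * (1 - p) * (2 * z / max_width) ** 2
    return math.floor(bound) + 1

n_min = min_samples(p=0.4, z=1.96, max_width=0.1)
print(n_min)
```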

48. What are the values of weights w0, w1, and w2 for the perceptron whose decision surface is
illustrated in the figure? Assume the surface crosses the x1 axis at -1 and the x2 axis at 2.

49. (a) Design a two-input perceptron that implements the Boolean function A∧¬B. (b) Design the two-layer
network of perceptrons that implements A XOR B.

The requested perceptron has 3 inputs: A, B, and the constant 1. The values of A and B are 1
(true) or -1 (false). The following table describes the output O of the perceptron:
A    B    O = A∧¬B
-1   -1   -1
-1    1   -1
 1   -1    1
 1    1   -1

One of the correct decision surfaces (any line that separates the positive point from the negative
points would be fine) is shown in the following picture.
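
One possible choice of weights can be checked in code. The values below (w0 = -1, w1 = 1, w2 = -1 for A∧¬B, and an OR unit with all weights equal to 1 for the second layer) are assumptions for illustration; any weights that separate the same points would do. XOR is built as (A∧¬B) ∨ (¬A∧B).

```python
def perceptron(w0, w1, w2, a, b):
    """Perceptron with +1/-1 inputs and outputs; w0 multiplies the constant 1."""
    return 1 if w0 + w1 * a + w2 * b > 0 else -1

def a_and_not_b(a, b):
    return perceptron(-1, 1, -1, a, b)

def xor(a, b):
    h1 = a_and_not_b(a, b)            # hidden unit: A and not B
    h2 = a_and_not_b(b, a)            # hidden unit: B and not A
    return perceptron(1, 1, 1, h1, h2)  # output unit: h1 OR h2

for a in (-1, 1):
    for b in (-1, 1):
        print(a, b, a_and_not_b(a, b), xor(a, b))
```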

50. Derive a gradient descent training rule for a single unit with output o, where

.
The gradient descent training rule specifies how the weights are to be changed at each step of
the learning procedure so that the prediction error of the unit decreases the most.

51. In the Back-Propagation learning algorithm, what is the object of the learning? Does the Back-Propagation
learning algorithm guarantee to find the global optimum solution?

The object is to learn the weights of the interconnections between the inputs and the hidden units and between the
hidden units and the output units. The algorithm attempts to minimize the squared error between the network output
values and the target values of these outputs. The learning algorithm is not guaranteed to find the global optimum
solution; it is guaranteed only to converge toward some local minimum of the error function.
