Unit 1-1

The document provides an overview of machine learning fundamentals, including components of learning, the machine learning workflow, and the importance of training and testing datasets. It discusses concepts such as bias-variance tradeoff, overfitting, underfitting, and the role of Hoeffding's inequality and VC bounds in understanding generalization error. Additionally, it highlights the significance of validation techniques like k-fold cross-validation and the impact of noisy targets on model accuracy.


U18ECE0014

MACHINE LEARNING
Fundamentals of Machine Learning

S. Arun Kumar
Department of Electronics and Communication Engineering
Overview of this Lecture
• Components of Learning
• Fundamentals of Machine Learning
• Feasibility of Learning
• VC Bound
• Bias-Variance Tradeoff
• Learning Curves
Components of Learning

• Unknown target pattern to be learnt from the data
• Training examples (the data set D)
• g: the final hypothesis (function) learnt from the data, approximating the target f


Machine Learning Workflow

Machine learning uses data to compute a hypothesis g that approximates the target f.


Outside the Data Set
Another simple binary classification example

Can we “learn” from this data?


(Can we “infer” something outside the training data?)
Outside the Data Set
Learning process:
Check all the possible functions
Choose the one that fits all the data

Can’t make any prediction


Learning from D (to infer something outside D) is impossible if any f
can happen
Inferring Something Unknown
Consider a bin with red and green marbles
P[picking a red marble] = µ
P[picking a green marble] = 1 − µ

The value of µ is unknown to us


How to infer µ?
Pick N marbles independently
ν: the fraction of red marbles
Inferring with probability
Do you know µ? Does ν say something about µ?
No! The sample can be mostly green while the bin is mostly red (possible).
Can you say something about µ?
Yes! ν is “probably” close to µ if the sample is sufficiently large (probable).
Hoeffding’s Inequality
In big sample (large N), ν (sample mean) is probably close to µ:
P[|ν − µ| > ε] ≤ 2e^(−2ε²N)

This is called Hoeffding’s inequality


The statement “ν ≈ µ” is Probably Approximately Correct (PAC).

The event |ν − µ| > ε is a “bad event”; we want its probability to be low.

The bound is valid for all N, and ε is the margin of error.
The bound does not depend on µ (no need to know µ).
If we ask for a smaller margin ε, we have to increase the sample size N in order to keep the probability of the bad event |ν − µ| > ε small.
Hoeffding’s Inequality

P[|ν − µ| > ε] ≤ 2e^(−2ε²N)

Valid for all N and ϵ > 0


Does not depend on µ (no need to know µ)
Larger sample size N or looser gap ϵ
⇒ higher probability for µ ≈ ν
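
A quick way to see the inequality at work is to simulate the marble bin: draw many samples of size N and compare the observed frequency of the bad event |ν − µ| > ε with the Hoeffding bound. This is a minimal sketch assuming Python with NumPy; the value µ = 0.6 is an arbitrary choice for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, N, eps, trials = 0.6, 1_000, 0.05, 10_000   # true fraction of red marbles (assumed)

# Draw `trials` independent samples of N marbles each; nu is the sample frequency of red.
nu = rng.binomial(N, mu, size=trials) / N

empirical = np.mean(np.abs(nu - mu) > eps)   # observed P[|nu - mu| > eps]
hoeffding = 2 * np.exp(-2 * eps**2 * N)      # Hoeffding upper bound 2e^(-2*eps^2*N)

print(f"observed bad-event rate: {empirical:.4f}")
print(f"Hoeffding bound:         {hoeffding:.4f}")
```

The observed rate should come out well below the bound, since Hoeffding holds for any µ and is not tight for any particular one.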
Connection to Learning
How to connect this to learning?
Each marble (uncolored) is a data point x ∈ X
Connection to Learning
How to connect this to learning?
Each marble (uncolored) is a data point x ∈ X
red marble: h(x) ≠ f(x) (h is wrong)
green marble: h(x) = f(x) (h is correct)

Both µ and ν depend on the particular hypothesis ℎ

ν → in-sample error 𝐸𝑖𝑛 (ℎ)


µ → out-of-sample error 𝐸𝑜𝑢𝑡(ℎ)

The out-of-sample error Eout(h) is the quantity that really matters.
Connection to Learning

• A hypothesis is an assumption, an idea that is proposed for the sake of argument so that
it can be tested to see if it might be true.
• A research hypothesis is a statement of expectation or prediction that will be tested by
research

• For any fixed h, we can probably infer the unknown Eout(h) from the known Ein(h).
For a particular hypothesis h:

Eout(h) = P[h(x) ≠ f(x)]   (out-of-sample, unknown)   ⇔ µ

Ein(h) = (1/N) Σ_{n=1}^{N} [h(x_n) ≠ y_n]   (in-sample, known)   ⇔ ν

Now we can infer Eout (h) from Ein (h)!
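
To make the correspondence ν ⇔ Ein(h) and µ ⇔ Eout(h) concrete, the hedged sketch below fixes one hypothesis h, samples N training points, and estimates Eout(h) on a large held-out sample. The target f and hypothesis h here are made-up toy functions for illustration, not anything from the slides.

```python
import numpy as np

rng = np.random.default_rng(1)

f = lambda X: np.sign(X[:, 0] + X[:, 1] - 1.0)   # "unknown" target (known here only to simulate)
h = lambda X: np.sign(X[:, 0] - 0.5)             # one fixed hypothesis to verify

X_train = rng.uniform(0, 1, size=(100, 2))         # the N sampled data points
X_large = rng.uniform(0, 1, size=(1_000_000, 2))   # proxy for "outside the data set"

E_in  = np.mean(h(X_train) != f(X_train))   # in-sample error, corresponds to nu
E_out = np.mean(h(X_large) != f(X_large))   # out-of-sample error, corresponds to mu

print(f"E_in(h)  = {E_in:.3f}")
print(f"E_out(h) = {E_out:.3f}   |E_in - E_out| = {abs(E_in - E_out):.3f}")
```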


Verifying a Hypothesis
For any fixed h, when the sample size N is large:

P[|Ein(h) − Eout(h)| > ε] ≤ 2e^(−2ε²N)
Given a hypothesis h ⇒ sample N data ⇒ Ein(h) to “verify” the
quality of h
Can we apply this to multiple hypotheses?
Apply it to multiple bins (hypotheses).

The colouring in each bin depends on a different hypothesis.

Bingo when we get all green marbles?
A Simple Solution
When is learning successful?
When our Learning Algorithm A picks the hypothesis g:
P[|Ein(g) − Eout(g)| > ε] ≤ 2M e^(−2ε²N)

If M is small and N is large enough:


If A finds Ein(g) ≈ 0
⇒ Eout (g) ≈ 0 (Learning is successful!)
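
Why does choosing among many hypotheses need the extra factor M? A common way to see it is the coin-flipping analogy: each “hypothesis” is a fair coin (true error 0.5), yet with M of them some coin is very likely to look perfect on a small sample. The following is a hedged simulation of that effect, not something taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, trials = 10, 1_000, 2_000   # sample size, number of hypotheses, repetitions

hits = 0
for _ in range(trials):
    # Every hypothesis has true error 0.5; its in-sample error is the mean of N coin flips.
    Ein = rng.integers(0, 2, size=(M, N)).mean(axis=1)
    hits += Ein.min() == 0.0        # some hypothesis looks perfect on this sample

print("P[min Ein = 0 over M hypotheses] ≈", hits / trials)   # roughly 0.6 with these numbers
print("P[Ein = 0 for one fixed hypothesis] =", 0.5 ** N)     # about 0.001
```

Verifying one fixed h is easy; picking the best-looking g out of M hypotheses is what makes the bound grow by the factor M.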
Feasibility of Learning
P[|Ein(g) − Eout(g)| > ε] ≤ 2M e^(−2ε²N)

Two questions:
(1) Can we make sure Eout (g) ≈ Ein (g)?
(2) Can we make sure Ein (g) ≈ 0?
Feasibility of Learning – Tradeoff on M

P[|Ein(g) − Eout(g)| > ε] ≤ 2M e^(−2ε²N)

Two questions:
(1) Can we make sure Eout(g) ≈ Ein(g)?
(2) Can we make Ein(g) small enough?
M: complexity of the model
Small M: (1) holds, but (2) may not hold (too few choices) (under-fitting)
Large M: (1) is no longer guaranteed, but (2) may hold (over-fitting)
Training and Testing
• Training involves feeding a machine learning
algorithm with data and allowing it to learn the
relationships within that data.
• The dataset used in training is known as the training
dataset. It contains input-output pairs where the
input data is used by the model to learn and the
output data serves as the target or label.
• Testing evaluates the trained model's performance
on unseen data to gauge its generalization capability.
• The dataset used for testing is known as the testing
dataset or test set. It contains data that was not used
during the training phase
Overfitting and Underfitting
•Overfitting: When a model performs well on training data but
poorly on test data, it means the model has learned the noise
in the training data instead of the underlying pattern.
•Underfitting: When a model performs poorly on both training
and test data, it means the model is too simple to capture the
underlying pattern in the data.

•Validation Set: Sometimes, a separate validation set is used in addition to the training and testing sets. This helps in tuning the model and avoiding overfitting (a small numerical illustration of over- and underfitting follows below).
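
As an illustration of the two failure modes (not taken from the slides), the sketch below fits polynomials of increasing degree to a noisy sine target, assuming scikit-learn is available: a degree-1 fit underfits (both errors high), while a degree-15 fit overfits (low training error, higher test error).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 60)
y = np.sin(np.pi * x) + rng.normal(0, 0.3, x.size)   # noisy target (toy example)

X_tr, X_te, y_tr, y_te = train_test_split(x.reshape(-1, 1), y, test_size=0.5, random_state=0)

for degree in (1, 3, 15):   # too simple, about right, too flexible
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_tr, y_tr)
    mse_tr = mean_squared_error(y_tr, model.predict(X_tr))
    mse_te = mean_squared_error(y_te, model.predict(X_te))
    print(f"degree {degree:2d}:  train MSE = {mse_tr:.3f}   test MSE = {mse_te:.3f}")
```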
Validation
• K-fold cross-validation is a method where the dataset is divided
into 'k' equally-sized subsets or folds. The model is trained and
validated 'k' times, each time using a different fold as the
validation set and the remaining folds as the training set.
•Step 1: Split the dataset into 'k' folds (typically, k is chosen as 5
or 10).
•Step 2: For each of the 'k' iterations:
•Training: Use 'k-1' folds for training the model.
•Validation: Use the remaining 1 fold for validating the model.
•Step 3: Calculate the performance metric (such as accuracy,
precision, recall, etc.) for each iteration.
•Step 4: Average the performance metrics from all 'k' iterations to obtain the final performance estimate (a code sketch follows below).
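
The four steps above map directly onto a few lines of scikit-learn. This is a hedged sketch on a synthetic dataset, with k = 5 and accuracy as the performance metric; the estimator and dataset are arbitrary choices.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

kf = KFold(n_splits=5, shuffle=True, random_state=0)      # Step 1: split into k = 5 folds
scores = []
for train_idx, val_idx in kf.split(X):                    # Step 2: k iterations
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])                 # train on the k-1 folds
    scores.append(model.score(X[val_idx], y[val_idx]))    # Step 3: metric on the held-out fold

print("fold accuracies:", np.round(scores, 3))
print("cross-validated accuracy:", np.mean(scores))       # Step 4: average over the k folds
```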
Error and Noisy Targets
Error
Error in machine learning refers to the difference between the
predicted values by the model and the actual target values
Training Error: The error measured on the training dataset. It
indicates how well the model fits the training data.
Test Error: The error measured on the test dataset. It indicates
how well the model generalizes to unseen data.
Sources of Error:
•Bias: Systematic error due to erroneous assumptions in the
learning algorithm. High bias can lead to underfitting.
•Variance: Error due to the model's sensitivity to small
fluctuations in the training dataset. High variance can lead to
overfitting.
Noisy Targets
Noisy targets refer to inaccuracies or inconsistencies in the target
labels (output values) of the training data. These can arise due to
various factors and can negatively impact model training.
Sources of Noise in Targets:
• Mistakes in the process of measuring or recording the target
values.
• Inaccuracies introduced by human annotators
• Natural variability in the phenomenon being measured, which
might not be fully captured by the features.
Effects of Noisy Targets:
• Reduced Model Accuracy
• Increased Variance: The model might overfit to the noise,
resulting in high variance.
• Misleading Evaluation Metrics
Theory of Generalization
Probability of a “bad event” is less than a
huge number ➔ not useful bound
The Hoeffding inequality becomes:

P[|Ein(g) − Eout(g)| > ε] ≤ 2M e^(−2ε²N)

where M is the number of hypotheses in ℋ → M can be infinite.

The quantity 𝐸out(𝑔) − 𝐸𝑖𝑛 (𝑔) is called the generalization error

To tame this bound, the number of hypotheses M can be replaced by a finite quantity mℋ(N) (called the growth function), which is eventually bounded by a polynomial.
Theory of Generalization
It turns out that the number of hypotheses M can be replaced by a quantity mℋ(N) (called the growth function) which is eventually bounded by a polynomial.

▪ This is due to the fact that many of the M hypotheses overlap heavily → they generate the same “classification dichotomy” on the data points.
Vapnik- Chervonenkis Bound
The VC bound is crucial for understanding the generalization ability of a machine learning model, which is its performance on unseen data.

P[|Ein(g) − Eout(g)| > ε] ≤ 4 mℋ(2N) e^(−(1/8)ε²N)
Vapnik Chervonenkis (VC) Bound

The VC bound provides a probabilistic upper bound on the difference between the true error (the error on the entire distribution of data) and the empirical error (the error on the training data). It helps in understanding how well a model trained on a finite sample is expected to perform on new data.
Choosing a Reduced Value of M
Effective Number of Lines
Dichotomies
Dichotomy refers to the division or separation of a dataset or problem into two
distinct or mutually exclusive classes or groups.
Growth Function
Break Point

2D PERCEPTRON
VC Dimension
VC Dimension
The VC-dimension is a single parameter that characterizes the growth function
Definition
The Vapnik-Chervonenkis dimension of a hypothesis set ℋ is the max number of
points for which the hypothesis can generate all possible classification dichotomies
VC Dimension
N = 3: the maximum number of dichotomies, 2^N = 8, can all be generated by a line.

Shattering: a set of N points is said to be shattered by a hypothesis space H if there are hypotheses h in H that separate the positive examples from the negative examples in each of the 2^N possible ways.

N = 4: the linear model is not able to provide all 2^4 = 16 dichotomies; we would need a nonlinear one.
OBSERVATIONS
If dVC is finite, then mℋ(N) ≤ N^dVC + 1 → this is a polynomial that will eventually be dominated by the decaying exponential → generalization guarantees.
For linear models (the perceptron in d dimensions), dVC = d + 1 → this can be interpreted as the number of parameters of the model.

The VC dimension of a straight-line model in the 2D plane is 3, verified numerically in the sketch below.
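
The claim that the 2D perceptron shatters 3 points but not 4 (so dVC = 3) can be checked numerically. The sketch below, assuming SciPy is available, tests each labeling for linear separability by solving a small feasibility linear program; the two point sets are arbitrary illustrative choices.

```python
import numpy as np
from itertools import product
from scipy.optimize import linprog

def separable(X, y):
    """Linearly separable iff some (w, b) satisfies y_i (w.x_i + b) >= 1: a feasibility LP."""
    n, d = X.shape
    A_ub = -(y[:, None] * np.hstack([X, np.ones((n, 1))]))   # rows: -y_i * [x_i, 1]
    res = linprog(c=np.zeros(d + 1), A_ub=A_ub, b_ub=-np.ones(n),
                  bounds=[(None, None)] * (d + 1), method="highs")
    return res.status == 0   # status 0 means a feasible separator was found

def count_dichotomies(X):
    """How many of the 2^N labelings can a linear separator realize on the points X?"""
    return sum(separable(X, np.array(lab)) for lab in product([-1.0, 1.0], repeat=len(X)))

X3 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])               # 3 points in general position
X4 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # 4 points (a square)

print(count_dichotomies(X3), "of", 2 ** 3)   # 8 of 8   -> the 3 points are shattered
print(count_dichotomies(X4), "of", 2 ** 4)   # 14 of 16 -> not shattered, so d_VC = 3
```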


M and dvc
Two Questions
Generalization Error
Analysis of the generalization bound: Eout(g) ≤ Ein(g) + Ω(N, ℋ, δ)
Eout − Ein ≤ Ω with probability ≥ 1 − δ

Eout ≤ Ein + Ω

1. Eout(g) ≥ Ein(g) − Ω(N, ℋ, δ) → not of much interest
2. Eout(g) ≤ Ein(g) + Ω(N, ℋ, δ) → bound on the out-of-sample error! ☺

Observations
• Ein(g) is known
• The penalty Ω can be computed if dVC(ℋ) is known and δ is chosen
Generalization Bound

Ω = √( (8/N) ln( 4 mℋ(2N) / δ ) )

Analysis of the generalization bound: Eout(g) ≤ Ein(g) + Ω(N, ℋ, δ)
The optimal model is a compromise between Ein and Ω.
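
Since the growth function is bounded by mℋ(N) ≤ N^dVC + 1, the penalty can be evaluated numerically once dVC and δ are chosen. A minimal sketch (the values dVC = 3 and δ = 0.05 are arbitrary example values), showing how Ω shrinks as N grows:

```python
import numpy as np

def growth_bound(N, dvc):
    """Polynomial upper bound on the growth function: m_H(N) <= N^dvc + 1."""
    return float(N) ** dvc + 1.0

def vc_penalty(N, dvc, delta):
    """Omega(N, H, delta) = sqrt((8/N) * ln(4 * m_H(2N) / delta))."""
    return np.sqrt((8.0 / N) * np.log(4.0 * growth_bound(2 * N, dvc) / delta))

for N in (100, 1_000, 10_000, 100_000):
    print(f"N = {N:6d}   Omega = {vc_penalty(N, dvc=3, delta=0.05):.3f}")
```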

Error vs. VC dimension: as model complexity (VC dimension) increases, the in-sample (training) error decreases while the out-of-sample (test) error first decreases and then rises; the bias term shrinks, the variance (model-complexity) term grows, moving from underfitting to overfitting.

Model complexity ≈ number of model parameters


VC Dimension for a Rectangle Classifier
The break point for an axis-aligned rectangle is k = 5, so dVC = k − 1 = 4.
Connection to REAL learning
In a learning scenario, the function h is not fixed a priori.
• The learning algorithm is used to explore the hypothesis space ℋ, to find the best hypothesis h ∈ ℋ that matches the sampled data → call this hypothesis g.
• With many hypotheses, there is a higher probability of finding a hypothesis g that looks good only by chance → the function can be perfect on the sampled data but bad on unseen data.

There is therefore an approximation - generalization tradeoff between:


• Perform well on the given (training) dataset
• Perform well on unseen data
Approximation vs. Generalization
The ultimate goal is to have a small Eout: a good approximation of f out of sample.

• More complex ℋ ⇒ better chance of approximating f in sample → if ℋ is too simple, we fail to approximate f and we end up with a large Ein.

• Less complex ℋ ⇒ better chance of generalizing out of sample → if ℋ is too complex, we fail to generalize well.

Ideal: ℋ = {f} → the winning lottery ticket ☺


Approximation vs. Generalization
The example shows:

• Perfect fit on the in-sample (training) data → Ein = 0

• Poor fit on the out-of-sample (test) data → Eout is huge
Approximation generalization
Tradeoff
VC analysis → the choice of H should strike a balance between approximating f on the training data and generalizing on new data. Since we do not know the target function, we tend toward a larger model that contains a good hypothesis.

H too simple → we fail to approximate f and Ein is large.

H too large → we fail to generalize well because of model complexity.

The selection of the hypothesis set H has two requirements:

1. H contains some hypothesis that can approximate f.
2. H enables the data to zoom in on the right hypothesis.

In the bias-variance decomposition, we split Eout into bias and variance.


Quantifying the tradeoff
VC analysis was one approach: 𝐸𝑜𝑢𝑡 ≤ 𝐸𝑖𝑛 + Ω

Bias-variance analysis is another: decomposing 𝐸𝑜𝑢𝑡 into:

1. How well ℋ can approximate 𝑓 → Bias


2. How well we can zoom in on a good ℎ ∈ ℋ, using the available data → Variance

It applies to real valued targets and uses squared error

The learning algorithm is not obliged to minimize squared error loss. However, we
measure its produced hypothesis’s bias and variance using squared error
Bias and Variance
• We can create a graphical visualization of bias and variance using a bulls-eye diagram.
• Imagine that the center of the target is a
model that perfectly predicts the correct
values.
• As we move away from the bulls-eye, our
predictions get worse and worse.
• Imagine we can repeat our entire model
building process to get a number of separate
hits on the target.
• Each hit represents an individual realization
of our model, given the chance variability in
the training data we gather.
• Sometimes we will get a good distribution of
training data so we predict very well and we
are close to the bulls-eye, while sometimes
our training data might be full of outliers or
non-standard values resulting in poorer
predictions.
• These different realizations result in a scatter of hits on the target.
(Graphical illustration of bias and variance.)
Bias and Variance
The out-of-sample error is (making explicit the dependence of g on the entire dataset 𝒟):

Eout(g^𝒟) = E_x[ (g^𝒟(x) − f(x))² ]   (mean squared error between the final hypothesis and the target)

The expected out-of-sample error of the learning model is independent of the particular realization of the data set used to find g^𝒟: in simple words, the final hypothesis will be different for different datasets 𝒟, so g depends on 𝒟, and we therefore take the expected value of the error with respect to 𝒟, E_𝒟[ Eout(g^𝒟) ].
Bias and Variance
Focus on E_𝒟[ Eout(g^𝒟) ]. Define the “average” hypothesis ḡ(x) = E_𝒟[ g^𝒟(x) ].

This average hypothesis can be derived by imagining many datasets 𝒟1, 𝒟2, …, 𝒟K and building it as ḡ(x) ≈ (1/K) Σ_{k=1}^{K} g^{𝒟k}(x)

→ this is a conceptual tool, and ḡ does not need to belong to the hypothesis set.
Bias and Variance

Therefore the expected out-of-sample error decomposes as
E_𝒟[ Eout(g^𝒟) ] = E_x[ bias(x) + var(x) ],
where bias(x) = (ḡ(x) − f(x))² and var(x) = E_𝒟[ (g^𝒟(x) − ḡ(x))² ].
Bias and Variance
Interpretation

• The bias term measures how much our learning model is biased away from the target function.
• In fact, ḡ has the benefit of learning from an unlimited number of datasets, so it is only limited in its ability to approximate f by the limitations of the learning model itself.
• The variance term measures the variance in the final hypothesis, depending on the data set, and can be thought of as how much the final chosen hypothesis differs from the “average” hypothesis ḡ.
Bias and Variance
Very small model. Since there is only one hypothesis, both the average hypothesis ḡ and the final hypothesis g^𝒟 will be the same, for any dataset. Thus, var = 0. The bias will depend solely on how well this single hypothesis approximates the target f, and unless we are extremely lucky, we expect a large bias.

Very large model. The target function is in ℋ. Different data sets will lead to different hypotheses that agree with f on the data set and are spread around f in the red region. Thus, bias ≈ 0 because ḡ is likely to be close to f. The var is large (heuristically represented by the size of the red region).
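
These two regimes can be made quantitative with a small Monte Carlo experiment in the spirit of the sinusoid example from Learning from Data: repeatedly draw tiny datasets, fit a line to each, and measure how far the average hypothesis ḡ is from f (bias) and how much the individual fits scatter around ḡ (variance). The target f(x) = sin(πx) and the 2-point datasets are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)      # target function (assumed for illustration)

x_grid = np.linspace(-1, 1, 201)     # grid over which errors are averaged
n_datasets, n_points = 10_000, 2     # many datasets D, each with only 2 samples

preds = np.empty((n_datasets, x_grid.size))
for k in range(n_datasets):
    x = rng.uniform(-1, 1, n_points)
    a, b = np.polyfit(x, f(x), deg=1)      # hypothesis g^D: the line fitted to this dataset
    preds[k] = a * x_grid + b

g_bar = preds.mean(axis=0)                               # the "average" hypothesis g-bar
bias = np.mean((g_bar - f(x_grid)) ** 2)                 # how far g-bar is from f
var  = np.mean(((preds - g_bar) ** 2).mean(axis=0))      # spread of g^D around g-bar

print(f"bias = {bias:.2f}   var = {var:.2f}   bias + var = {bias + var:.2f}")
```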
Learning Curves
How is it possible to know whether a model is suffering from bias or variance problems? The learning curves provide a graphical representation for assessing this, by plotting:

• the expected out-of-sample error E_𝒟[ Eout(g^𝒟) ]
• the expected in-sample error E_𝒟[ Ein(g^𝒟) ]

with respect to the number of data points N.

In practice, the curves are computed from one dataset, or by dividing it into more parts and taking the mean curve resulting from the various sub-datasets.
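
scikit-learn's learning_curve utility automates exactly this procedure (repeated splits, averaged errors at increasing training sizes). A hedged sketch on a synthetic regression problem; the estimator and dataset are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import learning_curve

X, y = make_regression(n_samples=2_000, n_features=20, noise=10.0, random_state=0)

sizes, train_scores, val_scores = learning_curve(
    Ridge(alpha=1.0), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8), cv=5,
    scoring="neg_mean_squared_error")

E_in  = -train_scores.mean(axis=1)   # average in-sample error at each N
E_out = -val_scores.mean(axis=1)     # cross-validated estimate of the out-of-sample error

for n, ein, eout in zip(sizes, E_in, E_out):
    print(f"N = {n:5d}   E_in = {ein:8.1f}   E_out = {eout:8.1f}")
```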
Learning Curves
Expected error vs. number of data points N: simple model vs. complex model.
Learning Curves
Interpretation
• Bias can be present when the expected error is quite high and 𝐸𝑖𝑛 is similar to
𝐸𝑜𝑢𝑡
• When bias is present, getting more data is not likely to help
• Variance can be present when there is a gap between 𝐸𝑖𝑛 and 𝐸𝑜𝑢𝑡
• When variance is present, getting more data is likely to help

Fixing bias:
• Try adding more features
• Try polynomial features
• Try a more complex model
• Boosting

Fixing variance:
• Try a smaller set of features
• Get more training examples
• Regularization
• Bagging
Learning Curves: VC vs. Bias-Variance Analysis
References
1. Provost, Foster, and Tom Fawcett. “Data Science for Business: What you need to know about data mining
and data-analytic thinking”. O'Reilly Media, Inc., 2013.
2. Brynjolfsson, E., Hitt, L. M., and Kim, H. H. “Strength in numbers: How does data driven decision making affect firm performance?” Tech. rep., available at SSRN: http://ssrn.com/abstract=1819486, 2011.
3. Pyle, D. “Data Preparation for Data Mining”. Morgan Kaufmann, 1999.
4. Kohavi, R., and Longbotham, R. “Online experiments: Lessons learned”. Computer, 40 (9), 103–105, 2007.
5. Abu-Mostafa, Yaser S., Malik Magdon-Ismail, and Hsuan-Tien Lin. ”Learning from data”. AMLBook, 2012.
6. Andrew Ng. ”Machine learning”. Coursera MOOC. (https://www.coursera.org/learn/machine-learning)
7. Domingos, Pedro. “The Master Algorithm”. Penguin Books, 2016.
8. Christopher M. Bishop, “Pattern recognition and machine learning”, Springer-Verlag New York, 2006.
