Unit 1-1
MACHINE LEARNING
Fundamentals of Machine Learning
S.Arun Kumar
Department of Electronics and
Communication Engineering
Overview of this Lecture
• Components of Learning
• Fundamentals of Machine Learning
• Feasibility of Learning
• VC Bound
• Bias-Variance Tradeoff
• Learning Curves
Components of Learning
Training Examples
[Figure: a bin of marbles — red marbles and green marbles]
Connection to Learning
How to connect this to learning?
Each marble (uncolored) is a data point x ∈ X
red marble: h(x) ≠ f(x) (h is wrong)
green marble: h(x) = f(x) (h is correct)
• A hypothesis is an assumption, an idea that is proposed for the sake of argument so that
it can be tested to see if it might be true.
• A research hypothesis is a statement of expectation or prediction that will be tested by research.
• For any fixed h, we can probably infer the unknown Eout(h) from the known Ein(h).
For a particular hypothesis h:
Eout(h) = P[h(x) ≠ f(x)]  ⇔  µ   (out-of-sample, unknown)
Ein(h) = fraction of sample points where h(xₙ) ≠ f(xₙ)  ⇔  ν   (in-sample, known)
Verifying a Hypothesis
For any h, when the sample size N is large (Hoeffding Inequality):
P[|Ein(h) − Eout(h)| > ε] ≤ 2e^(−2ε²N)
Given a hypothesis h ⇒ sample N data points ⇒ use Ein(h) to “verify” the quality of h.
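A minimal simulation sketch (the values of µ, N and ε below are assumed for illustration): treat a fixed hypothesis h as a bin with red-marble probability µ = Eout(h), draw many samples of size N, and compare the observed frequency of |Ein(h) − Eout(h)| > ε with the Hoeffding bound.

# Sketch (assumed values for mu, N, eps): empirically check the Hoeffding bound
# for one fixed hypothesis h with out-of-sample error mu.
import numpy as np

rng = np.random.default_rng(0)
mu, N, eps, trials = 0.3, 1000, 0.05, 10000          # assumed example values

# Each trial draws N "marbles"; E_in(h) is the fraction of red (wrong) ones.
E_in = rng.binomial(N, mu, size=trials) / N

observed = np.mean(np.abs(E_in - mu) > eps)          # empirical tail probability
bound = 2 * np.exp(-2 * eps**2 * N)                  # Hoeffding bound 2e^(-2*eps^2*N)
print(f"observed: {observed:.4f}   Hoeffding bound: {bound:.4f}")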
Can we apply this to multiple hypotheses?
Apply to multiple bins (hypotheses)
Two questions:
(1) Can we make sure Eout (g) ≈ Ein (g)?
(2) Can we make sure Ein (g) ≈ 0?
Feasibility of Learning – Tradeoff on M
P[|Ein(g) − Eout(g)| > ε] ≤ 2M·e^(−2ε²N)
Two questions:
(1) Can we make sure Eout(g) ≈ Ein(g)?
(2) Can we make Ein(g) small enough?
M: the number of hypotheses (model complexity)
Small M: (1) holds, but (2) may not hold (too few choices → under-fitting)
Large M: (1) doesn’t hold, but (2) may hold (over-fitting)
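A small numeric sketch (N and ε are assumed example values) of how the union bound 2M·e^(−2ε²N) loosens as M grows, which is the tradeoff described above:

# Sketch (assumed N and eps): the union bound grows linearly in M and quickly
# becomes vacuous (greater than 1) for large hypothesis sets.
import numpy as np

N, eps = 1000, 0.05
for M in [1, 10, 100, 1000, 10000]:
    bound = 2 * M * np.exp(-2 * eps**2 * N)
    print(f"M = {M:>6d}   bound = {bound:.4f}")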
Training and Testing
• Training involves feeding data to a machine learning algorithm and allowing it to learn the relationships within that data.
• The dataset used in training is known as the training
dataset. It contains input-output pairs where the
input data is used by the model to learn and the
output data serves as the target or label.
• Testing evaluates the trained model's performance
on unseen data to gauge its generalization capability.
• The dataset used for testing is known as the testing dataset or test set. It contains data that was not used during the training phase.
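A minimal sketch of the training/testing workflow (it assumes scikit-learn is available and uses a synthetic dataset): fit a model on the training split and estimate generalization on the held-out test split.

# Sketch (synthetic data, assumes scikit-learn): train on one split, test on another.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))                    # input features
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # target labels

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_tr, y_tr)     # training phase
print("training accuracy:", model.score(X_tr, y_tr))
print("test accuracy:    ", model.score(X_te, y_te))   # performance on unseen data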
Overfitting and Underfitting
•Overfitting: When a model performs well on training data but
poorly on test data, it means the model has learned the noise
in the training data instead of the underlying pattern.
•Underfitting: When a model performs poorly on both training
and test data, it means the model is too simple to capture the
underlying pattern in the data.
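A toy sketch (the sinusoidal target and noise level are assumed for illustration): fitting polynomials of increasing degree to a small noisy sample shows underfitting (low degree, both errors high) and overfitting (high degree, low training error but high test error).

# Sketch (assumed toy target): compare training and test error for
# polynomials of different degree.
import numpy as np

rng = np.random.default_rng(0)
target = lambda x: np.sin(2 * np.pi * x)

x_tr = rng.uniform(0, 1, 20);  y_tr = target(x_tr) + rng.normal(0, 0.2, 20)
x_te = rng.uniform(0, 1, 200); y_te = target(x_te) + rng.normal(0, 0.2, 200)

for degree in (1, 3, 15):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    e_in  = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)   # training error
    e_out = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)   # test error
    print(f"degree {degree:>2d}: E_in = {e_in:.3f}   E_out = {e_out:.3f}")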
Vapnik–Chervonenkis (VC) Bound
The VC bound is crucial for understanding the generalization ability of a machine learning model, i.e. its performance on unseen data.
P[|Ein(g) − Eout(g)| > ε] ≤ 4·mH(2N)·e^(−ε²N/8)
where mH is the growth function of the hypothesis set ℋ.
Example: 2D Perceptron
VC Dimension
The VC dimension is a single parameter that characterizes the growth function.
Definition
The Vapnik–Chervonenkis dimension of a hypothesis set ℋ, denoted dVC(ℋ), is the maximum number of points for which ℋ can generate all possible classification dichotomies (i.e., the largest number of points that ℋ can shatter).
[Figure: a 2D perceptron on N = 3 points — the maximum number of dichotomies, 2^N = 8, can all be realized]
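A brute-force sketch (the three points and the random search over perceptrons are assumed for illustration): check that a 2D perceptron h(x) = sign(w·x + b) can realize all 2³ = 8 dichotomies on three non-collinear points, so its VC dimension is at least 3.

# Sketch: random search over 2D perceptrons to count realizable dichotomies
# on three non-collinear points (all 8 should be found).
import itertools
import numpy as np

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
rng = np.random.default_rng(0)

def realizable(labels, trials=20000):
    for _ in range(trials):
        w, b = rng.normal(size=2), rng.normal()
        if np.all(np.sign(points @ w + b) == labels):
            return True
    return False

count = sum(realizable(np.array(lab)) for lab in itertools.product((-1, 1), repeat=3))
print("dichotomies realized:", count, "of", 2 ** 3)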
Eout ≤ Ein + Ω
Observations
• Ein(g) is known
• The penalty Ω can be computed if dVC(ℋ) is known and δ is chosen
Generalization Bound
Ω = √( (8/N) · ln( 4·mH(2N) / δ ) )
Analysis of the generalization bound: Eout(g) ≤ Ein(g) + Ω(N, ℋ, δ)
The optimal model is a compromise between Ein and Ω.
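A small numeric sketch (it assumes the polynomial bound mH(2N) ≤ (2N)^dVC + 1 on the growth function, and example values for dVC and δ): evaluate the penalty Ω(N, ℋ, δ) and watch it shrink as the sample size N grows.

# Sketch (assumed polynomial bound on the growth function): evaluate
# Omega = sqrt((8/N) * ln(4 * m_H(2N) / delta)).
import numpy as np

def omega(N, d_vc, delta):
    m_H = (2 * N) ** d_vc + 1            # bound on the growth function m_H(2N)
    return np.sqrt((8.0 / N) * np.log(4.0 * m_H / delta))

d_vc, delta = 3, 0.05                    # assumed example values
for N in (100, 1000, 10000, 100000):
    print(f"N = {N:>6d}   Omega = {omega(N, d_vc, delta):.3f}")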
[Figure: error vs. model complexity (VC dimension) — the in-sample (training) error decreases as complexity grows, while the out-of-sample (test) error first falls and then rises; low complexity corresponds to high bias (underfitting), high complexity to high variance (overfitting)]
The learning algorithm is not obliged to minimize the squared error loss. However, we measure the bias and variance of the hypothesis it produces using the squared error.
Bias and Variance
• We can create a graphical visualization of bias and variance using a bulls-eye diagram.
• Imagine that the center of the target is a
model that perfectly predicts the correct
values.
• As we move away from the bulls-eye, our
predictions get worse and worse.
• Imagine we can repeat our entire model
building process to get a number of separate
hits on the target.
• Each hit represents an individual realization
of our model, given the chance variability in
the training data we gather.
• Sometimes we will get a good distribution of
training data so we predict very well and we
are close to the bulls-eye, while sometimes
our training data might be full of outliers or
non-standard values resulting in poorer
predictions.
• These different realizations result in a scatter of hits on the target.
[Figure: graphical illustration of bias and variance — the bulls-eye diagram]
Bias and Variance
The out-of-sample error is (making explicit the dependence of g on the entire dataset 𝒟):
Eout(g^(𝒟)) = Ex[ (g^(𝒟)(x) − f(x))² ]   (mean squared error between the final hypothesis and the target)
The final hypothesis will be different for different datasets 𝒟, so g^(𝒟) depends on 𝒟. To make the error independent of the particular realization of the dataset used to find g^(𝒟), take its expected value with respect to 𝒟:
E𝒟[ Eout(g^(𝒟)) ] = E𝒟[ Ex[ (g^(𝒟)(x) − f(x))² ] ] = Ex[ E𝒟[ (g^(𝒟)(x) − f(x))² ] ]
Bias and Variance
Focus on E𝒟[ (g^(𝒟)(x) − f(x))² ] and define the “average” hypothesis ḡ(x) = E𝒟[ g^(𝒟)(x) ].
This average hypothesis can be derived by imagining many datasets 𝒟₁, 𝒟₂, …, 𝒟ₖ and building it by
ḡ(x) ≈ (1/K) Σₖ g^(𝒟ₖ)(x)
Therefore
E𝒟[ Eout(g^(𝒟)) ] = Ex[ bias(x) + var(x) ] = bias + var,
where bias(x) = (ḡ(x) − f(x))² and var(x) = E𝒟[ (g^(𝒟)(x) − ḡ(x))² ].
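A toy sketch (the target function, noise level, dataset size and linear model are all assumed for illustration): draw many datasets 𝒟ₖ, fit g^(𝒟ₖ) on each, build the average hypothesis ḡ, and estimate bias and variance as defined above.

# Sketch (assumed toy setup): estimate bias and variance over K datasets.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(np.pi * x)                    # target function
x_grid = np.linspace(-1, 1, 200)                   # points used to average over x
K, N, degree = 500, 10, 1                          # datasets, samples each, model degree

preds = np.empty((K, x_grid.size))
for k in range(K):
    x = rng.uniform(-1, 1, N)
    y = f(x) + rng.normal(0, 0.1, N)               # noisy dataset D_k
    preds[k] = np.polyval(np.polyfit(x, y, degree), x_grid)   # g^(D_k) on the grid

g_bar = preds.mean(axis=0)                         # "average" hypothesis
bias = np.mean((g_bar - f(x_grid)) ** 2)           # E_x[(g_bar(x) - f(x))^2]
var  = np.mean((preds - g_bar) ** 2)               # E_x E_D[(g^(D)(x) - g_bar(x))^2]
print(f"bias = {bias:.3f}   variance = {var:.3f}   bias + variance = {bias + var:.3f}")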
Bias and Variance – Interpretation
• The bias measures how far the average hypothesis ḡ is from the target f: it reflects the limitation of the learning model itself.
• The variance measures how much the final hypothesis g^(𝒟) varies with the particular dataset 𝒟.
In practice, the curves are computed from one dataset, or by dividing it into several parts and taking the mean curve over the resulting sub-datasets.
Learning Curves
The learning curves plot the expected in-sample error and the expected out-of-sample error as functions of the number of training examples N.
[Figure: learning curves — expected error vs. number of training examples]
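A toy sketch (the linear target, noise level and sample sizes are assumed for illustration): for each training-set size N, average Ein and Eout over many datasets to trace the learning curves numerically.

# Sketch (assumed toy setup): expected E_in and E_out as functions of N.
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 2.0 * x - 1.0                        # target function

def make_data(n):
    x = rng.uniform(-1, 1, n)
    return x, f(x) + rng.normal(0, 0.3, n)         # noisy labels

x_te, y_te = make_data(5000)                       # large out-of-sample set
for N in (5, 10, 20, 50, 100, 500):
    e_in, e_out = [], []
    for _ in range(200):                           # average over repeated datasets
        x, y = make_data(N)
        coeffs = np.polyfit(x, y, 1)               # fit a straight line
        e_in.append(np.mean((np.polyval(coeffs, x) - y) ** 2))
        e_out.append(np.mean((np.polyval(coeffs, x_te) - y_te) ** 2))
    print(f"N = {N:>4d}   E[E_in] = {np.mean(e_in):.3f}   E[E_out] = {np.mean(e_out):.3f}")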