Agenda: The Learning Theory
• Is Learning Feasible
• Theory of Generalization
• The VC Dimension
• Bias-Variance Tradeoff
• Overfitting
Predict how a viewer will rate a movie

The essence of machine learning:
• A pattern exists.
• We cannot pin it down mathematically.
• We have data on it.
The Components of Learning
Applicant information:
age 23 years
gender male
annual salary $30,000
years in residence 1 year
years in job 1 year
current debt $15,000
··· ···
Approve credit?
The Components of Learning
Formalization:
• Input: x (customer application)
• Output: y (approve or deny?)
• Target function: f : X → Y (ideal credit approval formula)
• Data: (x1, y1), ..., (xN, yN) (historical records)
• Hypothesis: g : X → Y (formula to be used)
The Components of Learning
UNKNOWN TARGET FUNCTION
f : X → Y

        ↓

TRAINING EXAMPLES
(x1, y1), ..., (xN, yN)

        ↓

LEARNING ALGORITHM A  ←  HYPOTHESIS SET H

        ↓

FINAL HYPOTHESIS
g ≈ f
(final credit approval formula)
Solution Components
TRAINING EXAMPLES
(x1, y1), ..., (xN, yN)

HYPOTHESIS SET H  +  LEARNING ALGORITHM A  =  the learning model
A simple Hypothesis Set – the Perceptron

For a d-dimensional input x = (x1, ..., xd), approve credit if the weighted sum of the attributes exceeds the threshold w0:

h(x) = sign( Σ_{i=1}^{d} wi xi − w0 )
A simple Learning Algorithm – the PLA

At each step, pick a misclassified point (xn, yn), i.e. sign(wᵀxn) ≠ yn, and update the weights: w ← w + yn xn.
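A minimal sketch of the PLA in Python (not from the slides; the convergence cap max_iters is an illustrative choice). The threshold w0 is absorbed by prepending x0 = 1, as in the perceptron formula above:

```python
import numpy as np

def pla(X, y, max_iters=1000):
    """Perceptron Learning Algorithm sketch.
    X: (N, d) inputs; y: (N,) labels in {-1, +1}, assumed linearly separable."""
    X = np.hstack([np.ones((len(X), 1)), X])    # prepend x0 = 1 to absorb -w0
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):
        mis = np.where(np.sign(X @ w) != y)[0]  # currently misclassified points
        if mis.size == 0:
            break                               # all points classified correctly
        n = mis[0]                              # pick one misclassified point
        w += y[n] * X[n]                        # PLA update: w <- w + y_n x_n
    return w
```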
Iterations of the PLA
Related Experiment
• µ = probability of picking a red marble.
• We pick N marbles independently.
Does ν say anything about µ?

ν = fraction of red marbles in the sample; µ = probability of red marbles in the bin.

No! The sample can be mostly green while the bin is mostly red.

Yes! The sample frequency ν is likely close to the bin frequency µ.

Possible versus probable.
What does ν say about µ?

Formally, the Hoeffding Inequality:

P[ |ν − µ| > ε ] ≤ 2e^(−2ε²N)

http://cs229.stanford.edu/extra-notes/hoeffding.pdf
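A quick empirical check of the bound (a sketch; µ, N, ε and the trial count below are illustrative choices, not from the slides):

```python
import numpy as np

# Compare the observed deviation probability to the Hoeffding bound.
rng = np.random.default_rng(0)
mu, N, eps, trials = 0.6, 100, 0.1, 100_000

nu = rng.binomial(N, mu, size=trials) / N     # sample frequencies nu
empirical = np.mean(np.abs(nu - mu) > eps)    # P[|nu - mu| > eps], estimated
bound = 2 * np.exp(-2 * eps**2 * N)           # 2e^(-2 eps^2 N)

print(f"empirical: {empirical:.4f}  <=  bound: {bound:.4f}")
```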
What does ν say about µ?

• Valid for all N and ε.
• The bound does not depend on µ.
• Tradeoff: N, ε, and the bound.
• ν ≈ µ  ⇒  µ ≈ ν
Connection to learning?
UNKNOWN TARGET FUNCTION
f : X → Y

INPUT DISTRIBUTION
P on X

        ↓ x1, ..., xN

TRAINING EXAMPLES
(x1, y1), ..., (xN, yN)

        ↓

LEARNING ALGORITHM A  ←  HYPOTHESIS SET H

        ↓

FINAL HYPOTHESIS
g ≈ f
Connection to learning?
Think of each hypothesis h as a bin: every point x ∈ X is a marble, red if h(x) ≠ f(x) and green if h(x) = f(x). Then µ is the probability that h errs on a random point, and ν is the fraction of training points on which h errs.
In-sample – Out-of-sample Error

In-sample error (the ν of the bin): Ein(h) = fraction of training points where h(xn) ≠ f(xn)

Out-of-sample error (the µ of the bin): Eout(h) = P[ h(x) ≠ f(x) ]
Multiple Bins
One bin per hypothesis: h1, h2, ..., hM, each with its own µ and ν.
Multiple Bins
Each hypothesis hm has its own bin, hence its own out-of-sample error Eout(h1), Eout(h2), ..., Eout(hM) (the bin frequencies) and its own in-sample error Ein(h1), Ein(h2), ..., Ein(hM) (the sample frequencies on (x1, y1), ..., (xN, yN)). The learning algorithm A picks the final hypothesis g ≈ f from among h1, ..., hM, so which bin we end up looking at depends on the data.
Agenda: The Learning Theory
• Is Learning Feasible
• Theory of Generalization
• The VC Dimension
• Bias-Variance Tradeoff
• Overfitting
Multiple Bins

Flip a fair coin 10 times: P[10 heads] ≈ 0.1%. Flip 1,000 fair coins 10 times each: what is P[some coin comes up all heads]?

Answer: ≈ 63%
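A one-line numeric check of that answer (using the 1,000-coin setup stated above):

```python
# P[at least one of 1000 fair coins shows 10 heads in 10 flips].
p_ten_heads = 0.5 ** 10                       # one coin: (1/2)^10
p_some_coin = 1 - (1 - p_ten_heads) ** 1000   # complement of "no coin does"
print(f"{p_some_coin:.2%}")                   # ~62.4%, i.e. roughly 63%
```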
From Coins to Learning

A coin that happens to come up all heads is like a hypothesis hi that happens to get Ein(hi) ≈ 0 by sheer luck ("BINGO?"). With many hypotheses in play, some hi can look perfect on the sample while its Eout(hi) is poor.
A simple Solution – Union Bound

P[ |Ein(g) − Eout(g)| > ε ] ≤ Σ_{m=1}^{M} P[ |Ein(hm) − Eout(hm)| > ε ] ≤ 2M e^(−2ε²N)
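A small simulation of why M matters (a sketch, not from the slides: each "hypothesis" is modeled as an independent bin with the same Eout = µ, and the algorithm greedily picks the smallest Ein; µ, N, ε and the trial count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
mu, N, eps, trials = 0.5, 50, 0.2, 20_000

for M in [1, 10, 1000]:
    ein = rng.binomial(N, mu, size=(trials, M)) / N   # Ein for each of M bins
    best = ein.min(axis=1)                            # pick the best-looking bin
    p_bad = np.mean(np.abs(best - mu) > eps)          # its deviation from Eout
    bound = min(1.0, 2 * M * np.exp(-2 * eps**2 * N)) # union bound
    print(f"M={M:>4}: P[bad] = {p_bad:.3f}  (bound: {bound:.3f})")
# The deviation probability grows with M: the price of choosing after looking.
```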
Final Verdict

P[ |Ein(g) − Eout(g)| > ε ] ≤ 2M e^(−2ε²N)
The Learning Diagram – where we left it
Data Distribution
Agenda: The Learning Theory
• Is Learning Feasible
• Theory of Generalization
• The VC Dimension
• Bias-Variance Tradeoff
• Overfitting
The Learning Diagram – where we left it

We add a pointwise error measure e(h(x), f(x)). Examples:

Squared error: e(h(x), f(x)) = (h(x) − f(x))²
Binary error: e(h(x), f(x)) = 1 if h(x) ≠ f(x), else 0
Overall Error

In-sample error: Ein(h) = (1/N) Σ_{n=1}^{N} e(h(xn), f(xn))

Out-of-sample error: Eout(h) = E_x[ e(h(x), f(x)) ]
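A minimal numeric illustration of the two quantities under squared error (a sketch; the target f(x) = x², hypothesis h(x) = x, and uniform inputs are illustrative choices, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(3)
f = lambda x: x ** 2                            # illustrative target
h = lambda x: x                                 # illustrative hypothesis

x_train = rng.uniform(0, 1, size=20)            # the N training points
e_in = np.mean((h(x_train) - f(x_train)) ** 2)  # Ein: average over the sample

x_test = rng.uniform(0, 1, size=1_000_000)      # Monte-Carlo stand-in for E_x[.]
e_out = np.mean((h(x_test) - f(x_test)) ** 2)   # Eout estimate; exact value 1/30

print(f"Ein = {e_in:.3f}, Eout = {e_out:.3f}")
```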
Diagram with pointwise error
How to Choose the Error Measure
Fingerprint Verification: h tries to decide whether it is you (+1) or an intruder (−1).

                f = +1         f = −1
h = +1          no error       false accept
h = −1          false reject   no error
The Error Measure – In the Supermarket

(+1: you, −1: intruder)

False accept is minor: you gave away a discount, and the intruder left their fingerprint behind. False reject is costly: you annoy a legitimate customer, so the matrix weighs it 10×.

                f = +1    f = −1
h = +1            0          1
h = −1           10          0
The Error Measure – CIA

Here a false accept is catastrophic, so it is weighted 1000×:

                f = +1    f = −1
h = +1            0        1000
h = −1            1          0
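A sketch of a cost-weighted error measure built from such matrices (the helper weighted_error and the toy data are hypothetical, not from the slides; rows index h ∈ {+1, −1}, columns index f ∈ {+1, −1}):

```python
import numpy as np

def weighted_error(h, f, cost):
    """Average cost of predictions h against truths f under a 2x2 cost matrix."""
    idx = lambda v: 0 if v == +1 else 1
    return np.mean([cost[idx(hv)][idx(fv)] for hv, fv in zip(h, f)])

supermarket = [[0, 1], [10, 0]]    # false reject 10x worse than false accept
cia = [[0, 1000], [1, 0]]          # false accept 1000x worse

h = [+1, -1, +1, -1]               # hypothetical predictions
f = [+1, +1, -1, -1]               # hypothetical true identities
print(weighted_error(h, f, supermarket))  # (0 + 10 + 1 + 0) / 4 = 2.75
print(weighted_error(h, f, cia))          # (0 + 1 + 1000 + 0) / 4 = 250.25
```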
The Error Measure
Take Home Lesson

The error measure is application-specific: ideally, it is specified by the user. Compare the supermarket and CIA matrices.
Error Measure
Noisy Targets

The target is not really a function: two applicants with identical attributes can behave differently.

age 23 years
annual salary $30,000
years in residence 1 year
years in job 1 year
current debt $15,000
···
Target Distribution

Instead of y = f(x), use a target distribution P(y|x): a noisy target is a deterministic part f(x) = E[y|x] plus noise y − f(x).
The Learning Diagram + Noisy Targets
Agenda: The Learning Theory
• Is Learning Feasible
• Theory of Generalization
• The VC Dimension
• Bias-Variance Tradeoff
• Overfitting
Distinction between P(y|x) and P(x)

The target distribution P(y|x) is what we are trying to learn; the input distribution P(x) only quantifies the relative importance of each point x.
What we know so far

Learning is feasible in a probabilistic sense: Eout(g) ≈ Ein(g).

Is this learning? What we really want is g ≈ f, i.e. Eout(g) ≈ 0.
2 Questions of Learning

Eout(g) ≈ 0 is achieved through:

Eout(g) ≈ Ein(g)   (can we generalize?)   and   Ein(g) ≈ 0   (can we fit the data?)
Training Setting
Where did M come from?

The "bad" events Bm: |Ein(hm) − Eout(hm)| > ε. The union bound P[B1 or B2 or ... or BM] ≤ Σ_m P[Bm] holds no matter how the events overlap.
Can we improve M?

Yes: the bad events of similar hypotheses overlap heavily. When two hypotheses are close, their Eout and Ein move together, so the union bound badly overcounts.
Can we replace M with mH(N)?
Dichotomies: mini-hypotheses

A hypothesis: h : X → {−1, +1}
A dichotomy: h restricted to the sample, h : {x1, ..., xN} → {−1, +1}

The number of hypotheses |H| can be infinite; the number of dichotomies is at most 2^N.
The growth function

mH(N) = the maximum number of dichotomies H can generate on any N points x1, ..., xN. Always mH(N) ≤ 2^N.
Applying the mH(N) definition – perceptrons

For 2D perceptrons: mH(3) = 8 (three points in general position are shattered) and mH(4) = 14 (the two XOR-style dichotomies of four points are unachievable, so 16 − 2 = 14).
Positive Rays

H is the set of h : R → {−1, +1} with h(x) = −1 for x < a and h(x) = +1 for x > a, over sample points x1, x2, x3, ..., xN.

The N points leave N + 1 slots for the threshold a, each giving a different dichotomy, so mH(N) = N + 1.
Positive Intervals

H is the set of h : R → {−1, +1} with h(x) = +1 inside some interval and −1 elsewhere. Choosing the two interval ends among the N + 1 slots, plus the all-−1 dichotomy, gives mH(N) = ½N² + ½N + 1.
Convex sets

H is the set of h : R² → {−1, +1} where the region h(x) = +1 is convex.

Place the N points on a circle: any subset of them can be enclosed by a convex region, so every dichotomy is realizable.

mH(N) = 2^N
The N points are 'shattered' by convex sets.
The 3 growth functions

• H is positive rays:      mH(N) = N + 1
• H is positive intervals: mH(N) = ½N² + ½N + 1
• H is convex sets:        mH(N) = 2^N
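A brute-force check of the first two formulas (a sketch, not from the slides; only the ordering of the points matters, so they are placed at x = 0, 1, ..., N − 1):

```python
from itertools import combinations

def rays(N):
    """All dichotomies of N ordered points under positive rays."""
    return {tuple(+1 if n >= a else -1 for n in range(N)) for a in range(N + 1)}

def intervals(N):
    """All dichotomies of N ordered points under positive intervals."""
    ds = {tuple([-1] * N)}                        # empty interval
    for lo, hi in combinations(range(N + 1), 2):  # interval end slots
        ds.add(tuple(+1 if lo <= n < hi else -1 for n in range(N)))
    return ds

for N in range(1, 7):
    assert len(rays(N)) == N + 1
    assert len(intervals(N)) == N * (N + 1) // 2 + 1
print("growth-function formulas verified for N = 1..6")
```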
Back to the big picture

If mH(N) can replace M in the bound, and mH(N) is polynomial in N, then the bound still goes to zero as N grows, and learning generalizes.
Breakpoint of H

Definition: if no dataset of size k can be shattered by H, then k is a break point for H:

mH(k) < 2^k

For 2D perceptrons, k = 4.
Breakpoints, 3 Examples

• H is positive rays:      mH(N) = N + 1            break point k = 2
• H is positive intervals: mH(N) = ½N² + ½N + 1     break point k = 3
• H is convex sets:        mH(N) = 2^N              break point k = '∞'
Main Result

No break point ⇒ mH(N) = 2^N.
Any break point k ⇒ mH(N) is bounded by a polynomial in N of degree k − 1.
Putting it all together

Can we just substitute mH(N) for M? Not quite: the proof requires mH(2N) and modified constants. But more:

The Vapnik-Chervonenkis Inequality:

P[ |Ein(g) − Eout(g)| > ε ] ≤ 4 mH(2N) e^(−ε²N/8)

https://web.eecs.umich.edu/~cscott/past_courses/eecs598w14/notes/05_vc_theory.pdf
Agenda: The Learning Theory
• Is Learning Feasible
• Theory of Generalization
• The VC Dimension
• Bias-Variance Tradeoff
• Overfitting
Definition of VC dimension

The VC dimension of H, written dvc(H), is the largest N for which mH(N) = 2^N, i.e. the most points H can shatter. Any k > dvc is a break point.
Examples

• H is positive rays:    dvc = 1
• H is 2D perceptrons:   dvc = 3
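A brute-force confirmation that 2D perceptrons shatter 3 points but not 4 (a sketch, not from the slides: separability is tested by running the PLA for a bounded number of steps, so a non-convergence verdict is heuristic):

```python
import numpy as np
from itertools import product

def separable(X, y, steps=10_000):
    """Heuristic linear-separability test via a step-capped PLA run."""
    Xb = np.hstack([np.ones((len(X), 1)), X])   # absorb the threshold
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        mis = np.where(np.sign(Xb @ w) != y)[0]
        if mis.size == 0:
            return True                          # found a separating line
        w += y[mis[0]] * Xb[mis[0]]
    return False                                 # likely not separable

def shattered(X):
    """Can a 2D perceptron realize every labeling of the points X?"""
    return all(separable(X, np.array(y)) for y in product([-1, 1], repeat=len(X)))

three = np.array([[0, 0], [1, 0], [0, 1]])       # general position
four = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
print(shattered(three))  # True:  dvc >= 3
print(shattered(four))   # False: the XOR labeling is not linearly separable
```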
VC Dimension and Learning

If dvc(H) is finite, the final hypothesis g ∈ H will generalize, independent of the learning algorithm, the input distribution, and the target function.
Degrees of Freedom

dvc measures the effective degrees of freedom of the model: roughly, the number of parameters it can actually exploit.
Not just parameters

Parameters do not always contribute degrees of freedom: a redundant parameterization adds parameters but no expressive power. dvc counts the effective parameters.
Number of Datapoints needed

The probability bound behaves like N^dvc e^(−N), so the N needed for a given level of generalization grows roughly in proportion to dvc.
Visual Representation

(plot of N^dvc e^(−N) against N, on a log scale from 10^0 down to 10^(−5), for N up to 200)

Rule of thumb: N ≥ 10 dvc
Agenda: The Learning Theory
• Is Learning Feasible
• Theory of Generalization
• The VC Dimension
• Bias-Variance Tradeoff
• Overfitting
Rearranging Things

Set δ = 4 mH(2N) e^(−ε²N/8) and solve for ε:

ε = sqrt( (8/N) ln( 4 mH(2N) / δ ) ) = Ω(N, H, δ)
Generalization Bound

With probability ≥ 1 − δ:

Eout(g) ≤ Ein(g) + Ω(N, H, δ)
Generalization Bound

With probability ≥ 1 − δ:

|Eout(g) − Ein(g)| ≤ Ω(N, H, δ)

The Ω penalty for model complexity is the subject of regularization.
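A quick evaluation of the Ω penalty (a sketch, not from the slides: it uses the common polynomial simplification mH(N) ≤ N^dvc + 1, and the dvc and δ values are illustrative):

```python
import numpy as np

def omega(N, dvc, delta):
    """VC penalty sqrt((8/N) ln(4 m_H(2N) / delta)) with m_H(N) <= N^dvc + 1."""
    m_h = (2 * N) ** dvc + 1
    return np.sqrt(8 / N * np.log(4 * m_h / delta))

for N in [100, 1_000, 10_000, 100_000]:
    print(f"N = {N:>6}: Eout <= Ein + {omega(N, dvc=3, delta=0.05):.3f}")
# The penalty shrinks slowly: the VC bound is loose, but it does go to zero.
```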
Approximation and Generalization Tradeoff

Small Eout needs H complex enough to approximate f, yet simple enough to generalize from the data.

Ideal H = {f} (a winning lottery ticket: f is unknown).
Quantifying the Tradeoff

For squared error, the bias-variance decomposition splits the expected Eout into how well H can approximate f and how much the learned hypothesis varies with the dataset.
Starting with Eout

To evaluate: E_D[ Eout(g^(D)) ] = E_D[ E_x[ (g^(D)(x) − f(x))² ] ]

Define the average hypothesis ḡ(x) = E_D[ g^(D)(x) ].
Bias and Variance

E_D[ (g^(D)(x) − f(x))² ] = E_D[ (g^(D)(x) − ḡ(x))² ] + (ḡ(x) − f(x))²
                          =          var(x)           +      bias(x)

so E_D[ Eout(g^(D)) ] = E_x[ bias(x) + var(x) ] = bias + var.
The Tradeoff

As the complexity of H goes up: bias ↓, var ↑.
Example: Sine Target

f : [−1, 1] → R,  f(x) = sin(πx), learned from only N = 2 examples.

Two models:
H0: h(x) = b
H1: h(x) = ax + b
Approximation

H0 versus H1

(plots: the best constant fit and the best line fit to sin(πx) over [−1, 1])
Learning

H0 versus H1

(plots: hypotheses learned from two random examples, a constant for H0 and the line through both points for H1)
Bias and Variance – H0

(plots: the many learned constants, and the average hypothesis ḡ(x) against sin(πx))
Bias and Variance – H1

(plots: the many learned lines, and the average hypothesis ḡ(x) against sin(πx))
And the winner is...

H0:  bias = 0.50, var = 0.25  →  Eout ≈ 0.75
H1:  bias = 0.21, var = 1.69  →  Eout ≈ 1.90

The simpler model H0 wins.
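A Monte-Carlo reproduction of these numbers (a sketch, not from the slides; the run count and evaluation grid are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(np.pi * x)
runs, grid = 10_000, np.linspace(-1, 1, 201)

g0 = np.empty((runs, grid.size))                  # learned constants (H0)
g1 = np.empty((runs, grid.size))                  # learned lines (H1)
for r in range(runs):
    x = rng.uniform(-1, 1, size=2)                # N = 2 training points
    y = f(x)
    g0[r] = y.mean()                              # best constant fit
    a, b = np.polyfit(x, y, 1)                    # line through the 2 points
    g1[r] = a * grid + b

for name, g in [("H0", g0), ("H1", g1)]:
    g_bar = g.mean(axis=0)                        # average hypothesis g-bar(x)
    bias = np.mean((g_bar - f(grid)) ** 2)        # E_x[(g-bar(x) - f(x))^2]
    var = np.mean(g.var(axis=0))                  # E_x[E_D[(g - g-bar)^2]]
    print(f"{name}: bias = {bias:.2f}, var = {var:.2f}")
# Expected to land near the slide's numbers: H0 (0.50, 0.25), H1 (0.21, 1.69).
```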
Lesson Learned

Match the model complexity to the data resources, not to the target complexity.
Agenda: The Learning Theory
• Is Learning Feasible
• Theory of Generalization
• The VC Dimension
• Bias-Variance Tradeoff
• Overfitting
Expected Eout and Ein

How do the expected in-sample and out-of-sample errors behave as functions of the number of data points N? These are the learning curves.
The curves

(two learning-curve plots: expected error versus number of data points N, with Eout above Ein in each)

Simple model: Eout and Ein converge quickly, but to a higher error level.
Complex model: a lower final error level, but more data before Eout comes down.
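A simulation sketch of such learning curves (not from the slides; the noisy sine target, noise level, and run counts are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(4)
f = lambda x: np.sin(np.pi * x)
sigma, runs, grid = 0.3, 2_000, np.linspace(-1, 1, 201)

for N in [2, 5, 10, 20, 50, 100]:
    e_in = e_out = 0.0
    for _ in range(runs):
        x = rng.uniform(-1, 1, N)
        y = f(x) + sigma * rng.standard_normal(N)       # noisy examples
        a, b = np.polyfit(x, y, 1)                      # least-squares line
        e_in += np.mean((a * x + b - y) ** 2) / runs    # error on the sample
        e_out += np.mean((a * grid + b - f(grid)) ** 2) / runs
    print(f"N = {N:>3}: Ein = {e_in:.3f}, Eout = {e_out:.3f}")
# Ein rises toward an asymptote as N grows, while Eout falls toward it from above.
```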
What the Theory will achieve

(plot: error versus model complexity dvc; Ein decreases, the complexity penalty Ω increases, and their sum Eout is minimized at an intermediate dvc*)

Characterizing the tradeoff:

Eout ≤ Ein + Ω(N, H, δ)
Sources

https://work.caltech.edu/telecourse.html
http://gruber.userweb.mwn.de/17.18.statlearn/17.18.statlearn.html