Introduction to Machine Learning
Lior Rokach
Department of Information Systems Engineering
Ben-Gurion University of the Negev
Prof. Lior Rokach
Department of Information Systems Engineering
Faculty of Engineering Sciences
Head of the Machine Learning Lab
Ben-Gurion University of the Negev
Email: liorrk@bgu.ac.il
http://www.ise.bgu.ac.il/faculty/
• Flood of available data (especially with the advent of the Internet)
• Increasing computational power
• Growing progress in available algorithms and theory developed by researchers
• Increasing support from industries
Applications
[Word cloud of application areas: natural language processing, bioinformatics, e-commerce, gene expression, medicine, collaborative filtering, object recognition, ...]
The Concept of Learning in a ML System
• Learn = improve with experience at some task:
— Improve over task, T
— With respect to performance measure, P
— Based on experience, E
Motivating Example
Learning to Filter Spam
Example: Spam Filtering
Spam: all email the user does not want to receive and has not asked to receive.
T: Identify Spam Emails
The Learning Process in Our Example
[Diagram: emails from the "real world" (an email server) pass through sensors that extract features:
— Number of recipients
— Size of message
— Number of attachments
— Number of "Re:"s in the subject line]
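As a rough illustration, here is a Python sketch of extracting these four features from a raw email; the parsing details and the function name are assumptions, not from the slides:

```python
from email import message_from_string

def extract_features(raw_email: str) -> dict:
    # Hypothetical helper: parse a raw RFC 822 email and compute the
    # four features named on the slide.
    msg = message_from_string(raw_email)
    recipients = msg.get("To", "")
    num_recipients = len([r for r in recipients.split(",") if r.strip()])
    size = len(raw_email)  # size of the message in characters
    num_attachments = sum(
        1 for part in msg.walk()
        if part.get_content_disposition() == "attachment"
    )
    num_res = msg.get("Subject", "").lower().count("re:")
    return {
        "num_recipients": num_recipients,
        "size": size,
        "num_attachments": num_attachments,
        "num_res_in_subject": num_res,
    }
```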
Data Set
[Table: each record consists of input attributes and one target attribute.]

Training
[Diagram: a Training Set drawn from the Database is fed to the Inducer (Learner) running an Induction Algorithm, which outputs a Classification Model (Classifier).]

Testing
[Diagram: the induced Classifier is then applied to instances that were not used for training.]
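A minimal sketch of these two steps, assuming scikit-learn and invented toy data (the slides name no library):

```python
from sklearn.tree import DecisionTreeClassifier

# Training set: input attributes X and target attribute y (toy values).
X_train = [[1, 120], [5, 300], [2, 80], [8, 40]]  # [new recipients, email length]
y_train = ["ham", "spam", "ham", "spam"]

inducer = DecisionTreeClassifier()           # the induction algorithm
classifier = inducer.fit(X_train, y_train)   # induces a classification model

# Testing: apply the induced classifier to an unseen instance.
print(classifier.predict([[3, 250]]))
```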
Learning Algorithms
[Word cloud of terms: Apriori, regularization, committee/ensemble methods, backpropagation, activation function, ...]
[Scatter plot: emails labeled Spam and Not Spam (Ham), plotted by number of new recipients.]
How would you classify this data?
[Scatter plot of the labeled emails, repeated over several slides with different candidate decision boundaries.]

When a new email is sent:
1. We first place the new email in the feature space.
2. We classify it according to the subspace in which it resides.

Any of these boundaries would be fine...
Margin
The maximum margin linear classifier is the linear classifier with the maximum margin. This is the simplest kind of SVM (called an LSVM).
Linear SVM
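A linear SVM sketch on the two features from the plots, assuming scikit-learn; the training points are invented:

```python
from sklearn.svm import SVC

X = [[1, 50], [2, 60], [1, 40], [6, 200], [7, 220], [8, 250]]
y = [0, 0, 0, 1, 1, 1]  # 0 = ham, 1 = spam

svm = SVC(kernel="linear")  # maximum-margin linear classifier (LSVM)
svm.fit(X, y)
print(svm.coef_, svm.intercept_)  # parameters of the separating line
print(svm.predict([[3, 100]]))
```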
No Linear Classifier Can Cover All Instances
• Ideally, the best decision boundary should be the one which provides optimal performance, such as in the following figure.
[Plot: a non-linear boundary that covers all instances.]
• However, our satisfaction is premature, because the central aim of designing a classifier is to correctly classify novel input.
• Issue of generalization!
[Two plots side by side: a simple model with 2 training errors vs. a complicated model with 0 training errors.]
Evaluating What's Been Learned
1. We randomly select a portion of the data to be used for training (the training set).
2. Train the model on the training set.
3. Once the model is trained, we run the model on the remaining instances (the test set) to see how it performs.
Confusion Matrix

               Classified As
               Blue   Red
Actual Blue      7     1
Actual Red       0     5
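A sketch of this procedure, assuming scikit-learn and synthetic data; confusion_matrix prints a table like the one above:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=2, n_informative=2,
                           n_redundant=0, random_state=0)
# 1. Randomly split off a training set (70%) and a test set (30%).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=0)
# 2. Train on the training set.
model = DecisionTreeClassifier().fit(X_train, y_train)
# 3. Evaluate on the held-out test set.
print(confusion_matrix(y_test, model.predict(X_test)))
```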
Lazy Learners
• Generalization beyond the training data is delayed until a new instance is provided to the system.
[Plot: the stored training set.]
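A lazy-learner sketch using 1-nearest-neighbor, assuming scikit-learn; fit() merely stores the data, and all the work happens at prediction time:

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1, 50], [7, 220], [2, 60], [8, 250]]
y_train = ["ham", "spam", "ham", "spam"]

knn = KNeighborsClassifier(n_neighbors=1)  # 1-NN: just keeps the training set
knn.fit(X_train, y_train)                  # no model is induced here
print(knn.predict([[6, 230]]))             # generalization happens only now
```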
Decision Tree
• A flow-chart-like tree structure
• Internal node denotes a test on an attribute
• Branch represents an outcome of the test
• Leaf nodes represent class labels or class distribution
A decision tree divides the feature space into axis-parallel rectangles, labeling each rectangle with one of the classes.
Top-Down Induction of Decision Trees
[Figure: a single split on Email Length at 1.8 (< 1.8 vs. >= 1.8) leaves 1 error on one side and 8 errors on the other.]
A single-level decision tree is also known as a Decision Stump.
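A decision stump like the Email Length split above, sketched with scikit-learn on invented data:

```python
from sklearn.tree import DecisionTreeClassifier

X = [[1.2], [1.5], [2.0], [3.1], [1.7], [2.5]]  # email length only
y = ["ham", "ham", "spam", "spam", "ham", "spam"]

# max_depth=1 restricts the tree to a single split: a decision stump.
stump = DecisionTreeClassifier(max_depth=1).fit(X, y)
print(stump.predict([[1.6], [2.2]]))
```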
Top-Down Induction of Decision Trees (continued)
[Figure: the tree grows with a further split on New Recipients; the slide marks regions with 4 errors and 11 errors.]
Top-Down Induction of Decision Trees (continued)
[Figure: after repeated splits on Email Length and New Recipients, the final regions contain 0 errors and 1 error.]
Tree Size
• Overtraining (overfitting): the model learns the training set too well; it fits the training set so closely that it performs poorly on the test set.
• Underfitting: the model is too simple, so both training and test errors are large.
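A sketch of this trade-off, assuming scikit-learn and synthetic data: training accuracy tends to keep rising with tree depth, while test accuracy levels off or drops once the tree overfits:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, 3, 10, None):  # None = grow the tree without limit
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))
```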
Neural Network
[Diagram, repeated over several slides: the inputs (independent variables) Age, Gender, and Stage feed through a first layer of weights into a hidden layer, then through a second layer of weights to the output (dependent variable), here a "probability of being alive" of 0.6. The output is the prediction.]
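A toy forward pass matching the diagram's structure; the weights and input encodings are invented:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([65.0, 1.0, 2.0])      # age, gender, stage (toy encoding)
W1 = np.array([[0.02, -0.3, 0.1],   # input -> hidden weights (2 hidden units)
               [-0.01, 0.5, 0.2]])
W2 = np.array([0.7, -0.4])          # hidden -> output weights

hidden = sigmoid(W1 @ x)            # hidden layer activations
output = sigmoid(W2 @ hidden)       # "probability of being alive" (~0.6 here)
print(output)
```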
Ensemble Learning
• The idea is to use multiple models to obtain
better predictive performance than could be
obtained from any of the constituent models.
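A minimal ensemble sketch, assuming scikit-learn's majority-vote combiner; the constituent models are arbitrary choices for illustration:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, random_state=0)

# Three different constituent models, combined by majority vote.
ensemble = VotingClassifier([
    ("tree", DecisionTreeClassifier(random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("logreg", LogisticRegression(max_iter=1000)),
])
ensemble.fit(X, y)
print(ensemble.predict(X[:5]))
```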
Less is More
The Curse of Dimensionality
• Learning from a high-dimensional feature space requires an enormous amount of training data to ensure that there are several samples with each combination of values.
• With a fixed number of training instances, the predictive
power reduces as the dimensionality increases.
• As a counter-measure, many dimensionality reduction
techniques have been proposed, and it has been shown
that when done properly, the properties or structures of
the objects can be well preserved even in the lower
dimensions.
• Nevertheless, naively applying dimensionality reduction
can lead to pathological results.
While dimensionality reduction is an important tool in machine learning/data mining, we
must always be aware that it can distort the data in misleading ways.
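A dimensionality-reduction sketch using PCA via scikit-learn, one technique among the many alluded to above:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA

X, _ = make_classification(n_samples=100, n_features=50, random_state=0)

# Project the 50-dimensional data onto its 3 leading principal components.
X_low = PCA(n_components=3).fit_transform(X)
print(X.shape, "->", X_low.shape)  # (100, 50) -> (100, 3)
```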
A cloud of points in 3D
[Figure: the same 3D point cloud projected three ways: in 2D XZ we see a triangle, in 2D YZ we see a square, and in 2D XY we see a circle.]
Less is More?
• In the past, the published advice was that high dimensionality is dangerous.
• But reducing dimensionality reduces the amount of information available for prediction.
• Today: try going in the opposite direction: Instead
of reducing dimensionality, increase it by adding
many functions of the predictor variables.
• The higher the dimensionality of the set of
features, the more likely it is that separation
occurs.
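A sketch of this opposite direction, expanding the feature space with polynomial functions of the predictor variables (scikit-learn is an assumption):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[1.0, 2.0], [3.0, 4.0]])

# Add all degree-2 functions of the predictors as extra features.
X_high = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
print(X.shape, "->", X_high.shape)  # (2, 2) -> (2, 5): x1, x2, x1^2, x1*x2, x2^2
```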
Meaningfulness of Answers
• Pairs of people: 5 × 10^17
• Consider:
Other Learning Tasks
Supervised Learning: Multi-Label Classification
Multi-label learning refers to the classification problem where each example can be assigned to multiple class labels simultaneously.
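A multi-label sketch, assuming scikit-learn's one-classifier-per-label reduction; the labels and data are invented:

```python
from sklearn.multioutput import MultiOutputClassifier
from sklearn.tree import DecisionTreeClassifier

X = [[1, 50], [7, 220], [2, 60], [8, 250]]
Y = [[1, 0], [0, 1], [1, 0], [1, 1]]  # two labels per email, e.g. "work", "urgent"

# One classifier is trained per label column.
clf = MultiOutputClassifier(DecisionTreeClassifier()).fit(X, Y)
print(clf.predict([[6, 230]]))
```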
Supervised Learning: Regression
Find a relationship between a numeric dependent variable and one or more independent variables.
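A minimal regression sketch, assuming scikit-learn and toy data:

```python
from sklearn.linear_model import LinearRegression

X = [[1], [2], [3], [4]]   # independent variable
y = [1.1, 1.9, 3.2, 3.9]   # numeric dependent variable

reg = LinearRegression().fit(X, y)
print(reg.coef_, reg.intercept_, reg.predict([[5]]))
```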
Unsupervised Learning: Clustering
Clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense.
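A clustering sketch with k-means, assuming scikit-learn; the slides name no particular algorithm:

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.2, 0.8], [8, 8], [8.2, 7.9], [4, 4.1]])

# Assign the observations to 2 clusters without any labels.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)
```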
Unsupervised Learning: Anomaly Detection
Detecting patterns in a given data set that do not conform to an established normal behavior.
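An anomaly-detection sketch using an Isolation Forest, assuming scikit-learn; the "normal behavior" here is a synthetic Gaussian cloud:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(100, 2)),  # normal behavior
               [[8.0, 8.0]]])                    # one injected outlier

detector = IsolationForest(random_state=0).fit(X)
print(detector.predict(X)[-1])  # -1 marks the anomaly, 1 marks inliers
```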
Source of Training Data
• Provided random examples outside of the learner’s control.
— Passive Learning
— Negative examples available or only positive? Semi-Supervised Learning
— Imbalanced
• Good training examples selected by a "benevolent teacher."
— "Near miss" examples
• Learner can query an oracle about the class of an unlabeled example in the environment.
— Active Learning
• Learner can construct an arbitrary example and query an oracle for its label.
• Learner can run directly in the environment without any human guidance and obtain feedback.
— Reinforcement Learning
• There is no existing class concept
— A form of discovery
— Unsupervised Learning
• Clustering
• Association Rules
Other Learning Tasks
• Other Supervised Learning Settings
— Multi-Class Classification
— Multi-Label Classification
— Semi-supervised classification: makes use of both labeled and unlabeled data
— One-Class Classification: only instances from one label are given
• Ranking and Preference Learning
• Sequence labeling
• Cost-sensitive Learning
• Online Learning and Incremental Learning: learns one instance at a time.
• Concept Drift
• Multi-Task and Transfer Learning
• Collective Classification: when instances are dependent!
Software
RapidMiner
Want to Learn More?