Introduction To Machine Learning
GECD
Lluís A. Belanche
belanche@cs.upc.edu
2018-2019
Motivation:
What is this course about?
Examples of learning tasks (ML subfields)
SUPERVISED LEARNING uses labeled data
Classification: predicting a class (or category) for each example (e.g., document
classification); note multi-label, probabilistic generalizations
Regression: predicting a real value for each example (e.g., prediction of pH
concentration); note multi-variable generalization
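To make the distinction concrete, here is a minimal sketch using a 1-nearest-neighbour predictor. The technique is chosen only for illustration and is not taken from the slides; the function name and all data values are made up. The same function classifies when the targets are categories and regresses when they are real values.

```python
def nearest_neighbour_predict(examples, x):
    """Return the target of the training example whose input is closest to x.

    examples: list of (input, target) pairs with one-dimensional numeric inputs.
    """
    closest_input, closest_target = min(examples, key=lambda pair: abs(pair[0] - x))
    return closest_target


# Classification: targets are categories (hypothetical word counts and labels).
labelled_documents = [(120, "short note"), (150, "short note"),
                      (900, "report"), (1100, "report")]
predicted_class = nearest_neighbour_predict(labelled_documents, 1000)

# Regression: targets are real values (hypothetical measurements).
measurements = [(1.0, 6.8), (2.0, 7.1), (3.0, 7.4)]
predicted_value = nearest_neighbour_predict(measurements, 2.2)

print(predicted_class)  # report
print(predicted_value)  # 7.1
```

Only the type of the target changes between the two calls, which is the sense in which both tasks are supervised learning from labeled data.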
Introduction to Machine Learning
A system (living or not) learns if it uses past experience to improve
future performance:
I have an idea ...
An example
Key fact: Take a data sample D = {(x_1, t_1), ..., (x_N, t_N)}; for any f
there exists f* s.t.
So what do we do?
Machine Learning in context
Substantial relation to Multivariate Statistics (MS):
The goals and problems are often similar, and the techniques are
frequently rooted in the same theories
Models
A model is a compact description of a data sample that allows us to
make predictions about future examples. Desirable properties of models
include:
2. interpretability (MS,ML?)
Applications of Machine Learning
...
Examples
Playing Checkers
• the task is to learn to win games without violating the rules
• the outcome of the model is a move given the current board position
• the performance measure is the fraction of games won
• the data is the set of games played against other opponents
Behaviour of users in a blog
• the task is the prediction of the number of comments in the upcoming 24
hours
• the performance measure is the (squared) difference between the actual number
and the prediction
• the outcome of the model is the number of comments that the blog post
will receive in the next 24 hours relative to the basetime
• the data are raw HTML-documents of the blog posts (must be crawled and
processed)
Evolution of a patient
• the task is the prediction of a second attack over a period of time
• the performance measure is the fraction of correctly predicted patients
• the outcome of the model is the probability of such event
• the data could be previous patient records (demographic, dietary, clinical ...)
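The two performance measures named above can be sketched as follows. All numbers are invented for illustration. Because the patient model outputs a probability, the sketch assumes a threshold of 0.5 to turn it into a yes/no prediction; the slides do not specify one.

```python
def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def fraction_correct(actual, predicted):
    """Fraction of examples whose predicted label equals the actual label."""
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)


# Blog example: comments received in the next 24 hours (hypothetical counts).
actual_comments = [3, 0, 12]
predicted_comments = [2, 1, 10]
mse = mean_squared_error(actual_comments, predicted_comments)

# Patient example: predicted probability of a second attack (hypothetical),
# thresholded at an assumed 0.5 to obtain a yes/no prediction.
had_second_attack = [True, False, False, True]
predicted_probability = [0.9, 0.2, 0.6, 0.7]
predicted_attack = [p >= 0.5 for p in predicted_probability]
acc = fraction_correct(had_second_attack, predicted_attack)

print(mse)  # 2.0
print(acc)  # 0.75
```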
Ill-posed examples (need a lot of refinement and specific information):
Inductive bias
How can we rule out the more complex one? (and many others)
Formulation
Example 1a: classification of images [easy]
Example 1b: classification of images [medium]
Predict the class (a category) subject to little probabilistic uncertainty, but larger
variety
Example 1c: classification of images [hard]
Example 2: classification of Leukemia types
Example 3: fish classification
A fish processing plant wants to automate the process of sorting incoming fish
according to species (salmon or sea bass)
The system consists of a conveyor belt, a robotic arm, a vision system with an
overhead CCD camera and a computer.
Given labeled training data coming from some unknown joint probability distribution,
should we predict the new point as salmon or sea bass?
The goal is to obtain a model based on available training data (known examples)
with high classification accuracy on unseen examples (test data), i.e.
achieving good generalization
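The idea of fitting on training data and measuring accuracy on held-out test data can be sketched as below. Everything here is an assumption made for illustration: the single feature (length in cm), the synthetic length distributions, the 70/30 split, and the nearest-centroid classifier are not taken from the slides.

```python
import random


def make_fish(rng, n_per_species):
    """Generate synthetic (length_cm, species) pairs; the distributions are invented."""
    fish = [(rng.gauss(50.0, 5.0), "salmon") for _ in range(n_per_species)]
    fish += [(rng.gauss(90.0, 5.0), "sea bass") for _ in range(n_per_species)]
    rng.shuffle(fish)
    return fish


def fit_centroids(training):
    """Learn one mean length per species, using the training examples only."""
    lengths_by_species = {}
    for length, species in training:
        lengths_by_species.setdefault(species, []).append(length)
    return {s: sum(v) / len(v) for s, v in lengths_by_species.items()}


def predict(centroids, length):
    """Predict the species whose mean length is closest to the given length."""
    return min(centroids, key=lambda s: abs(centroids[s] - length))


def accuracy(centroids, examples):
    """Fraction of examples whose predicted species matches the true one."""
    correct = sum(1 for length, species in examples
                  if predict(centroids, length) == species)
    return correct / len(examples)


rng = random.Random(0)
fish = make_fish(rng, 100)
training, test = fish[:140], fish[140:]   # assumed 70/30 split

centroids = fit_centroids(training)
training_accuracy = accuracy(centroids, training)
test_accuracy = accuracy(centroids, test)  # estimate of generalization
print(training_accuracy, test_accuracy)
```

The test examples play no part in fitting, so the test accuracy is the quantity that speaks to generalization. With real fish the two classes would overlap far more than in this toy data.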
Example 4: regression
Example 5b: clustering [medium]
Example 6: clustering
General difficulties with clustering are large subjectivity and estimation
of the “right” number of clusters:
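One way to see why the number of clusters is hard to estimate is sketched below with a one-dimensional k-means. The algorithm choice, the deterministic initialisation, and the toy data are assumptions made for illustration. On this data the within-cluster sum of squares keeps shrinking as k grows and reaches zero when every point is its own cluster, so the fit criterion alone cannot pick k.

```python
def k_means_1d(values, k, iterations=20):
    """Lloyd-style k-means on numbers; returns (centroids, within-cluster SS)."""
    ordered = sorted(values)
    n = len(ordered)
    # Deterministic start: spread the initial centroids across the sorted data.
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda j: abs(v - centroids[j]))
            clusters[nearest].append(v)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    wcss = sum(min((v - c) ** 2 for c in centroids) for v in values)
    return centroids, wcss


# Toy data with three visually obvious groups (values are made up).
data = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.0, 9.2, 8.8]

wcss_by_k = {}
for k in (1, 2, 3, 9):
    _, wcss_by_k[k] = k_means_1d(data, k)
    print(k, wcss_by_k[k])
```

The large drop from k = 2 to k = 3 followed by only a small further gain is the kind of evidence often used informally to argue for three clusters here, but that reading remains a judgement call.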
Example 7: reinforcement learning
Example 8: Dimensionality reduction
Each image has thousands or millions of pixels
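As a small-scale illustration of reducing dimensions, the sketch below finds the first principal direction of a data set by power iteration on its covariance matrix and replaces each point by a single score. Principal component analysis is an assumed choice of technique here, not one named on the slide, and the two-dimensional toy points stand in for high-dimensional images.

```python
import math


def first_principal_direction(points, iterations=100):
    """Return (feature means, unit vector of largest variance) via power iteration."""
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    cov = [[sum(row[a] * row[b] for row in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d  # starting vector; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return means, v


def project(points, means, direction):
    """Reduce each point to one number: its coordinate along the direction."""
    return [sum((p[j] - means[j]) * direction[j] for j in range(len(direction)))
            for p in points]


# Toy data lying exactly on the line y = 2x (made up), so one number per
# point is enough to describe it.
points = [(float(x), 2.0 * x) for x in range(5)]
means, direction = first_principal_direction(points)
scores = project(points, means, direction)
print(direction)
print(scores)
```

Each two-dimensional point is now described by one score. For images the same idea maps thousands of pixel values to a handful of scores, at the cost of some reconstruction error when the data do not lie exactly in a low-dimensional subspace.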
The Rosetta stone
Machine Learning              Statistics
model                         model
parameter/weight              parameter/coefficient
training                      fitting
learning                      modelling
regression                    regression
classification                discrimination
clustering                    clustering/classification
inputs/features/variables     independent variables, explanatory variables, predictors
outputs/targets               dependent variables, response variables
instances/examples            individuals/observations
error/loss function           fit criterion, deviance
Prediction vs. Inference
Inference:
Example: Direct mailing
What are we aiming for?
Model selection
Resampling methods
10-fold CV (K = 10)
(from https://chrisjmccormick.files.wordpress.com/2013/07/10_fold_cv.png)
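The partitioning behind K-fold cross-validation can be sketched as follows. The function name, the seed, and the sample size of 25 are hypothetical. Each example lands in exactly one validation fold, and the model would be fitted K times, each time on the other K - 1 folds.

```python
import random


def k_fold_splits(n_examples, k=10, seed=0):
    """Yield (training_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_splits(25, k=10))
for training, validation in splits:
    print(len(training), len(validation))
```

The cross-validated estimate of performance is then the average of the K validation scores, one per fold.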
Basic method:
Non-standard data (images, audio, text, ...) may need completely ad hoc treatments
Recommended reading: introductory
http://www-bcf.usc.edu/~gareth/ISL/
Recommended reading: intermediate
https://mitpress.mit.edu/books/introduction-machine-learning-0
Recommended reading: standard level
Pattern Recognition and Machine Learning, Christopher M. Bishop,
Springer, 2006
http://research.microsoft.com/~cmbishop/PRML
Pattern Classification (2nd Ed.), Richard O. Duda, Peter E. Hart and
David G. Stork, Wiley-Interscience, 2001.
http://rii.ricoh.com/~stork/DHS.html
Making the best out of R
https://cran.r-project.org/web/views/MachineLearning.html
https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages
https://awesome-r.com/#awesome-r-machine-learning
https://github.com/ujjwalkarn/DataScienceR
https://www.tidyverse.org/