
Aprenentatge Automàtic 1 (Machine Learning 1)

GECD

Lluís A. Belanche
belanche@cs.upc.edu

Soft Computing Research Group


Departament de Ciències de la Computació (Computer Science Department)
Universitat Politècnica de Catalunya - Barcelona Tech

2018-2019

LECTURE 1: Introduction to Machine Learning


What is this course about?

Motivation:

We want human-like behaviour in machines (especially, in computers!):
adaptable (flexible), fast, reliable and automatic

Biological systems deal with imprecision, partial truth, uncertainties,
noise, contradictions, ... mostly in a data-driven fashion (think of a
baby learning to walk or talk)

They also make predictions in intricate ways through their own behavioural
models (learnt by experience and almost impossible to verbalize)

This is very difficult to achieve by direct programming (brittleness)

(1/44)
What is this course about?

Machine learning (ML) is a field that lies at the intersection of
statistics, probability, computer science, and optimization. The main
goal is to explore automatic methods for inferring models from
data (e.g., finding structure, making predictions).

(2/44)
Examples of learning tasks (ML subfields)
SUPERVISED LEARNING uses labeled data
Classification: assigning a class (or category) to each example (e.g., document
classification); note the multi-label and probabilistic generalizations
Regression: predicting a real value for each example (e.g., prediction of pH
concentration); note the multi-variable generalization

UNSUPERVISED LEARNING does not use (or have) data labels


Clustering: discovering homogeneous groups in data (clusters)
Dimensionality reduction: finding lower-dimensional data representations
Density estimation: estimating the probabilistic mechanism that generates data
Novelty detection: finding anomalous/novel/outlying data

SEMI-SUPERVISED LEARNING uses partly labeled data


Ranking: ordering examples according to some criterion (e.g., web pages
returned by a search engine).
Reinforcement: delayed rewarding (e.g., finding the way out in a maze)

TRANSFER LEARNING learns a new task by transferring knowledge
from a related task that has already been learned.

(3/44)
Introduction to Machine Learning
A system (living or not) learns if it uses past experience to improve
future performance:

1. Acquiring more knowledge (or more abilities) with time, and

2. Reorganizing this knowledge such that some problems are solved:

a) in a more efficient way (using fewer resources) or

b) in a more effective way (higher performance standards)

(4/44)
Introduction to Machine Learning
I have an idea ...

Let the machine ...

1. learn the information contained in a data sample (the experience);

2. build a model that summarizes the regularities in the sample, and

3. use it to answer future queries

−→ This is Machine Learning

(5/44)
An example

(example due to Jason Weston)

Suppose we have a dataset D of 50 images of elephant faces and 50
of tiger faces, which we digitize into 100 × 100 pixel RGB images, so
we have x ∈ {0, . . . , 255}^d where d = 3 · 10^4

Given a new image, we want to answer the question: is it an
elephant or a tiger? [we assume it is one or the other]
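As a minimal sketch of this representation in R (a random array stands in
for a real photograph; the numbers are illustrative only):

    # A 100 x 100 x 3 array of intensities in 0..255 stands in for one image
    img <- array(sample(0:255, 100 * 100 * 3, replace = TRUE),
                 dim = c(100, 100, 3))
    x <- as.vector(img)   # flatten into a single feature vector
    length(x)             # 30000, i.e. d = 3 * 10^4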

(6/44)
An example

Define a classifier as a function f : R^d → {−1, +1}

Key fact: take a data sample D = {(x_1, t_1), . . . , (x_N, t_N)}; for any f
there exists f* s.t.

1. f and f* coincide on all the N images in D

2. f and f* differ on at least one of all possible images (not in D)

Moral: ML is about learning general structure from data (the data
regularities that will appear in any data sample)

(7/44)
So what do we do?

Moral (more formal): based on training data D only, there is no means
of choosing which function f is better (generalization is not
“guaranteed”)

Consequence: we must add control to the “fitting ability” of our
methods (complexity control)

training/empirical/apparent/resubstitution/in-sample error (on D)

true/generalization/out-of-sample error (on all possible images)

true error(f) ≤ training error(f) + complexity of f
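A toy illustration of this trade-off in R (a hedged sketch with synthetic
data, not part of the original slides): a high-degree polynomial drives the
training error down while the out-of-sample error grows.

    # Noisy sine data; compare train vs. test error across model complexity
    set.seed(1)
    x  <- runif(30, -1, 1);   y  <- sin(3 * x)  + rnorm(30,   sd = 0.2)
    xt <- runif(1000, -1, 1); yt <- sin(3 * xt) + rnorm(1000, sd = 0.2)
    for (d in c(1, 3, 15)) {
      fit <- lm(y ~ poly(x, d))
      tr  <- mean((y  - fitted(fit))^2)                       # training error
      te  <- mean((yt - predict(fit, data.frame(x = xt)))^2)  # test error
      cat(sprintf("degree %2d: train MSE %.3f, test MSE %.3f\n", d, tr, te))
    }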

(8/44)
Machine Learning in context

Machine Learning has strong bridges to other disciplines:

Statistics: inferential statistics, distribution and sampling theory,
mathematical statistics

Data Mining: very large databases, interest in high-level knowledge

Mathematics: optimization, numerical methods, asymptotics, ...

Algorithmics: correctness, complexity, ...

Artificial Intelligence: the general aim of “intelligent” (?) behaviour

(9/44)
Machine Learning in context
Substantial relation to Multivariate Statistics (MS):

Many classical techniques in MS are linear in nature: PCA, logistic
regression, ridge and linear regression, Fisher’s discriminant analysis,
canonical correlation analysis, factor analysis, PLS, ...

Many classical techniques in ML are non-linear: neural networks,
kernel methods, random forests, ...

Often the goals and problems are similar, and the techniques can
be rooted in the same theories

Modern MS is lagging behind in the analysis of complex data

(10/44)
Models
A model is a compact description of a data sample that permits making
predictions about future examples. Desirable properties of models
include:

1. good generalization (MS, ML)

2. interpretability (MS,ML?)

3. amenable to inference (MS)

4. sparsity (MS?, ML)

5. efficiency (time, space) (MS?, ML)

A common aim of both MS and ML is to produce good models!

(11/44)
Applications of Machine Learning

Machine perception: image analysis, speech analysis, face and
handwriting recognition, image and video captioning, ...

Natural language: translation, understanding, generation, ...

Business applications: fraud detection, credit granting, network
intrusion detection, stock market analysis, ...

Scientific tasks: bioinformatics, chemoinformatics, microbiology,
geology, astronomy, medical diagnosis, ...

Web analysis: hypertext, blogs, e-mail, social networks: recommender
systems, sentiment analysis, community detection, ...

...
(12/44)
Examples

Need to identify task, outcome, performance measure and data:

Playing Checkers
• the task is to learn to win games without violating the rules
• the outcome of the model is a movement given the current situation
• the performance measure is the fraction of games won
• the data is the set of games played against other opponents

Identification of handwritten digits


• the task is to learn to recognize a member of the set {0, 1, ..., 9}
• the performance measure is the fraction of correctly recognized digits
• the outcome of the model is a probability distribution or a hard decision
• the data could be a set of human labelled digits (e.g., from the USPS)

(13/44)
Examples
Behaviour of users in a blog
• the task is the prediction of the number of comments in the upcoming 24
hours
• the performance measure is the (square) difference between actual number
and prediction
• the outcome of the model is the predicted number of comments the blog post
will receive in the next 24 hours relative to the basetime
• the data are raw HTML-documents of the blog posts (must be crawled and
processed)

Evolution of a patient
• the task is the prediction of a second attack over a period of time
• the performance measure is the fraction of correctly predicted patients
• the outcome of the model is the probability of such event
• the data could be previous patient records (demographic, dietary, clinical ...)

(14/44)
Examples
Ill-posed examples (need a lot of refinement and specific information):

classify my incoming e-mail

predict the result of a basketball game

predict who is going to win a video game

predict whether this song is going to become a hit

Wrong examples (nonsensical for different reasons):

sort an array of integers

predict the lottery

predict the next digit in the decimal expansion of π

predict the terminal velocity of a falling object

(15/44)
Introduction to Machine Learning
Inductive bias

Complete the series! 2, 4, 6, 8, ...

Answer 1: 34 (model 1: f(n) = n^4 − 10n^3 + 35n^2 − 48n + 24)

Answer 2: 10 (model 2: f(n) = 2n)

Both models interpolate the four given terms exactly, yet they disagree on
the fifth. How can we rule out the more complex one (and many others)?
See the sketch after this list.

1. Supply more “training” data: 2, 4, 6, 8, 10, 12, 14, ...

2. Regularize: add a penalty to higher-order terms

3. Reduce the hypothesis space (e.g. restrict to quadratic models)
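A quick numeric check in R (a minimal sketch, not in the original slides):

    # Both models interpolate the four observed terms, yet predict differently
    f1 <- function(n) n^4 - 10*n^3 + 35*n^2 - 48*n + 24   # complex model
    f2 <- function(n) 2 * n                               # simple model
    f1(1:4)   # 2 4 6 8  -- matches the series
    f2(1:4)   # 2 4 6 8  -- matches the series too
    f1(5)     # 34 -- the two models disagree on the next term
    f2(5)     # 10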

(16/44)
Introduction to Machine Learning
Formulation

X are the measured variables
Z are the non-measured variables
y is the true function
y′ is the modeled function

(17/44)
Introduction to Machine Learning
Example 1a: classification of images [easy]

Predict the class (a category) subject to little probabilistic uncertainty

Little chance to “hand-craft” a solution, without learning

Note heavy pre-processing

(18/44)
Introduction to Machine Learning
Example 1b: classification of images [medium]

Predict the class (a category) subject to little probabilistic uncertainty, but larger
variety

Negligible chance to “hand-craft” a solution, without learning

Note some pre-processing, unique label

(19/44)
Introduction to Machine Learning
Example 1c: classification of images [hard]

Predict the class (a category) subject to some probabilistic uncertainty, general
background, position, size, occlusion, illumination, ...

No chance to “hand-craft” a solution without learning

Note no pre-processing, multi-label


(all images from http://pascallin.ecs.soton.ac.uk/challenges/VOC/databases.html)

(20/44)
Introduction to Machine Learning
Example 2: classification of Leukemia types

38 training instances, 34 test instances, ∼7,000 genes

ALL: Acute Lymphoblastic Leukemia; AML: Acute Myeloid Leukemia

Results on test data: 33/34 correct (the 1 error may be mislabeled)

Very small dataset, high-dimensional, with many irrelevant and redundant
features, probably subject to some probabilistic uncertainty

(21/44)
Introduction to Machine Learning
Example 3: fish classification
A fish processing plant wants to automate the process of sorting incoming fish
according to species (salmon or sea bass)

The system consists of a conveyor belt, a robotic arm, a vision system with an
overhead CCD camera and a computer.

After some preprocessing, each fish is characterized by two features: average
lightness and length; both are subject to heavy probabilistic uncertainty

(from Pattern Classification)

(22/44)
Introduction to Machine Learning
Example 3: fish classification
Given labeled training data coming from some unknown joint probability distribution,
should we predict the new point as salmon or sea bass?

(23/44)
Introduction to Machine Learning
Example 3: fish classification

The goal is to obtain a model based on available training data (known examples)
with high classification accuracy on unseen unknown examples (test data), i.e.
achieving good generalization
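To make this concrete, a hedged sketch in R (synthetic feature values stand
in for the plant's actual measurements; means and spreads are invented):

    set.seed(7)
    n <- 100
    salmon  <- data.frame(lightness = rnorm(n, 5), length = rnorm(n, 40, 4), y = 0)
    seabass <- data.frame(lightness = rnorm(n, 7), length = rnorm(n, 55, 4), y = 1)
    fish <- rbind(salmon, seabass)
    # Logistic regression on the two features
    fit <- glm(y ~ lightness + length, data = fish, family = binomial)
    # Predicted probability that a new fish is a sea bass
    predict(fit, data.frame(lightness = 6, length = 48), type = "response")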

(24/44)
Introduction to Machine Learning
Example 4: regression

Predict some quantitative outcome subject to probabilistic uncertainty

Example: predict gas mileage (mpg) of a car as a function of horsepower (HP)

(auto-mpg data set from UCI Machine Learning Repository)

(25/44)
Introduction to Machine Learning
Example 4: regression

We can start by fitting a straight line to explain the relationship ...

(26/44)
Introduction to Machine Learning
Example 4: regression

We can then fit a quadratic function ...

Is it a better fit? Will it lead to better predictions?
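A minimal sketch of both fits in R, assuming the Auto data from the ISLR
package (which mirrors the UCI auto-mpg data):

    library(ISLR)   # provides the Auto data frame (mpg, horsepower, ...)
    fit1 <- lm(mpg ~ horsepower, data = Auto)            # straight line
    fit2 <- lm(mpg ~ poly(horsepower, 2), data = Auto)   # quadratic
    # A better in-sample fit does not by itself guarantee better predictions
    summary(fit1)$r.squared
    summary(fit2)$r.squared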


(27/44)
Introduction to Machine Learning
Example 5a: clustering [easy]

Group unlabeled data into (non?)-overlapping subsets (clusters), according to a
similarity measure

Large intra-cluster (within) similarity and small inter-cluster (between) similarity

Often, similarity is just the inverse of a (metric) distance
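As a toy illustration with base R's k-means (synthetic 2-D data; two
well-separated Gaussian blobs are an assumption made for the sketch):

    set.seed(1)
    X <- rbind(matrix(rnorm(100, mean = 0), ncol = 2),
               matrix(rnorm(100, mean = 4), ncol = 2))
    km <- kmeans(X, centers = 2, nstart = 10)
    table(km$cluster)   # sizes of the two discovered clusters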

(28/44)
Introduction to Machine Learning
Example 5b: clustering [medium]

(29/44)
Introduction to Machine Learning
Example 6: clustering
General difficulties with clustering are its large subjectivity and the
estimation of the “right” number of clusters:

(30/44)
Introduction to Machine Learning
Example 7: reinforcement learning

No supervised output, but a sequence of (delayed) rewards

Example: robot in a maze

(31/44)
Introduction to Machine Learning
Example 8: Dimensionality reduction
Each image has thousands or millions of pixels

Can we give each image a coordinate, such that (only) similar images are nearby?

The LLE algorithm
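The slide refers to LLE (Locally Linear Embedding). As a simpler stand-in,
a sketch with PCA in base R already assigns each image a low-dimensional
coordinate (random pixel values are a placeholder for real images):

    # Rows are images, columns are pixels
    imgs <- matrix(runif(50 * 10000), nrow = 50)
    pc <- prcomp(imgs, rank. = 2)   # keep the first two principal components
    coords <- pc$x                  # one 2-D coordinate per image
    dim(coords)                     # 50 x 2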

(32/44)
Introduction to Machine Learning
The Rosetta stone
Machine Learning              Statistics
model                         model
parameter/weight              parameter/coefficient
training                      fitting
learning                      modelling
regression                    regression
classification                discrimination
clustering                    clustering/classification
inputs/features/variables     independent variables / explanatory variables / predictors
outputs/targets               dependent variables / response variables
instances/examples            individuals/observations
error/loss function           fit criterion, deviance

Careful with the words: transaction (means observation), sample (means
data set), attribute (means variable), ...

(33/44)
Introduction to Machine Learning
Prediction vs. Inference

Prediction: produce a good estimate for the predicted variable

Inference:

1. Which predictors actually affect the predicted variable?

2. How strong are these dependencies?

3. Are these relationships positive or negative?
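To contrast the two goals, a hedged sketch in R (again assuming the Auto
data from the ISLR package):

    library(ISLR)
    fit <- lm(mpg ~ horsepower + weight, data = Auto)
    coef(summary(fit))        # inference: effect sizes, signs, significance
    predict(fit, head(Auto))  # prediction: estimated mpg for particular cars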

(34/44)
Introduction to Machine Learning
Example: Direct mailing

Predicting how much money an individual will donate (the response)
based on observations from 90,000 people, for whom we have recorded
over 400 different characteristics (the predictors)

What might we want to know?

1. For a given individual should I send out an e-mail (yes/no)?

2. What is the probability that a specific individual will donate?

3. What is the expected donation for a specific individual?

4. What are the characteristics more strongly linked to donation?

5. How much increase in donation is associated with a given increase in a specific
predictor?

(35/44)
Model selection

We need to perform three different tasks:


1. Fit models to data (estimate coefficients)
2. Choose one of these models (based on prediction error)
3. Estimate the true performance of the chosen model

(36/44)
Resampling methods

10-fold CV (K = 10)

(from https://chrisjmccormick.files.wordpress.com/2013/07/10_fold_cv.png)

(37/44)
Resampling methods

Basic method:

Use training data to fit models (parameter optimization)

Use validation data to average prediction errors and choose the
model with the lowest prediction error (model selection or
hyper-parameter optimization)

The chosen model is refit using the full learning (training +
validation) data

Test data is used to estimate the true performance of the chosen model

Further sophistications include double CV, bootstrapping, iterated LE/TE
resampling and repeated CV
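As an illustration of the basic idea, a minimal K-fold cross-validation
sketch in base R (the toy data and the linear model are assumptions made
for the example):

    set.seed(1)
    dat <- data.frame(x = runif(200))
    dat$y <- 2 * dat$x + rnorm(200, sd = 0.3)
    K <- 10
    fold <- sample(rep(1:K, length.out = nrow(dat)))   # random fold labels
    cv_err <- sapply(1:K, function(k) {
      fit <- lm(y ~ x, data = dat[fold != k, ])        # fit on K-1 folds
      mean((dat$y[fold == k] -
            predict(fit, dat[fold == k, ]))^2)         # error on held-out fold
    })
    mean(cv_err)   # cross-validated estimate of the prediction error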
(38/44)
On data pre-processing
Each problem requires a different approach to data cleaning and
preparation. This pre-processing is very important because it can have a
deep impact on performance; it can easily take a significant part of the
project time.
1. treatment of missing values

2. treatment of anomalous values (outliers)

3. treatment of incoherent or incorrect values

4. coding of non-continuous or non-ordered variables

5. possible elimination of irrelevant or redundant variables (feature selection)

6. creation of new variables that can be useful (feature extraction)

7. normalization of the variables (e.g. standardization)

8. transformation of the variables (e.g. correction of serious skewness and/or
kurtosis)

Non-standard data (images, audio, text, ...) may need completely ad hoc treatments
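A hedged sketch of a few of the steps listed above, in base R (the data
frame df is hypothetical):

    # One numeric feature with a missing value, one categorical feature
    df <- data.frame(x = c(1.2, NA, 3.5, 2.8, 150),   # 150 is a likely outlier
                     g = c("a", "b", "b", "a", "b"))
    df$x[is.na(df$x)] <- median(df$x, na.rm = TRUE)   # impute the missing value
    df$x <- as.numeric(scale(df$x))                   # standardize (mean 0, sd 1)
    df$g <- factor(df$g)                              # code the categorical variable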

(39/44)
Recommended reading: introductory

A free online version of An Introduction to Statistical Learning, with
Applications in R by James, Witten, Hastie and Tibshirani (Springer,
2013) has been available since January 2014.

Springer has agreed to this, so there is no need to worry about copyright.
However, you may not distribute printed versions of the PDF file.

http://www-bcf.usc.edu/~gareth/ISL/
(40/44)
Recommended reading: intermediate

Introduction to Machine Learning (3rd Ed.), by E. Alpaydin (The
MIT Press, 2014)

There are several editions (the more recent, the better)

https://mitpress.mit.edu/books/introduction-machine-learning-0
(41/44)
Recommended reading: standard level
Pattern Recognition and Machine Learning, Christopher M. Bishop,
Springer, 2006
http://research.microsoft.com/~cmbishop/PRML

Pattern Classification (2nd Ed.), Richard O. Duda, Peter E. Hart and
David G. Stork, Wiley-Interscience, 2001.
http://rii.ricoh.com/~stork/DHS.html

The Elements of Statistical Learning (2nd Ed.), Hastie, Tibshirani
and Friedman (2009). Springer-Verlag.
http://statweb.stanford.edu/~tibs/ElemStatLearn/

Learning from Data: Concepts, Theory, and Methods (2nd Ed.).
Cherkassky, V.S., Mulier, F. John Wiley, 2007.

Machine Learning: A Probabilistic Perspective. Kevin P. Murphy. MIT
Press, 2012. https://www.cs.ubc.ca/~murphyk/MLbook/
(42/44)
On the best programming environment
Python, R, MATLAB/Octave, Java, SQL, C/C++, Julia, ...

(43/44)
Making the best out of R

R is open-source software for statistical computing, data analysis
and publication-quality graphics, and a very usable programming
language (a mix of imperative, OO and functional styles)

Get R from http://cran.r-project.org/

RStudio is a friendly IDE for R (Windows, Mac, and Linux)

Get RStudio from http://www.rstudio.com/

R has a very active community and dozens of very useful packages:

https://cran.r-project.org/web/views/MachineLearning.html
https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages
https://awesome-r.com/#awesome-r-machine-learning
https://github.com/ujjwalkarn/DataScienceR
https://www.tidyverse.org/
