2021 Machine Learning Intro
Introduction to Machine Learning
Benjamin Rosman
Benjamin.Rosman1@wits.ac.za / benjros@gmail.com
Example: Hand-Written Digits
• Write a program to automatically classify these images of hand-written digits
Example: Faces
• Write a program to automatically find faces
What is machine learning?
• “the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as classifying the data into different categories” [Bishop, 2007]
• Two key ideas: finding patterns in data, and using them for prediction
Example problems
• Text recognition, understanding and translation
• Face/object detection
• Spam filtering
• Identifying topics in documents
• Spoken language understanding
• Medical diagnosis
• Customer segmentation / product recommendation
• Fraud detection
• Weather prediction
• Computer game AI
Hand-Written Digits
We need a model:
• Modelling assumptions (or knowledge about the problem) go here!
• y = f(x; θ)
• θ = parameters of model
[Figure: the data plotted as points in a 2D feature space; horizontal axis x₁]
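To make this concrete, here is a minimal sketch of a parametric model y = f(x; θ). The linear form, the parameter values, and all names here are illustrative assumptions, not something from the slides:

```python
import numpy as np

def f(x, theta):
    """Predict a value y from a 2D feature vector x using parameters theta.
    This (hypothetical) model is linear: theta[0] + theta[1]*x1 + theta[2]*x2."""
    return theta[0] + theta[1] * x[0] + theta[2] * x[1]

theta = np.array([0.5, -1.0, 2.0])  # parameters: these are what training must find
x_new = np.array([0.3, 0.7])        # a new point in the 2D feature space
print(f(x_new, theta))              # the model's prediction y = f(x; theta)
```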
An example: Training
[Figure: labelled training points in the 2D feature space]
An example: Querying
We want the model to generalise to any point in this space that we haven’t seen yet.
[Figure: a new, unlabelled query point marked “?” in the same feature space]
Categories of ML
• Supervised learning (Make predictions!)
  • Predict output y when given input x
  • Learn from labelled data: {(xᵢ, yᵢ)} (see the sketch after this list)
  • Classification: y is categorical
  • Regression: y is real-valued
• Unsupervised learning (Understand data!)
  • Learn from unlabelled data: {xᵢ}
  • Clustering
  • Learning some structure in the data
• Semi-supervised learning (Combine the above!)
  • Only some labels are provided
• Reinforcement learning
  • Learn from rewards (typically delayed)
  • Generate own data (experience) through interacting with an environment
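To make the first two categories concrete, here is a minimal sketch contrasting supervised and unsupervised learning, assuming scikit-learn and NumPy are available (the data is synthetic and purely illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

# Two synthetic clusters of 2D points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)  # labels: only used in the supervised case

# Supervised: learn from labelled data {(x_i, y_i)}, then predict y for a new x
clf = LogisticRegression().fit(X, y)
print(clf.predict([[2.0, 2.0]]))

# Unsupervised: learn structure (clusters) from unlabelled data {x_i} alone
km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_[:10])
```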
Examples (reinforcement learning):
• Learn to fly a helicopter
• Learn to make coffee
• Learn to play chess
Generalising
During training, only a small fraction of the possible data will be provided.
Trade-off between:
• Expressive: accurately capture distinctions in the data
• Sparse: not needing prohibitive amounts of data
Two ways to manage this trade-off:
• Feature selection
  • Autonomously identify the important dimensions (see the sketch after this list)
• Feature learning
  • Combine simpler features into more complex ones
  • E.g. deep learning (covered when we talk about neural networks)
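As a concrete example of feature selection, here is a minimal sketch assuming scikit-learn; the data, the choice of SelectKBest, and k=2 are all illustrative assumptions:

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data: 5 candidate feature dimensions, but only feature 2 matters
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 2] + 0.1 * rng.normal(size=200) > 0).astype(int)

# Score each dimension against the labels and keep the k most informative
selector = SelectKBest(score_func=f_classif, k=2).fit(X, y)
print(selector.get_support())  # boolean mask over the 5 dimensions
```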
Data
For any ML algorithm to work, we need data, and more is always better. In ML, we “let the data do the talking”.
Much work goes into collecting data sets. For large models (many parameters), we may need many millions of examples to learn a good model.
But how do we know how well the model will generalise?
Protip: never trust people who mess this up!
Splitting the Data
Typically divide the full data set into three (a sketch of the split follows this list):
• Training data: learn the model parameters
  • This is the core learning part, and so it needs the most data
  • ± 60% of the data
• Validation data: learn the model hyperparameters
  • Hyperparameters are values set before training begins, e.g. the degree of the polynomial, or the complexity of the neural network
  • ± 20% of the data
• Testing data: report the quality of the model
  • This is used to report an unbiased evaluation of the final model
  • ± 20% of the data
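A minimal sketch of this 60/20/20 split, assuming scikit-learn (the data here is a synthetic stand-in):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.default_rng(0).normal(size=(100, 2))  # stand-in features
y = np.arange(100) % 2                               # stand-in labels

# First hold out 40%, then split that 40% in half: 20% validation, 20% test
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 60 20 20
```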
Why split the data?
This red model fits the blue training points perfectly, so they will not give a reliable estimate of how well the model will generalise.
[Figure: a wiggly red curve passing exactly through every blue training point]
Instead, we want to test it on new data points that it has never seen during training. This gives a better idea of its performance.
Similarly, we may be learning the hyperparameter M (the degree of the model) by training a straight-line model (M=1), a quadratic model (M=2), and so on up to M=9, and then seeing which is best. We can train them all on the same training data, but we need to use separate validation data to choose the best one. Again, we can’t just report its performance on that data, as it is already biased. So we then need a different testing set to report final scores.
The test data must not be touched until the very end! It is the “blind/surprise test”.
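A minimal sketch of this workflow, fitting polynomials of degree M = 1, …, 9 on training data, choosing M on validation data, and only then touching the test set (numpy’s polyfit stands in for the model; the synthetic data is an assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_data(n):
    x = rng.uniform(0, 1, n)
    t = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)  # noisy sin(2*pi*x)
    return x, t

x_train, t_train = make_data(30)
x_val, t_val = make_data(10)
x_test, t_test = make_data(10)

def val_error(M):
    w = np.polyfit(x_train, t_train, M)              # fit on training data only
    return np.mean((np.polyval(w, x_val) - t_val) ** 2)

best_M = min(range(1, 10), key=val_error)            # hyperparameter chosen on validation
w = np.polyfit(x_train, t_train, best_M)
test_mse = np.mean((np.polyval(w, x_test) - t_test) ** 2)
print(best_M, test_mse)                              # test data used once, at the very end
```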
Example: Polynomial Curve Fitting
A simple regression (supervised learning) problem.
[Figure: training points; horizontal axis: the feature x (1D); vertical axis: the target label t (the “y” from before); curve: the true unknown function sin(2πx)]
Goal: given a new x, predict t (the target)
A Polynomial Function
Assume the function is polynomial:
y(x, w) = w₀ + w₁x + w₂x² + … + w_M x^M
Learning:
• Find the weight vector w to minimise the error E(w) = ½ Σₙ (y(xₙ, w) − tₙ)²
  • y(xₙ, w) is the predicted value at xₙ, and tₙ is the true value
  • The error is squared so it is symmetrical, and summed over every data point n = 1, …, N
  • The factor of ½ makes the maths simpler after differentiating
• E(w) is quadratic in w, so E′(w) is linear in w: there is a unique solution w*
More on this example in the linear regression lecture.
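A minimal sketch of this fit: because E(w) is quadratic, minimising it is a linear least-squares problem with a unique solution w* (the synthetic data and the choice M = 3 are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 10)
t = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 10)  # noisy targets

M = 3
Phi = np.vander(x, M + 1, increasing=True)  # columns: x^0, x^1, ..., x^M

# Solve min_w ||Phi w - t||^2; the same minimiser as E(w), up to the factor 1/2
w_star, *_ = np.linalg.lstsq(Phi, t, rcond=None)

E = 0.5 * np.sum((Phi @ w_star - t) ** 2)   # E(w*) at the unique optimum
print(w_star, E)
```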
Model Selection
Choosing M (the polynomial order):
For M = 9 the training error E(w*) = 0! But the goal is to generalise!
Training vs Testing Error
To compare errors across data sets of different size N, define the root-mean-square error E_RMS = √(2E(w*)/N).
Overfitting: high error on test data, low error on training data.
With more data:
• Over-fitting is less severe
• We can fit a more complex model
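A minimal sketch of these curves, computing E_RMS on training and test data for each M (synthetic data; with only 10 training points, the M = 9 fit drives the training error to zero while the test error blows up):

```python
import numpy as np

rng = np.random.default_rng(2)

def sample(n):
    x = rng.uniform(0, 1, n)
    return x, np.sin(2 * np.pi * x) + rng.normal(0, 0.2, n)

x_train, t_train = sample(10)   # small training set
x_test, t_test = sample(100)    # large held-out set

def e_rms(w, x, t):
    # Root-mean-square error: equivalent to sqrt(2 E(w) / N)
    return np.sqrt(np.mean((np.polyval(w, x) - t) ** 2))

for M in range(10):
    w = np.polyfit(x_train, t_train, M)  # numpy may warn for the near-singular M=9 fit
    print(M, e_rms(w, x_train, t_train), e_rms(w, x_test, t_test))
```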