Mathematical Foundations of Machine Learning
Machine Learning:
Introduction
Georgia Tech ECE 7750 Notes by J. Romberg. Last updated 17:57, August 13, 2020
What are we “learning” in Machine Learning?
Inference
Loosely speaking, inference problems take in data, then output some
kind of decision or estimate.
Examples:
• Does this image have a tree in it?
• If I tell you the current temperature in Sandy Springs, Marietta,
Mableton, College Park, and Decatur, can you tell me the
temperature in downtown Atlanta?
• … mortgage of $2000/month?
• If I give you a recording of somebody speaking, can you produce
text of what they are saying?
• If I give you a video of a drone moving along with the signal
coming out of the remote control used to fly it, can you discover
the differential equations that govern its motion?

¹These two categories roughly correspond to what are called supervised and
unsupervised learning in the literature.
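The temperature question above is a small regression problem: predict one quantity from several related measurements. Here is a minimal sketch, with entirely synthetic data and a linear least-squares model chosen for illustration (the notes do not prescribe this method):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical historical data: rows are days, columns are temperatures (°F)
# in Sandy Springs, Marietta, Mableton, College Park, and Decatur.
X = 70 + 10 * rng.standard_normal((100, 5))
# Synthetic downtown temperature: roughly the average of the five, plus noise.
y = X @ np.array([0.2, 0.2, 0.2, 0.2, 0.2]) + 0.5 * rng.standard_normal(100)

# Fit weights w so that X @ w ≈ y in the least-squares sense.
w, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict downtown temperature from today's (made-up) suburb readings.
today = np.array([71.0, 69.5, 70.2, 72.1, 70.8])
prediction = float(today @ w)
```

With enough historical days, the recovered weights land close to the 0.2's used to generate the data; the point is only that "inference" here means learning a map from measurements to an estimate.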
… is a 44% chance this image contains a tree” or “the probability
density function for the distance X of the drone to the sensor is
f_X(x) = e^(−2|x−15.2|)”).
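As an aside, the density in that last example is a Laplace distribution centered at 15.2 with scale 1/2. A quick numerical sanity check (a sketch, not part of the notes) confirms it integrates to 1 and is centered where we expect:

```python
import numpy as np

# f_X(x) = e^(-2|x - 15.2|): evaluate on a fine grid around the center.
# The tails beyond ±10 contribute only about e^(-20), which is negligible.
x = np.linspace(15.2 - 10, 15.2 + 10, 400001)
dx = x[1] - x[0]
f = np.exp(-2.0 * np.abs(x - 15.2))

total = f.sum() * dx        # Riemann sum of the density: should be ≈ 1
mean = (x * f).sum() * dx   # first moment: should be ≈ 15.2
```

So the inference output in this example is not a single number but an entire distribution over possible distances.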
Modeling
A second type of problem associated with the words “machine learning”
might be roughly described as:
Given a data set, how can I succinctly describe it (in a quantitative,
mathematical manner)?
Most models can be broken into two categories:
1. Geometric models. The general problem is that we have
example data points x1, . . . , xN ∈ R^D and we want to find
some kind of geometric³ structure that (approximately) describes
them. Here are examples: given a set of vectors, what
(low-dimensional) subspace comes closest to containing them?

²(Or from data space to the space of probability distributions over the range
of possible decisions.)

³All of the things discussed here could also be described as algebraic models;
we could pose these as finding a system of equations that describes the …
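The subspace question can be made concrete with the singular value decomposition: the best-fitting d-dimensional subspace (in the least-squares sense) is spanned by the top d right singular vectors of the data matrix. A minimal sketch with synthetic data — the SVD route is a standard choice, stated here as an assumption rather than something the notes have developed yet:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic data: N = 200 points in R^10 lying near a 2-dimensional
# subspace through the origin, plus a little noise.
N, D, d = 200, 10, 2
basis = np.linalg.qr(rng.standard_normal((D, d)))[0]   # true subspace basis
X = rng.standard_normal((N, d)) @ basis.T + 0.01 * rng.standard_normal((N, D))

# Top-d right singular vectors span the closest d-dimensional subspace.
# (For data not centered at the origin, subtract the mean first.)
U, s, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt[:d].T                                           # estimated basis, D x d

# Relative residual: distance from the points to their projections.
residual = np.linalg.norm(X - X @ V @ V.T) / np.linalg.norm(X)
```

Since the data is nearly two-dimensional, the residual is tiny and the estimated subspace essentially contains the true basis.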
Example: what about a low-dimensional manifold?
[Figure from Chapter 14 of Hastie, Tibshirani, and Friedman.]
… your probability model.
This course
This course is not so much about actual machine learning algorithms.
Rather, it will focus on the basic mathematical concepts on which
these algorithms are built. In particular, to really understand anything
about ML, you need to have a very good grasp of linear algebra
and probabilistic inference (i.e., the mathematical theory
of probability and how to use it).
… are used as building blocks for nonlinear representations. Here is
where we will need a lot of linear algebra and its extension (called
functional analysis) to infinite dimensions.