
Linear Models

CS771: Introduction to Machine Learning


Piyush Rai
Linear Models

- Consider learning to map an input x (a vector of D features) to the corresponding (say real-valued) output y

- Assume the output to be a linear weighted combination of the input features:

      y = w⊤x = Σ_{d=1}^{D} w_d x_d

  This defines a linear model with parameters given by a "weight vector" w. Each of these weights has a simple interpretation: w_d is the "weight" or importance of the d-th feature in making this prediction. The "optimal" weights are unknown and have to be learned by solving an optimization problem, using some training data.

- This simple model can be used for Linear Regression

- This simple model can also be used as a "building block" for more complex models (a small sketch of the basic model follows this list)
  - Even classification (binary/multiclass/multi-output/multi-label) and various other ML/deep learning models
  - Even unsupervised learning problems (e.g., dimensionality reduction models)
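A minimal sketch of this prediction rule in NumPy (the feature values and weights below are made-up numbers, not learned ones; learning w is the subject of the upcoming lectures):

```python
import numpy as np

# A D-dimensional input and a hypothetical, hand-picked weight vector
x = np.array([1.0, 2.0, 3.0])      # input features (D = 3)
w = np.array([0.5, -1.0, 0.25])    # weight vector: w[d] is the importance of feature d

# Linear model prediction: y = w'x = sum over d of w[d] * x[d]
y = w @ x
print(y)                           # 0.5*1.0 + (-1.0)*2.0 + 0.25*3.0 = -0.75
```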
Simple Linear Models as Building Blocks

- In some regression problems, each output itself is a real-valued vector
  - Example: given a full-body image of a person, predict height, weight, hand size, and leg size (a vector of M = 4 outputs)
  - Such problems are commonly known as multi-output regression

- We can assume a separate linear model for each of the M outputs:

      y_m = w_m⊤ x,   m = 1, ..., M

  where each w_m is a D-dimensional weight vector for predicting the m-th output. Stacking the outputs into a vector y, this can be written compactly as

      y = W x

  where W is an M×D weight matrix whose m-th row contains w_m⊤. Learning this model will require us to learn this weight matrix (or equivalently, the M weight vectors). A small sketch follows after this list.

- Note: Learning separate models may not be ideal if these multiple outputs are somewhat correlated with each other. But this model can be extended to handle such situations (the techniques are a bit too advanced to be discussed right now – but if curious, you may look up more about multitask learning techniques)
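A minimal sketch of the multi-output model y = Wx, with made-up dimensions (D = 3 input features, M = 4 outputs) and random, unlearned weights:

```python
import numpy as np

D, M = 3, 4                                      # input dimension and number of outputs (made-up)
rng = np.random.default_rng(0)

W = rng.normal(size=(M, D))                      # M x D weight matrix; row m is the weight vector w_m
x = rng.normal(size=D)                           # a single D-dimensional input

y = W @ x                                        # all M outputs at once: y[m] = w_m . x
y_one_at_a_time = np.array([W[m] @ x for m in range(M)])
assert np.allclose(y, y_one_at_a_time)           # same predictions, computed output by output
print(y)
```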
Simple Linear Models as Building Blocks

- A linear model can also be used in classification problems

- For binary classification, we can treat w⊤x as the "score" of the input x and threshold it to get the binary label (e.g., predict one class if the score is positive and the other class otherwise)

- Recall that the LwP model can also be seen as a linear model (although it wasn't formulated like this)
  - Wait – when discussing LwP, wasn't the linear model of the form w⊤x + b? Where did the "bias" term go?
  - Don't worry – we can easily fold in the bias term: append a constant feature "1" to each input and rewrite w⊤x + b as w⊤x, where now both w and x are (D+1)-dimensional. We will assume the same and omit the explicit bias for simplicity of notation (a small sketch of this trick follows below).
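A minimal sketch of binary classification by thresholding the score, with the bias folded in by appending a constant feature (all numbers below are made up):

```python
import numpy as np

x = np.array([2.0, -1.0])                # original D-dimensional input (D = 2, made-up values)
w = np.array([1.5, 0.5])                 # weight vector (made-up values)
b = -1.0                                 # bias term

score_with_bias = w @ x + b              # linear score with an explicit bias

# Fold the bias in: append a constant feature "1" to x and append b to w
x_aug = np.append(x, 1.0)                # now (D+1)-dimensional
w_aug = np.append(w, b)
score_folded = w_aug @ x_aug             # same score, no explicit bias term
assert np.isclose(score_with_bias, score_folded)

label = +1 if score_folded > 0 else -1   # threshold the score to get the binary label
print(score_folded, label)               # score 1.5 -> predicted label +1
```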
Simple Linear Models as Building Blocks

- Linear models are also used in multiclass classification problems

- Assuming K classes, we can assume the following model (a small sketch follows after this list):

      y = argmax_{k ∈ {1, 2, ..., K}} w_k⊤ x

- Can think of w_k⊤x as the score of the input x for the k-th class

- Once learned (using some optimization technique), these weight vectors (one for each class) can sometimes have nice interpretations, especially when the inputs are images
  - Example: the learned weight vectors of 4 classes (w_car, w_frog, w_horse, w_cat), visualized as images, kind of look like a "template" of what the images from that class should look like
  - These templates "sort of" look like class prototypes in LwP. That's why the dot product of each of these weight vectors with an image from the correct class will be expected to be the largest – and no wonder LwP (with Euclidean distances) acts like a linear model.
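A minimal sketch of the argmax rule, using a made-up K×D weight matrix whose k-th row is w_k (random, unlearned weights):

```python
import numpy as np

K, D = 4, 6                        # number of classes and input dimension (made-up)
rng = np.random.default_rng(1)

W = rng.normal(size=(K, D))        # row k is the weight vector w_k for class k
x = rng.normal(size=D)             # a single input

scores = W @ x                     # scores[k] = w_k . x, the score of x for class k
y = int(np.argmax(scores))         # predicted class: the one with the largest score
print(scores, y)
```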
Simple Linear Models as Building Blocks

- Linear models are building blocks for dimensionality reduction methods like PCA
  - This looks very similar to the multi-output model, except that the values of the latent features are not known and have to be learned (a small sketch follows below)
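A minimal sketch of this connection (not the course's derivation of PCA; the data, dimensions, and the number of latent features K below are made up). PCA via the SVD yields a K×D matrix W that plays the same role as a multi-output linear model's weights, while the latent features Z are learned from the data rather than given:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # N x D data matrix (made-up data, D = 5)
Xc = X - X.mean(axis=0)                # center the data

# PCA via the SVD: the rows of Vt are the principal directions
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
K = 2                                  # number of latent features to keep
W = Vt[:K]                             # K x D matrix, analogous to a multi-output linear model's weights

Z = Xc @ W.T                           # N x K latent features, learned from the data (not given)
X_hat = Z @ W + X.mean(axis=0)         # linear reconstruction of the inputs from the latents
print(np.mean((X - X_hat) ** 2))       # average reconstruction error
```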

- Linear models are building blocks for even deep learning models (each layer is like a multi-output linear model, followed by a nonlinearity)
  - In a deep learning model, each layer learns a latent feature representation of the inputs using something like a multi-output linear model, followed by a nonlinearity
  - The last (output) layer can have one or more outputs
  - More on this when we discuss deep learning later (a small sketch of a single layer follows below)
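A minimal sketch of one such layer: a multi-output linear model followed by a nonlinearity (here ReLU, with made-up sizes and random, unlearned weights):

```python
import numpy as np

D_in, D_out = 5, 3                     # layer input and output sizes (made-up)
rng = np.random.default_rng(2)

W = rng.normal(size=(D_out, D_in))     # the layer's weight matrix: a multi-output linear model
x = rng.normal(size=D_in)              # input to the layer

h = np.maximum(0.0, W @ x)             # linear map followed by a ReLU nonlinearity
print(h)                               # the layer's latent feature representation of x
```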

Learning Linear Models

Next Lecture
- Linear Regression

