Linear Regression


Lecturer: Lyheang UNG


Table of Contents

01 Introduction
02 Linear Regression
03 Simple Linear Regression
04 Gradient Descent
05 Least Squares Method
06 Pros & Cons


1. Introduction
● Regression (analysis) is a statistical method, used in finance, investment, and other disciplines, that attempts to determine the strength of the relationship between one continuous dependent variable and a series of other changing variables known as independent variables.
● Regression helps investment and financial institutions value assets and understand the relationships between variables, such as commodity prices and the stocks of businesses dealing in those commodities.
1. Introduction
● Regression was invented by the cousin of Charles Darwin,
Francis Galton.
● Galton did his first regression in 1877 to estimate the size
of pea seeds based on the size of their parents’ seeds.
● Galton performed regression on a number of things,
including the heights of humans.
● He noticed that if parents were above average in height,
their children also tended to be above average but not as
much as their parents.
● The heights of children were regressing toward a mean
value.
Francis Galton
1. Introduction
● In regression, there are two types of data to be taken into account:
○ Linear Data
○ Non-Linear Data
● Choosing which learning algorithm to apply depends on the distribution and the characteristics of the data.
2. Linear Regression
● Linear Regression is a simple regression algorithm which uses a linear model to study and analyze the relationship between a scalar/continuous response and one or more explanatory variables. It performs well when the input and output variables have a linear relationship.
● The learning is done by fitting a linear equation to the observed data, which is then used to predict values for new data/observations.
● The case of one independent variable is called simple/univariate linear regression. For more than one independent variable, the process is called multiple/multivariate linear regression.

x → h(x) → y ∈ ℝ
2. Linear Regression - Examples

Stock Price Prediction | Health Spending Prediction


2. Linear Regression - Hypothesis
A linear regression of d input variables x = (x_1, x_2, …, x_d) ∈ ℝ^d can be represented by the following equation, known as the hypothesis:

h_w(x) = w_0 + w_1·x_1 + w_2·x_2 + ⋯ + w_d·x_d = Σ_{j=0}^{d} w_j·x_j = wᵀx,   with x_0 = 1

Where:
o x_1, …, x_d: features of x
o w_0: bias
o w_1, …, w_d: weights w.r.t. each x_j
o h_w(x): predicted value of x (ŷ)
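
As an illustration (not part of the original slides), the hypothesis can be evaluated as a single dot product once the feature vector is prepended with x_0 = 1, so the bias w_0 is handled uniformly. A minimal NumPy sketch:

import numpy as np

def hypothesis(w, x):
    """Evaluate h_w(x) = w^T x for a single example.

    w : array of shape (d + 1,), weights [w_0, w_1, ..., w_d]
    x : array of shape (d,), raw features [x_1, ..., x_d]
    """
    x_aug = np.concatenate(([1.0], x))   # prepend x_0 = 1 for the bias term
    return w @ x_aug

# Illustrative values only (d = 2 features)
w = np.array([0.5, 2.0, -1.0])   # w_0, w_1, w_2
x = np.array([3.0, 4.0])         # x_1, x_2
print(hypothesis(w, x))          # 0.5 + 2.0*3.0 - 1.0*4.0 = 2.5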

2. Linear Regression - Hypothesis

● In linear regression with higher dimensions, the hypothesis or prediction function h_w(x) is known as a hyperplane.
● The prediction is done by projecting the observation onto the hyperplane.

Vehicle Miles Per Gallon (MPG) Prediction


2. Linear Regression - Cost Function
● The ideal hypothesis is the one which best generalizes the data. It is learned from the training data by minimizing a loss, known as the cost function or objective function, as much as possible.
● Given n examples {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} such that x_i ∈ ℝ^d, the cost function is proportional to the sum of the squared residuals:

J(w) = (1/2n) · Σ_{i=1}^{n} (h_w(x_i) − y_i)²

● The goal is to find the parameters ŵ = (w_0, w_1, …, w_d) which minimize the cost function J(w) as much as possible:

ŵ = argmin_w J(w)
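
For concreteness, here is a small NumPy sketch of this cost function (an illustration, not from the slides), assuming a design matrix X whose first column is all ones:

import numpy as np

def cost(w, X, y):
    """J(w) = (1/2n) * sum_i (h_w(x_i) - y_i)^2.

    X : array of shape (n, d + 1) with X[:, 0] == 1 (bias column)
    y : array of shape (n,)
    w : array of shape (d + 1,)
    """
    n = len(y)
    residuals = X @ w - y            # h_w(x_i) - y_i for every example
    return (residuals ** 2).sum() / (2 * n)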
3. Simple Linear Regression
● Simple linear regression deals with one dependent variable and one independent variable.
● It uses a line equation, learned during training, which approximates the relationship between these two variables and is used to estimate the predicted value:

h_w(x) = w_0 + w_1·x_1

Where:
o x_1: feature x_1 of x
o w_0: bias (intercept)
o w_1: weight of x_1 (slope)
o h_w(x): predicted value of x
3. Simple Linear Regression - Examples
● In simple linear regression, the hypothesis h_w(x) is a line. The prediction is done by projecting the observation onto that line.

Test Grade Prediction | Child Height Prediction


3. Simple Linear Regression - Cost Function
● Given n examples {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} such that x ∈ ℝ, the cost function is defined as follows:

J(w) = (1/2n) · Σ_{i=1}^{n} (h_w(x_i) − y_i)²

● The goal is to find the parameters ŵ = (w_0, w_1) which minimize the cost function J(w) as much as possible:

ŵ = argmin_w J(w)
3. Simple Linear Regression

● The cyan line in the figure represents the regression line, which is used to predict the output ŷ_i given the input x_i.
● The residual/error is the difference between the predicted value ŷ_i and the real/reference value y_i.
● The ideal regression line is the one that best minimizes the overall loss. But how do we minimize the loss?
4. Gradient Descent
Gradient Descent is an optimization algorithm used to minimize the cost function. We first initialize the model parameters with random values and iteratively update them until we reach the global minimum, computing the derivative of the loss w.r.t. each parameter in every iteration and updating the parameters as follows:

w_j = w_j − η · ∂J(w)/∂w_j

Where η is the learning rate/step size.

Gradient Descent Learning Process
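
To make the update rule concrete, the short sketch below (my own illustration, not from the slides) applies it to a toy one-dimensional convex function J(w) = (w − 3)², whose minimum is at w = 3:

# Minimal gradient descent on J(w) = (w - 3)^2, a toy convex function.
eta = 0.1            # learning rate
w = 0.0              # initial value
for _ in range(100):
    grad = 2 * (w - 3)    # dJ/dw
    w = w - eta * grad    # update rule: w = w - eta * dJ/dw
print(w)             # converges toward 3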


4. Gradient Descent - Algorithms

Batch Gradient Descent Flowchart


4. Gradient Descent - Simple Linear Regression
● Given n examples {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} such that x ∈ ℝ, with

J(w) = (1/2n) · Σ_{i=1}^{n} (h_w(x_i) − y_i)²   and   h_w(x) = w_0 + w_1·x_1,

the gradients with respect to w_0 and w_1 are calculated as follows:

∂J(w)/∂w_0 = (1/n) · Σ_{i=1}^{n} (h_w(x_i) − y_i)

∂J(w)/∂w_1 = (1/n) · Σ_{i=1}^{n} (h_w(x_i) − y_i) · x_i

⟹ ∂J(w)/∂w_j = (1/n) · Σ_{i=1}^{n} (h_w(x_i) − y_i) · x_{i,j},   with x_{i,0} = 1
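
These two partial derivatives can be computed in vectorized form; the snippet below is a small NumPy sketch (illustrative, not from the slides):

import numpy as np

def gradients(w0, w1, x, y):
    """Return (dJ/dw0, dJ/dw1) for simple linear regression.

    x, y : arrays of shape (n,)
    """
    n = len(y)
    errors = (w0 + w1 * x) - y           # h_w(x_i) - y_i
    grad_w0 = errors.sum() / n           # (1/n) * sum of residuals
    grad_w1 = (errors * x).sum() / n     # (1/n) * sum of residuals * x_i
    return grad_w0, grad_w1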
4. Gradient Descent - Simple Linear Regression
● The batch gradient descent algorithm is outlined as follows (a Python sketch is given after the outline):

Initialize w_0 and w_1 with random values
Repeat:
    For all data points:
        make predictions – compute h_w(x_i)
    For each weight w_j:
        # compute the gradient and update the weight
        w_j = w_j − (η/n) · Σ_{i=1}^{n} (h_w(x_i) − y_i) · x_{i,j}
Until the weights converge
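
Below is a minimal, self-contained NumPy implementation of this batch gradient descent loop for simple linear regression (an illustrative sketch; the variable names, learning rate, and stopping criterion are my own, not from the slides):

import numpy as np

def fit_simple_linear_regression(x, y, eta=0.01, n_iters=10000, tol=1e-8):
    """Fit h_w(x) = w0 + w1*x by batch gradient descent."""
    rng = np.random.default_rng(0)
    w0, w1 = rng.normal(size=2)          # initialize with random values
    n = len(y)
    for _ in range(n_iters):
        errors = (w0 + w1 * x) - y       # h_w(x_i) - y_i for all data points
        new_w0 = w0 - eta / n * errors.sum()
        new_w1 = w1 - eta / n * (errors * x).sum()
        if abs(new_w0 - w0) < tol and abs(new_w1 - w1) < tol:
            break                        # weights have (approximately) converged
        w0, w1 = new_w0, new_w1
    return w0, w1

# Usage on toy data generated from y = 3x + 2 (illustrative values only)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 5.0, 8.0, 11.0, 14.0])
print(fit_simple_linear_regression(x, y))   # approaches (2.0, 3.0)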
4. Gradient Descent - Simple Linear Regression

m = w_1, b = w_0 (figures: successive gradient descent iterations fitting the regression line)
4. Gradient Descent - Effect of Learning Rate
● If the learning rate η is too small, convergence is slow; if it is too large, the updates overshoot the minimum and the loss can diverge.
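
As a quick illustration (my own example, not from the slides), the same toy function J(w) = (w − 3)² used earlier shows both behaviors:

# Effect of the learning rate on gradient descent for J(w) = (w - 3)^2.
def run_gd(eta, steps=20, w=0.0):
    for _ in range(steps):
        w = w - eta * 2 * (w - 3)   # gradient of (w - 3)^2 is 2*(w - 3)
    return w

print(run_gd(eta=0.1))   # close to 3: small steps converge
print(run_gd(eta=1.1))   # far from 3: steps overshoot and the iterates diverge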
5. Least Squares Method
● The Least Squares Method is another method, which learns the optimal parameters ŵ of the linear regression directly (in closed form). It is applicable because the cost function is convex, with only one minimum.
● Given n examples {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} such that x ∈ ℝ, the regression line of simple linear regression is calculated as follows:

h_w(x) = w_0 + w_1·x_1

Where:
w_1 = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)²
w_0 = ȳ − w_1·x̄
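
These closed-form formulas translate directly into code; the function below is a small NumPy sketch (illustrative, not from the slides):

import numpy as np

def least_squares_line(x, y):
    """Return (w0, w1) for h_w(x) = w0 + w1*x using the closed-form formulas."""
    x_bar, y_bar = x.mean(), y.mean()
    w1 = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()
    w0 = y_bar - w1 * x_bar
    return w0, w1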
5. Least Squares Method
Example: Sam is the owner of an ice-cream shop. He wants to improve his income by preparing ice creams based on the duration of sunshine. Using the data he collected, estimate how many ice creams Sam should prepare when the weather forecast says "we expect 8 hours of sun tomorrow".

Hours of Sunshine   Ice Creams Sold
2                   4
3                   5
5                   7
7                   10
9                   15
5. Least Squares Method
Denote:
𝑥: number of sunshine hours
y: number of ice-creams sold
5. Least Squares Method
Step 1: Calculate x̄ and ȳ over all pairs (x, y).

x    y
2    4
3    5
5    7
7    10
9    15

x̄ = 5.2    ȳ = 8.2
5. Least Squares Method
Step 2: Calculate x − x̄, (x − x̄)², y − ȳ, and (x − x̄)(y − ȳ) for each pair.

x    y     x − x̄   (x − x̄)²   y − ȳ   (x − x̄)(y − ȳ)
2    4     -3.2     10.24      -4.2    13.44
3    5     -2.2     4.84       -3.2    7.04
5    7     -0.2     0.04       -1.2    0.24
7    10    1.8      3.24       1.8     3.24
9    15    3.8      14.44      6.8     25.84

x̄ = 5.2    ȳ = 8.2
5. Least Squares Method
Step 3: Calculate Σ(x − x̄)(y − ȳ) and Σ(x − x̄)².

x    y     x − x̄   (x − x̄)²   y − ȳ   (x − x̄)(y − ȳ)
2    4     -3.2     10.24      -4.2    13.44
3    5     -2.2     4.84       -3.2    7.04
5    7     -0.2     0.04       -1.2    0.24
7    10    1.8      3.24       1.8     3.24
9    15    3.8      14.44      6.8     25.84

Σ(x − x̄)² = 32.8    Σ(x − x̄)(y − ȳ) = 49.8
x̄ = 5.2    ȳ = 8.2
5. Least Squares Method
Step 4: Calculate the parameters w_0 and w_1 to obtain the regression line.

w_1 = Σ_{i=1}^{n} (x_i − x̄)(y_i − ȳ) / Σ_{i=1}^{n} (x_i − x̄)² = 49.8 / 32.8 ≈ 1.51

w_0 = ȳ − w_1·x̄ = 8.2 − 1.51 × 5.2 ≈ 0.34

h(x) = 1.51x + 0.34


5. Least Squares Method
To prepare for the sales according to the weather forecast, which says "we expect 8 hours of sun tomorrow", Sam uses the regression equation to estimate the number of ice creams as follows:

h(8) = 1.51 × 8 + 0.34 = 12.42 ice creams
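
The same result can be checked with the least-squares sketch given earlier (illustrative code, not from the slides); without intermediate rounding it gives w_1 ≈ 1.518, w_0 ≈ 0.305, and a prediction of about 12.45 ice creams, matching the slide's 12.42 up to rounding:

import numpy as np

x = np.array([2.0, 3.0, 5.0, 7.0, 9.0])    # hours of sunshine
y = np.array([4.0, 5.0, 7.0, 10.0, 15.0])  # ice creams sold

x_bar, y_bar = x.mean(), y.mean()                                    # 5.2 and 8.2
w1 = ((x - x_bar) * (y - y_bar)).sum() / ((x - x_bar) ** 2).sum()    # 49.8 / 32.8
w0 = y_bar - w1 * x_bar

print(w1, w0)        # approximately 1.518 and 0.305
print(w0 + w1 * 8)   # approximately 12.45 ice creams for 8 hours of sun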


6. Pros & Cons
● Pros:
○ Easy to implement and to interpret the results
○ Computationally inexpensive
● Cons:
○ Models nonlinear data poorly
○ Sensitive to outliers
● Works with:
○ Numerical values
○ Nominal values
Q&A
