
CSC 311: Introduction to Machine Learning

Lecture 2 - Linear Methods for Regression, Optimization

Roger Grosse Chris Maddison Juhan Bae Silviu Pitis

University of Toronto, Fall 2020



Announcements

Homework 1 is posted! Deadline Sept 30, 23:59.

Instructor hours are announced on the course website. (TA OH TBA)

No ProctorU!



Overview

Second learning algorithm of the course: linear regression.


- Task: predict scalar-valued targets (e.g. stock prices)
- Architecture: linear function of the inputs

While KNN was a complete algorithm, linear regression exemplifies a modular approach that will be used throughout this course:
- choose a model describing the relationships between variables of interest
- define a loss function quantifying how bad the fit to the data is
- choose a regularizer saying how much we prefer different candidate models (or explanations of data)
- fit a model that minimizes the loss function and satisfies the constraint/penalty imposed by the regularizer, possibly using an optimization algorithm

Mixing and matching these modular components gives us a lot of new ML methods.



Supervised Learning Setup

In supervised learning:
There is an input x ∈ X, typically a vector of features (or covariates).
There is a target t ∈ T (also called response, outcome, output, class).
The objective is to learn a function f : X → T such that t ≈ y = f(x), based on some data D = {(x^(i), t^(i)) for i = 1, 2, ..., N}.



Linear Regression - Model

Model: In linear regression, we use a linear function of the features x = (x_1, ..., x_D) ∈ R^D to make a prediction y of the target value t ∈ R:

    y = f(x) = Σ_j w_j x_j + b

- y is the prediction
- w is the weight vector
- b is the bias (or intercept)

w and b together are the parameters. We hope that our prediction is close to the target: y ≈ t.



What is Linear? 1 feature vs D features

[Figure: data points and fitted line; x-axis "x: features", y-axis "y: response"]

If we have only 1 feature:
    y = wx + b, where w, x, b ∈ R.
y is linear in x.

If we have D features:
    y = w^T x + b, where w, x ∈ R^D, b ∈ R.
y is linear in x.

The relation between the prediction y and the inputs x is linear in both cases.



Linear Regression

We have a dataset D = {(x^(i), t^(i)) for i = 1, 2, ..., N} where:

- x^(i) = (x_1^(i), x_2^(i), ..., x_D^(i))^T ∈ R^D are the inputs (e.g. age, height)
- t^(i) ∈ R is the target or response (e.g. income)

We predict t^(i) with a linear function of x^(i):

    t^(i) ≈ y^(i) = w^T x^(i) + b

[Figure: the same data shown with several candidate fitted lines]

Different (w, b) define different lines. We want the "best" line (w, b). How do we quantify "best"?



Linear Regression - Loss Function
A loss function L(y, t) defines how bad it is if, for some example x,
the algorithm predicts y, but the target is actually t.
Squared error loss function:

    L(y, t) = (1/2)(y − t)^2

y − t is the residual, and we want to make this small in magnitude.

The 1/2 factor is just to make the calculations convenient.

Cost function: the loss function averaged over all training examples:

    J(w, b) = (1/2N) Σ_{i=1}^N (y^(i) − t^(i))^2
            = (1/2N) Σ_{i=1}^N (w^T x^(i) + b − t^(i))^2

Terminology varies. Some call "cost" empirical or average loss.


Vectorization
Notation-wise, (1/2N) Σ_{i=1}^N (y^(i) − t^(i))^2 gets messy if we expand y^(i):

    (1/2N) Σ_{i=1}^N ( Σ_{j=1}^D w_j x_j^(i) + b − t^(i) )^2

The code equivalent is to compute the prediction using a for loop:
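A minimal sketch (assumed, not from the original slides) of what such loopy code might look like:

```python
# Loopy prediction: X is a list of N feature lists, w a list of D weights,
# b a scalar bias. All names here are illustrative assumptions.
def predict_loop(X, w, b):
    ys = []
    for x in X:                      # loop over training examples
        y = b
        for j in range(len(w)):      # loop over features
            y += w[j] * x[j]
        ys.append(y)
    return ys

print(predict_loop([[1.0, 2.0], [3.0, 4.0]], w=[0.5, -0.5], b=1.0))
```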

Excessive super/sub scripts are hard to work with, and Python loops are slow, so we vectorize algorithms by expressing them in terms of vectors and matrices:

    w = (w_1, ..., w_D)^T,   x = (x_1, ..., x_D)^T

    y = w^T x + b
This is simpler and executes much faster:
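A vectorized sketch of the same computation (again an assumption, not the slides' own code):

```python
import numpy as np

# Vectorized prediction: one dot product replaces the inner Python loop.
w = np.array([0.5, -0.5])
x = np.array([1.0, 2.0])
b = 1.0

y = np.dot(w, x) + b      # w^T x + b
print(y)
```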



Vectorization

Why vectorize?
The equations, and the code, will be simpler and more readable. Gets rid of dummy variables/indices!

Vectorized code is much faster:
- Cuts down on Python interpreter overhead
- Uses highly optimized linear algebra libraries (hardware support)
- Matrix multiplication is very fast on a GPU (Graphics Processing Unit)

Switching in and out of vectorized form is a skill you gain with practice:
- Some derivations are easier to do element-wise
- Some algorithms are easier to write/understand using for-loops, and can be vectorized later for performance



Vectorization

We can organize all the training examples into a design matrix X with one row per training example, and all the targets into the target vector t.

Computing the predictions for the whole dataset:

    Xw + b1 = [ w^T x^(1) + b ]   [ y^(1) ]
              [      ...      ] = [  ...  ] = y
              [ w^T x^(N) + b ]   [ y^(N) ]



Vectorization
Computing the squared error cost across the whole dataset:

    y = Xw + b1
    J = (1/2N) ||y − t||^2

Sometimes we may use J = (1/2) ||y − t||^2, without a normalizer. This would correspond to the sum of losses, not the averaged loss. The minimizer does not depend on N (but optimization might!).

We can also add a column of 1's to the design matrix, combine the bias and the weights, and conveniently write

    X = [ 1  [x^(1)]^T ]
        [ 1  [x^(2)]^T ]  ∈ R^{N×(D+1)}    and    w = (b, w_1, w_2, ...)^T ∈ R^{D+1}
        [ 1     ...    ]

Then, our predictions reduce to y = Xw.
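A minimal NumPy sketch of this trick (the shapes and values are illustrative assumptions):

```python
import numpy as np

# Bias-column trick: prepend a column of ones so y = Xw absorbs the bias.
rng = np.random.default_rng(0)
N, D = 4, 3
X = rng.normal(size=(N, D))
t = rng.normal(size=N)

X_aug = np.hstack([np.ones((N, 1)), X])   # N x (D+1) design matrix
w_aug = rng.normal(size=D + 1)            # w_aug[0] plays the role of b

y = X_aug @ w_aug                         # predictions for the whole dataset
cost = np.sum((y - t) ** 2) / (2 * N)     # J = (1/2N) ||y - t||^2
```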
Solving the Minimization Problem
We defined a cost function. This is what we’d like to minimize.

Two commonly applied mathematical approaches:


Algebraic, e.g., using inequalities:
- to show z* minimizes f(z), show that f(z) ≥ f(z*) for all z
- to show that a = b, show that a ≥ b and b ≥ a

Calculus: the minimum of a smooth function (if it exists) occurs at a critical point, i.e. a point where the derivative is zero.
- multivariate generalization: set the partial derivatives (or equivalently the gradient) to zero

Solutions may be direct or iterative:
- Sometimes we can directly find provably optimal parameters (e.g. set the gradient to zero and solve in closed form). We call this a direct solution.
- We may also use optimization techniques that iteratively get us closer to the solution. We will get back to this soon.



Direct Solution I: Linear Algebra
We seek w to minimize ||Xw − t||^2, or equivalently ||Xw − t||.

range(X) = {Xw | w ∈ R^D} is a D-dimensional subspace of R^N.

Recall that the closest point y* = Xw* in the subspace range(X) of R^N to an arbitrary point t ∈ R^N is found by orthogonal projection: we have (y* − t) ⊥ Xw for all w ∈ R^D.

Why is y* the closest point to t?
- Consider any z = Xw.
- By the Pythagorean theorem and the trivial inequality x^2 ≥ 0:

    ||z − t||^2 = ||y* − t||^2 + ||y* − z||^2 ≥ ||y* − t||^2



Direct Solution I: Linear Algebra

From the previous slide, we have (y* − t) ⊥ Xw for all w ∈ R^D.

Equivalently, the columns of the design matrix X are all orthogonal to (y* − t), and we have that:

    X^T (y* − t) = 0
    X^T X w* − X^T t = 0
    X^T X w* = X^T t
    w* = (X^T X)^{-1} X^T t
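A minimal NumPy sketch of this solution (the synthetic data here is an assumption; solve/lstsq are preferred over forming the inverse explicitly):

```python
import numpy as np

# Normal-equations solution on synthetic data.
rng = np.random.default_rng(0)
N, D = 100, 3
X = rng.normal(size=(N, D))
t = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

w_star = np.linalg.solve(X.T @ X, X.T @ t)       # solves X^T X w = X^T t
w_lstsq, *_ = np.linalg.lstsq(X, t, rcond=None)  # numerically preferred
```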

While this solution is clean and the derivation easy to remember, like many algebraic solutions, it is somewhat ad hoc. On the other hand, the tools of calculus are broadly applicable to differentiable loss functions...



Direct Solution II: Calculus

Partial derivative: the derivative of a multivariate function with respect to one of its arguments:

    ∂f(x_1, x_2)/∂x_1 = lim_{h→0} [ f(x_1 + h, x_2) − f(x_1, x_2) ] / h

To compute it, take the single-variable derivative, pretending the other arguments are constant.

Example: partial derivatives of the prediction y:

    ∂y/∂w_j = ∂/∂w_j ( Σ_{j'} w_{j'} x_{j'} + b ) = x_j

    ∂y/∂b = ∂/∂b ( Σ_{j'} w_{j'} x_{j'} + b ) = 1



Direct Solution II: Calculus

For loss derivatives, apply the chain rule:

    ∂L/∂w_j = (dL/dy)(∂y/∂w_j) = d/dy [ (1/2)(y − t)^2 ] · x_j = (y − t) x_j

    ∂L/∂b = (dL/dy)(∂y/∂b) = y − t

For cost derivatives, use linearity and average over data points:

    ∂J/∂w_j = (1/N) Σ_{i=1}^N (y^(i) − t^(i)) x_j^(i),    ∂J/∂b = (1/N) Σ_{i=1}^N (y^(i) − t^(i))

The minimum must occur at a point where the partial derivatives are zero:

    ∂J/∂w_j = 0 (for all j),    ∂J/∂b = 0

(If ∂J/∂w_j ≠ 0, you could reduce the cost by changing w_j.)

Direct Solution II: Calculus

The derivation on the previous slide gives a system of linear equations, which we can solve efficiently.

As is often the case for models and code, however, the solution is easier to characterize if we vectorize our calculus.

We call the vector of partial derivatives the gradient. Thus, the "gradient of f : R^D → R", denoted ∇f(w), is:

    ∇f(w) = ( ∂f(w)/∂w_1, ..., ∂f(w)/∂w_D )^T

The gradient points in the direction of the greatest rate of increase.

Analogue of the second derivative (the "Hessian" matrix): ∇²f(w) ∈ R^{D×D} is a matrix with [∇²f(w)]_ij = ∂²f(w) / ∂w_i ∂w_j.



Aside: The Hessian Matrix

Analogue of the second derivative (the Hessian): ∇²f(w) ∈ R^{D×D} is a matrix with [∇²f(w)]_ij = ∂²f(w) / ∂w_i ∂w_j.
- Recall from multivariable calculus that for continuously differentiable f, ∂²f/∂w_i∂w_j = ∂²f/∂w_j∂w_i, so the Hessian is symmetric.

The second derivative test in single-variable calculus: a critical point is a local minimum if the second derivative is positive.

The multivariate analogue involves the eigenvalues of the Hessian.
- Recall from linear algebra that the eigenvalues of a symmetric matrix (and therefore of the Hessian) are real-valued.
- If all of the eigenvalues are positive, we say the Hessian is positive definite.
- A critical point (∇f(w) = 0) of a continuously differentiable function f is a local minimum if the Hessian is positive definite.



Aside: The Hessian Matrix

[Figure: visualization of Hessian curvature. Image source: mkwiki.org]
Direct Solution II: Calculus

We seek w to minimize J(w) = (1/2) ||Xw − t||^2.

Taking the gradient with respect to w (see course notes for additional details), we get:

    ∇_w J(w) = X^T X w − X^T t = 0

We get the same optimal weights as before:

    w* = (X^T X)^{-1} X^T t

Linear regression is one of only a handful of models in this course that permit a direct solution.



Feature Mapping (Basis Expansion)

The relation between the input and output may not be linear.

We can still use linear regression by mapping the input features to another space using a feature mapping (or basis expansion) ψ : R^D → R^d, and treating the mapped features (in R^d) as the inputs to a linear regression procedure.

Let us see how it works when x ∈ R and we use a polynomial feature mapping.



Polynomial Feature Mapping
If the relationship doesn’t look linear, we can fit a polynomial.

Fit the data using a degree-M polynomial function of the form:

    y = w_0 + w_1 x + w_2 x^2 + ... + w_M x^M = Σ_{i=0}^M w_i x^i

Here the feature mapping is ψ(x) = [1, x, x^2, ..., x^M]^T.

We can still use linear regression to find w, since y = ψ(x)^T w is linear in w_0, w_1, ....

In general, ψ can be any function. Another example: ψ(x) = [1, sin(2πx), cos(2πx), sin(4πx), ...]^T.
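A minimal sketch of polynomial feature mapping followed by ordinary least squares (the data, helper name, and degree are illustrative assumptions):

```python
import numpy as np

# Degree-M polynomial features for a scalar input x.
def poly_features(x, M):
    # Columns [1, x, x^2, ..., x^M]; x is a length-N array.
    return np.stack([x ** i for i in range(M + 1)], axis=1)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=20)
t = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=20)

Psi = poly_features(x, M=3)                  # N x (M+1) design matrix
w, *_ = np.linalg.lstsq(Psi, t, rcond=None)  # fit is still linear in w
y = Psi @ w
```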
Polynomial Feature Mapping with M = 0

    y = w_0

[Figure: degree-0 fit; from Pattern Recognition and Machine Learning, Christopher Bishop]



Polynomial Feature Mapping with M = 1

    y = w_0 + w_1 x

[Figure: degree-1 fit; from Pattern Recognition and Machine Learning, Christopher Bishop]



Polynomial Feature Mapping with M = 3

    y = w_0 + w_1 x + w_2 x^2 + w_3 x^3

[Figure: degree-3 fit; from Pattern Recognition and Machine Learning, Christopher Bishop]



Polynomial Feature Mapping with M = 9

    y = w_0 + w_1 x + w_2 x^2 + w_3 x^3 + ... + w_9 x^9

[Figure: degree-9 fit; from Pattern Recognition and Machine Learning, Christopher Bishop]



Model Complexity and Generalization

Underfitting (M = 0): the model is too simple and does not fit the data.
Overfitting (M = 9): the model is too complex and fits the training data perfectly.

[Figure: the M = 0 and M = 9 fits side by side]

Good model (M = 3): achieves small test error (generalizes well).

[Figure: the M = 3 fit]



Model Complexity and Generalization

[Figure: the M = 9 fit]

As M increases, the magnitude of the coefficients gets larger.

For M = 9, the coefficients have become finely tuned to the data: between data points, the function exhibits large oscillations.



Regularization

The degree M of the polynomial controls the model's complexity.

The value of M is a hyperparameter for polynomial expansion, just like k in KNN. We can tune it using a validation set.

Restricting the number of parameters / basis functions (M) is a crude approach to controlling the model complexity.

Another approach: keep the model large, but regularize it.
- Regularizer: a function that quantifies how much we prefer one hypothesis vs. another



L2 (or ℓ2) Regularization

We can encourage the weights to be small by choosing the L2 penalty as our regularizer:

    R(w) = (1/2) ||w||_2^2 = (1/2) Σ_j w_j^2

- Note: to be precise, the L2 norm is Euclidean distance, so we're regularizing the squared L2 norm.

The regularized cost function makes a tradeoff between fit to the data and the norm of the weights:

    J_reg(w) = J(w) + λ R(w) = J(w) + (λ/2) Σ_j w_j^2

If you fit the training data poorly, J is large. If your optimal weights have high values, R is large.

A large λ penalizes weight values more.

Like M, λ is a hyperparameter we can tune with a validation set.
L2 (or ℓ2) Regularization

The geometric picture:

[Figure: the geometric picture of L2 regularization]


L2 Regularized Least Squares: Ridge regression

For the least squares problem, we have J(w) = (1/2N) ||Xw − t||^2. When λ > 0 (with regularization), the regularized cost gives

    w_λ^Ridge = argmin_w J_reg(w) = argmin_w (1/2N) ||Xw − t||_2^2 + (λ/2) ||w||_2^2
              = (X^T X + λN I)^{-1} X^T t

The case λ = 0 (no regularization) reduces to the least squares solution!

Note that it is also common to formulate this problem as argmin_w (1/2) ||Xw − t||_2^2 + (λ/2) ||w||_2^2, in which case the solution is w_λ^Ridge = (X^T X + λI)^{-1} X^T t.
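A minimal NumPy sketch of the closed form under the average-loss convention (the synthetic data and λ value are illustrative assumptions):

```python
import numpy as np

# Ridge regression closed form: w = (X^T X + lam * N * I)^{-1} X^T t.
rng = np.random.default_rng(0)
N, D, lam = 100, 3, 0.1
X = rng.normal(size=(N, D))
t = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

w_ridge = np.linalg.solve(X.T @ X + lam * N * np.eye(D), X.T @ t)
```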



Conclusion so far

Linear regression exemplifies recurring themes of this course:

- choose a model and a loss function
- formulate an optimization problem
- solve the minimization problem using one of two strategies:
  - direct solution (set derivatives to zero)
  - gradient descent (next topic)
- vectorize the algorithm, i.e. represent it in terms of linear algebra
- make a linear model more powerful using features
- improve the generalization by adding a regularizer



Gradient Descent
Now let's see a second way to minimize the cost function which is more broadly applicable: gradient descent.

Many times, we do not have a direct solution: taking derivatives of J w.r.t. w and setting them to 0 doesn't yield an explicit solution.

Gradient descent is an iterative algorithm, which means we apply an update repeatedly until some criterion is met.

We initialize the weights to something reasonable (e.g. all zeros) and repeatedly adjust them in the direction of steepest descent.



Gradient Descent

Observe:
- if ∂J/∂w_j > 0, then increasing w_j increases J.
- if ∂J/∂w_j < 0, then increasing w_j decreases J.

The following update always decreases the cost function for small enough α (unless ∂J/∂w_j = 0):

    w_j ← w_j − α ∂J/∂w_j

α > 0 is a learning rate (or step size). The larger it is, the faster w changes.
- We'll see later how to tune the learning rate, but values are typically small, e.g. 0.01 or 0.0001.
- If the cost is the sum of N individual losses rather than their average, a smaller learning rate will be needed (α' = α/N).



Gradient Descent
This gets its name from the gradient:

    ∇_w J = ∂J/∂w = ( ∂J/∂w_1, ..., ∂J/∂w_D )^T

- This is the direction of fastest increase in J.

Update rule in vector form:

    w ← w − α ∂J/∂w

And for linear regression we have:

    w ← w − (α/N) Σ_{i=1}^N (y^(i) − t^(i)) x^(i)

So gradient descent updates w in the direction of fastest decrease.

Observe that once it converges, we get a critical point, i.e. ∂J/∂w = 0.
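A minimal sketch of this update loop on synthetic data (the data, learning rate, and iteration count are illustrative assumptions):

```python
import numpy as np

# Batch gradient descent for linear regression; the bias is absorbed
# into w via a column of ones.
rng = np.random.default_rng(0)
N, D = 100, 3
X = np.hstack([np.ones((N, 1)), rng.normal(size=(N, D))])
t = X @ np.array([0.5, 1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

w = np.zeros(D + 1)
alpha = 0.1
for _ in range(1000):
    grad = X.T @ (X @ w - t) / N    # (1/N) sum_i (y^(i) - t^(i)) x^(i)
    w -= alpha * grad
```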
Gradient Descent for Linear Regression

The squared error loss of linear regression is a convex function.

Even for linear regression, where there is a direct solution, we sometimes need to use GD.

Why gradient descent, if we can find the optimum directly?
- GD can be applied to a much broader set of models
- GD can be easier to implement than direct solutions
- For regression in high-dimensional space, GD is more efficient than the direct solution:
  - Linear regression solution: (X^T X)^{-1} X^T t
  - Matrix inversion is an O(D^3) algorithm
  - Each GD update costs O(ND)
  - Or less with stochastic GD (SGD, in a few slides)
  - Huge difference if D ≫ 1



Gradient Descent under L2 Regularization

Gradient descent update to minimize J:

    w ← w − α ∂J/∂w

The gradient descent update to minimize the L2 regularized cost J + λR results in weight decay:

    w ← w − α ∂/∂w (J + λR)
      = w − α ( ∂J/∂w + λ ∂R/∂w )
      = w − α ( ∂J/∂w + λw )
      = (1 − αλ) w − α ∂J/∂w
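In code, weight decay is a one-line change to the gradient descent sketch above (again an assumed illustration, not the slides' own code):

```python
import numpy as np

# Weight-decay update: shrink w by (1 - alpha * lam) before the usual step.
rng = np.random.default_rng(0)
N, D, alpha, lam = 100, 3, 0.1, 0.01
X = rng.normal(size=(N, D))
t = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

w = np.zeros(D)
for _ in range(1000):
    grad = X.T @ (X @ w - t) / N       # gradient of the data term J only
    w = (1 - alpha * lam) * w - alpha * grad
```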



Learning Rate (Step Size)

In gradient descent, the learning rate α is a hyperparameter we need to tune. Here are some things that can go wrong:

- α too small: slow progress
- α too large: oscillations
- α much too large: instability

[Figure: three optimization trajectories illustrating the three cases]

Good values are typically between 0.001 and 0.1. You should do a grid search if you want good performance (i.e. try 0.1, 0.03, 0.01, ...).



Training Curves

To diagnose optimization problems, it's useful to look at training curves: plot the training cost as a function of iteration.

Warning: in general, it’s very hard to tell from the training curves
whether an optimizer has converged. They can reveal major
problems, but they can’t guarantee convergence.
Stochastic Gradient Descent

So far, the cost function J has been the average loss over the training examples:

    J(θ) = (1/N) Σ_{i=1}^N L^(i) = (1/N) Σ_{i=1}^N L( y(x^(i), θ), t^(i) )

(θ denotes the parameters; e.g., in linear regression, θ = (w, b).)

By linearity,

    ∂J/∂θ = (1/N) Σ_{i=1}^N ∂L^(i)/∂θ

Computing the gradient requires summing over all of the training examples. This is known as batch training.

Batch training is impractical if you have a large dataset N ≫ 1 (e.g. millions of training examples)!



Stochastic Gradient Descent

Stochastic gradient descent (SGD): update the parameters based on the gradient for a single training example:

    1. Choose i uniformly at random
    2. θ ← θ − α ∂L^(i)/∂θ

The cost of each SGD update is independent of N!

SGD can make significant progress before even seeing all the data!

Mathematical justification: if you sample a training example uniformly at random, the stochastic gradient is an unbiased estimate of the batch gradient:

    E[ ∂L^(i)/∂θ ] = (1/N) Σ_{i=1}^N ∂L^(i)/∂θ = ∂J/∂θ



Stochastic Gradient Descent

Problems with using a single training example to estimate the gradient:
- Variance in the estimate may be high
- We can't exploit efficient vectorized operations

Compromise approach:
- compute the gradients on a randomly chosen medium-sized set of training examples M ⊂ {1, ..., N}, called a mini-batch

Stochastic gradients computed on larger mini-batches have smaller variance.

The mini-batch size |M| is a hyperparameter that needs to be set.
- Too large: requires more compute; e.g., it takes more memory to store the activations, and longer to compute each gradient update
- Too small: can't exploit vectorization, has high variance
- A reasonable value might be |M| = 100 (a minimal mini-batch SGD sketch follows)
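A mini-batch SGD sketch for linear regression (the batch size, learning rate, epoch count, and synthetic data are illustrative assumptions):

```python
import numpy as np

# Mini-batch SGD: each update uses a random subset of the data.
rng = np.random.default_rng(0)
N, D, alpha, batch_size = 1000, 3, 0.1, 100
X = rng.normal(size=(N, D))
t = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=N)

w = np.zeros(D)
for epoch in range(50):
    perm = rng.permutation(N)                  # reshuffle each epoch
    for start in range(0, N, batch_size):
        idx = perm[start:start + batch_size]   # one mini-batch of indices
        grad = X[idx].T @ (X[idx] @ w - t[idx]) / len(idx)
        w -= alpha * grad
```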



Stochastic Gradient Descent

Batch gradient descent moves directly downhill (locally speaking). SGD takes steps in a noisy direction, but moves downhill on average.

[Figure: batch gradient descent vs. stochastic gradient descent trajectories]


SGD Learning Rate

In stochastic training, the learning rate also influences the fluctuations due to the stochasticity of the gradients.

Typical strategy (a sketch of one such schedule is below):
- Use a large learning rate early in training so you can get close to the optimum
- Gradually decay the learning rate to reduce the fluctuations
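A minimal sketch of such a decay schedule (the functional form and constants are illustrative assumptions, not values from the slides):

```python
# Simple 1/t-style learning-rate decay: large steps early, small steps late.
def learning_rate(iteration, alpha0=0.1, decay=0.001):
    return alpha0 / (1.0 + decay * iteration)

print(learning_rate(0), learning_rate(10000))
```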



When are critical points optimal?

[Figure: a curve with several critical points labeled: a local maximum, a local minimum, and the global minimum]

Gradient descent finds a critical point, but it may be a local optimum.

Convexity is a property that guarantees that all critical points are global minima.



Convex Sets

A set S is convex if any line segment connecting points in S lies entirely within S. Mathematically,

    x_1, x_2 ∈ S  ⟹  λ x_1 + (1 − λ) x_2 ∈ S for 0 ≤ λ ≤ 1.

A simple inductive argument shows that for x_1, ..., x_N ∈ S, weighted averages, or convex combinations, lie within the set:

    λ_1 x_1 + ... + λ_N x_N ∈ S for λ_i > 0, λ_1 + ... + λ_N = 1.



Convex Functions

A function f is convex if for any x_0, x_1 in the domain of f, and any 0 ≤ λ ≤ 1,

    f((1 − λ) x_0 + λ x_1) ≤ (1 − λ) f(x_0) + λ f(x_1)

Equivalently, the set of points lying above the graph of f is convex.

Intuitively: the function is bowl-shaped.



Convex Functions

We just saw that the least-squares loss (1/2)(y − t)^2 is convex as a function of y.

For a linear model, z = w^T x + b is a linear function of w and b. If the loss function is convex as a function of z, then it is convex as a function of w and b.



L1 vs. L2 Regularization

The L1 norm, or sum of absolute values, is another regularizer that encourages weights to be exactly zero. (How can you tell?)

We can design regularizers based on whatever property we'd like to encourage.

[Figure from Bishop, Pattern Recognition and Machine Learning]



Linear Regression with Lp Regularization
[Figure: Lp norm balls for several values of p. Which sets are convex?]

Solution of linear regression with Lp regularization:

- p = 2: has a closed-form solution.
- p ≥ 1, p ≠ 2:
  - The objective is convex.
  - The true solution can be found using gradient descent.
- p < 1:
  - The objective is non-convex.
  - We can only find an approximate solution (e.g. the best in its neighborhood) using gradient descent.
Conclusion

In this lecture, we looked at linear regression, which exemplifies a modular approach that will be used throughout this course:

- choose a model describing the relationships between variables of interest (linear)
- define a loss function quantifying how bad the fit to the data is (squared error)
- choose a regularizer to control the model complexity/overfitting (L2, Lp regularization)
- fit/optimize the model (gradient descent, stochastic gradient descent, convexity)

By mixing and matching these modular components, we can obtain new ML methods.
Next lecture: apply this framework to classification

