Linear Regression - Least-Squares
Some of the simplest supervised models are linear models. A linear model expresses the target output
value as a sum of weighted input variables. For example, our goal may be to predict the market
value of a house, that is, its expected sales price in the next month. Suppose we're given
two input variables: how much tax the property is assessed each year by the local government,
and the age of the house in years. You can imagine that these
two features of the house would each have some
information that's helpful in predicting
the market price because in most places there's a positive correlation between the
tax assessment on a
house and its market value. Indeed, the tax assessment
is often partly based on market prices
from previous years. There may also be a negative
correlation between a house's age in years and
its market value, since older houses may need more repair and
upgrading, for example. One linear model, which
I've made up as an example, could compute the
expected market price in US dollars by starting
with a constant term here, 212,000, and then
adding some number, let's say 109 times the
value of tax paid last year, and then subtracting 2,000 times the age of
the house in years. For example, this linear model would estimate the
market price of a house where the
tax assessment was $10,000 and that was 75 years
old as about $1.2 million.
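As a quick check of that arithmetic, here's a minimal sketch in Python; the coefficients and feature values are just the made-up numbers from the example above:

```python
# Made-up example model: price = 212,000 + 109 * tax_assessment - 2,000 * house_age
b_hat = 212_000       # constant (bias/intercept) term, in US dollars
w_tax = 109           # weight on the yearly tax assessment
w_age = -2_000        # weight on the age of the house in years

tax_assessment = 10_000   # example house: $10,000 tax assessment
house_age = 75            # example house: 75 years old

predicted_price = b_hat + w_tax * tax_assessment + w_age * house_age
print(predicted_price)    # 1152000, i.e. roughly $1.2 million
```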
Now, I just made up this particular linear model myself as an example, but in general, when we talk about
training a linear model, we mean estimating values for the parameters of the model
or coefficients of the model, as we sometimes call
them, which here are the constant value 212,000 and
the weights 109 and negative 2,000, in such a way that the
resulting predictions for the outcome variable y (price) for different houses
are a good fit to the data from actual past sales. We'll discuss what good
fit means shortly. Predicting house price is an
example of a regression task using a linear model called, not surprisingly,
linear regression. In general, in a
linear regression model, there may be multiple input
variables or features, which we'll denote
x_0, x_1, etc. Each feature x_i has a
corresponding weight w_i. The predicted output,
which we denote y hat, is a weighted sum of features
plus a constant term b hat. I've put a hat over all the
quantities here that are estimated during the
regression training process. The w hat and b hat values, which we call the trained
parameters or coefficients, are estimated from
training data, and y hat is estimated from
the linear function of the input feature values and the trained parameters.
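Written out, the prediction just described takes the following form, where n is the number of input features and the hats mark quantities estimated from training data:

```latex
\hat{y} = \hat{w}_0 x_0 + \hat{w}_1 x_1 + \cdots + \hat{w}_{n-1} x_{n-1} + \hat{b}
```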
For example, in the simple housing
price example we just saw, w_0 hat was 109, x_0 represented tax paid, w_1 hat was
negative 2,000, x_1 was house age, and b hat was 212,000. We call these w_i hat values
model coefficients or
sometimes feature weights, and b hat is called the bias term or the
intercept of the model. Here's an example of a
linear regression model with just one input variable or feature x_0 on a simple
artificial example dataset. The blue cloud of
points represents a training set of (x_0, y) pairs. In this case, the formula for
predicting the output y hat is just w_0 hat times
x_0 plus b hat, which you might recognize as the familiar slope intercept
formula for a straight line, where w_0 hat is the slope, and b hat is the y-
intercept. The gray and red lines represent different possible linear
regression models that could attempt to explain the
relationship between x_0 and y. You can see that some lines
are a better fit than others. The better fitting
models capture the approximately
linear relationship where as x_0 increases, y also increases in
a linear fashion. The red line seems
especially good. Intuitively, there are not as many blue training
points that are very far above or very far below the
red linear model prediction. Let's take a look at
a very simple form of linear regression
model that just has one input variable or feature
to use for prediction. In this case, the vector x has just
a single component, which we'll call x_0. That's the input variable,
the input feature. In this case, because
there's just one variable, the predicted output is simply the product of the
weight w_0 with the input variable
x_0 plus a bias term b. x_0 is the value that's provided and
comes with the data, and so the parameters we
have to estimate during training are w_0 and b for
this linear regression model. This formula may look familiar. It's the formula for
a line
in terms of its slope. In this case, slope
corresponds to the weight w_0 and b
corresponds to the y-intercept, which we call the bias term. Here, the job of the model
is to take a point along the x-axis as input, that is, a value of x_0,
and predict the corresponding y value. w_0 corresponds to the
slope of this line, and b corresponds to the
y-intercept of the line. Finding these two parameters together defines a straight
line in this feature space. Now, the important
thing to remember is that there's a training phase
and a prediction phase. The training phase using
the training data is what we'll use to estimate w_0 and b. One widely used
method for estimating w and b for linear
regression problems is called least-squares
linear regression, also known as ordinary
least-squares. Least-squares linear regression
finds the line through this cloud of points
that minimizes what is called the mean squared
error of the model. The mean squared error of
the model is essentially the average of the squared
differences between the predicted target value and the actual target value over all
the points
in the training set. This plot illustrates
what that means. The blue points represent
points in the training set. The red line here represents the least-squares model
that was found through this cloud
of training points. These black lines show the difference between
the y value that was predicted for a
training point based on its x position and the actual y value of
the training point. For example, this point here, let's say, has an x
value of negative 1.75. If we plug it into the formula
for this linear model, we get a prediction here at this point
on the line which is somewhere around, let's say 60. But the actual observed value
in the training set for this point was
maybe closer to 10. In this case, for this
particular point, the squared difference between
the predicted target and the actual target would
be 60 minus 10 squared. We can do this calculation for every one of the points
in the training set. We can compute this
squared difference between the y value we observe
in the training set for a point and the y value
that would be predicted by the linear model given that
training point's x value. Then if we add all of these squared differences up
and divide by the number of training points, taking the average,
that will be the mean squared error of the model.
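Here's a minimal sketch of that calculation in Python. The training points and the candidate w and b values below are made-up placeholders, not the dataset shown in the plot:

```python
import numpy as np

# Made-up one-feature training data: x values and observed target values y
x = np.array([-1.75, -0.5, 0.8, 1.6])
y = np.array([10.0, 95.0, 160.0, 220.0])

# A candidate linear model: y_hat = w0 * x + b
w0, b = 45.0, 140.0

y_hat = w0 * x + b                 # predicted y for each training point
squared_diffs = (y - y_hat) ** 2   # squared difference for each point
mse = squared_diffs.mean()         # average them: the mean squared error
print(mse)
```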
The technique of least-squares is designed to find the slope, the w value, and the b value, the y-intercept, that minimize
this mean squared error. One thing to note about this linear regression model is
that there are
no parameters to control the model complexity. No matter what the
values of w and b, the result is always going
to be a straight line. This is both a strength and a weakness of the model
as we'll see later. When you have a moment, compare this simple linear model to the
more complex regression
model learned with k-nearest neighbors regression
on the same dataset. You can see that
linear models make a strong prior assumption about the relationship between
the input x and output y. Linear models may
seem simplistic, but for data with many features, linear models can be
very effective and generalize well to new data
beyond the training set. Now the question is, how exactly do we estimate the
linear model's w and b parameters so
the model is a good fit? Well, the w and b parameters are estimated using
the training data, and there are lots of
different methods for estimating w and b
depending on the criteria you'd like to use for the definition of
what a good fit to the training data is and how you want to control
model complexity. For linear models,
model complexity is based on the nature
of the weights w on the input features. Simpler linear models have a weight vector
w that's
closer to zero, i.e., where more features are either
not used at all (they have zero weight) or have less influence on the outcome
(they have a very small weight). Typically, given
possible settings for the model parameters, the learning algorithm predicts the
target value for
each training example, and then computes
what is called a loss function for each
training example. That's a penalty value for
incorrect predictions. A prediction is incorrect when the predicted target value is
different from the actual target value in
the training set. For example, a squared
loss function would return the squared difference between the predicted target value and the
actual value as the penalty. The learning algorithm then computes or searches
for the set of w, b parameters that minimize the total of this loss function
over all training points. The most popular way to estimate w and b parameters
is using what's called least squares linear regression or
ordinary least-squares. Least-squares finds
the values of w and b that minimize the total sum of squared differences between
the predicted y value and the actual y value
in the training set. Or equivalently, it minimizes the mean squared
error of the model. Least-squares is based on the squared loss function
mentioned before. This is illustrated graphically
here where I've zoomed in on the left lower portion of the simple
regression data set. The red line represents the least squares solution for w and b
through
the training data. The vertical lines
represent the difference between the actual y value
of a training point (x_i, y_i) and
its predicted y value given x_i, which lies on the
red line where x equals x_i. Adding up all the squared
values of these differences for all the training points gives
the total squared error. This is what the least
squares solution minimizes. Here there are no parameters
to control model complexity. The linear model
always uses all of the input variables and is always represented
by a straight line. Another name for this quantity is the residual sum of squares.
The actual target value is given in y_i and the predicted
y-hat value for the same training example
is given by the right side of the formula using
the linear model with parameters w and b.
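The formula being referred to is the standard residual sum of squares; with n training points (x_i, y_i), it reads:

```latex
RSS(w, b) = \sum_{i=1}^{n} \bigl( y_i - (w \cdot x_i + b) \bigr)^2
```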
Let's look at how to implement this in scikit-learn. Linear regression in
scikit-learn is implemented by the LinearRegression class in the sklearn.linear_model
module. As we did with other
estimators in scikit-learn, like the nearest neighbors classifier and the
regression models, we use the train_test_split
function on the original data
set and then create and fit the linear
regression object using the training data in X_train and the corresponding training
data target values in y_train. Here, note that we're doing the creation and fitting
of the linear regression object in one line by chaining the fit method with the
constructor for the new object. The linear regression fit method acts to estimate
the
feature weights w, which it calls the
coefficients of the model. These are stored in the coef_ attribute, and
the bias term b is stored in the
intercept_ attribute. Note that if a scikit-learn object's attribute ends
with an underscore, this means that these
attributes were derived from training data, and are not, say, quantities that
were set by the user. If we dump the coef_ and intercept_ attributes
for this simple example, we see that because there's only one input feature
variable, there's only one
element in the coef_ list, the value 45.7. The intercept_ attribute has
a value of about 148.4. We can see that indeed
these correspond to the red line shown
in the plot which has a slope of 45.7 and a
y-intercept of about 148.4.
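Here's a minimal sketch of the sequence just described. The dataset here is a synthetic placeholder standing in for the course's simple regression dataset, so the printed coefficient and intercept will not exactly match the 45.7 and 148.4 quoted above:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Placeholder one-feature regression dataset (stands in for the course's example data)
X, y = make_regression(n_samples=100, n_features=1, noise=30, random_state=0)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Create and fit the linear regression object in one line,
# chaining the fit method with the constructor
linreg = LinearRegression().fit(X_train, y_train)

print('linear model coeff (w):', linreg.coef_)        # one weight per input feature
print('linear model intercept (b):', linreg.intercept_)
```

The score method of the fitted regression object returns the R-squared value used in the comparison below.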
Here's the same code in the notebook with additional code to score the quality of the
regression model in the same way that we did
for k nearest neighbors regression using the
R-squared metric. Here's the notebook
code we use to plot the least-squares linear
solution for this data set. Now that we've seen both
K-nearest neighbors regression and least-squares regression, it's interesting
to compare the least-squares linear
regression results with the K-nearest
neighbor results. Here we can see how these
two regression methods represent two complementary
types of supervised learning. The K nearest neighbor
regressor doesn't make a lot of assumptions about
the structure of the data, and it gives
potentially accurate but sometimes
unstable predictions that are sensitive to small
changes in the training data. It has a
correspondingly higher training set R-squared score compared to least squares
linear regression. K-NN achieves an R-squared
score of 0.72 and least-squares
achieves an R-squared of 0.679 on the training set. On the other hand, linear
models make strong assumptions about
the structure of the data, in other words, that the target value can be predicted
using a weighted sum of the input variables, and linear models give stable but
potentially
inaccurate predictions. However, in this case, it turns out that
the linear model's strong assumption that there's a linear relationship
between the input and output variables happens to be a good fit for this data set.
It's better at more
accurately predicting the y value for new x values that weren't
seen during training. We can see that
the linear model gets a slightly
better test set score of 0.492 versus 0.471 for
K-nearest neighbors. This indicates its
ability to better generalize and capture
this global linear trend.
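A minimal sketch of that comparison, again on a synthetic placeholder dataset; the exact R-squared numbers quoted above come from the course's own dataset and won't be reproduced here:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor

# Placeholder one-feature regression dataset
X, y = make_regression(n_samples=100, n_features=1, noise=30, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

knnreg = KNeighborsRegressor(n_neighbors=5).fit(X_train, y_train)
linreg = LinearRegression().fit(X_train, y_train)

# score() returns the R-squared value for regression estimators
print('KNN    R^2 (train/test): {:.3f} / {:.3f}'.format(
    knnreg.score(X_train, y_train), knnreg.score(X_test, y_test)))
print('Linear R^2 (train/test): {:.3f} / {:.3f}'.format(
    linreg.score(X_train, y_train), linreg.score(X_test, y_test)))
```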