Lecture 2 - LRM

The document provides an overview of linear regression, explaining its purpose of relating two variables and estimating the impact of one on the other through regression coefficients. It discusses the Ordinary Least Squares (OLS) method for finding the best fit line, the importance of the coefficient of determination (R²) for assessing model fit, and the process of hypothesis testing for regression coefficients. Additionally, it covers the assumptions of the classical linear regression model and the Gauss-Markov theorem, which ensures that OLS estimators are the best linear unbiased estimators (BLUE).

LINEAR REGRESSION
Nguyen Quang
quangn@ueh.edu.vn
THE IDEA BEHIND REGRESSION

• We want to relate two different variables: how does one affect the other?
• In particular, we want to know how much Y changes when X increases/decreases by 1 unit.
• In doing so, we need a function of the form
Y = βX
which tells us that when X increases by 1 unit, Y changes by β.
REGRESSION ANALYSIS
• Example:
• What is your monthly income?
• How much do you spend on bubble milk tea?
• Below is the data from a sample of 100 students.
• How much more does a student spend on bubble tea each month if his/her income increases by 1 mil. VND?

REGRESSION ANALYSIS
• How do we find the value of β in this case?
• By fitting a line to the data.
• In particular, we try to find the line of best fit.
• What does "best fit" mean?
REGRESSION FUNCTION

• The most basic regression does exactly this.
• The method of Ordinary Least Squares (OLS) minimizes the sum of the squared "distances".
THE LINEAR REGRESSION MODEL (LRM)
• The general form of the LRM is:
Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + ⋯ + βₖXₖᵢ + eᵢ
• Or, written in short (vector) form:
Yᵢ = βXᵢ + eᵢ

• Y is the regressand, or dependent/explained variable.
• X is a vector of regressors, or independent/explanatory variables.
• e is the error term/residual.
REGRESSION COEFFICIENTS
Yᵢ = β₀ + β₁X₁ᵢ + β₂X₂ᵢ + ⋯ + βₖXₖᵢ + eᵢ
• β₀ is the intercept/constant.
• β₁ to βₖ are the slope coefficients.
• In general, the βs are the regression coefficients or regression parameters. THEY ARE WHAT WE NEED TO ESTIMATE!
• Each slope coefficient measures the (partial) rate of change in the mean value of Y for a unit change in the value of a regressor, ceteris paribus.
• Roughly speaking: β₁ tells us that when X₁ increases by one unit, Y changes by β₁, other things (all other Xs) unchanged.
METHOD OF ORDINARY LEAST SQUARES
• The method of Ordinary Least Squares (OLS) searches for the coefficients that minimize the residual sum of squares (RSS):
RSS = Σ eᵢ²
• We need a data set of Y and X to find β.
• Finding β is an optimization problem.
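For the simple one-regressor case, this optimization problem has a closed-form solution: β̂₁ = Σ(xᵢ − x̄)(yᵢ − ȳ) / Σ(xᵢ − x̄)² and β̂₀ = ȳ − β̂₁x̄. A minimal sketch in pure Python (the toy data below are made up for illustration):

```python
def ols_simple(x, y):
    """Closed-form OLS estimates for Y = b0 + b1*X + e (one regressor)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: covariance-style numerator over variance-style denominator.
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar  # the fitted line passes through (x_bar, y_bar)
    return b0, b1

# Toy data lying exactly on y = 1 + 2x, so OLS recovers b0 = 1, b1 = 2:
b0, b1 = ols_simple([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])  # -> (1.0, 2.0)
```

With more than one regressor the minimization is done in matrix form, which is what regression software carries out internally.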
GOODNESS OF FIT: R²

• R², the coefficient of determination, is an overall measure of the goodness of fit of the estimated regression line.
• R² gives the percentage of the total variation in the dependent variable explained by the regressors:
• Explained Sum of Squares: ESS = Σ(Ŷᵢ − Ȳ)²
• Residual Sum of Squares: RSS = Σ eᵢ²
• Total Sum of Squares: TSS = Σ(Yᵢ − Ȳ)²
• Then: R² = ESS/TSS = 1 − RSS/TSS
• It is a value between 0 (no fit) and 1 (perfect fit); a higher R² indicates a better fit.
• When R² = 1, RSS = 0 and Σ eᵢ² = 0.
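The decomposition above can be sketched numerically (pure Python; the observed and fitted values below are made up for illustration):

```python
def r_squared(y, y_hat):
    """R^2 = 1 - RSS/TSS: the share of variation in y explained by the fit."""
    y_bar = sum(y) / len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # residual sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)               # total sum of squares
    return 1 - rss / tss

# Fitted values that miss each observation by at most 1 unit:
y = [3, 5, 7, 9, 11]
y_hat = [4, 4, 7, 10, 10]
r2 = r_squared(y, y_hat)  # RSS = 4, TSS = 40, so R^2 = 0.9
```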
DEGREES OF FREEDOM
• n is the total number of observations.
• k is the total number of estimated coefficients.
• The df for RSS is n − k.
GOODNESS OF FIT: ADJUSTED R SQUARED

• R² is higher when more regressors are added.
• Sometimes researchers play the game of "maximizing" R² (some think that the higher the R², the better the model. BUT THIS IS NOT NECESSARILY TRUE!)
• To avoid this temptation, R² should take into account the number of regressors.
• Such an R² is called an adjusted R², denoted R̄² (R-bar squared), and is computed from the original (unadjusted) R² as follows:
R̄² = 1 − (1 − R²) · (n − 1)/(n − k)
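A direct transcription of this formula (pure Python; the n, k, and R² values below are hypothetical):

```python
def adjusted_r_squared(r2, n, k):
    """R-bar^2 = 1 - (1 - R^2) * (n - 1) / (n - k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

# The same raw R^2 = 0.9 is penalized more heavily as coefficients are added:
adjusted_r_squared(0.9, 100, 2)   # ~0.899 with 2 estimated coefficients
adjusted_r_squared(0.9, 100, 10)  # ~0.890 with 10 estimated coefficients
```

The penalty shows why adding regressors cannot "buy" a higher adjusted R² unless they actually improve the fit.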
ILLUSTRATION
DATA
A survey of 20,306 individuals in the US (data file: lrm.xlsx)
• male 1 = male; 2 = female
• age age (year)
• wage wage (US$/hour)
• tenure # years working for current employer
• union 1 = union member, 0 otherwise
• edu years of schooling (years)
• married 1 = married or living together with a partner, 0 otherwise
• race 1 = white; 2 = black; 3 = others
IMPORTING DATA
PREPARING AND DESCRIBING DATA
DESCRIBING DISCRETE VARIABLES
• For discrete variables, the mean and standard deviation do not make a lot of sense.
• We present the frequency of each outcome instead.
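Frequency counting itself is straightforward; a sketch in Python (the lecture's own examples are in R, and the ten union values below are made up):

```python
from collections import Counter

# Hypothetical draws of the 0/1 variable union (1 = union member):
union = [0, 1, 0, 0, 1, 0, 1, 0, 0, 0]
freq = Counter(union)  # -> Counter({0: 7, 1: 3})
```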
MORE DETAILED DESCRIPTION: HISTOGRAM
hist(Z$wage, main = "Histogram of wage", xlab = "Wage ($/hour)", col = "yellow", breaks = 100, freq = T)

Limit the range of the x axis to (0,100):
hist(Z$wage, main = "Histogram of wage", xlab = "Wage", col = "yellow", breaks = 1000, xlim = c(0,100))


SCATTER PLOT
plot(Z$edu,Z$wage, ylab = "Wage (US$/hour)", xlab = "Schooling years")
SCATTER PLOT
plot(Z$age,Z$wage, ylab = "Wage (US$/hour)", xlab = "Age (years)")
SCATTER PLOT
plot(Z$age,Z$wage, ylab = "Wage (US$/hour)", xlab = "Age (years)", ylim = c(0,100))
COMPARING WAGE BETWEEN GROUPS
REGRESSION RESULTS
• One more schooling year results in a wage increase of about US$2/hour.

REGRESSION WITHOUT OUTLIERS
DUMMY VARIABLE AS A REGRESSOR
• The coefficient of a dummy regressor should be interpreted as the difference between the two groups of the dummy regressor.
QUALITATIVE
REGRESSORS
Dummy variable as a regressor
Transforming categorical variables into dummies
INTRODUCING A CATEGORICAL VARIABLE
• Recall the categorical variable race:
• race = 1 if white;
• race = 2 if black;
• race = 3 if others.
• How do we include this variable in the wage function?
• We can't introduce it directly into the regression function.
• Instead, we create a set of corresponding dummy variables.

TRANSFORMING A CATEGORICAL VARIABLE TO DUMMIES
• Race categorization: white (race = 1); black (race = 2); others (race = 3, all other races).
• Dummy variables: white = 1 if race = 1, 0 otherwise; black = 1 if race = 2, 0 otherwise.
• Regression inclusion: include white and black as regressors; "others" serves as the base category.
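The mapping from the categorical code to the two dummies can be sketched as follows (pure Python; the function name is my own):

```python
def race_dummies(race):
    """Map a race code (1 = white, 2 = black, 3 = others) to (white, black)
    dummies. The base category 'others' gets (0, 0)."""
    return (1 if race == 1 else 0, 1 if race == 2 else 0)

[race_dummies(r) for r in (1, 2, 3)]  # -> [(1, 0), (0, 1), (0, 0)]
```

Note that only two dummies are created for three categories; including all three would produce perfect multicollinearity with the intercept.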
THE WAGE FUNCTION WITH CATEGORICAL VARIABLES

• The β of white/black indicates the difference in wage between white/black and the base category ("others").
HYPOTHESIS TESTING
Testing an individual coefficient: t-test
Testing multiple coefficients: F-test

TESTING AN INDIVIDUAL COEFFICIENT: T-TEST
• To test the hypothesis:
• H₀: βₖ = 0
• H₁: βₖ ≠ 0
• Calculate the following statistic and use the t table to obtain the critical value with n − k degrees of freedom for a given level of significance (α):
t = β̂ₖ / se(β̂ₖ)
• If the absolute value of this statistic is greater than the critical t value, we can reject H₀.
TESTING AN INDIVIDUAL COEFFICIENT: T-TEST

• If |t_stat| > t_(α/2, n−k): reject H₀ at the level of significance α.
• If p-value < α: reject H₀ at the level of significance α.
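The computation behind this decision rule is just a ratio and a comparison; a sketch (pure Python; the coefficient, standard error, and critical value below are hypothetical):

```python
def t_statistic(beta_hat, se):
    """t = beta_hat / se(beta_hat) for testing H0: beta = 0."""
    return beta_hat / se

def reject_h0(t_stat, t_crit):
    """Two-sided test: reject H0 when |t| exceeds the critical value."""
    return abs(t_stat) > t_crit

t = t_statistic(2.0, 0.5)  # -> 4.0
reject_h0(t, 1.96)         # -> True (1.96 is the large-sample 5% critical value)
```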


TESTING AN INDIVIDUAL COEFFICIENT: T-TEST
• The hypothesis that schooling years have no impact on wage is rejected at the 10% level.
TESTING MULTIPLE COEFFICIENTS: F-TEST

• Step 1: Form the hypotheses
• H₀: the tested coefficients are simultaneously zero: β_(k1) = β_(k2) = ⋯ = β_(kq) = 0
• Hₐ: At least one of the tested βs is different from 0.

• Step 2: Calculate the test statistic F:
F = [(RSS_R − RSS_U)/(df_R − df_U)] / (RSS_U/df_U)
where R denotes the restricted model (the tested coefficients set to zero) and U the unrestricted model, with df_U = n − k, df_R = n − m, and df_R − df_U = k − m.
TESTING MULTIPLE COEFFICIENTS: F-TEST

• Step 3: Determine the critical value F* = F_(k−m, n−k)(α)
• (k − m) degrees of freedom for the numerator
• (n − k) degrees of freedom for the denominator

• Step 4: Decide; reject H₀ (at the significance level α) if
• F_stat > F*, or
• p-value = P(F > F_stat) < α
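The F statistic from Step 2 can be transcribed directly (pure Python; the RSS and df values below are hypothetical):

```python
def f_statistic(rss_r, rss_u, df_r, df_u):
    """F = ((RSS_R - RSS_U) / (df_R - df_U)) / (RSS_U / df_U),
    comparing a restricted (R) and an unrestricted (U) model."""
    return ((rss_r - rss_u) / (df_r - df_u)) / (rss_u / df_u)

# Restricting 3 coefficients to zero raises RSS from 90 to 120,
# with df_U = n - k = 96 and df_R = n - m = 99:
f = f_statistic(120.0, 90.0, 99, 96)  # (30/3) / (90/96) ~ 10.67
```

A large F means the restriction raises RSS by "too much" to be due to chance, so H₀ is rejected.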
TESTING MULTIPLE COEFFICIENTS: F-TEST

• The hypothesis that the coefficients of male, married and age are simultaneously equal to zero is rejected at the 10% level.
F-TEST FOR OVERALL SIGNIFICANCE

• … is the F-test for the null hypothesis that all slopes are simultaneously equal to zero.
• The hypothesis that all coefficients are simultaneously equal to zero is rejected at the 10% level.
ASSUMPTIONS OF THE CLASSICAL LRM
1. Linear in parameters
2. Full rank
3. Regressors X are fixed (non-stochastic)
4. Exogeneity of X
5. Normal distribution of the error term
6. Homoskedasticity of the error term
7. No autocorrelation
8. No specification error
ASSUMPTIONS OF THE CLASSICAL LRM
• A1: The model is linear in the parameters.
• A2: The number of observations must be greater than the number of parameters, and there is no perfect multicollinearity, i.e., no perfect linear relationships among the X variables.
• A3: Regressors X are fixed or nonstochastic.
• A4: No correlation between X and e, or E(e|X) = 0.
• A5: Given X, the expected value of the error term is zero, E(eᵢ|X) = 0, and eᵢ follows N(0, σ²).
• A6: Homoskedastic, or constant, variance of eᵢ: var(eᵢ|X) = σ² is a constant.
• A7: No autocorrelation: cov(eᵢ, eⱼ|X) = 0 for i ≠ j.
• A8: No specification bias.
GAUSS-MARKOV THEOREM
• On the basis of assumptions A1 to A8, the OLS method gives best linear unbiased estimators (BLUE):
• (1) The estimators are linear functions of the dependent variable Y.
• (2) The estimators are unbiased: in repeated applications of the method, they equal their true values on average.
• (3) In the class of linear unbiased estimators, OLS estimators have minimum variance; i.e., they are efficient, or the "best" estimators.
