Week 5 Notes


Introduction

• Regression modelling: background and motivation


• Types of data
• Ordinary least squares (OLS) estimation
• Simple linear regression
• Multiple linear regression
Introduction
• Key CLRM assumptions
• Violation of CLRM assumptions
• BLUE properties of OLS estimators
• Hypothesis testing with regression modelling
• Other non-linear functional forms

Background and Motivation


Amazon, Netflix Movie Recommendations
Filtering SPAM from Mail and Messages
Big Data Text Analysis
Summary
Making the Computers Learn Without Being Explicitly Programmed
• Amazon, Netflix movie recommendations
• Filtering out spam
• Medical prognosis with health records
• Algorithmic trading, credit scoring models
• Making computers think like humans
• Handwriting recognition, natural language processing, web-click
data

Machine Learning Algorithms


Supervised Learning

House price prediction problem (Regression problem)


Supervised Learning

Credit default scoring (classification problem)


Unsupervised Learning

Clustering problem: market segmentation


Unsupervised Learning

Clustering problem: news aggregation


Summary
• Supervised learning algorithms use data comprising features and labels
• The algorithm is trained to map the relationship between the features and the labels
• It then makes predictions (creates labels) on new, unlabeled data based on its features
• Unsupervised learning algorithms use unlabeled data that carries only features
• The data is clustered into groups based on these features

Types of Data
Types of Data

Cross-sectional data: Observations about multiple individuals (units)


collected over a single period
Time-series data: Observations of a single individual (unit) collected
over multiple periods
Panel or longitudinal data: Observations about multiple individuals
(units) collected over various time periods
Types of Data

If information about A is collected over times t1, t2, t3 then it is time-series data
Types of Data
If information about A, B, C, D, E, and F is collected at t1, then it is cross-
sectional data
Types of Data

If information about A, B, C, D, E, and F is collected at t1, t2, t3, etc. then it is


panel/longitudinal data

Introduction to Simple Linear


Regression
Introduction to Simple Linear Regression
Consider the simple linear regression model:
Y = β0 + β1 X + u
• This is also called a two-variable or bivariate linear regression model
• Here 'Y' is the dependent/explained/response/predicted variable, or regressand
• Here 'X' is the independent/explanatory/predictor variable, or regressor
Introduction to Simple Linear Regression

Consider the simple linear regression model:
Y = β0 + β1 X + u
• 'u' is the error term, residual term, or disturbance term; it represents unobserved factors other than 'X' that affect 'Y'. Since 'u' is a random (stochastic) variable, it has a probability distribution
• Here, β0 is the constant (intercept) term and β1 is called the slope term (Why?)
• This simple model aims to study the dependence of Y on X
Regression vs. Causation vs. Correlation

While regression deals with the dependence of one variable on another, it does not imply causation
• Regression only establishes the statistical strength of the relation; the causation is established by theory
• Example of crop and rain: rainfall can affect crop yield, but crop yield cannot affect rainfall; that direction comes from theory, not from the regression
• A priori theoretical considerations are needed to imply causation
• In regression analysis, the dependent variable is considered random or stochastic (i.e., with a probability distribution), while the explanatory variable is assumed to have fixed values
Regression vs. Causation vs. Correlation

A closely associated concept is correlation, which establishes the degree of linear relationship between two variables
• In correlation analysis, both variables are treated symmetrically and both are considered random

Expectations Operator
Expectations Operator ‘E’

Any random (probabilistic) variable is often represented through the expectations operator
• A random variable can take multiple values. For example, a coin toss can yield two outcomes, each with 50% probability
• Similarly, in regression any random variable is assumed to be probabilistic in nature, and its expected value is represented by E(Y)
Expectations Operator ‘E’

For example, if a random variable can take 'n' values y1, y2, y3, …, yn with probabilities p1, p2, p3, …, pn, then the expectation is defined as
• E(y) = p1·y1 + p2·y2 + p3·y3 + … + pn·yn
• This is also called the probability-weighted mean
• If all the probabilities are assumed to be equal, then p1 = p2 = … = pn = 1/n
• Then E(y) = (1/n)·(y1 + y2 + y3 + … + yn), i.e., the simple average of the y's
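A minimal R sketch of this definition, with made-up outcomes and probabilities:

# Hypothetical outcomes and their probabilities (probabilities sum to 1)
y <- c(10, 20, 30)
p <- c(0.5, 0.3, 0.2)
sum(p * y)  # probability-weighted mean E(y) = 17
mean(y)     # with equal probabilities, E(y) reduces to the simple average, 20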
Summary
• We discussed the role of expectations operator (E) in the context
of stochastic random variable with a probability distribution
• In simple terms, expectations are probability weighted averages
of stochastic random variable
• In case there is no a priori probabilities assigned to these
variables, then the expectation is simple average of the stochastic
random variable

A Simple Example
A Simple Example

Consider a simple example of family income and consumption expenditure shown below
(Table: weekly family income X and weekly consumption expenditure Y; Damodar N. Gujarati, Basic Econometrics, 4th edition onwards, Chapter 2)


A Simple Example

Here, a population of 60 families is divided into 10 income (X) groups from 80 to 260 (income is the independent, or fixed, variable)
• The corresponding consumption expenditure values (Y) are also shown
• For each given level of income (X), the conditional mean E(Y/X), that is, the mean of Y for a given level of X, is also provided
A Simple Example

For example, at X = 80, the mean of Y is 65, i.e., E(Y/X=80) = 65; these are called conditional expectations or conditional means of Y given the value of X
• They are so called because they depend on the conditioning variable X
• The average of all Y's, that is, the unconditional mean or unconditional expected value, is E(Y) = 121.2
A Simple Example

This unconditional mean does not account for the level of income
(X) and is the prediction of Y (expected value) when there is no
knowledge of X
• However, if one has the knowledge of X, then one can improve
the prediction by computing conditional mean of Y, i.e., E(Y/X),
which is a more accurate prediction of Y
A Simple Example

Q: What is the best (mean) prediction of the weekly expenditure of families with a weekly income of X = 140? A: Y = 101, the conditional mean E(Y/X=140)
• Thus, knowledge of the income level may enable us to better predict the mean value of consumption expenditure than if we did not have that knowledge
• This is the essence of regression modelling
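A tiny R sketch of conditional means; the numbers are made up, but chosen to reproduce the conditional means quoted above (E(Y/X=80) = 65, E(Y/X=140) = 101):

income <- c(80, 80, 80, 100, 100, 100, 140, 140, 140)
expend <- c(55, 65, 75, 65, 77, 89, 93, 101, 109)
mean(expend)                  # unconditional mean E(Y)
tapply(expend, income, mean)  # conditional means E(Y/X) for each income level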

Population and Sample Regression


Function
Concept of Population Regression
Function
If we join these conditional mean values, we obtain what is known
as the population regression line (PRL)
• More simply, it is the regression of Y on X
• Geometrically, then, a population regression curve is simply the
locus of the conditional means of the dependent variable for the
fixed values of the explanatory variable
Concept of Population Regression
Function
Population regression function (PRF)
• E(Y/Xi) = f(Xi)
• In this case, f(Xi) is a linear function of Xi
• This expression is also called the population regression function
Concept of Population Regression
Function
More generally, for a two-variable case: E(Y/Xi) = β0 + β1 Xi
• Here it is important to note that linearity means linearity in the parameters
• E(Y/Xi) = β0 + β1² Xi: this model is non-linear in the parameters and will not be handled in linear regression modelling
• E(Y/Xi) = β0 + β1 Xi², in contrast, is non-linear in the variables and can be handled under linear regression models
Sample Regression Function (SRF)

Sample regression function is shown by adding the '^' (hat) symbol, indicating estimated values: Ŷi = β̂0 + β̂1 Xi
• β̂0 is the estimator of β0; β̂1 is the estimator of β1; and Ŷi is the estimator of Yi
• SRF is only an estimate of PRF
• Thus, SRF can over- or underestimate PRF values
Sample Regression Function (SRF)

Sample regression function (SRF): Ŷi = β̂0 + β̂1 Xi

Damodar N. Gujarati, Basic Econometrics, 4th edition onwards (Chapter 2)



Ordinary Least Squares (OLS)


Estimation
Method of Ordinary Least Squares (OLS) Estimation
Recall the SRF: Yi = β̂0 + β̂1 Xi + ûi, where Ŷi = β̂0 + β̂1 Xi
• Here, ûi = Yi − Ŷi = Yi − β̂0 − β̂1 Xi
• The fitted line should aim to minimize the square of this error ûi
• The OLS principle suggests that the best cost function to minimize is as follows
Method of Ordinary Least Squares (OLS) Estimation
Recall the SRF: Yi = β̂0 + β̂1 Xi + ûi, where Ŷi = β̂0 + β̂1 Xi
• Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1 Xi)²
• That is, we minimize the squared residuals (why not just the residuals, or the absolute residuals?)
Method of Ordinary Least Squares (OLS) Estimation
• Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1 Xi)²: minimize these squared residuals

Damodar N. Gujarati, Basic Econometrics, 4th edition onwards (Chapter 3)


Method of Ordinary Least Squares (OLS) Estimation
Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1 Xi)²
• It is obvious to note here that Σûi² = f(β̂0, β̂1)
• Setting the differential of Σûi² to zero (and checking that the second differential is positive, the condition for a minimum), one obtains the estimates β̂0 and β̂1
Method of Ordinary Least Squares (OLS) Estimation
Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1 Xi)²
• Thus, these estimators are called least-squares estimators
• The regression model so estimated is also called the Gaussian, standard, or classical linear regression model (CLRM)
Method of Ordinary Least Squares (OLS) Estimation
Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1 Xi)²
• ∂(Σûi²)/∂β̂0 = −2 Σ(Yi − β̂0 − β̂1 Xi) = −2 Σûi (partial differential w.r.t. β̂0)
• ∂(Σûi²)/∂β̂1 = −2 Σ(Yi − β̂0 − β̂1 Xi)·Xi = −2 Σûi·Xi (partial differential w.r.t. β̂1)
Method of Ordinary Least Squares (OLS) Estimation
Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1 Xi)²
• Setting the two partial derivatives to zero and solving gives the closed-form solution:
• β̂1 = Σ(Xi − X̄)(Yi − Ȳ) / Σ(Xi − X̄)² and β̂0 = Ȳ − β̂1·X̄
• However, to achieve this closed-form solution, CLRM-OLS makes certain assumptions
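A minimal R sketch verifying these closed-form estimators against lm() on simulated data (all names illustrative):

set.seed(1)
x <- rnorm(100)
y <- 2 + 3 * x + rnorm(100)  # true beta0 = 2, beta1 = 3
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)
c(b0, b1)        # closed-form OLS estimates
coef(lm(y ~ x))  # should match to machine precision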
Summary

• Σûi² = Σ(Yi − Ŷi)² = Σ(Yi − β̂0 − β̂1 Xi)²: minimize these squared residuals

Damodar N. Gujarati, Basic Econometrics, 4th edition onwards (Chapter 3)



Introduction to Multiple Linear


Regression
Introduction to Multiple Linear Regression

We can generalize the two-variable problem into a multiple linear regression: Y = β0 + β1 X1 + β2 X2 + … + βn Xn + u
• The Xi's represent the explanatory variables
• Here the coefficients β1, β2, …, βn are called the partial regression coefficients
Introduction to Multiple Linear Regression

Multiple linear regression: Y = β0 + β1 X1 + β2 X2 + … + βn Xn + u
• Other aspects of the regression remain the same, including the properties of the error term u
• Zero conditional mean of the error: E(ui | X1i, X2i, …, Xni) = 0 for each 'i'
• No serial correlation: cov(ui, uj) = 0; homoscedasticity: var(ui) = σ²
Introduction to Multiple Linear Regression

Multiple linear regression: Y = β0 + β1 X1 + β2 X2 + … + βn Xn + u
• Zero correlation (or covariance) between ui and each X: cov(ui, X1) = cov(ui, X2) = … = cov(ui, Xn) = 0
• The model is correctly specified
• The model is correctly specified
Introduction to Multiple Linear Regression

Multiple linear regression: Y = β0 + β1 X1 + β2 X2 + … + βn Xn + u
• Lastly, one more condition is added: no exact linear relationship among the regressors X1, X2, …, Xn; that is, there are no constants α1, α2, …, αn (not all zero) such that α1 X1 + α2 X2 + α3 X3 + … + αn Xn = 0
• If such a relationship exists, the model is affected by the problem of perfect multicollinearity and will not run (the estimates are indeterminate)
Introduction to Multiple Linear regression

However, there may be instances of less-than-perfect collinearity across variables, which can still affect the estimation
• If the multicollinearity is not perfect but high, the estimators have large variances (large standard errors of estimate)

Damodar N. Gujarati, Basic Econometrics, 4th edition onwards (Chapter 3)


Introduction to Multiple Linear regression

However, there may be instances of less-than-perfect collinearity across variables, which can still affect the estimation
• This makes the t-values low, raising the chance of failing to reject the null hypothesis (wider confidence intervals), even though R² may be high

Damodar N. Gujarati, Basic Econometrics, 4th edition onwards (Chapter 3)


Summary
• We discussed multiple linear regression model
• All the properties and discussions on simple linear regression
model apply to multiple linear regression model
• Some important properties of simple and multiple linear
regression included: (a) zero conditional mean of the error term;
(b) error term should not be serially correlated; (c) variance of the
error term should be constant: Homoscedasticity; (d) no
correlation between the error term and the independent variable;
(e) model should be correctly specified; (f) multicollinearity should
be low

Interpreting the Multiple Linear


Regression
Interpreting the Multiple Linear
Regression
Similar to the two-variable regression, the expression
E(Y | X1, …, Xn) = β0 + β1 X1 + β2 X2 + … + βn Xn
• represents the conditional mean or expected value of Y given the fixed values of all the Xi's
• The partial coefficient β1 is the effect of X1 on Y, net of any effect from the other explanatory variables (Xi's), or in other words, keeping all the other Xi's constant
Interpreting the Multiple Linear
Regression
Similar to the two-variable regression, for
E(Y | X1, …, Xn) = β0 + β1 X1 + β2 X2 + … + βn Xn
• The definition of R² = ESS/TSS = 1 − RSS/TSS is the same as earlier
• One also calculates adjusted R² = 1 − [RSS/(n−k)] / [TSS/(n−1)]; remember the dfs?
Interpreting the Multiple Linear
Regression
Adjusted R² = 1 − [RSS/(n−k)] / [TSS/(n−1)]
• Or, equivalently, adjusted R² = 1 − (1 − R²)·(n−1)/(n−k)
• Adjusted R² penalizes the addition of more variables. So if R² is inflated just by adding more variables, rather than by their quality, adjusted R² can identify this
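A one-line check of this relationship in R (illustrative numbers):

adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k)
adj_r2(r2 = 0.30, n = 100, k = 5)  # adjusted R-squared, here ~0.2705
# summary(lm(...))$adj.r.squared reports the same quantity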
Interpreting the Multiple Linear
Regression
In the OLS estimation, each parameter (β̂0, β̂1) is estimated with some error
• The square root of the variance of the estimated parameter indicates the error in estimation, i.e., the precision of the estimate
Summary
• The interpretation of the multiple linear regression model broadly remains similar to the bivariate regression model
• The coefficients are partial coefficients that measure the impact of an independent variable on the dependent variable, keeping the other variables constant
• The explanatory power of the model is measured using the R² measure
• An improvement over the R² measure is the adjusted R² measure, which penalizes the addition of variables to the model
• Lower standard errors of the coefficients increase the power and efficiency of the model
• OLS estimators are the best estimators in the class of linear estimators

Key CLRM Assumptions


Key CLRM Assumptions

The Gaussian, standard, or classical linear regression model (CLRM) makes 10 key assumptions
Linear in Parameters
• Assumption 1: The regression model is linear in the parameters (β0, β1, …)

Damodar N. Gujarati, Basic Econometrics, 4th edition onwards (Chapter 2)


Key CLRM Assumptions

Assumption 2: Values taken by the regressor X are considered fixed


in repeated samples. More technically, X is assumed to be non-
stochastic
Key CLRM Assumptions

Assumption 3: Zero conditional mean of disturbance (𝒖𝒊 ): given the


value of X, the mean, or expected, value of the random disturbance
term 𝑢𝑖 is zero. 𝐸(𝑢𝑖 /𝑋𝑖 )=0

Damodar N. Gujarati, Basic Econometrics, 4th edition onwards (Chapter 3)


Key CLRM Assumptions

Assumption 4: Homoscedasticity, or equal variance of ui. Given the value of X, the variance of ui is the same for all observations; that is, the conditional variances of ui are identical: var(ui/Xi) = E[ui − E(ui|Xi)]² = constant = σ²
• Heteroscedastic variance: var(ui/Xi) = σi², i.e., the variance changes with each observation
(Figure: homoscedasticity vs. heteroscedasticity)

Damodar N. Gujarati, Basic Econometrics, 4th edition onwards (Chapter 3)


Key CLRM Assumptions

Assumption 5: No autocorrelation between the disturbances. Given any two X values, Xi and Xj (i ≠ j), the correlation between any two ui and uj (i ≠ j) is zero.
Symbolically, Cov(ui, uj | Xi, Xj) = E{[ui − E(ui|Xi)]·[uj − E(uj|Xj)]} = E[(ui|Xi)·(uj|Xj)] = 0
Key CLRM Assumptions

Assumption 5: No autocorrelation between the disturbances
(Figure: (a) positive autocorrelation, (b) negative autocorrelation, (c) no autocorrelation)

Damodar N. Gujarati, Basic Econometrics, 4th edition onwards (Chapter 3)


Key CLRM Assumptions

Assumption 6: Zero covariance between ui and Xi, i.e., E(ui·Xi) = 0
• Cov(ui, Xi) = E[(ui − E(ui))·(Xi − E(Xi))]
• Expanding, this equals E(ui·Xi) − E(ui)·E(Xi); by definition, E(ui) = 0
• Hence Cov(ui, Xi) = E(ui·Xi) = 0
• That is, ui and Xi are not correlated
Key CLRM Assumptions

• Assumption 7: The number of observations must be greater than


the number of parameters to be estimated
• Assumption 8: The X values (independent variable) must have
some finite variance
• Assumption 9: The regression model is correctly specified
• Assumption 10: There is no perfect multicollinearity, i.e., no
perfect linear relationships among the explanatory variables
Summary
• In this video we reviewed and summarized the ten (10) key CLRM
assumptions

BLUE Properties of OLS Estimators


BLUE Properties of OLS Estimators
(Figure: biased vs. unbiased estimators)
BLUE Properties of OLS Estimators
(Figure: efficient vs. inefficient estimators)
Summary

• OLS estimators are the BEST in the class of linear estimators
• They are best (i.e., minimum-variance, efficient), linear, and unbiased estimators
• Thus, they are also consistent estimators: for large samples, OLS estimators converge to the true population parameters

Classical Normal Linear Regression


Model (CNLRM) and Hypothesis
testing I
A Few Words on Normal Distribution
(Figure: the normal distribution; Brealey, Myers and Allen, Principles of Corporate Finance, 10th, 11th, or 12th editions, Chapter 8)
(Figure: the standard normal distribution)
Classical Normal Linear Regression Model
(CNLRM)
The estimation of the sample parameters (β̂0, β̂1) is not complete without hypothesis testing
• It is important to draw inferences about population parameters using sample estimates; more clearly, we would like the estimated parameters to be as close as possible to the population parameters
• It must be noted that the randomness in the beta (coefficient) estimates is introduced by ui (the error term): How?
• Thus, these sample coefficient estimates also have a probability distribution [as one takes different samples from the population, one gets different estimates]
The Normality Assumption of the Error Term ui
To make any inference about the probability distribution of the estimates, we need to make some assumption about the distribution of the error term ui
• The CNLRM assumes that ui is distributed normally with the following:
• Mean: E(ui) = 0
• Variance: E[ui − E(ui)]² = E(ui²) = σ²
• Covariance: E{[ui − E(ui)]·[uj − E(uj)]} = E(ui·uj) = 0 for i ≠ j
• These assumptions are summarised as ui ∼ N(0, σ²)


Properties of OLS Estimators under Normality
Normal distributions are fully defined by just two parameters, i.e., the mean and the variance of the population
• Under the normality assumption, OLS estimates are unbiased, efficient, and consistent (estimates converge to their population values as the sample size increases)
• For Yi = β̂0 + β̂1 Xi + ûi, where Ŷi = β̂0 + β̂1 Xi:
• Mean: E(β̂1) = β1; variance: var(β̂1) = σ²(β̂1); then β̂1 ∼ N(β1, σ²(β̂1))
Properties of OLS Estimators under Normality
Normal distributions are fully defined by just two parameters, i.e., the mean and the variance of the population
• By the properties of the standard normal distribution, Z = (β̂1 − β1) / σ(β̂1)
• where Z ∼ N(0, 1): Z is normally distributed with mean 0 and SD 1
Properties of OLS Estimators under
Normality: Summary
Normal distributions are fully defined by just two parameters, i.e., the mean and the variance of the population
Mean: E(β̂1) = β1; variance: var(β̂1) = σ²(β̂1); then β̂1 ∼ N(β1, σ²(β̂1))

Damodar N. Gujarati, Basic Econometrics, 4th edition onwards (Chapter 4)



Classical Normal Linear Regression


Model (CNLRM) and Hypothesis
testing II
Interval Estimation and Hypothesis
Testing
While in repeated sampling the point estimate β̂1 is centered on the true population parameter, i.e., E(β̂1) = β1, the accuracy of this point estimate is important: how reliable is this estimate?
• This is so because any single estimate differs from the true value; this reliability of the estimate is measured by its standard error
Interval Estimation and Hypothesis
Testing
In the OLS estimation, each parameter (β̂0, β̂1) is estimated with some error
• The square root of the variance of the estimated parameter indicates the error in estimation, i.e., the precision of the estimate
Interval Estimation and Hypothesis
Testing
In statistics, we construct a confidence interval around the estimate
Interval Estimation and Hypothesis
Testing
For example, if you hypothesize that the population parameter = β1, then you set up a [1 − α] confidence interval around that hypothesized value
• If the estimate does not fall in this interval, then you can reject your hypothesis at the α significance level (e.g., 5%)
• Practically, you often hypothesize that the coefficient is zero, i.e., that the X variable does not have any impact on the Y variable. Then you set up a confidence interval around that zero value
Interval Estimation and Hypothesis
Testing
That range can be written as (hypothesized value ± t(α/2) × standard error); if the estimated value falls outside this range, then with a given level of confidence (1 − α, generally 90%, 95%, or 99%), or significance level (10%, 5%, or 1%), you reject the hypothesis and state that the variable has a significant relationship
Interval Estimation and Hypothesis
Testing
Null hypothesis H0: the true population parameter is β1 (= 0 in most cases)
• Alternative hypothesis H1: the true population parameter is not β1
• Decision rule: construct the [1 − α] confidence interval for the population parameter β1; if the estimate falls outside this interval, you reject the null H0. If the estimated parameter falls inside this range, you fail to reject the null (don't say you 'accept' the null hypothesis)
Interval Estimation and Hypothesis
Testing
So if you hypothesized that β1 = 0 (i.e., no impact of X on Y) and the estimate falls in the confidence interval [this is checked by looking at the t-value of the estimate], then you say that you fail to reject the null and there is no evidence of a relationship between X and Y
• What if it falls outside? Then you reject the null and conclude that X has a statistically significant relationship with Y
Interval Estimation and Hypothesis
Testing
Interpretation: intervals constructed this way will contain the true parameter (1 − α)% of the time
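A minimal R sketch of such an interval, using base R's confint on simulated data where the true slope is zero:

set.seed(2)
x <- rnorm(100)
y <- 3 + 0 * x + rnorm(100)  # true slope is zero
fit <- lm(y ~ x)
confint(fit, level = 0.95)   # 95% CIs for intercept and slope
# If the interval for x contains 0, we fail to reject H0: beta1 = 0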

Damodar N. Gujarati, Basic Econometrics, 4th edition onwards (Chapter 5)



Other Functional Forms and Non-


linear Transformations
Other Functional Forms and Non-linear
Transformations
• Log-linear or log-log model: Yi = β1·Xi^β2·e^(ui); take the natural log and transform the model as below
• ln(Yi) = ln(β1) + β2·ln(Xi) + ui, or alternatively
• Yi′ = α + β2·Xi′ + ui: the model is now linear in the parameters α and β2
• The interpretation goes as follows: β2 measures the percentage change in Yi for a given percentage change in Xi (an elasticity)
Other Functional Forms and Non-linear
Transformations
• Log-lin model: Yt = Y0·(1 + r)^t; take the natural log and transform the model as below
• ln(Yt) = β1 + β2·t, where β1 = ln(Y0) and β2 = ln(1 + r)
• This is a semi-log model, and β2 measures the proportional change in Yt for a given absolute change in t
• The reverse interpretation holds for the lin-log model below (absolute change in Yt for a % or relative change in Xt):
• Yt = β1 + β2·ln(Xt)
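A short R sketch of fitting these transformed models on simulated data (all names illustrative):

set.seed(3)
x <- runif(200, 1, 10)
y <- 2 * x^1.5 * exp(rnorm(200, sd = 0.1))  # true elasticity = 1.5
loglog <- lm(log(y) ~ log(x))  # slope estimates the elasticity of y w.r.t. x
loglin <- lm(log(y) ~ x)       # slope = proportional change in y per unit of x
coef(loglog); coef(loglin)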

Summary and Concluding Remarks


Summary and Concluding Remarks

• Among supervised learning algorithms, regression algorithm is a very


important tool employed in the finance domain for applications such as
forecasting security prices or credit scoring
• Regression algorithms can be run with only two variables (one dependent and one independent): simple regression, or with more than two variables: multiple regression
• The key variables in a regression include a dependent variable, one or more independent variables, coefficients of these variables, and an error term
Summary and Concluding Remarks

• The error term accounts for the variation in the dependent


variable that cannot be explained by the model (independent
variables)
• While regression analysis can provide the statistical significance
of the relationship, the direction of causality should come a priori
from the theoretical underpinnings (rain vs. crop example)
• OLS is the most often employed method to estimate a regression
model, which involves minimizing residual sum of squares
Summary and Concluding Remarks

• OLS estimation of regression involves 10 key assumptions


• The most important assumptions include linearity in parameters, exogeneity of the independent variables, zero conditional mean of the error (residual) term, homoscedasticity of the error variances, absence of multicollinearity, no autocorrelation across error terms, and no correlation between the error and the independent variables
• If these assumptions hold, then OLS estimators are referred to as BLUE, that is, best linear unbiased (and efficient) estimators
Summary and Concluding Remarks

• The statistical significance of OLS estimators is determined


through hypothesis testing of coefficients individually
• This requires normality assumption of the error (residuals)
• Very often the model is not linear and may require some kind of
transformation to make it linear, which can be subsequently
estimated through OLS
• However, the interpretation of the coefficients also changes with such transformations
Thanks!
Introduction

• Application of regression algorithm in prediction of security prices


• ABC case study
• Simple linear regression
• Multiple linear regression
• Summary and concluding remarks

Case Study: ABC Stock Price


Case Study: Sentiment Problem

• Stock price prediction or stock return prediction is an attempt to determine the future value of a company's stock based on an analysis of the factors that impact its price movement
• There are a number of factors that help in predicting stock prices
• These can be macroeconomic factors like the state of the country's economy, growth rate, inflation, etc.
• There are also other factors that are more specific to a stock, like profit margin, debt-to-equity ratio, sales of a company, and so on
Case Study: Sentiment Problem

So we are given the data for stock market price for ABC company, along with Nifty and Sensex
(market indices). We are also given the data of dividend announcement and a sentiment index
Date         Price    ABC        Sensex     Dividend Announced   Sentiment   Nifty
03-01-2000   718.15    0.079925   0.073772  0                     0.048936    0.095816
04-01-2000   712.9    -0.00731    0.021562  0                    -0.05504     0.009706
05-01-2000   730       0.023987  -0.02441   0                     0.019135   -0.03221
06-01-2000   788.35    0.079932   0.012046  0                     0.080355    0.011205
07-01-2000   851.4     0.079977  -0.0013    0                     0.094038   -0.0004
10-01-2000   919.5     0.079986   0.019191  1                     0.015229    0.030168
11-01-2000   880      -0.04296   -0.04025   0                    -0.07217    -0.04966
12-01-2000   893.75    0.015625   0.036799  0                     0.01396     0.020999
13-01-2000   875      -0.02098   -0.00845   0                     0.057518   -0.01164
14-01-2000   891       0.018286   0.004858  1                     0.008828    0.020714
17-01-2000   819.75   -0.07997   -0.01228   0                    -0.12395    -0.00962
……           ……        ……         ……        ……                    ……          ……
Case Study: Sentiment Problem

• Consider a portfolio manager who has built a model for a


particular stock
• The manager wants to predict the ABC stock price returns for this
stock using regression model
• The data starts from 2007 and goes till 2019, so we have
approximately 13 years of data
• We have daily returns of ABC or change in price of ABC in
column B. Next, we have daily return on Sensex in column C and
daily return on Nifty in column D.
Case Study: Sentiment Problem

• Sensex and Nifty are the two main stock indices used in India
• They are benchmark Indian stock market indices that represent the weighted average of the largest Indian companies
• So, Sensex represents an average of the 30 largest and most actively traded Indian companies
• Similarly, Nifty represents a weighted average of the 50 largest Indian companies
Case Study: Sentiment Problem

• Another variable is dividend announcement in column E, which is


one, if a company has announced dividend on a particular date
and zero otherwise
• So, for example, it is one on January 2, 2007, because the
company ABC announced a dividend on this date and it is zero
for all other days when the company did not announce any
dividend. Notice that this is a dummy variable
Case Study: Sentiment Problem

• Lastly, we have a sentiment variable in column F. It is a sentiment


score which quantifies how investors feel about ABC
• It can be based upon news analysis or upon option market
analysis or based on some survey
• We will not go into the details of the score here and take it as given. A very high sentiment score represents bullish investors, and vice versa

Case Study: Problem Statement


Case Study: Problem Statement

The following tasks need to be performed: Part 1


• Data Visualization
• Training the model
• Testing the model
• Evaluate out-of-sample performance of the model
Case Study: Problem Statement
The following tasks need to be performed: Part 2
• Training and testing the model using multiple linear regression
algorithm
• Testing the model
• Examine issues in estimation and how to resolve them
• Evaluate out-of-sample performance of the model

Case Study: Data Input


Introduction

We start with the R implementation of the case study problem


statement
Summary
To summarize the video, first we loaded the relevant packages and
libraries, then we set the working directory, and finally we read the
“ABC” data file in R
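A hedged sketch of what this setup typically looks like in R (the directory path and file name are assumptions):

# install.packages(c("car", "lmtest", "sandwich", "tseries", "moments"))  # once, if needed
library(car); library(lmtest); library(sandwich); library(tseries); library(moments)
setwd("C:/data/abc_case_study")                     # hypothetical working directory
abc <- read.csv("ABC.csv")                          # hypothetical file name
abc$Date <- as.Date(abc$Date, format = "%d-%m-%Y")  # dates appear as dd-mm-yyyy
head(abc)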
In the next video we will try to visualize various properties of the
data


Case Study: Data Visualization


Introduction

In this video we will examine the key variables in the data through
visualization
We will visualize the returns on ABC and Nifty
We will also visualize the cumulative returns for ABC and Nifty
Summary
To summarize the video, we visualized the returns and cumulative returns for ABC and Nifty using R programming
In the next video, we will examine the summary measures
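A sketch of these plots (column names assumed from the data table above):

plot(abc$Date, abc$ABC, type = "l", xlab = "Date", ylab = "Daily return",
     main = "ABC vs Nifty daily returns")
lines(abc$Date, abc$Nifty, col = "red")
# Cumulative returns: compound the daily returns over time
plot(abc$Date, cumprod(1 + abc$ABC) - 1, type = "l",
     xlab = "Date", ylab = "Cumulative return", main = "Cumulative returns")
lines(abc$Date, cumprod(1 + abc$Nifty) - 1, col = "red")
legend("topleft", legend = c("ABC", "Nifty"), col = c("black", "red"), lty = 1)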


Case Study: Data Summary


Introduction

In this video, we will discuss the basic properties of the data and
summary measures
Summary
To summarize the video, first we summarized the key return
variables
Next we plotted the density distribution of these variables
We noted that ABC returns are heavily skewed towards the left
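A sketch of these summary measures (column names assumed):

summary(abc[, c("ABC", "Sensex", "Nifty")])              # basic summary statistics
plot(density(abc$ABC), main = "Density of ABC returns")  # long left tail => left skew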

Case Study: Normality


Introduction

In this video we will examine the normality of the data


This includes examination of skewness and kurtosis measures
Summary
To summarize the video, we computed the skewness and kurtosis
measures for the data
Data appears to be left skewed
Then we also examined the statistical significance of the skewness,
kurtosis measures and also conducted the Jarque-Bera test of
normality
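A sketch of these checks, assuming the moments and tseries packages:

library(moments); library(tseries)
skewness(abc$ABC)          # negative value => left-skewed
kurtosis(abc$ABC)          # > 3 => fatter tails than the normal
jarque.bera.test(abc$ABC)  # joint normality test based on skewness and kurtosis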

Case Study: Stationarity


Introduction

In this video, we will discuss the stationarity property and conduct


the examination of data stationarity in R
Summary
To summarize the video, we conducted tests of data stationarity
These included the ADF, PP, and KPSS tests
We found that the data is stationary
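A sketch of the three tests, using the tseries package:

library(tseries)
adf.test(abc$ABC)   # Augmented Dickey-Fuller: H0 = unit root (non-stationary)
pp.test(abc$ABC)    # Phillips-Perron: H0 = unit root
kpss.test(abc$ABC)  # KPSS: H0 = stationary (note the reversed null)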

Case Study: Training and Test Data


Introduction

In this video, we will segregate the “ABC” data into training and test
data
Training data is employed to train the linear regression algorithm
Test data is employed to test the out-of-sample forecasting
efficiency of the algorithm
Summary
To summarize the video, we segregated our data into two segments
The training data included observations from 01-Jan-2007 to 01-Dec-2017, comprising 2850 observations
The test data included observations from 04-Jan-2017 onwards, comprising 478 observations
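One simple way to split on a cutoff date, reusing the parsed Date column from the earlier sketch (the cutoff follows the training end-date quoted above):

train <- subset(abc, Date <= as.Date("2017-12-01"))
test  <- subset(abc, Date >  as.Date("2017-12-01"))
nrow(train); nrow(test)  # sizes of the training and test sets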

Training the Simple Linear Regression


(SLR) Algorithm
Introduction

In this video, we will train a simple linear regression algorithm by


regressing ABC returns on Nifty returns
Summary
To summarize the video, we examined the relationship between
ABC returns and Nifty returns
To this end, we trained a simple linear regression algorithm
We also reviewed the output of the regression model; we found a
significant coefficient for Nifty
We also noted that the model explains around 10.87% variation in
ABC returns
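A sketch of this step (column names assumed):

slr <- lm(ABC ~ Nifty, data = train)  # simple linear regression of ABC on Nifty
summary(slr)  # slope coefficient and R-squared (~10.87% per the text)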

Training the model: Residual


Diagnostics
Introduction

In this video, we will perform the residual diagnostics of the simple linear regression model built using the training dataset
Summary
To summarize the video, we conducted residual diagnostics of the
trained model
First, we plotted the density plot of the raw residuals and
studentized residuals
Next, we checked the normality of the residuals with the help of
qqplot
We also conducted the outlier test
We found certain outliers through these methods; these outliers can
be removed to improve the model estimates
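A sketch of these diagnostics, assuming the car package:

library(car)
plot(density(resid(slr)), main = "Raw residuals")
plot(density(rstudent(slr)), main = "Studentized residuals")
qqnorm(rstudent(slr)); qqline(rstudent(slr))  # points on the line => roughly normal
outlierTest(slr)  # Bonferroni-adjusted test for the most extreme residual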

Training the model: Heteroscedasticity


Introduction

In this video, we will examine the econometric issue of


heteroscedasticity or non-constant variance of error terms that
afflicts the estimation
Summary
To summarize the video, first we visualized the issue of
heteroscedasticity by plotting the residuals with fitted values
Residuals appeared to have non-constant variance
We conducted the tests of non-constant variance and found that the
result is indeed statistically significant
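A sketch of these checks, assuming the car and lmtest packages:

library(car); library(lmtest)
plot(fitted(slr), resid(slr), xlab = "Fitted values", ylab = "Residuals")
ncvTest(slr)  # score test of non-constant error variance (car)
bptest(slr)   # Breusch-Pagan heteroscedasticity test (lmtest)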

Training the model: Autocorrelation


Introduction

In this video, we discuss and empirically test the issue of


autocorrelation
Summary
To summarize the video, we conducted Durbin-Watson and Breusch-Godfrey tests of autocorrelation
We find evidence of serial correlation in the error terms at higher orders
In practical situations, one can account for such serial correlation by adding lags of the variable that is serially correlated
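A sketch of the two tests, assuming the lmtest package:

library(lmtest)
dwtest(slr)             # Durbin-Watson: first-order serial correlation
bgtest(slr, order = 5)  # Breusch-Godfrey: serial correlation up to order 5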

Training the model: Robust Standard


Errors
Introduction

In this video, we discuss robust standard error to resolve the issue


of heteroscedasticity and autocorrelation
Summary
To summarize the video, we discussed the application of robust
standard errors in correcting for issues such as heteroscedasticity
and autocorrelation
We discussed the four most prominent available routines (hccm, vcovHAC, vcovHC, and NeweyWest) for correcting the model standard errors
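A sketch of applying such corrections, assuming the lmtest and sandwich packages:

library(lmtest); library(sandwich)
coeftest(slr, vcov. = vcovHC(slr, type = "HC1"))  # heteroscedasticity-robust SEs
coeftest(slr, vcov. = NeweyWest(slr))  # HAC (Newey-West) SEs, robust to both issues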

Prediction with Simple Linear


Regression (SLR) Algorithm
Introduction

We have trained our simple linear regression algorithm on the training data
Now, we will employ our trained algorithm for prediction using the test data
Summary
To summarize the video, we forecasted ABC returns using test data
We visualized the ABC actual and predicted returns
The predicted returns have 43.78% correlation with actual returns,
and therefore we conclude that regression algorithm has predicted
the returns reasonably accurately
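A sketch of this step (names follow the earlier snippets):

pred_slr <- predict(slr, newdata = test)                 # out-of-sample predictions
plot(test$Date, test$ABC, type = "l", ylab = "Return")   # actual returns
lines(test$Date, pred_slr, col = "red")                  # predicted returns
cor(test$ABC, pred_slr)  # ~0.4378 per the text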

Out-of-sample forecasting efficiency


Introduction

A model may perform well on the training data, i.e., on in-sample goodness-of-fit measures; however, its true capability is established only if it performs well in out-of-sample prediction
In this video, we will perform out-of-sample forecasting and
prediction based on the predicted values and actual values of ABC
returns
Summary
To summarize the video, one needs some cost or error function to
compare between competing algorithms
In this video we discussed and reviewed the implementation of
various cost/error functions (e.g., MSE, RMSE, MAPE, SMAPE,
MSLE, etc.)
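A sketch of a few such error functions, hand-rolled for clarity:

mse  <- function(a, p) mean((a - p)^2)               # mean squared error
rmse <- function(a, p) sqrt(mse(a, p))               # root mean squared error
mae  <- function(a, p) mean(abs(a - p))              # mean absolute error
mape <- function(a, p) 100 * mean(abs((a - p) / a))  # undefined when a == 0
rmse(test$ABC, pred_slr); mae(test$ABC, pred_slr)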

Training the Multiple Linear Regression


(MLR) Algorithm
Introduction

In this video, we will implement the multiple linear regression (MLR)


algorithm with variables namely ABC Returns, market returns (Nifty
and Sensex), Dividend announcements, and Sentiment
Summary
To summarize the video, we trained our MLR algorithm using
training dataset
Then we reviewed the output of the model
We find that the model may be afflicted by the issue of
multicollinearity, which will be resolved in the next video
We also note that the variables namely Market returns, Sentiment,
and Dividend announcement appear to be significant
The model explains about 27.84% variation in the ABC returns
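A sketch of this step; the Dividend and Sentiment column names are assumptions based on the data description:

mlr <- lm(ABC ~ Nifty + Sensex + Dividend + Sentiment, data = train)
summary(mlr)  # multiple R-squared ~27.84% per the text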

Case Study: Multicollinearity


Introduction

Independent variables may be correlated resulting in


multicollinearity
In this video, we will examine the issue of multicollinearity and find
ways to resolve the same
Case Study: Summary
To summarize the video, we computed the correlations across the independent variables and found that the market proxies (Nifty and Sensex) are highly correlated, leading to the issue of multicollinearity
This is also corroborated by the high variance inflation factor (VIF) of about 2.9 for Nifty and Sensex
So we remove one of the market proxies (Nifty), train the model again, and review the model output
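A sketch of the diagnosis and the re-estimation, assuming the car package:

library(car)
cor(train[, c("Nifty", "Sensex")])  # correlation between the market proxies
vif(mlr)                            # variance inflation factors
mlr2 <- lm(ABC ~ Sensex + Dividend + Sentiment, data = train)  # drop Nifty
summary(mlr2)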

Prediction with MLR


Introduction

In this video, using our trained MLR algorithm, we will make


predictions about ABC returns
Next, we will compare the predicted and actual returns through
visualization
We will also compute the correlations between the predicted and
actual returns
Summary
To summarize the video, we predicted the ABC returns using our
trained algorithm with the test data
Plots of actual vs predicted returns suggest that our predicted
returns are indeed able to mimic our actual returns
Moreover, our predicted returns exhibit a high correlation of about
57.70%, which indicates a high prediction accuracy

Summary and Concluding Remarks


Summary and Concluding Remarks

• ABC stock prices are modelled as a simple regression problem with a market index variable
• The model is trained using the training dataset, and various goodness-of-fit measures are examined
• The fitted model is examined visually as well
• The model is tested using the test dataset, and various measures of out-of-sample fit are examined
Summary and Concluding Remarks

• Next, a multiple linear regression model is trained on multiple variables using the training dataset
• The fitted model is visually examined, and various goodness-of-fit measures are examined
• The model is evaluated on various issues related to multicollinearity, heteroscedasticity, and autocorrelation
• Lastly, the model is examined on various parameters for its out-of-sample performance
Thanks!
