Week 5 Notes
Types of Data
If information about A is collected over times t1, t2, t3 then it is time-series data
If information about A, B, C, D, E, and F is collected at t1, then it is
cross-sectional data
Regression vs. Causation vs. Correlation
• While regression deals with the dependence of one variable on another, it
does not imply causation
• Regression only establishes the statistical strength of the relation;
causation is established by theory
• Example: crop yield and rainfall. That yield depends on rainfall, and not
the reverse, follows from prior reasoning, not from the statistics
• A priori theoretical considerations are needed to imply causation
• In regression analysis, the dependent variable is considered random or
stochastic (i.e., it has a probability distribution), while the explanatory
variable is assumed to have fixed values
Expectations Operator ‘E’
For example, if an event has ‘n’ possible outcomes y1, y2, y3, …, yn, with
probabilities p1, p2, p3, …, pn, then the expectations operator is defined as
• $E(y) = p_1 y_1 + p_2 y_2 + p_3 y_3 + \dots + p_n y_n$
• This is also called the probability-weighted mean
• If all the probabilities are assumed to be equal, then $p_1 = p_2 = \dots = p_n = \frac{1}{n}$
• Then $E(y) = \frac{1}{n}(y_1 + y_2 + y_3 + \dots + y_n)$, i.e., the simple average of the y’s
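The definition above can be sketched in a few lines of Python. This is only an illustration: the outcomes and probabilities are hypothetical values, and `expectation` is a helper name chosen for this example, not something from the lecture.

```python
def expectation(values, probs):
    """Probability-weighted mean: E(y) = p1*y1 + p2*y2 + ... + pn*yn."""
    if len(values) != len(probs):
        raise ValueError("values and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * y for p, y in zip(probs, values))


# Hypothetical outcomes and probabilities (not taken from the notes)
y = [10.0, 20.0, 30.0, 40.0]
p = [0.1, 0.2, 0.3, 0.4]

weighted = expectation(y, p)

# With equal probabilities 1/n, the expectation reduces to the simple average
n = len(y)
equal = expectation(y, [1 / n] * n)
simple_average = sum(y) / n

print(weighted, equal, simple_average)
```

With unequal probabilities the result is pulled towards the more likely outcomes; with equal probabilities it coincides with the simple average.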
Summary
• We discussed the role of the expectations operator (E) in the context
of a stochastic random variable with a probability distribution
• In simple terms, the expectation is the probability-weighted average
of a stochastic random variable
• If no a priori probabilities are assigned to the outcomes (i.e., all
are treated as equally likely), then the expectation is the simple
average of the stochastic random variable
INDIAN INSTITUTE OF TECHNOLOGY KANPUR
A Simple Example
The unconditional mean, E(Y), does not account for the level of income
(X); it is the prediction (expected value) of Y when there is no
knowledge of X
• However, if one has knowledge of X, then one can improve
the prediction by computing the conditional mean of Y, i.e., E(Y|X),
which is a more accurate prediction of Y
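A minimal sketch of this idea, assuming a small hypothetical data set of (income X, Y) pairs: it computes the unconditional mean E(Y), the conditional means E(Y|X), and compares the in-sample mean squared prediction error of the two. The numbers and names are illustrative assumptions, not data from the lecture.

```python
from collections import defaultdict

# Hypothetical (income X, outcome Y) observations; not data from the notes
data = [(80, 55), (80, 65), (100, 70), (100, 80), (120, 85), (120, 95)]

# Unconditional mean E(Y): ignores X entirely
unconditional_mean = sum(y for _, y in data) / len(data)

# Conditional means E(Y | X = x): average Y within each income level
groups = defaultdict(list)
for x, y in data:
    groups[x].append(y)
conditional_mean = {x: sum(ys) / len(ys) for x, ys in groups.items()}


def mse(predict):
    """Mean squared error of a prediction rule over the sample."""
    return sum((y - predict(x)) ** 2 for x, y in data) / len(data)


mse_unconditional = mse(lambda x: unconditional_mean)
mse_conditional = mse(lambda x: conditional_mean[x])

print(unconditional_mean, conditional_mean)
print(mse_unconditional, mse_conditional)
```

In this sample, "more accurate" shows up as a smaller mean squared error for the conditional prediction.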
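Method of Ordinary Least Squares (OLS) Estimation
Sample regression function (SRF): $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_i$
• Minimize the sum of squared residuals:
$\sum \hat{u}_i^2 = \sum (Y_i - \hat{Y}_i)^2 = \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i)^2$
• $\frac{\partial \sum \hat{u}_i^2}{\partial \hat{\beta}_0} = -2 \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) = -2 \sum \hat{u}_i$
(partial derivative w.r.t. $\hat{\beta}_0$)
• $\frac{\partial \sum \hat{u}_i^2}{\partial \hat{\beta}_1} = -2 \sum (Y_i - \hat{\beta}_0 - \hat{\beta}_1 X_i) X_i = -2 \sum \hat{u}_i X_i$
(partial derivative w.r.t. $\hat{\beta}_1$)
• Setting both partial derivatives to zero and solving gives
$\hat{\beta}_1 = \frac{\sum (X_i - \bar{X})(Y_i - \bar{Y})}{\sum (X_i - \bar{X})^2}$ and $\hat{\beta}_0 = \bar{Y} - \hat{\beta}_1 \bar{X}$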
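The closed-form OLS solution for the two-variable model (slope as the ratio of the cross-deviation sum to the squared-deviation sum of X, intercept from the sample means) can be checked numerically. The sketch below uses hypothetical data and an illustrative function name `ols_fit`; it also verifies the two first-order conditions, namely that the residuals sum to zero and are orthogonal to X.

```python
def ols_fit(x, y):
    """Two-variable OLS: returns (beta0_hat, beta1_hat)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-deviations and sum of squared deviations of X
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("X must vary across observations")
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    return beta0, beta1


# Hypothetical sample (not data from the notes)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta0, beta1 = ols_fit(x, y)
residuals = [yi - (beta0 + beta1 * xi) for xi, yi in zip(x, y)]

# First-order conditions: both sums should be (numerically) zero
sum_resid = sum(residuals)
sum_resid_x = sum(r * xi for r, xi in zip(residuals, x))

print(beta0, beta1, sum_resid, sum_resid_x)
```

Both sums come out at zero up to floating-point error, which is what setting the two partial derivatives to zero requires.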
Key CLRM Assumptions
[Figure: homoscedasticity vs. heteroscedasticity]
• No autocorrelation between disturbances: $E[(u_i \mid X_i)(u_j \mid X_j)] = 0$ for $i \neq j$
Brealey, Myers and Allen; Principles of Corporate Finance. 10th, 11th, or 12th editions. Chapter 8
A Few Words on Normal Distribution
Standard Normal Distribution
Classical Normal Linear Regression Model (CNLRM)
The estimation of sample parameters ($\hat{\beta}_0$, $\hat{\beta}_1$) is not complete without hypothesis
testing
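As a rough sketch of what such a test involves, the snippet below computes the standard error and t-statistic of the slope under the null hypothesis that the true slope is zero. It assumes the usual two-variable formulas: the error variance is estimated as the residual sum of squares divided by n − 2, and the slope's standard error is the square root of that variance divided by the sum of squared deviations of X. The data and the name `slope_t_statistic` are hypothetical.

```python
import math


def slope_t_statistic(x, y):
    """Return (beta1_hat, se(beta1_hat), t) for H0: beta1 = 0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta1 = sxy / sxx
    beta0 = y_bar - beta1 * x_bar
    # Residual sum of squares and the estimated error variance (n - 2 d.o.f.)
    rss = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)
    se_beta1 = math.sqrt(sigma2_hat / sxx)
    return beta1, se_beta1, beta1 / se_beta1


# Hypothetical sample (not data from the notes)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta1, se_beta1, t_stat = slope_t_statistic(x, y)
print(beta1, se_beta1, t_stat)
```

Under the CNLRM assumptions, the t-statistic is compared with a critical value from the t distribution with n − 2 degrees of freedom; a statistical package would normally report the corresponding p-value.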
We are given stock price data for ABC company, along with the Nifty and Sensex
(market indices). We are also given data on dividend announcements and a sentiment index
Date        Price   ABC       Sensex    Dividend Announced  Sentiment  Nifty
03-01-2000  718.15  0.079925  0.073772  0                   0.048936   0.095816
04-01-2000  712.9   -0.00731  0.021562  0                   -0.05504   0.009706
05-01-2000  730     0.023987  -0.02441  0                   0.019135   -0.03221
06-01-2000  788.35  0.079932  0.012046  0                   0.080355   0.011205
07-01-2000  851.4   0.079977  -0.0013   0                   0.094038   -0.0004
10-01-2000  919.5   0.079986  0.019191  1                   0.015229   0.030168
11-01-2000  880     -0.04296  -0.04025  0                   -0.07217   -0.04966
12-01-2000  893.75  0.015625  0.036799  0                   0.01396    0.020999
13-01-2000  875     -0.02098  -0.00845  0                   0.057518   -0.01164
14-01-2000  891     0.018286  0.004858  1                   0.008828   0.020714
17-01-2000  819.75  -0.07997  -0.01228  0                   -0.12395   -0.00962
……          ……      ……        ……        ……                  ……         ……
……          ……      ……        ……        ……                  ……         ……
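The ABC column appears to be the simple daily return on Price, i.e. today's price divided by the previous day's price, minus one. That reading is an assumption inferred from the first few rows rather than something the notes state. A minimal sketch using the first five prices from the table (`simple_returns` is an illustrative name):

```python
def simple_returns(prices):
    """Simple returns P_t / P_(t-1) - 1 for consecutive prices."""
    return [curr / prev - 1 for prev, curr in zip(prices, prices[1:])]


# First five ABC prices from the table (03-01-2000 to 07-01-2000)
prices = [718.15, 712.9, 730.0, 788.35, 851.4]
returns = simple_returns(prices)

# Returns for 04-01-2000 to 07-01-2000; the first date needs a prior
# price that the table does not show
for r in returns:
    print(round(r, 6))
```

The computed values agree with the ABC entries for 04-01-2000 to 07-01-2000 to the precision shown in the table.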
Case Study: Sentiment Problem
• Sensex and Nifty are the two main stock indices used in India
• They are benchmark Indian stock market indices that represent
the weighted average of the largest Indian companies
• The Sensex represents a weighted average of the 30 largest and most
actively traded Indian companies
• Similarly, the Nifty represents a weighted average of the 50 largest
Indian companies
In this video we will examine the key variables in the data through
visualization
We will visualize the returns on ABC and Nifty
We will also visualize the cumulative returns for ABC and Nifty
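The notes do not say how the cumulative returns are computed. A common convention, assumed here, is to compound the period returns: the cumulative return after t periods is the product of (1 + r) over those periods, minus one. A minimal sketch with hypothetical daily returns (the course itself works in R; Python is used here only for illustration):

```python
def cumulative_returns(returns):
    """Compounded cumulative return after each period."""
    growth = 1.0  # value of 1 unit invested at the start
    out = []
    for r in returns:
        growth *= 1.0 + r
        out.append(growth - 1.0)
    return out


# Hypothetical daily returns (not data from the notes)
daily = [0.10, -0.05, 0.02]
cumulative = cumulative_returns(daily)
print(cumulative)
```

Plotting such a series against the date gives the cumulative-return chart; summing the returns instead of compounding them is a different convention and gives slightly different numbers.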
Summary
To summarize the video, we visualized the returns and cumulative
returns for ABC and Nifty using R
In the next video, we will examine the summary measures
In this video, we will discuss the basic properties of the data and
summary measures
Summary
To summarize the video, first we summarized the key return
variables
Next we plotted the density distribution of these variables
We noted that ABC returns are heavily skewed towards the left
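The notes do not list which summary measures were used. The sketch below assumes a typical set (mean, median, standard deviation, skewness) and reads "skewed towards the left" as negative skewness, i.e. a long left tail. The sample is hypothetical, and `skewness` is an illustrative helper based on the moment ratio m3 / m2^1.5.

```python
import statistics


def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical returns with one large negative outlier (not data from the notes)
sample = [-0.08, -0.01, 0.0, 0.01, 0.02]

summary = {
    "mean": statistics.mean(sample),
    "median": statistics.median(sample),
    "stdev": statistics.stdev(sample),
    "skewness": skewness(sample),
}
print(summary)
```

A negative skewness together with a mean below the median is the usual numerical signature of a left-skewed distribution.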
In this video, we will segregate the “ABC” data into training and test
data
Training data is employed to train the linear regression algorithm
Test data is employed to test the out-of-sample forecasting
efficiency of the algorithm
Summary
To summarize the video, we segregated our data in two segments
The training data included observations from 01-Jan-2007
to 01-Dec-2017, comprising 2850 observations
The test data included observations from 04-Jan-2017
onwards, comprising 478 observations
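For time-ordered data, a split like this is usually made chronologically rather than by random shuffling, so that the test set lies strictly after the training set. The sketch below shows the idea on a handful of hypothetical (date, return) rows. The cutoff date and the name `split_by_date` are illustrative assumptions, not the exact procedure used in the video.

```python
from datetime import date


def split_by_date(rows, cutoff):
    """Rows dated on or before `cutoff` go to training, later rows to test."""
    train = [row for row in rows if row[0] <= cutoff]
    test = [row for row in rows if row[0] > cutoff]
    return train, test


# Hypothetical (date, return) rows; not data from the notes
rows = [
    (date(2017, 11, 29), 0.004),
    (date(2017, 11, 30), -0.002),
    (date(2017, 12, 1), 0.010),
    (date(2017, 12, 4), -0.007),
    (date(2017, 12, 5), 0.003),
]

cutoff = date(2017, 12, 1)  # illustrative cutoff
train, test = split_by_date(rows, cutoff)
print(len(train), len(test))
```

The model is then fitted on `train` only, and its forecasts are compared with the realized values in `test` to judge out-of-sample performance.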