Heteroscedasticity

Submitted by:
Chahhat Gupta (2K21/BBA/29)
Manikesh Singh (2K21/BBA/81)
Richa Chaturvedi (2K21/BBA/119)
Shubhlakshmi (2K21/BBA/142)
Vikram Bose (2K21/BBA/174)

Meaning of Heteroscedasticity
• Heteroscedasticity refers to error variance, or scatter, that depends on at least one independent variable within a
particular sample. These variations can be used to calculate the margin of error between data sets, such as expected results and
actual results, as they provide a measure of the deviation of data points from the mean value.
• It happens when the standard deviations of a predicted variable, monitored over different values of an independent variable or as
related to prior time periods, are non-constant.
• Heteroskedasticity is a violation of the assumptions for linear regression modeling, and so it can impact the validity of
econometric analysis or financial models like the Capital Asset Pricing Model (CAPM), which explains the performance of a stock in terms
of its volatility relative to the market as a whole.
• It often arises in two forms: conditional and unconditional. Conditional heteroskedasticity identifies nonconstant volatility
related to the prior period's (e.g., daily) volatility. Unconditional heteroskedasticity refers to general structural changes in volatility
that are not related to prior period volatility. Unconditional heteroskedasticity is used when future periods of high and low
volatility can be identified.
Applications of Heteroscedasticity
1. Financial Econometrics:
Heteroscedasticity is widely present in financial time series data due to market fluctuations and various economic factors. Identifying and modeling
heteroscedasticity is essential for accurate volatility forecasting, risk management, and portfolio optimization in finance. Properly accounting for
heteroscedasticity helps to improve the accuracy of financial models and decision-making processes.

2. Regression Analysis:
Heteroscedasticity assessment is a critical step in regression analysis. By examining the variability of errors, analysts can identify heteroscedasticity
and make informed decisions about model specifications, parameter estimations, and hypothesis testing. Detecting and addressing
heteroscedasticity ensures the validity and reliability of regression models.

3. Environmental Research:
In fields such as environmental science and climatology, heteroscedasticity analysis is vital for studying variables with varying levels of dispersion.
Identifying heteroscedastic patterns helps researchers understand the relationships between different variables and their impact on environmental
phenomena. It allows for more accurate predictions and assessments related to climate change, pollution, and natural resource management.
Causes of Heteroskedasticity
There are many reasons why heteroskedasticity may occur in regression models, but it typically involves problems with the dataset.
It has been shown that models involving a wide range of values are more prone to heteroskedasticity because the differences between the smallest
and largest values are so significant.
• For example:
Suppose a dataset contains values that range from 1,000 to 1,000,000. A 10% increase in 1,000 is only 100. However, a 10% increase in 1,000,000
is 100,000. Therefore, larger residuals would be expected at higher values, which causes an unequal variance of the
residuals and therefore results in heteroskedasticity. The concept can apply to many types of datasets where a wide range of values is expected.
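
The arithmetic in this example can be sketched in a few lines of Python. This is an illustration only: the helper name `absolute_deviation` and the list of values are assumptions made for the sketch, not part of the original dataset.

```python
# Illustrative only: the same 10% relative deviation produces very
# different absolute deviations across a wide range of values.

def absolute_deviation(value, relative_error=0.10):
    """Return the absolute size of a given relative error."""
    return value * relative_error

values = [1_000, 10_000, 100_000, 1_000_000]  # hypothetical data range
deviations = [absolute_deviation(v) for v in values]

for v, d in zip(values, deviations):
    print(f"value = {v:>9,}  10% deviation = {d:>9,.0f}")
```

If the errors in a model scale with the level of the variable in this way, the residuals fan out as the values grow, which is the unequal variance described above.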

1. Time-series datasets are prone to heteroskedasticity when the variables change drastically over time.
• For example, if you analyzed retail e-commerce sales for the past 30 years, the number of sales over the past 10 years would be significantly
larger due to the recent prevalence of online shopping. It would potentially skew the residuals and result in heteroskedasticity.

2. Cross-sectional datasets are also prone to heteroskedasticity, as they involve a wide range of values.
• For example, if you were to analyze the incomes of all fast-food workers in Toronto, the range of values wouldn’t deviate too much as most
fast-food workers earn close to minimum wage.
• However, if you were to analyze the incomes of all workers in Toronto, there would be a wide range of values due to all the differences in
salaries. It would result in an unequal distribution of values and increase the chances of heteroskedasticity.
A realistic example for the concept of Heteroskedasticity
One common example of heteroskedasticity is the "Relationship between food
expenditures and income".

• For those with lower incomes, their food expenditures are often restricted
based on their budget.
• As incomes increase, people tend to spend more on food as they have more
options and fewer budget restrictions.
• For wealthier people, they can access a variety of foods with very few
budget restrictions.
• Therefore, there is a greater variance in food expenditures of wealthier
people relative to lower-income individuals.

Thus, in such a situation, the variance of the residuals is unequal across the
independent variable (income). If one were to run a regression using this
dataset, one would find the presence of heteroskedasticity.
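
The income and food expenditure pattern can be imitated with simulated data. The sketch below is not the dataset behind this slide: the income range, the slope of 0.125, and the noise level (3% of income) are all assumptions chosen for illustration. It fits a line by ordinary least squares and compares the residual spread for lower and higher incomes.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: the noise in food spending grows with income.
incomes = [random.uniform(10_000, 100_000) for _ in range(400)]
food = [0.125 * x + random.gauss(0, 0.03 * x) for x in incomes]

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(incomes, food)
residuals = [y - (intercept + slope * x) for x, y in zip(incomes, food)]

# Compare the residual spread for the lower and upper halves of income.
median_income = statistics.median(incomes)
low = [e for x, e in zip(incomes, residuals) if x <= median_income]
high = [e for x, e in zip(incomes, residuals) if x > median_income]

print("residual std, lower incomes: ", round(statistics.pstdev(low), 1))
print("residual std, higher incomes:", round(statistics.pstdev(high), 1))
```

Under these assumptions the residuals for higher incomes are more spread out than those for lower incomes, which is the unequal variance across income described above.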
Tests for heteroskedasticity
The presence of heteroskedasticity affects both estimation and hypothesis testing. Heteroskedasticity can enter the data for
various reasons, and each test for heteroskedasticity assumes a specific form of it. The main tests available are:

Test: Graphical Method
When to use: It is generally advised to implement the residual vs fitted plots or residual vs independent variable graphs after every regression. These graphs can provide important insights into the behavior of residuals and heteroscedasticity.
Limitations: This method involves eyeballing the graphs. The pattern between residuals and independent variable/fitted values may not always be clear. Moreover, it might be affected by the subjective opinions of researchers.

Test: Breusch-Pagan-Godfrey test
When to use: This test can be employed if the error term or residuals are normally distributed. Therefore, it is advisable to check the normality of residuals before implementing this test.
Limitations: It is very sensitive to the assumption of normal distribution of the residuals.

Test: Park test
When to use: This test can be used as a precursor to the Goldfeld-Quandt test, to determine which variable is appropriate to order observations.
Limitations: The error term within the test itself may be heteroscedastic, leading to unreliable results.

Test: White's Heteroscedasticity test
When to use: This test does not depend on the normality of error terms and it does not require choosing 'c' or ordering observations. Hence, it is easy to implement. However, it is important to keep the limitations in mind.
Limitations: It generally requires a large sample because the number of variables including the squared and cross products can be huge, which can restrict degrees of freedom. This is also a test of specification errors. Therefore, the test statistic may be significant due to specification error rather than heteroscedasticity.

Test: Goldfeld-Quandt test
When to use: This test is preferred over the Park test and Glejser's test. Goldfeld and Quandt suggested that c=8 and c=16 are usually appropriate for around n=30 and n=60 observations respectively.
Limitations: 'c' or central observations to be omitted must be carefully chosen. A wrong choice may lead to unreliable results. For multiple independent variables, it becomes difficult to know beforehand which variable should be chosen to order the observations. Separate testing is required to determine which variable is appropriate.

Test: Glejser's Heteroscedasticity test
When to use: This test has been observed to give satisfactory results in large samples. However, the most important usage of this test is to determine the functional form of heteroscedasticity before implementing Weighted Least Squares (WLS). This test can be used to determine the weights in WLS, depending on the relationship between the variance of residuals and independent variables.
Limitations: Similar to the Park test, the error term within the test may be heteroscedastic.
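
To make the idea behind these tests concrete, here is a minimal sketch of a Breusch-Pagan style check for a single predictor. It uses the studentized (Koenker) variant, where the statistic is n times the R-squared from regressing the squared OLS residuals on x. That is a swapped-in simplification rather than the exact Breusch-Pagan-Godfrey procedure named above. The data are simulated, and all function names are assumptions made for the illustration.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

def r_squared(xs, ys):
    """R-squared of the simple regression of ys on xs."""
    slope, intercept = fit_line(xs, ys)
    y_bar = statistics.fmean(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

def koenker_bp_statistic(xs, ys):
    """LM = n * R^2 from regressing squared OLS residuals on x."""
    slope, intercept = fit_line(xs, ys)
    squared_resid = [(y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)]
    return len(xs) * r_squared(xs, squared_resid)

# 5% critical value of the chi-square distribution with 1 degree of freedom.
CHI2_CRIT_5PCT_DF1 = 3.841

random.seed(1)
xs = [random.uniform(1, 10) for _ in range(300)]
ys_hetero = [2 + 3 * x + random.gauss(0, 0.5 * x) for x in xs]  # spread grows with x
ys_homo = [2 + 3 * x + random.gauss(0, 2.0) for x in xs]        # constant spread

lm_hetero = koenker_bp_statistic(xs, ys_hetero)
lm_homo = koenker_bp_statistic(xs, ys_homo)

print("LM, heteroscedastic data:", round(lm_hetero, 2))
print("LM, homoscedastic data:  ", round(lm_homo, 2))
print("reject constant variance (hetero)?", lm_hetero > CHI2_CRIT_5PCT_DF1)
```

A statistic above the critical value suggests that the residual variance depends on x. In practice a statistics package would usually be used instead of hand-written code; this sketch only shows the mechanics.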
Solution to the problem of Heteroscedasticity

The best solution for heteroscedasticity is to modify the model so that the problem disappears.
For example:
• Transform some of the numeric variables by taking their natural logarithms
• Transform numeric predictor variables
• Build separate models for different subgroups
• Use models that explicitly model the difference in the variance (as opposed to just modeling the mean, which is what most models do)
• A simpler approach is to use robust methods for computing standard errors, such as the Huber-White sandwich estimator. These produce
correctly computed standard errors in the presence of heteroscedasticity; however, they do not correct for the actual model
misspecification, so they may be more of a fig leaf than a real remedy.
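
As a rough illustration of the last remedy, the sketch below computes the classical standard error of a slope and a White (HC0) heteroscedasticity-robust standard error for a one-predictor regression. The data are simulated under an assumed error structure (standard deviation proportional to x), and the variable names are hypothetical. It is a sketch of the idea, not a replacement for a statistics package.

```python
import math
import random
import statistics

random.seed(2)
# Hypothetical heteroscedastic data: error spread grows with x.
xs = [random.uniform(1, 10) for _ in range(500)]
ys = [2 + 3 * x + random.gauss(0, 0.5 * x) for x in xs]

n = len(xs)
x_bar, y_bar = statistics.fmean(xs), statistics.fmean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
intercept = y_bar - slope * x_bar
resid = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Classical standard error: assumes one common error variance for all points.
sigma2 = sum(e ** 2 for e in resid) / (n - 2)
se_classical = math.sqrt(sigma2 / sxx)

# White (HC0) standard error: weights each squared residual by the squared
# distance of its x value from the mean of x, so no common variance is assumed.
se_robust = math.sqrt(sum((x - x_bar) ** 2 * e ** 2 for x, e in zip(xs, resid))) / sxx

print("slope:                  ", round(slope, 3))
print("classical std. error:   ", round(se_classical, 4))
print("robust (HC0) std. error:", round(se_robust, 4))
```

The slope estimate is the same in both cases; only the standard error, and therefore any hypothesis test built on it, changes.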

EXAMPLE
• The graph depicts a scatter plot showing the relationship between the number of employees and the workforce size.

• The x-axis represents the workforce size, while the y-axis represents the number of employees.

• Additionally, a linear regression line is displayed on the graph. The line passes through the origin and has a positive slope.

• The linear regression line indicates that as the workforce size increases, the number of employees also increases. In other
words, there is a positive correlation between the number of employees and the workforce size.

• The graph also includes a few data points, which likely represent the actual values of the number of employees for
corresponding workforce sizes.

Explanation of the Question with Numerical Values from the Graph
We analyze the relationship between the number of employees and the workforce size based on the specific numerical values provided in the graph.
Here's how we can approach the question:

1. We would analyze the relationship:
• Correlation: Based on the data points, there appears to be a positive correlation between the number of employees and the workforce
size. As the workforce size increases, the number of employees also tends to increase.
• Trend: The data points suggest a linear trend, meaning the number of employees increases at a constant rate as the workforce
size increases.
• Variability: There is some variability in the data, with some points falling slightly above or below the linear trend. This could be due
to other factors influencing the number of employees besides workforce size.

2. Linear Regression Line:
To quantify the relationship and estimate the expected number of employees for different workforce sizes, we can calculate the equation
of the linear regression line. Using the provided data points, the equation of the line is:
Number of Employees = 16 * Workforce Size + 1600

3. Specific Examples:
• Workforce size of 250: Using the equation, the expected number of employees for a workforce size of 250 is:
Number of Employees = 16 * 250 + 1600 = 5600
• Workforce size of 450: Using the equation, the expected number of employees for a workforce size of 450 is:
Number of Employees = 16 * 450 + 1600 = 8800
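
The two predictions can be checked by evaluating the fitted line directly. In this small sketch the function name `expected_employees` is an assumption; the slope 16 and intercept 1600 are the values stated for the regression line.

```python
def expected_employees(workforce_size, slope=16, intercept=1600):
    """Evaluate the fitted line: Number of Employees = 16 * Workforce Size + 1600."""
    return slope * workforce_size + intercept

for size in (250, 450):
    print(f"workforce size {size}: expected employees = {expected_employees(size)}")
```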


Note:

Calculating the Values for Expected Number of Employees


The values 16 and 1600 in the equation for the expected number of employees come from the linear regression analysis of the data
points provided in the graph.
Here's a step-by-step explanation of how these values were obtained:

1. Linear Regression:
Linear regression is a statistical method used to find the best-fit line through a set of data points. This line
represents the overall trend of the relationship between two variables. In this case, the variables are the
workforce size (x) and the number of employees (y).

2. Equation of the Line:
• The equation of the linear regression line takes the form:
• y = mx + b
• where:
• y: Number of employees (dependent variable)
• x: Workforce size (independent variable)
• m: Slope of the line
• b: Y-intercept of the line

3. Calculating the Slope (m):
The slope of the line represents the rate of change in the number of employees for every unit increase in the workforce size. It can be
calculated using the following formula:
m = (nΣ(xy) - Σ(x)Σ(y)) / (nΣ(x^2) - (Σ(x))^2)
where:
• Σ represents the sum of all values
• x and y are the individual data points
• x^2 is the square of each x value
• n is the number of data points
Applying this formula to the data points from the graph gives us a slope of 16.

4. Calculating the Y-intercept (b):
The Y-intercept represents the value of y when x is equal to 0. It can be calculated using the following formula:
b = (Σ(y) - mΣ(x)) / n
where:
• n is the number of data points
Applying this formula to the data points from the graph, together with the calculated slope, gives us a Y-intercept of 1600.
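
The slope and intercept calculation can be sketched as follows. The graph's actual data points are not reproduced in this deck, so the example uses hypothetical points placed exactly on the line y = 16x + 1600. The function name `least_squares_from_sums` is also an assumption. The slope uses the standard least-squares sums formula, which includes the number of points n in both numerator and denominator.

```python
def least_squares_from_sums(xs, ys):
    """Return (m, b) for y = m*x + b using the textbook sums formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    # m = (n*Sum(xy) - Sum(x)*Sum(y)) / (n*Sum(x^2) - (Sum(x))^2)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # b = (Sum(y) - m*Sum(x)) / n
    b = (sum_y - m * sum_x) / n
    return m, b

# Hypothetical points lying exactly on y = 16x + 1600 (not the graph's real data).
xs = [100, 200, 300, 400, 500]
ys = [16 * x + 1600 for x in xs]

m, b = least_squares_from_sums(xs, ys)
print("slope m =", m)
print("intercept b =", b)
```

With real, scattered data the same function returns the best-fit slope and intercept rather than an exact match.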

EXAMPLE - 2
• The graph depicts a scatter plot showing the relationship between the monthly income of individuals and their expenditure on food.
• The x-axis represents monthly income, while the y-axis represents the expenditure on food.
• The linear regression line indicates that as income increases, the amount spent on food also increases. In other words, there is a
positive correlation between monthly income and expenditure on food.
• The graph also includes a few data points, which likely represent the actual values of food expenditure for
corresponding income levels.
• Note: We are looking at "Food Expenditure", which in book accounting terms can be 0 or negative. "Food Consumption" cannot be
negative, but expenditure can be negative. In some months the local general store owner can extend a line of credit, or the buyer
can make advance payments, etc.
Scale:
X-axis: 1 = Rs. 1,00,000 Y-axis: 1 = Rs. 10,000

Explanation of the Question with Numerical Values from the Graph
We analyze the relationship between the Monthly Income of individuals and their Food Expenditure based on the specific numerical values
provided in the graph. Here's how we can approach the question:

1. We would analyze the relationship:
• Correlation: Based on the data points, there appears to be a positive correlation between the Monthly Income of individuals and their
Expenditure on Food. As the income level rises, the expenditure on food also tends to increase.
• Trend: The data points suggest a linear trend, meaning Food Expenditure increases at a constant rate as the Monthly Income levels rise.
• Variability: There is some variability in the data, with some points falling slightly above or below the linear trend. This could be due
to other factors such as personal preferences, lifestyle choices, and cultural factors.

2. Linear Regression Line:
To quantify the relationship and estimate the expected food expenditure for different income levels, we can calculate the equation of
the linear regression line. The regression line equation is:
Y = -1,250 + 0.125X
Y represents food expenditure (in ₹).
X represents income (in ₹).
For every ₹1,000 increase in income, food expenditure tends to increase by ₹125.

3. Specific Examples:
• Income level of 40,000: Using the equation, the expected food expenditure for an income level of 40,000 is:
Food Expenditure = 0.125 * 40,000 + (-1,250) = Rs. 3,750
• Income level of 60,000: Using the equation, the expected food expenditure for an income level of 60,000 is:
Food Expenditure = 0.125 * 60,000 + (-1,250) = Rs. 6,250
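
These figures can be reproduced by evaluating the regression line with slope 0.125 and intercept -1,250, the values used in the worked examples. The function name `expected_food_expenditure` is an assumption made for this sketch.

```python
def expected_food_expenditure(income, slope=0.125, intercept=-1250):
    """Evaluate the fitted line: Food Expenditure = 0.125 * Income - 1250 (in rupees)."""
    return slope * income + intercept

for income in (40_000, 60_000):
    print(f"income Rs. {income:,}: expected food expenditure = Rs. {expected_food_expenditure(income):,.0f}")

# Effect of a Rs. 1,000 rise in income on expected food expenditure.
increase = expected_food_expenditure(41_000) - expected_food_expenditure(40_000)
print(f"increase per Rs. 1,000 of income = Rs. {increase:,.0f}")
```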

Note:

Calculating the Values for Expected Food Expenditure


The values 0.125 (m) and -1,250 (b) in the equation for the expected food expenditure come from the linear regression analysis of the
data points provided in the graph.
Here's a step-by-step explanation of how these values were obtained:

1. Linear Regression:
Linear regression is a statistical method used to find the best-fit line through a set of data points. This line
represents the overall trend of the relationship between two variables. In this case, the variables are the
Income level (x) and the Food Expenditure (y).

2. Equation of the Line:
• The equation of the linear regression line takes the form:
• y = mx + b
• where:
• y: Food Expenditure (dependent variable)
• x: Income level (independent variable)
• m: Slope of the line
• b: Y-intercept of the line

3. Calculating the Slope (m):
The slope of the line represents the rate of change in individual Food Expenditure for every unit increase in individual
Monthly Income. It can be calculated using the following formula:
m = (nΣ(xy) - Σ(x)Σ(y)) / (nΣ(x^2) - (Σ(x))^2)
where:
• Σ represents the sum of all values
• x and y are the individual data points
• x^2 is the square of each x value
• n is the number of data points
Applying this formula to the data points from the graph gives us a slope of 0.125.

4. Calculating the Y-intercept (b):
The Y-intercept represents the value of y when x is equal to 0. It can be calculated using the following formula:
b = (Σ(y) - mΣ(x)) / n
where:
• n is the number of data points
Applying this formula to the data points from the graph, together with the calculated slope, gives us a Y-intercept of -1.25 in graph
units. The scale of the Y-axis is taken as 1 unit = Rs. 1,000. Hence, we arrive at the value of -1,250 as the Y-intercept.



Thank You
