Lecture 1 - Introduction and Background

AA017004
INTRODUCTION AND BACKGROUND

LECTURE 1
DR. ELYA NABILA ABDUL BAHRI
AA017004 ECONOMETRIC THEORY AND METHODS 1

OUTLINE
 Introduction
 Identify the scope of research
 The flow chart of econometrics
 Type of Data
 The Data Structure
 Size of Data
 The model specification
 Descriptive Statistics
 Data plotting
 Data transformation
 Data cleaning
 Inference
 Hypothesis
 Estimation
 Interpretation
INTRODUCTION
 Methods refer to the tools or techniques applied in the research process.

 Procedures are the way we put the tools and techniques together, in specific sequences and combinations, to
achieve the objectives of the research study.
 The concern of methodology is that appropriate methods and procedures are selected, designed and applied so
as to both achieve the research objectives and produce reliable knowledge.
Source: Ethridge, Don. Published by Iowa State University Press/AMES
PURPOSES OF METHODS AND PROCEDURES
 The central purpose of methods and procedures in the research project proposal is to provide the plan, and its
description, of how the objectives of the study will be achieved.
 It is the what, why, and how of the research project: step-by-step specification and description of what will be
done, how it will be done, and why it will be done in the specified manner.
 How it will be done includes the order or sequence of steps in the process. It includes specification of the
quantitative estimation techniques, analytical methods, data that will be collected, and how the data will be
obtained and processed.

 Methods and procedures also include how each of these relates to the objectives of the research; how empirical
estimates will be tested and analyzed; model development and design; justification of the model's mathematical form;
how results will be interpreted; and so on.
 From a more general perspective, the methods and procedures delineate the approach for testing the hypotheses
of the study.
 Recall that hypotheses may be qualitative as well as quantitative and that hypotheses are not always stated
explicitly, particularly in economic research.
 The methods and procedures specify the means through which the various hypotheses are tested (examined).

 The appropriateness of the methods and procedures depends on the problem and the objectives.
 This is yet another reminder of the importance of the problem identification and specification in the research
process.
 Methods and procedures directly address the objectives, but the research objectives are derived from the
research problem.
 Consequently, the research methods and procedures are driven by the problem and objectives, not the other way
around.

 The most complex or sophisticated methods and
procedures are not always the most appropriate.
 Those characteristics may influence the publishability of a
piece of research for an academic journal, but the
complexity, in itself, will not make them more effective or
appropriate.
 Complexity may only make the methods and procedures
inefficient.
 If you keep focus on (1) identifying meaningful
researchable problems, (2) specifying appropriate
objectives, and (3) developing appropriate methods and
procedures to achieve those objectives, your research is
more likely to be well received.
IDENTIFY THE SCOPE OF RESEARCH
 Area of research
 Scope of study
 One individual or one time period
 Many individuals or a long time series
 Many individuals across varying time periods
 Identify the related variables
 Model specification
 Based on the literature
 Dependent variable and independent variables
 Focal variable, control variable
 Identify the data collection
 Primary or secondary data
 Quantitative or qualitative
FLOW CHART OF ECONOMETRICS

TYPE OF DATA
 Continuous
 Discrete
 Ordinal
 Likert scale
 Nominal
 Dichotomous
 Count/Non-negative

TYPE OF DATA STRUCTURE
 Cross-section : many cross-sectional units (i), e.g. individuals, firms, countries
 Time Series : a long series over time (t), e.g. per second, minute, hour, day, week, month, quarter, or year
 Panel Data : the combination of cross-section and time series (it)

MODEL SPECIFICATION: LINEAR MODEL
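 Cross-section
$y_i = \alpha + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i, \quad i = 1, 2, \dots, N$
 Time series
$y_t = \alpha + \beta_1 X_{1t} + \beta_2 X_{2t} + \varepsilon_t, \quad t = 1, 2, \dots, T$
 Panel data
$y_{it} = \alpha + \beta_1 X_{1it} + \beta_2 X_{2it} + \varepsilon_{it}, \quad i = 1, 2, \dots, N \text{ and } t = 1, 2, \dots, T$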
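The indexing of the three data structures can be sketched with plain Python containers. This is only an illustration: the unit labels, years, and values are made up, and `panel_dimensions` is a hypothetical helper, not part of any library.

```python
# Hypothetical toy data: the values are illustrative only, not real observations.

# Cross-section: one observation per unit i (N units, a single period).
cross_section = {"A": 2.1, "B": 3.4, "C": 1.8}

# Time series: one unit observed over t = 1, ..., T.
time_series = {2019: 2.1, 2020: 2.5, 2021: 2.9, 2022: 3.0}

# Panel: each observation is indexed by the pair (i, t).
panel = {
    ("A", 2019): 2.1, ("A", 2020): 2.5,
    ("B", 2019): 3.4, ("B", 2020): 3.1,
    ("C", 2019): 1.8, ("C", 2020): 2.0,
}


def panel_dimensions(data):
    """Return (N, T, balanced) for a dict keyed by (unit, period)."""
    units = {unit for unit, _ in data}
    periods = {period for _, period in data}
    # A balanced panel has exactly one observation for every (i, t) pair.
    balanced = len(data) == len(units) * len(periods)
    return len(units), len(periods), balanced


n_units, n_periods, is_balanced = panel_dimensions(panel)
print(n_units, n_periods, is_balanced)
```

Dropping any single (i, t) entry from `panel` makes the helper report an unbalanced panel.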

SIZE OF DATA
 Cross-section (N) : Depends on the sample-size determination technique
 Krejcie and Morgan (1970) table
 Statistical calculation
 Time Series (T) :
 Short time series: T < 10 years
 Long time series: T > 30 years
 Panel Data
 T>N
 T<N
 T=N

THE ISSUES OF THE DATA
 Reliability and validity of the data - selection bias, accuracy, consistency of responses by respondents
 Source of data
 Data discrepancy
 Variable and proxy used
 Data measurement

EXAMPLE: DATA FROM WORLD BANK

DESCRIPTIVE STATISTICS
 Mean
 Minimum
 Maximum
 Skewness
 Kurtosis
 Distribution : Jarque-Bera test
 Number of observations
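The statistics listed above can be computed by hand in a few lines. The sketch below assumes population (divide-by-n) moments and the standard Jarque-Bera form JB = n/6 · (S² + (K − 3)²/4); the function name `describe` and the sample values are hypothetical.

```python
def describe(values):
    """Return basic descriptive statistics, including the Jarque-Bera statistic."""
    n = len(values)
    mean = sum(values) / n
    # Central moments in population form (dividing by n).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # equals 3 for a normal distribution
    # Under normality, JB is asymptotically chi-square with 2 degrees of freedom.
    jarque_bera = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    return {
        "mean": mean,
        "minimum": min(values),
        "maximum": max(values),
        "skewness": skewness,
        "kurtosis": kurtosis,
        "jarque_bera": jarque_bera,
        "observations": n,
    }


sample = [1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical data
for name, value in describe(sample).items():
    print(name, value)
```

A large JB value relative to the chi-square(2) critical value (about 5.99 at the 5% level) is evidence against normality; with only five observations, as here, the asymptotic approximation is not reliable.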

DATA PLOTTING
 Line graph : to identify the trend – deterministic, fluctuating, random
 Bar graph : to compare individuals by category
 Pie chart : to compare percentages or market shares
 Scatter plot : to identify the relationship between variables
 Box plot : to identify the data scale and potential outliers

DATA TRANSFORMATION
 Standardizing scale
 Natural logarithm : Continuous data
 Demean : Continuous data
 Normalization : discrete or continuous data
 Data smoothing (time series)
 Seasonal adjustment
 Denoise – Kalman Filter, wavelet, Hodrick-Prescott
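The scale transformations above are simple element-wise operations. The following is a minimal sketch; it assumes that "normalization" means min-max rescaling to [0, 1] and that standardizing uses the sample standard deviation, neither of which the slide specifies. All function names are hypothetical.

```python
import math


def natural_log(values):
    """Natural logarithm; only defined for strictly positive data."""
    return [math.log(v) for v in values]


def demean(values):
    """Subtract the sample mean from every observation."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def min_max_normalize(values):
    """Rescale to the [0, 1] interval (assumed meaning of 'normalization')."""
    lowest, highest = min(values), max(values)
    return [(v - lowest) / (highest - lowest) for v in values]


def standardize(values):
    """Z-score: demean, then divide by the sample standard deviation."""
    n = len(values)
    centred = demean(values)
    std_dev = math.sqrt(sum(c ** 2 for c in centred) / (n - 1))
    return [c / std_dev for c in centred]


data = [10.0, 20.0, 30.0]  # hypothetical continuous series
print(natural_log(data))
print(demean(data))
print(min_max_normalize(data))
print(standardize(data))
```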

DATA CLEANING
 Outlier (cross-section)
 Outliers affect the estimation
 Outlier detection: DFFITS, Cook’s D, scatter plot
 Missing value
 Interpolation
 High frequency to low frequency
 Low frequency to high frequency
 Unbalanced panel
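One simple way to handle interior missing values is linear interpolation between the nearest observed neighbours. This is a sketch under that assumption only; the slide does not say which interpolation method is intended, and `interpolate_linear` is a hypothetical helper. Missing values are represented by `None`.

```python
def interpolate_linear(series):
    """Fill interior None gaps by linear interpolation between neighbours.

    Leading and trailing gaps are left as None because there is no
    observed value on one side to interpolate from.
    """
    filled = list(series)
    known = [k for k, value in enumerate(filled) if value is not None]
    for left, right in zip(known, known[1:]):
        gap = right - left
        for k in range(left + 1, right):
            weight = (k - left) / gap
            filled[k] = filled[left] + weight * (filled[right] - filled[left])
    return filled


quarterly = [1.0, None, 3.0, None, None, 6.0]  # hypothetical series with gaps
print(interpolate_linear(quarterly))
```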

HYPOTHESIS
Every test must have a hypothesis

 Null hypothesis : the opposite of the expected outcome
 Alternative hypothesis : the expected outcome

LINEAR REGRESSION
 The regression has five key assumptions:

 Linear relationship
 Multivariate normality
 No or little multicollinearity
 No auto-correlation
 Homoscedasticity

ESTIMATION
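 Choose an estimator that is compatible with the data structure and sample size
 Calculate the coefficients to investigate the relationship, its sign and its size
$\hat{y}_{it} = \alpha + \hat{\beta}_1 X_{1it} + \hat{\beta}_2 X_{2it} + \hat{\varepsilon}_{it}$
$\hat{y}_{it} = \alpha + 0.5 X_{1it} - 0.3 X_{2it} + \hat{\varepsilon}_{it}$
s.e.: (0.23) for X1, (0.12) for X2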

INFERENCE
 The relationship is assessed using the p-value or t-test.

 Unrestricted model
$\hat{y}_{it} = \alpha + 0.5 X_{1it} - 0.3 X_{2it} + \hat{\varepsilon}_{it}$
s.e.: (0.23) for X1, (0.12) for X2
 Identify which independent variables (X) have a statistically significant relationship with the dependent variable (Y)
 A 1-unit increase in X1 is associated with a 0.5-unit increase in y
 A 1-unit increase in X2 is associated with a 0.3-unit decrease in y
 If both the IV and DV are in log form, a 1% increase in X1 is associated with a 0.5% increase in y, which is the
elasticity of Y with respect to X.
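The t-test behind this inference is the coefficient divided by its standard error. The sketch below uses the coefficients and standard errors from the unrestricted model above; the 1.96 cut-off is an assumption (the large-sample, two-sided 5% critical value), and the names are hypothetical.

```python
coefficients = {"X1": 0.5, "X2": -0.3}
standard_errors = {"X1": 0.23, "X2": 0.12}

# Assumption: large-sample, two-sided test at the 5% level.
CRITICAL_VALUE_5PCT = 1.96


def t_ratio(coefficient, standard_error):
    """t-ratio for H0: coefficient = 0."""
    return coefficient / standard_error


for name, coefficient in coefficients.items():
    t = t_ratio(coefficient, standard_errors[name])
    significant = abs(t) > CRITICAL_VALUE_5PCT
    print(f"{name}: t = {t:.2f}, significant at 5%: {significant}")
```

Under these assumptions both t-ratios exceed 1.96 in absolute value, so both variables would be judged statistically significant at the 5% level.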

INFERENCE (CONT.)
 Restricted model
$\hat{y}_{it} = 0.2 y_{it-1} + 0.4 y_{it-2} + 0.5 X_{1it} - 0.3 X_{2it} + \hat{\varepsilon}_{it}$
s.e. (0.05) (0.23) (0.12)
 The interpretation depends on the sign and size.
Normalization (long-run coefficient):
 X1 = 0.5 / (1 - (0.2 + 0.4)) = 1.25
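The normalization divides each short-run coefficient by one minus the sum of the coefficients on the lagged dependent variable. A minimal sketch using the numbers from the model above; `long_run_coefficient` is a hypothetical helper name.

```python
def long_run_coefficient(short_run_coefficient, lag_coefficients):
    """Long-run effect: beta / (1 - sum of lagged-dependent coefficients)."""
    denominator = 1 - sum(lag_coefficients)
    if abs(denominator) < 1e-12:
        raise ValueError("lag coefficients sum to 1; the long-run effect is undefined")
    return short_run_coefficient / denominator


lags = [0.2, 0.4]  # coefficients on y(t-1) and y(t-2)
print(round(long_run_coefficient(0.5, lags), 4))   # X1
print(round(long_run_coefficient(-0.3, lags), 4))  # X2
```

The same divisor applies to every regressor, so X2's long-run coefficient is -0.3 / 0.4 = -0.75.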

DIAGNOSTIC CHECKING
 Heteroscedasticity – ARCH test, Breusch-Pagan-Godfrey test, White test
 Autocorrelation – Breusch-Godfrey LM test, Durbin-Watson test
 Multicollinearity – Variance Inflation Factor (VIF), Pearson correlation
 Misspecification – Ramsey RESET test
Other issues that need to be considered:
 Selection bias – Heckman selection bias test
 Endogeneity problem – Durbin-Wu-Hausman test
 Dynamic effect – lagged dependent variable
 Persistence
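Of the diagnostics above, the Durbin-Watson statistic is easy to compute by hand from the regression residuals: the sum of squared successive differences divided by the sum of squared residuals. The residual values below are made up for illustration.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic for first-order autocorrelation.

    Values near 2 suggest no first-order autocorrelation, values toward 0
    suggest positive autocorrelation, and values toward 4 suggest negative
    autocorrelation.
    """
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


persistent = [1.0, 1.0, -1.0, -1.0]   # hypothetical residuals that move in runs
alternating = [1.0, -1.0, 1.0, -1.0]  # hypothetical residuals that flip sign
print(durbin_watson(persistent))
print(durbin_watson(alternating))
```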

RESEARCH DESIGN
 Issue
 Problem statement
 Research question
 Research objective
 Hypothesis
 Estimation

EXAMPLE 1
 Issue: The inflation rate is rising faster than wages

 Problem statement: people cannot afford, on their current wages, the goods and services needed for a better life,
so consumers do not enjoy higher utility
 Research question: What factors can reduce inflation?
 Research objective
 To investigate which factors (interest rate, exchange rate, consumption) can reduce inflation.
 Scope of Study
 100 selected developed and developing countries in the year 2019

EXAMPLE 1 (CONTINUED)
Conceptual framework: Interest rate → Inflation (H1); Exchange rate → Inflation (H2); Consumption → Inflation (H3)
 Hypothesis
 H0: A higher interest rate cannot reduce inflation
 H1: A higher interest rate can reduce inflation
 This means the expected sign of the IR coefficient is negative.

EXAMPLE 1 (CONTINUED)
$INF_i = \alpha + \beta_1 IR_i + \beta_2 ER_i + \beta_3 C_i + \varepsilon_i, \quad i = 1, 2, \dots, N$
 Estimation
 Ordinary least squares
 Two-stage least squares
 Diagnostic tests
 Heteroscedasticity, autocorrelation, multicollinearity, misspecification

RESULTS INTERPRETATION
 The relationship is based on the p-value or t-test.

 Unrestricted model
$\widehat{\ln INF}_i = \alpha - 0.5 \ln IR_i + 0.3 \ln ER_i + 0.7 \ln C_i + \hat{\varepsilon}_i$
p-values: (0.00) for ln IR, (0.00) for ln ER, (0.00) for ln C
 A 1% increase in IR is associated with a 0.5% decrease in INF
 A 1% increase in ER is associated with a 0.3% increase in INF
 A 1% increase in C is associated with a 0.7% increase in INF

THE BOX-COX TRANSFORMATION
Table 1: Interpretation in logarithmic models
Name | Functional form | Interpretation
Linear | Y = β1 + β2X | A 1-unit change in X induces a β2-unit change in Y
Linear-log | Y = β1 + β2lnX | A 1 percent change in X induces a β2/100 unit change in Y
Log-linear | lnY = β1 + β2X | A 1-unit change in X induces a 100β2 percent change in Y
Double-log | lnY = β1 + β2lnX | A 1 percent change in X induces a β2 percent change in Y
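The interpretations in Table 1 can be turned into small helper functions. The function names and the coefficient values in the usage lines are hypothetical. The log-linear rule is an approximation that works well for small β2; the exact percentage change is 100·(exp(β2) − 1), which the sketch also computes for comparison.

```python
import math


def effect_linear_log(beta, pct_change_x=1.0):
    """Linear-log: approximate unit change in Y for a percent change in X."""
    return beta * pct_change_x / 100


def effect_log_linear(beta, unit_change_x=1.0):
    """Log-linear: approximate percent change in Y for a unit change in X."""
    return 100 * beta * unit_change_x


def exact_effect_log_linear(beta, unit_change_x=1.0):
    """Log-linear: exact percent change in Y for a unit change in X."""
    return 100 * (math.exp(beta * unit_change_x) - 1)


def effect_double_log(beta, pct_change_x=1.0):
    """Double-log: approximate percent change in Y (the elasticity)."""
    return beta * pct_change_x


# Hypothetical coefficients, for illustration only.
print(effect_linear_log(20.0))                  # unit change in Y
print(effect_log_linear(0.05))                  # approximate percent change in Y
print(round(exact_effect_log_linear(0.05), 3))  # exact percent change in Y
print(effect_double_log(0.5))                   # percent change in Y
```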

ESTIMATION METHODS FOR CROSS-SECTION ANALYSIS
 Ordinary Least Squares (OLS)
 Two-Stage Least Squares (2SLS)
 Three-Stage Least Squares (3SLS)
 Binary Logistic Regression
 Ordered Logit Regression
 Multinomial Regression
 Threshold Regression (Cross-section)

CROSS-SECTION DATA
No. | Dependent Variable | Independent Variable | Methods of Estimation | Remark
1. | Scale | Scale / Discrete | Ordinary Least Squares | If the OLS assumptions are fulfilled
2. | Scale | Scale / Discrete | Two-Stage Least Squares | Overcomes the endogeneity problem
3. | Scale | Scale / Discrete | Three-Stage Least Squares | Simultaneous equations
4. | Binary / Dichotomous | Scale / Discrete | Binary Logistic Regression | Interpretation based on probability, odds ratio
5. | Ordinal | Scale / Discrete | Ordered Logit Regression | Interpretation based on probability, odds ratio
6. | Nominal | Scale / Discrete | Multinomial Regression | Interpretation based on probability, relative risk ratio
7. | Scale | Scale / Discrete | Threshold Regression | The relationship between IV and DV changes once the threshold point is surpassed

ESTIMATION METHODS FOR TIME SERIES ANALYSIS
 Ordinary Least Squares (OLS)
 Vector Autoregression (VAR)
 Vector Error Correction Model (VECM)
 Autoregressive Distributed Lag (ARDL)
 Self-Exciting Threshold Autoregression (SETAR)
 Asymmetric Cointegration

TIME SERIES DATA
No. | Stationarity | Order of Integration | Cointegration | Methods | Remark
1. | All stationary at level | I(0) | No (Johansen-Juselius test) | Vector Autoregressive (VAR) | No long run relationship
2. | All stationary at first difference | I(1) | No (Johansen-Juselius test) | Vector Autoregressive (VAR) | No long run relationship
3. | All stationary at first difference | I(1) | Yes (Johansen-Juselius test) | Vector Error Correction Model (VECM) | Long run relationship
4. | Stationary at level and first difference | I(0) & I(1) | Yes (Bounds test) | Autoregressive Distributed Lag (ARDL) | Short run relationship; long run relationship
   | All stationary at level or first difference | I(0) or I(1) | No (Johansen-Juselius test) | Structural VAR | Incorporated with the shock response
5. | - | - | - | Self-Exciting Threshold Autoregression (SETAR) | To test the threshold effect
6. | - | - | Yes (Johansen-Juselius test) | Asymmetric Cointegration | To test the asymmetric effect
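Rows 1 to 4 of the table amount to a decision rule based on the orders of integration and the cointegration test result. The sketch below encodes only those four rows; `choose_time_series_method` is a hypothetical helper, and the unit-root and cointegration tests themselves are assumed to have been run beforehand.

```python
def choose_time_series_method(integration_orders, cointegrated):
    """Map rows 1-4 of the table to a method name.

    integration_orders: orders of integration found among the variables,
        e.g. [0, 0], [1, 1] or [0, 1].
    cointegrated: result of the cointegration test (True or False).
    """
    orders = set(integration_orders)
    if orders == {0}:
        return "VAR"   # row 1: all I(0)
    if orders == {1}:
        # rows 2 and 3: all I(1), method depends on cointegration
        return "VECM" if cointegrated else "VAR"
    if orders == {0, 1}:
        return "ARDL"  # row 4: mixture of I(0) and I(1), bounds test
    raise ValueError("orders of integration above 1 are not covered by the table")


print(choose_time_series_method([1, 1, 1], cointegrated=True))
print(choose_time_series_method([0, 1], cointegrated=True))
```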
ESTIMATION METHODS FOR PANEL DATA ANALYSIS
 Pooled Ordinary Least Squares (POLS)
 Static Panel (POLS, FE, RE)
 Dynamic Panel Data (Generalized Method of Moments (GMM))
 Panel 2SLS, Panel 3SLS
 Panel Seemingly Unrelated Regression
 Panel Cointegration
 Panel VECM
 Panel VAR
 Panel Threshold Regression

PANEL DATA
No. | Sample Size | Model | Methods | Remark
1. | N ≈ T (small sample) | Static Panel Data (POLS, FE, RE) | POLS (Ordinary Least Squares); FE (Within regression); RE (GLS) | One-way: incorporates the heterogeneity of the cross-section (N)
2. | N ≈ T (small sample) | Static Panel Data (POLS, FE, RE) | Two-Stage Least Squares | To overcome the endogeneity problem
3. | N ≈ T (small sample) | Dynamic Panel Data | Least Square Dummy Variable Bias-Corrected (LSDVC) | To solve the endogeneity problem with dynamic effect
4. | N = T or N > T | Panel Data | Hausman-Taylor | To solve the endogeneity problem with time-invariant variables
5. | N > T | Dynamic Panel Data | Generalized Method of Moments: Difference-GMM, System-GMM | N > 50, 4 < T < 10; one-step, two-step
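The FE (within regression) entry in row 1 works by subtracting each unit's own time average from its observations, which removes any time-invariant unit effect. A minimal sketch of that demeaning step, with made-up data and a hypothetical helper name:

```python
from collections import defaultdict


def within_transform(panel):
    """Subtract each unit's time average from its observations.

    panel: dict keyed by (unit, period) with numeric values.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for (unit, _), value in panel.items():
        totals[unit] += value
        counts[unit] += 1
    return {
        (unit, period): value - totals[unit] / counts[unit]
        for (unit, period), value in panel.items()
    }


# Hypothetical panel: unit B has a much higher level than unit A.
y = {("A", 1): 1.0, ("A", 2): 3.0, ("B", 1): 10.0, ("B", 2): 20.0}
print(within_transform(y))
```

After the transformation every unit has mean zero, so differences in levels across units no longer drive the estimates; the same step would be applied to each regressor before running OLS on the demeaned data.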

PANEL DATA (LONG TIME SERIES)
No. | Sample Size | Model | Order of integration | Cointegration | Methods | Remark
6. | T>N | Non-stationary Panel | I(1) | Yes (Pedroni) | Fully Modified Ordinary Least Squares (FMOLS) | To solve endogeneity and heterogeneity
7. | T>N | Non-stationary Panel | I(1) | Yes (Pedroni) | Dynamic Ordinary Least Squares (DOLS) | Incorporated with dynamic effect (lead and lag)
8. | T>N | Panel VECM | I(1) | Yes (JJ) | Panel VECM (estimation using MLE) | Short run causality; long run relationship
9. | T>N | Panel VAR | I(0) or I(1) | No (JJ) | Panel VAR | Short run causality
10. | T>N | Heterogeneous Dynamic Panel Cointegration | I(0) & I(1) | Yes (Westerlund) | Pooled Mean Group (PMG); Mean Group (MG) | Short run relationship; long run relationship
11. | T>N | Non-stationary Panel with Cross-Sectional Dependency | I(0) & I(1) | Yes (Westerlund) | Common Correlated Mean Group (CCMG) | Short run relationship; long run relationship
CORRELATION
Var(X+Y) = Var(X) + Var(Y) + 2·Cov(X,Y)
$\mathrm{Corr}(X, Y) = \dfrac{\mathrm{Cov}(X, Y)}{\mathrm{StdDev}(X) \cdot \mathrm{StdDev}(Y)}$
The correlation between two random variables is a dimensionless number between -1 and 1.
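Both formulas can be checked numerically. The sketch below uses population (divide-by-n) moments and made-up data; the helper names are hypothetical.

```python
import math


def covariance(x, y):
    """Population covariance of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n


def variance(x):
    return covariance(x, x)


def correlation(x, y):
    """Covariance divided by the product of the standard deviations."""
    return covariance(x, y) / (math.sqrt(variance(x)) * math.sqrt(variance(y)))


x = [1.0, 2.0, 3.0, 4.0]  # hypothetical data
y = [2.0, 1.0, 4.0, 3.0]
total = [a + b for a, b in zip(x, y)]

left = variance(total)
right = variance(x) + variance(y) + 2 * covariance(x, y)
print(left, right)        # the two sides of the variance identity
print(correlation(x, y))  # always between -1 and 1
```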

INTERPRETATION
Correlation measures the strength of the linear relationship between two variables.
 Strength
 not the slope
 Linear
 misses nonlinearities completely
 Two
 shows only “shadows” of multidimensional relationships
A correlation of +1 would arise only if all of the points lined up perfectly.
Stretching the diagram horizontally or vertically would change the perceived slope, but not the correlation.

A positive correlation signals that large values of one variable are typically associated with large values of the other.
Correlation measures the “tightness” of the clustering about a single line.

A negative correlation signals that large values of one variable are typically associated with small values of the other.

Independent random variables have a correlation of 0.
But a correlation of 0 most certainly does not imply independence.
Indeed, correlations can completely miss nonlinear relationships.
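Two of these points are easy to verify with made-up numbers: a perfectly deterministic but nonlinear relationship (y = x² on values symmetric around zero) has a correlation of exactly 0, and stretching one axis leaves the correlation unchanged. The helper below is a hypothetical, self-contained implementation of the correlation formula.

```python
import math


def correlation(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# y depends entirely on x, yet the linear correlation is zero.
x = [-2.0, -1.0, 0.0, 1.0, 2.0]
y = [value ** 2 for value in x]
print(correlation(x, y))

# Stretching the horizontal axis changes the slope but not the correlation.
u = [1.0, 2.0, 3.0, 4.0, 5.0]
v = [1.0, 3.0, 2.0, 5.0, 4.0]
stretched_u = [10 * value for value in u]
print(correlation(u, v), correlation(stretched_u, v))
```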

EXAMPLE: CUSTOMER SATISFACTION
 Consider overall customer satisfaction (on a 100-point scale) with a Web-based provider of customized software as the order leadtime
(in days), product acquisition cost, and availability of online order-tracking (0 = not available, 1 = available) vary.
 Here are the correlations with satisfaction:

leadtime -0.766
ol-tracking -0.242
cost 0.097
 Customers forced to wait are unhappy.

 Those without access to online order tracking are more satisfied.
 Those who pay more are somewhat happier.
 ?????

THE FULL REGRESSION
Regression: satisfaction

statistic | constant | leadtime | cost | ol-track
coefficient | 192.7338 | -6.8856 | -1.8025 | 8.5599
std error of coef | 16.1643 | 0.5535 | 0.3137 | 4.0729
t-ratio | 11.9234 | -12.4391 | -5.7453 | 2.1017
significance | 0.0000% | 0.0000% | 0.0000% | 4.0092%
beta-weight | | -1.0879 | -0.4571 | 0.1586

standard error of regression 13.9292
coefficient of determination 75.03%
adjusted coef of determination 73.70%

Customers dislike high cost, and like online order tracking.
Why does customer satisfaction vary? Primarily because leadtimes vary; secondarily, because cost varies.

RECONCILIATION
 | satisfaction | leadtime | cost | ol-tracking
satisfaction | 1.000 | -0.766 | 0.097 | -0.242
leadtime | -0.766 | 1.000 | -0.543 | 0.465
cost | 0.097 | -0.543 | 1.000 | -0.230
ol-tracking | -0.242 | 0.465 | -0.230 | 1.000
 Customers can pay extra for expedited service (shorter leadtime at moderate extra cost), or for express service
(shortest leadtime at highest cost)
 Those who chose to save money and wait longer ended up (slightly) regretting their choice.
 Most customers who chose rapid service weren’t given access to order tracking.
 They didn’t need it, and were still happy with their fast deliveries.
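The reconciliation can also be checked arithmetically. In a standardized regression, the simple correlation between the dependent variable and a regressor equals the sum of every beta-weight times that regressor's correlation with the corresponding variable. The sketch below applies this identity to the beta-weights and the correlations among the regressors reported in the example; the helper names are hypothetical.

```python
# Beta-weights from the full regression.
beta = {"leadtime": -1.0879, "cost": -0.4571, "ol-tracking": 0.1586}

# Correlations among the regressors from the correlation matrix.
regressor_corr = {
    ("leadtime", "cost"): -0.543,
    ("leadtime", "ol-tracking"): 0.465,
    ("cost", "ol-tracking"): -0.230,
}


def corr(a, b):
    """Look up the correlation between two regressors (1 on the diagonal)."""
    if a == b:
        return 1.0
    return regressor_corr.get((a, b), regressor_corr.get((b, a)))


def implied_correlation(regressor):
    """Correlation with satisfaction implied by the beta-weights."""
    return sum(weight * corr(name, regressor) for name, weight in beta.items())


for name in beta:
    print(name, round(implied_correlation(name), 3))
```

The implied values match the simple correlations with satisfaction quoted in the example (-0.766, 0.097, -0.242). The small positive correlation for cost arises because the strong negative direct effect of cost is offset by the indirect effect through leadtime: higher cost goes with shorter leadtime, which customers value highly.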

CAUSALITY
 Causality (also referred to as causation, or cause and effect) is the influence by which one event, process, state,
or object (a cause) contributes to the production of another event, process, state, or object (an effect) where the
cause is partly responsible for the effect, and the effect is partly dependent on the cause.
 Unidirectional : x is a cause of y, but y is not a cause of x (x → y, y ↛ x) (exogenous)
 Bidirectional : x is a cause of y and y is a cause of x (x ↔ y) (endogenous)
 Neutral : x is not a cause of y and y is not a cause of x (x ↮ y) (neutral)
 Causality can be used for time series (Granger causality based on VAR or VECM) and panel data (Granger
causality, Dumitrescu-Hurlin causality).

SOFTWARE
 SPSS
 EViews
 Stata
 MATLAB
 GAUSS
 RATS
 OxMetrics

REFERENCES
 Angrist, J. D. and Pischke, J. (2008). Mostly Harmless Econometrics:
An Empiricist’s Companion. Princeton University Press. (AP)
 Greene, W. (2018). Econometric Analysis, 8th edition. Pearson. (G1)
 Hansen, B. E. (2021). Econometrics. Princeton University Press. (H1)
 Wooldridge, J. M. (2020). Introductory Econometrics: A Modern
Approach, 7th ed. Cengage. (W2)
 Wooldridge, J. M. (2010). Econometric Analysis of Cross Section and
Panel Data, 2nd ed. The MIT Press. (W1)
