Influence and Outliers
Annalivia Polselli
February 9, 2023
Context
▶ I present my method to:
  ▶ visually detect and identify the types of anomalous units
  ▶ understand how these units affect the LS estimates
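Plots
$$\tilde y_{it} = \tilde x_{it}'\beta + \tilde u_{it}$$
▶ LS residuals: $\hat u_{it} = \tilde y_{it} - \tilde x_{it}'\hat\beta$
▶ Average normalised residual squared:
$$\hat u_i^{*} = \frac{1}{T}\sum_{t=1}^{T}\left(\frac{\hat u_{it}}{\sqrt{\sum_{i}\hat u_{it}^{2}}}\right)^{2}$$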
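As a rough illustration of the residual-based measure on this slide, the sketch below computes within-transformed LS residuals and the average normalised residual squared for a toy panel with one regressor. It assumes that the tildes denote the within (unit-demeaned) transformation and that, as the formula is written, the inner sum runs over units at each period; the data and the names `panel`, `within_ls` and `normres2` are made up for the example and are not part of `xtlvr2plot`.

```python
# Toy balanced panel (made-up numbers): unit id -> (y over time, x over time).
panel = {
    1: ([1.0, 2.1, 2.9, 4.2], [0.0, 1.0, 2.0, 3.0]),
    2: ([2.0, 2.4, 3.1, 3.4], [1.0, 1.5, 2.5, 3.0]),
    3: ([0.5, 3.0, 1.0, 6.0], [0.0, 1.0, 2.0, 3.0]),
}


def demean(values):
    """Within transformation (assumed): subtract the unit's own time average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def within_ls(panel):
    """LS slope and residuals on within-transformed data (one regressor)."""
    tilde = {i: (demean(y), demean(x)) for i, (y, x) in panel.items()}
    sxx = sum(xt * xt for _, x in tilde.values() for xt in x)
    sxy = sum(xt * yt for y, x in tilde.values() for yt, xt in zip(y, x))
    beta_hat = sxy / sxx
    resid = {
        i: [yt - beta_hat * xt for yt, xt in zip(y, x)]
        for i, (y, x) in tilde.items()
    }
    return beta_hat, resid


def normres2(resid):
    """Average normalised residual squared per unit.

    Each squared residual is divided by the sum of squared residuals over
    units in the same period (the inner sum as written on the slide),
    then averaged over the T periods.
    """
    n_periods = len(next(iter(resid.values())))
    denom = [sum(u[t] ** 2 for u in resid.values()) for t in range(n_periods)]
    return {
        i: sum(u[t] ** 2 / denom[t] for t in range(n_periods)) / n_periods
        for i, u in resid.items()
    }


beta_hat, u_hat = within_ls(panel)
for unit, value in normres2(u_hat).items():
    print(unit, round(value, 4))
```

Under this reading the values sum to one across units, so a unit whose share is far above 1/N stands out along the horizontal axis of the plot.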
xtlvr2plot – Leverage versus normalised residual squared plot for panel data.
options
Generated variables
lev        average individual leverage
normres2   average individual normalised residual squared
xtlvr2plot y x, ///
mlabel(id) ///
xlabel(, format(%9.3fc)) ///
ylabel(, angle(h) format(%9.3fc)) ///
title("Unit-wise Evaluation", size(medsmall)) ///
saving("xtlvr2plot_example.gph", replace)
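The slides do not show how `lev` is computed. A minimal sketch, assuming leverage is the diagonal of the hat matrix of the within-transformed regressor and that the unit-wise value is its time average, could look as follows. The data and the function name `average_leverage` are hypothetical.

```python
# Toy regressor values (made-up): unit id -> x over time.
panel_x = {
    1: [0.0, 1.0, 2.0, 3.0],
    2: [1.0, 1.5, 2.5, 3.0],
    3: [0.0, 1.0, 2.0, 9.0],  # one extreme value in the last period
}


def demean(values):
    """Within transformation (assumed): subtract the unit's own time average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def average_leverage(panel_x):
    """Time average of h_it = x_it^2 / sum(x^2), computed on demeaned x.

    With a single regressor the hat-matrix diagonal reduces to this ratio.
    """
    tilde = {i: demean(x) for i, x in panel_x.items()}
    sxx = sum(v * v for x in tilde.values() for v in x)
    return {i: sum(v * v for v in x) / (sxx * len(x)) for i, x in tilde.items()}


lev = average_leverage(panel_x)
for unit, value in lev.items():
    print(unit, round(value, 4))
```

Because the hat-matrix diagonal sums to the number of regressors, the unit averages multiplied by T sum to one in this single-regressor case, which gives a quick sanity check.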
[Figure: leverage (vertical axis) versus normalised residual squared (horizontal axis), with points labelled by unit id and regions marked GL units, BL units and VO units.]
▶ Conditional Effect
▶ $M_{i(j)} = C_{i(j)}(\hat\beta) / C_{ii}(\hat\beta)$
options
** Heat plot
xtinfluence y x, figure(heat) ///
keylabels(all) color(RdBu, reverse) ///
xlabel(5(10)100, angle(h) labsize(small)) ///
xmtick(##10) xmlabel(##2, angle(h)) ///
ylabel(5(10)100, angle(h)) ///
ymtick(##10) ymlabel(##2, angle(h)) ///
saving("xtinfluence_heat")
** Scatter plot
xtinfluence y x, figure(scatter) ///
xlabel(5(10)100, angle(h) labsize(small)) ///
xmtick(##10) xmlabel(##2, angle(h)) ///
ylabel(5(10)100, angle(h)) ///
ymtick(##10) ymlabel(##2, angle(h)) ///
saving("xtinfluence_scatter")
[Figure: xtinfluence heat plots and scatter plots, with Unit i on the horizontal axis and Unit j on the vertical axis.]
3. Units detected in (1), (2.1) and (2.3) are anomalous; (2.2) and (2.4) explain
how they affect the influence of other units and, hence, LS estimates
annalivia.polselli[at]essex.ac.uk
https://github.com/POLSEAN
References I
Aquaro, M. and Čížek, P. (2013). One-step robust estimation of fixed-effects panel data
models. Computational Statistics & Data Analysis, 57(1):536–548.
Aquaro, M. and Čížek, P. (2014). Robust estimation of dynamic fixed-effects panel data
models. Statistical Papers, 55(1):169–186.
Atkinson, A. and Mulira, H.-M. (1993). The stalactite plot for the detection of
multivariate outliers. Statistics and Computing, 3(1):27–35.
Banerjee, M. and Frees, E. W. (1997). Influence diagnostics for linear longitudinal
models. Journal of the American Statistical Association, 92(439):999–1005.
Belotti, F. and Peracchi, F. (2020). Fast leave-one-out methods for inference, model
selection, and diagnostic checking. The Stata Journal, 20(4):785–804.
Bramati, M. C. and Croux, C. (2007). Robust estimators for the fixed effects panel data
model. The Econometrics Journal, 10(3):521–540.
Chatterjee, S. and Hadi, A. S. (1988). Impact of simultaneous omission of a variable
and an observation on a linear regression equation. Computational Statistics & Data
Analysis, 6(2):129–144.
Davidson, R., MacKinnon, J. G., et al. (1993). Estimation and inference in econometrics.
OUP Catalogue.
Donald, S. G. and Maddala, G. (1993). Identifying outliers and influential
observations in econometric models. In Econometrics, volume 11 of Handbook of Statistics,
pages 663–701. Elsevier.
Jiao, X. (2022). A simple robust procedure in instrumental variables regression.
References II
Lawrance, A. (1995). Deletion influence and masking in regression. Journal of the Royal
Statistical Society: Series B (Methodological), 57(1):181–189.
MacKinnon, J. G. (2013). Thirty years of heteroskedasticity-robust inference. In Recent
advances and future directions in causality, prediction, and specification analysis,
pages 437–461. Springer.
MacKinnon, J. G. and White, H. (1985). Some heteroskedasticity-consistent covariance
matrix estimators with improved finite sample properties. Journal of Econometrics,
29(3):305–325.
Polselli, A. (2022). Essays on Econometric Methods. PhD thesis, University of Essex.
Rousseeuw, P. J. and Van Zomeren, B. C. (1990). Unmasking multivariate outliers and
leverage points. Journal of the American Statistical Association, 85(411):633–639.
Silva, J. S. (2001). Influence diagnostics and estimation algorithms for Powell's SCLS.
Journal of Business & Economic Statistics, 19(1):55–62.
Verardi, V. and Croux, C. (2009). Robust regression in Stata. The Stata Journal,
9(3):439–453.
Vertical outliers
[Figure: scatter plots of the dependent variable (vertical axis) against the independent variable (horizontal axis).]
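If $i \neq j$,
$$C_{ij}(\hat\beta) = \left(\hat\beta - \hat\beta_{(i,j)}\right)' \tilde X'\tilde X \left(\hat\beta - \hat\beta_{(i,j)}\right)(s^2 K)^{-1}$$
where
$$\hat\beta_{(i,j)} = \hat\beta_{(i)} - (\tilde X'\tilde X)^{-1}\left(\tilde X_i' M_i^{-1} H_{ij} + \tilde X_j'\right)\left(M_j - H_{ij}' M_i^{-1} H_{ij}\right)^{-1}\left(H_{ij}' M_i^{-1}\hat u_i + \hat u_j\right).$$
If $i = j$,
$$C_{ii}(\hat\beta) = \left(\hat\beta - \hat\beta_{(i)}\right)' \tilde X'\tilde X \left(\hat\beta - \hat\beta_{(i)}\right)(s^2 K)^{-1}$$
where
$$\hat\beta_{(i)} = \hat\beta - (\tilde X'\tilde X)^{-1}\tilde X_i' M_i^{-1}\hat u_i.$$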
These are the Banerjee and Frees (1997) metrics, as defined by Belotti and Peracchi (2020) for
linear panel data models with fixed effects.
Both measures are distributed as $F(\nu_1, \nu_2)$, from which a distributional cutoff can be chosen.
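To make the $i = j$ case concrete, the sketch below computes $C_{ii}$ on a toy panel by re-estimating the slope without unit $i$, and compares that brute-force estimate with the closed form above. It rests on several assumptions not stated on the slide: a single regressor ($K = 1$), tildes as unit-demeaned data, $M_i = I - H_{ii}$ with $H_{ii} = \tilde X_i(\tilde X'\tilde X)^{-1}\tilde X_i'$, and $s^2$ as the residual sum of squares divided by $NT - N - K$. The data and function names are hypothetical.

```python
# Toy balanced panel (made-up numbers): unit id -> (y over time, x over time).
panel = {
    1: ([1.0, 2.1, 2.9, 4.2], [0.0, 1.0, 2.0, 3.0]),
    2: ([2.0, 2.4, 3.1, 3.4], [1.0, 1.5, 2.5, 3.0]),
    3: ([0.5, 3.0, 1.0, 6.0], [0.0, 1.0, 2.0, 3.0]),
}


def demean(values):
    """Within transformation (assumed): subtract the unit's own time average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def slope(tilde, drop=()):
    """Return (LS slope, sum of squared x) without the units in `drop`."""
    keep = [i for i in tilde if i not in drop]
    sxx = sum(x * x for i in keep for x in tilde[i][1])
    sxy = sum(x * y for i in keep for y, x in zip(*tilde[i]))
    return sxy / sxx, sxx


def unit_influence(panel):
    """Map each unit to (C_ii, brute-force slope, closed-form slope)."""
    tilde = {i: (demean(y), demean(x)) for i, (y, x) in panel.items()}
    n_units = len(tilde)
    n_periods = len(next(iter(tilde.values()))[0])
    k = 1  # one regressor
    beta_hat, sxx = slope(tilde)
    rss = sum(
        (y - beta_hat * x) ** 2 for pair in tilde.values() for y, x in zip(*pair)
    )
    s2 = rss / (n_units * n_periods - n_units - k)  # assumed FE correction
    out = {}
    for i, (y_i, x_i) in tilde.items():
        beta_drop, _ = slope(tilde, drop=(i,))
        c_ii = (beta_hat - beta_drop) ** 2 * sxx / (s2 * k)
        # Scalar form of beta_hat - (X'X)^{-1} X_i' M_i^{-1} u_i when K = 1.
        xu = sum(x * (y - beta_hat * x) for y, x in zip(y_i, x_i))
        xx = sum(x * x for x in x_i)
        out[i] = (c_ii, beta_drop, beta_hat - xu / (sxx - xx))
    return out


for unit, (c_ii, brute, closed) in unit_influence(panel).items():
    print(unit, round(c_ii, 4), round(brute, 4), round(closed, 4))
```

Brute-force deletion costs one regression per unit; the closed form avoids this, which is the practical appeal of the leave-one-out expressions.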
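Conditional Influence
$$C_{i(j)}(\hat\beta) = \left(\hat\beta_{(i,j)} - \hat\beta_{(j)}\right)' \left(\sum_{\substack{i=1 \\ i \neq j}}^{N} \tilde X_{i(j)}'\tilde X_{i(j)}\right) \left(\hat\beta_{(i,j)} - \hat\beta_{(j)}\right)(s^2 K)^{-1}$$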
▶ $C_{i(j)}(\hat\beta) = 0$ for $i = j$
▶ $C_{i(j)}(\hat\beta) \neq C_{j(i)}(\hat\beta)$
▶ $C_{i(j)}(\hat\beta) \approx F(\nu_1, \nu_2)$, from which a distributional cutoff can be chosen
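The first two properties can be checked numerically. The sketch below computes $C_{i(j)}$ by brute-force deletion on a toy panel, together with the conditional effect $M_{i(j)} = C_{i(j)}/C_{ii}$. It is only an illustration under assumptions: one regressor, tildes as unit-demeaned data, the middle matrix read as the cross-product of the regressors in the sample without unit $j$, and the full-sample $s^2$ with $NT - N - K$ degrees of freedom. The data and function names are hypothetical.

```python
# Toy balanced panel (made-up numbers): unit id -> (y over time, x over time).
panel = {
    1: ([1.0, 2.1, 2.9, 4.2], [0.0, 1.0, 2.0, 3.0]),
    2: ([2.0, 2.4, 3.1, 3.4], [1.0, 1.5, 2.5, 3.0]),
    3: ([0.5, 3.0, 1.0, 6.0], [0.0, 1.0, 2.0, 3.0]),
}


def demean(values):
    """Within transformation (assumed): subtract the unit's own time average."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def slope(tilde, drop=()):
    """Return (LS slope, sum of squared x) without the units in `drop`."""
    keep = [i for i in tilde if i not in drop]
    sxx = sum(x * x for i in keep for x in tilde[i][1])
    sxy = sum(x * y for i in keep for y, x in zip(*tilde[i]))
    return sxy / sxx, sxx


def conditional_influence(panel):
    """Return (C_ii by unit, C_i(j) by pair, M_i(j) by pair)."""
    tilde = {i: (demean(y), demean(x)) for i, (y, x) in panel.items()}
    units = list(tilde)
    n_periods = len(tilde[units[0]][0])
    k = 1  # one regressor
    beta_hat, sxx = slope(tilde)
    rss = sum(
        (y - beta_hat * x) ** 2 for pair in tilde.values() for y, x in zip(*pair)
    )
    s2 = rss / (len(units) * n_periods - len(units) - k)  # assumed FE correction
    c_own = {
        i: (beta_hat - slope(tilde, (i,))[0]) ** 2 * sxx / (s2 * k) for i in units
    }
    c_cond = {}
    for j in units:
        beta_j, sxx_j = slope(tilde, (j,))  # sample without unit j
        for i in units:
            beta_ij, _ = slope(tilde, (i, j))  # sample without units i and j
            c_cond[(i, j)] = (beta_ij - beta_j) ** 2 * sxx_j / (s2 * k)
    # Conditional effect; undefined if deleting unit i leaves the slope unchanged.
    m = {(i, j): c_cond[(i, j)] / c_own[i] for (i, j) in c_cond}
    return c_own, c_cond, m


c_own, c_cond, m = conditional_influence(panel)
for pair in sorted(c_cond):
    print(pair, round(c_cond[pair], 4), round(m[pair], 4))
```

In the printed pairs the diagonal entries are zero and the $(i, j)$ and $(j, i)$ entries differ, which is why the heat plots are not symmetric around the diagonal.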
Data generating process