
ECONOMETRICS & DATA ANALYSIS

ECON 131
Week 6, Lecture 2

Hypothesis Testing with Categorical Data


October 7, 2015


Hypothesis Testing so far


•  One sample test: µ = µ0
•  Two sample test: µ1 = µ2

•  Large samples and small samples

•  Independent samples and paired samples

•  One-sided tests and two-sided tests


Variables have always been continuous.



Today’s Agenda

1.  Z-tests of proportions (binary outcomes)
•  One sample (H0: µ=µ0)
•  Two samples (H0: µ1=µ2)

2.  Chi-square tests
•  Are distributions of an outcome the same across multiple subgroups?

3.  Fisher’s exact test
•  Same as Chi-square test, but doesn’t rely on the CLT

Which of the following determinants of political participation do you find most interesting?
A.  Income
B.  Education
C.  Race
D.  Home ownership
E.  Religiosity
F.  Political views (liberal/conservative)
G.  Age
H.  Sex
I.  Marital Status

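Large Sample Tests of Means

One sample
•  H0: µ = µ0
•  Test statistic: $$ z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$
•  Distribution of test statistic: Z ~ N(0,1)

Two independent samples
•  H0: µ1 = µ2
•  Pooled standard deviation: $$ s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$
•  Test statistic: $$ z = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
•  Distribution of test statistic: Z ~ N(0,1)

Two paired samples
•  H0: µ1 = µ2
•  Standard deviation of the differences: $$ s_d = \sqrt{\frac{1}{n - 1}\sum_i (d_i - \bar{d})^2} $$
•  Test statistic: $$ z = \frac{\bar{d}}{s_d / \sqrt{n}} $$
•  Distribution of test statistic: Z ~ N(0,1)

Key assumption: Sample size is large (n > 30).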

Large Sample Tests of Means

One sample
•  H0: µ = µ0
•  Test statistic: $$ z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$
•  Distribution of test statistic: Z ~ N(0,1)

Large sample tests for binary outcome

One sample
•  H0: µ = µ0
•  Test statistic: $$ z = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} $$
•  Distribution of test statistic: Z ~ N(0,1)

Xi is a Bernoulli random variable, and $\bar{x}$ is an estimate of p.

Under H0:
•  E[Xi] = p = µ0
•  Var[Xi] = p(1-p)
•  SD[Xi] = $\sqrt{p(1 - p)} = \sqrt{\mu_0(1 - \mu_0)}$

We can use this instead of s when computing our test statistic.

Large sample tests for binary outcome

One sample
•  H0: µ = µ0
•  Test statistic: $$ z = \frac{\bar{x} - \mu_0}{\sqrt{\mu_0(1 - \mu_0) / n}} $$
•  Distribution of test statistic: Z ~ N(0,1)

Testing a mean in one sample in Stata

. ttest vote08==.73 if polviews==4

One-sample t test
------------------------------------------------------------------------------
Variable |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
  vote08 |     639    .6525822     .018851    .4765229    .6155647    .6895996
------------------------------------------------------------------------------
    mean = mean(vote08)                                           t =  -4.1068
Ho: mean = .73                                   degrees of freedom =      638

    Ha: mean < .73              Ha: mean != .73                Ha: mean > .73
 Pr(T < t) = 0.0000         Pr(|T| > |t|) = 0.0000          Pr(T > t) = 1.0000

Testing one sample in Stata when sample mean is a proportion

. prtest vote08==.73 if polviews==4

One-sample test of proportion                 vote08: Number of obs =      639
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      vote08 |   .6525822   .0188362                      .6156639    .6895004
------------------------------------------------------------------------------
    p = proportion(vote08)                                        z =  -4.4081
Ho: p = 0.73

     Ha: p < 0.73                 Ha: p != 0.73                  Ha: p > 0.73
 Pr(Z < z) = 0.0000         Pr(|Z| > |z|) = 0.0000          Pr(Z > z) = 1.0000
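
As a cross-check on the two Stata outputs above, the following sketch recomputes both statistics in plain Python from the reported summary numbers (n = 639, sample mean .6525822, sample standard deviation .4765229, µ0 = .73). It is written in Python rather than Stata, and the function names are illustrative only. The two statistics differ only in which standard deviation goes in the denominator.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def one_sample_stats(xbar, s, n, mu0):
    """Return (t, z): t uses the sample SD s, z uses the SD implied by H0."""
    t = (xbar - mu0) / (s / math.sqrt(n))
    z = (xbar - mu0) / math.sqrt(mu0 * (1.0 - mu0) / n)
    return t, z

# Summary numbers copied from the Stata output for polviews==4.
t, z = one_sample_stats(xbar=0.6525822, s=0.4765229, n=639, mu0=0.73)
p_two_sided = 2.0 * normal_cdf(-abs(z))

print(f"t = {t:.4f}   z = {z:.4f}   two-sided p (from z) = {p_two_sided:.6f}")
```

Because µ0(1-µ0) is smaller than s² here, the z statistic is somewhat larger in magnitude than the t statistic, as in the Stata results.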

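Large Sample Tests of Means

Two independent samples
•  H0: µ1 = µ2
•  Pooled standard deviation: $$ s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$
•  Test statistic: $$ z = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
•  Distribution of test statistic: Z ~ N(0,1)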

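Large sample tests for binary outcomes

Two independent samples
•  H0: µ1 = µ2
•  Pooled standard deviation (we can do better): $$ s = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}} $$
•  Test statistic: $$ z = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
•  Distribution of test statistic: Z ~ N(0,1)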

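Large sample tests for binary outcomes

Two independent samples
•  H0: µ1 = µ2
•  Standard deviation under H0: $$ s = \sqrt{\bar{x}(1 - \bar{x})}, \qquad \bar{x} = \frac{x_1 + x_2}{n_1 + n_2} $$
•  Test statistic: $$ z = \frac{\bar{x}_1 - \bar{x}_2}{s\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
•  Distribution of test statistic: Z ~ N(0,1)

x1 is the number of 1’s in sample 1
x2 is the number of 1’s in sample 2
(so the sample proportions are $\bar{x}_1 = x_1 / n_1$ and $\bar{x}_2 = x_2 / n_2$)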

Testing means of two samples in Stata

. ttest vote08 if polviews==2 | polviews==6 ,by(polviews)

Two-sample t test with equal variances
------------------------------------------------------------------------------
   Group |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
 liberal |     226    .7743363    .0278679    .4189466    .7194208    .8292518
conserva |     271    .8302583    .0228465    .3761003    .7852784    .8752382
---------+--------------------------------------------------------------------
combined |     497     .804829    .0177958    .3967316    .7698644    .8397935
---------+--------------------------------------------------------------------
    diff |            -.055922    .0356862                -.126037     .014193
------------------------------------------------------------------------------
    diff = mean(liberal) - mean(conserva)                         t =  -1.5671
Ho: diff = 0                                     degrees of freedom =      495

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(T < t) = 0.0589         Pr(|T| > |t|) = 0.1177          Pr(T > t) = 0.9411

Testing two samples in Stata when sample means are proportions

. prtest vote08 if polviews==2 | polviews==6 ,by(polviews)

Two-sample test of proportions               liberal: Number of obs =      226
                                        conservative: Number of obs =      271
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     liberal |   .7743363   .0278062                      .7198372    .8288354
conservative |   .8302583   .0228043                      .7855627    .8749539
-------------+----------------------------------------------------------------
        diff |   -.055922   .0359614                      -.126405    .0145609
             |  under Ho:   .0357025    -1.57   0.117
------------------------------------------------------------------------------
        diff = prop(liberal) - prop(conservative)                 z =  -1.5663
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0586         Pr(|Z| < |z|) = 0.1173          Pr(Z > z) = 0.9414
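
The "under Ho" standard error and the z statistic in this output can be reproduced by hand with the pooled-proportion formula. The sketch below is in plain Python (not Stata), with illustrative function names. It uses the voter counts 175 of 226 liberals and 225 of 271 conservatives, which match the group means reported above.

```python
import math

def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def two_sample_prop_z(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion under H0.

    x1, x2 are the numbers of 1's; n1, n2 are the sample sizes.
    Returns (z, standard error under H0).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled)) * math.sqrt(1.0 / n1 + 1.0 / n2)
    return (p1 - p2) / se, se

z, se_h0 = two_sample_prop_z(x1=175, n1=226, x2=225, n2=271)
p_two_sided = 2.0 * normal_cdf(-abs(z))

print(f"SE under H0 = {se_h0:.7f}   z = {z:.4f}   two-sided p = {p_two_sided:.4f}")
```

The two-sided p-value is the probability that |Z| exceeds |z|, which is the 0.1173 shown in the middle column of the Stata output.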

Contingency tables
Suppose we want to know if voting behavior in the population
could be the same across the liberal/conservative spectrum.

  did r vote |
     in 2008 |                  think of self as liberal or conservative
    election |  extremely    liberal   slightly   moderate  slghtly c  conservat  extrmly c |     Total
-------------+-----------------------------------------------------------------------------+----------
           0 |        20         51         32        222         50         46         16 |       437
             |      4.58      11.67       7.32      50.80      11.44      10.53       3.66 |    100.00
-------------+-----------------------------------------------------------------------------+----------
           1 |        56        175        150        417        196        225         48 |     1,267
             |      4.42      13.81      11.84      32.91      15.47      17.76       3.79 |    100.00
-------------+-----------------------------------------------------------------------------+----------
       Total |        76        226        182        639        246        271         64 |     1,704
             |      4.46      13.26      10.68      37.50      14.44      15.90       3.76 |    100.00

•  Is the distribution of political views significantly different for voters and non-voters?
•  Equivalently: Is the distribution of voting behavior the same for
each category of political views?

Contingency tables

Our data look like this:

  did r vote |
     in 2008 |                  think of self as liberal or conservative
    election |  extremely    liberal   slightly   moderate  slghtly c  conservat  extrmly c |     Total
-------------+-----------------------------------------------------------------------------+----------
           0 |        20         51         32        222         50         46         16 |       437
             |      4.58      11.67       7.32      50.80      11.44      10.53       3.66 |    100.00
-------------+-----------------------------------------------------------------------------+----------
           1 |        56        175        150        417        196        225         48 |     1,267
             |      4.42      13.81      11.84      32.91      15.47      17.76       3.79 |    100.00
-------------+-----------------------------------------------------------------------------+----------
       Total |        76        226        182        639        246        271         64 |     1,704
             |      4.46      13.26      10.68      37.50      14.44      15.90       3.76 |    100.00

Under H0 that the distributions in the populations are identical, we would expect
something more like this:

  did r vote |
     in 2008 |                  think of self as liberal or conservative
    election |  extremely    liberal   slightly   moderate  slghtly c  conservat  extrmly c |     Total
-------------+-----------------------------------------------------------------------------+----------
           0 |      19.5       57.9       46.7      163.9       63.1       69.5       16.4 |       437
-------------+-----------------------------------------------------------------------------+----------
           1 |      56.5      168.0      135.3      475.1      183.0      201.5       47.6 |     1,267
-------------+-----------------------------------------------------------------------------+----------
       Total |        76        226        182        639        246        271         64 |     1,704
             |      4.46      13.26      10.68      37.50      14.44      15.90       3.76 |    100.00

Each expected count is the row total times the column share. For example, for voters who are slightly liberal: 1267 * 0.1068 = 135.3
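
The expected counts can be generated for every cell at once from the row and column totals. The sketch below is plain Python (not Stata) and `expected_counts` is an illustrative helper name. It computes row total × column total / N, which is the same as the row total times the column share used in the example above.

```python
# Observed counts from the table: rows are vote08 = 0 and 1,
# columns run from extremely liberal to extremely conservative.
observed = [
    [20, 51, 32, 222, 50, 46, 16],      # did not vote
    [56, 175, 150, 417, 196, 225, 48],  # voted
]

def expected_counts(table):
    """Expected cell counts under H0: row total * column total / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

expected = expected_counts(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

Values computed this way can differ from the slide in the last digit, because the slide works from column shares that were already rounded.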



Chi-square test
Define the following test statistic:

$$ \chi^2 = \frac{(O_{11} - E_{11})^2}{E_{11}} + \frac{(O_{12} - E_{12})^2}{E_{12}} + \dots + \frac{(O_{RC} - E_{RC})^2}{E_{RC}} $$

Under H0 that the distributions are all the same, this statistic is approximately distributed as a Chi-square random variable with (R-1)×(C-1) degrees of freedom.

$$ \chi^2 = \frac{(20 - 19.5)^2}{19.5} + \frac{(51 - 57.9)^2}{57.9} + \dots + \frac{(48 - 47.6)^2}{47.6} = 49.4 $$

$$ \text{p-value} = \Pr(\chi^2_6 > 49.4) = 6.2 \times 10^{-9} $$
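
To make the calculation concrete, here is a plain-Python sketch (not Stata, illustrative names) that computes the statistic from the observed 2×7 table and then the upper-tail probability. To stay within the standard library it uses the closed-form tail of a Chi-square distribution with an even number of degrees of freedom; that shortcut is a convenience of this sketch, not something from the lecture.

```python
import math

observed = [
    [20, 51, 32, 222, 50, 46, 16],      # did not vote
    [56, 175, 150, 417, 196, 225, 48],  # voted
]

def pearson_chi2(table):
    """Return (statistic, degrees of freedom) for an R x C table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (o - e) ** 2 / e
    return stat, (len(table) - 1) * (len(table[0]) - 1)

def chi2_upper_tail_even_df(x, df):
    """Pr(chi-square_df > x), valid only when df is a positive even integer."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

stat, df = pearson_chi2(observed)
p_value = chi2_upper_tail_even_df(stat, df)
print(f"chi2({df}) = {stat:.1f}   p-value = {p_value:.2e}")
```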



The Chi-square distribution

Fun fact: A Chi-square random variable with k degrees of freedom is the sum of the squares of k independent standard normal random variables.
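
A quick simulation illustrates this fact. The sketch below is plain Python with a fixed seed; k = 3 and 20,000 draws are arbitrary choices made for illustration. It sums k squared standard normal draws and checks that the sample mean and variance are close to k and 2k, the mean and variance of a Chi-square random variable with k degrees of freedom.

```python
import random

def simulate_chi2(k, draws, seed=0):
    """Draw `draws` values, each the sum of k squared standard normals."""
    rng = random.Random(seed)
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k)) for _ in range(draws)]

k = 3
samples = simulate_chi2(k=k, draws=20000)
mean = sum(samples) / len(samples)
variance = sum((x - mean) ** 2 for x in samples) / (len(samples) - 1)

print(f"sample mean = {mean:.3f} (theory: {k})")
print(f"sample variance = {variance:.3f} (theory: {2 * k})")
```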

Chi-square test for a 2x2 table

Real Data

  did r vote |
     in 2008 |    respondents sex
    election |      male     female |     Total
-------------+----------------------+----------
           0 |       234        251 |       485
             |     48.25      51.75 |    100.00
-------------+----------------------+----------
           1 |       564        740 |     1,304
             |     43.25      56.75 |    100.00
-------------+----------------------+----------
       Total |       798        991 |     1,789
             |     44.61      55.39 |    100.00

1.  Fill in the expected counts under the H0 that the sex distribution is the same among voters and non-voters.
2.  Compute the test statistic.
3.  What is its distribution?

Expected

  did r vote |
     in 2008 |    respondents sex
    election |      male     female |     Total
-------------+----------------------+----------
           0 |                      |       485
-------------+----------------------+----------
           1 |                      |     1,304
-------------+----------------------+----------
       Total |       798        991 |     1,789
             |     44.61      55.39 |    100.00

$$ \chi^2 = \frac{(O_{11} - E_{11})^2}{E_{11}} + \frac{(O_{12} - E_{12})^2}{E_{12}} + \dots + \frac{(O_{RC} - E_{RC})^2}{E_{RC}} $$

Chi-square test for a 2x2 table

Expected

  did r vote |
     in 2008 |    respondents sex
    election |      male     female |     Total
-------------+----------------------+----------
           0 |     216.4      268.6 |       485
-------------+----------------------+----------
           1 |     581.7      722.3 |     1,304
-------------+----------------------+----------
       Total |       798        991 |     1,789
             |     44.61      55.39 |    100.00

$$ \chi^2 = 3.57 $$

It’s a $\chi^2$ with 1 degree of freedom.

$$ \text{p-value} = \Pr(\chi^2_1 > 3.57) = 0.059 $$
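
The three steps of the exercise can be checked with a short plain-Python sketch (not Stata). The variable names are illustrative. With 1 degree of freedom the statistic is the square of a standard normal, so the upper tail is Pr(|Z| > sqrt(x)), which the standard library provides through `math.erfc`.

```python
import math

observed = [
    [234, 251],  # did not vote: male, female
    [564, 740],  # voted: male, female
]

# Step 1: expected counts under H0 (row total * column total / N).
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Step 2: the test statistic, summing (O - E)^2 / E over all four cells.
chi2 = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)

# Step 3: (2 - 1) x (2 - 1) = 1 degree of freedom.
df = (len(observed) - 1) * (len(observed[0]) - 1)
p_value = math.erfc(math.sqrt(chi2 / 2.0))  # this tail formula holds for df = 1 only

print([[round(e, 2) for e in row] for row in expected])
print(f"chi2({df}) = {chi2:.4f}   p-value = {p_value:.3f}")
```

The unrounded expected counts differ slightly in the first decimal from the ones on the slide, which were computed from rounded column percentages.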

Chi-square test in Stata


. tab vote08 sex ,row chi2

+----------------+
| Key |
|----------------|
| frequency |
| row percentage |
+----------------+

did r vote |
in 2008 | respondents sex
election | male female | Total
-------------+----------------------+----------
0 | 234 251 | 485
| 48.25 51.75 | 100.00
-------------+----------------------+----------
1 | 564 740 | 1,304
| 43.25 56.75 | 100.00
-------------+----------------------+----------
Total | 798 991 | 1,789
| 44.61 55.39 | 100.00

Pearson chi2(1) = 3.5709 Pr = 0.059


This is the same as testing if the proportion that vote is the same for males and females.

The equivalent proportion test

. prtest vote08 ,by(sex)

Two-sample test of proportions                  male: Number of obs =      798
                                              female: Number of obs =      991
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        male |   .7067669   .0161155                      .6751812    .7383527
      female |   .7467205   .0138147                      .7196441    .7737968
-------------+----------------------------------------------------------------
        diff |  -.0399536   .0212263                     -.0815563    .0016492
             |  under Ho:    .021143    -1.89   0.059
------------------------------------------------------------------------------
        diff = prop(male) - prop(female)                          z =  -1.8897
    Ho: diff = 0

    Ha: diff < 0                 Ha: diff != 0                 Ha: diff > 0
 Pr(Z < z) = 0.0294         Pr(|Z| < |z|) = 0.0588          Pr(Z > z) = 0.9706
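
The equivalence can be verified numerically: for a 2x2 table, the square of the pooled two-sample z statistic equals the Pearson Chi-square statistic. The sketch below is plain Python (not Stata), with illustrative function names, and uses the counts from the vote08 by sex table (564 of 798 males and 740 of 991 females voted).

```python
import math

def two_sample_prop_z(x1, n1, x2, n2):
    """Pooled z statistic for H0: p1 = p2 (x = number of 1's, n = sample size)."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    return (x1 / n1 - x2 / n2) / se

def pearson_chi2(table):
    """Pearson Chi-square statistic for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return sum(
        (o - r * c / n) ** 2 / (r * c / n)
        for row, r in zip(table, row_totals)
        for o, c in zip(row, col_totals)
    )

z = two_sample_prop_z(x1=564, n1=798, x2=740, n2=991)
chi2 = pearson_chi2([[234, 251], [564, 740]])

print(f"z = {z:.4f}   z^2 = {z * z:.4f}   chi2 = {chi2:.4f}")
```

This is why both tests report the same two-sided p-value, 0.059.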

Fisher’s exact test


•  Rule of thumb: If any of the cells has a count less than or equal to five, you can’t trust the approximation that the distribution of the statistic is Chi-squared.

•  Key insight: The cells have counts that are drawn from multinomial distributions. In the 2x2 case, they are draws from binomial random variables.

•  It’s computationally intensive, but often not too hard, to compute the exact probability that you will see a table as extreme or more extreme than the one you see, assuming the null that the distributions are equivalent.

•  Fisher's exact test is great for 2x2 tables, but it gets infeasible quickly as the size of the table grows (e.g., 5x5 is not feasible).

Fisher’s exact test in Stata


. tab vote08 sex ,row exact

+----------------+
| Key |
|----------------|
| frequency |
| row percentage |
+----------------+

did r vote |
in 2008 | respondents sex
election | male female | Total
-------------+----------------------+----------
0 | 234 251 | 485
| 48.25 51.75 | 100.00
-------------+----------------------+----------
1 | 564 740 | 1,304
| 43.25 56.75 | 100.00
-------------+----------------------+----------
Total | 798 991 | 1,789
| 44.61 55.39 | 100.00

Fisher's exact = 0.061


1-sided Fisher's exact = 0.033
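
For a 2x2 table the exact calculation is short enough to write out. The sketch below is plain Python (not Stata) and `fisher_exact_2x2` is an illustrative name. It conditions on the row and column totals, so the top-left cell follows a hypergeometric distribution. It assumes the common convention that the two-sided p-value sums the probabilities of all tables no more likely than the observed one, and that the one-sided p-value is the smaller of the two tails; that these match Stata's definitions is an assumption of this sketch.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Exact p-values for the 2x2 table [[a, b], [c, d]] with margins held fixed.

    Returns (two_sided, one_sided).
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    lo, hi = max(0, col1 - row2), min(row1, col1)  # feasible top-left counts
    # Integer weights proportional to Pr(top-left cell = k | margins).
    weights = {k: comb(row1, k) * comb(row2, col1 - k) for k in range(lo, hi + 1)}
    total = comb(n, col1)
    w_obs = weights[a]
    two_sided = sum(w for w in weights.values() if w <= w_obs) / total
    upper = sum(w for k, w in weights.items() if k >= a) / total
    lower = sum(w for k, w in weights.items() if k <= a) / total
    return two_sided, min(upper, lower)

two_sided, one_sided = fisher_exact_2x2(234, 251, 564, 740)
print(f"Fisher's exact = {two_sided:.3f}   1-sided = {one_sided:.3f}")
```

Working with exact integers avoids floating-point trouble when comparing table probabilities, and the single division at the end converts the sums to probabilities.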
