Week9
Özgür Asar
ozgur.asar@sabanciuniv.edu
Weldon's dice
• Walter Frank Raphael Weldon (1860-1906) was an
English evolutionary biologist and a founder of
biometry. He was the joint founding editor of
Biometrika, with Francis Galton and Karl Pearson.
Labby's dice
Labby's dice (cont.)
• Labby did not actually observe the same phenomenon that
Weldon observed (higher frequency of 5s and 6s).
• Automation allowed Labby to collect more data than Weldon
did in 1894: instead of recording "successes" and "failures",
Labby recorded the individual number of pips on each die.
Expected counts
• Labby rolled 12 dice 26,306 times. If each side is equally
likely to come up, how many 1s, 2s, ..., 6s would he
expect to have observed?
a) 1/6
b) 12/6
c) 26,306 / 6
d) 12 x 26,306 / 6
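The arithmetic behind this question can be checked with a short sketch. It uses only the figures stated above (12 dice, 26,306 rolls, six equally likely sides); the variable names are illustrative.

```python
# Expected count of each face if all six sides are equally likely.
n_dice = 12        # dice thrown per roll (from the slide)
n_rolls = 26_306   # number of rolls (from the slide)
n_sides = 6        # faces on a die

total_outcomes = n_dice * n_rolls             # individual dice outcomes recorded
expected_per_side = total_outcomes / n_sides  # identical for every face under fairness

print(total_outcomes)     # 315672
print(expected_per_side)  # 52612.0
```

Under the equal-likelihood assumption, each face is expected about 52,612 times, which corresponds to option d.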
Summarizing Labby's results
• The table below shows the observed and expected
counts from Labby's experiment.
Why are the expected counts the same for all outcomes while the
observed counts differ? At first glance, does there appear to be
an inconsistency between the observed and expected counts?
Setting the hypotheses
• Do these data provide convincing evidence of an
inconsistency between the observed and expected counts?
• H0: There is no inconsistency between the observed and
expected counts. The dice are fair.
• HA: There is an inconsistency between the observed and
expected counts. The dice are biased.
Evaluating Hypothesis
• To evaluate these hypotheses, we quantify how different
the observed counts are from the expected counts.
• Large deviations from what would be expected based on
sampling variation (chance) alone provide strong
evidence for the alternative hypothesis.
• This is called a goodness of fit test since we're evaluating
how well the observed data fit the expected distribution.
Chi-square statistic
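When dealing with counts and investigating how far the
observed counts are from the expected counts, we use a
new test statistic called the chi-square (χ²) statistic.
χ² statistic:
$$\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}$$
where $O_i$ and $E_i$ are the observed and expected counts in cell $i$,
and $k$ is the total number of cells.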
Calculating the chi-square statistic
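The calculation can be sketched in a few lines. Note that the table of observed counts is not preserved in this text, so the counts below are an assumption: they are the values commonly quoted for Labby's experiment (they do sum to 12 × 26,306) and should be checked against the original table. The function name is illustrative.

```python
# ASSUMPTION: observed counts of faces 1..6 are not in this text; these are the
# values commonly quoted for Labby's experiment and should be verified.
observed = [53_222, 52_118, 52_465, 52_338, 52_244, 53_285]

n_total = sum(observed)        # total dice outcomes
expected = [n_total / 6] * 6   # equal expected count per face under H0


def chi_square_statistic(obs, exp):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


chi2_stat = chi_square_statistic(observed, expected)
print(n_total)              # 315672
print(round(chi2_stat, 2))  # 24.73
```

Each cell contributes its squared difference scaled by the expected count, so the largest contributions come from the faces whose counts sit furthest from 52,612.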
Why square?
• Squaring the difference between the observed and the
expected outcome does two things:
• Any standardized difference that is squared will now be
positive.
• Differences that already looked unusual will become much
larger after being squared.
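A small numerical illustration of both points, using hypothetical counts chosen only for this purpose (they are not from Labby's data). The standardized difference is taken here as (observed − expected) / √expected.

```python
# Hypothetical counts chosen only to illustrate the effect of squaring.
observed = [12, 8, 10, 10]
expected = [10, 10, 10, 10]

# Standardized difference for each cell: (observed - expected) / sqrt(expected)
z = [(o - e) / e ** 0.5 for o, e in zip(observed, expected)]

sum_signed = sum(z)                   # positive and negative differences cancel
sum_squared = sum(d ** 2 for d in z)  # squared differences are all non-negative

print(round(sum_signed, 10))   # 0.0
print(round(sum_squared, 10))  # 0.8
```

Without squaring, the deviations cancel and the total suggests a perfect fit. Squaring also stretches large deviations: a standardized difference of 3 contributes 9, while a difference of 1 contributes only 1.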
The chi-square distribution
• To determine whether the χ² statistic we calculated is
unusually high, we first need to describe
its distribution.
Finding areas under the chi-square curve
• p-value = tail area under the chi-square distribution (as usual)
Back to Labby's dice
• The research question was: Do these data provide convincing evidence
of an inconsistency between the observed and expected counts?
Degrees of freedom for a
goodness of fit test
• When conducting a goodness of fit test to evaluate how
well the observed data follow an expected distribution,
the degrees of freedom are calculated as the number of
cells (k) minus 1.
df = k – 1
df = 6 - 1 = 5
The p-value for a chi-square test is defined as
the tail area above the calculated test statistic.
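The upper tail area can also be computed directly. The following is a minimal sketch using only the standard library: it evaluates the regularized lower incomplete gamma function by its series expansion and subtracts it from 1. The statistic value 24.73 is an assumption (it is not preserved in this text, and is the value obtained from the commonly quoted counts for Labby's experiment); the function name is illustrative.

```python
import math


def chi2_upper_tail(x, df):
    """Upper tail area P(X > x) for a chi-square distribution with df degrees of freedom.

    Computes 1 - P(a, z), where P is the regularized lower incomplete gamma
    function evaluated by its series expansion, a = df / 2 and z = x / 2.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    for n in range(1, 10_000):
        term *= z / (a + n)
        total += term
        if term < total * 1e-16:  # remaining terms are negligible
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return 1.0 - lower


k = 6               # number of cells (faces of a die)
df = k - 1          # degrees of freedom for a goodness of fit test
chi2_stat = 24.73   # ASSUMPTION: statistic value not preserved in this text

p_value = chi2_upper_tail(chi2_stat, df)
print(df)               # 5
print(p_value < 0.001)  # True
```

A statistical package would normally be used for this step; the hand-written series is only meant to make the "tail area above the statistic" concrete.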
Conclusion of the hypothesis test
• We calculated a p-value less than 0.001. At the 5%
significance level, what is the conclusion of the
hypothesis test?
a) Reject H0, the data provide convincing evidence that the dice are
fair.
b) Reject H0, the data provide convincing evidence that the dice are
biased.
c) Fail to reject H0, the data provide convincing evidence that the dice
are fair.
d) Fail to reject H0, the data provide convincing evidence that the dice
are biased.
• Answer: b) Reject H0. Because the p-value (less than 0.001) is
below the 5% significance level, the data provide convincing
evidence that the dice are biased.
Recap: p-value for a chi-square test
Conditions for the chi-square test
1. Independence: Each case that contributes a count to
the table must be independent of all the other cases in
the table.
2. Sample size: Each particular scenario (i.e. cell) must
have at least 5 expected cases.
3. df > 1: Degrees of freedom must be greater than 1.
Popular kids
• In the dataset popular, students in grades 4-6 were asked whether
good grades, athletic ability, or popularity was most important to
them. A two-way table separating the students by grade and by
choice of most important factor is shown below. Do these data
provide evidence to suggest that goals vary by grade?
• The hypotheses are:
• H0: Grade and goals are independent. Goals do not vary by
grade.
• HA: Grade and goals are dependent. Goals vary by grade.
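• The test statistic is calculated as
$$\chi^2_{df} = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}, \quad df = (R - 1) \times (C - 1)$$
where $k$ is the number of cells, $R$ is the number of rows, and $C$ is the number of columns.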
Expected counts in two-way tables
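In a two-way table, the expected count for a cell under independence is (row total × column total) / table total. The sketch below applies that rule. The two-way table did not survive in this text, so the counts are an assumption: they are the values usually quoted for the popular data set and should be checked against the original table. The column order (grades, popular, sports) and the function name are also assumptions.

```python
# ASSUMPTION: the table is missing from this text; these counts are the ones
# usually quoted for the `popular` data set and should be verified.
# Rows: grade 4, grade 5, grade 6. Columns: grades, popular, sports.
observed = [
    [63, 31, 25],
    [88, 55, 33],
    [96, 55, 32],
]


def expected_counts(table):
    """Expected count per cell: (row total * column total) / table total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


expected = expected_counts(observed)
for row in expected:
    print([round(e, 1) for e in row])
```

The expected counts keep the same row and column totals as the observed table; only the allocation within the table changes.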
Calculating the test statistic in
two-way tables
Expected counts are shown in blue next to the observed counts.
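The statistic sums (observed − expected)² / expected over every cell, and the degrees of freedom for a two-way table are (rows − 1) × (columns − 1). A minimal sketch follows; as before, the counts are an assumption (the table itself is missing from this text) and the function name is illustrative.

```python
# ASSUMPTION: counts usually quoted for the `popular` data set; verify against
# the original table. Rows: grades 4-6. Columns: grades, popular, sports.
observed = [
    [63, 31, 25],
    [88, 55, 33],
    [96, 55, 32],
]


def chi_square_two_way(table):
    """Return the chi-square statistic and degrees of freedom for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / grand_total  # expected under independence
            stat += (obs - exp) ** 2 / exp
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


chi2_stat, df = chi_square_two_way(observed)
print(round(chi2_stat, 2), df)  # 1.31 4
```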
Calculating the p-value
Which of the following is the correct p-value for this hypothesis test?
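For a 3 × 3 table the degrees of freedom are (3 − 1) × (3 − 1) = 4, and for even degrees of freedom the chi-square upper tail area has a simple closed form. The sketch below uses it. The statistic value 1.3121 is an assumption: it is what the commonly quoted counts for this data set give, and it is not preserved in this text. The function name is illustrative.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """Upper tail area of a chi-square distribution, valid for even df only.

    Closed form: exp(-x/2) * sum over j = 0 .. df/2 - 1 of (x/2)^j / j!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))


chi2_stat = 1.3121        # ASSUMPTION: value not preserved in this text
df = (3 - 1) * (3 - 1)    # 3 grade levels by 3 goals

p_value = chi2_upper_tail_even_df(chi2_stat, df)
print(df)                 # 4
print(round(p_value, 2))  # 0.86
```

Under these assumed values the p-value is far above 0.05, so the test would fail to reject H0.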
Conclusion
• Do these data provide evidence to suggest that goals
vary by grade?
• H0: Grade and goals are independent.
Goals do not vary by grade.
• HA: Grade and goals are dependent.
Goals vary by grade.