Hypothesis Testing - The Scientists' Moral Imperative

• To tell whether our data support or reject our ideas, we use statistical hypothesis testing.
[Image: Immanuel Kant]
• The problem is that we often get data that seem to support our ideas. The literature is full of papers that accept a pet idea uncritically. Statistical testing keeps scientists honest.
• If you read a paper that suggests some alternative hypothesis should be accepted, but there is no statistical test, don't believe it.
If a topic is part of science, ideas have consequences that can be checked.
We can count and measure aspects of nature to check our ideas.
If our measurements are contrary to the predictions of our model, our hypothesis is FALSE, and we reject it.

"It does not make any difference how beautiful your guess is. It does not make
any difference how smart you are, who made the guess, or what his name is--
if it disagrees with experiment it is wrong. That is all there is to it."
-Richard Feynman (The Character of Physical Law)
http://video.google.com/videoplay?docid=-5157969812375041230&hl=en

“If you haven't measured it you don't know what you are talking about.”
-William Thomson, Lord Kelvin
The Mean
• The mean is one of the commonly used
statistics in science. It is often the "Expected
Value" i.e. the value we expect to get.
• The mean is found by totalling the values for
all observations (∑x) and dividing by the total
number of observations (n).
The formula for finding the mean is: $\text{Mean} = \frac{\sum x}{n}$
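As a minimal sketch of the formula above, the mean can be computed in a few lines of Python. The function name and the sample values are illustrative assumptions, not data from this presentation.

```python
def mean(values):
    """Return the arithmetic mean: the sum of the observations divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("mean requires at least one observation")
    return sum(values) / n


# Hypothetical measurements, made up for illustration only.
observations = [2.0, 4.0, 4.0, 6.0]
print(mean(observations))  # 4.0
```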
Standard Deviation
• Measures of the spread of data are the standard deviation and the variance.
[Formula image labelled "Sample variance"]
• For the s.d., calculate the difference between each observation and the mean, square it, add all these up, divide by the sample size, and take the square root.
• Refers to the actual population.
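The recipe in the bullet above (difference from the mean, square, sum, divide by the number of observations, square root) can be sketched as follows. The function name and the data are assumptions chosen so the arithmetic is easy to follow by hand.

```python
import math


def population_sd(values):
    """Standard deviation that divides by n, as described for a whole population."""
    n = len(values)
    m = sum(values) / n
    # Squared difference between each observation and the mean.
    squared_diffs = [(x - m) ** 2 for x in values]
    return math.sqrt(sum(squared_diffs) / n)


# Hypothetical data: mean is 5, squared differences sum to 32, and 32 / 8 = 4.
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(population_sd(observations))  # 2.0
```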
Sample Standard Deviation
• "It is rarely possible to obtain observations from
every item … in a population" Fowler et al. 1998 p36
• The estimate of is s
• Where N-1 is called the degrees of freedom, and is one

less than the number of observations
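A short sketch of the sample standard deviation, which divides by N - 1 rather than N. The function name and data are illustrative assumptions; Python's standard-library `statistics.stdev` uses the same N - 1 divisor and is printed alongside as a cross-check.

```python
import math
import statistics


def sample_sd(values):
    """Estimate of the population s.d. from a sample, dividing by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample_sd requires at least two observations")
    m = sum(values) / n
    degrees_of_freedom = n - 1
    return math.sqrt(sum((x - m) ** 2 for x in values) / degrees_of_freedom)


# Hypothetical data, made up for illustration only.
observations = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_sd(observations))         # about 2.138
print(statistics.stdev(observations))  # same value from the standard library
```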
Knowing the Distribution
• In coin toss experiments, we know a formula for calculating the probability of any number k of heads in n trials: the Binomial Distribution.
• Fortunately, we don’t have to know the distribution for every situation in nature.
• We are saved by the Central Limit Theorem.
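For reference, the binomial formula mentioned above is $P(k) = \binom{n}{k} p^k (1-p)^{n-k}$. A minimal sketch follows; the function name and the choice of 10 tosses are assumptions for illustration.

```python
import math


def binomial_probability(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Probability of exactly 5 heads in 10 tosses of a fair coin.
print(binomial_probability(5, 10, 0.5))  # about 0.246

# The probabilities over all possible k add up to 1.
print(sum(binomial_probability(k, 10, 0.5) for k in range(11)))
```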

Central Limit Theorem
• “… the means of a large number of samples drawn randomly from the same population are normally distributed ….” (Fowler et al. 1998, p. 91)
• So it “pays us” to use means, not raw data.
[Figure: a sample drawn from a population whose distribution formula is unknown; the sample means follow a distribution whose formula is known.]
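The theorem can be illustrated with a small simulation. This is a sketch under stated assumptions: the raw data are drawn from a strongly skewed exponential distribution (chosen only for illustration), the sample size of 50 and the 2000 repetitions are arbitrary, and the seed is fixed so the run is repeatable. If the means are roughly normal, about 68% of them should fall within one standard deviation of their centre.

```python
import random
import statistics

rng = random.Random(42)  # fixed seed, an arbitrary choice for repeatability


def mean_of_sample(sample_size):
    """Draw one sample from a skewed population and return its mean."""
    sample = [rng.expovariate(1.0) for _ in range(sample_size)]
    return statistics.mean(sample)


# Many sample means, each from 50 raw observations.
means = [mean_of_sample(50) for _ in range(2000)]

centre = statistics.mean(means)
spread = statistics.stdev(means)
within_one_sd = sum(1 for m in means if abs(m - centre) <= spread) / len(means)

print(centre)         # close to the population mean of 1.0
print(spread)         # close to 1 / sqrt(50)
print(within_one_sd)  # close to 0.68 for a normal distribution
```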
Normal Distribution
• So if we make many observations and use averages as our data, we can draw valid conclusions because we know their distribution.
• Many test statistics are available for this.
• In a few moments we will learn Chi-Square (χ²).
Expected Value
• The expected value is the average result we
expect.
• It is the product of the probability and the number of observations: E = p × n
• A really useful case is where the probability of all cases is equal.
• For example, in fair coin tosses, the probability of Heads = 1/2. If we flip a coin 24 times, we EXPECT ½ × 24 = 12 Heads.
• Equal probability cases are usually the basis of the "Null Hypothesis".
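The coin example can be written as a one-line calculation. The function name is an assumption for illustration.

```python
def expected_count(probability, num_observations):
    """Expected value E = p * n."""
    return probability * num_observations


# Fair coin flipped 24 times: we expect 12 heads.
print(expected_count(0.5, 24))  # 12.0
```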
Hypotheses
• A hypothesis is a statement of the
researcher’s idea or guess.
• To test a hypothesis the first thing we do is write down a statement called the null hypothesis.
• The null hypothesis is often the opposite of the researcher’s guess.
For Example
• Some null hypotheses may be:
– “there is no difference in lava viscosity between Hawaiian and Cascades volcanoes”.
– “there is no relation between a volcanic island’s height and its age”.
– “there is no connection between the time since subaerial exposure of sediment and the height of the hills it forms”.
The Hypotheses
• Null Hypothesis H0: ‘There is no difference in the average number of streams per square kilometer between bedrock types.’
• Alternative Hypothesis HA or H1: ‘There is a difference in the average number of streams per square kilometer between bedrock types.’
Significance (1)
• Before carrying out any test we have to
decide on a significance level which lets
us determine at what point to reject the
null hypothesis and accept the alternative
hypothesis.
Significance (2)
• Significance is based on the probability of a particular result.
• Statisticians have calculated the probability of all possible ‘chance’ events occurring.
Significance (3)
• If the probability of a particular result occurring by chance is less than 1 in 20 (P = 0.05), we say the result is significant, i.e. the result is unlikely to be just a chance event.
• If the probability of a particular result occurring by chance is less than 1 in 100 (P = 0.01), we say the result is highly significant; again, the result is unlikely to be just a chance event.
Avoiding Decision Errors
• We always run the risk that we will observe a rare event, and we will draw the wrong conclusion.
• Usually we want to avoid a Type I error, where we reject H0 even though it is true.
• Many trials, and the use of a stricter significance level (P = 0.01, not P = 0.05), make this less likely.
• In a Type II error, we accept the null hypothesis even though it is false.
• We will see an example of a Type II error later.
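A Type I error can be made concrete with a simulation in which the null hypothesis is true by construction. The sketch below assumes a fair coin flipped 100 times per experiment, a chi-square statistic of the form sum of (observed - expected)² / expected over heads and tails, and the standard chi-square critical value of 3.84 for 1 degree of freedom at P = 0.05. The function names, the number of experiments, and the seed are illustrative assumptions. Roughly 1 experiment in 20 should reject H0 even though the coin really is fair.

```python
import random

CRITICAL_VALUE_DF1_P05 = 3.84  # chi-square critical value, df = 1, P = 0.05


def chi_square_fair_coin(heads, flips):
    """Chi-square statistic comparing observed heads and tails with a 50/50 split."""
    expected = flips / 2
    tails = flips - heads
    return (heads - expected) ** 2 / expected + (tails - expected) ** 2 / expected


def type_i_error_rate(num_experiments, flips, rng):
    """Fraction of fair-coin experiments in which H0 is wrongly rejected."""
    rejections = 0
    for _ in range(num_experiments):
        heads = sum(rng.random() < 0.5 for _ in range(flips))
        if chi_square_fair_coin(heads, flips) > CRITICAL_VALUE_DF1_P05:
            rejections += 1
    return rejections / num_experiments


rate = type_i_error_rate(5000, 100, random.Random(1))
print(rate)  # a value in the neighbourhood of 0.05
```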

Test Statistics
• We often want to see if two things are different from each other.
• In science we calculate what we would expect if there was no difference between them (the usual null hypothesis).
• We can then compare this to what we actually observe.
Test Statistics
• To check the null hypothesis we calculate a figure known as a test statistic, which is based on data from our samples.
• Different types of problems require different test statistics. Values for comparison to our data have all been put into statistical tables.
• All we need to do is to calculate our value and compare it with the value in the table to get our answer.
Using a Test Statistic
If the test statistic shows you “observed an unlikely result”, you reject the null hypothesis and accept the alternative hypothesis.
χ² Significance Tables
• The significance levels available on a χ² table are usually 0.05, 0.01, and 0.001, which means there is, respectively, only a 1 in 20 (0.05), a 1 in 100 (0.01), or a 1 in 1000 (0.001) probability of obtaining a χ² value that large by chance.
• The values in the tables are called critical values.
Chi-Square (χ²) Critical Values In Use
• For Chi-Square (χ²): if the value of the test statistic you have calculated is greater than the value in the table (the critical value) you decided to use, you can reject the null hypothesis and accept the alternative hypothesis.
H0: There is no relation between the number of peaks along a ridge and the time since exposure

Chi-Square (χ²) Critical Values
Significance table of χ² values. This is the critical value table we will use in the examples below.

df    P = 0.05    P = 0.01    P = 0.001
1       3.84        6.64       10.83
2       5.99        9.21       13.82
3       7.82       11.35       16.27
4       9.49       13.28       18.47
5      11.07       15.09       20.52
6      12.59       16.81       22.46
7      14.07       18.48       24.32
8      15.51       20.09       26.13
9      16.92       21.67       27.88
10     18.31       23.21       29.59
11     19.68       24.73       31.26
12     21.03       26.22       32.91
13     22.36       27.69       34.53
14     23.69       29.14       36.12
15     25.00       30.58       37.70
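The table lookup and the decision rule can be sketched as a small dictionary and a comparison. The critical values below are copied from the table above; the names `CRITICAL_VALUES` and `reject_null`, and the test statistic of 4.5 in the usage lines, are assumptions made up for illustration.

```python
# Critical values keyed by degrees of freedom, then by significance level.
CRITICAL_VALUES = {
    1: {0.05: 3.84, 0.01: 6.64, 0.001: 10.83},
    2: {0.05: 5.99, 0.01: 9.21, 0.001: 13.82},
    3: {0.05: 7.82, 0.01: 11.35, 0.001: 16.27},
    4: {0.05: 9.49, 0.01: 13.28, 0.001: 18.47},
    5: {0.05: 11.07, 0.01: 15.09, 0.001: 20.52},
    6: {0.05: 12.59, 0.01: 16.81, 0.001: 22.46},
    7: {0.05: 14.07, 0.01: 18.48, 0.001: 24.32},
    8: {0.05: 15.51, 0.01: 20.09, 0.001: 26.13},
    9: {0.05: 16.92, 0.01: 21.67, 0.001: 27.88},
    10: {0.05: 18.31, 0.01: 23.21, 0.001: 29.59},
    11: {0.05: 19.68, 0.01: 24.73, 0.001: 31.26},
    12: {0.05: 21.03, 0.01: 26.22, 0.001: 32.91},
    13: {0.05: 22.36, 0.01: 27.69, 0.001: 34.53},
    14: {0.05: 23.69, 0.01: 29.14, 0.001: 36.12},
    15: {0.05: 25.00, 0.01: 30.58, 0.001: 37.70},
}


def reject_null(chi_square, df, alpha=0.05):
    """Reject H0 when the test statistic exceeds the tabulated critical value."""
    return chi_square > CRITICAL_VALUES[df][alpha]


# Hypothetical test statistic of 4.5 with 1 degree of freedom.
print(reject_null(4.5, 1))        # True: 4.5 > 3.84
print(reject_null(4.5, 1, 0.01))  # False: 4.5 < 6.64
```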

Important
It is prudent to use means, not raw data, to ensure a normal distribution.
• The chi square test can only be used on observations that have the following characteristics:
– Objects being counted are independent.**
– The data must be in the form of frequencies.
– The frequency data must have a precise numerical value and must be organised into categories or groups.
– The expected frequency in any one cell of the table must be greater than 5.*
– The total number of observations must be greater than 20.
* See the exception on the next slide. ** There are statistics designed to test this assumption.
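The two numerical requirements in the list above (expected frequencies greater than 5, total greater than 20) can be checked mechanically. The sketch below assumes the usual contingency-table rule, expected frequency = row total × column total / grand total. The function name and both example tables are hypothetical. Independence of the counted objects cannot be verified from the counts alone, so it is not checked here.

```python
def check_chi_square_assumptions(observed):
    """Return a list of violated numerical requirements for a table of counts.

    `observed` is a list of rows, each row a list of observed frequencies.
    """
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    grand_total = sum(row_totals)

    problems = []
    if grand_total <= 20:
        problems.append("total number of observations is not greater than 20")

    # Expected frequency for each cell: row total * column total / grand total.
    expected = [[r * c / grand_total for c in column_totals] for r in row_totals]
    small_cells = sum(1 for row in expected for e in row if e <= 5)
    if small_cells:
        problems.append(f"{small_cells} expected frequencies are not greater than 5")
    return problems


# Hypothetical tables, made up for illustration only.
print(check_chi_square_assumptions([[12, 18], [20, 10]]))  # []
print(check_chi_square_assumptions([[2, 3], [4, 5]]))      # two problems reported
```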
The expected frequency in any one cell of the table must be greater than 5: An Exception
• "The discrepancy is not large, however, when χ² is computed from contingency tables with a fairly large number of cells (more than 4, at a minimum) and only a few theoretical frequencies are less than 5."
• Source: Spence, J. et al. (1968) Elementary Statistics

Other Statistics
• If any of the assumptions for χ² are false, we cannot use χ².
• However, there are test statistics for most situations, and they are all similar in their use.
• Once you know χ² you can look up the correct statistic and apply it.
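The χ² formula
$\chi^2 = \sum \frac{(O - E)^2}{E}$
where ∑ means take the sum, O is the observed frequency, and E is the expected frequency.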

Worked Example 1:
• Step 1. Write down the NULL HYPOTHESIS (H0) and ALTERNATIVE HYPOTHESIS (Ha) and set the LEVEL OF SIGNIFICANCE.
• H0: 'A basaltic sand pile will not spread further than a quartz sand pile in the same time.'
• Ha: 'A basaltic sand pile will spread further than a quartz sand pile in the same time.'
• We will set the level of significance at 0.05.

Step 2: Construct a table with the information you have observed. Use averages as data.
Method: 220 Hawaiian (basaltic) sand piles and 250 New Jersey (quartz) sand piles of 50 cc each are left out in the weather for 1 week. After 1 week, the distance of the furthest grain beyond the initial perimeter is measured. Every 5 piles are averaged.

Furthest grain (mm)    1-5    6-10    11-15    16-20    21-25    Row Total
Quartz                  9      13      10       10        8        50
Basaltic                4       3       5        9       21        42
Column Total           13      16      15       19       29        92

Note that although there are 3 cells in the table that are not greater than 5, these are observed frequencies. It is only the expected frequencies that have to be greater than 5.
Work out the expected frequency.
Expected frequency = (row total × column total) / grand total
E.g.: expected frequency for Quartz in 1-5 mm = (50 × 13) / 92 = 7.07

Furthest grain (mm)    1-5     6-10    11-15    16-20    21-25    Row Total
Quartz                 7.07
Basaltic
Column Total

You do the rest.
The Expected Frequencies

Furthest grain (mm)    1-5     6-10    11-15    16-20    21-25    Row Total
Quartz                 7.07    8.70    8.15    10.33    15.76       50
Basaltic               5.93    7.30    6.85     8.67    13.24       42
Column Total            13      16      15       19       29        92
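The expected frequencies can be reproduced with a short script. This is a sketch: the observed counts are the ones given in the worked example, while the variable and function names are assumptions.

```python
# Observed counts per distance class (1-5, 6-10, 11-15, 16-20, 21-25 mm).
observed = {
    "Quartz": [9, 13, 10, 10, 8],
    "Basaltic": [4, 3, 5, 9, 21],
}


def expected_frequencies(table):
    """Expected frequency for each cell: row total * column total / grand total."""
    row_totals = {name: sum(counts) for name, counts in table.items()}
    column_totals = [sum(column) for column in zip(*table.values())]
    grand_total = sum(row_totals.values())
    return {
        name: [row_totals[name] * c / grand_total for c in column_totals]
        for name in table
    }


expected = expected_frequencies(observed)
for name, values in expected.items():
    print(name, [round(v, 2) for v in values])
# Quartz [7.07, 8.7, 8.15, 10.33, 15.76]
# Basaltic [5.93, 7.3, 6.85, 8.67, 13.24]
```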
For each of the cells calculate: $(O - E)^2 / E$
E.g.: Quartz in 1-5 mm is (9 – 7.07)² / 7.07 = 0.53

Furthest grain (mm)    1-5     6-10    11-15    16-20    21-25
Quartz                 0.53
Basaltic

You do the rest.

These are the $(O - E)^2 / E$ values:

Furthest grain (mm)    1-5     6-10    11-15    16-20    21-25
Quartz                 0.53    2.13    0.42     0.01     3.82
Basaltic               0.63    2.54    0.50     0.01     4.55

But $\chi^2 = \sum \frac{(O - E)^2}{E}$
So: add up all of the above numbers to obtain the value for chi square: χ² = 15.14.
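The whole calculation, from observed counts to the chi-square total, can be sketched in one function. The counts are those of the worked example; the function and variable names are assumptions.

```python
def chi_square_statistic(table):
    """Sum of (O - E)^2 / E over every cell of a table of observed counts."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(column) for column in zip(*table)]
    grand_total = sum(row_totals)

    total = 0.0
    for row, row_total in zip(table, row_totals):
        for observed, column_total in zip(row, column_totals):
            expected = row_total * column_total / grand_total
            total += (observed - expected) ** 2 / expected
    return total


observed = [
    [9, 13, 10, 10, 8],  # Quartz
    [4, 3, 5, 9, 21],    # Basaltic
]
chi_square = chi_square_statistic(observed)
print(round(chi_square, 2))  # 15.14
```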
Now:
• Look up the χ² value on the table in the slide above. This will tell you whether to accept the null hypothesis or reject it.
The number of degrees of freedom to use is: the number of rows in the table minus 1, multiplied by the number of columns minus 1. This is (2-1) × (5-1) = 1 × 4 = 4 degrees of freedom.
We find that our answer of 15.14 is greater than the critical value of 9.49 (for 4 degrees of freedom and a significance level of 0.05) and so we reject the null hypothesis.
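The degrees of freedom and the comparison with the critical value can be sketched as follows. The statistic of 15.14 and the critical values for 4 degrees of freedom are taken from this presentation; the names are assumptions.

```python
def degrees_of_freedom(table):
    """(number of rows - 1) * (number of columns - 1)."""
    return (len(table) - 1) * (len(table[0]) - 1)


observed = [
    [9, 13, 10, 10, 8],  # Quartz
    [4, 3, 5, 9, 21],    # Basaltic
]
chi_square = 15.14  # value obtained in the worked example

# Critical values for 4 degrees of freedom, from the table of critical values.
critical_values_df4 = {0.05: 9.49, 0.01: 13.28, 0.001: 18.47}

df = degrees_of_freedom(observed)
print(df)  # 4

decisions = {alpha: chi_square > cv for alpha, cv in critical_values_df4.items()}
print(decisions)  # {0.05: True, 0.01: True, 0.001: False}
```

At the chosen level of 0.05 the null hypothesis is rejected; the same statistic would also exceed the 0.01 critical value but not the 0.001 one.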
We conclude:
‘The distribution of grains spreading from sand
piles made of basaltic minerals versus quartz is
significantly different.’
Now you have to look for physical factors to explain your findings.
If you ask me, you should check the density.
Hawaiian basaltic sands are mostly pyroxenes.
The density of pyroxene is 3.24 g/cm³.
The density of quartz is 2.536 g/cm³.
The pyroxene has greater potential energy when hilled to the same height as quartz sand.
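A rough sketch of the potential-energy comparison, under simplifying assumptions that are not in the presentation: both piles have the same volume (50 cc), the same shape and centre-of-mass height, and the same packing, so mass scales directly with mineral density; potential energy is taken as mass × g × centre-of-mass height. The centre-of-mass height of 1 cm is a hypothetical placeholder and cancels out of the ratio.

```python
G_CM_PER_S2 = 981.0  # gravitational acceleration in cm/s^2

PYROXENE_DENSITY = 3.24  # g/cm^3, value quoted above
QUARTZ_DENSITY = 2.536   # g/cm^3, value quoted above


def potential_energy(density, volume_cm3, centre_height_cm):
    """Potential energy (in ergs) of a pile: mass * g * height of centre of mass."""
    mass_g = density * volume_cm3  # assumes no pore space, for simplicity
    return mass_g * G_CM_PER_S2 * centre_height_cm


# Hypothetical geometry: 50 cc piles with centre of mass 1 cm above the base.
pyroxene_energy = potential_energy(PYROXENE_DENSITY, 50.0, 1.0)
quartz_energy = potential_energy(QUARTZ_DENSITY, 50.0, 1.0)

energy_ratio = pyroxene_energy / quartz_energy
print(round(energy_ratio, 2))  # 1.28
```

Under these assumptions the pyroxene pile stores about 28% more potential energy than the quartz pile, simply because the ratio of energies equals the ratio of densities.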
