Research Methods Lesson 15 - Test For Significance
Tests for statistical significance are used to address the question: what is
the probability that what we think is a relationship between two variables is
really just a chance occurrence?
If we selected many samples from the same population, would we still
find the same relationship between these two variables in every sample? If we
could do a census of the population, would we also find that this relationship
exists in the population from which the sample was drawn? Or is our finding
due only to random chance?
Tests for statistical significance tell us what the probability is that the
relationship we think we have found is due only to random chance. They tell us
what the probability is that we would be making an error if we assume that we
have found that a relationship exists.
Of course, we can never be completely certain. But using probability theory
and the normal curve, we can estimate the probability of being wrong if we
assume that the relationship we have found is real. If the probability of being
wrong is small, then we say that our observation of the relationship is a
statistically significant finding.
Statistical significance means that there is a good chance that we are
right in finding that a relationship exists between two variables. But statistical
significance is not the same as practical significance. We can have a
statistically significant finding, but the implications of that finding may have no
practical application. The researcher must always examine both the statistical
and the practical significance of any research finding.
Often, when differences are small but statistically significant, it is
because the sample size is very large; in a smaller sample, the same
differences would not be large enough to be statistically significant.
For example, a researcher might test hypotheses such as these: longer
training programs will place the same number or fewer trainees into jobs as
shorter programs; female graduate assistants are paid at least 75% of what
male graduate assistants are paid.
Even in the best research project, there is always a possibility (hopefully
a small one) that the researcher will make a mistake regarding the relationship
between the two variables. There are two possible mistakes or errors.
The first is called a Type I error. This occurs when the researcher
concludes that a relationship exists when in fact it does not. In a Type I
error, the researcher should accept the null hypothesis and reject the
research hypothesis, but does the opposite. The probability of committing a
Type I error is called alpha.
The second is called a Type II error. This occurs when the researcher
concludes that a relationship does not exist when in fact it does. In a Type II
error, the researcher should reject the null hypothesis and accept the
research hypothesis, but does the opposite. The probability of committing a
Type II error is called beta.
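The meaning of alpha can be illustrated with a small simulation (a sketch
under assumed conditions, not part of the lesson's examples): if we repeatedly
draw samples from a population in which the null hypothesis is true, and
reject whenever |z| > 1.96 (the two-tailed cutoff for alpha = .05), we commit
a Type I error in roughly 5% of the samples.

```python
import math
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def one_null_sample(n=30):
    """Draw n observations from N(0, 1), where the null (mean = 0) is true.

    Returns the z-score of the sample mean (mean has sd 1/sqrt(n))."""
    xs = [random.gauss(0, 1) for _ in range(n)]
    mean = sum(xs) / n
    return mean * math.sqrt(n)

trials = 2000
rejections = sum(1 for _ in range(trials) if abs(one_null_sample()) > 1.96)
print(rejections / trials)  # close to alpha = .05
```

Every rejection here is a Type I error, since the null hypothesis is true by
construction; the long-run rate of such errors is what alpha measures.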
Suppose, for example, that a researcher is testing whether a new drug
works better than an old one. If a Type I error is committed, then the new
drug is assumed to be better when it really is not (the null hypothesis should
be accepted, but it is rejected). People may be treated with the new drug
when they would have been better off with the old one.
For nominal and ordinal data, Chi Square is used as a test for statistical
significance. For example, we hypothesize that there is a relationship between
the type of training program attended and the job placement success of
trainees. We gather the following data:
Type of Training Attended      Number Attending Training
Vocational Education           200
Work Skills Training           250
Total                          450

Placed in a Job?               Number of Trainees
Yes                            300
No                             150
Total                          450
To compute Chi Square, a table showing the joint distribution of the two
variables is needed:

Placed in a Job?     Vocational Education     Work Skills Training     Total
Yes                  175                      125                      300
No                   25                       125                      150
Total                200                      250                      450
Chi Square is computed by looking at the different parts of the table. The
"cells" of the table are the squares in the middle of the table containing
numbers that are completely enclosed. The cells contain the frequencies that
occur in the joint distribution of the two variables. The frequencies that we
actually find in the data are called the "observed" frequencies.
In this table, the cells contain the frequencies for vocational education
trainees who got a job (n=175) and who didn't get a job (n=25), and the
frequencies for work skills trainees who got a job (n=125) and who didn't get a
job (n=125).
The "Total" columns and rows of the table show the marginal
frequencies. The marginal frequencies are the frequencies that we would find if
we looked at each variable separately by itself. For example, we can see in the
"Total" column that there were 300 people who got a job and 150 people who
didn't. We can see in the "Total" row that there were 200 people in vocational
education training and 250 people in work skills training. Finally, there is the
total number of observations in the whole table, called N. In this table, N=450.
To find the value of Chi Square, we first assume that there is no
relationship between the type of training program attended and whether the
trainee was placed in a job. If we look at the column total, we can see that 300
of 450 people found a job, or 66.7% of the total people in training found a job.
We can also see that 150 of 450 people did not find a job, or 33.3% of the total
people in training did not find a job.
If there was no relationship between the type of program attended and
success in finding a job, then we would expect 66.7% of trainees of both types
of training programs to get a job, and 33.3% of both types of training programs
to not get a job.
The first thing that Chi Square does is to calculate "expected" frequencies
for each cell. The expected frequency is the frequency that we would have
expected to appear in each cell if there was no relationship between type of
training program and job placement.
The way to calculate the expected cell frequency is to multiply the
column total for that cell, by the row total for that cell, and divide by the total
number of observations for the whole table.
For the upper left hand corner cell: 200 x 300 / 450 = 133.3. For the
lower left hand corner cell: 200 x 150 / 450 = 66.7. For the upper right hand
corner cell: 250 x 300 / 450 = 166.7. For the lower right hand corner cell:
250 x 150 / 450 = 83.3.
This table shows the distribution of "expected" frequencies, that is, the
cell frequencies we would expect to find if there was no relationship between
type of training and job placement:

Placed in a Job?     Vocational Education     Work Skills Training     Total
Yes                  133.3                    166.7                    300
No                   66.7                     83.3                     150
Total                200                      250                      450
Note that Chi Square is not reliable if any cell in the contingency table
has an expected frequency of less than 5.
For each cell:
a) subtract the expected frequency from the observed frequency
b) square the result
c) divide the result by the expected frequency
To calculate the value of Chi Square, add up the results for all four cells:
13.04 + 26.07 + 10.43 + 20.88 = 70.42
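The whole calculation can be sketched in a few lines of Python. (This is a
sketch, not a statistics library; note that keeping full precision in the
expected frequencies gives about 70.31, while the rounded expected frequencies
used in the text give 70.42.)

```python
# Observed frequencies from the training-program example:
# rows = placed in a job (yes, no); columns = vocational ed., work skills
observed = [[175, 125],
            [25, 125]]

row_totals = [sum(row) for row in observed]        # [300, 150]
col_totals = [sum(col) for col in zip(*observed)]  # [200, 250]
n = sum(row_totals)                                # 450

chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n  # e.g. 300 * 200 / 450
        chi_square += (obs - expected) ** 2 / expected

print(round(chi_square, 2))  # about 70.31 with unrounded expected values
```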
DEGREES OF FREEDOM
We cannot interpret the value of the Chi Square statistic by itself.
Instead, we must put it into a context.
In theory, if there is no relationship between the two variables, the Chi
Square statistic follows a known theoretical distribution (the chi-square
distribution, whose shape depends on the degrees of freedom). Thus we can
use the properties of this distribution to interpret the value obtained from
our calculation of the Chi Square statistic.
If the value we obtain for Chi Square is large enough, then we can say
that it indicates the level of statistical significance at which the relationship
between the two variables can be presumed to exist.
However, whether the value is large enough depends on two things: the
size of the contingency table from which the Chi Square statistic has been
computed; and the level of alpha that we have selected.
The larger the size of the contingency table, the larger the value of Chi
Square will need to be in order to reach statistical significance, if other things
are equal. Similarly, the more stringent the level of alpha, the larger the value
of Chi Square will need to be, in order to reach statistical significance, if other
things are equal. The term "degrees of freedom" is used to refer to the size of
the contingency table on which the value of the Chi Square statistic has been
computed. The degrees of freedom are calculated as the product of (the number
of rows in the table minus 1) times (the number of columns in the table
minus 1).
For a table with two rows of cells and two columns of cells, the formula
is:
df = (2 - 1) x (2 - 1) = (1) x (1) = 1
For a table with two rows of cells and three columns of cells, the formula
is:
df = (2 - 1) x (3 - 1) = (1) x (2) = 2
For a table with three rows of cells and three columns of cells, the
formula is:
df = (3 - 1) x (3 - 1) = (2) x (2) = 4
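The degrees-of-freedom rule is a one-liner; this small sketch just recomputes
the three examples above:

```python
def chi_square_df(rows, cols):
    """Degrees of freedom for an r-by-c contingency table."""
    return (rows - 1) * (cols - 1)

print(chi_square_df(2, 2))  # 1
print(chi_square_df(2, 3))  # 2
print(chi_square_df(3, 3))  # 4
```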
The level of alpha can vary, but the smaller the value, the more stringent
the requirement for reaching statistical significance becomes. Alpha levels are
often written as the "p-value", or "p=.05." Usual levels are p=.05 (or the chance
of one in 20 of making an error), or p=.01 (or the chance of one in 100 of
making an error), or p=.001 (or the chance of one in 1,000 of making an error).
When reporting the level of alpha, it is usually reported as being "less
than" some level, using the "less than" sign or <. Thus, it is reported as p<.05,
or p<.01; unless you are reporting the exact p-value, such as p=.04 or p=.22.
DISTRIBUTION TABLES
Once we have the calculated value of the Chi Square statistic, the
degrees of freedom for the contingency table, and the desired level for alpha,
we can look up the critical value of Chi Square in a distribution table. There
are many such tables available in statistics texts for this purpose.
In the table, find the degrees of freedom (usually listed in a column down
the side of the page). Next find the desired level of alpha (usually listed in a row
across the top of the page). Find the intersection of the degrees of freedom and
the level of alpha, and that is the value which the computed Chi Square must
equal or exceed to reach statistical significance.
For example, for df=2 and p=.05, Chi Square must equal or exceed 5.99
to indicate that the relationship between the two variables is probably not due
to chance. For df=4 and p=.05, Chi Square must equal or exceed 9.49.
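The lookup can be sketched as a small dictionary of critical values for
p = .05. The text quotes 5.99 (df=2) and 9.49 (df=4); the values 3.84 (df=1)
and 7.81 (df=3) are taken from a standard chi-square table and are not in the
text itself:

```python
# Chi-square critical values for alpha = .05, by degrees of freedom.
CRITICAL_05 = {1: 3.84, 2: 5.99, 3: 7.81, 4: 9.49}

def is_significant(chi_square, df, table=CRITICAL_05):
    """True if the computed statistic reaches the p = .05 critical value."""
    return chi_square >= table[df]

# The training-program example: chi square of about 70.3 with df = 1.
print(is_significant(70.3, 1))   # True: 70.3 >= 3.84
print(is_significant(2.5, 2))    # False: 2.5 < 5.99
```

This mirrors the "pass-fail" character of the test described below: the
statistic either reaches the tabled value or it does not.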
If the computed value for Chi Square equals or exceeds the value
indicated in the table for the given level of alpha and degrees of freedom, then
the researcher can assume that the observed relationship between the two
variables exists (at the specified level of probability of error, or alpha), and
reject the null hypothesis. This gives support to the research hypothesis. The
computed value of Chi Square, at a given level of alpha and with a given degree
of freedom, is a type of "pass-fail" measurement. It is not like a measure of
association, which can vary from 0.0 to (plus or minus) 1.0, and which can be
interpreted at every point along the distribution. Either the computed value of
Chi Square reaches the required level for statistical significance or it does not.
It is important to note that Chi Square, like other tests for statistical
significance, indicates only whether a relationship is likely to exist; it does
not measure the strength of that relationship.
Using T-Tests
T-Tests are tests for statistical significance that are used with interval
and ratio level data. T-tests can be used in several different types of statistical
tests:
1) to test whether there are differences between two groups on the same
variable, based on the mean (average) value of that variable for each
group; for example, do students at private schools score higher on the
SAT test than students at public schools?
2) to test whether a group's mean (average) value is greater or less than
some standard; for example, is the average speed of cars on freeways in
California higher than 65 mph?
3) to test whether the same group has different mean (average) scores on
different variables; for example, are the same clerks more productive on
IBM or Macintosh computers?
The procedure for calculating a value of t is described below.
Like other statistics, the t-test has a distribution that approaches the
normal distribution, especially if the sample size is greater than 30. Since we
know the properties of the normal curve, we can use it to tell us how far away
from the mean of the distribution our calculated t-score is.
The normal curve is distributed about a mean of zero, with a standard
deviation of one. A t-score can fall along the normal curve either above or below
the mean; that is, either plus or minus some standard deviation units from the
mean.
A t-score must fall far from the mean in order to achieve statistical
significance. That is, it must be quite different from the value of the mean of
the distribution, something that has only a low probability of occurring by
chance if there is no relationship between the two variables. If we have chosen
a value of p=.05 for alpha, we look for a value of t that falls into the extreme 5%
of the distribution.
If we have a hypothesis that states the expected direction of the results,
e.g., that male graduate assistant salaries are higher than female graduate
assistant salaries, then we expect the calculated t-score to fall into only one
end of the normal distribution. In that case, we look for a t-score that falls
into the extreme 5% of that one tail of the distribution.
If we have a hypothesis, however, that only states that there is some
difference between two groups, but does not state which group is expected to
have the higher score, then the calculated t-score can fall into either end of the
normal distribution. For example, our hypothesis could be that we expect to
find a difference between the average salaries of male and female graduate
assistants (but we do not know which is going to be higher, and which is
going to be lower).
e) Calculate t
To calculate t,
1) subtract the mean of the second group from the mean of the first group
2) calculate, for each group, the variance divided by the number of
observations minus 1
3) add the results obtained for each group in step two together
4) take the square root of the results of step three
5) divide the results of step one by the results of step four
For example,
1) subtract the mean of the second group from the mean of the first group
17095-14885=2210
2) calculate, for each group, the variance divided by the number of
observations minus 1
Male graduate assistants:
[40056241 / (403-1)] = [40056241 / (402)] = 99642
Female graduate assistants:
[21864976 / (132-1)] = [21864976 / (131)] = 166908
3) add the results obtained for each group in step two together
99642+166908=266550
4) take the square root of the results of step three
square root of 266550=516.28
5) divide the results of step one by the results of step four
2210/516.28=4.28
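The five steps, applied to the salary example, can be sketched in Python. The
procedure and figures are exactly those given above (40056241 and 21864976
are the variance figures the text gives for the two groups):

```python
import math

# Figures from the salary example in the text.
mean_male, mean_female = 17095, 14885
var_male, var_female = 40056241, 21864976
n_male, n_female = 403, 132

diff = mean_male - mean_female            # step 1: 2210
term_male = var_male / (n_male - 1)       # step 2, first group
term_female = var_female / (n_female - 1) # step 2, second group
total = term_male + term_female           # step 3
denominator = math.sqrt(total)            # step 4: about 516.28
t = diff / denominator                    # step 5: about 4.28
print(round(t, 2))
```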
To interpret the results,
f) calculate the degrees of freedom
g) look up the value in the table
h) interpret the value of t
Degrees of freedom
The degrees of freedom for the t-test is calculated by adding up the
number of observations for each group, and then subtracting the number two
(because there are two groups). For example, (403 + 132 - 2) = 533
Distribution of T
The values of t are printed in tables in most statistics texts. The values of
the degrees of freedom are listed in a column down the side, and the values of
alpha (p-value) are listed in a row across the top. There are different tables for
one-tailed and two-tailed tests of t. Find the correct table for the number of
tails. Then find the intersection of the degrees of freedom and the value of
alpha in the table. That value is the value that the calculated t-score must
equal or exceed to indicate statistical significance.
For a one-tailed test of t, with df=533 and p=.05, t must equal or exceed
1.645.
For a two-tailed test of t, with df=533 and p=.05, t must equal or exceed
1.960.
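For large degrees of freedom, the t distribution is essentially normal, so the
two table values quoted above can be recovered from standard normal quantiles.
This sketch uses Python's statistics.NormalDist (a convenience of the standard
library, not a full t table):

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

one_tailed = std_normal.inv_cdf(0.95)    # upper 5% in one tail: about 1.645
two_tailed = std_normal.inv_cdf(0.975)   # 2.5% in each tail: about 1.960
print(round(one_tailed, 3), round(two_tailed, 3))
```

The two-tailed cutoff is larger because the 5% of error probability is split
between the two ends of the distribution.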
If the computed t-score equals or exceeds the value of t indicated in the
table, then the researcher can conclude that there is a statistically significant
probability that the relationship between the two variables exists and is not
due to chance, and reject the null hypothesis. This lends support to the
research hypothesis.
In this example, the computed t-score of 4.28 exceeds the table value of
t, so we can reject the null hypothesis of no relationship between graduate
assistant gender and graduate assistant pay, and instead accept the research
hypothesis and conclude that there is a relationship between graduate
assistant gender and graduate assistant pay.
Remember, however, that this is only one statistic, based on just one
sample, at one point in time, from one research project. It is not absolute,
conclusive proof that a relationship exists, but rather support for the research
hypothesis. It is only one piece of evidence that must be considered along
with many other pieces of evidence on the same subject.
The third way to report tests of statistical significance is to include them
in tables showing the results of an extended analysis of the data, including a
number of variables. For example, here are some results from a study of older
Hispanic women in El Paso, TX, and Long Beach, CA.
*t significant at p<.05
**t significant at p<.01
Final Comments
Tests for statistical significance are used to estimate the probability that
a relationship observed in the data occurred only by chance; the probability
that the variables are really unrelated in the population. They can be used to
filter out unpromising hypotheses.
Tests for statistical significance are used because they constitute a
common yardstick that can be understood by a great many people, and they
communicate essential information about a research project that can be
compared to the findings of other projects.
However, they do not assure that the research has been carefully
designed and executed. In fact, tests for statistical significance may be
misleading, because they are precise numbers that have no necessary
relationship to the practical significance of the findings of the research.
Finally, one must always use measures of association along with tests for
statistical significance. The latter estimate the probability that the relationship
exists; while the former estimate the strength (and sometimes the direction) of
the relationship. Each has its use, and they are best when used together.