JAMIE HALE - in Evidence We Trust - 2nd Edition

Although the author has made every effort to ensure the
accuracy and completeness of information contained in
this book, the author assumes no responsibility for errors,
inaccuracies, omissions, or inconsistencies in the book.
Cover design: Kevin Akers www.polemicsapps.com
Copyright  2019, by Jamie Hale. All rights reserved.

No portion of this book may be reproduced or transmitted
in any form or by any means without the prior written
permission of the publisher.
Contact information:
Jamie Hale
323 Calmes Blvd.
Winchester, KY 40391
www.knowledgesummit.net
www.maxcondition.com
Published by MaxCondition Publishing.

Contents
Acknowledgements
Introduction
Chapter 1: The Need for Science and Statistics
  The Skeptic
  Scientific & Nonscientific Approaches to Knowledge
  Science Might Have it Wrong?
  The Common Sense Myth!
  Correlational Studies are Important Even if They Don’t Imply Causation!
  Why We Need Statistics!
  When Experts are Wrong
  Understanding Scientific Research Methods
  Why Science Matters by James Randi
  The Nonsense Detection Kit
  Science Roundtable: Discussing Scientific Matters
  Guidelines for Reading Research Reports
  Association Between Scientific Cognition and Scientific Literacy
  Analytical Reading
Chapter 2: The Need for Rationality
  Developing The RQ Test
  Good Thinking: More Than Just Intelligence
  Intelligence and Rationality: different cognitive abilities
  The Ultimate Goal of Critical thinking
  Man is an Irrational Animal!
  Common Myths About Rationality
  Dysrationalia: Intelligent People Behaving Irrationally
  Rationality Quotient
Chapter 3 FAQ: Research Methods and Statistics
References
Appendices
  Appendix A Practice Problems
  Appendix B APA Style Citation and Reference Lists
Index
About the Author

Acknowledgements
I would like to thank Kevin Akers for his editorial work,
design and layout work, and for making suggestions
regarding the book’s contents. Thanks to Jonathan Gore for
editing the section addressing frequently asked questions
about research methods and statistics. Thanks to Richard
Osbaldiston for allowing me to use practice problems from
his course on research methods and statistics.

Thanks to everyone who participated in the interviews
featured in the book: Keith Stanovich, Maggie Toplak,
Richard West, Andreas Zourdos, Kurtis Frank, Alvaro
Fernandez, Scott Lilienfeld, Jonathan Gore, Richard
Osbaldiston, Bret Contreras, Lars Avemarie, and Brad
Schoenfeld. Lastly, thanks to James Randi for contributing
an article to chapter one.
Introduction
We are bombarded with information on a daily basis. It is
often said we live in the information age, but we also live
in the misinformation age. How do we decide what
constitutes knowledge and what constitutes nonsense?
Maybe there are no wrong or right answers, and just
opinions? Of course, some individuals want you to believe
this notion. However, this notion is fallacious. There are
facts and opinions, right and wrong answers. There is a
reality that extends beyond personal comforts and opinions
(Mitchell & Jolley, 2010). In the context used here, facts
are tentative. They are assertions that are supported by the
preponderance of evidence. Facts in the context of science
(the primary concern in this book) are based on levels of
certainty, but absolute certainty isn't attained. Scientific
findings are often presented in terms of probabilities,
principles, theories, laws, etc., and are revised in
accordance with new findings. Science is a vast enterprise; an
array of methods and statistics are used. The methods and
information discussed in this book are only a sample of a
large body of strategies used in science. Different domains
of science often use different research and modeling
strategies.

For the purposes of this book, evidence
is synonymous with scientific evidence. Testimonials,
anecdotes, they-says, wishful thinking and so on do not
count as evidence. If we consider these types of claims
and feelings as evidence then any discussion of evidence is
vacuous. Testimonials exist for almost any claim you can
imagine. That does not mean that claims of this sort have
no value. Experiences are confounded (confused by
alternative explanations). Experiences may be very
important in some contexts, and they may serve as
meaningful research questions. However, a meaningful
question or a possible future finding is not synonymous
with scientific evidence; scientific evidence depends on
converging evidence. That is, different strategies, each
making use of the preponderance of evidence, converge on
a tentative finding. In the following pages, science,
rationality/critical thinking (in cognitive science terms)
and statistics (of the frequentist type) are discussed.

The content in chapter one includes short articles (old, new
& revised), a science discussion roundtable (featuring
individuals from various fields), a full research report, a
concise overview of a study involving a teaching strategy
regarding research methodologies, and a nonsense detection
kit. Some of the short articles presented in chapter one have
been published on various internet sites, and some of the
same or similar information may be discussed across
different articles. There are at least two key benefits that
can occur when presenting similar information across
different articles (in different contexts): strengthening of
memory connections, and each article can be read as a
stand-alone article. In the science discussion roundtable
participants are asked two questions. One) Do you have
any tips for people who are interested in enhancing their
ability to read scientific research? Two) What is the
biggest misconception (or at least one of the biggest
misconceptions) about science? The Nonsense Detection Kit
is presented in chapter one. The impetus for designing the
Nonsense Detection Kit was similar kits devised by Sagan,
Shermer, and Lilienfeld.
Chapter two features short articles on rationality, that is,
rationality as defined by cognitive science. Some of the
same or similar information is contained across different
articles. There are at least a couple of advantages to
presenting information in this manner (refer to previously
mentioned advantages in chapter one). Some of the articles
focus on the rationality-intelligence dichotomy. Also
included in this chapter are interviews with Keith Stanovich
and the Stanovich Research Lab (Keith Stanovich, Richard
West and Maggie Toplak). In the interviews with
Stanovich, he discusses the development of an RQ Test. In
the interview with the Stanovich lab, rationality and
intelligence are discussed.

Chapter three features frequently asked questions about
research methods and statistics. Many of the questions are
questions I have received in the past from my students.
Some of the questions address basic research and statistics
problems, while other questions are more complex. To
reiterate, the research methods and stats discussed in this
book are a small sample of the wide range of
methods and stats used across areas of science. Hard
sciences and soft sciences often use different models,
methodologies, stats, inferential processes, manipulations
and assessments. Even within the same area of science
different strategies are used. Not all science involves null
hypothesis statistical testing and Popperian principles of
falsification; science and the methodologies used are much
broader. Science is a vast enterprise.

The book ends with an appendices section. Practice
problems and guidelines regarding APA citations and
reference lists are given.
The content in this book may be difficult for some to
comprehend. However, with some effort and patience the
content is learnable for most people. In the words of Albert
Einstein “Things should be made as simple as possible, but
not any simpler.” Science, rationality and statistics can be
simplified to a degree, but relative to most other topics
these topics are difficult. This book is not written for
cognitive misers (the cognitively lazy). This book is
written for individuals who are interested in separating
knowledge from nonsense, and are willing to put forth at
least a moderate level of cognitive effort.

This edition consists of a lot of the same material that was
in the first edition. The 2nd edition offers a revision to
some parts of the 1st edition, and it consists of additional
information including a full research report (examining
association between scientific cognition and scientific
literacy), concise overview of a research report on teaching
research methodology, development of the rationality
quotient and a discussion on different topics involving
statistics (valuable sources provided if further study is an
interest).

My hope is that after reading this book you appreciate the
need for critical thinking and the importance of critical
thinking regarding goal attainment and the formation of
evidence based beliefs. Many people who will read this
book already have a sound understanding of science,
rationality and statistics. For those people, my hope is that
the information contained in this book helps to strengthen
their knowledge base, and leads them to some new
perspectives.

Science is the great reality detector. A scientific worldview
is mechanistic, materialistic and congruent with model-dependent
reality (Hawking & Mlodinow, 2010).
Chapter 1
The Need for Science and Statistics
In this chapter the focus will be on science and statistics.
Scientific methods are the most powerful methods we have
for discovering reality. Statistics allows us to organize,
summarize, model and interpret research data collected
from samples. Statistics are mentioned briefly in this unit.
A more comprehensive discussion of statistics is featured in
chapter three. Science is a complex concept; it involves
multiple components. A comprehensive discussion of the
multiple components is not featured in this book. Different
components are mentioned, but the main discussion points
presented here are elements of research methods (mostly
methods used in behavioral, cognitive, social, health,
biomedical and educational sciences) and the philosophy of
science.
The Skeptic
“Jamie, why are you so skeptical?” “Why do you have such
a negative view of the world?” “You are so cynical.” I hear
these types of questions and statements on a daily basis. It
is important to point out that being skeptical is not a bad
thing, and is not synonymous with being cynical.
Skeptic or Cynic
Some people believe that skepticism is the rejection of new
ideas. People often confuse “skeptic” with “cynic.”
Skeptic is derived from the Greek skeptikos, which means
“inquiring” or “to look around.” Skeptics apply reason to
all claims and need evidence for them. It is important to
consider who is making the claim, but no matter who
makes the claim, evidence is required. The individual’s
reputation, authority or credentials do not make the claim
correct. The evidence determines whether the claim is
correct. Skepticism is a method used to question the
validity of a particular claim. In its simplest form
skepticism requires evidence for a claim to be accepted as
fact (tentative fact).

Science has not investigated every topic. Many claims are
so outlandish and unjustifiable (according to already
established scientific facts) they do not warrant scientific
investigation, and some cannot be investigated
scientifically. These are the types of claims that violate
basic laws and principles of science. The people who
promote these claims generally make up their own
terminology and attempt to impress people with
complex-sounding words dressed up as scientific words (in many
cases words that do not exist or words they cannot
accurately define).

What is a cynic? Cynics are distrustful of any advice or
information that they do not agree with themselves. Cynics
do not accept any claim that challenges their belief system.
Recently, in an interview, I was asked the following:

“W Noble: Do you have any concerns about some people
saying this book promotes a cynical approach to the fitness
industry?” My answer was:
“Coach Hale: No. The only people that will make this
claim are people that are not willing to look at truth and
people that promote quack science. Fitness Skepticism
(this includes the health, nutrition and supplement
industries) is an approach to claims that applies reason
to any and all ideas. Skeptics do not go into an
investigation closed to the possibility that a claim might be
true. When I say “skeptical,” I mean that I need to see valid
evidence before believing a claim. Cynical, on the other
hand, means taking a negative view and not being willing to
accept valid evidence for the claim. I think skepticism is
healthy and should be promoted in all fields.”
Skepticism and Science
Skepticism is a key part of science. Science is vast; it
involves different parts and systematic processes for
analyzing information, making models, testing and making
educated, probabilistic inferences about the world. The word
science is derived from the Latin word scientia, meaning
knowledge. The term scientific method should be avoided.
Science uses an array
of methods and modeling strategies, and they vary
tremendously within and between domains. A single
concept or simple definition doesn't suffice. Scientific
processes are systematic, and they are the best processes we
have for acquiring new knowledge; they use deductive and
inductive reasoning (Randall, 2012). The hypothetico-deductive
model suggests scientific inquiry proceeds by
formulating a hypothesis (predicted outcome) in a form that
can be tested and potentially falsified, and then testing it against
observable data (direct or indirect) where the outcome is
not yet known. This model is often taught to students as
the scientific method: the way science is done. This is an
oversimplification; a lot of scientific data has been
acquired in the absence of this model. Research often
doesn't fit this model. There are important scientific
findings that didn't happen via the hypothetico-deductive
model. This model often involves standard null hypothesis
statistical testing (as incorporated in the frequentist model),
which is often used in research, but it isn't always used.
Sean Carroll has written an excellent article, Beyond
Falsifiability
(https://www.preposterousuniverse.com/blog/2018/01/17/beyond-falsifiability/),
on the limitations of falsification.
Carroll asserts "it’s not so much that falsifiability is
completely wrong-headed, it’s just not quite up to the
difficult task of precisely demarcating the line between
science and non-science... If we care about accurately
characterizing the practice and principles of science, we
need to do a little better — which philosophers work hard
to do, while some physicists can’t be bothered."

Inductive reasoning involves going from specifics to
generalizations. "Scientists inductively work from
observations to try to establish a consistent framework that
matches all the measured phenomena. Once the theory is
in place, scientists and detectives make deductions, too, in
order to predict other phenomena and relationships in the
world" (Randall, 2012, pp. 43-44). Some of the primary
goals of science include describing, predicting and
explaining phenomena. Science shines light on reality.

The following is an excerpt from Why People Believe Weird
Things (Shermer, 1997, p. 19). “Through the scientific
method, we may form the following generalizations:

Hypothesis: A testable statement accounting for a set of
observations.

Theory: A well-supported and well-tested hypothesis or set
of hypotheses.

Fact: A conclusion confirmed to such an extent that it
would be reasonable to offer provisional agreement.”

When using scientific methods one of the primary goals is
objectivity. Proper use of science leads to epistemic
rationality (basing beliefs on evidence). Relying on science
also helps us avoid dogmatism (adherence to doctrine over
rational and enlightened inquiry, or basing conclusions on
authority rather than evidence).
Thinking Gone Wrong
Why do so many people believe everything they read or
hear? One of the key reasons they believe almost
everything they hear is that throughout life they have been
discouraged from critical thinking. Don’t question
authority (so they are told). When we were kids our
parents gave us advice and told us what to do. No
questions were asked. This continued throughout our
school years. The formal education system generally
discourages critical thinking (process of actively and
skillfully conceptualizing, applying, analyzing,
synthesizing, and evaluating information to reach an
answer or conclusion). You have probably often thought,
“My teacher said it, so it must be right.” This cycle
continues throughout most of our lives. We are constantly
exposed to newspapers, TV, so-called experts and other
sources of information that tell us what is right and wrong.
The lack of emphasis on critical thinking leads to various
problems in the decision making process. These problems
make it difficult to distinguish fact from fiction.
Anecdotes are not science
Anecdotes are personal testimonies that support a claim.
They are not science. Anecdotal evidence lacks controlled
study and generally has many confounds.

With anecdotal evidence it is impossible to determine what
variable(s) are responsible for any differences in
performance. A basic example of anecdotal evidence
follows:

I supplemented with BCAAs (branched-chain amino acids)
when I was preparing for my bodybuilding show. I didn’t
lose any muscle; therefore, supplementary BCAAs prevent
muscle loss. However, in addition to BCAA
supplementation I also weight trained, supplemented with
protein shakes, ate a large amount of protein foods (which
means high levels of BCAAs) and so on. The point is there is no
way that it could be accurately determined that the BCAA
supplement was the reason the bodybuilder did not lose
muscle. This is also an example of confusing correlation
with causation. The illusion of cause is one of the most
common everyday cognitive errors. It is natural to see
patterns and infer causation even when there are no
causative patterns. High-quality research that rates high in
internal validity helps get us closer to determining cause
and effect.
Scientific Jargon does not make a science
Disguising a diet plan, equipment ad, supplement
promotion, beauty product, or cleaning product with
scientific jargon to make it sound legitimate is common.
Scientific-sounding words impress many people. Don’t be
impressed. Look for the evidence. What you will often
find is that these phrases and words are taken out of context
and are sometimes not even recognized words.
Bold Statements and Bold testimonials
Most of the time, if it sounds too good to be true, it is. The
same holds true for celebrity testimonials. This does not
mean companies that use bold statements and bold
testimonials are always being deceitful (but buyer beware).
The power of celebrity is evident in society. There are
probably thousands of people who buy products just
because their heroes endorse them. Hero worship often
drives consumers to make bad choices (spending 300
dollars per month on supplements, purchasing the new
washing product even though research suggests its
ineffectiveness).
Rumors everywhere
“They say…” or “I have always heard…” and so on.
That’s how rumors start. Someone makes a suggestion and
the next thing you know it is accepted as fact. What is the
basis of these suggestions? Most people accept these
rumors having no idea where or why they were started. All
they know is “that is what they have always heard”.
Something can be repeated one million times and that still
doesn’t make it correct.
Ignoring failures
Even though failures in science are not always reported in
journals, high-level scientists often identify failures when
the evidence doesn’t support their claim. By identifying
failures, scientists are able to form new questions and use
various methods in an effort to get closer to the truth.
Perpetuators of pseudoscience ignore failures or find a
way to explain them away. Science progresses, and high-level
scientists and scientific thinkers will often change
some of their views over time.
Correlation and causation
Just because two events follow each other in sequence does
not mean they are connected or that one causes the other.
Common example: My friend has high insulin levels and
my friend is fat. Therefore, high insulin levels make my
friend fat. Looking at this situation more closely, my friend
also eats above his maintenance level of calories every day
and lives a sedentary lifestyle. To infer causation,
three criteria need to be met: covariation (association,
correlation), temporal precedence, and internal validity
(eliminating confounds, that is, possible alternative explanations
for the outcome).
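The insulin example can be made concrete with a small simulation. The sketch below is only an illustration under invented assumptions: the variable names, the sample size and all numbers are hypothetical, and it is not a physiological model. The simulated "insulin" and "body fat" scores are both driven by a third variable (calorie surplus) and have no effect on each other. The sketch suggests how covariation can appear without a causal link, and how the association largely disappears once the confound is statistically held constant.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def residuals(ys, xs):
    """What is left of ys after removing a straight-line fit on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return [y - (mean_y + slope * (x - mean_x)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 5000

# Hypothetical confound: calorie surplus (standardized units).
calorie_surplus = [rng.gauss(0, 1) for _ in range(n)]

# Both outcomes depend on the confound plus independent noise.
# Neither outcome is used to generate the other.
insulin = [c + rng.gauss(0, 1) for c in calorie_surplus]
body_fat = [c + rng.gauss(0, 1) for c in calorie_surplus]

# Covariation is present in the raw data...
raw_r = pearson(insulin, body_fat)

# ...but shrinks toward zero once calorie surplus is accounted for.
partial_r = pearson(residuals(insulin, calorie_surplus),
                    residuals(body_fat, calorie_surplus))

print(f"raw correlation:            {raw_r:.2f}")
print(f"controlling for calories:   {partial_r:.2f}")
```

Under these assumptions the raw correlation should come out at roughly 0.5, while the correlation that remains after controlling for calorie surplus should be close to zero. Covariation is met, but internal validity is not.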
Emotive words
Using emotive words is one of the key marketing tactics
used by the fitness and other industries. Using encouraging
words and telling people what they want to hear is one of
the key determinants of whether a person or product will be
successful in the fitness industry. Common examples:
twenty-page ad copies with free gift offers, highlighted
emotive power words and never-before-revealed secrets;
infomercials featuring people “just like you”; easy, fast
workouts; supplement ads that promise rippling abs and
bulging muscles. This type of marketing strategy almost
always signals quackery.
Ad Hominem
Ad Hominem means “argument against the man”. This
type of argument consists of replying to an argument or
factual claim by attacking or appealing to a characteristic or
belief of the person making the argument or claim, rather
than by addressing the argument or producing evidence
against the claim. This type of argument is very common
on internet forums. Common
examples: “That guy is a total jerk; I would not listen to
anything he said” or “He is not very strong; he shouldn’t
be giving strength training advice.” These types of attacks
are usually committed by someone who lacks debating
skills or has little knowledge of the subject being discussed.
Overreliance on authorities
Our culture relies heavily on the advice of authorities,
especially if these authorities are rich and famous.
Someone who is considered an authority should not be
given a free pass when it comes to providing evidence for
their statements. Consulting with authorities can be useful,
but it can also be very dangerous if we become uncritical of
their statements. The person’s credentials (credentials can
carry various meanings and whether a credential is relevant
or not is completely subjective) have no effect on the
validity of the statement. Accepting a statement as absolute
fact without investigation can lead to accepting a wrong
idea just because it was supported by someone we respect
(e.g., pro bodybuilders say using supplement A is why
they are huge, so consumers buy a huge quantity of
supplement A). On the other hand, rejecting a valid idea from
someone we do not respect could lead to never finding the
truth. If you are looking for the truth, you need to set aside
your personal bias towards the claimant.

Another thing that is important to recognize is that many
people have scientific degrees but do not actually practice
scientific methods, or consistently engage in scientific
cognition. Just because someone has a scientific degree
doesn’t necessarily imply they are a scientist or scientific
thinker.
Argumentum ad antiquitatem
Also known as the appeal to tradition (and sometimes grouped
with the appeal to common practice, false induction, or the
"is/ought" fallacy). This is a common
logical fallacy in which an idea is deemed correct on the
basis that it has a long-standing tradition behind it. In other
words, “This is right because we've always done it this
way.” Consider the following example: Boxers have
traditionally performed moderate to high levels of long-distance
roadwork to optimize their endurance levels. So,
performing long-distance roadwork must be the best way
to optimize endurance, even though boxing relies primarily
on the anaerobic energy system and long-distance running
mainly utilizes the aerobic energy system. Many boxers
have never tried anything different with regard to enhancing
endurance. There are many things people have always
done a particular way. That does not mean that way is the
most effective method for achieving the desired outcome. I
am sure everyone reading this has heard, “Well, that’s the
way I have always done it.”
Argumentum ad novitatem
This fallacy occurs when people believe something is better
or correct because it is new. Common example: New
elliptical machines are a much better piece of equipment
than the outdated models. The new machines feature
updated state of the art monitors. It is a big fallacy to
equate newer with better.
Shifting the burden of proof
The burden of proof is on the person making the claim.
Shifting the burden of proof is the fallacy of putting the
burden of proof on the person who denies or questions the
claim. As an example, some individuals in Maine claim to
have seen talking bears. Numerous individuals have
spotted these bears. Of course, they can’t provide proof for
their claims, but no one can disprove their claims. The
source of the fallacy is the assumption that something is
true unless proven otherwise. Shifting the burden of proof
is a common fallacy used in various industries. The quack
(bamboozler, charlatan, pretend fitness expert, etc.) often
lacks evidence for his/her claim and therefore shifts roles,
insisting that the skeptic disprove the claim.
Relativist Fallacy
The Relativist Fallacy is committed when a person rejects a
claim by asserting that the claim might be true for others
but is not for him/her. Common example: John says:
“Look Dave, I read that the fastest way to lose weight is not
to eat.” Dave says: “That might be true for other people but
I can’t lose weight even if I don’t eat.”

Don’t be afraid to question so-called experts. Once you
become a skilled skeptic you will be better at making
decisions.
Scientific & Nonscientific Approaches to Knowledge
Science is concerned with discovering reality and overcoming
personal biases, superstitions and various other types of
cognitive errors. Proper use of scientific methods leads us
to rationalism (basing conclusions on intellect, logic and
evidence). Relying on science also helps us avoid
dogmatism (adherence to doctrine over rational and
enlightened inquiry, or basing conclusions on authority
rather than evidence), and leads us closer to reality.
Scientific processes/methods are unmistakably the most
successful processes we have for describing, predicting,
modeling and explaining phenomena in the observable
universe. If reality is your preference then science is the
way. Different domains of science may use different
approaches to knowledge. Their processes can be
drastically different. As an example, physicists conduct
research and build knowledge in a different manner than
social scientists. There are common approaches used
across different areas of science, but to reiterate, there are
also variations in the specifics of knowledge building. In
the context mentioned here, the approaches are mostly
characteristic of the social and health sciences. The
nonscientific approach to knowledge involves informal
kinds of thinking. This approach can be thought of as an
everyday unsystematic way of thinking. There is a wide
array of nonscientific approaches to knowledge. Here, only
a few are mentioned.
Comparing Scientific & Nonscientific Approaches to Knowledge

                   Scientific                   Nonscientific
General approach   Empirical (systematically)   Intuitive
Observation        Controlled                   Uncontrolled
Reporting          Unbiased                     Biased
Concepts           Clear definitions            Ambiguous definitions
Instruments        Accurate/precise             Inaccurate/imprecise
Measurement        Reliable/repeatable          Non-reliable
Hypotheses         Testable                     Untestable
Attitude           Critical                     Uncritical

*Based on Table 1.1 from Research Methods in Psychology
(Shaughnessy & Zechmeister, 1990, p. 6)
General approach
The scientific approach to knowledge is based on
systematic empiricism (Stanovich, 2007). Observation
itself is necessary in acquiring scientific knowledge, but
unstructured observation of the natural world does not lead
to an increased understanding of the world. “Write down
every observation you make from the time you get up in the
morning to the time you go to bed on a given day. When
you finish, you will have a great number of facts, but you
will not have a greater understanding of the world”
(Stanovich & Stanovich, 2003, p. 12).

Systematic empiricism is systematic because it is structured
in a way that allows us to learn more precisely about the
world. After careful systematic observations, such as those
in controlled experiments, some causal relationships are
supported while others are rejected. Extending these
observations, scientists propose general explanations that
will explain the observations. “We could observe endless
pieces of data, adding to the content of science, but our
observations would be of limited use without general
principles to structure them” (Myers & Hansen, 2002, p.
10).

The empirical approach (as used in everyday observation)
allows us to learn things about the world. However,
everyday observations are often made carelessly and
unsystematically. Thus, using everyday observations in an
attempt to describe, predict and explain phenomena is
problematic.

Many everyday judgments are based on intuition. This
usually means a “gut feeling” or “what feels right.” The
Penguin Dictionary of Psychology defines intuition as a
mode of understanding or knowing characterized as direct
and immediate and occurring without conscious thought or
judgment. Intuition can be a valuable cognitive process,
but becoming too reliant on intuition can be a problem.
What’s right is often counterintuitive. Our intuition often
fails to recognize what is actually true because our
perceptions may be distorted by cognitive biases or because
we neglect to weigh evidence appropriately. We tend to
perceive a relationship between events when none exists.
We are also likely to notice events that are consistent with
our beliefs and ignore ones that violate them. We
remember the hits and forget the misses.
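Remembering the hits and forgetting the misses can be shown with simple arithmetic. The sketch below uses made-up counts for a hypothetical belief that some "sign" predicts some "event". The counts, the function names and the cell labels (hits, false alarms, misses, correct rejections) are assumptions chosen for illustration. The point is that judging a relationship requires all four cells of the table, not only the memorable hits.

```python
from math import sqrt


def conditional_rates(hits, false_alarms, misses, correct_rejections):
    """Return P(event | sign) and P(event | no sign) from a 2x2 table."""
    p_event_given_sign = hits / (hits + false_alarms)
    p_event_given_no_sign = misses / (misses + correct_rejections)
    return p_event_given_sign, p_event_given_no_sign


def phi_coefficient(hits, false_alarms, misses, correct_rejections):
    """Phi coefficient: a correlation measure for a 2x2 table."""
    a, b, c, d = hits, false_alarms, misses, correct_rejections
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator


# Hypothetical counts over 1,000 occasions.
table = dict(
    hits=20,                 # sign appeared, event happened (memorable)
    false_alarms=80,         # sign appeared, event did not happen
    misses=180,              # no sign, event happened anyway
    correct_rejections=720,  # no sign, no event
)

with_sign, without_sign = conditional_rates(**table)
phi = phi_coefficient(**table)

print(f"P(event | sign)    = {with_sign:.2f}")
print(f"P(event | no sign) = {without_sign:.2f}")
print(f"phi coefficient    = {phi:.2f}")
```

With these counts the event is exactly as likely with the sign as without it, so there is no relationship at all, even though a person who only recalls the 20 hits may feel the sign "works".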

Below is an example of the difference between the “gut
feeling” approach and the one preferred by scientists. The
excerpt is from The Demon-Haunted World (Sagan, 1996).
‘I am frequently asked, “Do you believe there’s
extraterrestrial intelligence?”
I give the standard arguments-there are a lot of
places out there, the molecules of life are
everywhere, I use the word billions, and so on.
Then I say it would be astonishing to me if there
weren’t extraterrestrial intelligence, but of course
there is as yet no compelling evidence for it. Often
I am asked next, “What do you really think?” I say,
“I just told you what I really think.” “Yes, but
what’s your gut feeling?” But I try not to think with
my gut. If I’m serious about understanding the
world, thinking with anything besides my brain, as
tempting as that might be, is likely to get me in
trouble. Really, it’s okay to reserve judgment until
the evidence is in.’
Observation
When observing phenomena a scientist likes to exert a

specific level of control. When utilizing control, scientists
investigate the effects of various factors one by one. A key
goal for the scientist is to gain a clearer picture of those
factors that actually produce a phenomenon. It has been
suggested that systematic control is the key feature of
science. Non-scientific observations are often made
unsystematically and with little care. The non-scientific
approach does not attempt to control the many factors that
could affect the events being observed (conditions are not
held constant). This lack of control makes it difficult to
determine real, systematic relationships, because too many
confounds (unintended independent variables) are present.
Determining causality is particularly hard with so many
confounds.
The factors that the researcher manipulates, in experimental

research, in order to determine their effect on the outcome
are called the independent variables. In its simplest form
the independent variable has two levels. A variable is
manipulated when participants / subjects are assigned to
receive different levels of the variable. These two levels
(or conditions) are the experimental condition, in which
the treatment is present, and the control condition, in
which the treatment is absent.
Only with experimental research can we determine cause
and effect (or probability of causal relationship).
The measures that are used to assess the effect of the

independent variables are called dependent variables
(Shaughnessy & Zechmeister, 1990). Proper control
techniques must be used if changes in the dependent
variable are to be interpreted as a result of the effects of the
independent variable. Scientists often divide control
techniques into three types: manipulation, holding
conditions constant, and balancing. We have already
discussed manipulation when we looked at the two levels
of the independent variable. Holding conditions constant
other than the independent variables is a key factor
associated with control. This helps eliminate the possibility
of confounds influencing the measured outcome.
Balancing is used to control factors that cannot be
manipulated or held constant (e.g., subjects' characteristics).
The most common method of balancing is to assign
subjects randomly to the different groups being tested. An
example of random assignment would be putting names on
slips of paper and drawing them from a hat (flipping a coin
or using a random number generator also works). Random
assignment does not mean there will be no differences in
the subjects' characteristics between groups. Ideally, the
differences will be minor and have minimal effect on the
results.
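The number-generator version of drawing names from a hat can be sketched in a few lines of Python. This is only an illustration: the function name `randomly_assign`, the subject names, and the seed are hypothetical choices, not part of any published procedure.

```python
import random

def randomly_assign(subjects, groups=("experimental", "control"), seed=None):
    """Shuffle the subjects, then deal them out to the groups in turn."""
    rng = random.Random(seed)      # a fixed seed makes the draw repeatable
    shuffled = list(subjects)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for index, subject in enumerate(shuffled):
        assignment[groups[index % len(groups)]].append(subject)
    return assignment

# Hypothetical subject pool, used only for illustration.
names = ["Ann", "Ben", "Cara", "Dev", "Eli", "Fay", "Gus", "Hana"]
result = randomly_assign(names, seed=1)
for group, members in result.items():
    print(group, members)
```

Because chance alone decides who lands in which group, subject characteristics should tend to balance out across groups, although any single draw can still leave minor differences.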
Reporting
How can two people witness the same event but see
different things? This often occurs due to personal biases
and subjective impressions. These characteristics are
common traits among non-scientists. Their reports often go
beyond what has just been observed and involve
speculation. In the book Research Methods in Psychology
(Shaughnessy & Zechmeister, 1990) an excellent example
is given demonstrating the difference between scientific
and non-scientific reporting. An illustration is provided
showing two people running along the street with one
person running in front of the other. The scientist would
report it in the way it was just described. The non-scientist
may take it a step further and report one person is chasing
the other or they are racing. A nonscientific approach is
generally more speculative than a scientific approach. This
type of reporting lacks objectivity.
Scientific reporting attempts to be objective and unbiased.

One way to lessen the chance of biased reporting is
checking to see if other independent observers report the
same findings. Even when using this checkpoint the
possibility of bias is still present. Following strict
guidelines to prevent biased reporting decreases the
chances of it occurring. Totally unbiased reporting rarely,
if ever, occurs. Scientists are humans, and humans are
susceptible to a wide range of conscious and unconscious
biases. Scientists may also be driven by incentives.
Concepts
It is not unusual for people in everyday conversation to

discuss concepts they really don’t understand. Many
subjects are discussed on a routine basis, even though
neither party knows exactly what the subject means. They
may have an idea of what they are discussing (even though
their ideas may be totally opposite), or the idea may be
ambiguous to both. The concepts being discussed often
cannot be measured, nor are they defined in a meaningful
way.
The scientist attaches an operational definition (a definition

based on the set of operations that produced the thing
defined) to concepts. An operational definition
(operationalization) is a precise, observable operation
(concrete, physical steps) used to manipulate or measure a
variable (Jackson, 2009; Mitchell & Jolley, 2010). An
operational definition allows others to see exactly how the
study was conducted, and allows replication and criticism
of the study (Patten, 2004; Stanovich, 2007).
Instruments
In everyday life numerous instruments are used to measure

events. Common instruments include gas gauges, weight
scales, and timers. These instruments are not very precise
compared to the more exact instruments used with the
scientific approach. Your bathroom scale weighs you in
pounds. What if you weigh 100 lbs and 2 oz? What if your
friend weighs 100 lbs and 6 oz? Your friend is heavier but
the bathroom scale says you weigh the same. Of course,
this weight difference is negligible, and probably has little
practical significance.
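The bathroom-scale example can be written out as a short sketch. It assumes a scale that simply rounds to the nearest whole pound; the function name `scale_reading` is hypothetical.

```python
OUNCES_PER_POUND = 16

def scale_reading(pounds, ounces):
    """Return what a scale that displays only whole pounds would show."""
    true_weight = pounds + ounces / OUNCES_PER_POUND
    return round(true_weight)

you = scale_reading(100, 2)       # true weight 100.125 lbs
friend = scale_reading(100, 6)    # true weight 100.375 lbs
print(you, friend)                # both display 100
print(you == friend)              # True, although the true weights differ by 4 oz
```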
A common device used by coaches and athletes to measure
sprint times is the handheld timer. These timers are highly
inaccurate and read only to the tenths place. In the Olympics,
winners and losers are often separated by hundredths of a
second. The instruments we generally depend on in
everyday life give us approximations, not exact
measurements. Often approximations are all we need from
a practical standpoint, but sometimes a more precise
measure is required.
Measurement
An instrument can provide accuracy and precision but
still lack value if the measurement is not valid. When
determining the validity of a measurement one must ask:
does the measurement really measure the concept in
question?
The key aspects concerning the quality of scientific

measures are reliability and validity (Hale, 2011).
Reliability is a measure of the internal consistency and
stability of a measuring device. Validity gives us an
indication of whether the measuring device measures what
it claims to.
Internal consistency is the degree to which the items or
questions on the measure consistently assess the same
construct. With an internally consistent measure items are
positively correlated with each other. This measure of
internal consistency is particularly important regarding
self-report measures. It isn't as important when considering
performance based measures, tests or surveys. Each
question should be aimed at measuring the same thing.
Stability is often measured by test / retest reliability. The
same person takes the same test twice and the scores from
each test are compared. Interrater reliability is sometimes
used in assessing reliability. With interrater reliability
different judges or raters (two or more) make observations,
record their findings and then compare their observations.
If the raters are reliable then the percentage of agreement
should be high.
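Both checks described above can be sketched in Python. The scores and ratings are made-up numbers for illustration only. Using a Pearson correlation to compare the two test sessions is an assumption here, since the text says only that the scores are compared.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def percent_agreement(rater_a, rater_b):
    """Percentage of observations on which two raters recorded the same thing."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)

# Test / retest: the same five (hypothetical) people take the same test twice.
first_session = [10, 12, 15, 18, 20]
second_session = [11, 12, 14, 19, 20]
print(round(pearson_r(first_session, second_session), 2))

# Interrater: two (hypothetical) raters code the same five observations.
rater_a = ["yes", "no", "yes", "yes", "no"]
rater_b = ["yes", "no", "no", "yes", "no"]
print(percent_agreement(rater_a, rater_b))   # 80.0
```

A correlation near 1 between sessions suggests a stable measure, and a high percentage of agreement suggests the raters are applying the same standard.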
When asking if a measure is valid we are asking if it
measures what it is supposed to. Validity is a judgment based
on collected data, not a statistical test. Two primary ways
to determine validity include: existing measures and known
group differences.
The existing measures test determines if the new measure

correlates with existing relevant valid measures. The new
measure should be similar to measures that have been
recorded with already-established valid measuring devices.
Known group differences determine whether the new
measure distinguishes between known group differences.
An illustration of known group differences is seen when
different groups are given the same measure, and are
expected to score differently. As an example, if you were
to give Democrats and Republicans a test assessing the
strength of certain political views, you would expect them
to score differently. Various sub-categories of validity
(external, internal, statistical and construct) are also
important in some contexts. Rating validity is not entirely
objective; in fact, it is relatively subjective in some areas.
No measure has perfect validity.
It is possible to have a reliable but not valid measure.
However, a valid measure is always a reliable measure.
Often, when using unsystematic approaches to knowledge,
measures are not reliable or valid. That is, they do not
measure the trait or characteristic of interest consistently
nor do they measure what they are intended to measure.
With unscientific thinking reliability and validity are often
not even considered. This can lead to erroneous
conclusions based on measures. Scientific approaches
generally make great efforts to ensure reliability and
validity.
Hypotheses
Scientists are interested in testable hypotheses. At least,

they are interested in hypothesis testing when using
predictive and explanatory research. However, let's not
forget lots of quality scientific data is collected and
constructed without the use of hypothesis testing. As an
example, Darwin, Fossey, Einstein, Hawking and Goodall,
to name a few, didn't use standard null hypothesis testing.
When testing scientific hypotheses (predicted outcomes of
a study involving potential relationships between at least
two variables), scientists are not attempting to prove their
hypotheses, but are attempting to falsify them. Or at least,
that is what textbooks report. Many disagree with this
notion. High school and college students are usually taught
that hypothesis testing is concerned with falsifying.
Scientists set up hypotheses that they attempt to falsify /
disprove. Two mutually exclusive hypotheses are formed
with the intent of falsifying one while gaining support for
the other. The null hypothesis (no relationship, no
difference, or no statistically significant difference)
predicts that when different groups are compared there will
be no difference. The alternative hypothesis (there is a
statistically significant difference) predicts that when
groups are compared there will be a difference.
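One way to see the logic of testing a null hypothesis is a permutation test, sketched below in Python. The text does not name this procedure, so it is a stand-in chosen because it needs no formulas. The scores, the function name `permutation_p_value`, and the number of shuffles are hypothetical. If the null hypothesis is true, group labels are arbitrary, so shuffling them should often produce a difference as large as the one observed.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Share of label shuffles whose mean difference is at least as large
    as the observed difference (a two-sided permutation test)."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)                      # pretend the labels are arbitrary
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations

# Hypothetical scores for two groups.
experimental = [14, 15, 16, 17, 18, 19]
control = [8, 9, 10, 11, 12, 13]
p_value = permutation_p_value(experimental, control)
print(p_value)
```

A small value means the observed difference would be rare if the null hypothesis were true, which counts as evidence against the null. It does not prove the alternative hypothesis.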
A hypothesis, as it is used in everyday language, is a
tentative explanation for a phenomenon, or a guess. It
often attempts to answer the questions “How?” and
“Why?” Almost everyone has formed their own hypotheses
about some things. The scientist proposes hypotheses that
are testable. The non-scientist suggests hypotheses in an
ambiguous manner, which doesn’t allow testing, or they
may suggest hypotheses that are un-testable. These
ambiguous hypotheses are often stated in a way that
virtually any type of finding can be used to support them.
A scientific hypothesis is precisely stated and an
operational definition is provided.
Hypotheses are not testable if the concepts they refer to are

not precisely defined, or if they are not susceptible to
measuring.
Attitude
The key attribute of scientists is skepticism. Scientists

question everything (almost everything). They want to see
evidence. Their personal epistemology is one of an
evaluative nature. Belief formation is dependent on the
preponderance of evidence, and is tentative. It is also
important to realize all humans are fallible. The scientist
has the attitude that there are no absolute certainties. R.A.
Lyttleton suggests using the bead model of truth (Lyttleton,
1977). This model depicts a bead on a horizontal wire that
can move left or right. A 0 appears on the far left end and a
1 appears on the far right end. The 0 corresponds with total
disbelief and the 1 corresponds with total belief (absolute
certainty). Lyttleton suggests that the bead should never
reach the far left or right end. The more that the evidence
suggests the belief is true the closer the bead should be to 1.
The more unlikely the belief is to be true the closer the
bead should be to 0.
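The bead model can be pictured with a small Python sketch. The update rule below is an assumption made only for illustration and is not attributed to Lyttleton: supporting evidence moves the bead part of the remaining distance toward 1, opposing evidence moves it part of the way toward 0, and a small hypothetical margin (`EPSILON`) keeps it off both ends.

```python
EPSILON = 1e-6  # hypothetical margin that keeps the bead off both ends

def move_bead(position, strength):
    """Move the bead along the wire.

    strength lies strictly between -1 and 1: positive values stand for
    supporting evidence, negative values for opposing evidence.
    """
    if not -1 < strength < 1:
        raise ValueError("strength must lie strictly between -1 and 1")
    if strength >= 0:
        position += strength * (1 - position)   # part of the way toward 1
    else:
        position += strength * position         # part of the way toward 0
    return min(max(position, EPSILON), 1 - EPSILON)

bead = 0.5
print(move_bead(bead, 0.5))    # 0.75
print(move_bead(bead, -0.5))   # 0.25

# Even a long run of strong supporting evidence never produces certainty.
for _ in range(1000):
    bead = move_bead(bead, 0.9)
print(0 < bead < 1)            # True
```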
The non-scientist is ready to accept explanations that are

based on insufficient evidence or sometimes no evidence.
They heard it on CNN or their teacher said it so it must be
true (logical fallacy of an Appeal to Authority). They reject
notions because they can’t understand them or because they
don’t respect the person making the claim. The scientist
investigates the claim and critically evaluates the evidence.
Even though the scientist is skeptical, it is not practical to

be skeptical all the time. Imagine that every time someone
tells you something you ask for evidence to support his or
her claim. You would have very few friends and you would
get very little accomplished.
Science or non-science
I prefer the scientific approach to knowledge. The

approach is not perfect, but it is the best process we have
for discovering reality. Science is subject to change, and
this is one of its best qualities. Scientific theories are
provisional. The cognitive mechanisms that underpin
scientific thinking are relevant in an array of contexts, other
than just conducting and evaluating research claims.
In science the word theory is used differently than it is in
everyday language (Johnson, 2000). To a scientist, the
word theory represents that of which he or she is most
certain; in everyday language the word implies a guess (not
sure). This often causes confusion for those unfamiliar
with science. This confusion leads to the common
statement “It’s only a theory.”
In conclusion, science cannot explain how and why

everything happens. Science finds solutions to problems
when solutions are possible. Some things that cannot be
understood presently will be understood in the future.
Science is the best we have in an effort to understand.
Science Might Have it Wrong?
Opponents of science often argue that science could be wrong.
Another popular claim by those who attack science is that
science can’t explain everything. Recently, a friend and I were
discussing some psychology research when he asked, “Are
there any definites in Psychology?” I answered with “there
are no definites in psychology or any other branch of
science.”
Some people have the erroneous assumption that science
claims certainty, when in fact science makes no such
claims. Scientific knowledge is tentative, and the tentative
nature of science is one of its strong points. Science, unlike
faith-based belief, accepts the preponderance of evidence
and changes its stance if the evidence warrants. Science
takes us where the evidence leads.
“The Real purpose of the scientific method is to

make sure Nature hasn’t misled you into thinking
you know something you actually don’t know.”
R. Pirsig, Zen and the Art of Motorcycle
Maintenance
(Gilovich, 1991, p.185)
The scientist has the attitude that there are no absolute
certainties. R.A. Lyttleton suggests using the bead model of
truth (Lyttleton, 1977). This model depicts a bead on a
horizontal wire that can move left or right. A 0 appears on
the far left end, and a 1 appears on the far right end. The 0
corresponds with total disbelief and the 1 corresponds with
total belief (absolute certainty).
Lyttleton suggests that the bead should never reach the far
left or right end. The more that the evidence suggests the
belief is true the closer the bead should be to 1. The more
unlikely the belief is to be true the closer the bead should
be to 0.
Being a scientific thinker enables one to understand

evidence and resist falling for nonsensical claims. The
more one learns about scientific thinking the more one
becomes aware of what is not known, and the more aware
one becomes of science’s willingness to modify claims
when warranted. Science is not about the need for closure,
but the need for establishing principles that are tentative.
Proper use of the scientific method(s) leads to epistemic

rationality (holding beliefs that are commensurate with
evidence). Relying on science also helps us avoid
dogmatism (adherence to doctrine over rational and
enlightened inquiry, or basing conclusion on authority
rather than evidence).
Scientific methods are the best methods we have for

learning about how things work in the observable universe.
Sometimes, science doesn’t get it completely right, but
science does not claim absolutism, nor does it claim to have
all the answers. I have heard some people say, “science
doesn’t matter, what matters is what happens in everyday
life and the real world.” Here is a reality check for those
misinformed folks: scientific methods are the very best
methods we have for understanding everyday life and the
real world.
The Common Sense Myth!
“Albert Einstein said common sense is the collection of

prejudices acquired by the age of 18. It is also a result of
some pervasive and extremely stupid logical fallacies that
have become embedded in the human brain over
generations, for one reason or another. These
malfunctioning thoughts--several of which you've had
already today--are a major cause of everything that's wrong
with the world” (Shakespeare, 2009).
Webster’s New World Dictionary (2003) defines common

sense as: “good sense or practical judgement.” This is
probably the most commonly accepted definition of the
word.
From Wikipedia:
“Common sense, based on a strict construction of the term,

consists of what people in common would agree on: that
which they "sense" as their common natural understanding.
Some people (such as the authors of Merriam-Webster

Online) use the phrase to refer to beliefs or propositions
that — in their opinion — most people would consider
prudent and of sound judgment, without reliance on
esoteric knowledge or study or research, but based upon
what they see as knowledge held by people "in common".
The most common meaning to the phrase is good sense and

sound judgement in practical matters.”
A better definition of common sense, in regard to its
everyday use, is: commonly held belief, regardless of its
truth-value.
It doesn’t matter which definition you prefer to use when

discussing common sense. Referring to common sense as
reason for a particular claim is a mistake. Yesterday’s
common sense is often today’s common nonsense. Once
upon a time it was common sense that the world was flat.
History is replete with examples of the failure of common
sense.
Frank Lovell contributed the list below of common sense

counterfactuals. Frank is a member of the Kentucky
Association of Science Educators and Skeptics.
Common Sense Counterfactuals:
“The sun orbits Earth once a day. FALSE -- Earth rotates

under the (relative to Earth, essentially) stationary sun once
a day, and orbits the stationary sun once a year.
Velocities are simply additive (1mph+1mph=2mph, and

100,000mps+100,000mps=200,000mps). FALSE -- special
relativity.
Time is absolute. FALSE -- Special Relativity.
Space is absolute. FALSE -- special relativity (what IS

absolute is "space-time").
Earth's continents do not move. FALSE -- plate tectonics.
Everything that happens is rigorously mechanically

determined. FALSE -- quantum mechanics.”
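The velocity item in the list can be checked with the standard special-relativity rule for combining two velocities along the same line, w = (u + v) / (1 + uv/c²). The Python sketch below is an illustration only. The rounded value of the speed of light in miles per second and the function name `add_velocities` are choices made here, not part of Lovell's list.

```python
SPEED_OF_LIGHT_MPS = 186_282  # miles per second, rounded

def add_velocities(u, v, c=SPEED_OF_LIGHT_MPS):
    """Combine two velocities along the same line (special relativity)."""
    return (u + v) / (1 + (u * v) / c**2)

# Everyday speeds: 1 mph + 1 mph, expressed in miles per second.
one_mph = 1 / 3600
print(add_velocities(one_mph, one_mph) * 3600)   # indistinguishable from 2 mph

# Very high speeds: the result falls well short of 200,000 mps
# and stays below the speed of light.
combined = add_velocities(100_000, 100_000)
print(round(combined))
print(combined < SPEED_OF_LIGHT_MPS)             # True
```

At everyday speeds the correction is far too small to notice, which is why simple addition feels like common sense.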
From Lilienfeld et al. (2010, p.6):
“…French writer Voltaire (1764) pointed out, ‘Common

sense is not so common.’ Indeed, one of our primary goals
in this book is to encourage you to mistrust your common
sense when evaluating psychological claims. As a general
rule, you should consult research evidence, not your
intuitions, when deciding whether a scientific claim is
correct.
As several science writers, including Lewis Wolpert (1992)

and Alan Cromer (1993), have observed, science is
uncommon sense. In other words, science requires us to put
aside our common sense when evaluating evidence (Flagel
& Gendreau, 2008; Gendreau et al., 2002).”
When engaging in argument, avoid using the common sense
fallacy; it gives the impression that you have no evidence to
support your claim. This tactic is sometimes used as a
persuasive method, even when the persuader doesn’t really
believe the statement is common sense. It may persuade
some people, but it will fail when arguing with someone
who has a firm understanding of logic.
Correlational Studies are Important Even if They Don’t
Imply Causation!
Correlation does not necessarily imply causation, but that

doesn’t mean correlation is not important. Two variables
may be associated without having a causal relationship.
However, the fact that a correlation has limited value for
causal inference does not mean that correlational studies
are unimportant to science.
Why are correlation studies important? Stanovich (2007)

points out the following:
“First, many scientific hypotheses are stated

in terms of correlation or lack of correlation,
so that such studies are directly relevant to
these hypotheses."
"Second, although correlation does not

imply causation, causation does imply
correlation. That is, although a correlational
study cannot definitely prove a causal
hypothesis, it may rule one out."
"Third, correlational studies are more useful

than they may seem, because some of the
recently developed complex correlational
designs allow for some very limited causal
inferences."
"…some variables simply cannot be

manipulated for ethical reasons (for
instance, human malnutrition or physical
disabilities). Other variables, such as birth
order, sex, and age are inherently
correlational because they cannot be
manipulated, and, therefore, the scientific
knowledge concerning them must be based
on correlation evidence.”
Correlation does not imply causation; however,
causation does imply correlation. Correlational
studies are a stepping-stone to the more powerful
experimental method.
Conditions Necessary to Infer Causation (Morling,

2012):
Time precedence (temporal precedence): For A to cause B,

A must precede B. The cause must precede the effect.
Relationship (covariation): The variables must correlate. To

determine the relationship of two variables, it must be
determined if the relationship could occur due to chance.
Lay observers are often not good judges of the presence of
relationships; thus, statistical methods are used to measure
and test the existence and strength of relationships.
Internal validity: The third and final condition for a causal

relationship is internal validity. For a relationship between
A and B to be internally valid, there must not be a C that
causes both A and B such that the relationship between A
and B disappears once C is controlled. That is, in order to
establish internal validity possible alternative explanations
for the outcome must be eliminated, or at least decreased.
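The third-variable problem can be demonstrated with simulated data. In the Python sketch below, all numbers are generated rather than observed. C is built to cause both A and B, and A and B have no direct link. Removing the part of A and B that a straight line on C predicts is one simple way to control for C, and it is an assumption of this sketch rather than a method described in the text.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def residuals(y, x):
    """What is left of y after removing a least-squares line on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]

rng = random.Random(42)
c = [rng.gauss(0, 1) for _ in range(5000)]        # the third variable
a = [value + rng.gauss(0, 0.5) for value in c]    # A depends only on C
b = [value + rng.gauss(0, 0.5) for value in c]    # B depends only on C

raw = pearson_r(a, b)
controlled = pearson_r(residuals(a, c), residuals(b, c))
print(round(raw, 2))          # strong correlation between A and B
print(round(controlled, 2))   # close to zero once C is controlled
```

A and B correlate strongly even though neither causes the other, and the relationship all but disappears once C is controlled.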
Why We Need Statistics!
Learning about stats will help you think in terms of

probabilities, and allow you to gain a better understanding
of research data. Discussions on research stats generally
involve two categories: Frequentist and Bayesian.
Frequentist methods refer to quantities that are
hypothetical frequencies of data distribution patterns under
an assumed statistical model. These predicted hypothetical
frequencies are called frequency probabilities. These
probabilities are not synonymous with hypothesis
probabilities. Bayesian statistics are also concerned with
probability and present mathematical models of data. The
formulas used differ between Bayesian and Frequentist
models; a key difference is how probability is
conceptualized. My knowledge is in Frequentist stats; I
don't have the knowledge to talk about Bayesian models, so
discussions in this book, regarding stats, will be focused on
Frequentist stats.
To learn more about Bayesian vs. Frequentist refer to:

Frequentism vs. Bayesianism: Jake VanderPlas- video
https://www.youtube.com/watch?v=KhAUfqhLakw
All About The Bayes: Kristin Lennox- video
https://www.youtube.com/watch?v=eDMGDhyDxuY&t=3041s
Statistical Modeling, Causal Inference and Social Science

https://statmodeling.stat.columbia.edu/
Myths about statistics

https://www.statisticsdonewrong.com/
Statistic: One number that summarizes a property or
characteristic of a set of numbers.
Descriptive statistics are numerical measures that describe

a distribution by providing information on the central
tendency of the distribution, the width of the distribution
(dispersion, or variability), and the shape of the distribution
(Jackson, 2009). Inferential statistics are procedures that
allow us to make an inference from a sample to the
population. That is, we are able to make generalizations
about a population based on the information derived from
the sample.
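Python's built-in `statistics` module can compute the common descriptive statistics. The scores below are a made-up sample used only for illustration. The standard error of the mean is included as one simple bridge from describing a sample to estimating a population value, and that choice is an assumption of this sketch.

```python
from math import sqrt
from statistics import mean, median, mode, stdev

scores = [2, 4, 4, 5, 7, 8, 12]   # hypothetical sample

# Central tendency
print(mean(scores))      # 6
print(median(scores))    # 5
print(mode(scores))      # 4

# Dispersion (variability)
score_range = max(scores) - min(scores)
sample_sd = stdev(scores)                 # sample standard deviation
print(score_range)       # 10
print(round(sample_sd, 2))

# A first inferential step: how much the sample mean is expected to
# vary from one sample to the next (standard error of the mean).
standard_error = sample_sd / sqrt(len(scores))
print(round(standard_error, 2))
```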
A key reason we need statistics is to be able to effectively

interpret research. Without statistics it would be very
difficult to analyze the collected data and make decisions
based on the data. Statistics give us an overview of the data
and allow us to make sense of what is going on. Without
statistics, in many cases, it would be extremely difficult to
find meaning or patterns of any sort in the data. Statistics
provides us with a tool to make an educated inference.
Most scientific and technical journals contain some form of

statistics; that is, if the research is quantitative. Without an
understanding of statistics, the statistical information
contained in the journal will be meaningless. An
understanding of basic statistics will provide you with the
fundamental skills necessary to read and evaluate most
results sections. The ability to extract meaning from journal
articles, and the ability to evaluate research from a
statistical perspective are basic skills that will increase your
knowledge and understanding of the article of interest.
Gaining knowledge in the area of statistics will help you
become a better-informed consumer. If you understand
basic statistical concepts, you will be in a better position to
evaluate the information you have been given.
Recently I asked Dr. Jonathan Gore (Hale, 2012) the

following question: Why is a basic understanding of stats
important for the public?
My answer to why stats is important is that

pretty much everything operates based on
probability. Even some of the "hard"
sciences are starting to realize that
phenomena that used to only require a basic
equation are now having to factor in
probability to account for all that they
observe. To understand events that occur in
our daily lives, including understanding
other people’s behaviors, the economy, and
health, we have to address probabilities
rather than basic equations. When I talk with
religious people about the importance of
statistics, and they question its relevance, I
say, "Statistics is the best tool for humans to
understand how God’s creation works." We
may never know the complete picture, but
statistics give us the best possible estimate.
Beware of Person-Who Statistics!
Results of scientific studies are stated in probabilistic

terms. Science is not in the business of making claims of
absolute certainty (refer to bead model of truth). When
science describes, predicts, models or explains something,
it is understood that the conclusion is tentative. This
willingness to admit fallibility and the need for change is
one of science’s biggest strengths. In virtually every other
area of knowledge acquisition, admitting fallibility is not a
virtue, but a weakness.
Person-who statistics: situations in which well-established

statistical trends are questioned because someone knows a
“person who” went against the trend (Stanovich, 2007). For
example: “Look at my grandpa. He is ninety years old, has
been smoking since he was thirteen, and is still healthy.”
This statement implies that smoking is not bad for health.
Learning to think probabilistically is an important trait, and
can lead to more accurate thinking. Person-who statistics is
a ubiquitous phenomenon.
People like assertions that reflect certainty. Statistical,

scientific thinking is not about absolute certainty. The
conclusions drawn from scientific research are
probabilistic: generalizations that are correct most of the
time, but not every time. People often weight anecdotal
evidence more heavily than probabilistic information. This
is an error in thinking that leads to bad decisions and,
often, irrational thinking. It is important to accept that
statistical predictions aren't perfect. These predictions are
based on samples (groups or categories intended to represent
populations) and will be correct more often than not.
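A short simulation shows why a “person who” does not overturn a trend. The 90 percent figure below is a made-up rate chosen only for illustration and is not drawn from any study.

```python
import random

TREND_RATE = 0.9   # hypothetical: the trend holds for 90% of individuals

def simulate(n_people, trend_rate=TREND_RATE, seed=0):
    """Return one True/False per person: did this person follow the trend?"""
    rng = random.Random(seed)
    return [rng.random() < trend_rate for _ in range(n_people)]

outcomes = simulate(10_000)
followers = sum(outcomes)
exceptions = len(outcomes) - followers

print(followers / len(outcomes))   # close to 0.9: the trend is real
print(exceptions)                  # yet roughly a thousand people went against it
```

Every one of those exceptions could be someone's grandpa, and the generalization still holds for the large majority.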
When Experts are Wrong
We often consult with experts for advice. Their judgments

and predictions are often accepted without question. After
all, they are experts; shouldn’t we take their word?
Clinical vs. Statistical Methods
Experts rely on one of two contrasting approaches to decision
making: clinical or statistical (actuarial) methods.
Research shows that the statistical method is superior
(Dawes et al., 1989). Clinical methods rely on personal
experience and intuitions. When making predictions, those
using clinical methods claim to be able to use their personal
experience and go beyond group relationships found in
research. Statistical methods rely on group (aggregate)
trends derived from statistical records. “A simple actuarial
prediction is one that predicts the same outcome for all
individuals sharing a certain characteristic” (Stanovich,
2007, p.176). Predictions become more accurate when
more group characteristics are taken into account. Actuarial
predictions are common in various fields- economics,
human resources, criminology, business, marketing,
medical sciences, military, sociology, horse racing,
psychology, and education.
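A simple actuarial prediction, in the sense quoted above, can be sketched as a lookup table built from past records: everyone who shares a characteristic receives the same predicted outcome. The records, category labels, and function names below are hypothetical and serve only to show the mechanics.

```python
from collections import Counter, defaultdict

def fit_actuarial_table(records):
    """Map each set of characteristics to its most frequent past outcome.

    records is an iterable of (characteristics, outcome) pairs, where
    characteristics is a tuple of labels.
    """
    counts = defaultdict(Counter)
    for characteristics, outcome in records:
        counts[characteristics][outcome] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in counts.items()}

def predict(table, characteristics):
    """Predict the same outcome for everyone sharing these characteristics."""
    return table[characteristics]

# Hypothetical past records: (characteristics, outcome).
records = (
    [(("high_attendance",), "pass")] * 3
    + [(("high_attendance",), "fail")] * 1
    + [(("low_attendance",), "fail")] * 3
    + [(("low_attendance",), "pass")] * 1
)
table = fit_actuarial_table(records)
print(predict(table, ("high_attendance",)))   # pass
print(predict(table, ("low_attendance",)))    # fail
```

Because the keys are tuples, further characteristics can be added to each record, which is how such a table takes more group information into account.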
It is important to note that clinical judgment does not

equate to judgments made by only clinicians. Clinical
judgment is used in various fields- basically any field
where humans make decisions. It is also important to
realize “[a] clinician in psychiatry or medicine may use the
clinical or actuarial method. Conversely, the actuarial
method should not be equated with automated decision
rules alone. For example, computers can automate clinical
judgments. The computer can be programmed to yield the
description “dependency traits” just as the clinical judge
would, whenever a certain response appears on a
psychological test. To be truly actuarial, interpretations
must be both automatic (that is, prespecified or routinized)
and based on empirically established relations” (Dawes, et
al., 1989, p.1668).
Decades of research investigating clinical versus statistical

prediction have shown consistent results- statistical
prediction is more accurate than clinical prediction (Dawes
et al., 1989; Stanovich, 2007; Tetlock, 2005).
While investigating the ability of clinical and statistical

variables to predict criminal behavior in 342 sexual
offenders, Hall (1988) found that making use of statistical
variables was significantly predictive of sexual re-offenses
against adults and of nonsexual re-offending. Clinical
judgment did not significantly predict re-offenses.
From Predicting Criminal Behavior (Hale, 2011):
Within the field of dangerousness risk

assessment (as it applies to violent
offenders), it has been recommended that
clinical assessments be replaced by actuarial
assessments. In a 1999 book from the
American Psychological Association-
Violent Offenders: Appraising and
Managing Risk- (Quinsey, Harris, Rice and
Cormier)-the authors argued explicitly and
strongly for the "complete replacement" of
clinical assessments of dangerousness with
actuarial methods. "What we are advising is
not the addition of actuarial methods to
existing practice, but rather the complete
replacement of existing practice with
actuarial methods" (p. 171).
When considering the accuracy of clinical

vs. statistical behavior- In regards to
predicting criminal repeat behavior- it is
quite clear that statistical predictions /
methods are superior to clinical predictions /
methods. "The studies show that judgments
about who is more likely to repeat are much
better on an actuarial basis than a clinical
one", says Robyn Dawes (Dawes,1996).

In a statistical analysis of 136 studies, Grove and Meehl
(1996) found that only 8 of those studies favored clinical
prediction over statistical prediction. However, none of
those 8 studies had been replicated (repeated). In the
realm of scientific research, studies need to be successfully
repeated before they are regarded as strong evidence.

Regarding the research showing that actuarial prediction
is more accurate than clinical prediction, Paul Meehl (1986) stated,
“There is no controversy in social science which shows
such a large body of qualitatively diverse studies coming
out so uniformly in the same direction as this one.” That is,
when considering statistical versus clinical, statistical wins
hands down. Yet, experts from various domains still claim
their “special knowledge” or intuition overrides statistical
data derived from research.
The supremacy of statistical prediction
Statistical data is knowledge consisting of cases drawn

from research literature, which is often a larger and more
representative sample than is available to any expert.
Experts are subject to a host of biases when observing,
interpreting, analyzing, storing and retrieving events and
information. Professionals tend to weight their
personal experience heavily, while assigning less weight to
the experience of other professionals or research findings.
Consequently, statistical predictions usually weight new
data more heavily than clinical predictions.

The human brain is at a disadvantage in computing and
weighing compared with mechanical computing.
Predictions based on statistics are perfectly consistent and
reliable, while clinical predictions are not. Experts don’t
always agree with each other, or even with themselves
when they review the same case the second time around.
Even as clinicians acquire experience, the shortcomings of
human judgment help explain why the accuracy of their
predictions does not improve (Lilienfeld, Lynn, Ruscio,
& Beyerstein, 2010).

When a clinician is given information about a client and
asked to make a prediction, and the same information is
quantified and processed by a statistical equation, the
statistical equation wins. Even when the clinician has
additional information beyond what the equation receives, the
statistical equation wins. The statistical equation accurately
and consistently integrates information according to an
optimal criterion. Optimality and consistency supersede
any informational advantage that the clinician gains
through informal methods (Stanovich, 2007).
Another type of investigation mentioned in the clinical-
actuarial prediction literature discusses giving the clinician
predictions from the actuarial prediction, and then asking
them to make any necessary changes based on their
personal experience with clients. When the clinician makes
changes to the actuarial judgments, the adjustments lead to
a decrease in the accuracy of the predictions (Dawes,
1994).
A common criticism of the statistical prediction model is

that statistics do not apply to single individuals. This line of
thinking contradicts basic principles of probability.
Consider the following example (Dawes, et al., 1989):
“An advocate of this anti-actuarial position would have to

maintain, for the sake of logical consistency, that if one is
forced to play Russian roulette a single time and is allowed
to select a gun with one or five bullets in the chamber, the
uniqueness of the event makes the choice arbitrary.”
(p.1672)
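The arithmetic behind the Russian roulette example can be worked out directly. The sketch below assumes a six-chamber revolver, which the quoted passage does not specify, and the function name is hypothetical.

```python
from fractions import Fraction

CHAMBERS = 6  # assumption: a standard six-chamber revolver

def survival_probability(bullets, chambers=CHAMBERS):
    """Probability that a single pull of the trigger lands on an empty chamber."""
    return 1 - Fraction(bullets, chambers)

survive_one_bullet = survival_probability(1)    # 5 empty chambers out of 6
survive_five_bullets = survival_probability(5)  # 1 empty chamber out of 6
odds_ratio = survive_one_bullet / survive_five_bullets
```

Under that assumption the event happens only once, yet the chance of surviving is five times higher with the one-bullet gun. The aggregate probabilities clearly bear on the single case.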
An erroneous belief that statistics don’t apply to a single
case is sometimes held by compulsive gamblers (Wagenaar,
1988). This faulty sense of prediction often leads them to
believe they can accurately predict the next outcome.
“Even as clinicians acquire experience, the shortcomings of

human judgment help to explain why the accuracy of their
predictions doesn’t improve much, if at all, beyond what
they achieved during graduate school” (Stanovich, 2007;
Dawes, 1994; Garb, 1998).
Application of statistical methods
Research demonstrating the general superiority of statistical
approaches should be calibrated by recognition of its
limitations and need for control. Although they surpass clinical
methods, actuarial procedures are not infallible, often
achieving only moderate results. A procedure that proves
successful in one setting should be periodically reevaluated
within that context and shouldn’t be applied to new settings
mindlessly (Dawes, et al., 1989).

In his classic book, Clinical versus statistical
prediction (1996), Meehl thoroughly analyzed limitations of
actuarial prediction. He illustrated a possible limitation by
using what became known as the “broken-leg case.”
Consider the following:
We have observed that Professor A quite

regularly goes to the movies on Tuesday
nights. Our actuarial data support the
inference “If it’s a Tuesday night, then Pr
{Professor A goes to movies} ≈ .9.”
However, suppose we learn that Professor A
broke his leg Tuesday morning; he’s in a hip
cast that won’t fit in a theater seat. Any
neurologically intact clinician will not say
that Pr {goes to movies} ≈ .9; they’ll predict
that he won’t go. This is a “special power of
the clinician” that cannot, in principle, be
completely duplicated by even the most
sophisticated computer program. That’s
because there are too many distinct,
unanticipated factors affecting Professor A’s
behavior; the researcher cannot gather good
actuarial data on all of them so the program
can take them into account (Grove, W.M., &
Lloyd, M., 2006).

However, this example does not lend support to the idea
that avoiding error in such cases will greatly increase
clinicians’ accuracy as compared with statistical prediction.
For a more detailed discussion of this matter, refer to
Grove and Lloyd (2006).
From Clinical versus actuarial judgment (Dawes, et al.,
1989):
When actuarial methods prove more

accurate than clinical judgment the benefits
to individuals and society are
apparent…Even when actuarial methods
merely equal the accuracy of clinical
methods, they may save considerable time
and expense. For example, each year
millions of dollars and many hours of
clinicians’ valuable time are spent
attempting to predict violent behavior.
Actuarial prediction of violence is far less
expensive and would free time for more
productive activities, such as meeting
unfulfilled therapeutic needs.
Actuarial methods are explicit, in contrast to

clinical judgment, which rests on mental
processes that are often difficult to specify.
Explicit procedures facilitate informed
criticism and are freely available to other
members of the scientific community who
might wish to replicate or extend research.
The use of clinical prediction relies on authority whose
assessments-precisely because these judgments are claimed
to be singular and idiosyncratic-are not subject to public
criticism. Thus, clinical predictions cannot be scrutinized
and evaluated at the same level as statistical predictions.
(Stanovich, K., 2007)
Conclusion
The intent of this article is not to imply that experts are not
important or do not have a role in predicting outcomes.
Expert advice and information are useful in observation,
gathering data and sometimes making predictions (when
predictions are commensurate with available evidence).
However, once relevant variables have been determined
and we want to use them to make decisions, “measuring
them and using a statistical equation to determine the
predictions constitute the best procedure.” (Stanovich,
2007, p.181)
The problem is not so much in experts making decisions

(that’s what they are supposed to do), but in experts making
decisions that run counter to actuarial predictions.
Decades of research indicate statistical prediction is

superior to clinical prediction. Statistical data should never
be overlooked when making decisions (assuming there is
statistical data in the area of interest- sometimes there is
not). At the same time, recognize that there is variation within
and between samples in studies and in populations.
Predicting individual outcomes is filled with error, and
even though statistical prediction is generally the better
option, it will involve errors.
I will leave you with these words (Meehl, 2003):
If a clinician says “This one is different” or

“It’s not like the ones in your table,” “This
time I’m surer,” the obvious question is,
“Why should we care whether you think this
one is different or whether you are surer?”
Again, there is only one rational reply to
such a question. We have now to study the
success frequency of the clinician’s guesses
when he asserts that he feels this way. If we
have already done so, and found him still
behind the hit frequency of the table, we
would be well advised to ignore him.
Always, we might as well face it, the
shadow of the statistician hovers in the
background; always the actuary will have
the final word (p.138).
Understanding Scientific Research Methods
In order to fully appreciate and apply the knowledge that

has been acquired through scientific processes it is
imperative to have a basic understanding of scientific
research methodology. Scientific Methodology- scientific
techniques used to collect and evaluate data.
It is important to understand that all research methods play

an important role in leading us to tentative conclusions
concerning how things work in the universe. But it is also
important to realize that different types of research should be
interpreted and applied in different ways. Different
areas of science use different methods, and there is
variation within the same fields of study. Different
methods have different strengths and weaknesses. It is an
exaggeration and oversimplification to refer to a single
method as the gold standard. When considering the
preponderance of evidence, various research strategies are
considered, often involving many people and resources.
Science is cumulative.
Recommended sources
The Rationality of Science
https://centerforinquiry.org/blog/rethinking-science-education/
The Wide World of Science
https://skepticalinquirer.org/category/the_wide_world_of_science/
The intent here is to discuss some different research
methods, utilized mostly by those involved with health,
social and biomedical sciences.
In our everyday lives we are exposed to a plethora of

events and information. Often, we form opinions and
beliefs based on how we interpret these events. We also
have a tendency to form beliefs based on what others
believe. Sometimes these beliefs are corroborated by
converging evidence (evidence from other methods of
inquiry), but often these beliefs are unsupported.
A basic understanding that everyday judgments, causal
determinations, and observations are often flawed
(Gilovich, 1991) leads to an appreciation of more rigorous
methods- scientific research methods- of knowledge
acquisition. According to Myers & Hansen (2002), there
are two key factors that thwart our ability to gather and
evaluate data in a systematic and impartial manner-
exposure to small samples, and “the conclusions we draw
from them are subject to a number of inherent tendencies,
or biases, that limit their accuracy and usefulness” (Myers
& Hansen, 2002, p. 4). Everyday experiences are
confounded (Morling, 2012) and are open to a wide array
of possible explanations.
Goals of Scientific Research
Many researchers agree that some of the primary goals of

scientific research are- description, prediction, and
explanation / understanding. Some like to add control and
application to the list of goals. Of course, it is possible to
add other goals (and sub-goals, and this can vary depending
on domain of science) or intents of science, but for now, the
focus is on description, prediction and explanation/
understanding.
Description
Description refers to the procedures by which events
and their relationships are classified, categorized and
defined. Descriptions of events allow us to establish
generalizations and universals. By gathering information
on a large group of people a researcher can describe the
average member, or the average performance of a member
of the specific group being studied.
Describing observations of large groups of people does not

take away from the fact that there are important differences
among individuals. That is, researchers merely attempt to
describe events on the basis of average performance.
Alternatively, description allows researchers to describe a
single phenomenon and/or observations of a single person.
In science, descriptions are systematic and precise.

Scientific research makes use of operational definitions.
Operational Definitions- define events, qualities, and
concepts in terms of observable operations- procedures
used to manipulate and/or measure them.
Researchers are interested in describing only things that are

relevant to the study. They have no interest in describing
observations that are irrelevant to the investigation.
Prediction
In addition to description researchers make predictions.

Descriptions of events often provide a basis for prediction.
Predictions are sometimes made in the form of Hypotheses-
tentative, testable predictions concerning the relationships
between or among variables. Hypotheses are frequently
derived from Theories- interrelated set of concepts that
explains a body of data and makes predictions.
Prediction of later performance is of particular importance

to researchers. For example:
Does eating a low-calorie diet increase chances of living

longer?
Does undergraduate GPA predict how well one will do in

graduate school?
Do high levels of intelligence predict avoidance of

cognitive biases?
When a variable can be used to predict another variable or

variables we can say the variables are correlated.
Correlation- exists when different measures vary together,
which makes it possible to predict values of one variable by
knowing values of another variable.

Keep in mind that predictions hold to varying degrees.
Correlation coefficients are used to determine
how well measures co-vary. Correlation Coefficient- states
the degree of relationship between variables in terms of
both strength and direction of relationship (Jackson, 2009).
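One widely used correlation coefficient is Pearson's r. The sketch below computes it from scratch so that strength and direction are both visible in the result. The data are invented for illustration, the function name is hypothetical, and nothing here is drawn from Jackson (2009).

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable does not vary")
    return cov / sqrt(var_x * var_y)

# Hypothetical measurements on five people.
x = [1, 2, 3, 4, 5]
rises_with_x = [2, 4, 6, 8, 10]   # perfectly positive relationship
falls_with_x = [10, 8, 6, 4, 2]   # perfectly negative relationship
mostly_rises = [2, 1, 4, 3, 5]    # positive, but imperfect

r_positive = pearson_r(x, rises_with_x)
r_negative = pearson_r(x, falls_with_x)
r_imperfect = pearson_r(x, mostly_rises)
```

The sign of r gives the direction of the relationship, and its distance from zero gives the strength. Values closer to 1 or -1 allow more accurate prediction of one variable from the other.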
Explanation / Understanding
An important goal of scientific research is explanation.

Explanation is achieved when the cause or causes of a
phenomenon are identified. In order to determine cause
and effect, three prerequisites are essential- covariation of
events, proper time-order sequence (temporal precedence)
and the elimination of plausible alternative causes (internal
validity).
Covariation of events (relationship): The variables must

correlate. To determine the relationship of two variables, it
must be determined if the relationship could occur due to
chance. Lay observers are often not good judges of the
presence of relationships; thus, statistical methods are used
to measure and test the existence and strength of
relationships.
Proper time-order sequence (temporal precedence): For A to
cause B, A must precede B. The cause must precede the
effect.
Elimination of plausible alternative causes (non-

spuriousness- genuine): For a relationship between A and
B to be nonspurious, there must not be a C that causes both
A and B such that the relationship between A and B
vanishes once C is controlled.
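A small simulation can make the idea of a spurious relationship concrete. In the sketch below, which rests entirely on made-up data, C causes both A and B, and A has no effect on B. The sample size, the random seed, the function names and the use of a first-order partial correlation as the way of "controlling" for C are all choices made for this illustration.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def partial_r(r_ab, r_ac, r_bc):
    """Correlation between A and B after statistically controlling for C."""
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# C is the common cause; A and B each equal C plus independent noise.
c = [rng.gauss(0, 1) for _ in range(5000)]
a = [value + rng.gauss(0, 1) for value in c]
b = [value + rng.gauss(0, 1) for value in c]

r_ab = pearson_r(a, b)  # A and B correlate even though neither causes the other
r_ab_given_c = partial_r(r_ab, pearson_r(a, c), pearson_r(b, c))  # close to zero
```

A and B are clearly correlated in the raw data, but the relationship all but disappears once C is held constant. That is the pattern a spurious relationship is expected to show.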
The most difficult condition to be met when determining

cause and effect relationships is the elimination of other
plausible causes. It is also important to appreciate that
outcomes are often determined by multiple causes. The
concept of interaction is important, and it is important to
recognize that causation, including the extent of a causal
variable’s influence, is inferred and imperfectly known. An interaction occurs when a
variable that influences the outcome has different
effects when operating in conjunction with another variable
compared to when it is acting alone. It has a multiplicative
effect. The magnitude of the effect that one variable has
depends on the level of another variable.
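An interaction can be illustrated with the mean outcomes of a simple two-by-two design. The numbers and names below are hypothetical, and the difference-of-differences contrast is just one common way of expressing an interaction.

```python
# Hypothetical mean outcomes for each combination of two variables, A and B.
cell_means = {
    ("a_absent", "b_absent"): 10,
    ("a_present", "b_absent"): 12,
    ("a_absent", "b_present"): 11,
    ("a_present", "b_present"): 17,
}

def effect_of_a(means, b_level):
    """Change in the outcome produced by A at a given level of B."""
    return means[("a_present", b_level)] - means[("a_absent", b_level)]

effect_without_b = effect_of_a(cell_means, "b_absent")   # 12 - 10
effect_with_b = effect_of_a(cell_means, "b_present")     # 17 - 11

# If the two effects differ, the variables interact.
interaction_contrast = effect_with_b - effect_without_b
```

In these invented data, A raises the outcome by 2 when B is absent and by 6 when B is present, so the size of A's effect depends on the level of B. If the two effects were equal, the contrast would be zero and there would be no interaction.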
Qualitative Research Matters
Qualitative research is important; important scientific
findings have occurred in the realm of qualitative research.
A primary difference between quantitative and qualitative
research is that statistics and mathematical formulas are not used in
qualitative research, although methods are often similar
for qualitative and quantitative researchers. Data analysis,
in qualitative research, consists of detailed notes relative to
what was observed. The data are verbal in nature, and the
results of an early review might guide researchers regarding
the data collected later in the study. Computers or word
processors can be used to help with data analysis by
searching through notes to identify patterns, certain words
or phrases that might help in the development of concepts
and general themes.
Qualitative research usually takes place in the environment

or natural setting of the participant. Data are collected in a
less structured fashion when compared with data collection
of quantitative researchers. Qualitative researchers go with
the flow of the research setting and may change what they
are observing based on changes that occur in the field setting.
Quantitative researchers may regard this flexibility and lack of
control as a big threat to reliability and
validity. Some researchers blend the two approaches. As
an example, a quantitative researcher who uses semi-
structured interviews to collect data analyzes the data using
statistics, but may also report participant quotations to
support the statistics (Patten, 2004).
Should research be conducted using quantitative or
qualitative methods? It depends on context. Patten
compares six scenarios in the book- Understanding
Research Methods 4th edition (Patten, 2004). As examples,
Patten states that when little is known about a topic, qualitative
research is usually favored; when hard numbers are
required, such as those required by funding agencies,
quantitative is usually preferred.
Peer Review
In the Peer Review Process a paper is submitted to a

journal and evaluated by several reviewers (often reviewers
are individuals with an impressive history of work in the
area of interest- that is, the specific area that the article
addresses). After critiquing the paper the reviewers submit
their thoughts to the editor. Then, based on the
commentaries from the reviewers, the editor decides
whether to publish the paper, make suggestions for
additional changes that could lead to publication, or to
reject the paper.
Single Blind and Double Blind Reviews
In Single Blind Reviews authors do not know who the

reviewers are. In Double Blind Reviews authors do not
know who the reviewers are, nor do reviewers know the
identity of the authors. In many fields Single Blind
Reviews are the norm, while in others Double Blind
Reviews are preferred.
“Peer review is one way (replication is another) science

institutionalizes the attitudes of objectivity and public
criticism. Ideas and experimentation undergo a honing
process in which they are submitted to other critical minds
for evaluation. Ideas that survive this critical process have
begun to meet the criterion of public verifiability”
(Stanovich, 2007, p. 12).
Criticisms of the Peer Review Process:

Reviewers find it hard to remain purely objective due to
their own education, experience and biases.

The process is slow.

Critics point out that there are many examples of faulty
research published in peer-reviewed journals, which shows
the peer review process is often unsuccessful in weeding
out bad science. Sometimes good research is not
published, especially when findings are not statistically
significant.

Reviewers tend to be highly critical of articles that
contradict their own views and might be less critical of
articles that support their personal views (an example of
myside bias). Sometimes reviewers might not have extensive
knowledge relevant to the paper they are reviewing; their
knowledge might be minimal relative to the specific
knowledge of the author of the paper.

Well-known, established scientists are more likely to be
recruited as reviewers.

Final word- Peer Review Process
When evaluating the worth of scientific data, in addition to

publication in a peer reviewed journal, it is important to
take into consideration: funding sources, if the study has
been replicated, study design, sample size, and conflicts of
interest (design details and critiques will be discussed in
later articles). There are good studies that never get
published in peer-reviewed publications. The merits of the
paper should be weighed more heavily than the source of
publication. The peer review myth- the belief that peer
review automatically means quality or that lack of peer-reviewed
publication indicates low quality- should be abandoned.
There is much more to quality science than peer review.
Retraction Watch publishes information on thousands of
reports that have been retracted, and most often these
reports were published in peer-reviewed publications.
When referencing scientific data it is common to reference

popular science magazines and books. Be extra cautious
when getting your science information from these sources.
There is some good science information published in
popular science sources. However, when the authors
cannot provide references for any of their statements, and/or
their claims contradict those found in scholarly
publications, don’t place much weight on what they are
saying. It is important to consider the preponderance of
evidence, whether you are reading a textbook, popular
publication or scholarly journal.
Further reading
From When Does Peer Review Make No Damn Sense
http://andrewgelman.com/2016/02/01/peer-review-make-no-damn-sense/
"What sort of errors can we expect peer review to catch?
I’m well placed to answer this question as I’ve published
hundreds of peer-reviewed papers and written thousands of
referee reports for journals. And of course I’ve also done a
bit of post-publication review in recent years.
To jump to the punch line: the problem with peer review is

with the peers.
In short, if an entire group of peers has a misconception,

peer review can simply perpetuate error. We’ve seen this a
lot in recent years, for example that paper on ovulation and
voting was reviewed by peers who didn’t realize the
implausibility of 20-percentage-point vote swings during
the campaign, peers who also didn’t know about the garden
of forking paths. That paper on beauty and sex ratio was
reviewed by peers who didn’t know much about the
determinants of sex ratio and didn’t know much about the
difficulties of estimating tiny effects from small sample
sizes.
OK, let’s step back for a minute. What is peer review good
for? Peer reviewers can catch typos, they can catch certain
logical flaws in an argument, they can notice the absence of
references to the relevant literature—that is, the literature
that the peers are familiar with. That’s how the peer
reviewers for that psychology paper on ovulation and
voting didn’t catch the error of claiming that days 6-14
were the most fertile days of the cycle: these reviewers
were peers of the people who made the mistake in the first
place!
Peer review has its place. But peer reviewers have blind
spots. If you want to really review a paper, you need peer
reviewers who can tell you if you’re missing something
within the literature—and you need outside reviewers who
can rescue you from groupthink. If you’re writing a paper
on himmicanes and hurricanes, you want a peer reviewer
who can connect you to other literature on psychological
biases, and you also want an outside reviewer—someone
without a personal and intellectual stake in you being
right—who can point out all the flaws in your analysis and
can maybe talk you out of trying to publish it.
Peer review is subject to groupthink, and peer review is

subject to incentives to publishing things that the reviewers
are already working on."
Evidence on peer review—scientific quality control or

smokescreen?
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1114539/
WHY SCIENCE MATTERS
By James Randi
Visit Randi’s site at www.randi.org
Science is not the mysterious, distant, smoking-test-tube

sort of a priesthood that many imagine it to be. Rather, it is
simply an organized, formal method of “finding out.”
Science works. We’re all much better off for having

vaccines, rapid international travel, fast access to
information, instant communication; and improved, safer
nutrition — all direct results of what scientists have
discovered about how our real world works. And have no
doubt about it: we’re living in a real world, one that doesn’t
really care about our comfort or even our survival. We have
to see to these matters, and we’ve gotten to be very good at
this.
That’s due to what we call “science.”
There are those who try to disparage efforts by science to

discover the secrets of the universe, preferring to depend on
mythology like faith healing, charms, incantations/prayers,
and various other magical motions. Science looks at the
evidence, evaluates it, proposes a likely scenario that can
accommodate it — a theory — and then tests that idea for
validity.
But science doesn’t really discover many cold, hard, facts.

Rather, it discovers statements that appear to explain
certain observed phenomena or problems. These statements
– s=ut+½ at², for example – are tested endlessly. Should
they fail, they are either re-written or scrapped.
You just may have recognized that formula above. It’s a

discovery made by Sir Isaac Newton, and expresses the
variables of the situation in which a cannonball is dropped
from a convenient Leaning Tower in Italy. The formula
works quite well, except when the cannonball is replaced
by something the size of an electron or a galaxy. Then, it
fails.
Does that mean that the eminent scientist Newton was

wrong all these years? Did science fail? No. Within the
parameters in which Sir Isaac worked, he was right; outside
of those limits, quantum physics takes over, and all’s right
with the world once more.
This self-correcting feature of science is not a weakness.

It’s one of the most important advantages of the discipline.
Scientists learn something new, when they’re wrong. And
they correct their findings, and we get closer to the truth.
Science has no dogmas…The bottom line: Science works,
we need it, and it improves our lives and the lives of those
dear to us. What more can we ask?
The Nonsense Detection Kit
The impetus for writing the Nonsense Detection Kit was

previous suggestions made by Sagan (1996), Lilienfeld et
al. (2012) and Shermer (2001). The Nonsense Detection
Kit refers to nonsense in the sense of “scientific
nonsense”. That is, nonsense as used here refers to
“nonscientific information” that is often presented as
scientific, when in fact it is not scientific.
The Nonsense Detection Kit provides guidelines that can be

used to separate sense from nonsense. There is no single
criterion for distinguishing sense from nonsense, but it is
possible to identify indicators, or warning signs. The more
warning signs that appear, the more likely it is that claims are
nonsense.
Below is a brief description of indicators that should be

useful when separating sense from nonsense. These
indicators should be useful when evaluating claims made
by the media, on the Internet, in peer-reviewed
publications, in lectures, by friends, or in everyday
conversations with colleagues.
Nonsense indicator- claims haven’t been verified by an

independent source
Nonsense perpetuators often claim special knowledge.

That is, they have made specific discoveries that only they
know about. Others lack know-how, or do not have the
proper equipment to make the finding. These findings are
often reflected in phrases such as, “revolutionary
breakthrough”, “what scientists don’t want you to know”,
“what only a limited few have discovered”, and so on.
These findings are not subject to criticism or replication.
That is not how science works. When conducting studies it
is imperative that researchers operationalize (provide
operational definition- precise observable operation used to
manipulate or measure a variable) variables so the specifics
can be criticized and replicated. Non-scientists are not
concerned with others being able to replicate their findings,
because they know attempted replications will probably be
unsuccessful. If a finding cannot be replicated, this is a big
problem, and it is unreasonable to consider a single finding
as evidence. It is also problematic when only those making
the original finding have replicated it successfully. When
independent researchers using the same methods as those
used in the original study are not able to replicate it, this is a
sign that something was faulty with the original research.
Nonsense indicator- claimant has only searched for

confirmatory evidence
The confirmation bias is a cognitive error (cognitive bias)
defined as the tendency to seek out confirmatory evidence
while rejecting or ignoring non-confirming evidence
(Gilovich, 1991). Confirmation bias is pervasive, and may
be the most common cognitive bias. Most people have a
tendency to look for supporting evidence, while ignoring or
not looking very hard for disconfirmatory evidence
(showing a dislike for disconfirmatory evidence). This is
displayed when people cherry pick the evidence. Of
course, when you’re a lawyer this is what you need to do.
You don’t want any evidence entering into the case that
may be incongruent with the evidence you present.
However, as a scientist it is important to look for
disconfirming evidence. In fact, it has been suggested that
a good scientist goes out of their way to look for
disconfirmatory evidence. Why look for disconfirmatory
evidence? Because when discovering reality is the
objective it is necessary to look at all the available data, not
just the data supporting one’s own assertions.
Confirmation bias occurs when the only good evidence,
according to the claimant, is the evidence that supports
their claim. Often, perpetuators of nonsense may not even
be aware of disconfirmatory evidence. They have no
interest in even looking at it.
A study by Frey & Stahlberg (1986) examined how people

cherry-pick the evidence. The participants took an IQ test
and were given feedback indicating their IQ was either high
or low. After receiving feedback participants had a chance
to read magazine articles about IQ tests. The participants
that were told they had low IQ scores spent more time
looking at articles that criticized the validity of IQ tests, but
those who were told they had high IQ scores spent more
time looking at articles that supported the claim that IQ
tests were valid measures of intelligence.
Scientific thinking should involve an effort to minimize

confirmation bias. However, science does involve
confirmation bias to a degree; this is often demonstrated in
publication bias and forms of myside bias. The late
Richard Feynman (Nobel Laureate, Physics) suggested that
science is a set of processes that detects self-deception
(Feynman, 1999). That is, science makes sure we don’t
fool ourselves.
Nonsense indicator- claimant does not adhere to the
standard rules of reason and research
A large number of nonsense advocates do not even know
what the standard rules of reason and research are, let alone
adhere to them. They often lack any training in research
methodology, and are ignorant of the accepted rules of
scholarly work (Shermer, 2001). Consider the following
example provided by Shermer (2001, p.21).
Creationists (mainly the young-earth creationists)
do not study the history of life. In fact, they have
no interest in the history of life whatsoever since
they already know the history as it is laid down in
the book of Genesis. No one fossil, no one piece of
biological or paleontological evidence has
“evolution” written on it; instead there is
convergence, they have to abandon the rules of
science, which isn’t difficult for them since most of
them, in fact, are not practicing scientists. The only
reason creationists read scientific journals at all is to
either find flaws in the theory of evolution or to find
ways to fit scientific ideas into their religious
doctrines.
Nonsense indicator- personal beliefs and biases drive
the conclusions
Everyone is biased to a degree. That is, if they have any
knowledge in a specific area they are probably at least a
little biased. Nonsense claims are often heavily influenced
by personal biases and beliefs. Scientists recognize their
biases and personal beliefs, and use scientific processes in
order to minimize their effects. Scientists, much like
non-scientists, often find themselves looking for confirmatory
evidence. “[A]t some point usually during the peer-review
system (either informally, when one finds colleagues to
read a manuscript before publication submission, or
formally when the manuscript is read and critiqued by
colleagues, or publicly after publication), such biases and
beliefs are rooted out, or the paper or book is rejected for
publication.” (Shermer, 2001, p. 22)
Nonsense claimants often fail to recognize their biases
(consciously or unconsciously) and thus make little effort
to prevent them from influencing their claims.
Nonsense indicator- excessive reliance on authorities
A large portion of what we learn comes from authority
figures (including teachers, authors, parents, journalists,
etc.). Often, the authority provides accurate information.
The problem occurs when we begin to rely too heavily on
authority. Authority may provide a hint to what’s right, but
authorities are fallible. Authorities often assert different
beliefs. Which authority is right? They are both so-called
experts!
Experts are susceptible to a range of conscious and
unconscious biases, they make mistakes, and they often have
vested interests, just like non-experts.
Thoughts on Authority
“It is generally best to take the advice of “authorities” with
a grain of salt.” (Morling, 2012, p.37)
“Authorities must prove their contentions like everybody
else. This independence of science, its occasional
unwillingness to accept conventional wisdom, makes it
dangerous to doctrines less self-critical, or with pretensions
to certitude.” (Sagan, 1996, p.28)
“Authority may be a hint as to what the truth is, but is not
the source of information. As long as it’s possible, we
should disregard authority whenever the observations
disagree with it” (Feynman, 1999, p.104)
Perpetuators of nonsense sometimes prefer authority claims
to evidence. I am sure you have heard it before: “my doctor
says,” “my preacher says,” “my coach says,” etc. The only real
authority in science is the evidence.
Nonsense indicator- use of logical fallacies (common
fallacies of logic and rhetoric).
Those making nonsense claims often present bad
arguments. I am not saying that evidence-based
information is never presented using fallacious
arguments, although that is probably less likely to occur
when evidence exists for a claim. Some people are just
not very good at arguing, so they may be asserting a true
claim but lack the argumentative skills needed to present a
logical argument.
Common fallacies:
Ad hominem- attacking the arguer and not the argument
Argument from authority (as mentioned earlier)
Appeal to ignorance- the assertion that whatever has not
been proved false must be true, or that because we don’t know
the answer we can attribute it to whatever the claim of
interest might be
Appeal to tradition- that is the way it has always been done
so that must be the right way to do it
Appeal to emotion- I feel very strongly about it, so it has to
be right
Strawman fallacy- setting up a weak version of an argument
or position held by an opponent that is easy to break down
Non sequitur- from Latin “It doesn’t follow”, often a failure
to recognize alternatives
Post hoc, ergo propter hoc- from Latin “It happened after,
so it was caused by”; the illusion of cause
Confusing correlation with causation- association,
relationship (correlation) does not necessarily imply
causation
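The last fallacy in the list can be illustrated with a small simulation. What follows is only a sketch under stated assumptions: the variable names (temperature, ice_cream_sales, sunburns) and every number are invented for illustration and do not come from any study. A hidden third variable drives two others, so the two correlate strongly even though neither causes the other.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient, computed by hand."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=2000, seed=0):
    """Hypothetical data: temperature drives both other variables."""
    rng = random.Random(seed)
    temperature = [rng.gauss(0, 1) for _ in range(n)]
    # Neither variable appears in the other's formula; only temperature does.
    ice_cream_sales = [t + rng.gauss(0, 0.5) for t in temperature]
    sunburns = [t + rng.gauss(0, 0.5) for t in temperature]
    return temperature, ice_cream_sales, sunburns


temperature, ice_cream_sales, sunburns = simulate()

# Raw association between two variables that do not influence each other.
r_observed = pearson_r(ice_cream_sales, sunburns)

# Subtract the third variable's contribution, then correlate what is left.
sales_left = [s - t for s, t in zip(ice_cream_sales, temperature)]
burns_left = [b - t for b, t in zip(sunburns, temperature)]
r_adjusted = pearson_r(sales_left, burns_left)

print("observed r:", round(r_observed, 2))
print("r after removing temperature:", round(r_adjusted, 2))
```

In this made-up setup the raw correlation is strong, while the correlation that remains after the third variable is removed is close to zero. That is the pattern one would expect when a third variable, rather than a causal link, produces the association.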
Nonsense indicator- overreliance on anecdotes
Anecdotes (short stories about events) are rarely enough to
conclude that a claim is true. Anecdotes are difficult to
verify, are often unrepresentative, almost always
susceptible to numerous explanations (Lilienfeld et al.,
2012), have been generated for almost every phenomenon
that can be imagined, and are often devised to sound more
exciting. Anecdotes may be useful as hypothesis forming
statements, and might eventually be considered evidence.
Providing anecdotes as evidence is a favorite rhetorical tool
for many nonsense pushers.
Nonsense indicator- extraordinary claims
Extraordinary claims are appealing, exciting; they catch
people’s attention. With extraordinary claims comes the
need for extraordinary evidence. When claims contradict
virtually all existing knowledge, then those claims need to
also provide extraordinary evidence. Nonsense
perpetuators often provide “extraordinary evidence” in the
form of extraordinary testimonials filled with emotion and
passion. What they provide is not evidence at all, but rather
another nonsense indicator. No amount of wishful thinking
or “just knowing” something is true makes it true. No
matter how much you wish or “just know” you have lost
weight, if the scale (shown to be reliable and valid) doesn’t
support your wish or knowing, you have to come to the
conclusion you haven’t lost weight.
Nonsense indicator – use of excessive “science
sounding” words or concepts
In an attempt to accurately distinguish among similar
concepts, scientists make use of a specialized vernacular.
Sometimes this practice is “abused, especially when the
terminology provides unsupported techniques with a cachet
of unearned scientific respectability” (Lilienfeld, 2012,
p.27). Promoters of pseudoscience often use technical
words so they sound smart or highly knowledgeable, even
when the word usage is incorrect. In contrast to real
scientific language, this language frequently lacks meaning
or precision, or both (Lilienfeld & Landfield, 2008).
Using scientific-sounding words is a powerful rhetorical
device. Scientific sounding does not necessarily imply
scientific.
I want to thank Sagan for his Baloney Detection Kit (1996),
Shermer for his Boundary Detection Kit (2001) and
Lilienfeld (2012) for his Pseudoscience Indicators.
Science Roundtable: Discussing Scientific Matters
1- Do you have tips for people that are interested in
enhancing their ability to read scientific research?
2- What is the biggest misconception (or at least one of the
biggest misconceptions) about science?
Andreas Zourdos has a BSc in Human Nutrition. He
works as a nutritionist and has also translated Jamie Hale's
Knowledge and Nonsense into Greek. His website is
www.metavolismos.com
1- Andreas Zourdos: For people that want to get better at
reading scientific research I would strongly recommend
reading two papers. The first is "Why Most Published
Research Findings Are False" by Dr John Ioannidis and the
second is "Arabian Nights: 1001 tales of how
pharmaceutical companies cater to the material needs of
doctors: case report", again by Dr Ioannidis. I think
universities ought to have these papers as essential reading
in most scientific fields.
What I would also strongly suggest is that new scientists
should be willing to let go of their views on certain topics
when their views are not supported by data. This is very
hard to do, and no matter how intelligent a scientist may be,
he or she may insist on researching things that are myths.
Sir Isaac Newton's life is a very strong example of this.
2- A big misconception is the use of the word theory. In a
scientific context a theory is something that has been
proven many times as a fact, and therefore it has become a
theory. In a non-scientific context people use theory but
they mean hypothesis. People that are not scientifically
literate will go and say "theoretically speaking" but they
mean hypothetically speaking. Scientific theories are facts,
not hypotheses. This reminds me of "Newspeak" in
George Orwell's 1984, which was pretty much the
deterioration of the English language in order to limit free
thinking.
Kurtis Frank is a recreational bodybuilder and powerlifter.
Kurtis has a passion for dietary supplements due to a desire
to harmonize the discord between the preventative and
rehabilitative potential of some dietary supplements and the
seeming lack of interest of the medical community in
incorporating dietary supplements into preventative
medicine. He is the lead researcher for Examine.com.
1- Kurtis Frank: Never view things as a 'Yes No'
dichotomy, and try to always see the context around a
situation. Seeing the context will really help you
understand how one study and another can say 'different'
things yet still make sense when they are both considered.
Two good paradigms would be:
A) Always look at the magnitude of the effect. Assuming
something did happen, was it large enough to really be
concerned about? A food that unexplicitly causes fat gain is
a pretty bad sounding thing, but if it is 0.5 lbs in a year it
doesn't really matter too much in the grand scheme of
things.
B) Look at the practical significance, or how much this
really matters to you. Sometimes things are quite
impressive, but the study is sort of a 'perfect storm' of
events. Something that improves insulin sensitivity in obese
postmenopausal women with diabetes isn't going to be
something that necessarily pumps your muscles full of
glucose and should be taken after a workout (now, it may,
but the study on diabetic obese women isn't what you
should be looking at).
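The gap between a detectable effect and an effect that matters can be made concrete with a little arithmetic. The sketch below is an illustration only: the 0.5 lb difference echoes the example above, while the standard deviation and the group size are hypothetical values chosen for the example, and a simple two-sample z-test with equal group sizes is assumed.

```python
import math


def two_sample_z(mean_diff, sd, n_per_group):
    """z statistic for a difference in means (equal SD and group size assumed)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return mean_diff / standard_error


def two_sided_p(z):
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def cohens_d(mean_diff, sd):
    """Standardized effect size: the difference expressed in SD units."""
    return mean_diff / sd


MEAN_DIFF_LBS = 0.5   # yearly fat gain difference, from the example above
SD_LBS = 10.0         # hypothetical spread of yearly weight change
N_PER_GROUP = 10_000  # hypothetical, deliberately large study

z = two_sample_z(MEAN_DIFF_LBS, SD_LBS, N_PER_GROUP)
p_value = two_sided_p(z)
effect_size = cohens_d(MEAN_DIFF_LBS, SD_LBS)

print("z:", round(z, 2))
print("p-value:", p_value)
print("Cohen's d:", effect_size)
```

With these assumed numbers the result is statistically significant at the usual 0.05 level, yet the standardized effect is far below the 0.2 that is commonly treated as "small". A large enough sample can make almost any difference detectable, which is why the size of the effect deserves its own look.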
Reading the studies is definitely a good skill to have, but
being able to look at all the studies holistically without
losing your sanity is also very important. Studies are good,
but they don't always apply to you so you shouldn't worry
about everything you hear, just that which matters.
2- Probably the biggest misconception is the idea that
science proves things, or that we can fully and absolutely
answer a question. That is definitely not what science does,
and how science answers questions can be sort of viewed
like:
1) What is X? (i.e., you're looking at a new molecule)
2) How does X interact with Y? (seeing how this molecule
works in humans)
3) How does X interact with Y assuming Z? (seeing how this
molecule works in humans who are diabetic)
4) How does X interact with Y assuming R? (same as the
aforementioned, but with hypertensives)
When we refer to the 'body of evidence', we refer to a large
number of studies that look at certain drugs or supplements
in certain situations. A well researched drug or supplement
is one where we have investigated the particular question
and can use that evidence as proof that it should work as
the study says.
That being said, there are too many possible combinations
of variables to really prove all possible combinations and
situations (which is what many people think science 'does',
provide an absolute answer) and all we can do is replicate
the above data and research most situations in the hopes
that eventually most or all questions will lie in the
collection of studies we call the 'body of evidence'.
To be concise, science does not give 'the answer' but rather

whittles away bit by bit getting closer to the best answer we
can give at this moment in time; each whittle towards the
impossible and ideal goal will refine our answer a bit more
and make it a bit better so even if we don't get 'the' answer
we still find a pretty damn good one en route.
Alvaro Fernandez: The World Economic Forum named

Fernandez a Young Global Leader in 2012. He is the CEO
of SharpBrains.com- a leading independent market research
firm focused on applications of brain science.
1- Alvaro Fernandez: Two tips: First, practice is key. When
you see an intriguing media article about a new study, get
in the habit of reading the study itself. Many studies are
available online for free; for some you may need to visit a
library. Second, start by reading the Conclusions and
Results of the study, considering what their practical,
current implications are. That will help you prioritize what is
most relevant to you today, and what may be missing in the
study, instead of feeling overwhelmed by the more
theoretical/abstract considerations.
2- That science is what only scientists can do. Instead,

science is a powerful system to ask questions and get
answers, and every person in the 21st century should be
science-literate, same way we learn math, reading, foreign
languages...
Scott Lilienfeld is a Professor of Psychology at Emory

University in Atlanta, Georgia. He received his A.B. at
Cornell University, and his Ph.D. at the University of
Minnesota; and he completed his clinical internship at
Western Psychiatric Institute and Clinics in Pittsburgh. He
is the co-author of “50 Great Myths of Popular Psychology:
Shattering Widespread Misconceptions About Human
Behavior” and most recently, “Brainwashed: The Seductive
Appeal of Mindless Neuroscience.”
1- Scott Lilienfeld: Here are a few key tips to bear in mind
when evaluating scientific research:
Don’t confuse correlation with causation: merely because
two things go together doesn’t mean one causes the other.
Similarly, just because one event precedes another doesn’t
mean that it causes it.
Be aware of rival hypotheses for findings: Often the media
or even researchers themselves don’t tell you about
alternative explanations for their results that they haven’t
ruled out.
If the claim seems extraordinary – that is, if it flies in the
face of just about everything we know – don’t trust it
unless and until it’s been replicated by independent
investigators.
Most of all, be skeptical. That means finding a middle

ground between excessive open-mindedness and cynicism.
When evaluating scientific research, keep an open mind but
insist on compelling evidence for assertions.
2- That science is a body of knowledge; it’s not. Science is

a systematic approach to knowledge; specifically, it’s an
approach that tries to minimize human error in an effort to
get us a bit closer to the truth. So, for example, chemistry
itself isn’t a science, and neither is psychology (so when
some people say that “Psychology isn’t a science,” that’s a
tip-off that they don’t know what they’re talking about).
One can approach the study of chemistry or psychology
either scientifically or unscientifically. If one uses finely
honed research methods designed to rule out alternative
hypotheses for findings and to minimize the risk of human
error, one is doing science, regardless of what one is
studying.
Jonathan Gore received his Ph.D. in Psychology at Iowa
State University, and is an active member of the Society of
Personality and Social Psychology and the Southeastern
Society of Social Psychologists. Currently, he is an
associate professor in the Psychology Department with a
research focus on goal motivation, self-concept, and
culture. Since joining the faculty at EKU, Dr. Gore has
published over 40 articles and book chapters on these
topics as well as topics developed over the years with his
students.
1- Jonathan Gore: The best tip I have would be to locate
someone in the field who is willing to walk you through the
elements of empirical papers. We have a course in our
department, "Information Literacy in Psychology,"
specifically devoted to teaching students that very skill. It is
difficult to try to absorb everything in a scientific
publication without some guidance.
2- The biggest misconception I run across from people I talk
to and from students is that they think science is a belief
system. They have a perception that science involves the
beliefs that scientists have about how things work. They
don't understand the data collection and analysis aspects of
the scientific method, and when faced with a scientific
conclusion they feel that it's OK to say, "Well, I don't
believe that."
As with most misconceptions of science, I blame scientists

(myself included) for not doing a better job educating the
general public as to what science really involves (i.e., a
method of gathering information, not a belief system).
Another issue is that by the time students see me, I am
often the first teacher they've had who actually utilizes the
scientific method. Before that, they've had science teachers
who merely present science as a knowledge base or belief
system, and probably don't fully understand the data
collection and analysis procedures themselves. This is
especially true for psychology.
Richard Osbaldiston still fondly remembers his science
kit from when he was about six years old. His favorite thing
back then was the different colored flames that were
produced from burning metals. Since then, he has earned
advanced degrees in chemical engineering, environmental
studies, and social psychology. He teaches research
methods and statistics at Eastern Kentucky University in
Richmond, KY.
1- Richard Osbaldiston: If you are interested in enhancing
your ability to read scientific literature, one tip for doing
that better is to always get an overview of whatever you are
about to read before you read it. If you were reading a
blog, newspaper story, or other article, you would typically
start with the first sentence and read through until you came
to the last sentence. That's fine for short, non-technical
reading, but for something more challenging, you have to
get the view from 30,000 feet before you go down to travel
the landscape. By flipping through the article first and
making note of the section headings and the tables and
figures, then you will have a good map in your head of
what info the article contains. Once you have the map, then
it is much easier to get to your destination.
2- One of the biggest misconceptions about science is the
belief that one (or several or even 100) counter-example to
a scientific result proves that the conclusion is wrong. For
example, scientific studies suggest that smoking causes
lung cancer. A non-scientific thinker might say, "My
grandfather smoked a pack a day until he was 95 years old,
and he had no trace of cancer." That would be the
equivalent of drawing the ace of spades from a deck of
cards and then saying, "Everyone who draws a card from
this deck will get the ace of spades."
All of life is a big game of probability, and science shows
us the ways to stack the odds in our favor. But the
outcomes are not guaranteed. Scientific thinkers don't look
for single examples; they look for big trends. And once you
identify the trends, then you are in position to use that
information wisely.
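A few lines of arithmetic show why a single counter-example does not overturn a probabilistic finding. The sketch below is hedged accordingly: the two risk figures are made-up round numbers chosen only for illustration, not real epidemiological estimates, and the function names are hypothetical.

```python
def unaffected_share(risk):
    """Share of a group expected to stay free of the outcome."""
    return 1.0 - risk


def relative_risk(risk_exposed, risk_unexposed):
    """How many times more likely the outcome is in the exposed group."""
    return risk_exposed / risk_unexposed


# Hypothetical lifetime risks, for illustration only (not real data).
SMOKER_RISK = 0.15
NONSMOKER_RISK = 0.01

share_unaffected = unaffected_share(SMOKER_RISK)
risk_ratio = relative_risk(SMOKER_RISK, NONSMOKER_RISK)

# Out of 1,000 hypothetical smokers, how many "grandfather" stories
# would exist even though the risk is many times higher?
expected_counterexamples = round(share_unaffected * 1000)

print("share of smokers unaffected:", share_unaffected)
print("relative risk:", round(risk_ratio, 1))
print("expected counter-examples per 1,000:", expected_counterexamples)
```

Under these assumed figures most smokers never develop the disease, so stories like the grandfather's are exactly what the trend predicts, while the risk is still many times higher than for non-smokers. The counter-examples and the conclusion are compatible.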
Bret Contreras, MA, CSCS, has a popular blog at
www.BretConteras.com and offers a strength training
research review service at
www.StrengthandConditioningResearch.com.
1- Bret Contreras: Realize that just like lifting weights,

learning a language, or playing sports, becoming a good
scientist takes time. I'm a way better scientist than I was 3
years ago, but there are still hundreds of guys out there who
make me look like an airhead in the science department.
Learn to utilize Pubmed and Google Scholar to find free
articles in your area of interest. Often you can find some
good literature on a particular topic just by searching for
free articles on these two sites.
When you first start out reading research, you'll probably

like the abstract the most, followed by the intro and
conclusions sections. As you advance as a researcher, you'll
start to like the methods and discussions sections even
more, and you'll even appreciate the charts and graphs.
Have patience!
2- That anecdotes don't "count." A good scientist uses all
forms of available information to inform his decisions and
conclusions, including the literature, anecdotes and
personal experiences, logic, expert opinion, and tradition.
Over time you learn as a scientist how to weight the various
forms of information, rely more on higher quality evidence,
and be more cautious with your claims.
Lars Avemarie works full time as a certified personal
trainer in Sweden. His focus is on using evidence-based
methods to fulfill clients' goals. He works with a wide range
of clients, from weekend warriors to exercise enthusiasts of
all ages.
1- Lars Avemarie: My first recommendation would be to
get a basic level of understanding in the area of interest by
reading an entry-level textbook, from a recommended
academic source (preferably followed by an intermediate
textbook). You could also attend a course at the local
college or a course from an open college or on Coursera.
My second recommendation would be to read a book

explaining scientific methods. If your area of interest is
nutrition and strength and conditioning, I strongly
recommend Mark Young's book "How to Read Fitness
Research" and "Girth Control" by Alan Aragon as well as
the book you are reading right now.
The next recommendation is evident. Start to read research,

try to read a full study front-to-back, as often as possible,
not just abstracts. Look up words you don't understand, but
brace yourself, this can be a very slow process in the
beginning.
Surround yourself with peers who agree that scientific
research is the most reliable method to evaluate results.
Discuss the implications and limitations of the studies in
question. Train your critical thinking skills, so you can
detect when statements and claims are not based on
scientific research. A word of caution: do not put too much
validity in undocumented sources with no references, and
know that very often people are sharing their opinion, and
an opinion is not necessarily proof of anything.
2- That it is not reality. Now, there are different levels of
validity of studies, from large systematic reviews or
meta-analyses of randomized controlled trials (RCT) to case
studies and expert opinions.
But the notion that it is not real, is not a valid argument to

deem it untrue. While some studies are not done on
humans, but animals, and therefore have limited carry-over
to humans, you can still learn from these studies. Moreover,
a lot of studies in nutrition and strength and conditioning,
are done on humans, and depending on the study it can
have a large degree of carry-over.
In my experience the common misconception that scientific

research does not reflect reality is based on lack of
knowledge of the scientific method as well as being
confronted with compelling evidence against one's personal
belief.
Brad Schoenfeld, M.Sc., C.S.C.S., is an internationally

renowned fitness expert and widely regarded as one of the
leading authorities on body composition training (muscle
development and fat loss). Visit Brad’s website at
http://www.lookgreatnaked.com/blog/.
1- Brad Schoenfeld: There are several basic things that
everyone should consider when reading research. First and
foremost, don't be an "abstract quarterback" who simply
reads the abstract and then passes judgement about the
study. The abstract only provides a bare-bones overview of
the study. Often, the devil is in the details and you can end
up missing important aspects of the study or even get a
skewed impression of what the findings actually mean.
Second, you need to scrutinize the methodology. One of the

most important things is to look at the "generalizability" of
the research, which refers to our ability to extrapolate the
results to a given population. A study carried out in rodents
may well not have a great deal of applicability to humans
(although in many cases it does). But even human studies
have limitations for generalizability. Age, gender, injury
status, training experience, and many other factors must be
taken into account here. A study of muscle hypertrophy in
elderly women would be of limited value to young males
trying to increase muscle.
Third, and perhaps most importantly, always be skeptical.

As you read, think about what issues there might be with
each and every aspect of the study. For example, consider
whether the premise of the study is consistent with
common practice. A study that has someone perform 30
sets of leg extensions would have little to no relevance to
how people generally work out and thus might not be of
much value in drawing conclusions. Similarly, the use of
girth circumference measures (i.e. tape measurement) to
assess hypertrophy should send up a red flag as to the
reliability of these results in determining actual changes in
muscle growth. If you think with a critical eye, you gain a
greater perspective on the study and its implications.
2- I'd say the biggest misconception about research is that it
"proves" something. Research never "proves" anything. It
simply provides support for a given theory. A single
research study is merely a piece of a large puzzle. Some
studies carry more weight than others, but will invariably
have many limitations. It is therefore imperative to view
each study in the context of the prevailing body of literature
and weigh its merit based on methodology employed and
its relevance to the individual.
Guidelines for Reading Research Reports
This article presents general guidelines to consider when
reading research reports (Osbaldiston, 2011).
Consider where the report is published
I suggest that the majority of your reading should come
from peer-reviewed scientific journals. Of course, there is
useful information to be found elsewhere, but I would try to
focus mostly on reading reports published by peer-reviewed
journals. When reading a media press release on
a scientific study, be cautious about your interpretation of
the article. Reporters often do a poor job of conveying
the study results accurately. Reporters often lack training in
research methods and statistics and may not be qualified to
report on the study they are writing about. Other times they
may be guilty of promoting results that they feel make a
better story, even when they know the results they are
promoting are inconsistent with the study’s findings.
Why am I reading this report?
Ask yourself, why am I reading this report? It is important
to read articles that are relevant to your research interests.
There may be many articles published in your general area
of interest. However, many of these articles might not
pertain to your specific interest. Again, ask yourself why
am I reading this? Can I gain knowledge that will enhance
my understanding of my specific interests?
Read Abstracts
The abstract provides a basic overview of the paper. The
abstract generally contains fewer than 200 words and
provides basic information from the paper’s major sections
— introduction, methods, results and discussion. After
reading the abstract you will probably have a good idea of
whether you want to read the entire paper.
Read Headings and briefly look at other sections
If you have read the abstract and decided to read further,
then go ahead and briefly look at the various headings and
other sections. Try to get an overview of the full paper.
Focus on relevant sections
Once you have looked over the paper, you should have a
general idea of the specific areas that are relevant. It is
possible for the full paper to be relevant. As you focus in
on your areas of interest, take notes, highlight important
information and highlight references that may lead to
further reading. It is important to highlight references of
interest as you read through the paper. You probably won’t
remember the references by the time you have completed
reading. If you are reading on a computer or some other
electronic form that makes highlighting difficult, or doesn’t
allow highlighting, write the references on a piece of paper.
Locate highlighted references
Collect the highlighted references. When looking at the
references repeat the same process as mentioned above.
If this seems to you like a laborious task you are right. A
thorough investigation of the research is painstaking.
However, in order to gain a comprehensive understanding,
and knowledge in your area of interest, laborious activity is
essential.
Association Between Scientific Cognition and Scientific
Literacy: Implications for Learning Science (Hale,
Sloss, & Lawson, 2017)
Abstract- In the current research scientific literacy is
synonymous with general scientific knowledge. This form
of literacy is sometimes referred to as a form of derived
scientific literacy. Scientific cognition is not the same thing
as scientific literacy; scientific cognition involves multiple
components and sub-components. At the very least
scientific cognition involves philosophy of science,
scientific methodology, quantitative reasoning and logic.
The primary interests in the study were whether or not
scientific cognition and scientific literacy scores were
associated, and whether or not there would be gender
differences for total scores for each scale. The scientific
literacy and scientific cognition assessment consisted of
mostly questions derived from measuring devices used in
the past. The assessments were administered as part of an
online survey. The participants were 202 university
students. The study was approved by the university's
Institutional Review Board. The results indicate a positive
association between scientific literacy and scientific
cognition, and no gender differences for total scores from
the scales. Additional analyses indicate there were gender
differences for some of the items. There were gender
differences for one item from the scientific literacy
assessment and for two items from the scientific cognition
assessment. The research report includes a discussion
regarding future directions for relevant research,
implications of learning science and limitations of the
study.
Science is a large enterprise consisting of multiple
components. Science, although fallible, is the great reality
detector. Science is a systematic approach to knowledge.
Proper use of scientific processes leads to rationalism
(basing conclusions on intellect, logic and evidence).
Science combats dogmatism (adherence to doctrine over
rational and enlightened inquiry, or basing conclusions on
authority rather than evidence) and provides a better
understanding of the world. Scientific processes/ methods
are unmistakably the most successful processes available
for describing, predicting and explaining phenomena in the
observable universe.
The general scientific approach to knowledge is based on
systematic empiricism (Stanovich, 2007). Observation
itself is necessary in acquiring scientific knowledge, but
unstructured observation of the natural world does not lead
to an increased understanding of the world. “Write down
every observation you make from the time you get up in the
morning to the time you go to bed on a given day. When
you finish, you will have a great number of facts, but you
will not have a greater understanding of the world”
(Stanovich & Stanovich, 2003, p. 12).
Systematic empiricism is systematic because it is structured
in a way that allows us to learn more precisely about the
world. After careful systematic observations, such as those
in controlled experiments, some causal relationships are
supported while others are rejected. Extending these
observations, scientists propose general explanations that
will explain the observations. “We could observe endless
pieces of data, adding to the content of science, but our
observations would be of limited use without general
principles to structure them” (Myers & Hansen, 2002, p.
10).
The empirical approach (as used in everyday observation)
allows us to learn things about the world. However,
everyday observations are often made carelessly and
unsystematically. Thus, using everyday observations in an
attempt to describe, predict and explain phenomena is
problematic.
Discussions involving scientific literacy are ubiquitous.
Scientific literacy is conceptualized and operationalized in
various ways (Norris & Phillips, 2003). Examples used in
defining scientific literacy include: understanding science
and its applications, knowledge of what counts as science,
general scientific knowledge, knowledge of risks and
benefits of science, making informed decisions regarding
science and technology, etcetera (DeBoer, 2000; Brennan,
1992). Numerous scales, assessments or devices are used
to measure scientific literacy. A precise, standard
conceptualization of scientific literacy has not been
demonstrated since the origin of the concept (DeBoer,
2000). In the present study scientific literacy is
synonymous with general scientific knowledge. This form
of literacy is sometimes referred to as a type of derived
scientific literacy. Various forms of scientific literacy are
important; however, other relevant science-related concepts
are as important, or maybe even more important.
Scientific cognition is not the same thing as scientific
literacy; scientific cognition involves multiple components
and sub-components (Feist, 2006). Deanna Kuhn (2011) asserts
that the essence of scientific thinking is coordinating belief
with evidence. At the very least scientific cognition
involves philosophy of science, scientific methodology,
quantitative reasoning and elements of logic. Scientific
cognition requires specific cognitive abilities and specific
elements of cognitive style (thinking disposition).
Cognitive style reflects types of thinking that occur during
typical performance conditions, that is, when thinking or
engaging in tasks without being explicitly cued to
maximize performance (Stanovich, West & Toplak, 2016).
Various scales have been developed to measure scientific
thinking / reasoning / cognition. Kahan developed a scale
called the Ordinary Science Intelligence Scale (OSI_2.0,
Kahan, 2014) and Drummond and Fischhoff (2015)
developed the Scientific Reasoning Scale (SRS).
Drummond and Fischhoff found that measures of scientific
reasoning were distinct from measures of scientific literacy,
even though they were positively associated with measures of
scientific literacy. The OSI_2.0 scale is intended to be a
measure of the capacity to recognize and make use of
scientific evidence in everyday decision making. The
OSI_2.0 scale consists of 18 items that can be divided into
four sets: scientific fact items, scientific methodology
items, quantitative reasoning items and cognitive reflection
items. A measure of cognitive reflection is a measure of the
tendency to avoid providing an intuitive answer to a
problem that, on further analysis, is shown to be incorrect.
Reflective processing allows one to override an incorrect
fast response by critically processing the information. The
SRS is intended to assess the skills needed to evaluate the
quality of scientific findings. The final version of the SRS
consisted of 11 items derived from concepts taken from
research method textbooks. Thus, as operationalized by
Drummond and Fischhoff's scale, scientific reasoning
reflects knowledge of research methodology. Dunbar's (2000)
research on scientific thinking used a different strategy.
Dunbar's research mostly involves examining cognitive
processes underpinning thinking during the research
process, rather than assessing scientific thinking with
prescribed types of measures.
Fugelsang and colleagues (2004) have examined strategies
that scientists and non-scientists use to evaluate data that are
consistent or inconsistent with expectations. Others have
conducted research on scientific reasoning, but the intent
here is not a comprehensive review of past research, but
rather a brief mention of research that had an impact on the
current study.
The primary interests in the current study were whether or
not scientific cognition and scientific literacy scores were
associated, and whether or not there would be gender
differences for total scores for each scale. A positive
association between scores on the scientific cognition
assessment and scientific literacy assessment was
predicted. We did not have a prediction regarding whether
or not there would be differences between men and
women for total scores on the scientific cognition and
scientific literacy assessments.
Method
Participants
Participants in the study were 202 students from Eastern
Kentucky University. The sample consisted of 160 females
and 42 males. Participants received partial course credit in
exchange for their participation. Participation was
voluntary, and participants could terminate their
involvement at any time during the study without penalty.
Before participating in the study all participants reported
that they had no conditions that would prevent them from
being part of the study. All participants completed an online
survey that included a scientific cognition and a scientific
literacy assessment. The study was approved by the
university's Institutional Review Board.
Materials
The scientific literacy assessment contained 14 questions
that were presented as part of an online survey. Some of the
questions on the scale were derived from previous research
used to assess general scientific knowledge (Kahan, 2016;
National Science Board, 2014; Pew Research Center,
2013). The questions were true/false and multiple
choice. Sample items from the assessment include:
Some food does not contain chemicals? A) True B) False
All mental processes are generated from which organ?
A) Heart B) Brain C) Not all mental processes are
generated from an organ
Lasers work by intense focusing of which of the following:
A) Sound waves B) Light waves C) Neither
The scientific cognition assessment contained 14 questions
that were presented as part of an online survey. The
questions were derived from previous research used to
assess components of scientific thinking, including research
methodology, philosophy of science and quantitative reasoning
(Drummond & Fischhoff, 2015; Kahan, 2016; Stanovich).
The questions were true/false, multiple choice
and short answer. Sample items from the assessment
include:
A scientific theory is defined as: A) A
comprehensive explanation of some aspect of
nature that is supported by a vast body of evidence
B) An educated guess used to explain an aspect of
nature C) An explanation that has not been tested
A new medical treatment was designed to treat a
serious health problem. Using the information
provided below decide whether the treatment was
effective: 200 people were given the treatment and
improved - 75 people were given the treatment
and did not improve - 50 people were not given
the treatment and improved - 15 people were not
given the treatment and did not improve A)
Treatment was effective B) Treatment was not
effective
If you roll a fair, six-sided die 1000 times how many
times do you think the die would come up as an
even number?
Procedure
The study took place online. Participants used an online
research sign-up system to sign up for the study. An
informed consent form was signed before participating and
participants verified that there were no conditions, such as
learning disorders, that may influence performance. Before
answering questions from the survey participants were
given the following instructions:
"Please provide a response for every question. If you are
given the option to decline to answer a question, then
declining to answer is considered a response." There were
29 questions on the survey. The first 14 questions were
scientific literacy questions; the next 14 questions were
scientific cognition questions, and the last question was a
question asking if the participant was male or female.
Participants were given up to 25 minutes to complete the
study. Upon completion of the survey a debriefing
statement was provided.
Results
A bivariate analysis was conducted to test the hypothesis
that there would be an association between scientific
cognition and scientific literacy. The results of the analysis
support the hypothesis, r(200) = +.33, p < .01 (two-tailed),
r² = .11. Independent samples t-tests were conducted to test
whether or not there were gender differences on scores from
the scientific cognition and scientific literacy scales. The
results of the independent samples t-test for men (M =
9.36, SD = 2.63) and women (M = 9.60, SD = 2.31), using
total scores on scientific cognition as the dependent
variable, were not statistically significant, t(200) = .59, p >
.05 (two-tailed), d = .10. The results of the independent
samples t-test for men (M = 10.52, SD = 1.69) and women
(M = 10.31, SD = 1.64), using total scores on scientific
literacy as the dependent variable, were not statistically
significant, t(200) = .76, p > .05 (two-tailed), d = .13.
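The effect sizes reported above can be cross-checked from the summary statistics alone. The following is a minimal Python sketch, not the analysis script used in the study. The group sizes (42 men, 160 women) come from the Participants section, the function names are hypothetical, and equal variances are assumed. Because the published means and standard deviations are rounded, and the exact analysis options are not stated, small differences from the reported t values are to be expected.

```python
import math


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation for two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def cohens_d(m1, sd1, n1, m2, sd2, n2):
    """Cohen's d: absolute mean difference expressed in pooled-SD units."""
    return abs(m1 - m2) / pooled_sd(sd1, n1, sd2, n2)


def t_statistic(m1, sd1, n1, m2, sd2, n2):
    """Independent samples t (equal variances assumed), df = n1 + n2 - 2."""
    standard_error = pooled_sd(sd1, n1, sd2, n2) * math.sqrt(1 / n1 + 1 / n2)
    return abs(m1 - m2) / standard_error


N_MEN, N_WOMEN = 42, 160  # group sizes from the Participants section

# Shared variance implied by the reported correlation.
r = 0.33
r_squared = r**2

# Scientific cognition: men vs. women.
d_cognition = cohens_d(9.36, 2.63, N_MEN, 9.60, 2.31, N_WOMEN)
t_cognition = t_statistic(9.36, 2.63, N_MEN, 9.60, 2.31, N_WOMEN)

# Scientific literacy: men vs. women.
d_literacy = cohens_d(10.52, 1.69, N_MEN, 10.31, 1.64, N_WOMEN)
t_literacy = t_statistic(10.52, 1.69, N_MEN, 10.31, 1.64, N_WOMEN)

print(f"r squared = {r_squared:.2f}")
print(f"cognition: d = {d_cognition:.2f}, t = {t_cognition:.2f}")
print(f"literacy:  d = {d_literacy:.2f}, t = {t_literacy:.2f}")
```

With these inputs, r² rounds to .11 and the two d values round to .10 and .13, in line with the reported effect sizes.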
Total correct responses ranged from 3 to 14 on the scientific
cognition and scientific literacy assessments. Each correct
answer on the assessment counted as one point; 14 points
was the highest score possible for each of the assessments.
The total percentage of correct answers, for individual
items, varied on the scientific literacy assessment from
20.9% - 94.5%. The question most often answered
incorrectly was an astrology question, and the question
most often answered correctly was a question involving the
earth's orbit of the sun. The total percentage of correct
answers, for individual items, varied on the scientific
cognition assessment from 46.7% - 82.6%. The question
most often answered incorrectly was a question involving a
covariation task, and the question most often answered
correctly involved estimating chances of winning a dollar.
Table 1 gives the total percentages of correct answers for
each item.
Table 1
Percentage of Correct Responses for Individual Items
Item   Sci-Literacy   Sci-Cognition
1      36.1           56.4
2      87.6           72.1
3      89.6           69.7
4      93.6           80.1
5      85.9           82.6
6      88.6           46.7
7      82.2           75.9
8      90.6           82.4
9      80.1           70.4
10     94.5           77.2
11     32.0           56.9
12     85.1           66.3
13     20.9           52.5
14     72.0           77.2
Chi-square tests were conducted to test whether or not there
were gender differences, for individual item scores, from
both assessments. The results of the chi-square tests
indicate an association between gender (men vs. women)
and responses (correct vs. incorrect) for three items from
the online survey: one of the items from the scientific
literacy assessment and two of the items from the scientific
cognition assessment. The results of a chi-square test using
gender (men vs. women) and responses to scientific literacy
question no. 9 (correct vs. incorrect) as factors were
statistically significant, χ²(1, N = 202) = 5.12, p < .05.
Men were more likely to produce a correct response for
scientific literacy question no. 9 than was expected.
Scientific literacy question no. 9 was "[w]hich of the
following are smaller than atoms a) proteins b) electrons c)
amino acids." The correct answer is b. The results of a
chi-square test using gender (men vs. women) and responses to
scientific cognition question no. 3 (correct vs. incorrect) as
factors were statistically significant, χ²(1, N = 202) = 5.57,
p < .05. Women were more likely to produce a correct
response for scientific cognition question no. 3 than was
expected. Scientific cognition question no. 3 was "[t]he
falsification criteria in the context of science suggests a) If
a scientific claim is proven then it is not false b) False
claims are not accepted c) In order for a claim to be
scientific it must be testable." The correct answer is c. The
results of a chi-square test using gender (men vs. women)
and responses to scientific cognition question no. 9 (correct
vs. incorrect) as factors were statistically significant, χ²(1,
N = 202) = 7.05, p < .05. Men were more likely to produce
a correct response for scientific cognition question no. 9
than was expected. Scientific cognition question no. 9 was
"[i]n the universal lottery, the chances of winning a prize
are 1%. How many people do you think would win a prize
if 1000 people buy a single ticket?" The correct answer is
10.
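A chi-square test of independence for a 2 x 2 table, such as gender by response, can be sketched as follows. This is a minimal illustration in Python, not the analysis used in the study. The cell counts are hypothetical, because item-level counts are not reported here; only the group sizes of 42 men and 160 women are taken from the study. The function name is also an assumption made for this example.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table

                correct  incorrect
        men        a         b
        women      c         d
    """
    n = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if gender and response were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts, for illustration only (not the study's data).
men_correct, men_incorrect = 38, 4
women_correct, women_incorrect = 120, 40

chi2 = chi_square_2x2(men_correct, men_incorrect, women_correct, women_incorrect)

# Critical value of chi-square with df = 1 at alpha = .05.
CRITICAL_05_DF1 = 3.841
significant = chi2 > CRITICAL_05_DF1

print(f"chi-square(1) = {chi2:.2f}, significant at .05: {significant}")
```

A statistic above the critical value indicates that the proportion of correct responses differs between men and women by more than would be expected under independence.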
Discussion
The results show a positive association between scientific
cognition and scientific literacy. The association was
moderate in strength. The differences between men and
women for total scores on scientific cognition and scientific
literacy were not significant. The results indicate an
association between gender (men vs. women) and
responses (correct vs. incorrect) for three items from the
online survey: one of the items from the scientific literacy
assessment and two of the items from the scientific
cognition assessment.
Scientific cognition and scientific literacy are measured on
a continuum. Results from a study conducted by
Drummond and Fischhoff (2015) show a positive
association between the Scientific Reasoning Scale (SRS)
and two widely used measures of scientific literacy, the
Trend Factual Knowledge of Science Scale (TFKSS) and
the Understanding of Scientific Inquiry Scale (USIS). The
strength of the association was moderate, similar to our
findings regarding the association between scientific
cognition and scientific literacy. The SRS assesses skills
needed to evaluate scientific evidence; the scale consists of
items related to research methodology. The TFKSS
assesses knowledge of scientific concepts. The USIS
assesses knowledge of research methodology and
probability. The scientific literacy assessment used in this
study is similar to the TFKSS, as it is an assessment of
knowledge of general scientific concepts. These scientific
literacy scales have been used often in the field of public
understanding of science (Allum, Sturgis, Tabourazi, &
Brunton-Smith, 2008). The scientific cognition assessment
is similar to the SRS and the USIS, as it involves questions
regarding research methodology. Similar to the USIS it
also involves questions regarding probability (quantitative
reasoning). In addition, the scientific cognition assessment
involves items that require knowledge in the philosophy of
science.
In contrast to the finding that men's total scores on a
general scientific knowledge test were better than
women's (Sloss & Hale, Paper Forthcoming), we found no
significant differences. Also, there were no differences
between men and women for total scores on the scientific
cognition assessment. There were significantly different
scores between men and women for one item from the
scientific literacy assessment and two items from the
scientific cognition assessment. Men scored better on one
item from the scientific literacy assessment and one item
from the scientific cognition assessment. The scientific
literacy item on which men scored better (question no. 9)
was a chemistry question. Question no. 9 was "[w]hich of the
following are smaller than atoms a) proteins b) electrons c)
amino acids."
The correct answer is b. Past research indicates a difference
in scores for some chemistry related items. A study
comparing the performance of boys and girls in the
Australian National Chemistry Quiz found no differences
on some of the questions, but on some of the questions
boys performed better than girls (Walding, Fogliani, Over,
& Bain, 1994). There were other chemistry items on the
scientific literacy assessment, but there were no gender
differences for chemistry items other than question 9. Men
scored better on an item (question no. 9) involving
quantitative reasoning from the scientific cognition
assessment. Question no. 9 was "[i]n the universal lottery,
the chances of winning a prize are 1%. How many people
do you think would win a prize if 1000 people buy a single
ticket?" The correct answer is 10. A gender difference on
a task involving quantitative reasoning is in agreement with
the scientific literature that demonstrates better
performance of males regarding quantitative reasoning
(Friedman, 1989; Leahey & Guo, 2001). The only gender
difference on quantitative reasoning occurred for question
no. 9; there were no differences for other items involving
quantitative reasoning. Women scored better on an item
(question no. 3) involving the philosophy of science.
Question no. 3 was "[t]he falsification criteria in the
context of science suggests a) If a scientific claim is proven
then it is not false b) False claims are not accepted c) In
order for a claim to be scientific it must be testable." The
correct answer is c. The concept of falsification is one of
the most discussed concepts in the philosophy of science.
The concept is taught in lower-level research methods
courses and philosophy of science courses. We were not able
to locate studies that investigated differences between
genders regarding philosophy of science. It is unclear why
the gender difference occurred on this task. There were
other philosophy of science questions on the scale, but
there were no gender differences on those tasks. Gender
differences are often found when comparing scoring for
individual items. A study investigating gender differences
for Hong Kong students did not find significant differences
for total score in scientific literacy, but differences were
found for components of scientific literacy (Yan Yip,
Ming Chiu, & Sui Chu Ho, 2004). Scientific literacy as
conceptualized in that study was different from the
conceptualization used in the current study. Scientific
literacy, in the study of Hong Kong students, consisted of
five components: "understanding concepts, recognizing
questions, identifying evidence, drawing conclusions,
communicating conclusions." Girls scored significantly
higher in "recognizing questions" and "identifying
evidence" while boys scored higher in "understanding
concepts." These components demonstrate various elements
involved with scientific thinking. To reiterate, our
conceptualization of scientific literacy is that scientific
literacy demonstrates general scientific knowledge.
Scientific literacy has a much broader definition in the
Hong Kong study than the definition we used.
Another important finding in the current study was that
students confused science with pseudoscience. The
overwhelming majority of students (79%) in the current
study reported that astrology is scientific, or is at least partly
scientific. Only 21% of participants in the
study answered the following question correctly: "Which
of the following statements are true? A) Astrology is not at
all scientific B) Astrology is partly scientific C) Astrology
is a legitimate field of scientific study." The correct answer
is A. The astrology question is an item from the scientific
literacy assessment. A study conducted by
Sugarman and colleagues (2011) found that the majority of
students (78%) considered astrology at least sort of
scientific. Only 52% of science majors indicated that
astrology was “not at all” scientific. Those findings are
similar to what we found. Astrology has no scientific
validity, although at one time it was considered a science
by some. Newspapers and magazines dedicate sections to
horoscopes, and belief in astrology is prevalent in western
society. This exposure to astrology as a legitimate domain
probably has a strong influence regarding belief in the
scientific validity of astrology. Cognitive priming is often
powerful, and may modulate beliefs, even when priming is
used to promote pseudo-science. Some people may
confuse astrology with astronomy; astrology has origins
associated with positional astronomy. This confusion may
lead to an incorrect response regarding the scientific
validity of astrology. A high level of scientific literacy and
scientific cognition may serve as safeguards against these
sorts of pseudo-scientific beliefs.
The question most often answered incorrectly, from the
scientific cognition assessment, was a question involving
a covariation task. The question was presented as "A new
medical treatment was designed to treat a serious health
problem. Using the information provided below decide
whether the treatment was effective: 200 people were given
the treatment and improved - 75 people were given the
treatment and did not improve - 50 people were not given the
treatment and improved - 15 people were not given the
treatment and did not improve A) Treatment was effective
B) Treatment was not effective." Of the people who received
the treatment, the proportion who improved was 200/275 =
.727. Of the people who did not receive the treatment, the
proportion who improved was 50/65 = .769. Because
improvement was slightly more likely without the treatment,
the answer is B. Approximately 53% of the students answered
the question incorrectly. The incorrect response given to this
question stems from at least two key cognitive errors: too
much focus on the large number of people for which
improvement occurred following treatment and a focus on
the fact that more people who received treatment showed
improvement than showed no improvement (Stanovich,
2009).
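The comparison behind the covariation task can be written out as a short calculation. The Python sketch below is only an illustration: the function and variable names are hypothetical, and "effective" is simplified here to mean a higher improvement rate among treated people than among untreated people, with no allowance for sampling error.

```python
def improvement_rate(improved, not_improved):
    """Proportion of a group that improved."""
    return improved / (improved + not_improved)


# Cell counts taken from the survey item.
treated_rate = improvement_rate(improved=200, not_improved=75)
untreated_rate = improvement_rate(improved=50, not_improved=15)

# Simplifying assumption: the treatment counts as effective only if
# treated people improved at a higher rate than untreated people.
treatment_effective = treated_rate > untreated_rate

print(f"treated:   {treated_rate:.3f}")
print(f"untreated: {untreated_rate:.3f}")
print(f"treatment effective: {treatment_effective}")
```

The key step is that all four cells are used. Looking only at the 200 treated people who improved, or only at the treated group, ignores the rate of improvement among people who received no treatment.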
The survey used in this study consisted of items derived
from other assessment tools, as well as questions designed
by the researchers, similar to those from past studies. Some
of the questions on the survey were designed by
researchers involved with this study. Thus, it follows that
the constructs of scientific literacy and scientific cognition
were validly measured. A more comprehensive measure
may require assessments consisting of more items. It is also
possible that measures of these concepts may yield
different results inside and outside the laboratory. Different
conceptualizations of scientific literacy and scientific
cognition require different measuring devices.
The study involved non-probability sampling. The
participants in the study were college students, who vary
greatly in scientific knowledge and in
the number of science courses completed. Some of the
students had taken higher level courses in research methods
and statistics, and it is reasonable to suggest that some of the
students had probably taken philosophy of science courses
and other courses that may have had an impact on
performance. The external validity of this study is limited.
Non-student samples and samples of other students may
provide different results.
Future research could include using the scientific literacy
assessment and scientific cognition assessment in a variety
of contexts. The assessments could be revised and
expanded in an effort to increase sensitivity and make them
more comprehensive. Further investigation of gender
differences as related to specific items from the assessment
may be beneficial. A key focus for extensive investigation
is the development of a model that provides at least a basic
framework that can be used in teaching students and the
general public. This type of investigation requires a
multidisciplinary approach and a line of studies involving
different science related areas. The cognitive processes
underpinning scientific cognition are important and can be
extended to various situations. To reiterate, scientific
cognition is about much more than remembering scientific
theories, laws and principles. Scientific cognition is
essentially analytical thinking that can be used, and should
be used in a wide range of conditions. At the very least in
an effort to develop better scientific cognition students
should be educated in the areas of the philosophy of
science, research methodology, quantitative reasoning
(probabilistic reasoning) and logic. These components are
involved with scientific thinking. Science educators and the
media do a disservice when they promote science and its
wide range of relevant concepts as "just" being able to
remember scientifically derived information, or promoting
science as if it is all about just having a sense of
"wonder." Being able to recollect scientific facts and
having a sense of wonder are important regarding science,
but those qualities alone do not ensure high levels of
scientific thinking. Assessment tools may help predict
scientific eminence and be used as screening tools when
hiring or considering admissions to college programs.
More research needs to be done regarding scientific literacy
and scientific cognition. Both of these concepts involve
related cognitive mechanisms, and being knowledgeable in
these areas will have positive consequences. Society is
heavily dependent on science and technology, and these
complex endeavors require complex thinking. We would
like to see future research indicating a high positive
association between scientific cognition and scientific
literacy. A moderate association is not satisfactory.
References
Allum, N., Sturgis, P., Tabourazi, D., & Brunton-Smith, I.
(2008). Science knowledge and attitudes across
cultures: A meta-analysis. Public Understanding of
Science, 17(1), 35-54. doi:
10.1177/0963662506070159.
Brennan, R.P. (1992). Dictionary of Scientific Literacy.
New York, NY: John Wiley & Sons, Inc.
DeBoer, G.E. (2000). Scientific Literacy: Another look at
its historical and contemporary meanings and its
relationship to science education reform. Journal of
Research in Science Teaching, 37, 582-601.
Drummond, C., & Fischhoff, B. (2015). Development and
Validation of the Scientific Reasoning Scale.
Journal of Behavioral Decision Making. doi:
10.1002/bdm.1906.
Dunbar, K. (2000). How Scientists Think in the Real
World: Implications for Science Education. Journal
of Applied Developmental Psychology, 21(1), 49-58.
Feist, G.J. (2006). The Psychology of Science and the
Origins of the Scientific Mind. New Haven, CT:
Yale University Press.
Fugelsang, J.A., Stein, C.B., Green, A.E., & Dunbar, K.N.
(2004). Theory and Data Interactions of the
Scientific Mind: Evidence From the Molecular and
Cognitive Laboratory. Canadian Journal of
Experimental Psychology, 58(2), 86-95.
Kahan, D. (2016). “Ordinary science intelligence”: A
science-comprehension measure for study of risk
and science communication, with notes on
evolution and climate change. Journal of Risk
Research, 1–22,
doi.org/10.1080/13669877.2016.1148067
Friedman, L. (1989). Mathematics and the gender gap: A
meta-analysis of recent studies on sex
differences in mathematical tasks. Review of
Educational Research, 59, 185-213.
Leahey, E., & Guo, G. (2001). Gender differences in
mathematical trajectories. Social Forces, 80(2),
713-732.
Kuhn, D. (2011). What is scientific thinking and how does
it develop? In U. Goswami (Ed.), The Wiley-Blackwell
Handbook of Childhood Cognitive
Development 2nd Edition, 497-523. Hoboken, NJ:
Wiley-Blackwell.
Myers, A., & Hansen, C. (2002). Experimental Psychology.
Pacific Grove, CA: Wadsworth.
National Science Board. (2014). Science and engineering
indicators 2014. Arlington VA: National Science
Foundation (NSB 14-01).
Norris, S.P., & Phillips, L.M. (2003). How literacy in its
fundamental sense is central to scientific literacy.
Science Education, 87(2), 224-240.
Pew Research Center for the People & the Press. (2013).
Public's Knowledge of Science and Technology.
Pew Research Center, Washington D.C.
Sloss, G.S., & Hale, J. (Paper Forthcoming). Knowledge in,
belief in and attitudes toward science.
Stanovich, K. (2007). How To Think Straight About
Psychology 8th Edition. New York, NY: Pearson.
Stanovich, K. (2009). What Intelligence Tests Miss: the
psychology of rational thought. London: Yale
University Press.
Stanovich, P., & Stanovich, K. (2003). Using Research and
Reason in Education: How Teachers Can Use
Scientifically Based Research to Make Curricular &
Instructional Decisions. National Institute for
Literacy.
Stanovich, K.E., West, R.F., & Toplak, M.E. (2016). The
Rationality Quotient: Toward a Test of Rational
Thinking. Cambridge, MA: The MIT Press.
Sugarman, H., Impey, C., Buxner, S., & Antonellis, J.
(2011). Astrology beliefs among undergraduate
students. Astronomy Education Review, 10, 010101-1,
doi: 10.3847/AER2010040.
Walding, R., Fogliani, C., Over, R., & Bain, J.D. (1994).
Gender differences in response to questions on the
Australian national chemistry quiz. Journal of
Research in Science Teaching, 31(8), 833-846.
Yan Yip, D., Ming Chiu, M., & Sui Chu Ho, E. (2004).
Hong Kong student achievement in OECD-PISA
Study: Gender differences in science content,
literacy skills, and test item formats. International
Journal of Science and Mathematics Education, 2,
91–106.
Analytical Reading: Primary Scientific Literature - Key
Points (Jones & Hale)
Abstract
The purpose of the current paper is to present a pedagogical
method for teaching students to read, analyze, and evaluate
research methodology and conclusions in primary scientific
literature. Analytical reading of primary scientific literature
is an essential skill for advanced undergraduate and
graduate students. Evaluating research involves healthy
criticism and debate. Students should be introduced to this
process of criticism and analysis early and throughout their
college careers. These are skills students can use for their
own research papers, theses, and dissertations, and can also
ensure future clinical practice is evidence-based. The
present method is grounded in research on cognitive and
learning psychology and provides a structure for
developing analytical reading skills in the classroom. Our
conclusions are supported primarily by teaching
evaluations, personal communications with students, and
experience. The method presented is a practical method for
utilizing findings from educational, teaching, and
psychological research in the classroom.
Key points:
This paper describes a method to successfully develop
these skills in both advanced undergraduates and graduate
students in exercise physiology. It is inspired by the
“Learning Paradigm” of higher education described by Barr
and Tagg (1995). This method focuses the attention on
learning outcomes using many different types of teaching –
lecture, discussion, reading, and writing.
The analytical reading method begins with a lecture on the
layout of a scientific paper along with readings to provide
additional information and to serve as reference materials
throughout the course. Students are taught to be able to
distinguish among different sources, methodologies and
elements of scientific papers.
In an effort to assess the students' level of ability in relevant
skills a short questionnaire about previous coursework
and/or a self-assessment might be administered at the beginning
of the semester. These types of assessments may help
instructors design classes that coordinate difficulty with the
current ability (Lazarowitz & Lieb, 2006).
The scientific literature critique process involves
developing answers to questions about various studies.
Students can be provided with a worksheet to fill out or
answer the questions with a short writing assignment to be
submitted before class.
Students are provided with a question-based rubric that is
used to evaluate each work of primary literature. Ideally
students will complete these assignments and submit them
electronically prior to class, either via email or upload to a
learning management system such as Blackboard or
Moodle.
The format of the rubric is flexible, but several core
questions are essential for analysis. The rubric is divided
into different sections that correspond to the sections
typically found in a scientific research paper.

The method described here has been used in graduate-level
physiology, nutrition, and exercise physiology courses, and
in advanced undergraduate seminars at two different higher
education institutions. The first institution was a larger
research institution with graduate and undergraduate
students, and the other was a small liberal arts college. Student
feedback at both sites, from standardized institutional
student evaluations and from personal communication
with students, has been consistently positive.

Research on the specific method described in this paper
should involve multiple sections of the same course taught
by the same instructor. Sections that used the analytical
reading method could be compared to sections that were
taught in a more traditional lecture format. Learning
outcomes could be assessed using qualitative, quantitative,
or mixed assessments.
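As a rough illustration of the quantitative comparison suggested above, the following Python sketch compares outcome scores from two groups of course sections. The function names, the choice of Welch's t statistic and Cohen's d as summary measures, and the scores themselves are assumptions made for illustration only; none of them come from the paper being summarized.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for the difference between two group means.

    Uses each group's own sample variance, so the groups need not
    have equal variances or equal sizes.
    """
    standard_error = math.sqrt(
        variance(group_a) / len(group_a) + variance(group_b) / len(group_b)
    )
    return (mean(group_a) - mean(group_b)) / standard_error


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_variance)


# Hypothetical final-assessment scores, invented for illustration only.
analytical_reading_sections = [78, 85, 90, 72, 88, 81]
traditional_lecture_sections = [74, 80, 79, 70, 83, 77]

print("Welch t:", round(welch_t(analytical_reading_sections,
                                traditional_lecture_sections), 2))
print("Cohen's d:", round(cohens_d(analytical_reading_sections,
                                   traditional_lecture_sections), 2))
```

A real evaluation would also need a significance test with appropriate degrees of freedom and some control for differences between the student groups; the sketch only shows the basic arithmetic of comparing two sets of scores.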
Full paper available upon request.
Chapter 2
The Need for Rationality

In chapter two, rationality is discussed. Rationality consists
of two broad categories: instrumental and epistemic
rationality. Rational thinking skills are important. They are
as important as intelligence. Intelligence and rationality are
often dissociated. Research demonstrates that intelligence
is often a weak predictor of rationality. This has been
shown over a wide range of studies. Intelligence is
important, but there is more to good thinking than
intelligence. Intelligence reflects reasoning abilities across
a wide variety of domains, particularly novel ones. In
addition, intelligence reflects general declarative
knowledge acquired through acculturated learning.
Rationality reflects appropriate goal setting, goal
optimization, and holding evidence-based beliefs.
Developing The RQ Test

The following interview was conducted with Dr. Keith
Stanovich. Dr. Stanovich is the author of What Intelligence
Tests Miss: the psychology of rational thought (Stanovich,
2009), the recipient of many prestigious awards, and
recognized as one of the most important cognitive scientists
ever. Visit Dr. Stanovich’s site at
http://www.keithstanovich.com/Site/Home.html

Congratulations on the three-year grant that you and
Richard West received from the John Templeton
Foundation to develop a comprehensive test of rational
thinking. Do you think the test will be completed in
three years? Will we see it being put to use in three
years?
Thanks very much. We were very happy and flattered to
receive the grant. We are fairly certain that we will have
completed a prototype of a comprehensive test that could
be used in scientific work by the end of the three-year
grant. There of course will still be substantial work to do
after that to make it useful in applied settings such as
education, business, and industry. For example, subsequent
to us producing the research prototype, there will be much
more standardization work to be done to make it useful in
applied settings. However, I think that it is totally realistic
to think that we will have a comprehensive instrument
ready for scientific use in just two and a half years–we are
about six months into the grant now.
Your work shows that individuals can rate high in
intelligence, and at the same time, rate low in
rationality. Is it likely that an individual will rate low in
intelligence but high in rationality?
Yes, that is a very good question. It is important to realize
that those two outlier states will not occur with equal
frequency. By outlier states I mean people who are low in
rationality and high in intelligence, and then also the
converse state, people who are high in rationality and low
in intelligence. The former will be much more frequent
than the latter. For many types of rational thinking
subcomponents intelligence is necessary but not sufficient.
Also, with respect to many different rational thinking
components, there are at least mild to moderate correlations
with intelligence. Only on a few rational thinking
components–myside bias for example–is it the case that the
rational thinking component is totally disassociated from
intelligence. On those few tasks there will indeed be as
many individuals high in rationality and low in intelligence
as there are low in rationality and high in intelligence. But
that will be the minority of cases.

Another way to put it is to say that we already know from
the past research that led up to the grant that there is a
profile of associations between intelligence and rational
thinking subcomponents that is quite varied. A few rational
thinking tasks such as belief bias in syllogistic reasoning
are quite highly correlated with intelligence. Most rational
thinking skills are modestly correlated with intelligence.
The use of base rates in probabilistic reasoning would be an
example. And finally there are those like myside bias that
are quite dissociated. Those differing profiles will lead to
somewhat different outlier groups. Rational thinking is
quite multifarious, much more so than intelligence, so any
given statement about individual differences may vary
quite a bit across the subcomponents. The answer to your
question here will probably vary quite a bit across the
different subcomponents.

In your excellent book What Intelligence Tests Miss: the
psychology of rational thought you point out the
irrational thinking habits of George Bush. Why did you
choose George Bush as the example? Have you received
any negative comments concerning the discussion of
Bush's irrational thinking tendencies?
I chose Bush because he was such a surprising example of
dysrationalia: the failure to think rationally despite
adequate intelligence. He was a surprising example because
most people would not grant his intelligence. But as I point
out in the book, this is because they are confused about
what intelligence is. And that is equally true of his
supporters and his detractors.

Bush’s detractors described him as taking disastrously
irrational actions, and they seemed to believe that the type
of poor thinking that led to those disastrous actions would
be picked up by the standard tests of intelligence.
Otherwise, they would not have been surprised when his
scores were high rather than low. Thus, the Bush detractors
must have assumed that a mental quality (rational thinking
tendencies) could be detected by the tests that in fact the
tests do not detect at all.

In contrast, Bush’s supporters like his actions but admit that
he has “street smarts,” or common sense, rather than
“school smarts.” Assuming his “school smarts” to be low,
and further assuming that IQ tests pick up only “school
smarts,” his supporters were likewise surprised by the high
pro-rated IQ scores that were indicated. Thus, his
supporters missed the fact that Bush would excel on
something that was assessed by the tests. The supporters
assumed the tests measured only “school smarts” in the
trivial pursuit sense (“who wrote Hamlet?”) that is easily
mocked and dismissed as having nothing to do with “real
life.” That the tests would actually measure a quality that
cast Bush in a favorable light was something his supporters
never anticipated.
In the talks that I give on these topics, when I use the Bush
example I try to head off questions and negative
comments by pointing out that there is an absolute
consensus that there was something wrong with his
thinking style and that this fact is not in dispute–that even
his supporters acknowledge this fact. For example, in a
generally positive portrait of the President, David Frum
nonetheless notes, “he is impatient and quick to anger;
sometimes glib, even dogmatic; often uncurious and as a
result ill-informed”. Conservative commentator George
Will agrees, when he states that in making Supreme Court
appointments, the President “has neither the inclination nor
the ability to make sophisticated judgments about
competing approaches to construing the Constitution” (p.
23, 2005). In short, there is considerable agreement that
President Bush’s thinking has several problematic aspects:
lack of intellectual engagement, cognitive inflexibility,
need for closure, belief perseverance, confirmation bias,
overconfidence, and insensitivity to inconsistency.

What is contaminated mindware? How do we
discourage the acquisition of contaminated mindware?
Harvard cognitive scientist David Perkins coined the term
mindware. Mindware refers to the rules, knowledge,
procedures, and strategies that a person can retrieve from
memory in order to aid decision making and problem
solving. Most mindware is helpful and good for us.
However, some acquired mindware can be the direct cause
of irrational actions that thwart our goals. This type of
mindware I have termed contaminated mindware.

In my writings, I have discussed four rules for avoiding
contaminated mindware:
1. Avoid installing mindware that could be physically
harmful to you, the host.
2. Regarding mindware that affects your goals, make sure
the mindware does not preclude a wide choice of future
goals.
3. Regarding mindware that relates to beliefs and models
of the world, seek to install only mindware that is
true—that is, that reflects the way the world actually is.
4. Avoid mindware that resists evaluation.

Rules #1 and #2 are similar in that they both seek to
preserve flexibility for the person if his/her goals should
change. For example, there is in fact some justification for
our sense of distress when we see a young person adopt
mindware that threatens to cut off the fulfillment of many
future goal states (early pregnancy comes to mind, as do
the cases of young people joining cults that short-circuit
their educational progress and that require severing ties
with friends and family).

Rule #3 serves as a mindware check in another way. The
reason is that beliefs that are true are good for us because
accurately tracking the world helps us achieve our goals.
Almost regardless of what a person’s future goals may be,
these goals will be better served if accompanied by beliefs
about the world which happen to be true. Obviously there
are situations where not tracking truth may (often only
temporarily) serve a particular goal. Nevertheless, other
things being equal, the presence of the desire to have true
beliefs will have the long-term effect of facilitating the
achievement of many goals.

Parasitic mindware, rather than helping the host, finds
tricks that will tend to increase its longevity. Subverting
evaluation attempts is one of the most common ways that
parasitic mindware gets installed in our cognitive
architectures. Hence rule #4—avoid mindware that resists
evaluation. Here we have a direct link to the principle of
falsifiability that is so critical in philosophy of science. In
science, a theory must go out on a limb, so to speak. In
telling us what should happen, the theory must also imply
that certain things will not happen. If these latter things do
happen, then we have a clear signal that something is
wrong with the theory. An unfalsifiable theory, in contrast,
precludes change by not specifying which observations
should be interpreted as refutations. We might say that
such unfalsifiable theories are evaluation disabling. By
admitting no evaluation, they prevent us from replacing
them, but at the cost of scientific progress.
Why is rationality as important as intelligence?

In order to illustrate the oddly dysfunctional ways that
rationality is devalued in comparison to intelligence, I often
embellish on a thought experiment first imagined by
cognitive psychologist Jonathan Baron in a 1985 book.
Baron asks us to imagine what would happen if we were
able to give everyone an otherwise harmless drug that
increased their algorithmic-level cognitive capacities (for
example, discrimination speed, working memory
capacity)—in short, that increased their intelligence.
Imagine that everyone in North America took the pill
before retiring and then woke up the next morning with
more memory capacity and processing speed. Both Baron
and I believe that there is little likelihood that much would
change the next day in terms of human happiness. It is
very unlikely that people would be better able to fulfill their
wishes and desires the day after taking the pill. In fact, it is
quite likely that people would simply go about their usual
business—only more efficiently! If given more memory
capacity and processing speed, people would, I believe:
carry on using the same ineffective medical treatments
because of failure to think of alternative causes; keep
making the same poor financial decisions because of
overconfidence; keep misjudging environmental risks
because of vividness (Chapter 6); play host to contaminated
mindware of Ponzi and pyramid schemes; be wrongly
influenced in their jury decisions by incorrect testimony
about probabilities; and continue making many other of the
suboptimal decisions described in several of my books.
The only difference would be that they would be able to do
all of these things much more quickly! Instead, because of
inadequately developed rational thinking abilities—because
of the processing biases and mindware problems I have
discussed in my books—physicians choose less effective
medical treatments; people fail to accurately assess risks in
their environment; information is misused in legal
proceedings; millions of dollars are spent on unneeded
projects by government and private industry; parents fail to
vaccinate their children; unnecessary surgery is performed;
animals are hunted to extinction; billions of dollars are
wasted on quack medical remedies; and costly financial
misjudgments are made.

Unfortunately, these examples are not rare. We are all
affected in numerous ways when such contaminated
mindware permeates society—even if we avoid this
contaminated mindware ourselves. Pseudosciences such
as astrology are now large industries, involving newspaper
columns, radio shows, book publishing, the Internet,
magazine articles, and other means of dissemination. The
House of Representatives Select Committee on Aging has
estimated that the amount wasted on medical quackery
nationally reaches into the billions. Physicians are
increasingly concerned about the spread of medical
quackery on the Internet and its real health costs.
Pseudoscientific beliefs appear to arise from a complex
combination of thinking dispositions, mindware gaps, and
contaminated mindware. Pseudoscientific beliefs are
related to the tendency to display confirmation bias, to fail
to consider alternative hypotheses, to ignore chance as an
explanation of an outcome, to identify with beliefs and not
critique them, and to various fallacies in probabilistic
thinking. These rational thinking attributes are very
imperfectly correlated with intelligence.
Good Thinking: More Than Just Intelligence

Are intelligent people good thinkers? Some are, some are
not. Society is replete with examples of intelligent people
doing foolish things. There is a plethora of scientific data
showing intelligence does not necessarily predict
rationality. Intelligence shows a low to moderate
association with some critical thinking / rational thinking
skills, while showing little to no association with other
rational thinking skills. In one study, Stanovich & West
(2008) investigated two key critical thinking skills:
avoidance of myside bias and avoidance of one-side bias.
The abstract is provided below:
“Abstract
Two critical thinking skills—the tendency to avoid myside
bias and to avoid one-sided thinking—were examined in
three different experiments involving over 1200
participants and across two different paradigms. Robust
indications of myside bias were observed in all three
experiments. Participants gave higher evaluations to
arguments that supported their opinions than those that
refuted their prior positions. Likewise, substantial one-side
bias was observed—participants were more likely to prefer
a one-sided to a balanced argument. There was substantial
variation in both types of bias, but we failed to find that
participants of higher cognitive ability displayed less
myside bias or less one-side bias. Although cognitive ability
failed to associate with the magnitude of the myside bias,
the strength and content of the prior opinion did predict the
degree of myside bias shown. Our results indicate that
cognitive ability—as defined by traditional psychometric
indicators—turns out to be surprisingly independent of two
of the most important critical thinking tendencies discussed
in the literature.”

Key cognitive skills required for critical thinking are the
ability to evaluate evidence in an objective manner, and the
ability to consider multiple points of view when solving a
problem, or coming to a conclusion. Many people fail to
demonstrate these critical thinking tendencies. Myside bias
is displayed when people evaluate evidence and come to
conclusions that are biased towards their own beliefs and
opinions. One-side bias is demonstrated when people prefer
one-sided arguments over arguments presenting multiple
perspectives. Intelligent people are just as likely as less
intelligent people to demonstrate these thinking biases. It is
important to mention that intelligence in this context refers
to cognitive abilities measured by popular intelligence tests
and their proxies. These tests do a good job assessing
computational power and certain types of declarative
knowledge. However, they do not adequately assess critical
thinking skills. Avoidance of myside bias and one-side bias
are not measured on intelligence tests. It seems that
intelligence tests are missing an important element of good
thinking: evaluating evidence in an unbiased manner and
considering different perspectives when problem solving.

In a series of experiments, Stanovich and West examined
the association between cognitive ability and two cardinal
critical thinking skills: avoidance of myside bias and
avoidance of one-side bias. In Experiment 1, natural myside
bias was investigated in 15 different propositions. In
Experiment 2, myside bias and one-side bias were studied.
In Experiment 3, associations between thinking
dispositions (in addition to cognitive ability) and one-side
and myside bias were investigated.
In Experiment 1, the researchers concluded, there was "no
evidence at all that myside bias effects are smaller for
students of higher cognitive ability" (p.140). The main
purpose of Experiment 2 was to investigate the association
of cognitive abilities with myside and one-side bias. "The
results... were quite clear cut. SAT total scores displayed a
nonsignificant -.03 correlation with the degree of myside
bias and a correlation of .09 with the degree of one-side
bias (onebias1), which just missed significance on a
two-tailed test but in any case was in the unexpected
direction" (p.147). It was also revealed that stronger prior
beliefs predicted greater myside bias. In Experiment 3, "the
degree of myside bias was uncorrelated with SAT scores",
and "[t]he degree of one-side bias was uncorrelated with
SAT scores" (p.156). Myside bias was weakly correlated
with thinking dispositions. One-side bias showed no
correlation with thinking dispositions.
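For readers unfamiliar with the statistics quoted above, the following Python sketch shows how a correlation coefficient and the t statistic used to test it against zero are commonly computed. The function names and the data are hypothetical and invented for illustration; they are not taken from the Stanovich and West study, whose sample sizes are not reproduced here.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 3:
        raise ValueError("need two sequences of equal length with n >= 3")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic (n - 2 degrees of freedom) for testing r against zero.

    Undefined for r equal to exactly 1 or -1.
    """
    return r * math.sqrt((n - 2) / (1 - r * r))


# Hypothetical ability scores and bias scores, invented for illustration only.
ability_scores = [1000, 1100, 1200, 1300, 1400]
bias_scores = [2.0, 1.0, 3.0, 1.0, 2.0]

r = pearson_r(ability_scores, bias_scores)
print("r =", round(r, 3))
print("t =", round(t_statistic(r, len(ability_scores)), 3))
```

For a fixed value of r, the t statistic grows with the sample size, which is why a small correlation can come close to statistical significance in a large sample while still indicating a very weak association.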

The final two sentences of the research report read: "Our
results thus indicate that intelligence—as defined by
traditional psychometric indicators—turns out to be
surprisingly independent of critical thinking tendencies.
Cognitive ability measures such as the SAT thus miss
entirely an important quality of good thinking" (p.161). The
good news is critical thinking abilities are malleable, and in
fact, probably more malleable than intelligence.
Intelligence and Rationality: different cognitive skills

Society is replete with examples of intelligent people doing
foolish things. This seems puzzling considering that
intelligent people (as indicated by intelligence tests and
their proxies, such as the SAT) are generally thought of as
rational, smart people. So, it may come as a surprise to find
out that intelligent people are not necessarily rational
people. It may surprise some people to learn there is more
to good thinking than intelligence. In fact, intelligence is a
weak to moderate predictor of many rational thinking
skills. In some instances, intelligence shows zero
correlation with rational thinking skills. An example of the
dissociation of intelligence and rationality is seen with
myside bias. Myside bias is displayed when people
evaluate and gather evidence in a manner biased towards
their own beliefs and opinions.

In a series of experiments, Stanovich and West (2008)
examined the association between cognitive ability and two
cardinal critical thinking skills: avoidance of myside bias
and avoidance of one-side bias (one-side bias is
demonstrated when people prefer one-sided arguments over
arguments presenting multiple perspectives). In
Experiment 1, natural myside bias was investigated in 15
different propositions. In Experiment 2, myside bias and
one-side bias were studied. In Experiment 3, associations
between thinking dispositions (in addition to cognitive
ability) and one-side and myside bias were investigated.

In Experiment 1, the researchers concluded, there was "no
evidence at all that myside bias effects are smaller for
students of higher cognitive ability" (p.140). The main
purpose of Experiment 2 was to investigate the association
of cognitive abilities with myside and one-side bias. "The
results... were quite clear cut. SAT total scores displayed a
nonsignificant -.03 correlation with the degree of myside
bias and a correlation of .09 with the degree of one-side
bias (onebias1), which just missed significance on a
two-tailed test but in any case was in the unexpected
direction" (p.147). It was also revealed that stronger prior
beliefs predicted greater myside bias. In Experiment 3, "the
degree of myside bias was uncorrelated with SAT scores",
and "[t]he degree of one-side bias was uncorrelated with
SAT scores" (p.156). Myside bias was weakly correlated
with thinking dispositions. One-side bias showed no
correlation with thinking dispositions.

When discussing research on intelligence, we are referring
to narrow theories of intelligence: those mental abilities
measured by IQ tests and their proxies (SAT, etc.). These
theories provide a scientific concept of intelligence
generally symbolized as g, or "in some cases where the
fluid / crystallized theory is adopted, fluid intelligence (Gf) and
crystallized intelligence (Gc)" (Stanovich, 2009, p. 13).
Fluid intelligence reflects reasoning abilities (and to a
degree processing speed) across a variety of domains,
particularly novel ones. Crystallized intelligence reflects
declarative knowledge acquired by acculturated learning:
general knowledge, vocabulary, verbal comprehension,
and so on. Mental abilities assessed by intelligence tests are
important, but intelligence tests miss a variety of other
important mental abilities.

Cognitive scientists generally identify two types of
rationality: instrumental and epistemic. Instrumental
rationality can be defined as adopting appropriate goals,
and behaving in a manner that optimizes one's ability to
achieve goals. Epistemic rationality can be defined as
holding beliefs that are in line with available evidence. This
type of rationality is concerned with how well our beliefs
map onto the structure of the world. In order to optimize
rationality one needs adequate knowledge in the domains of
logic, scientific thinking, probabilistic thinking, and causal
reasoning. A wide variety of cognitive skills fall within
these broad domains of knowledge. Many of these skills are
not assessed on IQ tests.

Keith Stanovich coined the word dysrationalia, "meaning
the inability to think and behave rationally despite having
adequate intelligence" (Scientific American Mind, 2009,
p.34; What Intelligence Tests Miss, 2009, p. 18). Rationality
encompasses good judgment and decision-making, and it is
just as important as intelligence.

Why do we act and behave irrationally? Two broad
categories contribute to this problem: a processing problem
and a content problem. When choosing the cognitive
strategies to apply when solving a problem we generally
choose the fast, computationally inexpensive strategy.
Although we have cognitive strategies that have great
power, they are more computationally expensive, are
slower, and require more concentration than the faster
cognitively thrifty strategies. Humans naturally default to
the processing mechanisms that require less effort, even if
they are less accurate. Individuals with high IQs are no less
likely to be cognitive misers than those with lower IQs. A
second source of irrational thinking, the content problem, can
occur when we lack specific knowledge to think and
behave rationally. David Perkins, Harvard cognitive
scientist, refers to "mindware" as rules, strategies, and other
cognitive tools that must be retrieved from memory to think
rationally (Perkins, 1995; Stanovich, 2009). The absence of
knowledge in areas important to rational thought creates a
mindware gap. These important areas are not adequately
assessed by typical intelligence tests. Mindware necessary
for rational thinking is often missing from the formal
education curriculum. It is not unusual for individuals to
graduate from college with minimal knowledge in areas
that are crucial for the development of rational thinking.

A variety of tests have been developed to assess
rational thinking skills. Using tests of rationality is just
as important as using tests of intelligence. Rational thinking skills
are learnable, and with the development of rational thinking
skills we can expect better judgment and decision making
in everyday life. Because of irrational thinking "physicians
choose less effective medical treatments; people fail to
accurately assess risks in their environment; information is
misused in legal proceedings" (Stanovich, 2009).
Moreover, millions of dollars are spent on unneeded projects
by government and private industry, and millions more are
spent on dietary supplements, and so on.

Stanovich and colleagues recently introduced a taxonomy
of irrational thinking tendencies and their relation to
intelligence. As mentioned earlier, intelligence has shown a
weak to moderate correlation with some rational thinking
skills and a nearly zero correlation with others. Good thinking is more
than intelligence; it is rationality. Intelligence tests do not
adequately assess rational thinking skills.

How does one improve their rational thinking skills? In a
recent interview I asked the Stanovich research lab to
address this question (Hale, 2010). They answered with the
following: “[a] good first start is education, which readers
have already started here by reading this blog entry. Having
an understanding of how cognitive scientists have
expanded what is meant by rationality is important, namely
that rationality is about two critical things: What is true and
what to do.”
The Ultimate Goal of Critical Thinking

Many researchers suggest that a key characteristic of
critical thinking is the ability to recognize one’s own
fallibility when evaluating and generating evidence, that is,
recognizing the danger of weighing evidence according to
one’s own beliefs. The expanding literature on informal
reasoning emphasizes the importance of detaching one’s
own beliefs from the process of argument evaluation
(Kuhn, 2007; Stanovich & Stanovich, 2010).

The emphasis placed on unbiased reasoning processes has
led researchers to highlight the importance of
decontextualized reasoning. For example (Stanovich &
Stanovich, 2010, p. 196):

Kelley (1990) argues that “the ability to step back
from our train of thought . . . . is a virtue because it
is the only way to check the results of our thinking,
the only way to avoid jumping to conclusions, the
only way to stay in touch with the facts” (p. 6).
Neimark (1987) lumps the concepts of decentering
and decontextualizing under the umbrella term
detachment. She terms one component of
detachment depersonalizing: being able to adopt
perspectives other than one’s own. This aspect of
detachment is closely analogous to Piaget’s (1926)
concept of decentration.

Various tasks in the heuristics and biases branch of the
reasoning literature involve some type of decontextualized
reasoning (Kahneman, 2003; Stanovich, 2003). These
tasks are designed to see whether reasoning processes can
function without interference from the context (prior
opinions, beliefs, vividness effects).

In a series of studies, Klaczynski and colleagues
(Klaczynski & Lavallee, 2005; Klaczynski & Robinson,
2000; Stanovich & Stanovich, 2010) presented individuals
with flawed hypothetical experiments leading to
conclusions that were either consistent or inconsistent with
their prior positions and opinions. The study participants
then critiqued the flaws in the experiments. More flaws
were found when the experiment’s conclusions were
inconsistent with the participants’ prior opinions than when
the experiment’s conclusions were consistent with their
prior opinions and beliefs.

In the education field, educators often pay lip service to the
idea of teaching “critical thinking”. But when asked to
define “critical thinking”, their answers are often weak and
sometimes so ambiguous they are virtually worthless.
Common responses to the critical thinking question
include “teaching them how to think”, “teaching them
formal logic”, or “teaching them how to solve problems”.
Students already know how to think, logic is only a portion of
what is needed to increase critical thinking, and teaching
them how to solve problems is an ambiguous answer that is
context specific. Stanovich argues “that the superordinate
goal we are actually trying to foster is that of rationality”
(Stanovich, 2010, p.198). Ultimately, educators are
concerned with rational thought in both the epistemic sense
and the practical sense. Certain thinking dispositions are
valued because they help us base our beliefs on available
evidence and assist us in achieving our goals. Many
educators express to students and administrators the
importance of critical thinking, yet many of those
expressing the importance of critical thinking lack critical
thinking skills themselves. In fact, many educators are
simply in the business of repeating what others say:
“Critical thinking is important.”
Understanding Rationality

Rationality is concerned with two key things: what is true
and what to do (Manktelow, 2004). In order for our beliefs
to be rational they must be in agreement with evidence. In
order for our actions to be rational they must be conducive
to obtaining our goals.

As mentioned earlier, cognitive scientists generally identify
two types of rationality: instrumental and epistemic
(Stanovich, 2009). Instrumental rationality can be defined
as adopting appropriate goals, and behaving in a manner
that optimizes one's ability to achieve goals. Epistemic
rationality can be defined as holding beliefs that are
commensurate with available evidence. This type of
rationality is concerned with how well our beliefs map onto
the structure of the world. Epistemic rationality is sometimes
called evidential rationality or theoretical rationality.
Instrumental and epistemic rationality are related. In order to
optimize rationality one needs adequate knowledge in the
domains of logic, scientific thinking, and probabilistic
thinking. A wide variety of cognitive skills fall within
these broad domains of knowledge.

In order for educators to successfully teach critical thinking
/ rational thinking it is imperative that they understand what
critical thinking actually is and why it matters. What are
the goals of critical thinking? How can critical thinking be
assessed? Does my curriculum contain information
regarding scientific and probabilistic thinking?
Critical thinking is about what is true and what to do!
Man is an Irrational Animal!
Contrary to Aristotle’s belief that man is a rational animal,
man is not a rational animal. Irrational thoughts and
behaviors often occur in a predictable manner.

Rational: “of or based on reasoning” (Webster’s New World
Dictionary). This definition is similar to what many people
give when asked to define rational. It is ambiguous and open
to an endless array of interpretations; under such a broad
definition, it can be argued that everyone is rational. In
order to teach and express the importance of rational
thinking, it is important to provide a precise definition.
What is rationality?
Rationality is concerned with two key things: what is true
and what to do (Manktelow, 2004). In order for our beliefs
to be rational they must be in agreement with evidence. In
order for our actions to be rational they must be conducive
to obtaining our goals.

To reiterate, cognitive scientists generally identify two
types of rationality: instrumental and epistemic (Stanovich,
2009). Instrumental rationality can be defined as adopting
appropriate goals, and behaving in a manner that optimizes
one's ability to achieve goals. Epistemic rationality can be
defined as holding beliefs that are commensurate with
available evidence.
Characteristics of rational thought
Adaptive behavioral acts
Judicious decision-making
Efficient behavioral regulation
Realistic goal prioritization
Proper belief formation
Reflectivity
(Characteristics taken from Stanovich, 2009, p.15)
Intelligence & Irrationality
Why do we think and behave irrationally? Two broad
categories contribute to this problem: a processing problem
and a content problem. When choosing the cognitive
strategies to apply when solving a problem we generally
choose the fast, computationally inexpensive strategy.
Although we have cognitive strategies that have great
power, they are more computationally expensive, are
slower, and require more concentration than the faster
cognitively thrifty strategies. Humans naturally default to
the processing mechanisms that require less effort, even if
they are less accurate. Individuals with high IQs are no less
likely to be cognitive misers than those with lower IQs. A
second source of irrational thinking, the content problem, can
occur when we lack the specific knowledge needed to think and
behave rationally. David Perkins, Harvard cognitive
scientist, refers to "mindware" as rules, strategies, and other
cognitive tools that must be retrieved from memory to think
rationally (Perkins, 1995; Stanovich, 2009). The absence of
knowledge in areas important to rational thought creates a
mindware gap. These important areas are not adequately
assessed by typical intelligence tests. Mindware necessary
for rational thinking is often missing from the formal
education curriculum. It is not unusual for individuals to
graduate from college with minimal knowledge in areas
that are crucial for the development of rational thinking.
Another type of content problem, mindware contamination,
occurs when we have acquired mindware that thwarts our
goals and causes irrational action.

There are a variety of tests used to assess rational thinking
skills. Rational thinking skills are learnable, and with the
development of rational thinking skills we can expect better
judgment and decision making in everyday life. Because of
irrational thinking, "physicians choose less effective
medical treatments; people fail to accurately assess risks in
their environment; information is misused in legal
proceedings" (Stanovich, 2009); millions of dollars are
spent on government and private industry; and millions
and millions of dollars are spent on dietary supplements, faith
healing, financial quackery, magic fitness routines,
revolutionary beauty products, get-smart-quick schemes, and
so on.
Does Intelligence Predict Rationality?
Some people rank high in intelligence while low in
rationality. There is more to good thinking than
intelligence.
Below is a list of rational thinking tasks and their
association with cognitive ability / intelligence from
Stanovich (2010, p.221):
TASKS THAT FAIL TO SHOW ASSOCIATIONS
WITH COGNITIVE ABILITY
Noncausal base-rate usage (Stanovich & West, 1998c,
1999, 2008)
Conjunction fallacy between subjects (Stanovich & West,
2008)
Framing between subjects (Stanovich & West, 2008)
Anchoring effect (Stanovich & West, 2008)
Evaluability “less is more” effect (Stanovich & West, 2008)
Proportion dominance effect (Stanovich & West, 2008)
Sunk cost effect (Stanovich & West, 2008; Parker &
Fischhoff, 2005)
Risk/benefit confounding (Stanovich & West, 2008)
Omission bias (Stanovich & West, 2008)
Perspective bias (Stanovich & West, 2008)
Certainty effect (Stanovich & West, 2008)
WTP/WTA difference (Stanovich & West, 2008)
My-side bias between and within S (Stanovich & West,
2007, 2008)
Newcomb’s problem (Stanovich & West, 1999; Toplak &
Stanovich, 2002)
TASKS THAT SHOW .20–.35 ASSOCIATIONS WITH
COGNITIVE ABILITY
Causal base-rate usage (Stanovich & West, 1998c, 1998d)
Outcome bias (Stanovich & West, 1998c, 2008)
Framing within subjects (Frederick, 2005; Parker &
Fischhoff, 2005; Stanovich & West, 1998b, 1999)
Denominator neglect (Stanovich & West, 2008; Kokis et
al., 2002)
Probability matching (Stanovich & West, 2008; West &
Stanovich, 2003)
Hindsight bias (Stanovich & West, 1998c)
Ignoring P(D/NH) (Stanovich & West, 1998d,
1999)
Covariation detection (Stanovich & West, 1998c,
1998d; Sá et al., 1999)
Belief bias in syllogistic reasoning (Stanovich & West,
1998c, 2008)
Belief bias in modus ponens (Stanovich & West, 2008)
Informal argument evaluation (Stanovich & West, 1997,
2008)
Four-card selection task (Stanovich & West, 1998a, 2008)
EV maximization in gambles (Frederick, 2005; Benjamin
& Shapiro, 2005)
Rationality is a multi-dimensional concept and it can be
assessed by the use of numerous rationality tasks.
Rationality requires three different classes of mental
characteristic. “First, algorithmic-level cognitive capacity
is needed in order that override and simulation activities
can be sustained. Second, the reflective mind must be
characterized by the tendency to initiate the override of
suboptimal responses generated by the autonomous mind
and to initiate simulation activities that will result in a
better response. Finally, the mindware that allows the
computation of rational responses needs to be available and
accessible during simulation activities. Intelligence tests
assess only the first of these three characteristics that
determine rational thought and action. As measures of
rational thinking, they are radically incomplete”
(Stanovich, 2010, pp.217-218).
Implications of Research and Future Research
Rationality is often defined in such broad terms that
virtually any type of thinking can be considered rational
thinking. As with definitions of intelligence, the
definitions are so ambiguous they can be interpreted to
mean virtually anything. Cognitive science provides a
definition of rationality that differs from that of
intelligence. Many rational thinking tasks can be assessed
and a substantial body of research shows intelligence and
rationality are often dissociated. It is a mistake to label
rationality as just another form of intelligence. This further
contributes to the problem of associating all good thinking
qualities with intelligence. Intelligence and rationality
encompass different cognitive abilities and they should be
differentiated.

Preliminary indicators have shown that rationality may be
more malleable than intelligence. In an effort to enhance
rational thinking skills it is important to acquire specialized
mindware. In an effort to promote the acquisition of the
mindware necessary for rational thinking educators need to
acquire the mindware themselves.

Irrationality is often due to mindware gaps. Knowledge in
the domains of scientific thinking, probabilistic thinking,
and logic should assist individuals in decreasing irrational
thoughts and behaviors.

The ability to override the autonomous mind is problematic
when proper mindware is not available. Fully disjunctive
reasoning (the tendency to consider all possible states of the
world when deciding among options or when choosing a
problem solution in a reasoning task) is a rational thinking
strategy that can be taught (Reyna & Farley, 2006).
Considering alternative hypotheses is a relatively easy
strategy to teach, and it promotes rational thinking. To
prompt people to think about alternative hypotheses,
a simple instruction such as “think of the opposite” is given.
Studies have demonstrated this strategy can help prevent
the occurrence of various thinking errors (Sanna &
Schwarz, 2006). Probabilistic thinking has been shown to
be more difficult to teach than the previously mentioned
strategies, yet it is still teachable (Stanovich, 2009). Causal
reasoning, another important element in achieving
rationality, is also teachable.
Acquiring specialized mindware is needed for rational
thinking, but avoiding contaminated mindware is also
important. “[T]he principle of falsifiability provides a
wonderful inoculation against many kinds of nonfunctional
beliefs” (Stanovich, 2009). The principle is taught in many
low-level research methodology courses and should be
taught to high school students. Many pseudoscientific
claims can be dismissed when applying the falsifiability
principle. Contaminated mindware is an impediment to
rationality. There are four subtests on the CART
(Comprehensive Assessment of Rational Thinking) used to
measure contaminated mindware (Stanovich, West, &
Toplak, 2016). Three of the tests (superstitious thinking,
anti-science attitudes, and conspiracy beliefs) measure
clusters of pseudoscientific belief. A fourth test taps
instrumental rationality and is concerned with the presence
of personal beliefs that block one’s ability to attain goals.
Common Myths About Rationality
Rationality has been a popular topic of discussion for many
years. There is a large body of literature, popular and
scholarly, that addresses rational thinking skills.
Rationality is often misunderstood, and the word loses its
importance when it is defined in terms so broad or
ambiguous that it can mean virtually anything. This
confusion has contributed to myths concerning rationality.
Two popular myths are discussed here: (1) rationality is
synonymous with logic, and (2) emotional thinking inhibits
rationality.

There is much more to rational thinking than logical
thinking. Rational thinking is important to a person’s
happiness and well-being (Stanovich, 2009a). Dictionary
definitions of rationality are often ambiguous, and thus
meaningless. Some individuals downplay the importance
of rationality and have perpetuated the idea that rationality
is the same thing as syllogistic reasoning. This meaning
misrepresents the depth of cognitive skills needed for
rationality. Modern cognitive science provides a more
precise, comprehensive definition of rationality.

As mentioned previously, cognitive scientists recognize
two types of rationality: instrumental and epistemic. A
simple definition of instrumental rationality is behaving in
the world so that you get exactly what you most want, given
the resources (physical and mental) available to you. The other
aspect of rationality studied by cognitive scientists is
termed epistemic rationality. This aspect of rationality
concerns how well beliefs are corroborated by actual
evidence. Instrumental and epistemic rationality are related.
In order to engage in actions that fulfill our goals, we need
to base those actions on beliefs that are supported by
evidence.

Many people feel that the ability to solve textbook logic
problems has no applications in real life. However, most
people want their beliefs to be in some way reflective of
reality, and they also want to maximize goal achievement.
Epistemic rationality is about holding evidence-based
beliefs and instrumental rationality is about goal setting and
goal achievement. For beliefs to be rational they must be
congruent with reality (most of the time). For our actions
to be rational they must be the best means towards
achieving our goals—they must be the best things to do.
Rationality is useful and practical for a person’s life.
Holding the view that rational thinking is practical and
useful stands in contrast to restricted views of rationality
(rationality = logic).

A second myth concerning rationality is the belief that
emotion thwarts rationality. Supposedly, the absence of
emotion is needed to think rationally. This idea is not
consistent with the definition of rationality in modern cognitive
science. Instrumental rationality is behavior consistent
with maximizing goal attainment. There is no specific
psychological process at work here. Emotions may
enhance instrumental rationality, or they may impede it.

Emotions provide an approximation of the correct response.
If more accuracy than that is required, then a more precise
type of analytic cognition will be required (Stanovich,
2009a). It is possible to rely too much on the emotions. We
can base responses on an approximation when what is
really needed is a more precise type of analytic thought.
More often than not, processes of emotional regulation
enhance rational thinking and behavior.

People with damage to the ventromedial area of the
prefrontal cortex (located in the frontal lobes of the brain)
are often irrational. This is because their processes of
emotional regulation (the integration of cognition
and emotion) are deficient. Logic itself is one of many tools of rational
thought, but so is emotion.
Dysrationalia: Intelligent People Behaving Irrationally
The following interview features the Stanovich, West,
Toplak Research Lab.

Your research shows that intelligence does not imply
rationality. Could you please briefly explain your
general findings?

Those findings are easy to summarize briefly. They are
simply that the correlations between measures of
intelligence and various tasks from the cognitive
psychology literature that measure aspects of rationality are
surprisingly low. We use the term “surprisingly” here,
because for many years it has been known that virtually all
cognitive ability tasks correlate with each other. Indeed,
many show quite high correlations. So, being
psychologists, the surprise is in the context of this wide and
vast cognitive ability literature, which has the technical
name “Spearman’s positive manifold.” This positive
manifold--that performance on cognitive tasks tends to
correlate, and often quite highly--is more than 100 years
old.

Thus, it was in this particular context, when we started
observing fairly modest or low correlations between
measures of intelligence and rational thought, that we
thought this quite startling. Indeed, in restricted samples of
educated adults this correlation can be virtually zero on
certain tasks in the literature. Most often the correlation is
positive, but, again, in light of 100 years of correlations
between cognitive ability tasks, the correlations are often
surprisingly low.
Of course one of the implications of this is that it will not
be uncommon to find people whose intelligence and
rationality are dissociated. That is, it will not be uncommon
to find people with high levels of intelligence and low
levels of rationality, and, to some extent, the converse. Or,
another way to put it is that we should not necessarily
expect the two mental characteristics to go together. The
correlations are low enough--or moderate enough--that
discrepancies between intelligence and rationality should
not be uncommon. For one type of discrepancy, that is for
people whose rationality is markedly below their
intelligence, we have coined the term dysrationalia by
analogy to many of the disabilities identified in the learning
disability literature:
http://en.wikipedia.org/wiki/Dysrationalia
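The point that discrepancies "should not be uncommon" at low or moderate correlations can be illustrated with a small simulation. The sketch below is not from the interview. It assumes that intelligence and rationality scores are both standard normal and related by a chosen correlation r; the values of r, the sample size, and the function name simulate_discrepancy are hypothetical choices made only for illustration. It reports the share of people in the top quarter on intelligence who fall below the median on rationality.

```python
import random


def simulate_discrepancy(r, n=20000, seed=1):
    """Share of the top intelligence quartile scoring below the median on rationality.

    Assumption (not from the source): both scores are standard normal,
    and rationality = r * intelligence + independent noise.
    """
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        intelligence = rng.gauss(0, 1)
        noise = rng.gauss(0, 1)
        # Scale the noise so that the two scores correlate at about r.
        rationality = r * intelligence + (1 - r * r) ** 0.5 * noise
        people.append((intelligence, rationality))

    # Top quarter of the sample on intelligence.
    people.sort(key=lambda person: person[0], reverse=True)
    top_quartile = people[: n // 4]

    # Median rationality score across the whole sample.
    median_rationality = sorted(person[1] for person in people)[n // 2]

    below = sum(1 for _, rationality in top_quartile if rationality < median_rationality)
    return below / len(top_quartile)


for r in (0.1, 0.3, 0.5, 0.9):
    share = simulate_discrepancy(r)
    print(f"r = {r:.1f}: {share:.0%} of the top intelligence quartile is below the rationality median")
```

Under these assumptions, the lower the correlation, the larger the share of high scorers on one measure who are unremarkable or low on the other, which is the pattern the term dysrationalia is meant to capture.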
What is the definition of rationality?
Dictionary definitions of rationality tend to be of a weak
sort, often seeming quite lame and unspecific. For
example, a typical dictionary definition of rationality is:
“the state or quality of being in accord with reason”. The
meaning of rationality in modern cognitive science has a
much stronger sense; it is much more specific and
prescriptive than typical dictionary definitions. The weak
definitions of rationality derive from a categorical notion of
rationality tracing to Aristotle, who defined “man as the
rational animal”. As de Sousa (2007) has pointed out, such
a notion of rationality as “based on reason” has as its
opposite not irrationality but arationality. Aristotle’s
characterization is categorical—the behavior of entities is
either based on thought or it is not. Animals are either
rational or arational.
In its stronger sense, the sense employed in cognitive
science and in this book by de Sousa (2007), rational
thought is a normative notion. Its opposite is irrationality,
and irrationality comes in degrees. Normative models of
optimal judgment and decision making define perfect
rationality in the noncategorical view employed in
cognitive science. Rationality and irrationality come in
degrees defined by the distance of the thought or behavior
from the optimum defined by a normative model. This
stronger sense is consistent with what recent cognitive
science studies have been demonstrating about rational
thought in humans.
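As an illustration of "distance from the optimum defined by a normative model," the sketch below uses Bayes' rule as the normative model for a base-rate problem of the kind listed earlier in this chapter. Everything specific in it is an assumption made for illustration: the prevalence, hit rate, and false-alarm rate are hypothetical numbers, the intuitive answer of 0.95 is a made-up judgment, and measuring distance as a simple absolute difference is one possible choice, not a definition taken from the interview.

```python
def bayes_posterior(prior, hit_rate, false_alarm_rate):
    """Normative probability that the hypothesis is true given a positive result."""
    evidence = hit_rate * prior + false_alarm_rate * (1 - prior)
    return hit_rate * prior / evidence


# Hypothetical problem: a condition affects 1 in 1,000 people, the test
# always detects it, and it gives a false positive 5% of the time.
prior = 0.001
hit_rate = 1.0
false_alarm_rate = 0.05

normative_answer = bayes_posterior(prior, hit_rate, false_alarm_rate)

# Hypothetical judgment from someone who ignores the base rate.
intuitive_answer = 0.95

# Assumed distance measure: absolute difference from the normative answer.
distance_from_optimum = abs(intuitive_answer - normative_answer)

print(f"Normative answer: {normative_answer:.3f}")
print(f"Intuitive answer: {intuitive_answer:.3f}")
print(f"Distance from optimum: {distance_from_optimum:.3f}")
```

A judgment closer to the normative answer would count as more rational on this task, and one further away as less rational, which is the sense in which rationality comes in degrees.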
We would also warn that some critics who wish to
downplay the importance of rationality have been
perpetuating a caricature of rationality that involves
restricting its definition to the ability to do the syllogistic
reasoning problems that are encountered in Philosophy 101.
The meaning of rationality in modern cognitive science is,
in contrast, much more robust and important. Syllogistic
reasoning and logic problems are one small part of rational
thinking.

Cognitive scientists recognize two types of rationality:
instrumental and epistemic [As mentioned previously]. The
simplest definition of instrumental rationality, the one that
is strongly grounded in the practical world, is: Behaving in
the world so that you get exactly what you most want,
given the resources (physical and mental) available to you.
Somewhat more technically, we could characterize
instrumental rationality as the optimization of the
individual’s goal fulfillment.
The other aspect of rationality studied by cognitive
scientists is termed epistemic rationality. This aspect of
rationality concerns how well beliefs map onto the actual
structure of the world. The two types of rationality are
related. In order to take actions that fulfill our goals, we
need to base those actions on beliefs that are properly
calibrated to the world.

Although many people feel that they could do without the
ability to solve textbook logic problems, virtually no person
wishes to eschew epistemic rationality and instrumental
rationality, when properly defined. Virtually all people want
their beliefs to be in some correspondence with reality, and
they also want to act to maximize the achievement of their
goals. Psychologist Ken Manktelow (2004) has emphasized
the practicality of both types of rationality by noting that
they concern two critical things: What is true and what to
do.

Epistemic rationality is about what is true and instrumental
rationality is about what to do. For our beliefs to be rational
they must correspond to the way the world is—they must
be true. For our actions to be rational they must be the best
means toward our goals—they must be the best things to
do.

De Sousa, R. (2007). Why think? Evolution and the rational
mind. Oxford: Oxford University Press.

Manktelow, K. I. (2004). Reasoning and rationality: The
pure and the practical. In K. I. Manktelow & M. C. Chung
(Eds.), Psychology of reasoning: Theoretical and historical
perspectives (pp. 157-177). Hove, England: Psychology
Press.
What are some of the rational thinking skills that are
positively associated with intelligence? How about
rational thinking skills that are not associated with
intelligence?

Various probabilistic reasoning tasks have moderate
correlations with intelligence. However, myside bias (the
tendency to view evidence from one’s own side) is pretty
much independent of intelligence in university samples.
There are many, many domains of rational thinking
measures and they each have important characteristics that
will impact whether they are associated with intelligence.
Stanovich’s Yale book contains a theoretical explanation of
why some rational thinking tasks correlate with intelligence
and others do not:

Stanovich, K. E. (2009). What intelligence tests miss: The
psychology of rational thought. New Haven, CT: Yale
University Press.

In a TV interview you (Toplak) mentioned the need for
RQ testing. Do you think we can expect to see RQ
testing within the public domain in the near future?

Yes, this would be a great thing, but it is not likely to
happen in the near future. The development of such an
instrument would be a logistically daunting task, partly
because rational thinking is such a big construct with so
many parts. We use the term “multifarious” to describe this,
and a metaphor we use is that it is like going to your family
doctor for a check-up: there is not one test that will tell you
that your health is good, rather the doctor checks multiple
things to make this assessment.
The purpose of our work, and many of our recent
publications, has been to speed the development of an RQ
test along. We have done this by showing that there is no
impediment, theoretically, to designing such a measure.
The tasks that would be on such a measure have been
introduced into the recent literature. In several recent
publications we have been working on bringing them
together into a coherent structure. Of course there are
many, many, more steps that are needed before one has an
actual standardized test. Standardization samples would
need to be run and items would need to be piloted. In terms
of the corporations that produce mental tests, it’s an
endeavor that, if one were to measure it in dollars, would
be millions of dollars.

Again, the purpose of some of our recent work has been to
sketch out what such an endeavor would look like, to show
that there is no theoretical or empirical impediment to such
a thing, and to recruit others into this endeavor of working
on such an instrument. We would like to include others in
this endeavor, because we believe that it is way beyond the
capabilities of a single laboratory. Our hope is that such an
instrument might someday stand in parallel to the
intelligence tests. This has been one of the motivations in
our recent books and chapters, such as the following:

Stanovich, K. E., West, R. F., & Toplak, M. E. (in press).
Intelligence and rationality. In R. J. Sternberg & S. B.
Kaufman (Eds.), Cambridge handbook of intelligence (3rd
Edition). Cambridge, UK: Cambridge University Press.

We need to emphasize, however, that there is no reason for
this to be an all or nothing, rather than an incremental,
process. There clearly would be immediate practical uses of
less all-encompassing instruments that focused on
important components of rational thinking (e.g., economic
thinking, probabilistic thinking, scientific thinking, reduced
myside biased thinking).
Is rationality more important than intelligence?

No, we would never make such a blanket statement. We
would only say that the magnitude of its importance at least
approaches that of intelligence. Differences in rational
thought have real world consequences that cash out in
terms of important outcomes in people’s lives. We don’t
want to get into a contest of which is more important. We
acknowledge that intelligence, as assessed by standardized
tests, is one of the most important psychological constructs
that has ever been discovered. But outlining the nature of
rational thought, how to theoretically conceive it, and how
to measure it empirically, is certainly up there with
intelligence in terms of the most important five or six
mental constructs that psychologists have investigated.

Can a person be highly rational, but rank low in
intelligence?

Yes. This was addressed in our response to question
number 1: the whole point of our research showing that
the correlation between the two is not excessively high is
that you can have discrepancies, and that one can be high
on one and low on the other.

Tell our readers how they can improve their rational
thinking skills.

A good first start is education, which readers have already
started here by reading this blog entry. Having an
understanding of how cognitive scientists have expanded
what is meant by rationality is important, namely that
rationality is about two critical things: What is true and
what to do.

There are numerous books that deal with rational thinking.
Some of the chapters and books in our own research lab
have contributed to this, and we will list them at the bottom
of this entry.

Do you think a good starting point would be becoming
educated on basic logic?

Basic logic would be part of a rational thinking skills
curriculum, but not necessarily the first part. Again, rational
thinking in cognitive science encompasses decision theory,
epistemic rationality, and many areas beyond simply the
study of basic logic in Philosophy 101. It is very important
to understand that rational thinking in cognitive science is
rooted in good decision-making. Good decision-making
skills and good skills of knowledge acquisition do have
logical thinking as one subcomponent. But there are many
subskills that are even more important than logic: the
subskills of scientific thinking, statistical thinking, and
probabilistic reasoning, for example. Many of these are
listed in the books that we will recommend here.

Baron, J. (2008). Thinking and deciding (Fourth Edition).
Cambridge, MA: Cambridge University Press.

Hastie, R., & Dawes, R. M. (2001). Rational choice in an
uncertain world. Thousand Oaks, CA: Sage. (a new 2010
edition is just out)

A recent chapter of ours contains a large number of
citations to successful attempts to teach the skills of
rational thought:

Toplak, M. E., West, R. F., & Stanovich, K. E. (2011).
Education for rational thought. In M. J. Lawson & J. R.
Kirby (Eds.), The quality of learning. New York:
Cambridge University Press.
Is there a particular book that you recommend for
members of the lay public interested in increasing their
rationality?

Yes, some of the books that we have already mentioned.
We will be so immodest as to recommend a small textbook
of our own.

Stanovich, K. E. (2010). Decision making and rationality in
the modern world. New York: Oxford University Press.
Weblinks with bios and further information:
http://web.mac.com/kstanovich/iWeb/Site/Home.html
http://www.yorku.ca/mtoplak/
http://web.me.com/westrf1/Site_2/Welcome.html
Video: Stanovich Grawemeyer Lecture (third link from the
top of the page)
http://web.mac.com/kstanovich/iWeb/Site/Audio_Visual.html
Rationality Quotient
Up until the publication of The Rationality Quotient,
components of rational thinking had been tested using
various tasks, but a comprehensive test was not available. I
first discussed the development of such a test with
Stanovich in 2013. The interview featured in this article is
my most recent interview with Stanovich; it was conducted
in 2016.
What are some of the initial reactions, regarding the
RQ, from academics?
Uniformly positive so far, and I believe that is because we
were careful in the book to be explicit about two things.
First, we were clear about what our goals were and the
goals were circumscribed. Secondly, we included an entire
chapter contextualizing our test (the Comprehensive
Assessment of Rational Thinking, CART) and discussing
caveats regarding its use as a research instrument or
otherwise. In fact, I think we have already entirely achieved
our aims. We have a prototype test that is a pretty
comprehensive measure of the rational thinking construct
and that is grounded in extant work in cognitive science.
Now, this is not to deny that there is still much work to be
done in turning the CART into a standardized instrument
that could be used for practical purposes. But of course a
finished test was not our goal in this book. Our goal was to
show a demonstration of concept, and we have done that.
We have definitively shown that a comprehensive test of
rational thinking was possible given existing work in
cognitive science. This is something that I have claimed in
previous books but had not empirically demonstrated with
the comprehensiveness that we have here by introducing
the CART. As I said, there are more steps left in turning the
CART into an “in the box” standardized measure, but that
is a larger goal than we had for this book.
I think that, at least so far, most academics have understood
our goals and the feedback has been good. We wrote a
summary article on the CART in a 2016 issue of the journal
Educational Psychologist (51, 23-34) and the feedback
from that community has been good.

Are there components of the RQ that can be expected to
show a strong positive correlation with intelligence?

The CART has 20 subtests and four thinking dispositions
scales (the latter are not part of the total score). Collectively
they tap both instrumental rationality and epistemic
rationality. In cognitive science, instrumental rationality
means behaving in the world so that you get exactly what
you most want, given the resources (physical and mental)
available to you. Epistemic rationality concerns how well
beliefs map onto the actual structure of the world. The two
types of rationality are related. In order to take actions that
fulfill our goals, we need to base those actions on beliefs
that are properly calibrated to the world.

The CART assesses epistemic thinking errors such as: the
tendency to show incoherent probability assessments; the
tendency toward overconfidence in knowledge judgments;
the tendency to ignore base rates; the tendency not to seek
falsification of hypotheses; the tendency to try to explain
chance events; the tendency to evaluate evidence with a
myside bias; and the tendency to ignore the alternative
hypothesis. Additionally, CART assesses instrumental
thinking errors such as: the inability to display disjunctive
reasoning in decision making; the tendency to show
inconsistent preferences because of framing effects; the
tendency to substitute affect for difficult evaluations; the
tendency to over-weight short-term rewards at the expense
of long-term well-being; the tendency to have choices
affected by vivid stimuli; and the tendency for decisions to
be affected by irrelevant context.

Importantly, the test also taps what we call contaminated
mindware. This category of thinking problem arises
because suboptimal thinking is potentially caused by two
different types of mindware problems. Missing mindware,
or mindware gaps, reflect the most common type—where
Type 2 processing does not have access to adequately
compiled declarative knowledge from which to synthesize
a normative response to use in the override of Type 1
processing. However, in the book, we discuss how not all
mindware is helpful or useful in fostering rationality.
Indeed, the presence of certain kinds of mindware is often
precisely the problem. We coined the category label
contaminated mindware for the presence of declarative
knowledge bases that foster irrational rather than rational
thinking. Four of the 20 subtests assess contaminated
mindware.

My purpose in digressing here to describe the CART is to
point out that given the number and complexity of rational
thinking skills, it is likely that the subtests will have
correlations with intelligence that are quite variable. The
four subtests with the highest correlations are: the
Probabilistic Reasoning Subtest; the Scientific Reasoning
Subtest; the Reflection Versus Intuition Subtest; and the
Financial Literacy Subtest. Correlations with these subtests
tend to be .50 or higher. Most of the subtests of the CART
correlate with intelligence in the range of .25 to .50 (a few
have even lower correlations). Some very important
components of rational thinking do show considerable
dissociation from intelligence. Overconfidence (measured
by the Knowledge Calibration Subtest of the CART) shows
only a .38 correlation with intelligence. This represents a
substantial amount of dissociation for a key component of
rational thinking. Kahneman, for example, devoted
substantial portions of his best-selling book to this
component of rational thinking. Myside bias (measured by
our Argument Evaluation Subtest) likewise shows a
correlation of .38, indicating a substantial dissociation.
This thinking bias is at the center of many discussions of
what it means to be rational. Some of the subtests that
most directly measure the components of the axiomatic
approach to utility maximization show relatively mild
correlations with intelligence. For example, the Framing
Subtest shows a fairly low .28 correlation. Framing
measures a foundational aspect of rational thinking
according to the axiomatic approach.
Finally, some subtests of immense practical importance

show very low correlations with intelligence in the CART.
The skill of assessing numerical expected value shows a
correlation of only .21, and the ability to delay for greater
monetary reward shows a correlation of only .06. The
tendency to believe in conspiracies shows a modest
correlation of .34.
Do you think rationality will acquire the same high level

status as intelligence in the near future?
Not in the near future, no. Our goal with the book was
more modest—to simply raise awareness of the importance
of rational thinking and the ability of modern cognitive
psychology to measure it. The result of our efforts will, we
hope, redress the imbalance between our tendency to value
intelligence versus rationality. In our society, what gets
measured gets valued. Our aim in developing the CART
was to draw attention to the skills of rational thought by
measuring them systematically. In the book, we are careful
to point out that we operationalized the construct of rational
thinking without making reference to any other construct in
psychology, most notably intelligence. Thus, we are not
trying to make a better intelligence test. Nor are we trying
to make a test with incremental validity over and above IQ
tests. Instead, we are trying to show how one would go
about measuring rational thinking as a psychological
construct in its own right. We wish to accentuate the
importance of a domain of thinking that has been obscured
because of the prominence of intelligence tests and their
proxies. It is long overdue that we had more systematic
ways of measuring these components of cognition, that are
important in their own right, but that are missing from IQ
tests. Rational thinking has a unique history grounded in
philosophy and psychology, and several of its
subcomponents are firmly identified with well-studied
paradigms. The story we tell in the book is of how we have
turned this literature into the first comprehensive device for
the assessment of rational thinking (the CART).
Why does society need a comprehensive assessment of

rational thinking?
To be globally rational in our modern society you must

have the behavioral tendencies and knowledge bases that
are assessed on the CART to a sufficient degree. Our
society is sometimes benign, and maximal rationality is not
always necessary, but sometimes, in important situations,
our society is hostile. In such hostile situations, to achieve
adequate degrees of instrumental rationality in our present
society the skills assessed by the CART are essential. In
Chapter 15 of The Rationality Quotient we include a table
showing that rational thinking tendencies are linked to real
life decision making. In that table, for each of the
paradigms and subtests of the CART, an association with a
real-life outcome is indicated. The associations are of two
types. Some studies represent investigations where a
laboratory measure of a bias was used as a predictor of a
real-world outcome. Others are reports of real-world
analogues of biases that were originally discovered in the
lab. Clearly more work remains to be done on tracing the
exact nature of the connections—that is, whether they are
causal. The sheer number of real-world connections,
however, serves to highlight the importance of the rational
thinking skills in our framework. Now that we have the
CART, we could, in theory, begin to assess rationality as
systematically as we do IQ. If not for professional inertia
and psychologists’ investment in the IQ concept, we could
choose tomorrow to more formally assess rational thinking
skills, focus more on teaching them, and redesign our
environment so that irrational thinking is not so costly.
Whereas just thirty years ago we knew vastly more about
intelligence than we knew about rational thinking, this
imbalance has been redressed in the last few decades
because of some remarkable work in behavioral decision
theory, cognitive science, and related areas of psychology.
In the past two decades cognitive scientists have developed
laboratory tasks and real-life performance indicators to
measure rational thinking tendencies such as sensible goal
prioritization, reflectivity, and the proper calibration of
evidence. People have been found to differ from each
other on these indicators. These indicators are structured
differently from the items used on intelligence tests. We
have brought this work together by producing here the first
comprehensive assessment measure for rational thinking,
the CART.
Chapter 3
FAQ: Research Methods and Statistics
In this chapter answers are provided to a

wide range of questions about research methods and
statistics. The information provided here ranges from basic
to complex. Most of the questions have been
asked by university students taking research methods and
statistics courses. There is some overlap in the information
provided. Some of the answers are short, while others are
long. The questions are not presented in any specific order.
Most of the information in this chapter is derived from the
applications of research methods and statistics as they are
used in medical, health, social, behavioral, and cognitive
sciences. Other domains of science may use and define
some aspects of research methods and statistics differently
than the ones used here.
When using frequency distribution tables, when is it
appropriate to use grouped frequency distribution tables?
A frequency distribution table presents all of the individual

scores in the distribution. Disorganized scores are placed
in order from lowest to highest, grouping together
individuals who have the same score. The frequency
distribution table allows a quick look at the entire range of
scores. The frequency distribution also allows you to see
the location of a single score relative to the other scores.
When there is a large range of scores it is recommended

that a grouped frequency table be used. Keep in mind that a
key purpose for constructing a frequency table is to present
a relatively simple, organized picture of the entire range of
scores. However, when the scores cover a large range, a
regular frequency table is impractical: it is time consuming
to build and not simple to read. Presenting the scores in a
relatively simple, organized manner requires a grouped frequency
distribution table. When using the grouped frequency
distribution table, groups of scores are presented rather than
individual scores. The groups are called class intervals.
Intervals are often presented when individual scores aren’t
as important as the range of scores, such as when teachers
check to see how many students received As, Bs, Cs, etc.
on an exam.
Are there any specific guidelines for designing grouped

frequency distribution tables?
The table should have approximately 10 class intervals. If

a table has many more than 10 intervals, it becomes
burdensome and defeats the purpose of the table. To
reiterate, the purpose of a frequency distribution is to help
the researcher get a quick view of the data. If there are too
few or too many intervals, the table might not reflect a clear
picture. The table should be relatively easy to read and
understand.
The width of each interval should be a relatively simple

number. As an example, it is easy to count 2s, 5s, and 10s.
These numbers are very easy for most people to
understand, and allow individuals to understand how you
have divided the entire range of scores.
The bottom score in each interval should be a multiple of the width. As an

example, when using a width of 2 points, the intervals
should start with 2, 4, 6, 8 and so on.
All intervals should be the same width, with no gaps or

overlaps.
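The guidelines above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the exam scores, the function name grouped_frequency_table, and the width of 10 are hypothetical, and the sketch assumes whole-number scores.

```python
from collections import Counter

def grouped_frequency_table(scores, width):
    """Return (lower, upper, frequency) rows for class intervals of equal width.

    The bottom score of every interval is a multiple of `width`, and the
    intervals cover the whole range with no gaps or overlaps.
    Assumes whole-number scores.
    """
    # Lower limit of the interval that holds each score (a multiple of width).
    counts = Counter((score // width) * width for score in scores)
    lowest = (min(scores) // width) * width
    highest = (max(scores) // width) * width
    rows = []
    for lower in range(lowest, highest + width, width):
        rows.append((lower, lower + width - 1, counts.get(lower, 0)))
    return rows

# Hypothetical exam scores.
exam_scores = [52, 58, 61, 64, 67, 70, 71, 73, 75, 78, 80, 82, 85, 88, 91, 94, 97]
table = grouped_frequency_table(exam_scores, width=10)

for lower, upper, freq in table:  # lowest interval first
    print(f"{lower}-{upper}: {freq}")
```

With a width of 10 these scores fall into five class intervals (50-59 through 90-99). A narrower width would produce more intervals, so the width is usually chosen to land near the recommended 10 intervals.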
By randomly assigning participants to a group,

researchers try to make the groups as similar as
possible, but wouldn’t it make more sense for the
researcher to split them? What if one group ended up
with all the knowledge/skills pertaining to the
experiment and the other group didn’t? Wouldn’t this
confound (confuse) the research findings?
The researcher does split them. They are placed in groups

according to random assignment (equal chance of being in
either group). This is the logic of an experiment. It needs
to start with equal groups (equivalent groups, no systematic
differences).
If there is a large difference between them to start, then we

can say we have started with a nonequivalent control
group, which is a threat to internal validity. Researchers
attempt to balance group differences with random
assignment.
Research findings are often weak when comparing

nonequivalent groups. "[R]andom assignment is typically
considered sufficient to address the potential problem of a
nonequivalent control group." (Jackson, 2009, p.208)
Random assignment is required in order for an experiment
to be considered a true experiment.
Random assignment helps to avoid a selection effect, which

can occur when participants choose which group to be in or
when the researchers assign one type of person (based on
any particular characteristic) to one condition and another
type to the other. Allowing the researchers to divide the
groups themselves opens the door to a selection effect,
even if it is accidental. With random assignment a
participant in one condition or group could have easily
been in the other group. With random assignment everyone
has an equal chance of being in any condition or level of
the study.
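As a rough sketch of what random assignment can look like in practice, the Python below shuffles a list of participant numbers and deals them out to conditions. The function name randomly_assign, the participant numbers, and the condition labels are hypothetical; the seed is there only so the example can be repeated.

```python
import random

def randomly_assign(participant_ids, conditions, seed=None):
    """Shuffle participants, then deal them out to conditions in turn.

    Every participant has an equal chance of landing in any condition,
    and group sizes differ by at most one.
    """
    rng = random.Random(seed)  # the seed only makes the example repeatable
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for index, participant in enumerate(shuffled):
        groups[conditions[index % len(conditions)]].append(participant)
    return groups

participants = list(range(1, 61))  # hypothetical participant numbers 1-60
groups = randomly_assign(participants, ["treatment", "control"], seed=42)
print(len(groups["treatment"]), len(groups["control"]))
```

Because neither the participants nor the researcher chooses the groups, no characteristic of a participant can systematically decide which condition he or she ends up in.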
Is a discussion of internal validity irrelevant when

talking about non-experimental research?
Internal validity is only relevant to experimental research,

according to some researchers, as this is the only type of
research method that allows us to determine cause and
effect.
Internal Validity: An experiment is internally valid if we

can say the outcome is due to the treatment (the
manipulation of the IV) and not due to an unwanted factor
that differed between conditions. That is, an experiment is
internally valid if we can say with a high degree of
confidence that we properly determined causation, and the
experiment is not confounded (flawed, confused). There are
numerous threats to internal validity; some of the most
common include: nonequivalent control group, history
effect, maturation effect, testing effect, regression to the
mean, instrumentation effect, mortality or attrition,
diffusion of treatment, experimenter and participant effects,
and floor and ceiling effects.
Sometimes when discussing frequency and association

claims, matters of internal validity are mentioned.
However, there is little need to discuss internal validity
with methods that do not allow determination of cause and
effect. There are simply too many threats to internal
validity to be controlled with non-experimental studies.
Some researchers suggest that quasi-experimental design, if

done correctly, can demonstrate causality. “I think a quasi-
experimental design can demonstrate causality if it is done
well, and potential confounds are held constant during
analysis (i.e. identifying and controlling for covariates).”
(Gore, 2013, commentary while editing Chapter Three,
FAQ: Research Methods and Statistics)
Complex correlational statistics such as multiple

regression, partial correlation and path analysis can be used
in making causal inference (to some extent; it varies).
Is there any situation where a confounded experiment

could still be considered valid?
If a study is confounded it is confused. That is, the results
may have possible alternative explanations. If it lacks
internal validity we cannot rule out other explanations, so
we cannot say the cause and effect was properly
determined.
Consider the following:
Some have concluded that low carbohydrate diets are better

for weight loss than high carbohydrate diets. This finding is
confounded because calorie levels were not controlled
(an uncontrolled extraneous variable that systematically varies
with the IV, diet; levels: low carbohydrate and high
carbohydrate). Thus, the finding that a low carbohydrate
diet, per se, is superior in terms of weight loss to a high
carbohydrate diet cannot be validated.
There will always be possible confounds when

setting up an experiment. Internal validity tells us the
degree of certainty (we never reach absolute certainty) we have
that the outcome is due to manipulation of the independent
variable rather than a flaw in the experiment. It is
important to hold conditions constant, with the only
difference between groups being the level of the
independent variable that participants are exposed to.
When considering internal validity researchers are trying to
ensure that the independent variable is the cause of the
dependent variable (effect, outcome).
How can one distinguish causal claims from association

claims?
Only experiments can make true causal claims (some

suggest, to reiterate). An association claim notes a
relationship or correlation between two variables, whereas
a causal claim suggests that a variable(s) is causing a
change in another variable. Morling (2012, p.64, Table 3.2)
provides verb phrases that can help you distinguish
between association and causal claims. Below is a list of
some of the verbs provided by Morling (2012).
Association claim verbs
Is linked to
Goes with
Predicts
Is tied to
Is at risk for
Causal claim verbs
Promotes
Causes
Leads to
Helps
Increases
It is important to point out that journalists often promote

association claims as if they were causation claims. Causal
claims attempt to describe, predict and explain phenomena.
Association claims attempt to describe and predict
phenomena.
Causal claims are a step above association claims. They

make a stronger statement, thus they are held to higher
standards. In order to establish causation, three criteria must be
met. First, covariation (correlation, relationship) must be
established. Second, the causal variable must precede the
effect (temporal precedence). Third, alternative
explanations for the relationship must be eliminated
(internal validity).
How does a quasi-independent variable differ from a

true independent variable?
A true independent variable occurs when participants are

randomly assigned to different groups (levels, conditions).
For example, suppose you're doing a study in which you
want to find out if playing violent video games makes
people more aggressive. So, 60 people show up for the
study and you flip a coin, or use a random number
generator in advance to determine which participant
number will be assigned to each group. Any method that
allows you to assign randomly will generally work.
In one group participants are told to play video games and

you give them violent video games to play. In the other
group you tell them to play video games and the games are
non-violent. Afterward you ask them to fill out an
"aggressiveness" scale. This is an example of a true
independent variable. The independent variable was
manipulated (different levels were created and participants
were randomly assigned to these levels). The experimenter
has a lot of control over the situation and if the group
playing the violent video games scores higher on the
aggressiveness scale, it is likely that playing the violent
games caused this. The only difference between these
groups is exposure to different levels of the IV.
If, on the other hand, you ask people who play video games
to participate in a study, 60 people show up, and you then
ask them what kind of video games they play (violent
or non-violent), putting those who say violent into
one group and those who say non-violent into the
other, this is not really a true independent variable. It's a
quasi-independent variable (this is not a great quasi-
independent design). In a sense, your participants put
themselves into the groups. They were already formed
groups. A specific characteristic was required to be a
member of either group. If the subjects who played the
violent video games scored higher on the aggressiveness
scale it could be because they had just played violent video
games, or it could be that they are more aggressive people
in the first place (third variable problem, could be an
alternative explanation). Another example of a quasi-
experiment is seen when investigating the effects of a new
reading program. One school is exposed to the new
program while another school is not. At the end of the year
the schools are tested, and compared on reading
comprehension. Participants were assigned to either of the
groups depending on the school they attend. Thus, a
characteristic placed them in either of the groups. To
reiterate, with random assignment participants have an equal
chance of being in either group.
If participants are not randomly assigned (if they do not

have an equal chance of being in either group) then we
cannot say we have a true independent variable. If there
are any criteria required to be in any of the groups then the
IV is not a true IV. IVs used in non-experimental research
are quasi IVs, even though they are often referred to as IVs.
Is an operational definition the effort in which

something is done or the amount of the work done in a
task?
An operational definition reflects the precise, observable,
operations used to measure or manipulate a variable.
Juxtapose this with a conceptual definition, which is more
abstract, and does not provide specified operations
concerning manipulation or measurement. Operational
definitions allow others to replicate studies, by using the
same operations used in the study they are attempting to
replicate. Operational definitions also allow studies to be
critiqued.
Operationism (using operational definitions) removes the

concept from the feelings and intuitions of an individual
and allows it to be tested by anyone with the resources to
carry out the measures (Stanovich, 2007).
As an example, consider the following:
Identify the DV and how you would measure it

(operationalized)? Researchers studying human memory
presented people with two lists of words. One list included
the names of objects; the other list contained abstract
nouns. The researchers found that people could remember
more words from the list with object names.
DV = memory
Operationalization of DV = A test was administered after
participants were exposed to either of the lists of words.
The scores were compared between groups.
Another way to think of an operational definition is given

by Mitchell & Jolley (2010, p.627): "A publicly observable
way to measure or manipulate a variable; a "recipe" for
how you are going to measure or manipulate your factors."
When considering the operationalization of the manipulated

variable it is important that all levels of the IV
(manipulated variable) are shown. It is also important to
show how they are created. This allows researchers to
investigate construct validity, which asks (regarding the
IV): How well was the IV manipulated?
What is the specific purpose of an IRB?
The IRB (Institutional Review Board) ensures that all

ethical guidelines are met before giving the OK to conduct
the study. In many institutional
settings, an IRB proposal and approval are required before
the study commences. An effective IRB prevents
researchers from conducting studies that may violate
people's rights, or research that poses unreasonable risk.
Ideally the IRB attempts to balance the welfare of research

participants against the researcher's goal of contributing
important knowledge to the field (Morling, 2012).
How does arbitrary assignment differ from random

assignment? How does the outcome of arbitrary
assignment affect the results of the experiment?
Arbitrary (arbitrary rule established) assignment = reason

for being in group
Random assignment= no reason, by chance / or random

occurrence (equal chance of being in either group).
With an arbitrary rule such as "arrival time" (as an

example: first 10 that sign up get assigned to the control
group, while the second 10 are assigned to the experimental
group) there may be a difference between participants who
show up earlier vs. those showing up later. As an example,
those showing up early may be highly motivated, and
generally high achievers, while those showing up later may
not be so motivated or concerned with high levels of
achievement. This may be a threat to internal validity, and
result in non-equivalent groups.
So, with arbitrary assignment they belong to one of the

groups due to a specific reason, while with random
assignment there is no reason why they belong to a group.
Random assignment helps ensure balancing between
groups, and is required for a true experiment.
What is construct validity?
Construct validity- an assessment of how well a variable

was measured or manipulated in a study. That is, construct
validity reflects the degree to which the researcher(s)
manipulated (IV) or measured (DV) what they intended to.
Notice when the word validity is used alone (without the

construct appendage) it is generally defined similarly to
construct validity with regard to the measured variable.
Consider the definition of validity:
“Validity- A measure of the truthfulness of a measuring

instrument. It indicates whether the instrument measures
what it claims to measure.” (Jackson, 2009, p.424).
Why is a control variable called a variable if it doesn’t

vary? A variable is defined as something (trait or
characteristic) that can have one or more levels (values).
A control variable is any variable that is held constant by

the researcher. However, the term control variable is a little
confusing. It is confusing because control variables are not
really variables (in the context of the study) - because they
do not vary.
No need to worry about semantics, just keep in mind when

discussing research methods a control variable is any
variable that an experimenter holds constant on purpose.
Control variables are important for establishing internal
validity (the degree to which the results of an experiment
can be attributed to manipulation of the IV rather than some
confound).
The point of a control variable being called a variable is that it

has the possibility to vary, but the researcher chooses to
keep it constant throughout the study. It has potentially
varying levels, but the levels are kept constant throughout
the study to establish internal validity.
Sometimes it is hard to distinguish between the

dependent and independent variables. Is there a
definite way to distinguish between the two?
Think of the IV as the manipulated variable. Participants

are assigned to receive different levels (values) of the IV.
Only experiments make use of true IVs. The DV is the
measured variable (outcome, effect). The IV is what causes
the DV to change, given the relationship between the two.
For example, a study investigating the effects of caffeine on
sustained attention showed that participants receiving
caffeine had better attention than those receiving placebo.
In this example, the IV is presence or absence of caffeine
(levels/conditions - caffeine, placebo) and the DV is
sustained attention.
Cause: IV
Effect: DV
The dependent variable depends on the independent

variable. With the independent variable researchers have
some degree of independence as they assign participants to
different groups (conditions, levels).
What is the definitional formula for SS?
SS (sum of squares) is the sum of squared deviation scores.

The proper sequence to find SS using the definitional
formula is:
Find each deviation score
Square each deviation score
Add (sum up) the squared deviations
You need the mean score. Then you subtract the mean from
each individual score. Then square each difference, and then
add the squared deviations. The SS is very important
regarding measures of dispersion (width, variability). The
SS is needed in order to calculate the variance.
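The three steps of the definitional formula can be sketched in Python as follows. The scores and the function name sum_of_squares are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def sum_of_squares(scores):
    """Definitional formula: SS = sum of (score - mean) squared."""
    mean = sum(scores) / len(scores)
    # Step 1: find each deviation score.
    deviations = [score - mean for score in scores]
    # Step 2: square each deviation score.
    squared = [deviation ** 2 for deviation in deviations]
    # Step 3: add the squared deviations.
    return sum(squared)

scores = [1, 3, 4, 8]  # hypothetical scores; the mean is 4
ss = sum_of_squares(scores)
print(ss)  # 26.0
```

Here the deviations are -3, -1, 0 and 4, the squared deviations are 9, 1, 0 and 16, and their sum is 26.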
What is the distinction between a parameter and a

statistic?
Parameter- a value, usually numerical, that describes a

characteristic of the population. Populations yield different
parameters, depending on the characteristic of interest.
Populations yield parameters, and samples yield statistics.

Parameters and statistics are numerical measures that
represent characteristics of populations and samples. A
parameter is generally derived from measurements of
individuals in a population (entire group- people, non-
human animals or objects- a researcher is interested in). A
statistic is generally derived from measurements of
individuals in a sample (participants/ subjects in a study
used to represent population of interest).
What is the purpose of the correction factor when

calculating sample variance?
Samples are consistently less variable than populations.

That is, scores are not spread out as much with samples. A
correction factor (adjustment) is used to correct for this bias
in sample variability. The effect of the correction factor is
to increase the number you obtain. This can be done by
dividing SS by a smaller number, n - 1 (where n is the number of
participants in the study), instead of just n. Dividing a number by a
smaller number produces a larger number, and makes
sample variance a more accurate estimator of population
variance. Remember, we want to be able to make
inferences from the sample to the population (answer
general questions and come to general conclusions about
the population).
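The effect of the correction can be seen in a short Python sketch. The sample values are hypothetical; the calls to statistics.pvariance (divides by n) and statistics.variance (divides by n - 1) come from Python's standard library and are included only as a cross-check.

```python
import statistics

sample = [1, 3, 4, 8]  # hypothetical sample; SS = 26
n = len(sample)
mean = sum(sample) / n
ss = sum((x - mean) ** 2 for x in sample)

uncorrected = ss / n       # divides by n: tends to underestimate
corrected = ss / (n - 1)   # divides by n - 1: the corrected sample variance

print(uncorrected, corrected)
# Cross-check against the standard library.
print(statistics.pvariance(sample), statistics.variance(sample))
```

Dividing 26 by 3 rather than by 4 gives the larger value, which is the adjustment described above.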
What types of graphs are used to illustrate the results of

an experiment?
Bar graphs are preferred or acceptable when comparing

groups. Remember with an experiment two or more groups
are being compared. Bar graphs are standard when
presenting results of experimental research. Consider the
following when using graphs to represent frequency
distributions:
When data consist of numerical scores measured on a
continuous scale, polygons or histograms are generally
used. When scores are measured on a nominal or ordinal
scale, bar graphs are used. Also, error bars are often
included to show the variability around each group mean,
which helps readers judge the difference between groups.
How do descriptive statistics describe a distribution?
Descriptive statistics are numerical measures that describe

a distribution by providing information on the central
tendency of the distribution, the width of distribution
(variability or dispersion), and the shape of the distribution.
Descriptive statistics describe, organize, and summarize
information.
Understanding descriptive statistics will make learning how

to use inferential statistics much easier. Inferential
statistics often use descriptive statistics when making
calculations.
What is implied by effect size?
Effect size is a measurement intended to reflect the

magnitude of the effect, or simply, the size of the difference
between means. Cohen's d, a measure of effect size,
measures the size of the effect in terms of standard
deviation. For example, a value of 1.00 indicates that the
effect size is equal to one standard deviation.
Values of d: .20 small, .50 medium, .80 Large, 1.10 Very

Large, 1.40 Extremely Large
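One common way to compute Cohen's d for two groups is to divide the difference between the means by the pooled standard deviation. The sketch below assumes that form; the scores and function names are hypothetical, and treating the values listed above as lower cutoffs for each label is also an assumption made for illustration.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Difference between means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def label(d):
    """Verbal label, treating the listed values of d as lower cutoffs."""
    size = abs(d)
    cutoffs = [(1.40, "extremely large"), (1.10, "very large"),
               (0.80, "large"), (0.50, "medium"), (0.20, "small")]
    for cutoff, name in cutoffs:
        if size >= cutoff:
            return name
    return "below small"

treatment = [6, 7, 8, 9, 10]  # hypothetical scores
control = [5, 6, 7, 8, 9]
d = cohens_d(treatment, control)
print(round(d, 2), label(d))  # 0.63 medium
```

In this example the means differ by 1 point and the pooled standard deviation is about 1.58, so the groups differ by roughly 0.63 standard deviations.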
When considering statistical validity it is important to
consider effect size. When interpreting effect size consider:
a small effect size might represent an important result, and
a large effect size might represent an unimportant
result.
When can a small effect size represent an important

result? Suppose researchers haven’t been able to find a d of
.20 when examining various treatments for a
deadly disease. If a subsequent researcher discovers a
treatment that has a d of .20 this might be considered an
important finding (Patten, 2004).
When can a large or extremely large effect size be

considered an unimportant result? A large effect size may
be of limited importance when results lack practical
significance in terms of cost, public and political
acceptability, and legal and ethical concerns. Refer to
Patten (2004, Topic 53) for a detailed discussion regarding
effect size and practical significance.
What is alpha level?
Alpha level refers to the probability threshold at which researchers declare

statistical significance. It is standard to set the
alpha level at .05. A much stricter alpha level of
.01 is sometimes used.
The alpha level (level of significance) is a probability

value that is used to define the concept of “very unlikely.”
Very unlikely means the test statistic falls in the
tail of the distribution.
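To make "very unlikely" concrete, the sketch below finds the cutoff values that mark off the tails for the two alpha levels mentioned above. It assumes a two-tailed test and a test statistic that follows the standard normal distribution (a z-test); the function name is hypothetical.

```python
from statistics import NormalDist

def two_tailed_critical_z(alpha):
    """z value that leaves alpha / 2 in each tail of the standard normal."""
    return NormalDist().inv_cdf(1 - alpha / 2)

for alpha in (0.05, 0.01):
    z = two_tailed_critical_z(alpha)
    print(f"alpha = {alpha}: very unlikely means |z| > {z:.2f}")
```

Under these assumptions the cutoffs are about 1.96 for an alpha level of .05 and about 2.58 for .01, so the stricter alpha level pushes the cutoff farther into the tails.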
How well does the sample represent a given population?
Samples are variable: they have different individuals and

different scores, and vary on other characteristics. Samples
from the same population will not have the exact same
characteristics. Generally, the larger the sample the closer
we approach an approximation of the population.
Increasing sample size increases the precision of the
research. When referring to highly precise results, we are
saying that the results vary by only a small amount from
sample to sample, which is what happens if each sample is
large enough. Always recognize that samples sometimes
show large variation, and that variation in samples differs
from study to study, even when using a sample in a direct
replication or a sample expected to have very similar
characteristics. Determining how well, or to what extent, a
sample represents a population is relatively subjective.
Sometimes small effect sizes are considered important,

while at the same time in some contexts large effect sizes
are not considered important. Why is this?
Suppose researchers haven't been able to find an effective

treatment (not even small effect size) for a particular
medical disorder. Suppose a researcher finds a treatment
that is significant, even though the effect size is small. This
small decrease in feeling ill, or treating this ailment, is very
important for the individuals suffering from the disorder.
Finding a large effect size may be of little importance when

the results lack practical significance in terms of cost,
public and political acceptability, ethical and legal
concerns. Statistical significance tells us nothing about
practical significance.
What is the difference between standard deviation and
standard error?
There is a small difference in equations for the standard

deviation and the standard error. However, this small
difference changes the meaning of what is being reported.
Basically, standard error is an estimate of how close your
sample mean is likely to be to the population mean.
Standard deviation is the degree to which individuals
within the sample differ from the sample mean. Standard
error should decrease with larger sample sizes, as the
estimate of the population mean improves. Standard
deviation will not be affected systematically by sample
size.
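As a rough illustration of the difference, the short Python sketch below computes both values for two made-up samples. The scores and variable names are hypothetical, and the sample standard deviation is used as a stand-in for the population standard deviation, which is an assumption of the example.

```python
import math
import statistics

def standard_error(scores):
    """Estimate the standard error of the mean: sample SD / sqrt(n)."""
    return statistics.stdev(scores) / math.sqrt(len(scores))

# Hypothetical scores, made up for illustration only.
small_sample = [4, 8, 6, 5, 7]
# The same values repeated, giving a sample four times larger
# with a very similar spread.
large_sample = small_sample * 4

sd_small = statistics.stdev(small_sample)
sd_large = statistics.stdev(large_sample)
se_small = standard_error(small_sample)
se_large = standard_error(large_sample)

print(f"n = {len(small_sample):2d}: SD = {sd_small:.2f}, SE = {se_small:.2f}")
print(f"n = {len(large_sample):2d}: SD = {sd_large:.2f}, SE = {se_large:.2f}")
```

In this sketch the standard deviation stays roughly the same for both samples, while the standard error is clearly smaller for the larger sample.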
How do we find the standard error of the mean?
To find the standard error of the mean directly, a number of
samples from the population are taken, the mean is determined
for each sample, and then the standard deviation is calculated
for this distribution of sample means. This process is
highly impractical. Fortunately, there is a way to find
the standard error of the mean without engaging in this
impractical process. The way to do this is based on the
central limit theorem. The central limit theorem provides a
precise description of the distribution that would be
obtained if every possible sample were selected, every
sample mean were calculated, and the distribution of the
sample means were constructed. “According to the central
limit theorem, to determine the standard error of the mean
(the standard deviation for the sampling distribution), we
take the standard deviation for the population… and divide
by the square root of the sample size.” (Jackson, 2009,
p.174)
Two additional key points when considering the central limit
theorem:
1- it describes the distribution of sample means for any
population
2- the distribution of sample means approaches the normal
distribution relatively fast. By the time the sample size
reaches about 30, the distribution is close to normal
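The shortcut can be checked against the impractical process with a small simulation. The sketch below is only an illustration: the population (whole numbers 1 to 10, all equally likely), the sample size of 30, and the number of repetitions are assumptions chosen for the example.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical, deliberately non-normal population.
population = list(range(1, 11))
sigma = statistics.pstdev(population)  # population standard deviation
n = 30                                 # size of each sample

# The impractical route: draw many samples, record each sample
# mean, then take the standard deviation of those means.
sample_means = [
    statistics.mean(random.choices(population, k=n))
    for _ in range(5000)
]
empirical_se = statistics.pstdev(sample_means)

# The shortcut: population SD divided by the square root of n.
theoretical_se = sigma / math.sqrt(n)

print(f"SD of 5000 sample means: {empirical_se:.3f}")
print(f"sigma / sqrt(n):         {theoretical_se:.3f}")
```

The two printed values should agree closely, even though the population itself is flat rather than bell shaped.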
Why is the critical region important for hypothesis

testing?
Extremely unlikely values, as defined by the alpha level,
make up the critical region. Extreme values are in the tails
of a distribution. That is, these outcomes are extreme
scores in a normal distribution (thus a very high or very low
percentile rank, with a low proportion of scores at that
level or beyond).
When data from a study produce a sample mean that is
located within the critical region, the null hypothesis
(which states there is no or little difference) is rejected,
meaning that there is a statistically significant difference.
This type of testing is binary null hypothesis statistical
testing (either there is a statistically significant
difference or there is not; thus binary, dichotomous). We can
say that the score occurs in the extreme, in a region that
holds a low proportion of the distribution.
For example, if the computed test statistic z exceeds the
critical value and falls in the region of rejection, the null
hypothesis is rejected; thus a statistically significant
difference occurred.
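The decision rule can be sketched in a few lines of Python. This is a minimal example of a two-tailed z test, assuming the population standard deviation is known; the function name and the numbers (population mean 100, standard deviation 15, a sample of 36 with a mean of 106) are hypothetical.

```python
import math
from statistics import NormalDist

def z_test(sample_mean, pop_mean, pop_sd, n, alpha=0.05):
    """Two-tailed z test: return (z, critical value, reject null?)."""
    standard_error = pop_sd / math.sqrt(n)
    z = (sample_mean - pop_mean) / standard_error
    # The critical value cuts off alpha / 2 in each tail
    # of the standard normal distribution.
    critical = NormalDist().inv_cdf(1 - alpha / 2)
    return z, critical, abs(z) > critical

# Hypothetical numbers, made up for illustration.
z, critical, reject = z_test(106, 100, 15, 36)
print(f"z = {z:.2f}, critical value = {critical:.2f}, reject null: {reject}")
```

Here z falls beyond the critical value, so the sample mean lies in the critical region and the null hypothesis is rejected.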
In research studies sometimes an N is used as a symbol
and other times an n is used. What do these symbols
represent?
N refers to population size (the set of all members of
interest in a particular study), and n to sample size
(subjects or participants selected from a population,
generally intended to represent the population of interest).
It is not always
possible to measure everyone in the population, so samples
are used as representations of the population. It is possible
to determine what the distribution of sample means looks
like without acquiring thousands of samples. This can be
done by way of the central limit theorem. It provides a
precise description of the distribution that would be
obtained if we were to acquire every possible sample,
calculate every sample mean, and construct the distribution
of the sample means.
What is the difference between a one-tailed and two-

tailed hypothesis?
In an experiment, a one-tailed (directional) hypothesis is an

alternative hypothesis that predicts the direction of the
expected difference between groups. As an example, it is
hypothesized that participants in the low-carb group will
lose more weight than those in the high-carb group. A two-
tailed (non-directional) hypothesis is an alternative
hypothesis that predicts that the groups being compared
will differ, but does not predict the direction of difference.
As an example, it is hypothesized that there will be a
difference in the amount of weight loss between
participants receiving a low-carb and those receiving a
high-carb diet.
What is the difference between a null and alternative

hypothesis?
Null hypothesis- there is no difference, no effect (no
statistically significant difference).
Alternative (research) hypothesis- there is a difference, an
effect (a statistically significant difference).
When the null is rejected we infer there is evidence for the
alternative. We are attempting to falsify the null, that is,
to reject the null. If the findings are not statistically
significant we fail to reject (FTR) the null.
What is Cohen’s d?
Cohen's d - an inferential statistic for measuring effect

size. Inferential statistics are procedures for drawing
conclusions about a population based on data collected
from a sample.
Effect size can be defined as the magnitude of variance in

the dependent variable that is accounted for by
manipulation of the independent variable.
Values of d
.20 small
.50 medium
.80 large
1.10 very large
1.40+ extremely large
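For readers who want to see the arithmetic, the sketch below computes d for two independent groups using the common pooled standard deviation formula (difference between means divided by the pooled SD). The data are made up, and treating the values listed above as lower cutoffs for each label is an assumption of this example.

```python
import math
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(
        ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
    )
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

def label(d):
    """Map |d| onto the labels above (cutoff reading is an assumption)."""
    size = abs(d)
    cutoffs = [(1.40, "extremely large"), (1.10, "very large"),
               (0.80, "large"), (0.50, "medium"), (0.20, "small")]
    for cutoff, name in cutoffs:
        if size >= cutoff:
            return name
    return "below small"

# Hypothetical scores, for illustration only.
treatment = [12, 14, 11, 15, 13]
control = [10, 12, 11, 13, 9]

d = cohens_d(treatment, control)
print(f"d = {d:.2f} ({label(d)})")
```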
Are there any general guidelines on how to read a

research paper?
Consider the publication source- Most of the research you

read should come from scholarly journals.
Ask yourself- Why am I reading this? What is my purpose

for reading this report? It is important to read articles that
are relevant to your interests or questions.
Read abstracts- You can get a basic overview of the paper

by reading the abstract.
Read headings and briefly look at other sections- If you

found the abstract interesting or relevant, then quickly
review the various headings.
Focus on relevant sections- Once you have looked over the

paper, you should have a general idea of the specific areas
that are relevant. Sometimes there will be a small portion
of the paper that is relevant, while other times the entire
paper is relevant.
Look at relevant references- Repeat the same process when

reviewing references.
How do we know if a device measures what it claims to?
In other words- Is the measuring device valid? Validity-

an indication of whether a device measures what it claims
to measure.
Ask 2 basic types of questions:

1- do new measures in the procedure relate to other valid
existing measures? As an example, do measures on a new
intelligence test correlate strongly with measures on
existing valid intelligence tests? A strong correlation is .7
or above in most cases. However, some researchers
suggest that a strong correlation is .8 or above.
2- another approach is the known-groups difference method:
you ask whether the scores on a test differ across people
with different levels of the trait being measured. If
measuring depression, you can compare scores on the test
for those who have depression with those not suffering
from depression. There should be differences in scores if
the measure is valid.
When is it appropriate to use an ANOVA (analysis of

variance)?
ANOVA- an inferential statistical test used for comparing

the means of two or more groups. Often, researchers use a
t test when comparing two groups, but an ANOVA can also
be used. You cannot use a t-test on more than two groups.
For any data that you analyze with a t-test you could also
analyze with an ANOVA. Why continue to use t-tests if an
ANOVA does everything a t-test does and more? Dr.
Jonathan Gore answered this question with the following
answer: “The F-statistic doesn’t indicate the direction of the
difference, whereas a t-statistic will, based on whether it is
a positive or negative. That’s really the only reason.”
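The point in the quote can be illustrated with a small calculation. With exactly two groups, the F statistic from a one-way ANOVA equals the square of the pooled-variance t statistic, so the sign (direction) carried by t is lost in F. The sketch below uses made-up weight loss scores; the function names are hypothetical.

```python
import math
import statistics

def pooled_t(a, b):
    """Independent-groups t statistic using the pooled variance."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * statistics.variance(a) +
                  (n_b - 1) * statistics.variance(b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

def one_way_f(*groups):
    """One-way ANOVA F: between-groups MS / within-groups MS."""
    scores = [x for g in groups for x in g]
    grand_mean = statistics.mean(scores)
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2
                     for g in groups)
    ss_within = sum((x - statistics.mean(g)) ** 2
                    for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Hypothetical pounds lost in two diet groups.
group_1 = [6, 8, 7, 9, 5]
group_2 = [9, 11, 10, 12, 8]

t = pooled_t(group_1, group_2)
f = one_way_f(group_1, group_2)
print(f"t = {t:.2f}, t squared = {t ** 2:.2f}, F = {f:.2f}")
```

The t value is negative because the first group lost less weight than the second; F is always positive and shows only that the groups differ.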
When is it appropriate to use Tukey’s HSD?
Tukey's HSD (honestly significant difference) is a post hoc
test used with ANOVAs when statistical significance is found.
Tukey’s allows researchers to make all pairwise
comparisons when conditions have equal n. That is, all
different possible pairings are compared. Tukey's HSD
allows you to compare all of the groups with each other in
order to see where the difference occurs.
Do not use Tukey's HSD when the groups contain unequal
numbers of participants. If the number of participants in
each group is not equal, a different post hoc test should be
used.
How are experimental within-participants and between-

participants designs different?
Within-participants design: all participants are exposed to

every level of the IV (all groups, treatments or conditions).
Between-participants design: each participant is exposed to
only one level of the IV. That is, an individual participant
does not receive, or take part in more than one level of the
treatment.
On another note, the phrases within-subjects and
between-subjects are generally only used when referring to
non-human animals. Of course, some researchers still use the
terms within-subjects and between-subjects design when
referring to human studies.
In factorial designs (studies with more than one
independent variable) researchers are interested in
main effects and interaction effects. What is the
difference between these types of effects?
Main effect- the effect of a single independent variable
Interaction effect- the effect of each independent variable
across the levels of the other independent variable.
When there is an interaction effect it is multiplicative
rather than additive. That is, the effect of the two
variables together differs from (for example, is greater
than) the sum of their separate effects, which would be an
additive effect.
When considering whether or not there are interaction
effects there must be at least two IVs (independent
variables).
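A small worked example may help separate the two kinds of effects. The sketch below uses made-up cell means for a 2 x 2 design; the variables (diet and exercise) and all of the numbers are hypothetical.

```python
# Hypothetical cell means for a 2 x 2 factorial design.
# IV 1 = diet (low-carb, high-carb); IV 2 = exercise (no, yes).
# DV = average pounds lost. All numbers are made up.
cell_means = {
    ("low-carb", "no"): 4.0,
    ("low-carb", "yes"): 10.0,
    ("high-carb", "no"): 3.0,
    ("high-carb", "yes"): 5.0,
}

def marginal_mean(factor_index, level):
    """Average the cell means across the levels of the other IV."""
    values = [m for key, m in cell_means.items()
              if key[factor_index] == level]
    return sum(values) / len(values)

# Main effects: compare marginal means for one IV at a time.
main_effect_diet = (marginal_mean(0, "low-carb")
                    - marginal_mean(0, "high-carb"))
main_effect_exercise = marginal_mean(1, "yes") - marginal_mean(1, "no")

# Interaction: does the effect of exercise depend on the diet?
exercise_effect_low = (cell_means[("low-carb", "yes")]
                       - cell_means[("low-carb", "no")])
exercise_effect_high = (cell_means[("high-carb", "yes")]
                        - cell_means[("high-carb", "no")])
interaction = exercise_effect_low - exercise_effect_high

print("Main effect of diet:", main_effect_diet)
print("Main effect of exercise:", main_effect_exercise)
print("Interaction (difference of differences):", interaction)
```

If the effects were purely additive, exercise would add the same amount under both diets and the last value would be zero. Whether a nonzero value is statistically significant is a separate question answered by the ANOVA.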
What is the difference between an independent-measures
ANOVA and a two-factor ANOVA?
The independent-measures ANOVA is used when you have two or
more treatments. ANOVA can be used in place of a t-test,
but a t-test cannot be used in place of ANOVA
if there are more than two groups.
With the two-factor ANOVA there are two IVs and one DV
(an analysis with more than one DV is called a MANOVA).
There are three possible effects with a two-factor ANOVA
(aka two-way ANOVA).
Possible effects:
Main effect: IV 1
Main effect: IV 2
Interaction effect: IV 1 * IV 2 (multiplicative effect)
When writing a research report is it necessary to state

both the null hypothesis and alternative hypothesis?
Only the alternative hypothesis is stated in the research

report. It is generally stated at the end of the introduction
section. The hypothesis should be clearly stated. When
someone reads the hypothesis there should be no
guesswork regarding the hypothesis.
What are the APA guidelines for citing authors in the

text?
If there are more than two authors, cite all the authors the
first time the work is cited; for subsequent citations of the
work use the last name of the first author followed by et
al. and the date. If there are one or two authors, cite one or
both of them each time the work is cited.
If there are more than six authors, cite the first author
followed by et al. and the date the first time and all
subsequent times.
Statistics are difficult to learn for many people. Are

there any suggestions that can be used to enhance
learning?
Focused attention - minimize distractions (be attentive to

desired sensory inputs while ignoring distractors- unwanted
sensory inputs)
Deep processing- think deeply about the meaning of the

material you are studying
Memory connections- try to connect the material you are
attempting to learn to other items you already have in
memory
Spaced study effects- multiple, short study sessions
promote learning better than long marathon-like sessions.
As an example, three 1-hour sessions will be more beneficial
than one 3-hour session.
Testing- test yourself on a regular basis
Minimize stress- high stress levels are detrimental to
working memory and the formation of explicit long-term
memory
In the words of Dr. Osbaldiston (Research Methods and

Statistics Teacher, Eastern Kentucky University)
“Repetition is the mother of all skills”
Is the standard score (z-score) the same as the standard
deviation?
No, they are not the same. The confusion usually occurs
when one reads that the standard score is used to indicate
the number of standard deviations an individual score is
from the mean of the distribution. That is different from
saying the standard score and standard deviation are
synonymous.
The standard score represents a single score. The standard

score allows researchers to see where an individual score
falls in the distribution, and it allows researchers to
compare scores taken from different distributions. To
reiterate, a standard score is a measure of how many
standard deviation units an individual score is from the
mean of the distribution (entire set of scores). Standard
scores tell researchers whether an individual score is
above the mean (a positive score) or below the mean (a
negative score). Keep in mind a standard score of 0 doesn’t
mean the score is low; it indicates the score is equal to the
mean.
The standard deviation is the most common measure of

variation. It is defined as the average difference between
the scores in the distribution and the mean or central point
of the distribution. Precisely, the standard deviation is the
square root of the average squared deviation from the
mean. The standard deviation is the average movement
away from the center of the distribution. The standard
deviation is commonly reported in research reports.
Why are z-scores important?
One of the primary purposes is to describe the exact
location of an individual score relative to the other scores in
the distribution. The z-score does this by converting each
individual score into a signed number (+ or -). The sign
tells whether the score is above (+) the mean or below (-)
the mean. The number indicates the distance between the
score and the mean in units of standard
deviations. To get the z-score, the mean is subtracted from
the score, then the difference is divided by the standard
deviation. Thus, the z-score reflects the distance in terms
of standard deviation units. It is important to note that the
distribution of z-scores will always have a mean of 0 and a
standard deviation of 1.
Another purpose or advantage of the z-score is that it enables
scores from completely different distributions to be
compared. Generally, if two scores come from different
distributions they can’t be directly compared. However, the
z-score makes the comparison possible. As an example,
Bill received a score of 60 on a history test and a score of 56
on an English test. In order to make comparisons you must
know the mean and standard deviation for each
distribution. Suppose the English scores have a mean of 48
and a standard deviation of 4, and the history scores have a
mean of 50 and a standard deviation of 10. When the z-score
is calculated it shows:
(60 - 50) / 10 = +1.0 history test
(56 - 48) / 4 = +2.0 English test
In terms of relative standing (compared with others taking
the test) Bill performed better on the English test.
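The same calculation can be written as a short Python sketch. The first part repeats Bill's two test scores from the example above; the second part uses a made-up set of scores (an assumption for illustration) to show that a whole distribution converted to z-scores ends up with a mean of 0 and a standard deviation of 1.

```python
import statistics

def z_score(score, mean, sd):
    """Subtract the mean from the score, then divide by the SD."""
    return (score - mean) / sd

# Bill's scores from the example above.
history_z = z_score(60, mean=50, sd=10)
english_z = z_score(56, mean=48, sd=4)
print(f"history z = {history_z:+.1f}, English z = {english_z:+.1f}")

# Hypothetical set of scores converted to z-scores.
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean = statistics.mean(scores)
sd = statistics.pstdev(scores)  # SD treating the scores as the full set
z_scores = [z_score(x, mean, sd) for x in scores]

print("mean of z-scores:", statistics.mean(z_scores))
print("SD of z-scores:", statistics.pstdev(z_scores))
```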
What is a primary goal of inferential statistics?
Inferential statistics uses sample data in order to answer

questions about the population. The primary goal of
inferential statistics is to be able to start with a sample, and
from that sample make an inference that allows a general
question about the population to be answered. The
relationship between samples and populations is generally
defined in terms of probability.
What three forms are used to express probability

values?
The term probability is used in a situation where several

different outcomes are possible. The probability for any
specific outcome is defined as a proportion or fraction of all
the possible outcomes.
Fractions, decimals and percentages are used to express

probability values.
Consider the following example: p = 1/5 = .20 = 20%
When testing the null hypothesis there are four possible

outcomes. What are they?
Type 1 error- rejecting the null hypothesis when it is true
Correct- rejecting the null hypothesis when it is false
Type 2 error- failing to reject the null hypothesis when it is
false
Correct- failing to reject the null hypothesis when it is true
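The four outcomes can be laid out as a simple decision table, and a simulation can show how often a Type 1 error occurs when the null hypothesis really is true. The sketch below is illustrative only: it assumes a two-tailed z test at an alpha level of .05, a made-up population (mean 100, standard deviation 15), and samples of 25.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(7)  # fixed seed so the sketch is repeatable

ALPHA = 0.05
CRITICAL = NormalDist().inv_cdf(1 - ALPHA / 2)

def classify(null_is_true, null_rejected):
    """Return which of the four outcomes a decision falls into."""
    if null_rejected:
        return "Type 1 error" if null_is_true else "Correct"
    return "Correct" if null_is_true else "Type 2 error"

def rejects_null(sample, pop_mean, pop_sd):
    """Two-tailed z test of the sample mean against pop_mean."""
    z = (mean(sample) - pop_mean) / (pop_sd / math.sqrt(len(sample)))
    return abs(z) > CRITICAL

# Every sample is drawn from the null population, so the null is
# true on every trial and any rejection is a Type 1 error.
trials = 4000
outcomes = []
for _ in range(trials):
    sample = [random.gauss(100, 15) for _ in range(25)]
    outcomes.append(classify(True, rejects_null(sample, 100, 15)))

type_1_rate = outcomes.count("Type 1 error") / trials
print(f"Type 1 error rate: {type_1_rate:.3f} (alpha = {ALPHA})")
```

The rate of false rejections should land near the chosen alpha level.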
What are some general guidelines to follow when
writing an abstract?
An abstract is a brief overview of the article
Provide a concise summary of the article
The length is approximately 120-150 words
Provide the hypothesis or hypotheses, method, and major

results
What are some general guidelines to follow when

writing the method section of a research report?
Provide an explanation of how the research was conducted
Generally, include subsections (depending on the field, type

of paper) - Participants, Materials, and Procedure
The method section should provide a detailed description of

how the variables were operationally defined
Others should be able to replicate the study after reading
the method section
When researchers say that variables are correlated

what do they mean?
Correlation between variables implies that as one variable
changes, the other variable has a tendency to change. That
doesn’t mean that one causes the other. A correlation claim
involves at least two variables, and the variables are
measured, but not manipulated. The correlation coefficient
is the measure of the degree of relationship between scores.
It can vary between –1.00 and +1.00.
A positive correlation occurs when variables change

together in the same direction. As an example, if one goes
up the other also goes up, and if one goes down the other
also goes down. A negative correlation (inverse association)
occurs when variables move in opposite directions. As an
example, when one goes up the other goes down, or when
one goes down the other goes up.
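To make the idea concrete, the sketch below computes the Pearson correlation coefficient, one common correlation coefficient, for made-up data. The variable names and numbers are hypothetical and are not taken from any study.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from each mean.
    sp = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return sp / math.sqrt(ss_x * ss_y)

# Hypothetical measurements for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 60, 61, 70, 77]   # tends to rise with study time
hours_of_tv = [5, 4, 4, 2, 1]       # tends to fall with study time

r_positive = pearson_r(hours_studied, exam_score)
r_negative = pearson_r(hours_studied, hours_of_tv)

print(f"study time and exam score: r = {r_positive:+.2f}")
print(f"study time and TV time:    r = {r_negative:+.2f}")
```

The first pair moves in the same direction, so r is positive; the second pair moves in opposite directions, so r is negative. Neither value says anything about which variable, if either, is the cause.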
What is statistical validity?
When researchers question a study’s statistical validity they
ask a few questions. If the study found a difference, what is
the probability that the conclusion was a false alarm? If the
study found no significant difference, what is the
probability that a relationship went unnoticed? What is the
effect size? Is the difference between groups statistically
significant? Basically, statistical validity means that the
conclusions are supported by the results of the statistical
analysis. Don't exaggerate the statistical findings; the stats
have limitations and must be considered in the context of
the study and the background literature on the researched
topic.
How do researchers conducting observational research

combat reactivity (observer effects)?
Reactivity occurs when individuals change their behavior
because they know someone is watching them. Researchers
combat this problem in different ways. One, they
make unobtrusive observations (they hide). Two, they wait
it out. That is, they let the participants or subjects get used
to their presence before recording observations. Three,
instead of measuring behavior, researchers may measure
traces of behavior. As an example, in an art exhibit,
wear-and-tear on the floor tiles can tell you which areas are
most traveled.
What is the difference between random sampling and

random assignment?
With random sampling researchers select a sample using a

random method so that every member of the population has
the same chance of being in the sample. This enhances
external validity. Random assignment is used in
experimental research. Participants or subjects are
randomly assigned to different levels of the independent
variable. Random assignment improves internal validity.
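The two procedures are easy to confuse, so here is a minimal sketch of each. The population of 1,000 ID numbers, the sample size of 20, and the group names are assumptions made for the example.

```python
import random

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 people identified by ID number.
population = list(range(1, 1001))

# Random sampling: every member of the population has the same
# chance of ending up in the sample.
sample = random.sample(population, 20)

# Random assignment: shuffle the people already in the sample,
# then split them between the levels of the independent variable.
shuffled = sample[:]
random.shuffle(shuffled)
treatment_group = shuffled[:10]
control_group = shuffled[10:]

print("Sample:", sorted(sample))
print("Treatment group:", sorted(treatment_group))
print("Control group:", sorted(control_group))
```

Sampling decides who is in the study; assignment decides which condition each person in the study receives.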
Can we make a causal inference from a correlation?
Many people have an automatic tendency to make a causal
inference from any correlation claim they read about. This
is problematic. Three criteria need to be met to establish
causation:
Covariation. There must be a correlation between the cause

and effect.
Temporal precedence. The cause must precede the effect.
Internal validity. There can’t be plausible alternative

explanations that explain the relationship.
The temporal precedence rule is sometimes referred to as
the directionality problem. This is because, with a
correlation alone, we don’t know which variable came first.
The internal validity rule is
sometimes referred to as the third-variable rule. When an
alternative explanation for the association between
variables is plausible, the alternative is the third variable.
Correlation alone cannot be used to establish cause and
effect. Two other criteria, temporal precedence and
internal validity, are also needed to establish a
cause-and-effect relationship.
However, with more complex statistics a degree of causality
may be inferred, e.g., partial correlation, multiple
regression, and path analysis. Labeling a variable or
variables as definite causal factors is often problematic;
the findings may also be
influenced by an array of factors (including unsystematic
processes). Consider interaction factors, and how a causal
variable may change the functioning of another variable.
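As one illustration of how a third variable can be handled statistically, the sketch below computes a first-order partial correlation: the correlation between x and y after the influence of a third variable z is removed from both. The formula is the standard one for partial correlation; the three input correlations and the scenario are made up.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y with z held constant."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return numerator / denominator

# Hypothetical correlations: x = ice cream sales, y = drownings,
# z = daily temperature (the suspected third variable).
r_xy, r_xz, r_yz = 0.50, 0.70, 0.70

r_xy_given_z = partial_correlation(r_xy, r_xz, r_yz)
print(f"correlation of x and y:            {r_xy:.2f}")
print(f"correlation with z held constant:  {r_xy_given_z:.2f}")
```

In this made-up case the association nearly disappears once the third variable is held constant, which fits the third-variable explanation. A partial correlation still does not establish temporal precedence.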
What are two basic types of experimental within-groups

designs?
When using within-groups designs all participants are
exposed to all levels of the independent variable. The two
types of designs are concurrent-measures designs and
repeated-measures designs.
With concurrent-measures designs participants are exposed

to all levels of the independent variable at approximately
the same time. A single preference is the dependent
variable.
With repeated-measures designs participants are measured

on a dependent variable more than once. This occurs after
exposure to each level of the independent variable.
What are the differences between Small-N designs and
Large-N designs?
In Large-N designs data from individual participants are not
of specific interest; data from all participants in each
condition (group) are combined and studied together. Every
participant in a Small-N design is treated as a separate
experiment. Small-N designs are mostly repeated-measures
designs. Researchers observe how the participant or
subject reacts to different systematically designed
conditions.
With Large-N designs data are presented as group averages.

With Small-N designs individuals’ data are presented.
With Large-N designs researchers decide if a result is

replicable by doing a statistical significance test. With
Small-N designs researchers decide whether a result is
replicable by repeating the experiment on a new
participant.
What is a Meta-analysis?
A meta-analysis is a mathematical procedure that averages

the results of studies that have tested the same or similar
variables. With a meta-analysis researchers hope to see
what conclusion the whole body of evidence supports.
Steps used in a meta-analysis:
First, researchers collect relevant studies. Then, they

average effect sizes to find an overall effect size.
Researchers may sort a group of studies into categories,
which allows them to compute separate effect size averages
for each category.
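The averaging steps just described can be sketched as follows. The studies, categories, and effect sizes are invented for illustration. The sketch uses a simple unweighted average as described above; published meta-analyses often weight studies (for example, by sample size), which is not shown here.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical studies with made-up effect sizes (Cohen's d).
studies = [
    {"name": "Study A", "category": "adults", "d": 0.30},
    {"name": "Study B", "category": "adults", "d": 0.50},
    {"name": "Study C", "category": "children", "d": 0.10},
    {"name": "Study D", "category": "children", "d": 0.20},
    {"name": "Study E", "category": "adults", "d": 0.40},
]

# Step 1: average all effect sizes to find an overall effect size.
overall_effect = mean(study["d"] for study in studies)

# Step 2: sort the studies into categories and average each one.
by_category = defaultdict(list)
for study in studies:
    by_category[study["category"]].append(study["d"])
category_effects = {name: mean(ds) for name, ds in by_category.items()}

print(f"Overall average effect size: {overall_effect:.2f}")
for name, effect in category_effects.items():
    print(f"  {name}: {effect:.2f}")
```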
A potential problem with meta-analyses is missing some

relevant studies due to publication bias. To combat the
problem researchers contact colleagues or others who may
assist, and attempt to locate published and unpublished
data. Keep in mind there are good studies that are never
published. A meta-analysis can be a good tool for
assessing the weight of the evidence on a specific topic.
What are the differences between quasi-experimental

research and experimental research?
In experimental designs participants or subjects are

randomly assigned to conditions. Participants are not
randomly assigned to conditions in quasi-experimental
designs. Sometimes it is not possible, or ethical, to
randomly assign participants to groups. In these cases
quasi-experimental designs are often used.
Quasi-experimental designs have an intermediate value for
determining cause and effect, according to
Patten (2004). That is, quasi-experimental designs are not
as valuable as experimental designs, but are more valuable
than descriptive designs (archival, case, surveys,
observational studies) with regard to determining cause and
effect.
What is the difference between nominal, ordinal,

interval, and ratio scales of measurement?
With a nominal scale, objects or individuals are assigned to

categories that do not have numerical properties. Variables
measured on a nominal scale are often referred to as
categorical variables because the measuring involves
dividing objects or individuals into different categories.
Examples of data measured on a nominal scale are gender
and ethnicity.
With an ordinal scale, objects or individuals are categorized

to form a rank order along a continuum. Data on this scale
are referred to as ranked data because the data are presented
/ ordered from highest to lowest, or biggest to smallest. An
example of ordinal scale data is reporting how students did
on a test based on their rank (highest, second highest, and
so on).
With an interval scale, the units of measurement (intervals)

between the numbers on the scale are all equal in terms of
size. As an example, the Fahrenheit scale is an interval
scale. The interval scale does not have an absolute zero.
Because the interval scale does not have an absolute zero
ratios cannot be based on this scale (example, 80 degrees is
not twice as warm as 40 degrees). You can still perform
statistical computations on interval data.
With a ratio scale, in addition to order and equal units of

measurement, an absolute zero is indicative of absence of
the specific variable being measured. Examples of ratio
scales include time, height and weight. With ratio scales, it
is possible to form ratios. As an example, 120 pounds is
twice as much as 60 pounds. Statistical computations can
be performed on ratio data.
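The contrast between interval and ratio scales can be checked with a little arithmetic. In the sketch below, converting the two Fahrenheit temperatures to the Kelvin scale (a temperature scale that does have an absolute zero, brought in here only as an illustration) shows why "twice as warm" does not hold, while the weight ratio holds no matter which unit is used.

```python
def fahrenheit_to_kelvin(degrees_f):
    """Convert Fahrenheit to Kelvin (a scale with an absolute zero)."""
    return (degrees_f - 32) * 5 / 9 + 273.15

# Interval scale: the ratio of the raw numbers is misleading.
naive_ratio = 80 / 40
kelvin_ratio = fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)

# Ratio scale: the ratio survives a change of units.
KG_PER_POUND = 0.45359237
pound_ratio = 120 / 60
kilogram_ratio = (120 * KG_PER_POUND) / (60 * KG_PER_POUND)

print(f"80 F / 40 F as raw numbers: {naive_ratio:.2f}")
print(f"Same temperatures in Kelvin: {kelvin_ratio:.2f}")
print(f"120 lb / 60 lb: {pound_ratio:.2f}; in kilograms: {kilogram_ratio:.2f}")
```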
The four measurement scales above are presented in order

of sophistication. The four levels of measurement are
based on the characteristics of the data (information
provided). The characteristics include identity, magnitude,
equal unit size and absolute zero.
Nominal has the characteristic of identity. Ordinal has

characteristics of identity, and magnitude. Interval has
characteristics of identity, magnitude, and equal unit size.
Ratio has all four characteristics- identity, magnitude, equal
unit size, and absolute zero.
What are measures of central tendency?
A measure of central tendency is a number that

characterizes or represents the center (middle) of the entire
distribution. The three measures of central tendency are the
mean, median and mode.
The most commonly used measure of central tendency is

the mean, which represents the average of a group of
scores. The mean is appropriate to use for interval and
ratio data, but it is not appropriate for ordinal or nominal
data. What is the mean for the following group of scores-
2, 3, 4, 6,15? Answer: 6
The median, another measure of central tendency, is often
used in situations where the mean might not be
representative of the distribution of scores. The median is
the middle score in a distribution after the entire set of
scores has been arranged either from highest to lowest or
from lowest to highest. The median is not affected by
extreme scores because it is a positional value. The median
can be used with ratio, interval, and ordinal data, but it is
not appropriate to use with nominal data. What is the median
for the following group of scores- 3, 4, 7, 8, 10? Answer: 7
Another measure of central tendency is the mode- the score
in a distribution of scores that occurs with the greatest
frequency. With some distributions, all scores occur with
the same frequency, which means there is no mode in this
situation. In some distributions numerous scores may
occur with equal frequency, thus meaning more than one
mode. What is the mode for the following group of scores-
2,3,5,5,6? Answer: 5
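The three practice questions above can be checked with Python's built-in statistics module. The score lists are the ones given in the text; the last two lists are added assumptions to show the more-than-one-mode and no-mode situations.

```python
import statistics

mean_scores = [2, 3, 4, 6, 15]
median_scores = [3, 4, 7, 8, 10]
mode_scores = [2, 3, 5, 5, 6]

print("mean:", statistics.mean(mean_scores))
print("median:", statistics.median(median_scores))
print("mode:", statistics.mode(mode_scores))

# The extreme score of 15 pulls the mean above the median.
print("median of the first list:", statistics.median(mean_scores))

# multimode lists every score tied for the greatest frequency.
two_modes = statistics.multimode([1, 1, 2, 2, 3])
all_tied = statistics.multimode([1, 2, 3])
print("two modes:", two_modes)
print("all scores tied (no mode in the sense above):", all_tied)
```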
How do researchers control for the threat to internal

validity called- Nonequivalent control group?
A basic concern for researchers conducting experimental

research is that participants in different groups (conditions,
levels) are equivalent at the beginning of the study. For
example, if researchers wanted to test the effectiveness of
an alcohol drinking cessation program, and compared a
group of alcohol drinkers who voluntarily signed up to a
group of alcohol drinkers who did not voluntarily sign up,
the groups would not be equivalent. It is possible that these
groups may be very different in a number of ways. As
another hypothetical example, suppose researchers assigned
the first twenty people to sign up for the study to the
experimental group while assigning the last twenty people
who signed up to the control group. This is problematic, as
there may be differences between those who sign up early and
those who sign up late. In both examples individuals are
placed in one of the two groups based on a specific
characteristic. This means they are not randomly assigned;
thus the study is not a true experiment, and this presents a
large threat to internal validity.
Random assignment, sometimes combined with random
sampling, typically combats this problem. When random
assignment does not occur, participant selection or
assignment problems might result. If random assignment is
not used then the study is a quasi-experiment.
How do researchers control for the threats to internal

validity called- Experimenter and Participant effects?
With experimenter and participant effects, the
experimenters, the participants, or both consciously or
unconsciously affect the study’s outcome. These threats to
internal validity are often controlled for by using a
single-blind or double-blind procedure.
In a single-blind procedure the participants are blind to the

manipulation. In a double-blind procedure neither the
experimenter nor the participant knows the condition (level,
group) that the participant is being exposed to.
What is the difference between an Independent-groups
t test and a Correlated-groups t test?
An Independent-groups t test is a parametric inferential test
used for comparing means of two independent groups of
scores. When using the independent-groups t test,
researchers are determining how far the difference between
the sample means falls from the difference between the
population means expected under the null hypothesis. If the
difference between sample means is large enough relative to
the variability in the scores, the difference between groups
will be statistically significant, suggesting there is a
true difference between groups.
A Correlated-groups t test is a parametric inferential test

used to compare the means of two related samples. The
same participants are used in each group (within-
participants design) or different participants are matched
between groups (matched-participants design). The test
shows if there is a difference between sample means and
whether the difference is greater than would be expected by
chance.
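The two t statistics are calculated differently, which the following sketch tries to show with one made-up data set. It uses the common pooled-variance formula for the independent-groups test and the difference-score formula for the correlated-groups test; the scores and function names are hypothetical. Treating the same numbers both ways is done only to compare the formulas, since in a real study the design determines which test applies.

```python
import math
import statistics

def independent_t(group_a, group_b):
    """t for two independent groups, using the pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * statistics.variance(group_a) +
                  (n_b - 1) * statistics.variance(group_b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (statistics.mean(group_b) - statistics.mean(group_a)) / standard_error

def correlated_t(first, second):
    """t for two related samples, based on paired difference scores."""
    differences = [b - a for a, b in zip(first, second)]
    standard_error = statistics.stdev(differences) / math.sqrt(len(differences))
    return statistics.mean(differences) / standard_error

# Hypothetical scores for five participants measured twice.
before = [10, 12, 9, 14, 11]
after = [12, 13, 11, 15, 14]

t_independent = independent_t(before, after)
t_correlated = correlated_t(before, after)

print(f"treated as independent groups: t = {t_independent:.2f}")
print(f"treated as correlated groups:  t = {t_correlated:.2f}")
```

With these numbers the correlated-groups t is larger, because pairing removes the differences between participants from the error term.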
What is the difference between Cross-sectional and

Longitudinal designs?
With cross-sectional designs researchers study individuals

that are different ages at the same time. An advantage of
this type of design is that a variety of ages can be studied in
a short period of time. A key issue being addressed is
whether or not there are differences across different ages.
In addition to testing individuals of different ages
researchers are testing individuals born in different
generations. The researcher wants to conclude that any
differences in measured variables are due to age, but it is
possible that some or all of the differences may be due to a
generational effect.
With a longitudinal design the same individuals are studied

over the course of time. Longitudinal studies vary in
length, ranging from months to decades. This design
eliminates generational effects, such as those seen with
cross-sectional designs. Longitudinal designs are
problematic compared with cross-sectional designs for
a couple of key reasons. One, they are more expensive and
time-consuming than cross-sectional designs. Two,
researchers need to be attentive to attrition problems,
because those who drop out of the study likely differ in
some way from those who stay in the study.
What is an ex post facto design?
An ex post facto design is a type of quasi-experimental
design that resembles an experiment in some ways, but is
different in others. Ex post facto designs and experiments
involve comparison groups, but ex post facto designs do
not involve manipulation of independent variables. In ex
post facto designs researchers choose a variable of interest
and select participants who already differ on this variable.
Both designs (experimental and ex post facto) involve the

measurement of dependent variables. As an example,
researchers might be interested in studying the level of
obesity (participant variable) and food selection (dependent
variable) by selecting a group of participants who are obese
and by choosing a comparison group who are not obese to
determine difference in food selection.
What is epidemiology?
Epidemiological studies investigate factors contributing to
enhanced health or the occurrence of a disease in a specific
population. Epidemiologists make fundamental
contributions to health studies by identifying risk factors
for diseases. A risk factor is any condition that occurs
more frequently in people who have a disease than in
people who do not have that specific disease.
Epidemiologists have studied, and continue to study,
demographic and behavioral factors that are related to
cancer, heart disease, and various chronic diseases.
Epidemiological studies were the first to detect a
relationship between the behavior of smoking and heart
disease (Brannon & Feist, 2010).
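The definition of a risk factor given above can be illustrated with a small comparison of frequencies. This is only a sketch: the counts are hypothetical, and the function name is an assumption made for illustration.

```python
def exposure_rates(exposed_cases, total_cases, exposed_noncases, total_noncases):
    """Return the proportion with the condition among people with and without the disease."""
    return exposed_cases / total_cases, exposed_noncases / total_noncases

# Hypothetical counts: 200 people with the disease, 800 without
rate_with, rate_without = exposure_rates(120, 200, 160, 800)
print(f"Condition present among those with the disease:    {rate_with:.0%}")
print(f"Condition present among those without the disease: {rate_without:.0%}")
print("Meets the definition of a risk factor:", rate_with > rate_without)
```

A higher frequency among people with the disease marks the condition as a risk factor; it does not by itself show that the condition causes the disease.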
How does science ensure that it produces objective
findings?
Scientific researchers are careful not to let personal biases
about the world blind them to reality. Scientists rely
primarily on objective information. History shows us that
subjective experience is unreliable. Many people do not
understand the problems with subjective interpretations.
Instead of acknowledging their tendency to fool themselves,
many people focus on what feels right.
To avoid biased perceptions, which often lead to unrealistic
beliefs, scientists base their beliefs on observable evidence
that others can critique or double-check. That is, scientists
look for independent evidence to support their claim(s).
They look to others to make similar findings. Their studies
are presented in a precise manner so others can attempt to
replicate their findings. If others carrying out the same
procedures do not replicate a finding, then it is necessary
to re-evaluate it. There may be a flaw in the original
design. Scientists put their assertions to the test
and allow independent parties to do the same. However,
scientists do demonstrate varying levels of bias, and their
thinking and their interpretation of research implications can
be less than optimal. Parts of scientific processes do
demonstrate subjectivity. Scientists and scientific thinkers
aren't perfectly objective; they are just generally more
objective than their non-scientific counterparts.
Why is it important to understand research
methodology?
Scientific information is derived from scientific research.
So, in order to have a full understanding of science it
is important to understand not only the philosophy of
science, but also the methods used to collect
data. In order to understand how to interpret the data
collected, a basic understanding of statistics is needed.
Understanding research methodology and statistics will
assist you in:
Reading scientific journals
Distinguishing science from pseudoscience (in popular
science articles)
Protecting yourself from quacks
Becoming a better thinker
Becoming scientifically literate
Being an independent consumer of research information
(you can decide the credibility of the information)
Why can’t descriptive research methods test causal
hypotheses?
One big problem is determining which came first, cause or
effect. This is often referred to as the directionality
problem. It is a big threat to temporal precedence: to
establish causation, the cause needs to occur before the
effect, and descriptive methods often cannot show which
variable came first.
Another problem is that another variable (a trait or
characteristic with more than one level) may be
influencing the relationship between the variables being
studied. This scenario is often referred to as the third
variable problem: a third variable may be present that is
creating the relationship. There may be an array of other
factors that have led to the outcome, so there are plausible
alternative explanations for the outcome.
Even if descriptive methods find associations, they still
cannot establish causality because of the directionality
and third variable problems.
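A small simulation can make the third variable problem concrete. In this sketch, which uses made-up data and illustrative names, two variables are each generated from a shared third variable and never influence one another, yet they end up correlated.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)

rng = random.Random(0)  # fixed seed so the sketch is repeatable
third = [rng.gauss(0, 1) for _ in range(2000)]  # hypothetical third variable
x = [t + rng.gauss(0, 1) for t in third]        # x depends only on the third variable
y = [t + rng.gauss(0, 1) for t in third]        # y depends only on the third variable

print(f"r(x, y) = {pearson_r(x, y):.2f}")  # positive, although x never influences y
```

A researcher who measured only x and y would see an association and could be tempted to read it causally.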
What are some advantages and disadvantages of survey
research?
Advantages of survey research are:
It is a relatively inexpensive way to collect information
It allows researchers to collect a large amount of
information from a large sample
Disadvantages of survey research are:
If answers provided to surveys are inaccurate, then we have
poor construct validity
Causation cannot be determined using survey research
What do researchers mean when they counterbalance
when using within-participants designs?
Counterbalancing occurs when participants are randomly
assigned to systematically varying sequences of conditions
so that order effects are balanced out. As an example, in
one study participants watched a video while eating M&Ms
from a bowl containing different colors of M&Ms, or a
bowl containing the same color of M&Ms (Hale & Varakin,
2013, not yet published). Half of the participants watched
the video with the multi-color bowl placed in front of
them on the left, and the other half watched the video with
the multi-color bowl placed on the right. In both conditions
the single-color bowl was placed on the side opposite
the multi-color bowl.
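As a rough sketch of how full counterbalancing could be set up, the code below lists every possible order of the conditions and randomly assigns participants so that each order is used equally often. The participant IDs, condition labels, and function name are hypothetical, and this is not the procedure used in the study mentioned above.

```python
import itertools
import random

def counterbalance(participants, conditions, seed=None):
    """Randomly assign participants to every possible order of conditions,
    using each order equally often (full counterbalancing)."""
    orders = list(itertools.permutations(conditions))
    if len(participants) % len(orders) != 0:
        raise ValueError("number of participants must be a multiple of the number of orders")
    shuffled = list(participants)
    random.Random(seed).shuffle(shuffled)
    # Cycle through the orders so each one is used equally often
    return {person: orders[i % len(orders)] for i, person in enumerate(shuffled)}

# Hypothetical participant IDs and condition labels
assignment = counterbalance(["P1", "P2", "P3", "P4"], ["multi-color", "single-color"], seed=1)
for person in sorted(assignment):
    print(person, "->", " then ".join(assignment[person]))
```

With many conditions the number of possible orders grows quickly, so researchers often use only a subset of orders instead.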
Are there general recommendations for stating a
hypothesis in a research report?
Make sure your hypothesis is stated clearly, usually at the
end of the introduction section. Be careful with the
wording chosen for stating your hypothesis. When stating
the hypothesis do not use words that imply cause and effect
(leads to, makes, impacts, etc.) if you are not conducting an
experiment. Use words that imply cause and
effect only when conducting an experiment. As an example, if
you were conducting survey research, your hypothesis
should not be “A sedentary lifestyle causes anxiety,” but
rather “A sedentary lifestyle is associated with anxiety.”
Is an experiment the same type of research method as a
randomized controlled trial?
They are similar, but they might not be exactly the same.
With a randomized controlled trial participants are
randomly assigned to two or more groups. The only
difference between the groups (regarding relevant
factors) is the level (condition) of the variable being
studied that they are exposed to. In a randomized
controlled trial at least one of the groups is a control group.
In experiments, the different groups might be exposed to
different treatments with no control group; an
experiment may or may not have a control group.
A randomized controlled trial, as with an experiment, must
avoid the problem of self-selection. This problem is
avoided by not allowing participants to choose which level
of the variable (being studied) they are exposed to. To
reiterate, they are randomly assigned.
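Random assignment as described above can be sketched in a few lines. The participant IDs and group labels are hypothetical; the point is only that a chance process, not the participant, decides who ends up in which group.

```python
import random

def randomly_assign(participants, groups, seed=None):
    """Shuffle participants, then deal them into groups of (nearly) equal size."""
    shuffled = list(participants)
    random.Random(seed).shuffle(shuffled)
    assignment = {group: [] for group in groups}
    for i, person in enumerate(shuffled):
        assignment[groups[i % len(groups)]].append(person)
    return assignment

# Hypothetical participant IDs and group labels
participants = [f"P{i}" for i in range(1, 9)]
groups = randomly_assign(participants, ["treatment", "control"], seed=42)
for name, members in groups.items():
    print(name, sorted(members))
```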
A research design that tests the effects of a medical or drug
treatment is called a clinical trial. Many clinical trials are
randomized controlled trials.
When referring to the shape of a distribution what is a
normal distribution?
A normal distribution is a theoretical frequency distribution
that has special characteristics. First, it is bell shaped and
symmetrical: each half is a mirror image of the other half.
Second, the measures of central tendency (mean, median, and
mode) are equal and located at the center of the distribution.
Third, it has only one mode (unimodal). Fourth, most of the
observations are grouped together around the center of the
distribution, with far fewer observations at the ends of the
distribution. Last, when standard deviations are plotted on the
x-axis, the percentage of scores that occur between the mean and
any point on the x-axis is the same for all normal curves.
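The last characteristic can be checked with Python's standard library, which includes a normal distribution object. The helper function names below are illustrative; the percentages they produce hold for any normal curve once distances are expressed in standard deviations.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def percent_between_mean_and(z):
    """Percentage of scores between the mean and a point z standard deviations away."""
    return abs(standard_normal.cdf(z) - 0.5) * 100

def percent_within(z):
    """Percentage of scores within z standard deviations on either side of the mean."""
    return 2 * percent_between_mean_and(z)

for z in (1, 2, 3):
    print(f"within {z} SD of the mean: {percent_within(z):.1f}%")
```

The three values come out at roughly 68, 95, and 99.7 percent.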
Why do researchers sometimes use the words true
experiment? Is there a difference between an
experiment and a true experiment?
True experiments make use of random assignment when
creating different levels (groups, conditions) of the variable
of interest (or independent variable). True experiments
involve two or more groups and a high degree of control,
with the only difference between groups being the level of
the independent variable(s) they are
exposed to. Some researchers suggest that a study can be
labeled an experiment if a treatment is
administered or arrangements are made for treatment
administration (Patten, 2004).
What is implied by an experiment may vary, depending on
which area of science is being discussed. Also, the word
experiment in everyday language refers to something
drastically different from a scientific experiment. In
everyday language, experiment is generally used to mean
trying something out.
When referring to a distribution of scores, the words
extreme scores are sometimes used. When is a score
considered extreme (outlier)?
An extreme score is located far from the mean.
Researchers have different ideas about what constitutes an
extreme score. Extreme scores are not generally
determined by a single agreed-upon mathematical rule. In the
studies that I have been involved with, scores more than three
standard deviations from the mean have been omitted. In a
normal distribution fewer than one percent of scores (about
0.3 percent) fall more than three standard deviations from the
mean. Of course, other researchers may use a cutoff that is
less than or slightly more than three standard deviations.
The researcher’s specific treatment of extreme scores
should be mentioned in the research report.
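A three-standard-deviation cutoff like the one mentioned above can be sketched as follows. The data are hypothetical reaction times, and the cutoff is a parameter because researchers differ on where to draw the line.

```python
import statistics

def flag_extreme_scores(scores, cutoff=3.0):
    """Return the scores lying more than `cutoff` standard deviations from the mean."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    return [x for x in scores if abs(x - mean) > cutoff * sd]

# Hypothetical reaction times (ms); one value is far from the rest
scores = [500, 510, 495, 505, 490, 515, 500, 508, 492, 503, 498, 507, 1200]
print(flag_extreme_scores(scores))
```

One caution with this approach: the extreme score itself inflates the standard deviation, so in very small samples no score can be more than three standard deviations from the mean.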
What is a factorial design?
A factorial design is one in which there are two or more
independent variables (independent variables are
sometimes referred to as factors). In the most common
factorial design, researchers use two independent variables.
Researchers study each possible combination of the levels of
the independent variables.
When considering the effect of an additional independent

variable, you are asking about the interaction of two
independent variables. That is, does the effect of the
original independent variable depend on another
independent variable?
When analyzing a factorial design with two independent
variables there are three outcomes to inspect: two main
effects and one interaction. A main effect is the overall
effect of one independent variable on the dependent variable,
averaged across the levels of the other independent variable.
An interaction is present when the effect of one independent
variable differs across the levels of the other independent
variable(s).
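To make main effects and interactions concrete, the sketch below works through a 2 x 2 design using made-up cell means. The factor names are hypothetical, and the calculation is descriptive only; deciding whether the effects are statistically significant would require an analysis such as a two-way ANOVA.

```python
# Hypothetical cell means for a 2 x 2 design:
# factor A = caffeine (no, yes), factor B = sleep (short, long)
cell_means = {
    ("no", "short"): 10.0, ("no", "long"): 14.0,
    ("yes", "short"): 16.0, ("yes", "long"): 15.0,
}

def marginal_mean(factor_index, level):
    """Average the cell means over the levels of the other factor."""
    values = [m for cell, m in cell_means.items() if cell[factor_index] == level]
    return sum(values) / len(values)

# Main effects: difference between marginal means of each factor
main_effect_a = marginal_mean(0, "yes") - marginal_mean(0, "no")
main_effect_b = marginal_mean(1, "long") - marginal_mean(1, "short")

# Effect of A at each level of B; if these differ, the factors interact
effect_a_short = cell_means[("yes", "short")] - cell_means[("no", "short")]
effect_a_long = cell_means[("yes", "long")] - cell_means[("no", "long")]
interaction = effect_a_long - effect_a_short

print("Main effect of A:", main_effect_a)
print("Main effect of B:", main_effect_b)
print("Effect of A with short sleep:", effect_a_short)
print("Effect of A with long sleep:", effect_a_long)
print("Interaction contrast:", interaction)
```

In this made-up example the effect of factor A is large at one level of factor B and small at the other, which is what an interaction looks like in the cell means.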
What is a p-value?
A p-value is the probability, assuming the null hypothesis is
true, of observing a result equal to or more extreme than the
one observed. It is common to say that a difference is
statistically significant if p < .05. This cutoff is not due to
logical or statistical reasons, but it has become common
practice. There is no statistical tool that tells us the truth
value of a hypothesis (Reinhart, 2015), even though we use the
terms "reject the null" and "failure to reject the null."
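As a minimal sketch of the definition, the code below converts a z test statistic into a two-tailed p-value using the standard normal distribution. The z values are hypothetical, and a z test is only one of many situations in which a p-value is computed.

```python
from statistics import NormalDist

def two_tailed_p_from_z(z):
    """Probability, under the null, of a z statistic at least this far from zero."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical test statistics
for z in (1.0, 1.96, 2.58):
    p = two_tailed_p_from_z(z)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"z = {z:.2f}, p = {p:.3f} ({verdict} at .05)")
```

Note that the p-value describes how surprising the data are if the null hypothesis is true; it is not the probability that the null hypothesis is true.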
What do researchers and statisticians mean when they
refer to the "statistical wars"?
This term is often used in discussions of
Bayesian vs. frequentist statistics. There is disagreement about
which statistical methods are superior and about the implications
of these methods. This is a big topic. Refer to the sources I
mentioned earlier in the section titled Why We Need
Statistics.
What are examples of different types of qualitative
research?
Case study, archival method, interviews and focus group
interviews, field studies, action research, naturalistic
observation and laboratory observation are different types
of qualitative research. The strategies used for each
method vary. Naturalistic and laboratory observation are also
sometimes considered quantitative research; it depends on
the data analysis. Qualitative research generally doesn't rely
on numbers; quantitative research does. Often, research makes
use of mixed methods, both qualitative and quantitative. Both
methods are important, and their reliability, validity, and
basic or applied nature should be considered.
Is science experiencing a replication crisis?
Ideally, scientific research should be replicable
(reproducible). The research should use processes that can
be used by others wanting to conduct a similar or the same
study. When referring to the replication crisis, what is
usually meant is a failure to replicate
statistically significant findings. It would be more precise
to say there is a "statistically significant replication crisis."
It is also possible that original studies that fail to show
significance reflect a type 2 error (missing an
effect). This could occur due to a number of methodological
or statistical issues. As an example, when I conducted a
study on the influence of expectations on food liking, the
finding was nonsignificant; when I ran a statistical power
analysis, it revealed I needed a larger sample, given the
effect size and significance level, to find significance.
Statistically significant and nonsignificant findings should be
replicated, and replications should be of different types, using
samples with varying characteristics.
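A power analysis of the kind mentioned above can be sketched with a common normal-approximation formula. The sketch assumes a two-tailed comparison of two independent groups and a standardized effect size (Cohen's d), which may not match the design of the study described; the function name and default values are illustrative.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate participants needed per group for a two-tailed,
    two-independent-groups comparison (normal approximation)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: about {n_per_group(d)} participants per group")
```

The smaller the expected effect, the larger the sample needed, which is one reason a study of a real but small effect can easily produce a nonsignificant result.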
What are some different types of replication studies?
There are at least three general types of replication studies:
direct replication, conceptual replication, and
replication-plus-extension. In direct replication, researchers
attempt to conduct research using methods that are as close as
possible to those used by the original researchers. The more
transparent the original research, the easier it will generally
be to directly replicate. In conceptual replication,
researchers address the same topics and questions, but use
different methods. Variables are manipulated and measured
using different strategies, but the conceptualization remains
intact. In a replication-plus-extension study, researchers
replicate original studies, but also add variables, which may
include different operationalizations.
What are the implications of replication studies?
Extra weight is often given to findings that are replicated
(significance is found again) outside of the original lab, or by
researchers other than the ones who made the
original findings. It is a red flag if only a specific
group or lab is able to produce a finding. Why is it that others
can't produce the finding? It is essential that researchers are
transparent with their methods and all relevant research
materials. Strong evidence is the result of various studies,
not a single study or a series of studies whose results can only
be found by one research group. To reiterate, scientific
progress is cumulative; it develops as a product of the
work of sometimes many people. In some cases it is
necessary to repeat studies that didn't find significance.
The original study might be flawed.
What is the positive-outcome bias?
Positive-outcome bias is the increased likelihood that a
study with a statistically significant finding will be
published versus a study of the same value but with a
nonsignificant finding. A study was conducted by Emerson
et al. (2010) to determine if positive-outcome bias was
present during peer review. The researchers were interested
in whether peer reviewers would "recommend publication
of a 'positive version' [statistically significant] of a
fabricated manuscript over an otherwise identical 'no
difference' version". The results indicate reviewers were
more likely to recommend publication of the positive
version, even though the papers were almost identical,
with the only difference being the direction of the finding.
This demonstration of publication bias is detrimental to
science.
When is it appropriate to use the statistical technique
partial correlation?
Occasionally a researcher may suspect that a relationship
between two variables might be caused or influenced
by another variable. This is referred to as the third variable
problem. Partial correlation allows a researcher to measure
the relationship between two variables while statistically
controlling for (holding constant) a third variable. The
partial correlation is computed from the three pairwise
correlations among the variables.
A partial correlation can show that an apparent relationship
between two variables is really due to a third variable,
thus indicating no direct relationship between the original
two variables. Alternatively, partial correlation can
demonstrate that there is a relationship between two variables
even after a third is held constant. In that case, the
relationship between the original two variables is not
explained by the third variable.
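The standard first-order partial correlation formula can be sketched as below. It takes the three pairwise correlations as inputs; the correlation values used here are hypothetical and chosen to show the two outcomes described above.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, holding z constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical pairwise correlations
# Case 1: the x-y relationship nearly disappears once z is held constant
print(round(partial_correlation(r_xy=0.50, r_xz=0.80, r_yz=0.60), 2))
# Case 2: the x-y relationship largely remains after z is held constant
print(round(partial_correlation(r_xy=0.60, r_xz=0.30, r_yz=0.20), 2))
```

In the first case the original correlation of .50 shrinks to nearly zero, suggesting the third variable accounts for it; in the second case the correlation changes very little.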
What is multiple regression analysis?
Multiple regression analysis is the process of using multiple
predictors in an effort to make accurate predictions of an
outcome. It indicates the percentage of variance in the outcome
accounted for by the predictors.
Multiple regression analysis evaluates the contribution of
individual predictors after the influence of other predictors
has been considered. This allows determination of whether
each predictor contributes to the prediction by itself or
duplicates the contribution already made by another
predictor.
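For the simplest case of two predictors, the standardized regression coefficients and the proportion of variance explained (R squared) can be computed directly from the three pairwise correlations. The sketch below uses hypothetical correlations and an illustrative function name; statistical software would normally be used with raw data.

```python
def two_predictor_regression(r_y1, r_y2, r_12):
    """Standardized coefficients and R^2 for an outcome y and two predictors,
    computed from the three pairwise correlations."""
    beta1 = (r_y1 - r_y2 * r_12) / (1 - r_12**2)
    beta2 = (r_y2 - r_y1 * r_12) / (1 - r_12**2)
    r_squared = beta1 * r_y1 + beta2 * r_y2
    return beta1, beta2, r_squared

# Hypothetical correlations: both predictors relate to y, and to each other
beta1, beta2, r_squared = two_predictor_regression(r_y1=0.50, r_y2=0.40, r_12=0.60)
print(f"beta1 = {beta1:.3f}, beta2 = {beta2:.3f}, R^2 = {r_squared:.3f}")
print(f"Variance explained by predictor 1 alone: {0.50**2:.3f}")
```

Here the second predictor correlates .40 with the outcome on its own, yet adding it raises the variance explained only slightly, because much of its contribution duplicates that of the first predictor.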
Recommended Sources
Gravetter, F.J., Wallnau, L.B. (2013). Statistics for the

Behavioral Sciences (9th edition). Australia:
Wadsworth Cengage Learning.
Jackson, S.L. (2009). Research Methods and Statistics: A

critical thinking approach (3rd edition). Australia:
Wadsworth Cengage Learning.
Keshav, S. (2007). How to read a paper. ACM Sigcomm

Computer Communication Review, 37(3), 83-84.
Little, J.W., & Parker, R. (2010). How to read a scientific
paper. Online
http://www.biochem.arizona.edu/classes/bioc568/papers.htm
Mitchell, M.L., & Jolley, J.M. (2010). Research Design

Explained (7th edition). Belmont, CA: Wadsworth
Cengage Learning.
Morling, B. (2012). Research Methods in Psychology:

Evaluating a World of Information. New York, NY:
W.W. Norton & Company, Inc.
Myers, A., & Hansen, C. (2002). Experimental Psychology
(5th edition). Australia: Wadsworth Thomson
Learning.
Patten, M.L. (2004). Understanding Research Methods: An

overview of the essentials (4th edition). Glendale,
CA: Pyrczak Publishing.
Pyrczak, F., & Bruce, R.R. (2003). Writing Empirical

Research Reports: A basic guide for students of the
behavioral sciences. Los Angeles, CA: Pyrczak
Publishing.
Reinhart, A. (2015). Statistics Done Wrong: The Woefully

Complete Guide. San Francisco, CA: No Starch
Press, Inc.
Shaughnessy, J.J., & Zechmeister, E.B. (1990). Research

Methods in Psychology (2nd edition). New York,
NY: McGraw-Hill Publishing Company.
Warner, R.M. (2008). Applied Statistics: From Bivariate

Through Multivariate Techniques. Los Angeles,
CA: Sage Publications.
References
Brannon, L., & Feist, J. (2010). Health Psychology: An

Introduction to Behavior and Health. Belmont, CA:
Wadsworth.
Dawes, R., Faust, D., & Meehl, P. (1989). Clinical versus
actuarial judgment. Science, 243(4899), 1668-1674.
Dawes, R. (1994). House of Cards: psychology and

psychotherapy built on myth. New York: Free Press.
Dawes, R. (1996). House of Cards: psychology and

psychotherapy built on myth. Australia: Simon and
Schuster.
Feynman, R. (1999). The Pleasure of Finding Things Out:

The Best Short Works of Richard P. Feynman.
Cambridge, MA: Basic Books.
Frey, D., & Stahlberg, D. (1986). Selection of information

after receiving more or less reliable self-
threatening-information. Personality and Social
Psychology Bulletin, 12, 434-441.
Garb, H.N. (1998). Studying the Clinician: Judgment

research and psychological assessment.
Washington, DC: American Psychological
Association.
Emerson, G.B., Warme, W.M., Wolf, F.M., Heckman, J.D.,

Brand, R.A., & Leopold, S.S. (2010). Testing for
the Presence of Positive-Outcome Bias in Peer
Review. Arch. Intern. Med. 170 (21), 1934-1939.
Gilovich, T. (1991). How We Know What Isn’t So: The
Fallibility of Human Reason in Everyday Life. New
York: The Free Press.
Godlee, F., & Jefferson, T. (2003). Peer review in health

sciences (2nd ed.). London: BMJ Books.
Grove, W.M., & Lloyd, M. (2006). Meehl’s Contribution to

Clinical Versus Statistical Prediction. Journal of
Abnormal Psychology, 115(2),192–194.
Grove, W.M., & Meehl, P. (1996). Comparative efficiency

of informal and formal prediction procedures: The
clinical-statistical controversy. Psychology, Public
Policy and Law, 2, 293-323.
Hale, B. (2011). Predicting Criminal Behavior. College

term paper.
Hale, J. (2011). Scientific Measures: Reliability and
Validity. Retrieved from
http://psychcentral.com/blog/archives/2011/10/16/scientific-measures-reliability-and-validity/
Hale, J. (2010). Dysrationalia: Intelligent People Behaving
Irrationally. Retrieved from
http://jamiehalesblog.blogspot.com/2010/10/dysrationalia-intelligent-people.html.
Hale, J. (2012). Stats Made Ez: Stats and Public

Understanding. Retrieved from
http://jamiehalesblog.blogspot.com/search?q=gore
Hall, G.C. (1988). Criminal Behavior as a Function of
Clinical and Actuarial Variables in a Sexual
Offender. Journal of Consulting and Clinical
Psychology, 56 (5), 773-775.
Hawking, S., & Mlodinow, L. (2010). The Grand Design.
New York, NY: Bantam Books.
Jackson, S.L. (2009). Research Methods and Statistics: A

Critical Thinking Approach 3rd Edition. Belmont,
CA: Wadsworth, Cengage Learning.
Johnson, G.B. (2000). The Living World. New York:

McGraw Hill.
Labossiere, M.C. (1995). The Nizkor Project Fallacies.

Retrieved from
http://www.nizkor.org/features/fallacies/
Lilienfeld, S., Ammirati, R., & David, M. (2012).

Distinguishing science from pseudoscience in
school psychology: Science and scientific thinking
as safeguards against human error. Journal of
School Psychology, 50, 7-36.
Lilienfeld, S., & Landfield, K. (2008). Science and
pseudoscience in law enforcement: A user friendly
primer. Criminal Justice and Behavior, 35, 1215-1230.
Lilienfeld, S., Lynn, S. J., Ruscio, J., & Beyerstein, B.L.
(2010). Great Myths of Popular Psychology:
Shattering Widespread Misconceptions about
Human Behavior. Malden, MA: Wiley-Blackwell.
Lyttleton, R.A. (1977). The Nature of Knowledge. In R.

Duncan, & M. Weston-Smith (Eds.), The
Encyclopaedia of Ignorance: Everything you ever
wanted to know about the unknown (pp.9-17).
Oxford: Pergamon Press.
Manktelow, K. I. (2004). Reasoning and rationality: The

pure and the practical. In K. I. Manktelow & M. C.
Chung (Eds.), Psychology of reasoning: Theoretical
and historical perspectives (pp. 157-177). Hove,
England: Psychology Press.
Matthew. (1997). Logic and Fallacies Constructing a
Logical Argument. Retrieved from
http://www.infidels.org/library/modern/mathew/logic.html
Meehl, P.E. (1986). Causes and effects of my disturbing

little book. Journal of Personality Assessment, 50,
370-375.
Meehl, P. E. (1996). Clinical versus statistical prediction: A

theoretical
analysis and a review of the evidence. Northvale,
NJ: Jason Aronson. (Original work published 1954).
Meehl, P.E. (2003). Clinical versus statistical prediction: A

theoretical
analysis and a review of the evidence. Copyright
2003 Leslie J. Yonce. (Copyright 1954 University of
Minnesota)
Mitchell, M.L., & Jolley, J.M. (2010). Research Design
Explained 7th Edition. Belmont, CA: Wadsworth,
Cengage Learning.
Morling, B. (2012). Research Methods in Psychology:

Evaluating a World of Information. New York:
W.W. Norton & Company, Inc.
Myers, A., & Hansen, C. (2002). Experimental

Psychology. Pacific Grove, CA: Wadsworth.
Noble, W.S. (2007). Coach Hale Interview By Will Noble.

Retrieved from
http://maxcondition.com/page.php?82
Osbaldiston, R. (2011). How to read research reports.

Personal Collection of R. Osbaldiston, Eastern
Kentucky University, Richmond, KY.
Patten, M.L. (2004). Understanding Research Methods.

California: Pyrczak Publishing.
Perkins, D. (1995). Outsmarting IQ: The emerging science

of learnable intelligence. New York: Free Press.
Popper, K. R. (1959). The logic of scientific discovery.

Oxford, England: Basic Books.
Randall, L. (2012). Knocking On Heaven's Door: How

physics and scientific thinking illuminate the
universe and the modern world. New York, NY:
ECCO.
Reber, A.S. (1985). The Penguin Dictionary of Psychology.
London, England.: Penguin Books.
Reyna, V.F., & Farley, F. (2006). Risk and rationality in

adolescent decision making. Psychological Science
in the Public Interest, 7, 1-44.
Sagan, C. (1996). The Demon Haunted World Science As a

Candle in the Dark. New York: Ballantine Books.
Sanna, L.J., & Schwartz, N. (2006). Metacognitive

experiences and human judgment: The case of
hindsight bias and its debiasing. Current Directions
in Psychological Science, 15, 172-176.
Shakespeare, G. (2009). 5 Ways “Common Sense” lies to
you Everyday. Retrieved from
http://www.cracked.com/article_17142_5-ways-common-sense-lies-to-you-everyday.html.
Shaughnessy, J.J., & Zechmeister, E.B. (1990). Research
Methods in Psychology. New York: McGraw Hill.
Shermer, M. (1997). Why People Believe Weird Things.
New York: Owl Books.
Shermer, M. (2006). Discover Skepticism. Retrieved from
http://www.skeptic.com/about_us/discover_skepticism.html
Shermer, M. (2001). The Borderlands of Science: Where

Sense Meets Nonsense. Oxford: Oxford University
Press.
Stanovich, K. (2007). How To Think Straight About
Psychology 8th Edition. New York, NY: Pearson.
Stanovich, K. (2009). What Intelligence Tests Miss: The
psychology of rational thought. London: Yale
University Press.
Stanovich, K. (2009). Rational and Irrational thought: The
Thinking that IQ Tests Miss.
Retrieved from
http://www.scientificamerican.com/article.cfm?id=rational-and-irrational-thought
Stanovich, K. (2009a.). Decision Making and Rationality

in the Modern World. USA: Oxford University
Press
Stanovich, K. E., & Stanovich, P. J. (2010). A framework

for critical thinking, rational thinking, and
intelligence. In D. Preiss & R. J. Sternberg (Eds.),
Innovations in educational psychology:
Perspectives on learning, teaching and human
development (pp. 195-237). New York: Springer.
Stanovich, P., & Stanovich, K. (2003). Using Research and

Reason in Education: How Teachers Can Use
Scientifically Based Research to Make Curricular &
Instructional Decisions. National Institute of
Literacy
Stanovich, K., & West, R. (2008). On the failure of
cognitive ability to predict myside and one-sided
thinking biases. Thinking & Reasoning, 14(2), 129-167.
Stanovich, K., West, R., & Toplak, M. (2016). The
Rationality Quotient: Toward A Test of Rational
Thinking. Cambridge, MA: The MIT Press.
Tetlock, P.E. (2005). Expert Political Judgment. Princeton,

NJ: Princeton University Press.
Wagenaar, W.A. (1988). Paradoxes of Gambling Behavior.

Hove, England: Erlbaum.
Webster’s New World Dictionary. (2004). New York, NY:

Wiley Publishing Inc.
Wikipedia. (2010). Common sense. Retrieved from

http://en.wikipedia.org/wiki/Common_sense.
Appendix A
Practice Problems (Osbaldiston, 2011)

Practice analyzing research scenarios
For each of the following brief research descriptions,
determine the IV and DV. Most of the IVs are categorical;
please say what the categories are. Most of the DVs
can be easily operationalized; say how you would measure
the DVs.
1- Researchers studying human memory presented people

with two lists of words. One list included the names of
objects; the other list contained abstract nouns. The
researchers found that people could remember more words
from the list with object names.
IV:
DV:
Operationalization of DV:
2- A group of researchers wanted to determine whether

animals would be slower in learning a maze when they had
been exposed to a particular drug. Half of the animals
received low doses of the drug, and the other half did not
receive the drug. The researchers then counted how many
times the animals had to run through the maze before they
learned it.
IV:
DV:
3- A group of researchers wanted to determine whether
people would eat more food in a cool room than in a warm
room. Half the participants ate in a warm room (75 degrees
Fahrenheit) and half the participants ate in a cool room (65
degrees Fahrenheit). The researchers then measured how
much food was consumed in each of the two rooms.
IV:
DV:
4- Researchers studying plant growth raised plants in two

different rooms. One room had soft music playing 24
hours a day; the other room was silent. The researchers
found that the plants grew better in the room where the
music was played.
IV:
DV:
5- Dr. Wilson sets up an experimental study to investigate

how self-esteem is affected by feedback from teachers.
During the study, third-grade teachers administer a short
quiz where each child earns the same score (5 out of
possible 10 points). Half the children are told that this is a
very good score; while the rest are told that this is an
average score.
IV:
DV:
An answer key is provided at:
http://jamiehalesblog.blogspot.com/2013/11/answer-key-in-evidence-we-trust.html
Osbaldiston, R. (2011). Worksheet- Practice analyzing

research scenarios. Personal Collection of R.
Osbaldiston, Eastern Kentucky University,
Richmond, KY
Practice Problems (Osbaldiston, 2011: Problems 1-8)
Multiple choice: please select the best answer for each of

the following questions.
1- The most important difference between experimental

designs and correlational designs is:
A- experimental designs allow the researcher to use
more powerful statistical techniques
B- correlational designs allow the researcher to use
unobtrusive measures.
C- experimental designs allow the researcher to infer a
causal relationship
D- correlational designs allow the researcher to
randomly assign subjects to treatment groups.
2- There is a rule that describes what percent of people

score in each part of the normal distribution. What is this
rule?
A- + 1, +2, + 3 standard deviations

B- 68-95-99
C- has a value less than .05
D- Type 1 error rate
3- A researcher wanted to see if a new program would have

the same effect on workers paid hourly versus workers paid
on salary. What would be the best research design to use in
this situation?
A- One-way between groups ANOVA

B- One-way within groups ANOVA
C- Correlational, survey of the workers
D- Two-way ANOVA
4- A researcher wanted to see if a new program would have
the same effect on workers paid hourly versus workers paid
on salary. What would be an appropriate outcome variable
for this research?
A- Number of tasks performed during each task

B- Amount of pay workers received
C- Between groups variance
D- Time (or pre-post amount of change)
5- What is the relationship between α, the type 1 error rate,
and β, the type 2 error rate?
A- As α goes up, β goes up
B- As α goes down, β goes up
C- They both depend on the type of research design
D- They are unrelated
6- A confound is:
A- A problem to do with measuring the DV

B- A thing you have to correct for with statistics
C- Hidden IV [Flaw in experiment that may confuse
findings]
D- A measured outcome
7- The logic of an experiment begins with what underlying

situation?
A- Choosing an appropriate sample from the

population of all possible participants
B- Ensuring that the groups are initially equivalent
C- The null hypothesis is that there is no relationship
between the variables
D- Deciding if the procedure will be single-blind or
double-blind
8- A researcher is interested in studying the effects of

fashion trends and popularity. The researcher hypothesizes
that students who wear fashionable clothing are more likely
to be popular among their classmates. She asks each
student to rate his/her own “fashionableness” and to also
say how many friends he/she has at that school. The
relationship between these variables is statistically
significant. What would be an appropriate conclusion for
this study?
A- Fashionable clothing causes popularity

B- No conclusion is warranted because the validity of
the variables is in question
C- Fashionableness and popularity are related
D- Popularity causes students to be perceived as being
more fashionable.
9- External validity is concerned with:
A- determination of cause and effect

B- how well the results generalize to other contexts
C- what type of statistics are used
D- informed consent
10- What is the best type of research method that allows

researchers to determine cause and effect:
A- case study
B- correlational study
C- quasi-experimental study
D- experimental study
Answer key is provided at:
Osbaldiston, R. (2011). Comprehensive exam, Practice

Problems. Personal Collection of R. Osbaldiston,
Eastern Kentucky University, Richmond, KY
Calculating mean, median and mode:
A- Please calculate the mean for the following scores: 2, 4,

4, 6, 8, 16
B- Please calculate the mean for the following scores: 1, 1,

2, 2, 1, 5, 5, 3, 3, 2
C- Please calculate the median for the following scores:
1, 1, 4, 3, 2
D- Please calculate the median for the following scores:
1, 3, 2, 2, 1, 5, 5, 2, 2, 3
E- Please calculate the mode for the following scores: 1, 2,

1, 4, 5
F- Please calculate the mode for the following scores: 1, 2,

1, 1, 3, 3, 2, 2, 2, 2
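One way to check answers like these is with Python's built-in statistics module. The sketch below is only an illustration: the scores are hypothetical (they are not taken from the problems above), and the variable names are assumptions made for this example.

```python
from statistics import mean, median, multimode

# Hypothetical scores chosen for illustration (not from the exercises above).
scores = [3, 5, 5, 7, 10]

average = mean(scores)           # sum of the scores divided by how many there are
middle = median(scores)          # middle value once the scores are sorted
most_common = multimode(scores)  # list holding the most frequent value(s)

print("Mean:", average)
print("Median:", middle)
print("Mode(s):", most_common)
```

When there is an even number of scores, median() returns the average of the two middle values. multimode() returns a list because a set of scores can have more than one mode.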
Answer key is provided here:

Z-scores
Calculate the z-scores (standard scores) for the following:
Suppose you administered a test to a sample of individuals

and computed the mean and standard deviation of the raw
scores:
Mean (M) = 45
Standard deviation (SD) = 4
Suppose that the individuals that took the test had the
following scores:
Person Score
Roberto 49
Dellis 45
Pamelisa 41
Hennis 39
Remember the z-score formula? (Score – mean) / standard
deviation
Calculate the z-score for each of the individuals:
Roberto =
Dellis =
Pamelisa =
Hennis =
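The order of operations matters in the z-score formula: the mean is subtracted from the score first, and the difference is then divided by the standard deviation. A minimal sketch in Python is shown below. The function name z_score and the numbers used are hypothetical, chosen for illustration so the exercise above is left for you to work.

```python
def z_score(score, mean, sd):
    """Return how many standard deviations a score lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    # Subtract first, then divide: (score - mean) / sd
    return (score - mean) / sd


# Hypothetical test with a mean of 50 and a standard deviation of 10.
above = z_score(65, 50, 10)   # a score above the mean gives a positive z-score
below = z_score(40, 50, 10)   # a score below the mean gives a negative z-score
at_mean = z_score(50, 50, 10) # a score equal to the mean gives zero

print(above, below, at_mean)
```

A positive z-score indicates a score above the mean, a negative z-score indicates a score below the mean, and a z-score of zero indicates a score equal to the mean.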
Answer key is provided here:

Appendix B
APA style citation
When describing another researcher's ideas, phrases,

methods, words, measuring tools, or research findings in
your research report, the source needs to be cited by
providing the author’s last name and year of publication.
You can present the last names as part of the sentence and
place only the year of publication in parentheses:
Results by Varner and Wilson (2000) indicate…
According to Wilson and her colleagues (2000)…
Or you can provide in-text documentation by putting the

author name(s) and date in parentheses. An example is
provided below:
One study found that participants ate more rice when it was
presented in a large bowl (Davis & Bush, 2000). With this
example, be sure to use an ampersand (&) and place the
period outside of the parentheses.
In APA-style papers, quoting does not occur often.
However, when quoting, it is necessary to use quotation
marks and to indicate the page number in addition to the
author(s) and year:
“The larger the variety of foods the more an individual will
eat” (Hall, 2011, p. 200).
When a source has one or two authors, you will cite their
names and the date every time you refer to that source. When
a source has three to five authors, you will cite all of the
names and the date the first time. Subsequently, when citing
the source, you will use the first author’s last name
followed by “et al.” and the date:
The primary determinant of weight loss is calorie deficit

(Hall, Jones, Arson, Tarer, & Quick, 2000). A consistent
calorie deficit leads to weight loss (Hall et al., 2000).
When a source has six or more authors, you will cite only
the first author’s last name followed by “et al.” with each
citation:
Scientific cognition involves multiple components (Harkins

et al., 2011)
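The author-count rules described above can be summarized in a short Python sketch. This is only an illustration of the rules as stated in this appendix, not an official APA tool. The function name in_text_citation and its first_mention parameter are assumptions made for this example, and the co-author names in the six-author case are hypothetical placeholders.

```python
def in_text_citation(authors, year, first_mention=True):
    """Build a parenthetical citation from a list of author last names."""
    count = len(authors)
    if count == 0:
        raise ValueError("at least one author is required")
    if count == 1:
        names = authors[0]
    elif count == 2:
        # One or two authors: cite the names every time.
        names = f"{authors[0]} & {authors[1]}"
    elif count <= 5 and first_mention:
        # Three to five authors: list everyone the first time only.
        names = ", ".join(authors[:-1]) + f", & {authors[-1]}"
    else:
        # Later mentions of 3-5 authors, or six or more authors at any time.
        names = f"{authors[0]} et al."
    return f"({names}, {year})"


five = ["Hall", "Jones", "Arson", "Tarer", "Quick"]
# Hypothetical co-author names, used only to reach six authors.
six = ["Harkins", "Able", "Baker", "Clark", "Dunn", "Evans"]

print(in_text_citation(["Davis", "Bush"], 2000))
print(in_text_citation(five, 2000))
print(in_text_citation(five, 2000, first_mention=False))
print(in_text_citation(six, 2011))
```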
APA style reference lists
The reference list contains a list of all the sources you cited
in your paper, in alphabetical order. Do not put sources in
the reference list that were not cited in the paper. Below
is an example of a journal article with one author:
Hastings, D.A. (2012). Determining calorie levels.

Nutrition, 3, 20-25.
In the above example the journal title and volume number

are italicized. Below is an example of a journal article with
two or more authors:
Harlow, K.L., Smith, J.J., Cline, K.E., & Anson, J.A.

(2007). Understanding visual illusions. New Visual
Science, 2, 12-15.
Below is an example of a book:
Hale, J. (2005). MaxCondition. Winchester, KY:

MaxCondition Publishing.
Index
abstract .................... 78, 82, 84, 88, 123, 170, 183, 191, 224
Abstract ........................................................................... 123
actuarial judgment ............................................................. 49
Ad Hominem ..................................................................... 15
alpha level ............................................................... 177, 180
alternative hypothesis........................................ 29, 181, 186
anecdotes ................................................................. 2, 71, 82
Anecdotes .................................................................... 12, 71
ANOVA........................................................... 184, 186, 227
Argumentum ad antiquitatem............................................ 16
Argumentum ad novitatem ............................................... 17
Association claim ............................................................ 167
between-participants ....................................................... 185
Between-participants....................................................... 185
burden of proof ........................................................... 17, 18
causation ............. 13, 14, 37, 38, 71, 78, 165, 167, 193, 204
Causation......................................................... 2, 37, 38, 205
central tendency ........................................ 40, 176, 198, 207
cognitive ability .............. 123, 124, 125, 126, 138, 145, 222
Cognitive ability.............................................................. 125
COGNITIVE ABILITY .................................................. 138
Cohen’s d ......................................................................... 182
Common Myths .................................................................. 3
common sense ................................................34, 35, 36, 117
Common sense .................................................... 34, 36, 223
Common Sense ............................................... 2, 34, 35, 221
concepts....................................... 25, 29, 41, 54, 55, 72, 131
Concepts ............................................................................ 25
confirmation bias .........................................66, 67, 118, 122
Confirmation bias........................................................ 66, 67
construct validity............................................. 171, 172, 205
Construct validity ............................................................ 172
contaminated mindware ................... 118, 119, 121, 122, 141
control group ................................... 164, 165, 171, 199, 206
Control Group ................................................................. 163
control variable ....................................................... 172, 173
correction factor .............................................................. 175
correlation .... 13, 15, 37, 38, 55, 71, 78, 125, 126, 127, 129,
145, 151, 166, 167, 183, 192, 193
Correlation ............................................ 14, 37, 55, 191, 194
correlational studies .......................................................... 37
Correlational studies ......................................................... 38
Correlational Studies ........................................................... 2
counterbalance ................................................................ 205
covariation..................................................... 14, 38, 56, 167
Covariation ................................................................ 56, 193
critical region .................................................................. 180
critical thinking .. 12, 83, 123, 124, 125, 126, 131, 132, 133,
214, 222
Critical thinking .......................................... 3, 131, 133, 134
Critical Thinking ............................................................. 218
cross-sectional designs .................................................... 201
crystallized intelligence .................................................. 127
Crystallized intelligence.................................................. 127
cynic ................................................................................ 8, 9
Cynic ................................................................................... 8
decontextualized reasoning ............................................. 131
dependent variable ............ 23, 166, 174, 182, 194, 202, 209
descriptive statistics ........................................................ 176
Descriptive statistics ................................................. 40, 176
Double Blind ..................................................................... 58
dysrationalia .....................................................117, 146, 217
Dysrationalia ............................................... 3, 128, 146, 217
effect size ........................ 176, 177, 178, 182, 192, 195, 196
Effect size................................................................ 176, 182
epidemiology................................................................... 202
epistemic rationality ...................33, 114, 133, 142, 148, 152
Epistemic rationality ............................... 128, 133, 135, 148
ex post facto design ......................................................... 201
experts ................................................. 12, 18, 43, 45, 50, 69
Experts .................................................................... 2, 43, 46
extreme scores ......................................................... 198, 208
Extreme scores ................................................................ 208
factorial designs ...................................................... 185, 209
Fluid intelligence ............................................................ 127
frequency distribution ............................................. 162, 207
hypotheses ............11, 28, 29, 37, 78, 79, 122, 141, 191, 204
Hypotheses ...................................................... 28, 29, 54, 55
hypothesis ...11, 28, 29, 37, 71, 75, 177, 180, 181, 182, 186,
190, 191, 206, 228
Hypothesis..........................................................................11
inferential statistics ................................................. 176, 190
Inferential statistics ................................... 40, 176, 182, 190
instrumental rationality ................................... 143, 147, 148
Instrumental rationality ........................... 127, 133, 135, 143
instruments .......................................................... 25, 26, 151
Instruments ........................................................................ 25
intelligence 4, 22, 55, 67, 114, 116, 117, 120, 121, 122, 123,
124, 125, 126, 127, 128, 129, 137, 138, 140, 145, 146, 149,
150, 151, 183, 220, 222
Intelligence .. 2, 114, 115, 117, 123, 128, 129, 136, 137, 139,
140, 150, 222
internal validity ...... 13, 15, 38, 56, 164, 165, 166, 168, 172,
173, 193, 194, 199, 200
Internal validity ................................................. 38, 166, 193
Internal Validity .............................................................. 164
interval .................................................... 163, 196, 197, 198
Interval ............................................................................ 198
IQ ............................... 67, 117, 118, 127, 128, 136, 220, 222
IRB board ........................................................................ 171
logical fallacies ........................................................... 34, 70
Longitudinal designs ....................................................... 201
mean ....2, 9, 13, 14, 17, 24, 28, 37, 64, 75, 78, 85, 116, 142,
165, 174, 177, 179, 180, 181, 188, 189, 191, 198, 205, 207,
208, 231, 232
Mean ............................................................................... 232
measurement ....................... 26, 85, 170, 176, 196, 197, 202
Measurement ..................................................................... 26
median ..................................................................... 198, 231
meta-analysis........................................................... 195, 196
Meta-analysis .................................................................. 195
mode.................................................. 21, 198, 199, 207, 231
myside bias.................59, 116, 123, 124, 125, 126, 127, 149
Myside bias ............................................. 124, 125, 126, 127
nominal ................................................... 176, 196, 197, 198
Nominal........................................................................... 198
non-science ....................................................................... 30
nonsense detection kit ......................................................... 3
Nonsense Detection Kit ............................................ 2, 3, 65
nonsense indicator ............................................................. 72
Nonsense indicator .................. 65, 66, 67, 68, 69, 70, 71, 72
null hypothesis .......................... 28, 180, 186, 190, 191, 228
Null hypothesis ............................................................... 181
observation ................................................ 20, 21, 50, 91, 92
Observation ........................................................... 20, 22, 91
operational definition ............................ 25, 29, 66, 169, 170
ordinal ..................................................... 176, 196, 197, 198
Ordinal ............................................................................ 198
parameter................................................................. 174, 175
Parameter ........................................................................ 174
participant effects .................................................... 165, 200
Participant effects ............................................................ 200
peer review .................................................................. 59, 60
Peer review................................................................ 58, 217
Peer Review ................................................................ 58, 59
personal beliefs ................................................................. 68
Person-who statistics ......................................................... 42
Person-Who Statistics ....................................................... 41
pseudoscientific............................................................... 141
Pseudoscientific .............................................................. 122
quasi-independent variable ..................................... 168, 169
random assignment .... 23, 24, 163, 164, 169, 171, 172, 193,
199, 207
Random assignment ................................ 164, 171, 172, 193
random sampling ..................................................... 193, 199
randomized controlled trial ..................................... 206, 207
ratio ................................................................. 196, 197, 198
Ratio ................................................................................ 198
rationality ......3, 4, 5, 33, 114, 116, 120, 123, 126, 127, 128,
129, 130, 132, 133, 135, 137, 139, 140, 141, 142, 143, 145,
146, 147, 148, 150, 151, 152, 153, 219, 221
Rationality .......2, 3, 114, 126, 128, 133, 135, 137, 139, 140,
142, 143, 147, 222
reactivity ......................................................................... 192
Relativist Fallacy .............................................................. 18
reporting .................................................................... 24, 197
Reporting........................................................................... 24
research reports ............................................. 5, 87, 188, 220
Research Reports .................................................. 2, 87, 215
RQ test ............................................................................ 150
RQ Test ....................................................................2, 4, 115
SAT ................................................................. 125, 126, 127
scientific methods ....................................................... 33, 83
Scientific methods ......................................................... 7, 33
Single Blind ...................................................................... 58
skeptic ................................................................... 8, 18, 221
Skeptic............................................................................. 2, 8
SS ............................................................................ 174, 175
standard deviation ........................... 176, 179, 188, 189, 232
Standard deviation .......................................................... 179
standard error .................................................................. 179
Standard error.................................................................. 179
statistical prediction .............. 44, 45, 46, 47, 48, 49, 50, 219
Statistical Prediction ....................................................... 217
statistical validity .................................................... 176, 192
survey research........................................................ 205, 206
systematic empiricism ................................................. 20, 91
Systematic Empiricism ..................................................... 20
testimonials ................................................................. 13, 72
Testimonials ........................................................................ 2
theory .................... 30, 31, 63, 68, 74, 75, 85, 120, 127, 152
Theory ................................................................................11
true independent variable ........................................ 168, 169
Tukey’s HSD ................................................................... 184
two-tailed hypothesis ...................................................... 181
type 1 error ...................................................................... 228
Type 1 error ..................................................................... 190
type 2 error ...................................................................... 228
Type 2 error ..................................................................... 190
within-participants .......................................... 185, 200, 205
Within-participants.......................................................... 185
z-score ..................................................................... 189, 232
About the Author
Jamie Hale, M.S. (Experimental Psychology), is a

university instructor, author, science writer and fitness &
nutrition consultant. He is a researcher specializing in
cognitive behavioral nutrition and cognitive science. He
has conducted primary research in the areas of attention,
memory, scientific cognition, scientific literacy, and
various topics related to eating. He has contributed to
numerous publications (nationally and internationally). He
has authored six books. Jamie is a board member for the
Kentucky Council Against Health Fraud, and a member of
the Kentucky Academy of Science. He is the creator of a
column called the Wide World of Science
(https://skepticalinquirer.org/category/the_wide_world_of_science/)
at the popular website Skeptical Inquirer. He is
director and founder of the websites
www.knowledgesummit.net and www.maxcondition.com.
Polemics Applications, LLC
Kevin Akers, President
polemicsapp@yahoo.com
info@polemicsapps.com
View our catalogue of business, army, music education,

and fitness apps at www.polemicsapps.com
Email us for Custom App Price Quotes!
• Custom Apps for iPhone / iPad / Android
• Like us on Facebook at: www.facebook.com/polemicsapps

