1. Psychological tests aim to measure individual differences accurately, but they are subject to systematic and random errors of measurement that reduce reliability.
2. A fundamental characteristic of psychological tests is that each scale should measure only one psychological trait; the worked scoring example shows why a four-item test that mixes anxiety and sociability items cannot be interpreted from its total score.
3. Both random and systematic errors can affect psychological test scores. Random errors vary unpredictably across people, while systematic errors consistently affect all people in the same way. Sources of error include item selection, test administration, and scoring.
CH 4
The Reliability and Validity of Psychological Tests
Prof. Khalaf Nassar

The concepts of systematic and random errors of measurement are very important when we assess individual differences, and they lead to the important area of psychometrics known as reliability theory. One fundamental and entirely uncontroversial characteristic of psychological tests is that each scale should assess one (and only one) psychological characteristic. Consider a four-item example:

Item 1: I often feel anxious. Yes (2) - Uncertain (1) - No (0)
Item 2: A good, loud party is the best way to celebrate. Yes (2) - Uncertain (1) - No (0)
Item 3: I have been to see my doctor because of nerves. Yes (2) - Uncertain (1) - No (0)
Item 4: I hate being on my own. Yes (2) - Uncertain (1) - No (0)
Items 1 and 3 measure anxiety; items 2 and 4 measure sociability. Because the test measures two distinct concepts, the total score tells us nothing:

Scoring 2, 0, 2, 0 (total 4): anxious and unsociable.
Scoring 0, 2, 0, 2 (total 4): non-anxious and sociable.
Scoring 1, 1, 1, 1 (total 4): moderately anxious and moderately sociable.

Three very different people obtain the same total, so we cannot hope to draw any conclusion from it. We must therefore ensure that all of the items in a particular scale measure one trait.

There is always some error of measurement associated with measures of size, mass or volume (think of a digital kitchen scale when we weigh flour, or a surveyor's tape where 100 measurements are averaged). For physical measurement the sources of error are well known and only a few variables can affect accuracy.

Random error is caused by any factor that randomly affects measurement of the variable across the sample. For instance, each person's mood can inflate or deflate their performance on any occasion. In a particular testing session, some children may be in a good mood and others may be depressed. If mood affects their performance on the measure, it may artificially inflate the observed scores for some children and artificially deflate them for others. The important thing about random error is that it does not have any consistent effect across the entire sample; instead, it pushes observed scores up or down at random. This means that if we could see all of the random errors in a distribution they would sum to zero: there would be as many negative errors as positive ones. Random error adds variability to the data but does not affect average performance for the group. Because of this, random error is sometimes described as noise.
Systematic errors are errors that hold across most or all of the members of a group. Unlike random errors, sources of systematic error will not tend to cancel out when repeated measurements are made under the same conditions.
Systematic error is caused by any factors that systematically affect measurement of the variable across the sample. For instance, if there is loud traffic going by just outside of a classroom where students are taking a test, this noise is likely to affect all of the children's scores -- in this case, systematically lowering them. Unlike random error, systematic errors tend to be consistently either positive or negative -- because of this, systematic error is sometimes considered to be bias in measurement.
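To make the distinction concrete, here is a minimal simulation sketch (not from the original slides; the trait scale, sample size and error magnitudes are invented for illustration) showing that random error leaves the group mean essentially unchanged while adding variability, whereas systematic error shifts every score in the same direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
true_scores = rng.normal(loc=50, scale=10, size=n)    # hypothetical true trait scores

# Random error: varies from person to person, averages out to about zero
random_error = rng.normal(loc=0, scale=5, size=n)
observed_random = true_scores + random_error

# Systematic error: affects everyone the same way (e.g. loud traffic during testing)
observed_systematic = true_scores - 3.0

print(true_scores.mean(), observed_random.mean())       # group means stay close: random error is noise
print(true_scores.std(), observed_random.std())         # spread increases: random error adds variability
print(observed_systematic.mean() - true_scores.mean())  # about -3: systematic error biases every score
```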
Sources of measurement error
- Item selection
- Test administration
- Test scoring

Reducing measurement error
1) Pilot test the instruments, getting feedback from respondents about how easy or hard the measure was and about how the testing environment affected their performance.
2) Train the people who administer the instrument so that they are not unconsciously introducing error.
3) Double-check the data thoroughly; all data entry for computer analysis should be verified.
4) Use statistical procedures to adjust for measurement error.
5) Finally, use multiple measures of the same construct, especially if the different measures do not share the same systematic errors.

Good measurement instruments are those that are little influenced by either random or systematic error. Taking multiple measurements under one set of conditions and averaging the results reduces the impact of random errors; averaging measurements from different instruments will tend to reduce the effects of systematic error.

Measurement error and reliability
Measurement error reduces reliability (repeatability). The assumptions of classical test theory are:
- Measurement errors are random.
- The mean error of measurement is 0.
- True scores and errors are uncorrelated: r(true, error) = 0.
- Errors on different tests are uncorrelated: r(e1, e2) = 0.

Reliability types
- Stability: test-retest, parallel forms.
- Internal consistency: split-halves, coefficient alpha.
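The classical-theory assumptions can be illustrated with a short simulation sketch (not part of the original slides; the score scale and error size are hypothetical): observed score = true score + error, the errors average to zero and are uncorrelated with true scores, and reliability can then be viewed as the proportion of observed-score variance that is true-score variance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000
T = rng.normal(100, 15, size=n)   # true scores on a hypothetical IQ-like scale
E = rng.normal(0, 5, size=n)      # random measurement errors
X = T + E                         # observed scores

print(round(E.mean(), 2))                 # ~0: mean error of measurement is zero
print(round(np.corrcoef(T, E)[0, 1], 2))  # ~0: true scores and errors are uncorrelated
reliability = T.var() / X.var()           # share of observed variance due to true scores
print(round(reliability, 2))              # close to 15**2 / (15**2 + 5**2) = 0.9
```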
The reliability of mental tests

Coefficient alpha (Cronbach's alpha): the square root of coefficient alpha is a very close approximation to the correlation between individuals' scores on a particular mental test and their true scores. Alpha reflects the average size of the correlations between the test items.

Standard error of measurement: SEM = SD x sqrt(1 - alpha).

Other approaches to the measurement of reliability

Split-half reliability: the correlation between the total score based on the odd-numbered test items and the total score based on the even-numbered items, corrected up to the length of the whole test using the Spearman-Brown formula. Now that computers make alpha easy to calculate, there seems to be no good reason to use it today.

Test-retest reliability: also called stability; it checks whether trait scores stay more or less constant over time. It requires that:
- Nothing significant has happened to the participants in the interval between the two tests (e.g. no emotional crises, developmental changes, or significant educational experiences that might affect the trait).
- The test is a good measure of the trait. If a test shows that a child is a genius one month and of average intelligence the next, something is wrong.
- The interval between the first and second administrations is chosen carefully: long enough to minimize the likelihood of people remembering their previous answers, but not so long that developmental changes, learning or other life events alter individuals' positions on the trait.

The problem with test-retest reliability, as compared with alpha, is that it is based on the total score and says nothing about how people perform on individual items. Whereas alpha shows whether a set of items measures some single, underlying trait, a set of items that had nothing in common could still have perfect test-retest reliability.

Parallel-forms reliability: to create two parallel forms of a test, items are administered to a large sample of people, and pairs of items with similar content, difficulty and item discrimination are identified, so that the two versions produce similar distributions of scores.

Types of reliability and how to measure them:
- Test-retest: give the same assessment twice, separated by days, weeks, or months; reliability is the correlation between scores at Time 1 and Time 2 (between 0 and 1). Problems: practice, memory.
- Alternate (parallel) forms: create two forms of the same test (vary the items slightly); reliability is the correlation between scores on Form A and Form B. Problem: equality of the forms.
- Split-half (cf. Cronbach's alpha): split the test items into two groups (odd, even), give the test to the group, and correlate the total of the odd items with the total of the even items; corrected reliability = 2r / (1 + r). Problems: equality of the halves, and the uncorrected coefficient is not for the whole test.
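As a concrete illustration of the formulas above, the following sketch (the person-by-item score matrix is invented; it mimics a short scale scored 0/1/2) computes coefficient alpha, the standard error of measurement SD x sqrt(1 - alpha), and an odd/even split-half coefficient corrected with the Spearman-Brown formula 2r / (1 + r).

```python
import numpy as np

# Hypothetical item scores: rows = respondents, columns = items scored 0/1/2
scores = np.array([
    [2, 2, 1, 2],
    [0, 1, 0, 0],
    [1, 2, 1, 1],
    [2, 1, 2, 2],
    [0, 0, 1, 0],
    [1, 1, 1, 2],
])

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars / total_var)

alpha = cronbach_alpha(scores)
sem = scores.sum(axis=1).std(ddof=1) * np.sqrt(1 - alpha)    # SEM = SD * sqrt(1 - alpha)

odd, even = scores[:, 0::2].sum(axis=1), scores[:, 1::2].sum(axis=1)
r_half = np.corrcoef(odd, even)[0, 1]
split_half = 2 * r_half / (1 + r_half)                       # Spearman-Brown correction

print(round(alpha, 2), round(sem, 2), round(split_half, 2))
```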
Validity

[Slide diagram: types of validity: face; content (logical); criterion, with concurrent and predictive forms; construct, with convergent (theory predicts a relationship) and divergent (theory predicts no relationship) forms.]
Test validity: reliability theory can show whether or not a set of test items seems to measure some underlying trait. What it cannot do is shed any light on the nature of that trait. If we construct a scale or test and believe that this set of items measures a particular trait, there is no guarantee that it actually does so. Even if a set of items appears to form a scale, it is not possible to tell what that scale measures just by looking at the items, or just because we claim that it measures something. Reliability is necessary for a test to be valid, since low reliability implies that the test is not measuring any single trait. However, high reliability by itself does not guarantee validity, since validity depends entirely on how, why, and with whom the test is used.

Face validity: the simplest form of validity. It checks that the test looks as if it measures what it is supposed to measure; essentially, it is a subjective judgement about whether the test items appear to measure what we want them to measure (for example, by asking judges and requiring 80% agreement). Inspecting the content of the items is no guarantee that the test will measure what it is intended to, and a high alpha does not mean that the scale measures the concept it was designed to assess.

Content validity: sometimes it is possible to construct a test that must be valid by definition. For example, when constructing a spelling test, since by definition the dictionary contains the whole domain of items, any procedure that produces a representative sample of words from the dictionary has to be a valid test of spelling ability (content analysis).
Logical validity: ask judges to categorize the items according to the test's dimensions.

Criterion validity: whether the test gives results in agreement with other measures of the same thing. Two types: concurrent validity, the comparison of the new test with an established test; and predictive validity, whether the test predicts some future event (e.g. an intelligence test and later exam results). Obviously, concurrent validity depends on the quality of the established test.

Predictive validity is the extent to which a score on a scale or test predicts scores on some criterion measure. For example, the validity of a cognitive test for job performance is the correlation between test scores and, say, supervisor performance ratings; such a cognitive test would have predictive validity if the observed correlation were statistically significant.
In predictive validity we assess the operationalization's ability to predict something it should theoretically be able to predict. For instance, we might theorize that a measure of math ability should be able to predict how well a person will do in an engineering-based profession. We could give our measure to experienced engineers and see whether there is a high correlation between scores on the measure and their salaries as engineers. A high correlation would provide evidence for predictive validity: it would show that our measure can correctly predict something that we theoretically think it should be able to predict.

Construct validity: a test has construct validity if it accurately measures a theoretical, non-observable construct or trait. Does the measure tap the concept being studied? (Face validity and predictive validity are really types of construct validity.) There is no simple way of establishing construct validity, but it is clearly very important. Often we assess the relationships between items in the test to see whether they all appear to be measuring the same thing.

Divergent validity is the exact opposite of convergent validity. Suppose you are measuring a construct believed to have no relationship to something else. If there really is no relationship, and if your measurement has good construct validity, you would expect scores on your measure to be essentially unrelated to scores on a measure of the divergent construct.
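The convergent/divergent logic can be sketched with simulated data (the construct names, sample size and correlation strengths below are invented): correlate the new measure with an established measure of the same construct, where a high correlation is expected, and with a measure of an unrelated construct, where a near-zero correlation is expected.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
anxiety_new = rng.normal(size=n)                                          # scores on a new anxiety scale
anxiety_established = 0.8 * anxiety_new + rng.normal(scale=0.6, size=n)   # established anxiety scale
unrelated = rng.normal(size=n)                                            # measure of an unrelated construct

convergent = np.corrcoef(anxiety_new, anxiety_established)[0, 1]   # should be high
divergent = np.corrcoef(anxiety_new, unrelated)[0, 1]              # should be near zero
print(round(convergent, 2), round(divergent, 2))
```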
Convergent validity: checks that the test scores relate to other things as expected. Measures of constructs that theoretically should be related to each other are, in fact, observed to be related to each other; that is, you should be able to show a correspondence or convergence between similar constructs. A test has convergent validity if it has a high correlation with another test that measures the same construct. Divergent validity, by contrast, is demonstrated through a low correlation with a test that measures a different, unrelated construct: the new measure should correlate poorly with measures of constructs it is not supposed to tap.

Types of validity, with definitions and examples:
- Content: the extent to which the content of the test matches the instructional objectives. A semester or quarter exam that only includes content covered during the last six weeks is not a valid measure of the course's overall objectives; it has very low content validity.
- Criterion: the extent to which scores on the test are in agreement with (concurrent validity) or predict (predictive validity) an external criterion. If the end-of-year math tests in 4th grade correlate highly with the statewide math tests, they have high concurrent validity.
- Construct: the extent to which an assessment corresponds to other variables, as predicted by some rationale or theory. If you can correctly hypothesize that students with high iman (Islamic faith) have low depression (because theory predicts it), the assessment may have construct validity.