Psych Assessment


PSYCHOLOGICAL TESTING: Principles, Applications, & Issues 8th Edition NOTES & REVIEW PURPOSES

Kaplan & Saccuzzo


BRIEF CONTENTS

PART I: PRINCIPLES
1. Introduction
2. Norms and Basic Statistics for Testing
3. Correlation and Regression
4. Reliability
5. Validity
6. Writing and Evaluating Test Items
7. Test Administration

PART II: APPLICATIONS
8. Interviewing Techniques
9. Theories of Intelligence and the Binet Scales
10. The Wechsler Intelligence Scales: WAIS-IV, WISC-IV, and WPPSI-III
11. Other Individual Tests of Ability in Education and Special Education
12. Standardized Tests in Education, Civil Service, and the Military
13. Applications in Clinical and Counseling Settings
14. Projective Personality Tests
15. Computers and Basic Psychological Science in Testing
16. Testing in Counseling Psychology
17. Testing in Health Psychology and Health Care
18. Testing in Industrial and Business Settings

PART III: ISSUES (summaries only)
19. Test Bias
20. Testing and the Law
21. Ethics and the Future of Psychological Testing
*Part III is mostly U.S.-based.

PART I: PRINCIPLES: C1 Introduction

Basic Concepts

Test: a measurement device or technique used to quantify behavior or aid in the understanding and prediction of behavior.

Item: a specific stimulus to which a person responds overtly.

Psychological test (educational test): a set of items that are designed to measure characteristics of human beings that pertain to behavior. Overt behavior is an individual's observable activity. Covert behavior takes place within an individual and cannot be directly observed.

Scales: relate raw scores on test items to some defined theoretical or empirical distribution.

Psychological testing: refers to all the possible uses, applications, and underlying concepts of psychological and educational tests. The main use of these tests, though, is to evaluate individual differences or variations among individuals.

Reliability: refers to the accuracy, dependability, consistency, or repeatability of test results.

Validity: refers to the meaning and usefulness of test results.

Test administration: the act of giving a test.

Interview: a method of gathering information through verbal interaction, such as direct questions.

Types of Tests

Individual tests: the examiner or test administrator (the person giving the test) gives the test to only one person at a time.

Group tests: administered to more than one person at a time by a single examiner.

Ability tests: contain items that can be scored in terms of speed, accuracy, or both.
 Achievement tests: tests measuring previous learning.
 Aptitude tests: tests measuring the potential for learning or acquiring a specific skill.
 Intelligence tests: tests measuring a person's general potential to solve problems, adapt to changing circumstances, think abstractly, and profit from experience.

Personality tests: tests related to the overt and covert dispositions of the individual.
 Structured personality tests: provide a statement, usually of the "self-report" variety, and require the subject to choose between two or more alternative responses.
 Projective personality tests: provide an ambiguous test stimulus; response requirements are unclear.

Historical Perspective

China
 Had a relatively sophisticated civil service testing program more than 4000 years ago (DuBois, 1970, 1972).
 Han Dynasty (206 B.C.E. to 220 C.E.): the use of test batteries was quite common. These early tests related to such diverse topics as civil law, military affairs, agriculture, revenue, and geography.
 Ming Dynasty (1368–1644 C.E.): a national multistage testing program involved local and regional testing centers equipped with special testing booths.

Charles Darwin
 The Origin of Species, 1859.
 Higher forms of life evolved partially because of differences among individual forms of life within a species; the best, most adaptive characteristics survive at the expense of those that are less fit, and the survivors pass their characteristics on to the next generation.
 Life has evolved to its currently complex and intelligent levels.

Sir Francis Galton
 Argued that some people possessed characteristics that made them more fit than others; Hereditary Genius, 1869.
 Showed that individual differences exist in human sensory and motor functioning, such as reaction time, visual acuity, and physical strength.
James McKeen Cattell
 Coined the term "mental test."
 Built on Galton's work on individual differences in reaction time.
 Perpetuated and stimulated the forces that ultimately led to the development of modern tests.

J. E. Herbart: mathematical models of the mind.

E. H. Weber: attempted to demonstrate the existence of a psychological threshold, the minimum stimulus necessary to activate a sensory system.

G. T. Fechner: devised the law that the strength of a sensation grows as the logarithm of the stimulus intensity.

Wilhelm Wundt: set up a laboratory at the University of Leipzig in 1879 and is credited with founding the science of psychology.

Wundt was succeeded by E. B. Titchener, whose student, G. Whipple, recruited L. L. Thurstone.

Whipple: provided the basis for immense changes in the field of testing by conducting a seminar at the Carnegie Institute in 1919.

Thus, psychological testing developed from at least two lines of inquiry: one based on the work of Darwin, Galton, and Cattell on the measurement of individual differences, and the other (more theoretically relevant and probably stronger) based on the work of the German psychophysicists Herbart, Weber, Fechner, and Wundt. Experimental psychology developed from the second line of inquiry.

The Evolution of Intelligence and Standardized Achievement Tests

Binet-Simon Scale
 1905: contained 30 items of increasing difficulty and was designed to identify intellectually subnormal individuals.
 1908: the Binet-Simon Scale also determined a child's mental age.
 1916: revision by L. M. Terman of Stanford University.

World War I
 Military recruitment.
 Large-scale group testing.
 Robert Yerkes: headed a committee of distinguished psychologists who soon developed two structured group tests of human abilities: the Army Alpha (required reading ability) and the Army Beta (measured the intelligence of illiterate adults).

Achievement Tests
 In contrast to essay tests, standardized achievement tests provide multiple-choice questions that are standardized on a large sample to produce norms against which the results of new examinees can be compared.
 Advantages: the relative ease of administration and scoring and the lack of subjectivity or favoritism that can occur in essay or other written tests.
 1923: Stanford Achievement Test by T. L. Kelley, G. M. Ruch, and L. M. Terman.
 1930s: it was widely held that the objectivity and reliability of these standardized tests made them superior to essay tests.

Personality Tests

Traits: relatively enduring dispositions (tendencies to act, think, or feel in a certain manner in any given circumstance) that distinguish one individual from another.

Woodworth Personal Data Sheet: an early structured personality test that assumed that a test response can be taken at face value; developed during World War I and published in final form just after the war.

The Rorschach Inkblot Test: a highly controversial projective test that provided an ambiguous stimulus (an inkblot) and asked the subject what it might be.

The Thematic Apperception Test (TAT): a projective test that provided ambiguous pictures and asked subjects to make up a story; by Henry Murray and Christiana Morgan in 1935.

The Minnesota Multiphasic Personality Inventory (MMPI): a structured personality test that made no assumptions about the meaning of a test response. Such meaning was to be determined by empirical research.

Factor analysis: a method of finding the minimum number of dimensions (characteristics, attributes), called factors, to account for a large number of variables.

J. P. Guilford: made the first serious attempt to use factor analytic techniques in the development of a structured personality test.

The California Psychological Inventory (CPI): a structured personality test developed according to the same principles as the MMPI.

The Sixteen Personality Factor Questionnaire (16PF): a structured personality test based on the statistical procedure of factor analysis; R. B. Cattell.

During the 1980s, 1990s, and 2000s, several major branches of applied psychology emerged and flourished: neuropsychology, health psychology, forensic psychology, and child psychology. Because each of these important areas of psychology makes extensive use of psychological tests, psychological testing again grew in status and use.
PART I: PRINCIPLES: C2: Norms and Basic Statistics for Testing

Why do we need statistics?
1. Statistics are used for purposes of description. Numbers provide convenient summaries and allow us to evaluate some observations relative to others.
2. We can use statistics to make inferences, which are logical deductions about events that cannot be observed directly.

Descriptive statistics: methods used to provide a concise description of a collection of quantitative information.

Inferential statistics: methods used to make inferences from observations of a small group of people (sample) to a larger group of individuals (population).

MEASUREMENT: assigning numbers to objects.

Properties of Scales

1. Magnitude
 The property of "moreness."
 A particular instance of the attribute represents more, less, or equal amounts of the given quantity than does another instance. Ex: height: taller, shorter.
2. Equal Intervals
 The difference between two points at any place on the scale has the same meaning as the difference between two other points that differ by the same number of scale units. Ex: ruler: inches.
 A psychological test rarely has the property of equal intervals. (Ex: IQ levels and their meanings per level)
 The relationship between the measured units and some outcome can be described by a straight line or a linear equation in the form Y = a + bX.
3. Absolute Zero
 Obtained when nothing of the property being measured exists. Ex: a heart rate of 0.
 For many psychological qualities, it is extremely difficult, if not impossible, to define an absolute 0 point. Ex: measuring and defining "0" shyness on a scale of 0 to 10.

TYPE OF SCALE | Magnitude | Equal Intervals | Absolute 0
Nominal       | No        | No              | No
Ordinal       | Yes       | No              | No
Interval      | Yes       | Yes             | No
Ratio         | Yes       | Yes             | Yes

Types of Scales

Nominal scales:
 are really not scales at all; their only purpose is to name objects.
 used when the information is qualitative rather than quantitative.
 Social science researchers commonly label groups in sample surveys with numbers (such as 1 = African American, 2 = white, and 3 = Mexican American).

Ordinal scale:
 has the property of magnitude but not equal intervals or an absolute 0.
 ranks individuals or objects but does not say anything about the meaning of the differences between the ranks. Ex: height; IQ.

Interval scale:
 has the properties of magnitude and equal intervals but not absolute 0.
 Ex: the measurement of temperature in degrees Fahrenheit.

Ratio scale:
 has all three properties (magnitude, equal intervals, and an absolute 0).
 Ex: speed of travel, 0 miles per hour (mph).

Frequency distribution: displays scores on a variable or a measure to reflect how frequently each value was obtained; defines all the possible scores and determines how many people obtained each of those scores.

Percentile rank
 "What percent of the scores fall below a particular score (Xi)?"
 To calculate a percentile rank, you need only follow these simple steps: (1) determine how many cases fall below the score of interest, (2) determine how many cases are in the group, (3) divide the number of cases below the score of interest (Step 1) by the total number of cases in the group (Step 2), and (4) multiply the result of Step 3 by 100. The formula is Pr = (B / N) × 100.

Percentiles
 specific scores or points within a distribution.
 divide the total frequency for a set of observations into hundredths.
 indicate the particular score below which a defined percentage of scores falls.

Mean
 the arithmetic average score in a distribution.
 total the scores and divide the sum by the number of cases.
 sigma (Σ) means summation.

Variance: the averaged squared deviation around the mean.

Standard deviation
 approximates the average deviation around the mean.
 the square root of the averaged squared deviation around the mean.

Z score
 the difference between a score and the mean, divided by the standard deviation.
 transforms data into standardized units that are easier to interpret.

Normal distribution
 known as a symmetrical binomial probability distribution.
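The percentile-rank formula Pr = (B / N) × 100 and the Z-score definition above translate directly into a few lines of code. A minimal Python sketch; the scores and function names are illustrative, not from the text:

    def percentile_rank(scores, xi):
        """Pr = (B / N) * 100, where B = cases below Xi and N = total cases."""
        below = sum(1 for s in scores if s < xi)   # Step 1: cases below the score of interest
        n = len(scores)                            # Step 2: total number of cases
        return below / n * 100                     # Steps 3 and 4

    def z_score(scores, xi):
        """Z = (Xi - mean) / standard deviation."""
        n = len(scores)
        mean = sum(scores) / n
        variance = sum((s - mean) ** 2 for s in scores) / n   # averaged squared deviation
        sd = variance ** 0.5                                  # standard deviation
        return (xi - mean) / sd

    scores = [60, 75, 82, 82, 90, 95]
    print(percentile_rank(scores, 82))    # 33.33...: two of the six scores fall below 82
    print(round(z_score(scores, 90), 2))  # about 0.83 standard deviations above the mean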
McCall's T
 the mean is 50 rather than 0 and the standard deviation is 10 rather than 1.
 T = 10Z + 50

Quartiles: points that divide the frequency distribution into equal fourths.

Deciles: similar to quartiles except that they use points that mark 10% rather than 25% intervals.

Stanine system: this system converts any set of scores into a transformed scale, which ranges from 1 to 9. The scale is standardized to have a mean of 5 and a standard deviation of approximately 2.

Norms
 refer to the performances by defined groups on particular tests.
 The norms for a test are based on the distribution of scores obtained by some defined sample of individuals.
 The mean is a norm, and the 50th percentile is a norm. Norms are used to give information about performance relative to what has been observed in a standardization sample.

Tracking: tendency to stay at about the same level relative to one's peers.

Norm-referenced test: compares each person with a norm.

Criterion-referenced test: describes the specific types of skills, tasks, or knowledge that the test taker can demonstrate, such as mathematical skills.

PART I: PRINCIPLES: C3: Correlation and Regression

Correlation coefficient: a mathematical index that describes the direction and magnitude of a relationship. Three hypothetical relationships:
 positive correlation: high scores on X go with high scores on Y.
 negative correlation: high scores on X go with low scores on Y (and vice versa).
 no correlation.

Regression: used to make predictions about scores on one variable from knowledge of scores on another variable.

Regression line: defined as the best-fitting straight line through a set of points in a scatter diagram. It is found by using the principle of least squares, which minimizes the squared deviation around the regression line.

Regression coefficient (b): the slope of the regression line. The regression coefficient can be expressed as the ratio of the sum of squares for the covariance to the sum of squares for X.

Sum of squares: defined as the sum of the squared deviations around the mean.

Covariance: used to express how much two measures covary, or vary together.

Slope: describes how much change is expected in Y each time X increases by one unit.

Intercept (a): the value of Y when X is 0; the point at which the regression line crosses the Y axis.

Residual: the difference between the observed and predicted score (Y − Y′).

The best-fitting line: keeps residuals to a minimum; minimizes the deviation between observed and predicted Y scores.

Principle of least squares: the best-fitting line is obtained by keeping these squared residuals as small as possible.

Pearson product moment correlation coefficient: a ratio used to determine the degree of variation in one variable that can be estimated from knowledge about variation in the other variable. The correlation coefficient can take on any value from −1.0 to 1.0.

Degrees of freedom (df): defined as the sample size minus two, or N − 2.

Regression plots: pictures that show the relationship between variables.

Spearman's rho: a method of correlation for finding the association between two sets of ranks. The rho coefficient is easy to calculate and is often used when the individuals in a sample can be ranked on two variables but their actual scores are not known or do not have a normal distribution.

Dichotomous variables: have only two levels. Examples are yes–no, correct–incorrect, and male–female.

True dichotomous variables: naturally form two categories. Ex: gender.

Artificially dichotomous variables: reflect an underlying continuous scale forced into a dichotomy. Ex: pass or fail on a test.

Biserial correlation: expresses the relationship between a continuous variable and an artificial dichotomous variable.

Point biserial correlation: finds the association between a true dichotomous (two-choice) variable and a continuous variable.

Phi coefficient: used when both variables are dichotomous and at least one of the dichotomies is "true."

Tetrachoric correlation: used if both dichotomous variables are artificial.

Variable Y              | X: Continuous    | X: Artificial Dichotomous | X: True Dichotomous
Continuous              | Pearson r        | Biserial r                | Point biserial r
Artificial Dichotomous  | Biserial r       | Tetrachoric r             | Phi
True Dichotomous        | Point biserial r | Phi                       | Phi
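The least-squares quantities defined above (slope as a ratio of sums of squares, intercept, Pearson r) can be computed straight from their definitions. A small Python sketch under those definitions; the data and variable names are invented for illustration, and the closing comment shows the McCall's T transformation:

    def least_squares(x, y):
        """Slope b = sum of cross-products / sum of squares for X; intercept a = mean(Y) - b * mean(X)."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cross = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))   # covariance numerator
        ssx = sum((xi - mx) ** 2 for xi in x)                        # sum of squares for X
        b = cross / ssx                                              # regression coefficient (slope)
        a = my - b * mx                                              # intercept
        return a, b

    def pearson_r(x, y):
        """r = sum of cross-products / sqrt(SSx * SSy); ranges from -1.0 to 1.0."""
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        cross = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        ssx = sum((xi - mx) ** 2 for xi in x)
        ssy = sum((yi - my) ** 2 for yi in y)
        return cross / (ssx * ssy) ** 0.5

    x = [1, 2, 3, 4, 5]
    y = [2, 4, 5, 4, 6]
    a, b = least_squares(x, y)
    print(round(a, 2), round(b, 2))   # 1.8 and 0.8: intercept and slope of the best-fitting line
    print(round(pearson_r(x, y), 2))  # 0.85: strong positive correlation
    # A McCall's T score is built from a Z score the same way: T = 10 * z + 50.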
Standard error of estimate
 the standard deviation of the residuals.
 a measure of the accuracy of prediction. Prediction is most accurate when the standard error of estimate is relatively small. As it becomes larger, the prediction becomes less accurate.

Coefficient of determination: the correlation coefficient squared; this value tells us the proportion of the total variation in scores on Y that we know as a function of information about X.

Coefficient of alienation: a measure of nonassociation between two variables.

Shrinkage: the amount of decrease observed when a regression equation is created for one population and then applied to another.

Cross validation: use the regression equation to predict performance in a group of subjects other than the ones to which the equation was applied. Then a standard error of estimate can be obtained for the relationship between the values predicted by the equation and the values actually observed.

Third variable: an external influence.

Multivariate analysis: considers the relationship among combinations of three or more variables.

Multiple regression: a type of multivariate analysis; finds the linear combination of the (three) variables that provides the best prediction of (law school success).

Discriminant analysis: a multivariate method; finds the linear combination of variables that provides a maximum discrimination between categories.

Factor analysis: used to study the interrelationships among a set of variables without reference to a criterion.

PART I: PRINCIPLES: C4: Reliability

Error: implies that there will always be some inaccuracy in our measurements.

Reliable: tests that are relatively free of measurement error.

Abraham De Moivre: introduced the basic notion of sampling error, 1733.

Karl Pearson: developed the product moment correlation, 1896.

Charles Spearman: worked out most of the basics of contemporary reliability theory and published his work in a 1904 article entitled "The Proof and Measurement of Association between Two Things."

Classical Test Score Theory
 Classical test score theory assumes that each person has a true score that would be obtained if there were no errors in measurement. However, because measuring instruments are imperfect, the score observed for each person almost always differs from the person's true ability or characteristic. The difference between the true score and the observed score results from measurement error: X (Observed Score) = T (True Score) + E (Error).
 Or we can say that the difference between the score we obtain and the score we are really interested in equals the error of measurement: X − T = E.
 A major assumption in classical test theory is that errors of measurement are random.
 Assumes that the true score for an individual will not change with repeated applications of the same test.
 Because of random error, however, repeated applications of the same test can produce different scores. Theoretically, the standard deviation of the distribution of errors for each person tells us about the magnitude of measurement error.
 Standard error of measurement: because we usually assume that the distribution of random errors will be the same for all people, classical test theory uses the standard deviation of errors as the basic measure of error.
 Classical test theory requires that exactly the same test items be administered to each person.

The Domain Sampling Model
 another central concept in classical test theory.
 considers the problems created by using a limited number of items (sample) to represent a larger and more complicated construct (population).
 conceptualizes reliability as the ratio of the variance of the observed score on the shorter test and the variance of the long-run true score.
 the greater the number of items, the higher the reliability.
 To estimate reliability, we can create many randomly parallel tests by drawing repeated random samples of items from the same domain. Ex: spelling test – items – dictionary.

Item Response Theory (IRT)
 the computer is used to focus on the range of item difficulty that helps assess an individual's ability level. For example, if the person gets several easy items correct, the computer might quickly move to more difficult items. If the person gets several difficult items wrong, the computer moves back to the area of item difficulty where the person gets some items right and some wrong.
 The overall result is that a more reliable estimate of ability is obtained using a shorter test with fewer items.
 The method requires a bank of items that have been systematically evaluated for level of difficulty.
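Returning to the classical model X = T + E described above, a tiny simulation makes the reliability-as-variance-ratio idea concrete. A Python sketch; the score distributions and standard deviations are made up purely for illustration:

    import random

    random.seed(1)

    # Simulate X = T + E for 1,000 examinees: true scores plus random error with mean 0.
    true_scores = [random.gauss(100, 15) for _ in range(1000)]   # T
    errors = [random.gauss(0, 5) for _ in range(1000)]           # E, random measurement error
    observed = [t + e for t, e in zip(true_scores, errors)]      # X = T + E

    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    # Reliability in the classical model: ratio of true-score variance to observed-score variance.
    reliability = variance(true_scores) / variance(observed)
    print(round(reliability, 2))   # close to 15**2 / (15**2 + 5**2) = 0.90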
Reliability Models

Reliability coefficient: the ratio of the variance of the true scores on a test to the variance of the observed scores.

Time Sampling: The Test–Retest Method
 Test–retest reliability estimates are used to evaluate the error associated with administering a test at two different times.
 This type of analysis is of value only when we measure "traits" or characteristics that do not change over time.
 Carryover effect: occurs when the first testing session influences scores from the second session.

Item Sampling: Parallel Forms Method (Equivalent Forms)
 compares two equivalent forms of a test that measure the same attribute. The two forms use different items; however, the rules used to select items of a particular difficulty level are the same.
 When both forms are given on the same day, the only sources of variation are random error and the difference between the forms of the test; when they are given at different times, error associated with time sampling is also included in the estimate of reliability.

Split-Half Method
 a test is given and divided into halves that are scored separately. The results of one half of the test are then compared with the results of the other.
 If the test is long, the best method is to divide the items randomly into two halves; a first-half/second-half split can cause problems when items on the second half of the test are more difficult than items on the first half.
 Odd-even system: one subscore is obtained for the odd-numbered items in the test and another for the even-numbered items.
 Spearman-Brown formula: allows you to estimate what the correlation between the two halves would have been if each half had been the length of the whole test.

Kuder-Richardson 20 (KR20)
 the formula for calculating the reliability of a test in which the items are dichotomous, scored 0 or 1 (right or wrong).
 Covariance occurs when the items are correlated with each other.
 Formula 21, or KR21, is a special case of the reliability formula that does not require the calculation of the p's and q's for every item.

Cronbach's Coefficient Alpha
 the most general method of finding estimates of reliability through internal consistency.
 used when there are no right or wrong answers on a test.
 Factor analysis is one popular method for dealing with the situation in which a test apparently measures several different characteristics.

Difference score
 created by subtracting one test score from another. This might be the difference between performances at two points in time.
 it is most convenient to find difference scores by first creating Z scores for each measure and then finding the difference between them (score 2 − score 1).

Reliability in Behavioral Observation Studies
 Behavioral observation studies look so simple that they seem to have no psychometric problems, but they have many sources of error. Because psychologists cannot always monitor behavior continuously, they often take samples of behavior at certain time intervals. Under these circumstances, sampling error must be considered.
 frequently unreliable because of discrepancies between true scores and the scores recorded by the observer.
 reliability estimates have various names — interrater, interscorer, interobserver, or interjudge reliability; all consider the consistency among different judges who are evaluating the same behavior.

Kappa statistic
 the best method for assessing the level of agreement among several observers.
 a measure of agreement between two judges who each rate a set of objects using nominal scales. J. Cohen (1960).
 Kappa indicates the actual agreement as a proportion of the potential agreement following correction for chance agreement.
 Values of kappa may vary between 1 (perfect agreement) and −1 (less agreement than can be expected on the basis of chance alone).

Values            | Interpretation
Greater than 0.75 | Excellent agreement
0.40 – 0.75       | Fair to good (satisfactory) agreement
Less than 0.40    | Poor agreement

Sources of Error
 Time sampling: the same test given at different points in time may produce different scores, even if given to the same test takers.
 Item sampling: the same construct or attribute may be assessed using a wide pool of items.
 Internal consistency: refers to the intercorrelations among items within the same test.
 Observer differences: even though they have the same instructions, different judges observing the same event may record different numbers.

Source of Error      | Example                                            | Method                            | How Assessed
Time sampling        | Same test given at two points in time              | Test–retest                       | Correlation between scores obtained on the two occasions
Item sampling        | Different items used to assess the same attribute  | Alternate or parallel forms       | Correlation between equivalent forms of the test that have different items
Internal consistency | Consistency of items within the same test          | 1. Split-half  2. KR20  3. Alpha  | Corrected correlation between two halves of the test
Observer differences | Different observers recording the same behavior    | Kappa statistic                   | —
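The Spearman-Brown correction, coefficient alpha (with KR20 as its special case for 0/1 items), and Cohen's kappa each reduce to a short formula. A hedged Python sketch using their standard textbook forms; the toy data and function names are illustrative only:

    def spearman_brown(r_half):
        """Estimated full-length reliability from the correlation between two halves."""
        return 2 * r_half / (1 + r_half)

    def variance(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    def coefficient_alpha(item_scores):
        """Alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores).
        With dichotomous 0/1 items the item variances are p*q, which gives KR20."""
        k = len(item_scores)                                   # item_scores: one list per item
        totals = [sum(person) for person in zip(*item_scores)]
        item_var_sum = sum(variance(item) for item in item_scores)
        return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

    def cohens_kappa(rater1, rater2):
        """Agreement corrected for chance: (observed - expected) / (1 - expected)."""
        n = len(rater1)
        categories = set(rater1) | set(rater2)
        observed = sum(a == b for a, b in zip(rater1, rater2)) / n
        expected = sum((rater1.count(c) / n) * (rater2.count(c) / n) for c in categories)
        return (observed - expected) / (1 - expected)

    # Five examinees answering three dichotomous items (each row is one item).
    items = [[1, 1, 0, 1, 0],
             [1, 0, 0, 1, 0],
             [1, 1, 0, 1, 1]]
    print(round(coefficient_alpha(items), 2))                     # 0.79: KR20 for these right/wrong items
    print(round(spearman_brown(0.74), 2))                         # a half-test r of .74 projects to about .85
    print(round(cohens_kappa(list("AABBA"), list("ABBBA")), 2))   # 0.62: fair-to-good agreement between two judges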
Standard Errors

Remember that psychologists working with unreliable tests are like carpenters working with rubber yardsticks that stretch or contract and misrepresent the true length of a board. However, just as all rubber yardsticks are not equally inaccurate, all psychological tests are not equally inaccurate. The standard error of measurement allows us to estimate the degree to which a test provides inaccurate readings; that is, it tells us how much "rubber" there is in a measurement. The larger the standard error of measurement, the less certain we can be about the accuracy with which an attribute is measured. Conversely, a small standard error of measurement tells us that an individual score is probably close to the measured value.

How Reliable Is Reliable?
 The answer depends on the use of the test.
 Reliability estimates in the range of .70 to .80 are good enough for most purposes in basic research.
 Reliabilities greater than .95 are not very useful because they suggest that all of the items are testing essentially the same thing and that the measure could easily be shortened.
 A test of skill at using the multiplication tables for one-digit numbers would be expected to have an especially high reliability. Tests of complex constructs, such as creativity, might be expected to be less reliable.
 In clinical settings, high reliability is extremely important. When tests are used to make important decisions about someone's future, evaluators must be certain to minimize any error in classification; evaluators should attempt to find a test with a reliability greater than .95.
 Standard error of measurement: the wider the interval, the lower the reliability of the score. Using the standard error of measurement, we can say that we are 95% confident that a person's true score falls between two values.

What to Do About Low Reliability
 Increase the length of the test (but consider fatigue).
 Throw out items that run down the reliability.
 Estimate what the true correlation would have been if the test did not have measurement error.

Tests are most reliable if they are unidimensional. This means that one factor should account for considerably more of the variance than any other factor. Items that do not load on this factor might be best omitted.

Discriminability analysis
 Item analysis examines the correlation between each item and the total score for the test.
 A low correlation indicates that the item drags down the estimate of reliability and should be excluded.

Correction for attenuation
 estimates what the correlation between two measures would have been if they had not been measured with error.
 These methods "correct" for the attenuation in the correlations caused by the measurement error.
 To use the methods, one needs to know only the reliabilities of the two tests and the correlation between them.
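Both the standard error of measurement and the correction for attenuation are one-line formulas once the reliabilities are known. A small Python sketch using their standard classical-test-theory forms (the chapter describes them only verbally); all numbers are invented for illustration:

    def standard_error_of_measurement(sd, reliability):
        """SEM = SD * sqrt(1 - reliability)."""
        return sd * (1 - reliability) ** 0.5

    def confidence_interval_95(observed_score, sd, reliability):
        """Roughly 95% confidence band: observed score +/- 1.96 * SEM."""
        sem = standard_error_of_measurement(sd, reliability)
        return observed_score - 1.96 * sem, observed_score + 1.96 * sem

    def correct_for_attenuation(r_xy, rel_x, rel_y):
        """Estimated correlation if both measures were error-free: r / sqrt(rel_x * rel_y)."""
        return r_xy / (rel_x * rel_y) ** 0.5

    print(round(standard_error_of_measurement(15, 0.90), 2))    # 4.74 for an IQ-style scale
    print(confidence_interval_95(110, 15, 0.90))                # roughly 100.7 to 119.3
    print(round(correct_for_attenuation(0.40, 0.70, 0.80), 2))  # an observed .40 rises to about .53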
PART I: PRINCIPLES: C5: Validity

Validity
 the agreement between a test score or measure and the quality it is believed to measure.
 defined as the answer to the question, "Does the test measure what it is supposed to measure?"
 Validity is the evidence for inferences made about a test score. There are three types of evidence: (1) construct-related, (2) criterion-related, and (3) content-related. (Standards)

Face validity
 the mere appearance that a measure has validity. We often say a test has face validity if the items seem to be reasonably related to the perceived purpose of the test.
 "looks like" it is valid. These appearances can help motivate test takers because they can see that the test is relevant.

Content validity evidence
 determines whether a test has been constructed adequately; logical rather than statistical.
 Construct underrepresentation describes the failure to capture important components of a construct.
 Construct-irrelevant variance occurs when scores are influenced by factors irrelevant to the construct. (Ex: anxiety)

Criterion validity evidence
 tells how well a test corresponds with a particular criterion.
 Such evidence is provided by high correlations between a test and a well-defined criterion measure.
 Criterion: the standard against which the test is compared.
 The reason for gathering criterion validity evidence is that the test or measure is to serve as a "stand-in" for the measure we are really interested in.

Predictive validity evidence: a type or form of criterion validity evidence that reflects the forecasting function of tests.

Concurrent-related evidence: assessments of the simultaneous relationship between the test and the criterion; applies when the test and the criterion can be measured at the same time. (Ex: job samples)

Validity Coefficient
 the relationship between a test and a criterion; a correlation.
 This coefficient tells the extent to which the test is valid for making statements about the criterion.
 One rarely sees a validity coefficient larger than .60; ranges of .30 to .40 are commonly considered high.
 A coefficient is statistically significant if the chances of obtaining its value by chance alone are quite small: usually less than 5 in 100.

Evaluating Validity Coefficients
 Look for Changes in the Cause of Relationships. The logic of criterion validation presumes that the causes of the relationship between the test and the criterion will still exist when the test is in use.
 What Does the Criterion Mean? Criterion-related validity studies mean nothing at all unless the criterion is valid and reliable.
 Review the Subject Population in the Validity Study. Another reason to be cautious of validity coefficients is that the validity study might have been done on a population that does not represent the group to which inferences will be made.
 Be Sure the Sample Size Was Adequate. Another problem to look for is a validity coefficient that is based on a small number of cases. Sometimes a proper validity study cannot be done because there are too few people to study.
 Never Confuse the Criterion with the Predictor.
 Check for Restricted Range on Both Predictor and Criterion. A variable has a "restricted range" if all scores for that variable fall very close together.
 Review Evidence for Validity Generalization. Criterion-related validity evidence obtained in one situation may not be generalized to other similar situations. Generalizability refers to the evidence that the findings obtained in one situation can be generalized, that is, applied to other situations. This is an issue of empirical study rather than judgment.
 Consider Differential Prediction. Predictive relationships may not be the same for all demographic groups.

Construct: defined as something built by mental synthesis.

Construct validity evidence
 established through a series of activities in which a researcher simultaneously defines some construct and develops the instrumentation to measure it.
 This process is required when "no criterion or universe of content is accepted as entirely adequate to define the quality to be measured."
 Construct validation involves assembling evidence about what a test means. This is done by showing the relationship between a test and other tests and measures. Each time a relationship is demonstrated, one additional bit of meaning can be attached to the test. Over a series of studies, the meaning of the test gradually begins to take shape.

Convergent Evidence
 obtained when a measure correlates well with other tests believed to measure the same construct.
 Measures of the same construct converge, or narrow in, on the same thing.
 Convergent evidence is obtained in one of two ways. In the first, we show that a test measures the same things as other tests used for the same purpose. In the second, we demonstrate specific relationships that we can expect if the test is really doing its job.

Discriminant Evidence (Divergent Validation)
 a demonstration of uniqueness.
 To demonstrate discriminant evidence for validity, a test should have low correlations with measures of unrelated constructs, or evidence for what the test does not measure.
 Indicates that the measure does not represent a construct other than the one for which it was devised.
 Convergent and discriminant studies actually correlate the tests with many different criteria. "All validation is one, and in a sense all is construct validation." (Lee Cronbach, 1980)

Criterion-Referenced Tests
 The procedures for establishing the validity of a criterion-referenced test resemble those for studying the validity of any other test.
 Criterion-referenced tests have items that are designed to match certain specific instructional objectives.
 Validity studies for criterion-referenced tests would compare scores on the test to scores on other measures that are believed to be related to the test.
 The idea of comparing an individual with himself or herself rather than to the norms of a group remains appealing.

Relationship Between Reliability and Validity
 Attempting to define the validity of a test will be futile if the test is not reliable.
 Theoretically, a test should not correlate more highly with any other variable than it correlates with itself.
 Sometimes we cannot demonstrate that a reliable test has meaning. In other words, we can have reliability without validity. However, it is logically impossible to demonstrate that an unreliable test is valid.

A test that is reliable can be invalid. A test that is unreliable is invalid.

PART I: PRINCIPLES: C6: Writing and Evaluating Test Items

Item Writing: DeVellis (2012)
1. Define clearly what you want to measure.
2. Generate an item pool.
3. Avoid exceptionally long items.
4. Keep the level of reading difficulty appropriate for those who will complete the scale.
5. Avoid "double-barreled" items that convey two or more ideas at the same time.
6. Consider mixing positively and negatively worded items.

Item Formats

The Dichotomous Format
 offers two alternatives for each item. Usually a point is given for the selection of one of the alternatives (true-false); advantages are ease of construction and ease of scoring.
 to be reliable, a true-false test must include many items.
 Overall, dichotomous items tend to be less reliable, and therefore less precise, than some of the other item formats.

The Polytomous Format (polychotomous)
 resembles the dichotomous format except that each item has more than two alternatives. Typically, a point is given for the selection of one of the alternatives, and no point is given for selecting any other choice. (Multiple-choice tests)
 Distractors: incorrect choices; well-chosen distractors are an essential ingredient of good items.
 Guessing threshold: describes the chances that a low-ability test taker will obtain each score.
 Essay exams can be evaluated using the same principles used for structured tests.
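The note that a true-false test must include many items to be reliable can be illustrated with the general Spearman-Brown prophecy formula, the lengthened-test form of the split-half correction mentioned in Chapter 4. A Python sketch; this generalized formula and the numbers below are standard textbook material used here for illustration, not quoted from these notes:

    def prophecy(reliability, length_factor):
        """Projected reliability when a test is lengthened by the given factor,
        assuming the new items behave like the existing ones."""
        return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

    # A short 10-item true-false quiz with reliability .50:
    for factor in (1, 2, 4, 8):
        print(factor * 10, "items ->", round(prophecy(0.50, factor), 2))
    # 10 items -> 0.5, 20 -> 0.67, 40 -> 0.8, 80 -> 0.89: more items, higher reliability,
    # which is why dichotomous formats need many items to reach acceptable reliability.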
The Likert Format
 requires that a respondent indicate the degree of agreement with a particular attitudinal question.
 consists of items such as "I am afraid of heights." Instead of a yes-no reply, alternatives are offered: strongly disagree, disagree, neutral, agree, and strongly agree (5 choices).
 Instead of allowing the respondent to be neutral, responses might be strongly disagree, moderately disagree, mildly disagree, mildly agree, moderately agree, and strongly agree (6 choices).

The Category Format
 a technique that is similar to the Likert format but that uses an even greater number of choices.
 On a scale from 1 to 10; the scale need not have exactly 10 points and can have either more or fewer categories.
 However, experiments have shown that responses to items on 10-point scales are affected by the groupings of the people or things being rated; this problem can be avoided if the endpoints of the scale are clearly defined and the subjects are frequently reminded of the definitions of the endpoints.
 Increasing the number of choices beyond nine can reduce reliability (an element of randomness enters; with many alternatives, discrimination between fine-grained choices becomes unclear).
 Visual analogue scale: using this method, the respondent is given a 100-millimeter line and asked to place a mark between two well-defined endpoints.

Checklists and Q-Sorts

Adjective checklist: a subject receives a long list of adjectives and indicates whether each one is characteristic of himself or herself.

Q-sort: can be used to describe oneself or to provide ratings of others; a subject is given statements and asked to sort them into nine piles.

Item Analysis
 A general term for a set of methods used to evaluate test items; one of the most important aspects of test construction.
 The basic methods involve assessment of item difficulty and item discriminability.

Item difficulty: defined by the number of people who get a particular item correct.
 For example, if 84% of the people taking a particular test get item 24 correct, then the difficulty level for that item is .84. Some people have suggested that these proportions do not really indicate item "difficulty" but item "easiness": the higher the proportion of people who get the item correct, the easier the item.
 The optimal difficulty level for items is usually about halfway between 100% of the respondents getting the item correct and the level of success expected by chance alone.
 Items should have a variety of difficulty levels because a good test discriminates at many levels.
 Items in the difficulty range of .30 to .70 tend to maximize information about the differences among individuals.

Item discriminability: determines whether the people who have done well on particular items have also done well on the whole test.
 The Extreme Group Method: compares people who have done well with those who have done poorly on a test.
 Discrimination index: the difference between these proportions (the proportion of people in each group who got each item correct).
 The Point Biserial Method: another way to examine the discriminability of items is to find the correlation between performance on the item and performance on the total test; a correlation between a dichotomous (two-category) variable and a continuous variable.
 This correlation is evaluated the same way as the extreme group discriminability index. The closer the value of the index is to 1.0, the better the item.

Item characteristic curve
 On these individual item graphs, the total test score is plotted on the horizontal (X) axis and the proportion of examinees who get the item correct is plotted on the vertical (Y) axis.
 The total test score is used as an estimate of the amount of a "trait" possessed by individuals. Because we can never measure traits directly, the total test score is the best approximation we have.
 Thus, the relationship between performance on the item and performance on the test gives some information about how well the item is tapping the information we want.
 In summary, item analysis breaks the general rule that increasing the number of items makes a test more reliable. When bad items are eliminated, the effects of chance responding can be eliminated and the test can become more efficient, reliable, and valid.

Item Response Theory (IRT)
 makes extensive use of item analysis.
 Each item on a test has its own item characteristic curve that describes the probability of getting each particular item right or wrong given the ability level of each test taker. With the computer, items can be sampled, and the specific range of items where the test taker begins to have difficulty can be identified.
 IRT is now widely used in many areas of applied research, and there are specialized applications for specific problems such as the measurement of self-efficacy, psychopathology, industrial psychology, and health.

External Criteria
 Item analysis has been persistently plagued by researchers' continued dependence on internal criteria, or total test score, for evaluating items. You can use similar procedures to compare performance on an item with performance on an external criterion.
 The advantages of using external rather than internal criteria against which to validate items were outlined by Guttman (1950) more than 60 years ago. Nevertheless, external criteria are rarely used in practice.
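The item-analysis statistics defined above (item difficulty, the extreme-group discrimination index, and the point-biserial correlation) can all be computed from a simple 0/1 response matrix. A Python sketch with a toy data set; the data and function names are invented for illustration:

    def item_difficulty(item_responses):
        """Proportion of examinees who answered the item correctly (higher = easier)."""
        return sum(item_responses) / len(item_responses)

    def discrimination_index(item_responses, total_scores):
        """Extreme group method: proportion correct in the top half minus the bottom half."""
        order = sorted(range(len(total_scores)), key=lambda i: total_scores[i])
        half = len(order) // 2
        low, high = order[:half], order[-half:]
        p_high = sum(item_responses[i] for i in high) / half
        p_low = sum(item_responses[i] for i in low) / half
        return p_high - p_low

    def point_biserial(item_responses, total_scores):
        """Correlation between a right/wrong (0/1) item and the continuous total score."""
        n = len(total_scores)
        mean_total = sum(total_scores) / n
        sd_total = (sum((t - mean_total) ** 2 for t in total_scores) / n) ** 0.5
        p = item_difficulty(item_responses)
        mean_correct = sum(t for r, t in zip(item_responses, total_scores) if r == 1) / sum(item_responses)
        return (mean_correct - mean_total) / sd_total * (p / (1 - p)) ** 0.5

    # Eight examinees: responses to one item (1 = correct) and their total test scores.
    item = [1, 1, 1, 0, 1, 0, 0, 0]
    totals = [20, 18, 17, 15, 14, 12, 10, 8]
    print(item_difficulty(item))                   # 0.5: half the group answered correctly
    print(discrimination_index(item, totals))      # 0.5: more correct answers in the top half
    print(round(point_biserial(item, totals), 2))  # 0.78: the item tracks the total score well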
Items for Criterion-Referenced Tests
 compares performance with some clearly defined criterion for learning. This approach is popular in individualized instruction programs; a criterion-referenced test would be used to determine whether the objective had been achieved.
 involves clearly specifying the objectives by writing clear and precise statements about what the learning program is attempting to achieve.
 **The bottom of the V-shaped polygon is the antimode, or the least frequent score. This point divides those who have been exposed to the unit from those who have not been exposed and is usually taken as the cutting score or point, or what marks the point of decision. When people get scores higher than the antimode, we assume that they have met the objective of the test. When they get lower scores, we assume they have not.

Limitations of Item Analysis

The main problem is this: though statistical methods for item analysis tell the test constructor which items do a good job of separating students, they do not help the students learn. Young children do not care as much about how many items they missed as they do about what they are doing wrong. Many times children make specific errors and will continue to make them until they discover why they are making them.

PART I: PRINCIPLES: C7: Test Administration

The Examiner and the Subject

The Relationship Between Examiner and Test Taker. Both the behavior of the examiner and his or her relationship to the test taker can affect test scores; examiners should be aware that their rapport with test takers can influence the results. They should also keep in mind that rapport might be influenced by subtle processes such as the level of performance expected by the examiner.

The Race of the Tester. Because of concern about bias, the effects of the tester's race have generated considerable attention. Some groups feel that their children should not be tested by anyone except a member of their own race. It is important to emphasize that, despite some suggestions that racial bias in IQ testing can be managed, serious racial bias persists in our culture.

Language of Test Taker. The amount of linguistic demand can put non-English speakers at a disadvantage. Even for tests that do not require verbal responses, it is important to consider the extent to which test instructions assume that the test taker understands English. The standards emphasize that some tests are inappropriate for people whose knowledge of the language is questionable. The standard is that, for test takers who are proficient in two or more languages, the test should be given in the language that the test takers feel is their best.

Training of Test Administrators. Different assessment procedures require different levels of training.

Expectancy Effects. Often called Rosenthal effects; data sometimes can be affected by what an experimenter expects to find. Rosenthal argued that the expectancy effect results from subtle nonverbal communication between the experimenter and the subject. The experimenter may not even be aware of his or her role in the process.

Effects of Reinforcing Responses. Because reinforcement affects behavior, testers should always administer tests under controlled conditions; reward can significantly affect test performance. The effects of random feedback are rather severe, causing depression, low motivation for responding, and inability to solve problems. This condition is known as learned helplessness.

Inexperienced test administrators often do not fully appreciate the importance of standardization in administration. Whether they give a test or supervise others who do, they must consider that the test may not remain reliable or valid if they deviate from the specified instructions.

Computer-Assisted Test Administration. Interactive testing involves the presentation of test items on a computer terminal or personal computer and the automatic recording of test responses. The computer offers many advantages in test administration, scoring, and interpretation, including ease of application of complicated psychometric issues and the integration of testing and cognitive psychology (& bias).

Mode of Administration. A variety of studies have considered the difference between self-administered measures and those that are administered by a tester or a trained interviewer. For studies of psychiatric disability, the mode of asking the questions makes a difference. In educational testing, it is less clear that the mode of test administration has a strong impact.

Subject Variables. A final variable that may be a serious source of error is the state of the subject. Motivation and anxiety can greatly affect test scores. Ex: test anxiety; illness.

Behavioral Assessment Methodology

Measurement goes beyond the application of psychological tests. Many assessment procedures involve the observation of behavior.

Reactivity. Reliability and accuracy are highest when someone is checking on the observers.

Drift. When trained in behavioral observation methods, observers receive extensive feedback and coaching. After they leave the training sessions, though, observers have a tendency to drift away from the strict rules they followed in training and to adopt idiosyncratic definitions of behavior.

Expectancies. The impact of expectancy is subtle. It probably has some minor biasing effect on behavioral data. The finding that expectancy bias occurs significantly in some studies but not others is consistent with the notion that expectancy produces a minor but potentially damaging effect. To avoid this sort of bias, observers should not know what behavior to expect.

Deception. Most people feel confident that they can accurately judge other people. Systematic studies show that
most people do a remarkably poor job in detecting a liar. The detection of lying and honesty has become a major industry.

Statistical Control of Rating Errors. Attempts to increase rater reliability through extended training have been particularly frustrating for many researchers and applied psychologists because training is expensive and time-consuming. The halo effect is the tendency to ascribe positive attributes independently of the observed behavior. Some psychologists have argued that this effect can be controlled through partial correlation, in which the correlation between two variables is found while variability in a third variable is controlled.

PART II: APPLICATIONS: C8: Interviewing Techniques

The Interview as a Test

Similarities Between an Interview and a Test
 Method for gathering data
 Used to make predictions
 Evaluated in terms of reliability
 Evaluated in terms of validity
 Group or individual
 Structured or unstructured

Like all tests, an interview has a defined purpose. Furthermore, just as the person who gives a test must take responsibility for the test-administration process, so the interviewer must assume responsibility for the conduct of the interview.

Reciprocal Nature of Interviewing

Although there are many types and purposes of interviews, all share certain factors. First, all interviews involve mutual interaction whereby the participants are interdependent, that is, they influence each other. Interview participants also affect each other's mood (social facilitation).

The Principles of Effective Interviewing

The Proper Attitudes. Good interviewing is actually more a matter of attitude than skill. Experiments in social psychology have shown that interpersonal influence (the degree to which one person can influence another) is related to interpersonal attraction (the degree to which people share a feeling of understanding, mutual respect, similarity, and the like). Attitudes related to good interviewing skills include warmth, genuineness, acceptance, understanding, openness, honesty, and fairness. To appear effective and establish rapport, the interviewer must display the proper attitudes.

Responses to Avoid. As a rule, making interviewees feel uncomfortable tends to place them on guard, and guarded or anxious interviewees tend to reveal little information about themselves. If the goal is to elicit as much information as possible or to receive a good rating from the interviewee, then interviewers should avoid certain responses, including judgmental or evaluative statements, probing statements, hostility, and false reassurance.

AVOID: judgmental or evaluative statements; probing statements; hostile responses; false reassurance.

Effective Responses. One major principle of effective interviewing is keeping the interaction flowing. The interview is a two-way process; one person speaks first, then the other, and so on. Except in structured interviews or for a particular purpose, one can effectively initiate the interview process by using an open-ended question. A closed-ended question brings the interview to a dead halt, thus violating the principle of keeping the interaction flowing. The open-ended question requires the interviewee to produce something spontaneously; the closed-ended question, to recall something.

Responses to Keep the Interaction Flowing. After asking the open-ended question, the interviewer as a rule lets the interviewee respond without interruption; that is, the interviewer remains quiet and listens. He or she should use minimum effort to maintain the flow, such as using a transitional phrase such as "Yes," "And," or "I see." To make such a response, the interviewer may use any of the following types of statements: verbatim playback, paraphrasing, restatement, summarizing, clarifying, and understanding. Even more powerful is the empathy or understanding response. One good way to accomplish this involves what we call understanding statements. To establish a positive atmosphere, interviewers begin with an open-ended question followed by understanding statements that capture the meaning and feeling of the interviewee's communication.

Measuring Understanding. Attempts to measure understanding or empathy originated with Carl Rogers's seminal research into the effects of client-centered therapy.
 Level-one responses bear little or no relationship to the interviewee's response.
 A level-two response communicates a superficial awareness of the meaning of a statement. The individual who makes a level-two response never quite goes beyond his or her own limited perspective.
 A level-three response is interchangeable with the interviewee's statement; it is the minimum level of responding that can help the interviewee. Paraphrasing, verbatim playback, clarification statements, and restatements are all examples of level-three responses.
 Level-four and level-five responses not only provide accurate empathy but also go beyond the statement given. In a level-four response, the interviewer adds "noticeably" to the interviewee's response; in a level-five response, the interviewer adds "significantly" to it.

Active listening: the foundation of good interviewing skills for many different types of interviews; the power of the understanding response.

Types of Interviews
 Evaluation Interview: a confrontation is a statement that points out a discrepancy or inconsistency. Direct questions
can be used toward the end of the interview to fill in any needed details or gaps in the interviewer’s knowledge.
 Structured Clinical Interviews: provide a specific set of questions presented in a particular order; specified set of rules for probing so that, as in a standardized test, all interviewees are handled in the same manner; offer reliability but sacrifice flexibility
 Case History Interview: a biographical sketch; a chronology of major events in the person’s life, a work history, a medical history, and a family history; a developmental approach, examining an individual’s entire life, beginning with infancy or the point at which the given type of history is first relevant.
 Mental Status Examination: used primarily to diagnose psychosis, brain damage, and other major mental health problems. Its purpose is to evaluate a person suspected of having neurological or emotional problems in terms of variables known to be related to these problems; these include the person’s appearance, attitudes, and general behavior.

Developing Interviewing Skills
 The first step is to become familiar with research and theory on the interview in order to understand the principles and underlying variables in the interview.
 A second step in learning such skills is supervised practice. Experience truly is the best teacher.
 As a third step, one must make a conscious effort to apply the principles involved in good interviewing, such as guidelines for keeping the interaction flowing.

Sources of Error in the Interview

Interview Validity
 Many sources of interview error come from the extreme difficulty we have in making accurate, logical observations and judgments.
 halo effects occur when the interviewer forms a favorable or unfavorable early impression.
 general standoutishness. One prominent characteristic can bias the interviewer’s judgments and prevent an objective evaluation.
 cross-ethnic, cross-cultural, and cross-class interviewing
 the highly structured interview continues to be the most effective means of eliminating, or at least reducing, bias
 The safest approach is to consider interview data as tentative: a hypothesis or a set of hypotheses to be confirmed by other sources of data. Results from standardized tests are meaningless if not placed in the context of case history or other interview data. The two go together, each complementing the other, each essential in the process of evaluating human beings.

Interview Reliability
 inter-interviewer agreement (agreement between two or more interviewers).
 unstructured interviews have low levels of reliability
 in terms of adverse impact, interviews give fairer outcomes than many other widely used selection tools.
 because interview procedures vary considerably in their degree of standardization in terms of interview development, administration, and/or scoring. Simply put, different interviewers look for different things, an argument echoed by others.

PART II: APPLICATIONS:
C9: Theories of Intelligence and the Binet Scales

Defining Intelligence
 Alfred Binet: defined intelligence as “the tendency to take and maintain a definite direction; the capacity to make adaptations for the purpose of attaining a desired end, and the power of autocriticism”
 Spearman: defined intelligence as the ability to educe either relations or correlates.
 Freeman: intelligence is “adjustment or adaptation of the individual to his total environment,” “the ability to learn,” and “the ability to carry on abstract thinking”
 Das: defined intelligence as “the ability to plan and structure one’s behavior with an end in view”
 H. Gardner: defined intelligence as the ability “to resolve genuine problems or difficulties as they are encountered”
 Sternberg: defined intelligence as “mental activities involved in purposive adaptation to, shaping of, and selection of real-world environments relevant to one’s life”
 Anderson: intelligence is two-dimensional and based on individual differences in information-processing speed and executive functioning influenced by inhibitory processes.
 More recent views depict intelligence as a blend of abilities including personality and various aspects of memory.
 T. R. Taylor: identified three independent research traditions that have been employed to study the nature of human intelligence: the psychometric (examines the elemental structure of a test), the information-processing (examines the processes that underlie how we learn and solve problems), and the cognitive approach (focuses on how humans adapt to real-world demands)

Formal intelligence testing began with a decision of a French minister of public instruction around the turn of the 20th century. In 1904, the French minister officially appointed a commission, to which he gave a definite assignment: to recommend a procedure for identifying so-called subnormal (intellectually limited) children. Alfred Binet had demonstrated his qualifications for the job by his earlier research on human abilities.

Binet’s Principles of Test Construction
Binet defined intelligence as the capacity (1) to find and maintain a definite direction or purpose, (2) to make necessary adaptations—that is, strategy adjustments—to achieve that purpose, and (3) to engage in self-criticism so that necessary adjustments in strategy can be made.

Principle 1: Age Differentiation
 Age differentiation refers to the simple fact that one can differentiate older children from younger children by the former’s greater capabilities.
 a function increases in age; one could determine the equivalent age capabilities of a child independent of his or her chronological age. This equivalent age capability was eventually called mental age.
Principle 2: General Mental Ability
 general mental ability; the total product of the various separate and distinct elements of intelligence
 could restrict the search for tasks to anything related to the total or the final product of intelligence; could judge the value of any particular task in terms of its correlation with the combined result (total score) of all other tasks.

Spearman’s Model of General Mental Ability
 intelligence consists of one general factor (g) plus a large number of specific factors
 general mental ability; psychometric g (or simply g)
 positive manifold; phenomenon that when a set of diverse ability tests are administered to large unbiased samples of the population, almost all of the correlations are positive; resulted from the fact that all tests, no matter how diverse, are influenced by g (the analogy of a central power station for a large metropolitan city)
 factor analysis: a method for reducing a set of variables or scores to a smaller number of hypothetical variables called factors; can determine how much variance a set of tests or scores has in common; this common variance represents the g factor.
 as a general rule, approx. half of the variance in a set of diverse mental-ability tests is represented in the g factor.

Implications of General Mental Intelligence (g)
 The concept of general intelligence implies that a person’s intelligence can best be represented by a single score, g, that presumably reflects the shared variance underlying performance on a diverse set of tests.
 Differences in unique ability stemming from the specific task tend to cancel each other, and overall performance comes to depend most heavily on the general factor.

The gf-gc Theory of Intelligence
Fluid intelligence (f): can best be thought of as those abilities that allow us to reason, think, and acquire new knowledge
Crystallized intelligence (c): represents the knowledge and understanding that we have acquired

The Early Binet Scales
Binet and T. Simon collaborated to develop the first version of what would eventually be called the Stanford-Binet Intelligence Scale.

The 1905 Binet-Simon Scale
 an individual intelligence test consisting of 30 items presented in an increasing order of difficulty.
 Idiot described the most severe form of intellectual impairment, imbecile moderate levels of impairment, and moron the mildest level of impairment.
 lacked an adequate measuring unit to express results; it also lacked adequate normative data and evidence to support its validity. (norms for the 1905 scale were based on only 50 children who had been considered normal based on average school performance)

The 1908 Scale
 age scale, which means items were grouped according to age level rather than simply one set of items of increasing difficulty
 little effort to diversify the range of abilities tapped
 A subject’s mental age was based on his or her performance compared with the average performance of individuals in a specific chronological age group.

Terman’s Stanford-Binet Intelligence Scale
H. H. Goddard: published a translation of the 1905 Binet-Simon scale in 1908, and the 1908 scale in 1911
L. M. Terman: directed the 1916 Stanford-Binet version that flourished and served for quite some time as the dominant intelligence scale for the world.

The 1916 Stanford-Binet Intelligence Scale
Terman’s 1916 revision increased the size of the standardization sample. Unfortunately, the entire standardization sample of the 1916 revision consisted exclusively of white, native-Californian children.

The Intelligence Quotient (IQ) (Stern)
 used a subject’s mental age in conjunction with his or her chronological age to obtain a ratio score. This ratio score presumably reflected the subject’s rate of mental development; IQ = MA/CA × 100.

The 1937 Scale
 age range down to the 2-year-old level; maximum possible mental age to 22 years, 10 months.
 Standardization sample came from 11 U.S. states representing a variety of regions.
 Inclusion of an alternate equivalent; Forms L and M were designed to be equivalent in terms of difficulty & content.
 A major problem with the 1937 scale was that its reliability coefficients were higher for older subjects than for younger ones. Thus, results for the latter were not as stable as those for the former.

The 1960 Stanford-Binet Revision & Deviation IQ (SB-LM)
 tried to create a single instrument by selecting the best from the two forms of the 1937 scale.
 Tasks that showed an increase in the percentage passing with an increase in age were retained, as were tasks that correlated highly with scores as a whole.
 deviation IQ was simply a standard score with a mean of 100 and a standard deviation of 16

The Modern Binet Scale
g (general intelligence) (reflects the common variability of all tasks)
Crystallized abilities (reflect learning—the realization of original potential through experience)
Fluid-analytic abilities (represent original potential, or the basic capabilities that a person uses to acquire crystallized abilities)
Short-term memory (refers to one’s memory during short intervals—the amount of information one can retain briefly after a single, short presentation)
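A short worked example may help tie together the ratio IQ (IQ = MA/CA × 100) and the deviation IQ (a standard score with a mean of 100) described above. The sketch below uses Python and invented numbers purely for illustration; it is not part of any scale's scoring procedure.

    # Ratio IQ (Stern): IQ = MA / CA * 100
    mental_age = 10          # hypothetical: performs like an average 10-year-old
    chronological_age = 8    # hypothetical actual age
    ratio_iq = mental_age / chronological_age * 100
    print(ratio_iq)          # 125.0 -> mental development running ahead of chronological age

    # Deviation IQ (1960 SB-LM): standard score with mean 100 and standard deviation 16.
    # z is how many standard deviations the examinee falls above the mean of his or her age group.
    z = 1.0                  # hypothetical standard score
    deviation_iq = 100 + 16 * z
    print(deviation_iq)      # 116.0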
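The positive manifold, and the rule of thumb that roughly half the variance in a diverse battery reflects g, can also be illustrated with a small simulation. This is a toy sketch with randomly generated data under a simple one-factor assumption; it is not an analysis of any real test battery.

    import numpy as np

    # Simulate six diverse ability tests that all draw on a shared general factor (g).
    rng = np.random.default_rng(0)
    n_people, n_tests = 1000, 6
    g = rng.normal(size=(n_people, 1))                # general factor
    specifics = rng.normal(size=(n_people, n_tests))  # test-specific factors
    scores = g + specifics                            # every test is influenced by g

    r = np.corrcoef(scores, rowvar=False)
    # Positive manifold: every pairwise correlation comes out positive.
    print(bool((r[np.triu_indices(n_tests, k=1)] > 0).all()))   # True

    # Share of total variance captured by the largest factor of the correlation matrix,
    # a rough stand-in for the common variance attributed to g (a bit over half here).
    eigvals = np.linalg.eigvalsh(r)
    print(round(eigvals.max() / eigvals.sum(), 2))              # roughly 0.55-0.60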
Thurstone: intelligence could best be conceptualized as PART II: APPLICATIONS: The Wechsler–Bellevue scale (1939), was poorly
comprising independent factors, or “primary mental abilities.” standardized. By 1955, however, Wechsler had revised the
C10: The Wechsler Intelligence Scales: WAIS-IV, Wechsler–Bellevue scale into its modern form, the Wechsler
Characteristics of the 1986 Revision Adult Intelligence Scale (WAIS), which was revised in 1981
WISC-IV, and WPPSI-III
 age scale format was entirely eliminated; items with the (the WAIS-R), again in 1997 (the WAIS-III), and yet again in
same content were placed together into any one of 15 2008 (WAIS-IV).
separate tests to create point scales. The Wechsler Intelligence Scales Scales, Subtests, and Indexes
 there are now five rather than four main factors. Each
factor, in turn, has an equally weighted nonverbal and  inappropriateness of the 1937 Binet scale as a measure of  Wechsler defined intelligence as the capacity to act
verbal measure. the intelligence of adults purposefully and to adapt to the environment; intelligence
 examiner–subject rapport was often impaired when adults is “the aggregate or global capacity of the individual
Fluid Nonverbal Matrices Tasks were tested to act purposefully, to think rationally and to deal
Reasoning Verbal Analogies
(FR)  emphasis on speed, with timed tasks scattered throughout effectively with his environment”; implies that
Knowledge Nonverbal Recognize the scale, tended to unduly handicap older adults. intelligence comprises several specific interrelated
(KN) Absurdities in  mental age norms clearly did not apply to adults. functions or elements and that general intelligence results
Pictures  it did not consider that intellectual performance could from the interplay of these elements.
Verbal Vocabulary  Index: created where two or more subtests are related to
General deteriorate as a person grew older.
Quantitative Nonverbal Quantitative
Intelligence Reasoning Reasoning
a basic underlying skill
Two of the most critical differences were:
(QR) Verbal Verbal Quantitative
Subtests
Reasoning 1. Wechsler’s use of the point scale concept rather than an
Visual/Spatial Nonverbal Form Board Vocabulary
age scale and Verbal Similarities
Reasoning Verbal Positions and
(VS) Directions 2. Wechsler’s inclusion of a nonverbal performance scale. comprehension Information
Working Nonverbal Block Pattern Picture completion
Memory Memory The Point Scale Concept
Perceptual Block design
(WM) Verbal Sentence Memory organization Matrix reasoning
 credits or points are assigned to each item
 By arranging items according to content and assigning a Arithmetic
Characteristics of the 2003 Fifth Edition specific number of points to each item, Wechsler Working memory Digit span
constructed an intelligence test that yielded not only a total Letter–number sequencing
 The fifth edition represents an elegant integration of the
overall score but also scores for each content area. Digit symbol–coding
age-scale and point-scale formats.
Processing speed Symbol search
 two routing measures (subtests): one nonverbal, one The Performance Scale Concept
verbal; point scale, which means that each contains  The Vocabulary Subtest: provides a relatively stable
similar content of increasing difficulty.  consisted of tasks that require a subject to do something estimate of general verbal intelligence
 The purpose of the routing tests is to estimate the rather than merely answer questions
 The Similarities Subtest: consists of paired items of
examinee’s level of ability  both the verbal and performance scales were increasing difficulty, identifying similarities
 the start point : The estimated level of ability standardized on the same sample, and the results of both
 The Arithmetic Subtest: contains approximately 15
 the basal : the level at which a minimum criterion scales were expressed in comparable units.
relatively simple problems in increasing order of difficulty.
number of correct responses is obtained  attempts to overcome biases caused by language, culture, (concentration, motivation, and memory)
 ceiling: which is a certain number of incorrect and education.  The Digit Span Subtest: requires the subject to repeat
responses that indicate the items are too difficult.  provide the clinician with a rich opportunity to observe digits, given at the rate of one per second, forward and
 Uses a standard deviation of 15 for IQ and factor scores behavior in a standard setting. backward; measures short-term auditory memory

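The routing logic described in the fifth-edition bullets above (start point, basal, ceiling) can be sketched in a few lines of code. The run lengths used below (two consecutive passes for a basal, four consecutive failures for a ceiling) are hypothetical placeholders chosen for illustration, not the published Stanford-Binet criteria.

    # Toy adaptive-administration loop. Items are assumed to be ordered by difficulty,
    # and responses[i] is True if the examinee would answer item i correctly.
    def administer(responses, start_point, basal_run=2, ceiling_run=4):
        basal = None
        ceiling = None
        streak_right = streak_wrong = 0
        for i in range(start_point, len(responses)):
            if responses[i]:
                streak_right += 1
                streak_wrong = 0
                if basal is None and streak_right >= basal_run:
                    basal = i        # minimum criterion of correct responses reached
            else:
                streak_wrong += 1
                streak_right = 0
                if streak_wrong >= ceiling_run:
                    ceiling = i      # items have become too difficult; stop testing
                    break
        return basal, ceiling

    # An examinee routed to item 10 who passes until the items become too hard:
    simulated = [True] * 14 + [False] * 6
    print(administer(simulated, start_point=10))   # (11, 17)

A fuller version would also move downward from the start point when no basal is established; the sketch keeps only the upward pass for brevity.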
 The Information Subtest: intellective & nonintellective Pattern analysis: one evaluates relatively large differences PART II: APPLICATIONS:
components, including abilities to comprehend between subtest scaled scores
C11: Other Individual Tests of Ability in Education and
instructions, follow directions, provide a response.
Psychometric Properties of the Wechsler Adult Scale Special Education
 The Comprehension Subtest: what should be done in a
given situation; to provide a logical explanation for some  The WAIS-III standardization sample consisted of a Seguin Form Board Test: It consisted of a simple form
rule or phenomenon; to define proverbs; measures stratified sample 2200 adults divided into 13 age groups board with objects of various shapes placed in appropriately
judgment in everyday practical situations, common sense. from 16:00 through 90:11 as well as 13 specialty groups; shaped holes (such as squares or circles); primarily to
 The Letter–Number Sequencing Subtest: made up of stratified according to gender, race, education, and evaluate mentally retarded adults and emphasized speed of
items in which the individual is asked to reorder lists of geographic region performance.
numbers and letters.  The impressive reliability coefficients for the WAIS-IV
 The Digit Symbol–Coding Subtest: requires the subject Healy–Fernald Test: was developed as an exclusively
attest to the internal and temporal reliability of the four
to copy symbols; Measures such factors as ability to learn nonverbal test for adolescent delinquents; t provided several
index scores, and full-scale IQ
types of tasks, rather than just one, and there was less
an unfamiliar task, visual-motor dexterity, degree of  The Wechsler tests are considered among the most valid.
persistence, and speed of performance emphasis on speed.
 The Block Design Subtest: The subject must arrange the The WISC-IV
Knox: developed a battery of performance tests for non-
blocks to reproduce increasingly difficult designs. ;
 The latest version of this scale to measure global English-speaking adult immigrants to the United States. The
requires the subject to reason, analyze spatial test was one of the first that could be administered without
intelligence and, in an attempt to mirror advances in the
relationships, & integrate visual &motor functions. language. Speed was not emphasized.
understanding of intellectual properties; provides
 The Matrix Reasoning Subtest: the subject is presented
composite index
with nonverbal, figural stimuli. The task is to identify a Infant Scales
 use of empirical data to identify item biases
pattern or relationship between the stimuli.
 standardization sample consisted of 2200 children Brazelton Neonatal Assessment Scale (BNAS)
 The Symbol Search Subtest: the subject is shown two
 scaled scores are calculated from raw scores on the basis
target geometric figures. The task is to search from among
of norms at each age level, just as in the WAIS-IV.  is an individual test for infants between 3 days and 4
a set of five additional search figures &determine whether weeks of age; purportedly provides an index of a
the target appears in the search group. The WPPSI-III newborn’s competence
 scores are obtained in a variety of areas,neurological,
Index Scores  a downward extension of the WISC-IV for measuring social, and behavioral aspects of a newborn’s functioning;
intelligence in the youngest children (2.5 years to 7 years,
 Verbal comprehension index: measure of crystallized reflexes, responses to stress, startle reactions, cuddliness,
3 months). animal pegs, an optional test that is timed and motor maturity, ability to habituate to sensory stimuli, and
intelligence.
requires
 Perceptual reasoning index: measure of fluid intelligence. hand–mouth coordination
 the child to place a colored cylinder into an appropriate  lack of norms; failure to predict future intelligence; is
 Working memory: the information that we actively hold in
hole in front of the picture of an animal; and (2) sentences, extremely well constructed.
our minds, in contrast to our stored knowledge, or long-
an optional test of immediate recall in which the child is
term memory
asked to repeat sentences presented orally by the Gesell Developmental Schedules (GDS)
 Processing speed index: attempts to measure how quickly
examiner.
your mind works.  (aka Gesell Maturity Scale, the Gesell Developmental
Observation, and the Yale Tests of Child Development) is
FSIQs
one of the oldest and most established infant intelligence
 obtained by summing the age-corrected scaled scores all measures.
four index composites; a deviation IQ with a mean of 100  developmental status of children from 2.3 months to 6.3
and a standard deviation of 15 is obtained. The FSIQ years of age.; that human development unfolds in stages
represents a measure of general intelligence. or in sequences over time; norms

 developmental quotient (DQ) is determined according to global scales called sequential processing, simultaneous Leiter International PerformanceScale–Revised(LIPS-R)
a test score, which is evaluated by assessing the presence processing, learning, planning, and knowledge.
or absence of behavior associated with maturation;  intended for psychological, clinical, minority-group,  strictly a performance scale; aims at providing a nonverbal
parallels the mental age (MA) concept. preschool, and neuropsychological assessment as well as alternative to the Stanford-Binet scale for the age range of
research; purports to enable the psychoeducational 2 to 18 years
Bayley Scales of Infant and Toddler Development– Third evaluation of learning disabled and other exceptional  to assess the intellectual function of children with
Edition (BSID-III) children and educational planning and placement. pervasive developmental disorders
 sequential-simultaneous distinction;  The Leiter scale purports to provide a nonverbal measure
 for infants between 1 and 42 months of age; and assesses of general intelligence by sampling a wide variety of
development across five domains: cognitive, language,  Sequential processing refers to a child’s ability “to solve
problems by mentally arranging input in sequential or functions from memory to nonverbal reasoning.
motor, socioemotional, and adaptive
serial order.” simultaneous processing refers to a child’s  assessing children with autism
 uses measures such as the infant’s response to a bell, the
ability to follow an object with eyes, and the ability to follow ability to “synthesize information (from mental wholes) in Porteus Maze Test (PMT)
oral instructions ;the motor scale assumes that later order to solve a problem”
mental functions depend on motor development  a popular but poorly standardized nonverbal performance
General Individual Ability Tests for Handicapped and
 Psychometrically rigorous; Predicts well for retarded measure of intelligence
Special Populations
infants; Does not predict future intelligence  it includes 12 mazes that increase in complexity across
Columbia Mental Maturity Scale–Third Edition (CMMS) age levels. The participant is required to trace the maze
Cattell Infant Intelligence Scale (CIIS) from the starting point to the goal while following certain
 purports to evaluate ability in normal and variously rules
 based on normative developmental data. handicapped children from 3 through 12 years of age.
 measure intelligence in infants & young children used for individuals with special needs; provides a more Testing Learning Disabilities
 an age scale format, contains five test items for each suitable measure of intelligence than do the more
month between 2 and 12 months of age and five items for Illinois Test of Psycholinguistic Abilities (ITPA-3)
established scales
each 2-month interval between 12 and 36 months of age.  requires the subject to discriminate similarities and  children ages 2 through 10;
 remained relatively unchanged for more than 60 years. It differences by indicating which drawing doesn’t belong on  major tests designed specifically to assess learning
is psychometrically unsatisfactory. a 6-by-9-inch card containing 3-5 drawings, depending on disabilities; information-processing theory.
Major Tests for Young Children
the level of difficulty. The task is multiple-choice.  assumes that failure to respond correctly to a stimulus can
 reliable instrument that is useful in assessing ability in result not only from a defective output (response) system
McCarthy Scales of Children’s Abilities (MSCA) many people with sensory, physical, or language but also from a defective input or information-processing
handicaps. Because of its multiple-choice nature, system; assumes that a human response to an outside
 measure ability in children between 2 and 8 years old. however, and consequent vulnerability to chance stimulus can be viewed in terms of discrete stages or
 18 scales, 15 are combined into a composite score known variance, one should use results with caution. processes.
as the general cognitive index (GCI), a standard score
with a mean of 100 and a standard deviation of 16. Peabody Picture Vocabulary Test–Fourth Edition Woodcock-Johnson III
 index reflects how well the child integrated prior learning (PPVT-IV)
 designed as a broad-range individually administered test
experiences & adapted them to the demands of the scales.
 purports to measure hearing or receptive (hearing) to be used in educational settings. It assesses general
Kaufman Assessment Battery for Children– Second vocabulary, presumably providing a nonverbal estimate of intellectual ability (g), specific cognitive abilities, scholastic
Edition (KABC-II) verbal intelligence aptitude, oral language, and achievement
 Each form has 204 plates, with each plate presenting  Cattell-Horn-Carroll (CHC) three-stratum theory of
 an individual ability test for children between 3 and 18  four numbered pictures. The subject must indicate which intelligence
years of age; consists of 18 subtests combined into five of the four pictures best relates to a word read aloud by
the examiner.
Visiographic Tests Individual Achievement Tests: Wide Range Advantages of Individual Tests and Group Tests
Achievement Test-3 (WRAT-4)
Benton Visual Retention Test–Fifth Edition (BVRT-V) Individual tests Group tests
 intelligence tests measure potential ability, whereas Provide information beyond Are cost-efficient
 assumes that brain damage easily impairs visual memory achievement tests measure what the person has actually the test score
ability; visual memory task is consistent with possible brain Allow the examiner to observe Minimize professional time for
acquired or done with that potential.
damage or brain diseases behavior in a standard setting administration and scoring
 purportedly permits an estimate of grade-level functioning Allow individualized Require less examiner skill and
 psychological deficit: poor performance on a specific in word reading, spelling, math computation, and sentence interpretation of test scores training
task is related to or caused by some underlying deficit comprehension . Have more objective and more
 ages 8 and older, the Benton test consists of geometric  used for children ages 5 and older and has two levels for reliable scoring procedures
designs briefly presented and then removed. The subject each of the three achievement areas. Have especially broad application
must then reproduce the designs from memory  The test merely required participants to pronounce words
Bender Visual Motor Gestalt Test (BVMGT) from a list; the test has not changed for nearly 60 years, it Using Group Tests
is “already outdated”
 It consists of nine geometric figures that the subject is  Use Results with Caution
simply asked to copy  Be Especially Suspicious of Low Scores
 anyone older than 9 who cannot copy the figures may PART II: APPLICATIONS:  Consider Wide Discrepancies a Warning Signal
suffer from some type of deficit; errors can occur for people C12: Standardized Tests in Education, Civil Service,  When in Doubt, Refer
whose mental age is less than 9 (low intelligence), those and the Military
with brain damage, those with nonverbal learning Group Tests in the Schools: Kindergarten Through 12th
disabilities, and those with emotional problems. When justifying the use of group standardized tests, test Grade
users often have problems defining what exactly they are Achievement tests Aptitude tests
Memory-for-Designs (MFD) Test trying to predict, or what the test criterion is. 1. Evaluate the effects of a 1. Evaluate the effects of an
 involves perceptual–motor coordination; 8 to 60 years of known or controlled set of unknown, uncontrolled set of
Comparison of Group and Individual Ability Tests experiences experiences
age
2. Evaluate the product of a 2. Evaluate the potential to profit
 subject attempts to draw a briefly presented design from  Individual tests require a single examiner for a single course of training from a course of training
memor; 15 drawings can then be corrected for age and subject; he examiner takes responsibility for eliciting a 3. Rely heavily on content 3. Rely heavily on predictive
intelligence by reference to a table. maximum performance. If a problem exists that might validation procedures criterion validation procedures
inhibit a maximum performance
Creativity: Torrance Tests of Creative Thinking (TTCT)  must assume that the subject was cooperative and
Group Achievement Tests
motivated. Subjects are not praised for responding, there
 One can define creativity as the ability to be original, to
are no safeguards to prevent a person from receiving a low
combine known facts in new ways, or to find new
relationships between known facts. score for reasons other than low ability Stanford Achievement Test is one of the oldest of the
 The Torrance tests separately measure aspects of Individual tests Group tests
standardized achievement tests widely used in the school
creative thinking. In measuring fluency, administrators ask One subject is tested at a time. Many subjects are tested at a system. Evaluates achievement in kindergarten through
an individual to think of as many different solutions to a time. 12th grades in the following areas: spelling, reading
problem as possible; Originality, a test maker attempts to Examiner records responses. Subjects record own comprehension, word study and skills, language arts, social
evaluate how new or unusual a person’s solutions to responses. studies, science, mathematics, and listening
Scoring requires considerable Scoring is straightforward and comprehension.
problems are; Flexibility is measured in terms of an skill. objective.
individual’s ability to shift directions or try a new approach Examiner flexibility can elicit There are no safeguards. Metropolitan Achievement Test (MAT): measures
to problem solving maximum performance if achievement in reading by evaluating vocabulary, word
permitted by standardization.
recognition, and reading comprehension. MAT-8 also Goodenough-Harris Drawing Test (G-HDT) The Logical-Content Strategy
measures mathematics by evaluating number concepts
problem solving and computation  either group or individually administered Woodworth Personal Data Sheet
 The subject is instructed to draw a picture of a whole man
Group Tests of Mental Abilities (Intelligence)  The first personality inventory ever
and to do the best job possible; standardized by
determining those characteristics of human-figure drawings  to identify military recruits who would be likely to break
Kuhlmann-Anderson Test (KAT)–Eighth Edition down in combat; contained 116 questions to which the
that differentiated subjects in various age groups.
individual responded “Yes” or “No.”
 8 separate levels; kindergarten through 12th grade.
The Culture Fair Intelligence Test
 items are primarily nonverbal at lower levels, requiring Early Multidimensional Logical-Content Scales
minimal reading and language ability  Constructed under the direction of R. B. Cattell;  Bell Adjustment Inventory - attempted to evaluate the
 percentile band – (confidence interval); provides the range  a paper-and-pencil procedure that covers three levels subject’s adjustment in a variety of areas such as home
of percentiles that likely represent subject’s true score. (ages 4–8 & mentally disabled adults, 8–12 & randomly life, social life, and emotional functioning.
selected adults, high-school and above-average adults).  Bernreuter Personality Inventory- for subjects as young
Henmon-Nelson Test (H-NT)
as age 13 and included items related to six personality
Standardized Tests Used in the U.S. Civil Service
 two sets of norms: raw score distributions by age; raw traits such as introversion, confidence, and sociability.
System
score distributions by grade. Mooney Problem Checklist;
 extremely sound instrument; can help predict future General Aptitude Test Battery (GATB)
 Contains a list of problems that recurred in clinical case
academic success quickly. history data and in the written statements of problems
 reading ability test that purportedly measures aptitude for
Cognitive Abilities Test (COGAT) a variety of occupations; use in making employment The Criterion-Group Strategy
decisions in government agencies; aptitudes
 three separate scores: verbal, quantitative, and nonverbal. Minnesota Multiphasic Personality Inventory (MMPI and
 Specifically designed for poor readers, poorly educated Standardized Tests in the U.S. Military: The Armed MMPI-2)
people, people for whom English is a second language. Services Vocational Aptitude Battery
 true–false self-report questionnaire; designed to aid in the
College Entrance Tests  identify students who potentially qualify for entry into the diagnosis or assessment of the major psychiatric or
military and can recommend assignment to various psychological disorders.
SAT Reasoning Test military occupational training programs.  S. R. Hathaway, a psychologist, and J. C. McKinley,
Cooperative School and College Ability Tests  eight criterion groups; Hypochondriacs; Depressives;
The American College Test Hysterics; Psychopathic deviates; Paranoids;
PART II: APPLICATIONS: Psychasthenics; Schizophrenics; Hypomanics;
Graduate and Professional School Entrance Tests (masculinity-femininity & social-introversion)
C13: Applications in Clinical and Counseling Settings
Graduate Record Examination Aptitude Test Validity Scales Group tests
Miller Analogies Test Strategies of Structured Personality-Test Construction
Lie Scale (L) Detect individuals who attempted to present
The Law School Admission Test Deductive Strategies themselves in an overly favorable way.
Infrequency scale (F) detect individuals who attempt to fake bad,
Nonverbal Group Ability Tests  Logical-Content Strategy K scale Locate those items that distinguished
 Theoretical Strategy normal from abnormal groups when both
Raven Progressive Matrices (RPM) groups produced a normal test pattern.
Empirical Strategies
 60 matrices; logical pattern or design with a missing part.;  Criterion-Group Strategy  Twopoint code, Meehl (1951) emphasized the importance
subject select the appropriate design from choices. with or  Factor Analytic Strategy of conducting research on individuals who showed specific
without a time limit; measure of general intelligence, two-point codes and other configural patterns.
Spearman’s g; general fluid intelligence
California Psychological Inventory (CPI)–Third Edition The Theoretical Strategy Combination Strategies

 attempts to evaluate personality in normally adjusted Edwards Personal Preference Schedule (EPPS) Positive Personality Measurement
individuals and thus finds more use in counseling settings.
 human needs proposed by Murray include the need to  research suggests that it may be advantageous to
 The test contains 20 scales, each of which is grouped into
accomplish (achievement), the need to conform evaluate individuals’ positive characteristics in an attempt
one of four classes. Class I scales: poise, self-assurance,
(deference), and the need for attention (exhibition). to understand the resources that an individual is endowed
and interpersonal effectiveness. Class II scales:
socialization, maturity, and responsibility, conscientious,  concerned about faking and social desirability - forced- with and how this affects behavior and well-being.
honest, ethical and moral issues. Class III scales: choice method, solution to the problem of faking and other  the ability to live a satisfying life even in the midst of stress
achievement potential and intellectual efficiency. Class IV: sources of bias. and hardship depends on positive personal characteristics
interest modes.  Ipsative scores present results in relative terms rather rather than only on the absence of psychopathology or
than as absolute totals; compare the individual against negative affect
The Factor Analytic Strategy himself or herself and produce data that reflect the relative  Currently, several such measures of positive
strength of each need for that person; each person thus characteristics exist that evaluate traits such as
Guilford’s Pioneer Efforts
provides his or her own frame of reference conscientiousness, hope, optimism, and self-efficacy
 Guilford and his associates determined the Personality Research Form, Third Edition (PRF-III) and The NEO Personality Inventory–Three (NEO PI-R™)
interrelationship (intercorrelation) of a wide variety of tests Jackson Personality Inventory Revised (JPI-R)
and then factor analyzed the results in an effort to find the  attempts to provide a multipurpose inventory for
main dimensions underlying all personality tests.  based on Murray’s theory of needs; developed independent predicting interests, health and illness behavior,
 J. R. Guilford - Guilford-Zimmerman Temperament Survey specific definitions of each need. psychological well-being, and characteristic coping
 JPI - normal individuals to assess various aspects of styles. (Likert Format)
Cattell’s Contribution
personality including interpersonal, cognitive, and value  three broad domains:
 Allport and Odbert (1936) reduced their list to 4504 “real” orientations; 15 scales have been organized in terms of five o neuroticism (N) - defined by anxiety and depression
traits. (171) to (36 Surface traits) to (16 Source Traits) higher-order dimensions termed analytical, extroverted, o extroversion (E) - degree of sociability or withdrawal
 Sixteen Personality Factor Questionnaire, 16PF emotional, opportunistic, and dependable. o openness (O) - breadth of experience that is amenable
 Clinical Analysis Questionnaire (CAQ) items related to  PRF - intended primarily for research purposes  the “Five-factor model”
psychological disorders have been factor analyzed,  general amenability to cross-cultural and international
Self-Concept
resulting in 12 new factors in addition to the 16 needed to studies, along with the potential significance of
measure normal personalities.  tset of assumptions a person has about himself or herself biologically based universal human traits,
 Gough’s Adjective Checklist - contains 300 adjectives in Frequently Used Measures of Positive Personality Traits
Problems with the Factor Analytic Strategy
alphabetical order
 subjective nature of naming factors  Piers-Harris Children’s Self-Concept Scale–Second Rosenberg Self-Esteem Scale: measures global feelings
o Common variance - amount of variance a particular Edition - contains 80 self-statements and requires a “Yes” of self-worth using 10 simple and straightforward statements
variable holds in common with other variables; It results or “No” response that examinees rate on a 4-point Likert scale.
from the overlap of what two or more variables are  Tennessee Self-Concept Scale–Second Edition - a
General Self-Efficacy Scale (GSE): measure an
measuring. formal paper-and-pencil test that is designed to measure
individual’s belief in his or her ability to organize resources
o Unique variance - factors uniquely measured by the self-concept data
and manage situations, to persist in the face of barriers, and
variable; construct measured only by the variable in  Rogers - the self is organized to remain consistent
to recover from setbacks. 10 items ; 4 minutes.
question.  Q-sort technique - a person receives a set of cards with
o Error variance - variance attributable to error appropriate self-statements, sorts the cards in piles from Ego Resiliency Scale Revised: measure of ego resiliency
 Factor analytic procedures generally identify sources of least to most personally descriptive, make two cards; first or emotional intelligence was developed by Block and
common variance at the expense of unique variance. describes real self, second describes ideal self. Kremen; 14 items, 4-point Likert scale to rate statements
Dispositional Resilience Scale (DRS): by Bartone, Wright, PART II: APPLICATIONS: examiner presents each card again to obtain sufficient
Ingraham, and Ursano to measure “hardiness,” which is information for scoring purposes. The five major Rorschach
defined as the ability to view stressful situations as C14: Projective Personality Tests scoring categories are location (where), determinant (why),
meaningful, changeable, and challenging. The Projective Hypothesis: proposes that when people content (what), frequency of occurrence (popular-original),
attempt to understand an ambiguous or vague stimulus, their and form quality (correspondence of percept to stimulus
Hope Scale: aka Dispositional Hope Scale; characterizes
interpretation of that stimulus reflects their needs, feelings, properties of the inkblot).
hope as goal driven energy (agency) in combination with
capacity to construct systems to meet goals (pathways); experiences, prior conditioning, thought processes, etc. Against In Favor
scale of 12 items that are rated 8-point Likert scale ranging The Rorschach Inkblot Test 1. Lacks a universally accepted 1. Lack of standardized
from “definitely false” to “definitely true.”; 2 to 5 minutes standard of administration, procedures is a historical accident
Historical Antecedents scoring, and interpretation. that can be corrected.
Life Orientation Test–Revised (LOT-R): self-report 2. Evaluations of data are 2. Test interpretation is an art, not
measure of dispositional optimism, tendency to view the  J. Kerner: noted individuals report idiosyncratic or unique subjective. a science; all test interpretation
involves a subjective component.
world and the future in positive ways; consists of 10 items personal meanings when viewing inkblot stimuli. (1857) 3. Results are unstable over 3. A new look at the data reveals
developed to assess individual differences in generalized  Binet proposed the idea of using inkblots to assess time. that the Rorschach is much more
optimism versus pessimism. 5- point response scale ranging personality functioning (1896) stable than is widely believed.
from “strongly disagree” to “strongly agree.”  The publication of the first set of standardized inkblots by 4. Is unscientific. 4. Has a large empirical base.
Whipple (1910); concerning the potential value of inkblots 5. Is inadequate by all traditional 5. Available evidence is biased
Satisfaction with Life Scale (SWLS): a multi-item scale for standards. and poorly controlled; therefore
for investigating human personality failed to provide a fair evaluation.
the overall assessment of life satisfaction as a cognitive
 Rorschach receives credit for finding an original and
judgmental process, rather than for the measurement of
important use for inkblots: identifying psychological An Alternative Inkblot Test: The Holtzman
specific satisfaction domains.
disorders; investigation in 1911 and culminated in 1921 with
the publication of his famous book Psychodiagnostik.  created to meet these difficulties while maintaining the
Positive and Negative Affect Schedule (PANAS): by
 David Levy: brought Rorschach’s to the U.S from Europe advantages of the inkblot methodology: variable number of
Watson, Clark, and Tellegen (1988) to measure two
responses from one subject to another, lack of standard
orthogonal dimensions of affect; two scales—positive affect  Samuel J. Beck: interested in studying certain patterns or,
as he called them, “configurational tendencies” in procedures, and lack of an alternative form
(PA) and negative affect (NA). Each scale consists of 10
adjectives; rate the extent to which their moods have Rorschach responses  Both forms, A and B, of the Holtzman contain 45 cards.
mirrored the feelings described by each adjective during a  Marguerite Hertz: stimulated considerable research on the Each response may be scored on 22 dimensions.
specified period of time. Rorschach during its establishment in the US The Thematic Apperception Test
 Bruno Klopfer: published several key Rorschach books
Coping Intervention for Stressful Situations (CISS): 48-  introduced in 1935 by Christina Morgan and Henry Murray
and articles and played an important role in the early
item questionnaire that measures coping styles by asking of Harvard University;
development of the test
subjects how they would respond to a variety of stressful  based on Murray’s (1938) theory, which distinguishes 28
 Zygmunt Piotrowski and David Rapaport: influence on
situations; 5-point Likert scale with choices ranging from “not human needs,
clinical practitioners who use the Rorschach.
at all” to “very much,” this inventory assesses individuals
according to three basic coping styles: task-oriented coping, Stimuli, Administration, and Interpretation Rorschach TAT
emotion-oriented coping, and avoidance-oriented coping. Rejected by scientific Well received by scientific
 Of the 10 cards, five were black and gray; two contained community community
Core Self-Evaluations: broad-based personality construct black, gray, and red; and three contained pastel colors of Atheoretical Based on Murray’s (1938) theory
is composed of four specific traits: self esteem, generalized various shades. of needs
Oversold by extravagant claims Conservative claims
self-efficacy, neuroticism, and locus of control.  Rorschach administration involves two phases: free- Purported diagnostic instrument Not purported as diagnostic
association and inquiry. First phase: the examiner presents Primarily clinical use Clinical and nonclinical uses
each card with a minimum of structure. Second phase: the

Stimuli, Administration, and Interpretation PART II: APPLICATIONS: Cognitive Functional Analysis

The TAT stimuli consist of 30 pictures, of various scenes, C15: Computers and Basic Psychological Science in  what a person says to himself plays a critical role in
and one blank card. Specific cards are suited for adults, Testing behavior; Meichenbaum’s technique
children, men, and women. In administering, the examiner  internal dialogue such as self-appraisals and expectations.
Cognitive-behavioral procedures differ from traditional tests
asks the subject to make up a story; he or she looks for the  ascertains the environmental factors that precede
events that led up to the scene, what the characters are in that they are more direct, have fewer inferential
behavior as well as those that
assumptions, and remain closer to observable phenomena.
thinking and feeling, and the outcome. Almost all methods of  maintain it. In addition, this kind of analysis attempts to
interpretation take into account the hero, needs, press,  Traditional tests are based on the medical model, which ascertain the internal or cognitive antecedents and
themes, and outcomes. views the overt manifestations of psychological disorders consequences of a behavioral sequence.
Alternative Apperception Procedures merely as symptoms of some underlying cause. This
Computers and Psychological Testing
underlying cause is the target of the traditional procedures.
 Family of Man photo-essay collection: provides a  Cognitive behavioral tests are based on the belief that  For testing, one can use computers in two basic ways: (1)
balance of positive and negative stories and a variety of the overt manifestations of psychological disorders are to administer, score, and even interpret traditional tests
action and energy levels for the main character. more than mere symptoms. Although possibly caused by and (2) to create new tasks and perhaps measure abilities
 The Children’s Apperception Test (CAT): was created some other factor, the behaviors themselves—including that traditional procedures cannot tap.
to meet the special needs of children ages 3 through 10; actions, thoughts, and physiological processes—are the  Farrel (1992) has identified seven applications of
contain animal rather than human figures targets of behavioral tests. computers in the field of cognitive-behavioral assessment:
 Tell Me a Story Test (TEMAS): is a TAT technique that (1) collecting self-report data, (2) coding observational
Procedures Based on Operant Conditioning
consists of 23 chromatic pictures depicting minority and data, (3) directly recording behavior, (4) training, (5)
nonminority characters in urban and familial settings Steps in a Cognitive – Behavioral Assessment organizing and synthesizing behavioral assessment data,
 Gerontological Apperception Test: uses stimuli in which Step 1 Identify critical behaviors. (6) analyzing behavioral assessment data, and (7)
one or more elderly individuals are involved in a scene with Step 2 Determine whether critical behaviors are supporting decision making.
a theme relevant to the concerns of the elderly, such as excesses or deficits
loneliness and family conflicts Step 3 Evaluate critical behaviors for frequency,
duration, or intensity (that is, obtain a baseline). PART II: APPLICATIONS:
 Senior Apperception Technique: is an alternative to the Step 4 If excesses, attempt to decrease frequency,
Gerontological Apperception Test and is parallel in content duration, or intensity of behaviors; if deficits, C16: Testing in Counseling Psychology
attempt to increase behaviors.
Nonpictorial Projective Procedures Measuring Interests
Self-report techniques
 Word Association Test: infer possible disturbances and  The Strong Vocational Interest Blank: match the
areas of conflict from an individual’s response to specific  Focus on situations that lead to particular response interests of a subject to the interests and values of a
words. (Kent-Rosanoff word association, Rapaport et al.) patterns, whereas traditional procedures focus on criterion group of people who were happy in the careers
 Sentence Completion Tasks: incomplete sentence determining the internal characteristics of the individual they had chosen.
tasks provide a stem that the subject is asked to that lead to particular response patterns.  The Strong-Campbell Interest Inventory: interests
complete.(Rotter Incomplete Sentence Blank, Incomplete  purport to be more related to observable phenomena than express personality and that people can be classified into
Sentences Task of Lanyon and Lanyon, Washington are traditional procedures. one or more of six categories according to their interests
University Sentence Completion Test) (Holland’s theory of vocational choice)
 Figure Drawing Tests: expressive techniques, to create Functional (behavior-analytic) approach
 The Campbell Interest and Skill Survey: ultimately
something, usually a drawing. (Draw-a-Person Test,  Rather than labeling people as schizophrenic or neurotic, yields a variety of different types of scales. (Orientation,
House-Tree-Person Test, Kinetic Family Drawing Test, the psychologist would focus on behavioral excesses and basic, Occupational)
Goodenough Draw-a-Man Test) deficits (Kanfer and Saslow):

 The Kuder Occupational Interest Survey: ranks the test  Adult Neuropsychology Base Rates and Hit Rates
taker in relation to men and women employed in different o Halstead-Reitan Neuropsychological Battery: find
occupations and are satisfied with their career choices. specificareas(brain)that to particular behaviors  cutting score: score marking the point of decision
 The Career Assessment Inventory: is designed for o Luria-Nebraska Neuropsychological Battery: the  hit rate: the percentage of cases in which a test accurately
people not oriented toward careers requiring college or concept of pluripotentiality—that any center in the brain predicts success or failure
professional training. can be involved in several different functional systems  base rate: pass rate without using any test yet; rate of
 The Self-Directed Search: to be a self-administered, self- o California Verbal Learning Test (CVLT): identify predicting success on the job without the test.
scored, and self-interpreted vocational interest inventory different strategies, processes, and errors that are
Decision on the basis of cutting score
associated with specific deficits. Performance on the job Acceptable Unacceptable
Measuring Personal Characteristics for Job Placement  Automated Neuropsychological Testing Success Hit Miss
Trait Factor Approach: Osipow’s Vocational  Anxiety and Stress Assessment Failure Miss Hit
Dimensions: give extensive tests covering personality  The State-Trait Anxiety Inventory: behavior is
abilities, interests, and personal values to learn as much influenced by situations (State) and personality traits.  Hit: correct prediction
about a person’s traits as possible.  Measures of Coping  Miss: test makes an inaccurate prediction
 Ecological Momentary Assessment: computer collect o false positive: Ex: Someone incapable is hired (Type
Attribution Theory information on a continuing basis I error) false negatives: Ex: Someone capable is not
hired (Type II error)
 suggested that events in a person’s environment can be Quality-of-Life Assessment
caused by one of three sources: persons, entities (things TRUTH
or some aspect of the environment), and times (situations  Medical Outcome Study Short Form-36 (SF-36): TEST (+) Condition (-) Condition
 Mischel demonstrated that personality measures do not physical functioning, role-physical, bodily pain, general (+) Prediction TRUE POSITIVE FALSE POSITIVE
always accurately predict behavior in particular situations. health perceptions, vitality, social functioning, role- (-) Prediction FALSE NEGATIVE TRUE NEGATIVE
At about the same time, many attribution theorists began emotional, and mental health
demonstrating that people explain the behavior of others  Nottingham Health Profile (NHP): respondent indicates
Taylor-Russell Tables: method for evaluating the validity of
by using personality traits; however, when asked about whether or not a health condition has affected his or her
life in these areas. a test in relation to the amount of information it contributes
their own behavior, they tend to attribute cause to the
beyond the base rates; give the likelihood that a person
situation. These ideas gave rise to the development of  Decision Theory Approaches: include methods for
selected on the basis of the test score will actually succeed.
measures to assess the characteristics of social estimating the value of the equivalent of a life-year, or a
environments and work settings. Studies of the stability of quality-adjusted life year (QALY). Utility Theory and Decision Analysis: define levels
occupational interests suggest that aspects of personal  mHealth: diverse applications of wireless and mobile besides success&failure; information available to the analyst
preference are fairly stable over long periods of time. technologies designed to improve health research, health-
care services, and health outcomes. Value-Added Employee Assessments: evaluating on the
PART II: APPLICATIONS:  NIH Toolbox: stimulate the use of a common set of basis of the value they add
C17: Testing in Health Psychology and Health Care measures in research and in clinical care. Incremental Validity: unique information gained through
using the test
Neuropsychological Assessment PART II: APPLICATIONS:
Personnel Psychology from the Employee’s
 Clinical Neuropsychology: the scientific discipline that C18: Testing in Industrial and Business Settings Perspective: Fitting People to Jobs
studies the relationship between behavior and brain
functioning in the realms of cognitive, motor, sensory, and Personnel Psychology—The Selection of Employees The Myers-Briggs Type Indicator: determine where
emotional functioning people fall on the introversion–extroversion dimension and
Employment Interview: primary tool for selecting
 Developmental Neuropsychology: to provide a baseline on which of the four modes they most rely.
for neurological changes over time employees.
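The cutting score, base rate, hit rate, and false positive/negative definitions given earlier in this chapter lend themselves to a small numerical sketch. The applicant scores and job outcomes below are invented purely for illustration, and the cutting score of 60 is an arbitrary choice.

    # Each pair is (test_score, succeeded_on_job) for one hypothetical applicant.
    applicants = [
        (52, True), (78, True), (61, False), (90, True), (45, False),
        (70, True), (58, False), (84, True), (49, False), (66, False),
    ]

    cutting_score = 60   # score marking the point of decision

    # Base rate: rate of success without using any test at all.
    base_rate = sum(success for _, success in applicants) / len(applicants)

    tp = sum(1 for score, s in applicants if score >= cutting_score and s)      # capable and selected
    fp = sum(1 for score, s in applicants if score >= cutting_score and not s)  # false positive (Type I): incapable person selected
    fn = sum(1 for score, s in applicants if score < cutting_score and s)       # false negative (Type II): capable person rejected
    tn = sum(1 for score, s in applicants if score < cutting_score and not s)   # correctly screened out

    # Hit rate: proportion of cases in which the test classifies the outcome correctly.
    hit_rate = (tp + tn) / len(applicants)
    print(base_rate, hit_rate)   # 0.5 0.7 -> the cut does better than predicting success for everyone

Sliding the cutting score up or down trades false positives against false negatives, which is the trade-off that the Taylor-Russell tables and the utility and decision-analysis approaches formalize.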
Personnel Psychology from the Employee's Perspective: Fitting People to Jobs

The Myers-Briggs Type Indicator: determine where people fall on the introversion–extroversion dimension and on which of the four modes they most rely.

 Sensing: knowing through sight, hearing, touch, and so on;
 Intuition: inferring what underlies sensory inputs;
 Feeling: focusing on the emotional aspect of experience;
 Thinking: reasoning or thinking abstractly.

Tests for Use in Industry: Wonderlic Personnel Test (WPT): quick (12-minute) test of mental ability in adults.

Measuring Characteristics of the Work Setting

The Social-Ecology Approach

 ecological psychology: focuses on events that occur in a behavioral setting.
 involves examining the relationship between work satisfaction and the requirements of the job.

Job Analysis

 Checklists: used by job analysts to describe the activities and working conditions usually associated with a job title.
 Critical incidents: observable behaviors that differentiate successful from unsuccessful employees. The critical-incident method was developed by J. C. Flanagan.
 Observation: method for learning about the nature of the job.
 Interviews can also be used to find out about a job. However, some workers may give an interviewer information that differs from what they would give another employee because they are uncomfortable or fear that what they say will be held against them.
 Questionnaires: commonly used to find out about job situations, but their use calls for special precautions; inexpensive; the employer may never know whether the respondent understood the questions.

Person–Situation Interaction

 The interactionists support their position by reporting the proportion of variance in behavior explained by person, by situation, and by the interaction between person and situation (see the sketch after this list).
 career satisfaction depends on an appropriate match between person and job.
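The interactionist argument turns on how the variance in behavior splits among persons, situations, and the person-situation interaction. The sketch below decomposes a small, made-up person-by-situation score matrix the way a two-way analysis of variance with one score per cell would; the ratings are hypothetical and the code only illustrates the partitioning idea, not an analysis from the text.

import numpy as np

# Hypothetical ratings of one behavior (e.g., talkativeness, 1-10) for four
# people observed in three situations; one score per person-situation cell.
scores = np.array([
    [8, 7, 6],   # person 1
    [5, 5, 4],   # person 2
    [9, 6, 5],   # person 3
    [4, 3, 3],   # person 4
], dtype=float)

grand_mean = scores.mean()
person_means = scores.mean(axis=1)      # one mean per person (row)
situation_means = scores.mean(axis=0)   # one mean per situation (column)
n_persons, n_situations = scores.shape

# Sums of squares for the person effect, the situation effect, and the
# person x situation interaction (the leftover, with one score per cell).
ss_total = ((scores - grand_mean) ** 2).sum()
ss_person = n_situations * ((person_means - grand_mean) ** 2).sum()
ss_situation = n_persons * ((situation_means - grand_mean) ** 2).sum()
ss_interaction = ss_total - ss_person - ss_situation

for label, ss in [("person", ss_person),
                  ("situation", ss_situation),
                  ("person x situation", ss_interaction)]:
    print(f"{label:>20}: {ss / ss_total:.0%} of the variance")

A large interaction share is the kind of result interactionists point to when arguing that neither persons nor situations alone account for behavior.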
PART III: ISSUES:
C19: Test Bias (Summary)

There are strong differences of opinion about the value of intelligence and aptitude tests for minority group members. As a result of the challenge to traditional tests, approaches such as the Chitling Test, the BITCH, and the SOMPA have been developed.

Part of the debate about test bias results from different moral views about what is fair:

 Unqualified individualism: argues that a testing and selection program is fair if it selects the best-suited people, regardless of their social group.
 Quotas: selection of members from different racial and ethnic groups according to their proportions in the general population.
 Qualified individualism: a compromise between the other two.

Although test bias will surely remain an area of considerable controversy, some positive potential solutions have come to light. For example, differences in test scores may reflect patterns of problem solving that characterize different subcultures. Also, one might evaluate tests against outcome criteria relevant to minority groups.

Test performance

One group believes the differences are biological in origin, while another believes the differences result from the influence of the social environment. Further, we now have evidence that gene expression is affected by environments.

PART III: ISSUES:
C20: Testing and the Law (Summary)

With increasing frequency, tests are coming under legal regulation. The regulation of testing through statute (laws passed by legislators), regulation (rules created by agencies), and litigation (lawsuits) has only recently become common.

PART III: ISSUES:
C21: Ethics and the Future of Psychological Testing (Summary)

The future of psychological testing depends on many issues and developments. Professional issues include

 theoretical concerns, such as the usefulness of the trait concept as opposed to an index of adjustment;
 the adequacy of tests and actuarial versus clinical prediction.

 Moral issues include human rights such as the right to refuse testing, the right not to be labeled, and the right to privacy.
 Divided loyalty that can result from administering a test to an individual for an institution: Whose rights come first? Also, professionals have an ethical duty to provide and understand the information needed to use a test properly.
 Social issues such as dehumanization, the usefulness of tests, and access to testing services also inform the field of testing today.

Current trends include the proliferation of new tests, higher standards, improved technology, increasing objectivity, greater public awareness and influence, the computerization of tests, and testing on the Internet.

As for the future, anything is possible, especially in a field as controversial as testing. Psychology is now better equipped in technique, methodology, empirical data, and experience than ever before, and the members of this new and expanding field, as a group, are relatively young. Therefore, it does not seem unrealistic or overly optimistic to expect that the next 50 years will see advances equal to those of the last 50. On the other hand, psychology has come so far in the last 50 years that a comparable advance in the next 50 could easily produce results unimaginable today.

[FINISH. BAAANZAI!!]