AD-A118 822
DEVELOPMENT OF A MICROCOMPUTER-BASED ADAPTIVE TESTING SYSTEM, P--ETC(U)
JUN 82  C. D. VALE, C. ALBING, L. FOOTE-LENNOX  N00014-82-C-0132
UNCLASSIFIED  ONR-ASC-82-01  NL
-"III2III
CC)
SX2~
I 1
I I
Unclassified
19. KEY WORDS (Continue on reverse side if necessary and identify by block number)
adaptive testing latent trait theory
computerized testing microcomputer testing system
tailored testing adaptive testing computer system design
item response theory microcomputer evaluation
20. ABSTRACT (continued)
microcomputer systems were considered in search of hardware on which
to implement the design; of these, three eight-bit microcomputer
systems were selected for future consideration. A draft of a user's
manual, reviewed by experts, was developed for the proposed system
and served as a preliminary design document. In summary, this research
indicated that a definite need exists for a microcomputer-based
adaptive testing system and that the proposed system would meet that
need.
PREFACE
TABLE OF CONTENTS
II. DESIGN OF A SYSTEM TO MEET THE REQUIREMENTS . . . . . . . 36
    CAT Systems . . . . . . . . . . . . . . . . . . . . . . . 36
        The Minnesota system . . . . . . . . . . . . . . . . . 36
        The Civil Service system . . . . . . . . . . . . . . . 37
        The Army system . . . . . . . . . . . . . . . . . . . 37
        The Missouri system . . . . . . . . . . . . . . . . . 37
    CAI Systems . . . . . . . . . . . . . . . . . . . . . . . 39
        PLATO . . . . . . . . . . . . . . . . . . . . . . . . 40
    Author Language . . . . . . . . . . . . . . . . . . . . . 41
        Module delimiters . . . . . . . . . . . . . . . . . . 42
        Assignment statement . . . . . . . . . . . . . . . . . 42
        Basic statements . . . . . . . . . . . . . . . . . . . 42
    Instructions . . . . . . . . . . . . . . . . . . . . . . . 49
    Item Presentation . . . . . . . . . . . . . . . . . . . . 49
        Display Information . . . . . . . . . . . . . . . . . 51
        Structure . . . . . . . . . . . . . . . . . . . . . . 51
    Response Acceptance and Editing . . . . . . . . . . . . . 52
        Error correction . . . . . . . . . . . . . . . . . . . 52
        Special responses . . . . . . . . . . . . . . . . . . 52
        Timing . . . . . . . . . . . . . . . . . . . . . . . . 52
    Examinee Monitoring . . . . . . . . . . . . . . . . . . . 53
    Test Analysis . . . . . . . . . . . . . . . . . . . . . . 55
    Template Processing . . . . . . . . . . . . . . . . . . . 57
        Input and output . . . . . . . . . . . . . . . . . . . 57
        Print-Instructs . . . . . . . . . . . . . . . . . . . 59
        Accept-Input . . . . . . . . . . . . . . . . . . . . . 59
        Flush-Trim . . . . . . . . . . . . . . . . . . . . . . 60
        Summary . . . . . . . . . . . . . . . . . . . . . . . 60
    Test Consolidation . . . . . . . . . . . . . . . . . . . . 61
        Stage-One . . . . . . . . . . . . . . . . . . . . . . 61
        Stage-Two . . . . . . . . . . . . . . . . . . . . . . 63
        Stage-Three . . . . . . . . . . . . . . . . . . . . . 64
        Summary . . . . . . . . . . . . . . . . . . . . . . . 64
    Item Banking . . . . . . . . . . . . . . . . . . . . . . . 65
I. SPECIFICATION OF THE SYSTEM REQUIREMENTS
demand, such a system should be able to administer tests of most formats
and strategies (both adaptive and conventional) that could benefit from
computer administration. A comprehensive list and description of potential
system requirements was compiled as a first step in the design of a system
meeting these needs.
The analysis of system requirements will be presented in four parts.
Part 1 will present a comprehensive list of item types currently in
existence. Part 2 will consider testing strategies for selecting and
ordering items; adaptive testing models will be reviewed in detail. Part 3
will describe a survey of system requirements as seen by potential system
users. Finally, Part 4 will synthesize the requirements to provide basic
specifications for the system.
previously attempted alternatives eliminated or disabled. In computerized
presentation, this may be done by eliminating the chosen alternatives
from the CRT screen. The item is presented until the examinee answers
correctly or until all of the incorrect alternatives are
exhausted.
The confidence-weighted multiple-choice item format (Shuford, Albert,
& Massengill, 1966) is another method of extracting more information from
the simple multiple-choice item. In this format, the examinee answers the
multiple-choice item and is then asked to indicate his/her level of
confidence that the choice is correct. Administratively, this is a matter of
following each knowledge question with a confidence question by which the
examinee indicates his/her degree of confidence that the answer was
correct.
Another variation, probabilistic-response items (de Finetti, 1965),
incorporates the examinee's confidence into the initial item response.
Rather than choosing a specific alternative, the examinee assigns
probabilities to each alternative in accordance with his/her confidence that
each is correct. Administratively, these items require a simple method for
allowing the examinee to assign the probabilities to the appropriate
alternatives and a method for editing and correcting the probabilities when
they do not add up to 1.0.
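To make the editing requirement concrete, the following is a minimal sketch in Python (the alternative labels and tolerance are hypothetical) of a check that would trigger re-entry of the probabilities whenever they are out of range or do not sum to 1.0:

    def probabilities_need_editing(probs, tolerance=0.01):
        """Return True if the assigned probabilities are invalid or do not sum to 1.0.

        probs maps each alternative (e.g., 'A'-'E') to the examinee's assigned
        probability; the administrative routine would re-prompt the examinee
        whenever this function returns True.
        """
        total = sum(probs.values())
        out_of_range = any(p < 0.0 or p > 1.0 for p in probs.values())
        return out_of_range or abs(total - 1.0) > tolerance

    print(probabilities_need_editing({"A": 0.6, "B": 0.3, "C": 0.2}))  # True; sums to 1.1
    print(probabilities_need_editing({"A": 0.6, "B": 0.3, "C": 0.1}))  # False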
Although the multiple-choice response has been the method most
widely implemented for knowledge items, the free-response item has seen
some limited use in computerized testing. For item types requiring a
numerical response, such as an arithmetic item, the free-response mode is
only slightly more difficult to implement than the multiple-choice mode.
Simple textual-response items have also been successfully implemented on a
computer (e.g., Vale & Weiss, 1977; Vale, 1978). The problem with
free-response items comes not in accepting the response, but in processing
the response into a form that can be scored by the computer. In the
example by Vale (1978), the free responses were categorized for further
statistical processing. More complex (e.g., essay) free-response modes
have not yet been successfully computerized, although some
computer-assisted-instruction programs can accept a limited range of free
responses in textual format.
Cognitive-Process Items
Cognitive-process items assess whether an examinee possesses a
cognitive skill without, in theory, requiring the examinee to possess any
knowledge. The distinction between cognitive-process and knowledge items
is similar to the distinction that may once have existed between aptitude
and achievement before, among other examples, vocabulary achievement
became a primary component of scholastic aptitude.
Several cognitive-process tests considered for use in armed service
recruit selection and placement have been listed and described by Cory
(1973, 1977, 1978). These include five tests of memory and four tests of
concept formation. A "Memory for Objects" test tachistoscopically
presented frames containing pictures of four to nine objects. The
examinee's task was either to recall and type the items or to recognize the
items from a word list. The "Memory for Words" test was similar except
that the frames consisted of three to nine three- or five-letter words.
The "Visual Memory for Numbers Test" presented a sequence of four to
thirteen digits, one digit every second; the examinee had to recall the
sequence. The "Auditory Memory for Numbers" test was similar except
that the digits were presented orally by a tape recorder. The
"Object-Number Test" was a paired-associates learning task.
examinee was to move the numbers of one matrix to make the matrices
match by moving single numbers into the single vacant slot. Movement
was accomplished by entering the number to be moved and the direction in
which to move it.
The main administrative difference between the knowledge items and
the cognitive-process items appears to be the requirement for a dynamic
administration process in some of the cognitive-process items. The memory
items require that portions of the items be sequenced with precise timing
between portions. Additionally, as implemented by Cory, some of the
items require free responses.
Perceptual-Motor Items
Perceptual-motor items differ from the items discussed so far in that
the previous items require discrete (e.g., keyboard) responses.
Perceptual-motor items may require a continuous response of some form
(e.g., joystick) rather than a press of a finite number of response
buttons. These items also may require a pictorial or graphic stimulus.
Cory (1973) described two tests of perceptual speed, four tests of
perceptual closure, and two tests of detection of motion. The "Comparing
Figures" test, one of the perceptual speed tests, presented sets of
squares or circles with embedded vertical or horizontal bars. The
examinee's task was to determine if all of the bars were oriented in the
same way. The "Counting Numbers" test presented a string of numbers
and asked the examinee to determine the frequency of a specified number.
"Recognizing Objects," one of the perceptual closure tests, presented
pictures of common objects with 10 to 90 percent of the picture blotted
out. The examinee's task was to identify the object. The "Gestalt
Completion" test was very similar. "Concealed Words" presented words
with missing letters and required the examinee to identify the word. The
"Hidden Patterns" test required the examinee to recognize a test pattern
embedded in a larger pattern.
The "Memory for Patterns" test, one of the movement detection tests,
presented a pattern by sequentially blinking a sequence of dots.
Examinees were asked to identify the patterns. The "Drift Direction" test
presented a dot moving slowly by a line. The task was to determine if it
was moving toward, away from, or parallel to the line.
Hunter (1978) described two computerized perceptual-motor tests that
were outgrowths of earlier mechanical tests. "Two-Hand Coordination"
required the examinee to move a cursor to follow a target as it rotated
around the screen. The cursor was controlled by two hand controls, one
for vertical movement and the other for horizontal movement. The score
was computed as the average distance from the target over a five-minute
period. This type of item differs from those previously discussed in that
it requires real-time control of the screen and an examinee interface that
is mechanical and continuous rather than a discrete response.
Barrett et al.'s test battery also included a reaction time test. The
test presented a warning signal (an asterisk) and followed it 1 to 5
seconds later with a letter or a number. The examinee responded by
pressing a button on the response panel to indicate whether a letter or a
number had been presented. The score consisted of mean response time
for the correct responses.
Simulations
Non-Cognitive Items
The inter-subtest branching strategies are similar in concept to the
inter-item branching strategies except that the branching is from subtest
to subtest rather than from item to item. A subtest is simply a group of
items. Vale (1981) further divided this class of strategies into re-entrant
and nonre-entrant forms. A re-entrant form was one in which
administration could branch to a subtest, out of the subtest, and back
into it. An example of the nonre-entrant form is the two-stage test
(Angoff & Huddleston, 1958; Weiss, 1974; Weiss & Betz, 1973) in which the
score on a routing test determines the appropriate measurement test. One
example of a re-entrant strategy is the flexilevel strategy (Lord, 1971b;
Weiss, 1974). In the flexilevel test, items are ordered by difficulty and
testing begins in the middle. If the items are split into an easy subtest
and a difficult subtest, administration branches back and forth between
them, branching to the difficult subtest after a correct response and to
the easy subtest after an incorrect response. The stradaptive strategy
(Vale & Weiss, 1978), another re-entrant form, groups items into several
strata and branches to a more difficult stratum after a correct response,
or to an easier stratum after an incorrect one.
and repeat the process for a fixed number of items or until the estimate
becomes sufficiently precise.
Conceptually, model-based testing strategies are simple to administer;
the item expected to most improve the estimate of ability is selected and
administered, the ability is re-estimated, and the process is repeated.
Practically, the difficulties arise in determining which item is expected to
most improve the ability estimate and in estimating the ability. Both
processes are dependent on the statistical model used and most are based
on IRT.
IRT models. IRT refers to a family of psychometric theories that
express the probability of choosing a particular item response as a
function of an underlying trait or ability. The model most often used for
CAT is the three-parameter logistic model (Birnbaum, 1968).
In the three-parameter logistic model, the item is characterized by
the three parameters a, b, and c. Ability is characterized by a single
parameter, theta. The a parameter is an index of the item's power to
discriminate among different levels of ability. It ranges, theoretically,
between negative and positive infinity but, practically, between zero and
about three when ability is expressed in a standard-score metric. A
negative a parameter means that a low-ability examinee has a better chance
of answering the item correctly than does a high-ability examinee. An a
parameter of zero means that the item has no capacity to discriminate
between different levels of ability (and would therefore be useless as an
item in a power test). Items with high positive a parameters provide
sharper discrimination among levels of ability and are generally more
desirable than items with low a parameters.
The b parameter indicates the difficulty level of an item. It is scaled
in the same metric as ability and indicates what value of theta an examinee
would need to have a 50-50 chance of knowing the correct answer to the
item. This is not, however, the level of theta at which the examinee has
a 50-50 chance of selecting a correct answer if it is possible to answer the
item correctly by guessing.
P(u=1|θ) = c + (1-c)ψ[1.7a(θ-b)]                              [1]
where:
ψ(x) = 1/[1 + exp(-x)]
[Figure: item characteristic curve for the three-parameter logistic model, plotting the probability of a correct response against ability over the range -2.5 to +2.5.]
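For illustration, a minimal Python sketch of Equation 1, evaluating the three-parameter logistic probability of a correct response; the example parameter values are hypothetical:

    import math

    def p_correct(theta, a, b, c):
        """Three-parameter logistic model: P(u=1|theta) = c + (1-c)*psi(1.7*a*(theta-b))."""
        logistic = 1.0 / (1.0 + math.exp(-1.7 * a * (theta - b)))
        return c + (1.0 - c) * logistic

    # A moderately discriminating item (a=1.0) of average difficulty (b=0.0) with a
    # guessing parameter of .20, administered to an average examinee (theta=0.0).
    print(round(p_correct(0.0, 1.0, 0.0, 0.20), 3))   # 0.6 = .20 + .80 * .5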
Several other models can be obtained by making minor modifications to
the basic three-parameter model. If the guessing parameter (c) is set to
zero, it becomes the two-parameter logistic model. If the discrimination
parameter is also set to a constant, it becomes equivalent to the
one-parameter or Rasch model. If the cumulative-logistic function is
changed to a cumulative-normal function, it becomes a one-, two-, or
three-parameter normal-ogive model. Computationally, the logistic and
normal-ogive models are nearly equivalent. Scoring is simpler in models
with fewer parameters.
from the item's distractors and always result in a unique ability estimate
(the three-parameter logistic models do not). They require several more
parameters for each item and, therefore, require somewhat more extensive
computational effort.
One goal of fair testing is to measure ability with equal precision at all
levels. Toward this goal, Samejima (1980) developed a constant
information model that provided a constant level of information (and thus
equal precision) over a range of ability. Theoretically, such items could
be combined more easily into an equiprecise test than items calibrated
according to other IRT models. Practically, it has yet to be demonstrated
that the model adequately characterizes any useful item type.
of the posterior distribution (Owen, 1969, 1975). In the maximum-
likelihood strategy, it is usually done by maximizing the local item
information (Birnbaum, 1968). Item information, because of its greater
ease of computation, is sometimes used in the Bayesian strategies too.
These two goals are accomplished, conceptually at least, by evaluating the
information or posterior variance expected to result from each item and
then selecting the best item. Practically, more efficient procedures (e.g.,
Vale & Weiss, 1977) are used.
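A minimal sketch of this selection rule, using the standard Birnbaum (1968) information function for the three-parameter logistic model; the item pool shown is hypothetical, and the more efficient table-based procedures mentioned above are not reproduced:

    import math

    def p3pl(theta, a, b, c):
        """Probability of a correct response under the three-parameter logistic model."""
        return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

    def item_information(theta, a, b, c):
        """Fisher information of a 3PL item at ability theta (Birnbaum, 1968)."""
        p = p3pl(theta, a, b, c)
        return (1.7 * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

    def select_next_item(theta_hat, unused_items):
        """Choose the unadministered item with maximum information at the current estimate."""
        return max(unused_items, key=lambda params: item_information(theta_hat, *params))

    # Items are hypothetical (a, b, c) triples; at theta_hat = 0.5 the middle item wins.
    pool = [(0.8, -1.0, 0.2), (1.5, 0.4, 0.2), (1.2, 2.0, 0.25)]
    print(select_next_item(0.5, pool))   # (1.5, 0.4, 0.2)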
Unidimensional IRT requires that all test items measure a single trait.
When batteries of tests are administered using unidimensional IRT, they
must consist of subtests, each composed of unidimensional items. Two
general approaches to the efficient administration of multidimensional
batteries have been taken: the fully multidimensional and the
semi-multidimensional approaches.
Order-Theory Strategies
(persons + items) X (persons + items) can be constructed from these
binary dominance relationships. By powering this dominance matrix, a
complete matrix of dominance relationships can be recovered from a partial
matrix of relationships, if a sufficient set of relationships exists. Given
sufficient random dominance relationships, the complete matrix can be
recovered. "Sufficient" will typically be more than a minimum number of
relationships, however. The goal in interactive testing based on order
theory is to complete the dominance matrix with a minimally sufficient set
of dominance relationships.
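To make the matrix-powering idea concrete, the following is a small illustrative Python sketch that completes a partial Boolean dominance matrix by adding every relationship implied by transitivity, which is what repeated powering of the matrix accomplishes:

    def complete_dominance(matrix):
        """Transitive closure of a Boolean dominance matrix over (persons + items).

        matrix[i][j] is True if entity i is known to dominate entity j; composing the
        known relationships (Boolean matrix powering) adds implied relationships such
        as i dominates k whenever i dominates j and j dominates k.
        """
        n = len(matrix)
        closed = [row[:] for row in matrix]
        changed = True
        while changed:                      # equivalent to raising the matrix to higher powers
            changed = False
            for i in range(n):
                for j in range(n):
                    if not closed[i][j] and any(closed[i][k] and closed[k][j] for k in range(n)):
                        closed[i][j] = True
                        changed = True
        return closed

    # Example: person A dominates item 1, item 1 dominates item 2, so A dominates item 2.
    partial = [[False, True, False],
               [False, False, True],
               [False, False, False]]
    print(complete_dominance(partial)[0][2])   # True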
administered items. Specifically, he attempted to find, for an examinee, a
matching sequence of items in the archival database and then make
predictions about items yet to be administered. Kalish's strategy could be
quite difficult to administer efficiently because it requires a database
organized by unique response vectors and this could require a large
amount of storage.
Kingsbury and Weiss (1979) developed a mastery-testing extension of
Owen's (1969, 1975) Bayesian testing strategy. The testing procedure was
identical to that used for measurement (as opposed to classification).
However, the procedure terminated when the decision regarding whether
the examinee was above or below a cutoff could be made with less than a
specified amount of error. Administratively, this procedure is only
slightly more difficult to implement than the original Owen strategy.
Theft was a major concern in the area of machine security. The
suggestion was made that the testing systems be equipped with an alarm
and, further, that the storage media (e.g., discs) have some means to
prevent people from reading them on other systems. In addition, there
was some concern that examinees would "experiment" and press buttons
they should not touch or even pull floppy discs out of drives if they were
part of the testing terminal. Outright vandalism was not considered to be
a serious problem.
Item types. Most of the item types cited in the interviews were
discussed previously in the literature review. The only new type
mentioned was one with sequentially dependent keying. In such an item,
the correct response to an item is dependent on the examinee's responses
to previous items. For example, in a medical simulation "hold for further
observation" may be an acceptable response unless the examinee has, in a
previous item, administered a drug that stopped the patient's heart. The
sequentially dependent item provides a distinct challenge in developing a
testing system.
A Questionnaire Survey of Potential System Users
A questionnaire was developed based on ideas obtained from the
literature review and through interviews with CAT researchers. The
questionnaire, reproduced in Figure 2, contained 28 questions grouped
into four sections. The first section, which contained nine questions,
focused on test characteristics and was to be answered by individuals who
are responsible for test development. The second section contained nine
questions and focused on test administration; answers to these questions
were solicited from individuals who were responsible, directly or indirectly,
for test proctoring. The third section, System Procurement, asked how
much a system should cost and what features it should have. It was
intended for individuals who were responsible for purchasing or ordering
systems such as this. The fourth section, Test Development, was
intended to assess the familiarity with computers of those individuals who
would be involved in the actual implementation of tests on the system.
A list of potential system users was assembled from various mailing
lists and lists of participants at conferences that emphasized the use of
computers in testing. The final list contained 108 names. Questionnaires
were mailed to all of these individuals. Each package contained a
questionnaire, a stamped addressed return envelope, and a personalized
cover letter explaining the project and the purpose of the questionnaire.
The letter asked that the questionnaire be returned within one week.
By the end of the third week after the original mailing, 40 completed
questionnaires had been returned. In addition, 10 individuals sent notes
or letters indicating that they did not feel that they should respond to the
questionnaire because they were not directly involved with testing or
because they did not want to influence the results of the survey. Five
more questionnaires were returned by the post office because the
addressees had moved. By the end of the third week, 55 questionnaires
had been accounted for.
A follow-up letter, along with an additional questionnaire, was sent to
the remaining individuals. This resulted in ten more completed
questionnaires. Analyses were performed on the 50 completed
questionnaires.
Individuals usually responded to some but not all sections of the
questionnaire. This was expected because not all sections applied to all
individuals. Missing data were handled systematically for each item, but
the method differed from item to item. In general, there were two
methods of deciding if data were missing. Unless otherwise noted in the
text, if an individual responded to any item within a section, s/he was
assumed to have responded to all items in the section. Thus, the
percentage of endorsements was computed using the number of individuals
responding to the section as the denominator of the fraction. The
exception to this was in the computation of means of numbers (such as the
length of a test) where the data were considered missing if the item was
left blank.
Test Characteristics. The first three items in the Test Characteristics
section dealt with test length. Forty-two individuals responded to the
first question about the length of their longest test. Lengths ranged from
16 to 1,000 items. The mean of the forty-two responses was 138. The
mean is not particularly informative for system design, however, because a
system must be able to accommodate unusual cases. Percentile points were
thus calculated for the seventy-fifth, ninetieth, and ninety-fifth percentile
ranks. For this calculation, percentile points were defined as the value at
or below which the specified percentage fell. Thus, a system designed for
the seventy-fifth percentile point would accommodate 75% of the
respondents. The seventy-fifth, ninetieth, and ninety-fifth percentile
points for the first question were 150, 250, and 329 items, respectively.
As with the means, these percentile ranks were calculated using only the
individuals responding to the item.
Twenty-seven percent said that they would use the two-parameter logistic
model and 71% said that they would use the three-parameter logistic model.
For the more sophisticated response models, 23% said that they would use
the nominal logistic model, 18% said that they would use the graded logistic
model, and 21% said that they would use the continuous logistic model.
Test administration. The second section of the questionnaire
addressed the practical considerations of everyday administration. Forty-
three individuals responded to this section.
The educational levels of examinees for which the system would be
used were primarily high school and college. One respondent said that
the system would be used at the pre-school level, 26% said that it would
be used at the grade-school level, 77% said it would be used at the
high-school level, and 65% said it would be used at the college level.
Forty individuals responded to the question about the desired length
of the delay between an examinee's response and the presentation of the
next item. Of these forty, 8% said a delay of over five seconds would be
tolerable, 33% said a delay of two to five seconds would be tolerable, 43%
said a delay of one to two seconds would be tolerable, and 16% said a
delay of less than a second would be tolerable.
Eighty-eight percent said a proctor would usually be available to assist
the examinee, 12% said a proctor would occasionally be available, and only
one respondent said a proctor would rarely or never be available.
Regarding system security, 91% said the system should have some
feature to prevent unauthorized use and 63% felt that the system should
encode the test information to prevent it from being read on another
computer.
System procurement. The third section attempted to determine what
features would be desirable, when weighed against their cost. This
section was designed to be completed by individuals who were in a position
to buy or authorize the purchase of a system. System features were
listed along with their approximate price and respondents were asked
which features they would like to buy.
The first two questions dealt with basic systems. The first listed the
basic test administration system, capable of administering but not
developing tests. Four price ranges were listed (under $2,000; $2,000 to
$3,000; $3,000 to $5,000; and $5,000 to $10,000) and respondents were
asked to indicate how many of each they would consider buying at each of
the price ranges. The second asked the same question about a system
capable of developing tests. Responses to these items were rather
difficult to interpret because of some aberrant response tendencies. First,
some individuals responded with an X instead of a number; the X was
replaced with a 1 in these cases. Also, some individuals indicated that
they would buy more systems at the higher price than at the lower prices;
in these cases it was assumed that, if they were willing to buy a certain
quantity at a high price, they would be willing to buy at least that many
at a lower price and their responses were adjusted accordingly.
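The two adjustments just described might be sketched as follows (Python; the input format is assumed to list quantities from the lowest to the highest price range):

    def clean_quantities(raw_responses):
        """raw_responses: quantities listed from the lowest to the highest price range.

        An 'X' is treated as a quantity of 1; if a respondent offered to buy more
        systems at a higher price than at a lower one, the lower-price quantity is
        raised to match.
        """
        quantities = [1 if str(q).strip().upper() == "X" else int(q or 0) for q in raw_responses]
        for i in range(len(quantities) - 2, -1, -1):     # work from high price toward low price
            quantities[i] = max(quantities[i], quantities[i + 1])
        return quantities

    print(clean_quantities(["X", 0, 2, 1]))   # [2, 2, 2, 1]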
In response to the first question regarding the basic system, the 38
respondents indicated that they would buy 4,240 systems at $2,000 or
less. However, they would buy only 4,208 systems if the price were
$3,000, and they would buy 3,532 for $5,000 each. Respondents indicated
that they would buy 3,080 systems if the cost were $10,000. These
figures for the test development system were 3,184; 3,167; 3,056; and
3,041. These values must be interpreted with caution in both cases,
however, because a single respondent indicated a desire to purchase 3,000
systems; this number overwhelms the responses of the remaining
respondents. The totals of 1,240; 1,208; 532; and 80 for the basic
administration system and 184; 167; 56; and 41 for the test development
system may be more appropriate. The second set of figures shows a
sharp break as the system cost rises above $3,000 and may suggest a
target price for the system.
The remaining questions dealt with specific features. Seventy-four
percent wanted black-and-white line drawings, 29% wanted color line
drawings, 24% wanted shaded color pictures, and 53% wanted video tape or
disc pictures. Only 8% needed color text displays. For input methods,
71% wanted a simplified keyboard, 32% chose an analog device such as a
joystick, 50% wanted a touch screen, and 8% wanted voice recognition.
Eighteen percent could use a musical tone generator and 29% wanted a
voice synthesizer. Sixty-three percent wanted the system to support
high-level language compilers, 40% needed word-processing capabilities,
and 13% thought an accounting system might be useful.
Eighty-four percent of the respondents indicated that they would be
willing to try the testing system if it were made available to them free of
charge.
Test development. The fourth section attempted to assess the
familiarity of potential test developers with computers. Three questions
were asked and 49 individuals responded.
The first question asked what types of computer systems the
respondent had used. Only 4% (two respondents) had never used a
computer. Eighty-four percent had used a mainframe computer, 67% had
used a minicomputer, and 67% had used a microcomputer.
The second question asked if they had written computer programs.
Six percent had not written programs; 80% had written package programs
(e.g., SPSS); 92% had written BASIC, Pascal, or FORTRAN programs; 33%
had written assembly language programs; and 16% had written courseware.
The final question described menu and author language methods of test
specification and asked which they would prefer. Thirty-nine percent
preferred the author language method, 29% preferred the menu system,
and 33% had no preference.
for ranked variables such as the amount of storage required. Features
were considered valuable if desired by at least 10% of the respondents.
Accommodation of Tests
Storage requirements. To determine computer storage requirements,
each word is considered to be five characters long followed by a space.
The storage requirements necessary to accommodate the item banks of the
respondents can be computed by multiplying the number of words by six.
The computed average length was 964,200 characters. The seventy-fifth,
ninetieth, and ninety-fifth percentiles were 540,000; 1,080,000; and
7,200,000 characters, respectively.
Using the seventy-fifth percentile cutoff, the system must be able to
store a bank of 540,000 characters. This must be on-line storage because
it represents the item bank used by an on-line adaptive test. The
conventional tests described by the respondents are all smaller than this,
so there are no further requirements for conventional tests. The system
will need sufficient additional storage for the system programs and scratch
files and this must be included in the design. A minimum of 100,000
characters should be allowed for this.
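The storage arithmetic described above can be summarized in a short sketch (the word and overhead figures are taken from the text; the bank size shown corresponds to the seventy-fifth percentile):

    CHARS_PER_WORD = 6          # five characters plus a trailing space
    SYSTEM_OVERHEAD = 100_000   # system programs and scratch files

    def required_storage(words_in_bank):
        """On-line storage (characters) needed for an item bank of the given word count."""
        return words_in_bank * CHARS_PER_WORD + SYSTEM_OVERHEAD

    # The 75th-percentile bank of 540,000 characters corresponds to 90,000 words.
    print(required_storage(90_000))   # 640,000 characters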
The resulting figure of 640,000 characters does not include storage
for graphic items. Depending on the number of pictures and types of
compression possible, graphic storage could greatly increase the storage
requirements.
Display requirements. Table 1 presents an analysis of the required
and desirable stimulus presentation characteristics for each of the item
types, as determined from an intuitive analysis of the item types reviewed.
X's in the table indicate required characteristics and O's indicate
desirable characteristics.
Six characteristics are considered and listed across the top of the
table. Character presentation means that the system will have to display
standard alphanumeric characters. Graphic display indicates the system
will have to present drawings. A timed stimulus is one whose presentation
time can be controlled. A real-time stimulus is one that may have several
timed presentations. A dynamic stimulus is one that has motion within a
frame rather than a series of timed frames. Timed and real-time are
considered to be on a continuum; if an item requires real-time
presentation, it also requires timed presentation. An auditory stimulus
consists of voice, music, or any stimulus that must be heard.
The total number of X's and O's is presented at the bottom of the
table and gives some indication of the need to include each of the stimulus
characteristics in a multipurpose testing system. Of the 30 item types
considered, 24 required character presentation of items. Graphic
presentation was required by 10 and desirable for 11 more. Timed display
was required by 10 item types and desirable for six more. Ten item
types required real-time presentation.
Table 1. Stimulus and Response Requirements for Various Item Types
(The columns are the stimulus characteristics described above -- character, graphic, timed, real-time, dynamic, and auditory presentation -- followed by response requirements; X indicates a required characteristic and O a desirable characteristic.)

Knowledge
    Dichotomous            X O O O X O O O
    Answer until correct   X O O O X O O O
    Confidence weighted    X O O O X O O O
    Probabilistic          X O O O X O O O
    Free Response          X O O O X O O O
Cognitive Process
    Memory                 X X X X X X O X
    Concept                X X X O X
    Spatial Reasoning      X X O X
    Sequential Memory      X O X O X O X
    Simultaneous Memory    X O X X O X
    Multiple Item Access   X O X X O
Perceptual-Motor
    Perceptual Speed       X X X X O O
    Perceptual Closure     X X O X O O
    Movement Detection     X X X X X O O
    2-Hand Coordination    X X X X X X X
    Complex Coordination   X X X X X X X
    Attention to Detail    X X X
    Spatial                X X O X
    Coding Speed           X X X
    Array Memory           X O X X O X
    Vector Memory          X O X X O X
    Visual Search          X X X O X
    Linear Scanning        X X X O X
    Matrix Scanning        X X X O X
    Choice Reaction Time   X O X X O X
Simulations
    Prestwood              X X O O
    Robinson & Walker      X X X O O O
    Kneer                  X X X O O O
    Pine                   X X X O X O O O O
Non-Cognitive
    Any                    X O X O O

TOTAL
    Required    24 10 10 10  3  1 28  0  0  2  0  0  0 16  2  2  0
    Desirable    0 11  6  0  1  7  0  0 25  0  0  0  0 12  1  0 10
Timed response was required by 16 of the test forms and desirable
for 12 more. Real-time response was required by two and desirable for
one more. Dynamic response was required by the two perceptual-motor
tests and not desirable for any others.
Voice recognition of auditory responses was not required for any of
the tests. It was desirable for 10 of them, however.
Affordability
Test Development
The final section suggested that the potential users of the test
development system are quite sophisticated in computer use. Therefore,
either a menu or an author-language interface for test development can be
used and some programming knowledge on the part of the user can be
assumed.
II. DESIGN OF A SYSTEM TO MEET THE REQUIREMENTS
In addition to administering tests, the Minnesota system provided
extensive reports of the performance of examinees. While this feature is
unusual for a research system, it is essential for an operational testing
system. The Minnesota reporting system unfortunately suffered the same
shortcoming as the testing system -- the specifications for the reports had
to be coded in FORTRAN.
The need for a more efficient method of test specification soon became
obvious. When the Minnesota system was expanded and transferred to a
minicomputer, a somewhat more user-friendly test specification system was
developed (DeWitt, 1976). This system provided a test specification
program that allowed a non-programmer to develop a test by choosing
strategies and item pools. This program did not offer any features for
revising old strategies or specifying new ones, however.
Unfortunately, little more can be said about the Civil Service system.
No complete description of the system was ever published and the entire
project was abandoned just before its inauguration because of a change in
the mission of the Civil Service Commission.
flexible system but exemplifies the systems of its era -- it administered
adaptive tests but offered no flexibility in specifying strategies.
TCL. The concept of an author language is not new, having been
applied in computer-assisted-instruction systems for years. The first
attempt to apply the concept to an adaptive testing system was by Vale
(1981). Vale developed a microcomputer-based CAT system primarily as
a microcomputer CAT demonstration system. Vale's system was designed to
provide (1) a general means of specifying test structures without resorting
to a standard programming language; and (2) a minimal delay (at run time)
between the response to an item and the display of the next item.
The first of Vale's design objectives was met through the development
of an authoring language called Test Control Language (TCL). TCL was a
form of programming language tailored to the task of specifying adaptive
tests. It included single statements to perform functions that would
require many instructions in FORTRAN. Bayesian scoring, for example,
was accomplished by a single statement.
Vale's second objective was achieved by compiling the TCL source to
a simpler executable form. The textual source instructions were converted
to single numbers. Furthermore, complicated statistical searches were
converted to table searches at compile time.
TCL also had some shortcomings, however. In TCL, the items of a
test were too closely tied to the test itself. For example, certain item
characteristics, such as time limits for response to the item, had to be
specified in the test itself as part of the test specification. One could
argue, however, that such characteristics are dependent on the nature of
the item, regardless of which test it appears on. Thus each item should
have such characteristics specified at the time of item creation.
TCL also allowed branching to specific items within a test. This can
make a test specification difficult to follow and to modify. Whenever an
existing item is replaced with another, the test developer must then change
all branches to and from the item accordingly.
Vale's TCL system provided no special item-banking tools. The
standard text editor was used. Each line had a single character code to
specify one of four record types. An ideal language would provide a more
user-friendly interface.
In general, Vale's system was a successful implementation of CAT on
a microcomputer. It provided a more modern system design that was
meant for microcomputers, and as such provided a good starting point for
the design of a computerized adaptive testing language and the associated
CAT software. It lacked, however, an ideal design and adequate
documentation for commercial use.
CAI Systems
Computer Assisted Instruction (CAI), or Computer Managed
Instruction (CMI), is a field that deals with many of the same issues as
CAT. CAI systems must present material to students in much the same
way that CAT must present items to the person being tested. CAI also
does a certain amount of testing in which actual test questions are asked.
It uses information obtained through this testing to determine the speed
with which new material can be presented to the student.
CAI systems offer some insights that carry over to CAT. One of the
strongest areas of carry-over is the area of authoring languages and
procedures. These are the languages and procedures used to build the
lessons that are presented to the students.
SILTS. SILTS (SILTS User's Manual), the Scholastic Interpretive
Learning/Testing System, is a CAI system used primarily for developing
games and simulations. Its authoring system is of interest here primarily
because of its complexity of use. The basic unit in SILTS is a "node"
which is a file containing statements. Instruction in SILTS is authored by
developing nodes using an author language consisting of 37
single-character statements. "P," for example, prints a line of text; "G"
causes execution to go to a specific node. An initial node could thus list
branches to a string of nodes which could each cause information to be
printed.
SILTS is apparently quite a capable system for developing games and
simulations. The single-character statements with little mnemonic meaning
make it difficult for the occasional author to remain familiar with the
language, however. A set of more mnemonic statements would be a
definite improvement. Fewer statements might also make it more
manageable.
MIL. MIL (Luker, 1979), the Minnesota Instructional Language, is a
CAI system with an authoring system implemented as a FORTRAN
preprocessor. It is essentially a version of FORTRAN that has been
enhanced by the addition of a set of macro-instructions to help the CAI
author. Instructions called key-match operators are examples of the
macro-instructions. These operators accept a user's response and
determine if it matches one or more literal strings. They return a value
of "true" or "false" which can be used in a FORTRAN logic expression.
MIL is obviously more efficient than authoring directly in FORTRAN.
FORTRAN, however, is not an acceptable authoring language because it is
too difficult to learn. MIL is even more difficult to learn because it
requires a knowledge of FORTRAN as a precursor. It thus does not
succeed in allowing test authors to avoid learning a programming language.
GENIS I. Bell and Howell offer a commercial CAI package called
GENIS I (Bell and Howell, 1979). It is composed of two separate CAI
systems. CDS I, the Courseware Development System, is the higher-level
system. It consists of a menu-driven authoring subsystem and a
presentation subsystem. The menu-driven aspect of CDS I is fairly
simple. It is essentially a fixed sequence of questions aimed at collecting
the required information. This repetitive and lengthy process may be
unappealing to an experienced author, however.
Directions Suggested by Current and Past Systems
Two conclusions regarding the state of the art can be drawn from the
review presented. The first is that, until recently, little emphasis has
been placed on the development of a general-purpose CAT system for
developing and administering tests using a variety of strategies. None of
the systems allowed a test developer with minimal computer knowledge to
develop or modify an adaptive testing strategy. Vale's system represents
a step in the right direction, but the sophisticated authoring-language
systems used for CAI suggest that there is much room for improvement.
The second conclusion, drawing heavily upon the PLATO design, is
that several levels of author interfacing may be useful. Testing strategies
may be impossible to specify completely without a complex author language
using concepts similar to those found in programming languages. Such
complex author languages may be too difficult for certain users to master,
however. A simpler, but somewhat less flexible, menu-driven system
should be supplied for use by these individuals.
Test Construction
The test construction system is used to create a test to be
administered by the test administration system. It draws specific items
from the item banking system, incorporates them into the testing strategy,
and selects the information that should be recorded for test analysis and
interpretation. As was suggested by the questionnaire analyses and
literature review, the proposed test construction system includes facilities
for constructing tests either directly through an author language or
indirectly using a menu system to generate the author language.
Author Language
Tests may be specified directly using an author language called CATL
(Computerized Adaptive Testing Language). CATL is a language explicitly
designed for specifying adaptive tests.
Throughout the CATL language, mnemonic simplicity has been used to
guide the choice of statement names, labels, and variables. Statement
names were chosen to describe the functions they perform. Label and
variable names in CATL can be long alphanumeric strings as opposed to
the single-character or numeric names used in some languages. This
allows test developers to use descriptive names, which makes tests easier
to understand. Furthermore, the language was developed to allow test
specifications to be written in a modular style.
A special line-termination convention was used throughout CATL.
Many modern programming languages ignore line structure altogether, so
that statements may continue from one line to the next without a
line-continuation character. Although this makes the language more
flexible, it requires every statement to end with a statement-termination
character. In Pascal, for example, misplaced statement termination
characters are the most common cause of program errors. For this
reason, a very simple line-continuation convention was chosen for CATL: every
statement ends at the end of the line unless it is continued on the next
line with an & character. This is especially well adapted to CATL tests
since few CATL statements require more than one line.
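As an illustration only (the CATL compiler itself is not shown, and the placement of the & at the start of the continuation line is an assumption based on the description above), a short Python sketch of joining physical lines into CATL statements:

    def join_statements(lines):
        """Join physical lines into statements.

        Assumption: a line beginning with '&' continues the previous statement;
        otherwise each line is a complete statement, as in the convention above.
        """
        statements = []
        for line in lines:
            stripped = line.rstrip()
            if stripped.lstrip().startswith("&") and statements:
                statements[-1] += " " + stripped.lstrip()[1:].lstrip()
            else:
                statements.append(stripped)
        return statements

    print(join_statements(["#VOCABU001", "! a long statement", "& continued here"]))
    # ['#VOCABU001', '! a long statement continued here']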
Basic statements. The third group includes the basic statements for
inserting comments and for presenting an item or group of items. An
essential part of any programming language is a mechanism for inserting
comments into a program to document its design. The ! character is used
in CATL to mark the beginning of a comment because its vertical nature
graphically separates the program from the comments. CATL allows any
number of characters following an ! to be used for comments. As in most
languages, a CATL comment (preceded by the comment delimiter) may fill
an entire line; unlike many languages' comments, however, a CATL
comment and its delimiter may follow other statements on the same line.
The end of a line terminates a comment. PL/I, Pascal, and several other
languages require both the beginning and the end of a comment to be
marked with a special character. However, if an end-of-comment marker
is accidentally omitted, the instructions following the comment will be
ignored by the compiler. Allowing the end of a line to terminate a
comment eliminates the possibility of this error occurring.
In CATL the administration of an item is the fundamental operation.
The administration of an item is specified by putting the item identifier on
a line and preceding it by the # symbol.
Item development and test development are separated to allow a single
pool of items to be used in many different tests; items are not modified at
the time a test is constructed. However, item characteristics included in
an item may be overridden when the item is used in a CATL test.
Removing the specification of items and item characteristics from the
test-construction process simplifies the tests; allowing item characteristics
to be overridden provides flexibility for special applications.
It is often desirable to develop tests or item pools that can be
included in several different tests. An auxiliary include operation (not
actually part of CATL) is provided to accomplish this. The * symbol,
followed by a file name, allows information from another file to be copied
into the test. This statement avoids the need to keep copies of the same
information in more than one test.
Testing-strategy statements. The CATL author language uses several
building blocks or strategy primitives to specify strategies. Complex
testing strategies can be constructed by combining these primitives with
scoring algorithms and conditional logic. The first primitive is post-item
branching. In a CATL test with no branching, administration of each item
in the test proceeds sequentially until the last item (or line) in the test is
reached. Branching allows administration to continue at a different line in
the test, depending on how the examinee answers any of the previous
items. Specifically, CATL permits branching to a different line in the test
depending on (1) whether an item is answered correctly or incorrectly,
or (2) which one of several possible responses to the item is chosen.
The other two strategy primitives use pools of items. The SEQUENCE
statement allows individual items to be grouped together as if they were
logically one item. Each time the SEQUENCE statement is executed, the
next item (starting from the one at the top of the sequence) is
administered. The SEQUENCE is terminated by an ENDSEQUENCE
statement. Item characteristics may be overridden by items within a
SEQUENCE statement in the same way that they are overridden in a #
statement. The SEQUENCE statement itself may contain only branching
information.
Although the SEQUENCE statement is general enough to have several
uses, it is especially useful for implementing inter-subtest branching
strategies. In these strategies, branching is based on an item response.
CATL uses the same syntax for branching on the SEQUENCE statement as
it does to branch to individual items. The implementation of branching
strategies in other testing languages such as TCL (Vale, 1981) required
pointers to the items in a test. Pointers within a program are very
difficult to implement, and as a result they have been avoided in modern
general-purpose languages. CATL's SEQUENCE statement automatically
keeps track of which items in each SEQUENCE statement have been
administered, so no pointers are needed.
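A minimal sketch of this bookkeeping: the system, rather than the test author, remembers which item of the SEQUENCE is due next (Python; purely illustrative, with hypothetical item names):

    class Sequence:
        """Administers the items of a SEQUENCE block one at a time, in order."""
        def __init__(self, item_ids):
            self.item_ids = list(item_ids)
            self.next_index = 0                 # the system, not the test author, tracks position

        def administer_next(self):
            if self.next_index >= len(self.item_ids):
                return None                     # the sequence is exhausted
            item = self.item_ids[self.next_index]
            self.next_index += 1
            return item

    routing = Sequence(["VOCABU001", "VOCABU002", "VOCABU003"])
    print(routing.administer_next())   # VOCABU001
    print(routing.administer_next())   # VOCABU002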
scoring routines that will be used and the variables that will be used for
the scores. The variables are automatically updated with values from the
scoring routine whenever they are used. In a general-purpose language,
the scoring routine would have to be called explicitly each time the
variable was used, requiring many extra program lines. This has been
avoided in the design of CATL by making the CATL system responsible for
determining when to update the scores. Thus, the CATL system keeps
track of when scores are to be updated and updates them only when they
are used. Scores are not unconditionally updated after every response.
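The lazy updating described above might be sketched as follows (Python; the scoring routine shown is a simple stand-in):

    class Score:
        """A score that is recomputed only when it is read, not after every response."""
        def __init__(self, scoring_routine):
            self.scoring_routine = scoring_routine
            self.responses = []
            self.cached_value = None
            self.stale = True

        def record_response(self, response):
            self.responses.append(response)     # cheap; no scoring work is done here
            self.stale = True

        def value(self):
            if self.stale:                      # update only when the test actually uses the score
                self.cached_value = self.scoring_routine(self.responses)
                self.stale = False
            return self.cached_value

    number_correct = Score(lambda responses: sum(responses))
    number_correct.record_response(1)
    number_correct.record_response(0)
    print(number_correct.value())   # 1; the routine ran once, not after each response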
To provide an easy mechanism for ending tests when the given
conditions are met, CATL uses the TERMINATE statement. This statement
was intended primarily for tests using model-based branching strategies
that administer items until the estimate of the examinee's ability becomes
sufficiently precise. The TERMINATE statement may be used in
conjunction with any other testing strategy, however. This makes the
specification of termination conditions more flexible than it would be if it
were built directly into the SEARCH or SEQUENCE commands. The
design of the TERMINATE statement is consistent with structured
programming doctrines. Although this type of statement is uncommon in
programming languages, it is similar to the structured BREAK statement in
the C programming language.
Item Banking
The heart of any testing system is the set of available items. The
process of entering, maintaining, and ordering these items is known as
item banking. First, an item must be created: it must be named and
described, and the body of its text must be entered into the storage
system of the computer. After it has been created, it may still undergo
changes. The author of the item may wish to alter the wording of the
text, change part of the item's description (e.g., parameters) or otherwise
modify the item. These alterations should be made without the need to
re-create the entire item. Making the creation and editing processes
simple and similar to each other is an advantage for the user.
Finally, the item banking system must be able to support specialized
items. Such special items allow the author to make use of sophisticated
options for a CAT system (e.g., videodisc displays and graphics). Here
again, item banking deals with the entry and editing of such special items.
Item Classification and Storage
Underlying the entry and editing process for individual items is the
file system that supports the organization of the items into useful
collections or pools. Items are typically grouped together according to
some criterion, usually pertaining to the subject matter of the item.
Without such grouping, it would be difficult to manage the many items
created by test developers.
Each item needs a unique identifier. In most item banking systems
this consists of a content-area identifier and a unique item number within
the area. Previous banking systems have used content identifiers based
on mnemonic codes, Dewey decimal codes, and simple descriptive codes.
For banks of small to moderate size, a simple descriptive code is probably
best. For example, such a code might be six letters long. The item
identifier might then be six letters followed by a number for identification
within that area.
Closely related to the classification and naming of items are the
mechanisms for item storage and retrieval. The item name provides the
first link in the search and retrieval of an item. An on-line directory of
item names and storage location addresses typically provides pointers for
use by programs that need access to the items. It is used in much the
same manner as a telephone directory. For small systems, large master
directories are not always feasible, however. Limited disc storage space
makes it impossible to maintain lists of thousands of item names and
corresponding disc addresses.
The storage technique that was developed for the proposed CAT
system attempts to make as much use as possible of the underlying
operating system's file storage facilities. Directories of files are
maintained by the operating system for the organization and access of data
files. When applied to item banking, these directories can be used as the
first level of organization of an item banking scheme.
To take advantage of the operating system, then, the name of the
item must contain, in part, some information pertaining to its location.
This can be readily accomplished by implementing a scheme that uses the
first n characters of the item name as the filename of the pool in which
this item is stored (where n is less than or equal to the maximum allowable
filename size for the given system). For most systems six characters are
allowable. Very large item banks or very complex naming schemes may
require more complex software algorithms for efficient storage and
retrieval. For the proposed system, the file name approach should
suffice.
The remaining characters of an item name are then used to locate an
item within the pool. This is accomplished by using the directory that the
item bank maintains at the beginning of each file. For example, to find
the item named VOCABU001, the first six letters are used for the filename
(thus the file "VOCABU" is opened) and, within that file, its directory
contains the remainder of the name, "001."
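A small sketch of this naming scheme, splitting an item name into the pool filename and the identifier used within that pool's directory:

    POOL_NAME_LENGTH = 6   # filename length assumed allowable on the target operating system

    def locate_item(item_name):
        """Split an item name into (pool filename, identifier within the pool's directory)."""
        pool_file = item_name[:POOL_NAME_LENGTH]
        within_pool_id = item_name[POOL_NAME_LENGTH:]
        return pool_file, within_pool_id

    print(locate_item("VOCABU001"))   # ('VOCABU', '001')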
Items themselves are stored with both header and body together as a
single record. The header contains the various item characteristic
parameters and other flags that indicate the type of the item and its
various execution requirements. The body of an item contains its textual
information. Since the body of an item can be of any length, the item
records that make up a file of items must be variable-length records.
Special Items
Videodisc and graphics displays were the two most desired display
options for the CAT system prototype. To support these options, the item
banking system must include in its design features that facilitate entry
and editing of these options.
Videodisc. The videodisc option to the proposed system will be
implemented using a standard serial interface to the videodisc and an
auxiliary TV monitor beside the video screen/keyboard assembly.
Videodisc pictures will be displayed on the auxiliary monitor while any
explanatory text will appear on the CAT video screen. More extensive
features for combining videodisc and computer displays were considered to
be beyond the scope and price constraints of this system because of the
technical complexities involved in mixing computer and videodisc signals on
a single screen (see Belar, 1982).
Three basic functions are supported with videodisc: (1) displaying a
single frame, (2) displaying a range of frames, and (3) looping through a
range of frames repeatedly. If an item uses the videodisc, three
additional fields are used in an item's header. These fields are: starting
frame number, final frame number, and repeat flag. If the starting frame is
zero, then this item does not use the videodisc option. For displaying a
single frame, the start and end frame numbers are identical. The repeat
flag, if TRUE, specifies that the range of frames from start to end should
be displayed over and over until the response time expires or the question
is answered.
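The three header fields and display functions just described might be represented as follows (a sketch; the field names are hypothetical and the videodisc command set itself is not shown):

    from dataclasses import dataclass

    @dataclass
    class VideodiscFields:
        start_frame: int      # 0 means the item does not use the videodisc
        end_frame: int        # equal to start_frame for a single still frame
        repeat: bool          # True: loop the range until time expires or an answer is given

        def display_mode(self):
            if self.start_frame == 0:
                return "no videodisc"
            if self.start_frame == self.end_frame:
                return "single frame"
            return "looped range" if self.repeat else "range of frames"

    print(VideodiscFields(start_frame=120, end_frame=180, repeat=True).display_mode())
    # looped range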
Graphics. Graphics are stored in the body of an item. The header
must contain a flag that, if set, indicates that the body of the item
contains graphics.
If point graphics are to be used, the item author needs the digitizer
option for the system when the item is being created or edited. (Systems
for administration of graphics items need not be equipped with such a
digitizer.) One form of digitizer consists of a specially wired pen and pad
arrangement that allows the user to move the pen across the pad's
surface. The x-y position of the pen is then sent to the CPU. Special
programs for the entry and editing of graphics with the digitizer are
readily available. Such a program will be used for graphics editing on the
proposed system and its output will be entered as the body of a graphic
item.
Alternatively, graphic items can be stored as control instructions for
a graphic terminal. In this case, graphics items can be stored as textual
items. No graphics system software needs to be developed for this mode
of entry.
Creating and Editing Item Text
The interface most frequently used in item banking performs the
functions of entering and editing an item. It should be as user-friendly
as possible. The most efficient and user-friendly editors are screen-
oriented editors. A screen editor displays a portion of a text document
and allows the user to make changes in the text that are immediately
apparent on the CRT screen. Since the CAT system prototype will have
a CRT screen, the use of the screen's features for editing should play
an important part in the design of the editor.
A combination of screen-oriented approaches was taken In the
proposed system. First the Item header Information (inserted Into a
number of small data fields) is requested in a fill-In-the-blank mode. The
author can see what Information Is being requested and can fill In any or
all of the appropriate information. The fill-in approach seemed most
appropriate for headers since they are generally of a fixed format. This
fill-In-the-blank approach has been used In systems such as Plato (Control
Data, 1978a, 1978b). The question-answer approach has been used by some
(e.g., GENIS I by Bell & Howell, 1979) but this Is often too slow for
the experienced user. The fill-in approach displays almost all the
various fields at once, allowing the author to see what Is required,
giving the whole picture and thus making the system easier to use.
-48a-
Once the header has been completed to the author's satisfaction,
editing shifts to a free-format approach for the body of the item. Here
the screen-oriented editing features referred to above are used.
Test Administration
The test administration system needed by most users must perform
four functions:
1. It must instruct examinees how to use the testing system and, in
general, how to respond to items.
2. It must display the items to the examinee.
3. It must accept responses to the items, edit them for proper
format, and score them.
4. It must pass information to a monitoring system that can keep a
proctor informed of each examinee's status.
Instructions
A basic part of test administration is teaching the examinee how to
respond to the test. In paper-and-pencil testing, the instructions are
read by a test proctor as the examinee follows along in the test booklet.
A computerized testing system can free the proctor from this duty and
interactively administer instructions to each examinee, providing
clarification where it is required. As DeWitt and Weiss (1974) pointed out,
computerized instruction must be capable of branching on the basis of
examinee responses. By branching on a response, the system can produce
instructions of the type shown in Figure 3. Examinees who have
previously taken computerized tests or who read and comprehend the
instructions quickly can begin sooner than other examinees. Those who
require more extensive explanation receive the specific instructions they
need to start the test.
Branched instruction is, in a sense, simply a form of adaptive testing
in which the instructional screens are treated like (unscored) test items.
A computerized testing system capable of branched adaptive testing can
thus perform the instructional sequence as a type of test at the beginning
of the session. The proposed system is designed to handle instructional
sequences in this manner. Instructional screens will be entered as items
and the instructional sequence will be specified as a branched adaptive
test.
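The following sketch illustrates instructional screens handled as unscored branching items, in the spirit of Figure 3 (Python for exposition; the screen names, texts, and branching table are hypothetical and are not the CATL encoding the system would actually use).

# Sketch: instruction screens as unscored items with response-based branching.
# A next-screen value of None means the instructions are finished and the
# test proper can begin.
screens = {
    "INTRO":     {"text": "Have you taken a computerized test before? (Y/N)",
                  "next": {"Y": "QUESTIONS", "N": "EXPLAIN"}},
    "EXPLAIN":   {"text": "Computerized adaptive tests choose questions suited to you...",
                  "next": {"": "QUESTIONS"}},     # any response continues
    "QUESTIONS": {"text": "Do you have any questions? (Y/N)",
                  "next": {"Y": "PROCTOR", "N": None}},
    "PROCTOR":   {"text": "A proctor will assist you.",
                  "next": {"": None}},
}

def run_instructions(get_response):
    screen = "INTRO"
    while screen is not None:
        entry = screens[screen]
        print(entry["text"])
        reply = get_response().strip().upper()
        if "" in entry["next"]:
            screen = entry["next"][""]
        elif reply in entry["next"]:
            screen = entry["next"][reply]
        # otherwise the same screen is repeated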
Item Presentation
-49-
Figure 3. Sample Instructional Sequence

[First screen:]
This is a computerized adaptive test. Have you ever taken this type of
test before? Answer Y (yes) or N (no).

[If N:]
Computerized adaptive tests are designed to provide a more accurate
estimate of your ability than can be obtained from traditional
paper-and-pencil tests. In this test, you will be asked to answer a
number of questions. Some will be multiple-choice questions. Others will
ask you to type in some short, free-response answers.

[Then:]
Do you have any questions? Answer Y (yes) or N (no).
(If you answer yes, a proctor will assist you.)

[If Y:] Call Proctor
audio or visual modes, visual being the more common of the two. In
designing an item display, both the display information and item structure
must be considered.
-51-
The proposed system will support HELP and PASS function keys.
-52-
screen. For others, such as items displayed for a fraction of a second, a
separate response time should be provided. In the proposed system,
stimulus and response times will be contained as characteristics of the
items.
Examinee Monitoring
Even though the system software is designed to be as user-friendly
as possible, it may still be necessary to provide personal supervision and
assistance. An examinee-monitoring system (cf. Prestwood, 1980) would
allow this. This assumes there will always be a proctor present at the
monitoring station; responses to the questionnaire suggested that this
assumption is valid.
The proctor will, at a minimum, start the test from either the
monitoring station or the examinee's terminal. The proctor should also be
able to stop or interrupt a test (pause) without losing the response data
accumulated to that point.
As mentioned in the previous section, an examinee could signal the
proctor for assistance with the HELP key at any time. If a proctor's
terminal was not available, the examinee's terminal could beep or flash. If
a proctor's terminal was available, the help-needed indicator could flash at
the monitoring station until the proctor responded. A proctor's terminal
could also be used to track the examinee's performance. This would allow
the proctor to determine if the examinee needed help and would allow the
proctor to see where in the testing sequence an examinee was.
Regardless of whether a proctor's terminal was available, the
monitoring system could constantly evaluate the examinee's response
pattern to detect problems such as "sleeping", random responding, and
coached responding (Prestwood, 1980). Algorithms are available for
detecting some of these problems and should be useful in informing the
proctor of problems s/he might otherwise fail to detect, especially when
large numbers of examinees are being tested.
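A simple version of such a check might look like the following sketch (Python for illustration; the thresholds and rules are invented here and are not the algorithms referenced from Prestwood, 1980).

# Sketch of a response-pattern monitor that flags possible problems.
def monitor_examinee(responses, latencies,
                     slow_seconds=120.0, fast_seconds=1.0, run_length=8):
    # responses: list of response strings; latencies: seconds per response.
    flags = []
    if latencies and latencies[-1] > slow_seconds:
        flags.append("possible sleeping: very long time since last response")
    if sum(1 for t in latencies if t < fast_seconds) >= run_length:
        flags.append("possible random responding: many very fast answers")
    if len(responses) >= run_length and len(set(responses[-run_length:])) == 1:
        flags.append("possible patterned responding: identical recent answers")
    return flags                       # shown to the proctor, if one is present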
The basic concerns of the system design with regard to the
monitoring system are the design or selection of the error-detection
algorithms and the provision of communication between the testing stations
and the proctor's station. The former has been provided elsewhere (cf.
Prestwood, 1980) and the latter is dependent on the hardware
configuration. The proposed system will provide capability for
error-detecting algorithms and will, as an option, support the proctor's
console.
Test Interpretation
-53-
well-formatted presentation of the test results. Interpretation is not a
trivial task. The user must be able to specify detailed instructions for
the interpretive function to use. However, these instructions should be
easy for the user to understand and/or specify.
[#ABL001
(BSCORE > 2.0)
High Ability
#ABL002]
In this example the module's name is #ABL001, the first name after the
opening bracket. If BSCORE exceeds 2.0, the text "High Ability" is
printed. The last line of the module calls another module, #ABL002, which
may present additional information. That message, as well as the one in
the module above, could have been several paragraphs in length.
-54-
Test Analysis
-55-
probability that an examinee with very low ability would answer it
correctly.
Unlike the conventional analyses, IRT item analyses are
computationally burdensome and may require a good deal of computer
memory. There are several methods for estimating these item parameters
including (1) a set of transformations from the conventional item statistics,
(2) a method that minimizes the lack of fit to the IRT model, and (3) a
maximum-likelihood approach that maximizes the joint likelihood of a set of
item and ability parameter estimates given the observed data. The
transformational approach is computationally the least taxing but does not
yield particularly good estimates unless a fairly strict set of assumptions is
met. The maximum-likelihood approach is, theoretically, the best of the
three approaches but requires much computation and a large amount of
memory for intermediate storage.
IRT test analyses are, in some ways, functionally similar to the
conventional analyses. The primary IRT test statistic is the test
information function that is proportional to the inverse of the squared
conditional standard error of measurement. From this function, the test
reliability can be estimated for any specified population of examinees.
Alternatively, the information function can be averaged over the examinees
in the population. Computation of the information function is relatively
simple once the item parameters have been estimated. Other test statistics
of possible interest include the means of the item parameters and their
standard deviations.
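For concreteness, the quantities described in this paragraph can be computed as in the following sketch (Python; the three-parameter logistic formulas are standard textbook forms given for illustration and are not code from the proposed system).

import math

# Sketch: 3PL item information and test information at an ability level theta.
def p_correct(theta, a, b, c):
    # Three-parameter logistic probability of a correct response (D = 1.7).
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def item_information(theta, a, b, c):
    p = p_correct(theta, a, b, c)
    return (1.7 * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def test_information(theta, items):
    # Test information is the sum of the item informations; the conditional
    # standard error of measurement is 1 / sqrt(test information).
    return sum(item_information(theta, a, b, c) for (a, b, c) in items)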
The IRT item analysis capability will be available on the extended
system. The method of analysis will be the method of maximum likelihood.
This calibration capability will be available for dichotomous items using the
three-parameter logistic model. The proposed basic system will not contain
any IRT item analysis capabilities because it will not have sufficient
computing power, nor will it have disc storage with sufficient capacity or
access speed.
-56-
III. DESIGN OF A SOFTWARE PACKAGE TO IMPLEMENT THE SYSTEM
Test Construction
Template Processing
Templates are test-specification shells written in CATL. They differ
from complete test specifications in that they contain holes marked by
special placeholders indicating the positions for item identifiers and other
pieces of information. A template can be "filled in" by the test developer
to produce a complete test specification. This test can, in turn, be
consolidated into an executable test file and then administered. This
filling in, or preprocessing, of templates allows a novice user to construct
tests easily. It also provides a major convenience for those test
developers who need to construct a variety of tests that use the same test
strategy. For more information concerning the creation of tests using
templates, as well as information on the templates themselves, see Chapter
5 and section 8.9 of the draft User's Manual (Appendix A).
The design of the template processor is shown in Figure 5. It
depicts the five major functions of the preprocessor as a hierarchy. The
functions in this hierarchy, which is a system design document, can be
translated into procedures that operate as described below:
Input and output. The preprocessor, called Fill-Template in
Figure 5, reads through the test template (see Figure 4) using the
procedure Read-Source. Read-Source examines each line to see if it
contains any of the special template constructs (i.e., INSTRUCT statements
or blanks to fill in). If no menu statements are found, the line is
written in its original form to the test-specification file (Figure 4)
via the procedure Write-Test.
-57-
Figure 5. Template Processor Overview
[Hierarchy chart: Fill-Template at the top, with subordinate procedures
including Read-Source, Write-Test, Single-Entry, and Multiple-Entry.]
-59-
Typically, item identifiers will be input to the preprocessor, although the
system does not restrict the use of underlines to item identifiers. Any
type of input will be accepted, inserted at that point, and written to the
test file. It is up to the template author to provide sufficient INSTRUCT
statements to explain the input requirements.
The procedure Multiple-Entry is called to handle the repetitive case.
Here again the user's input replaces the underline characters, and the new
line is written to the test file. However, in contrast to Single-Entry, the
preprocessor does not move on to the next line in the template file.
Instead, the same line is repeated: New input is requested from the user
and, if supplied, replaces the underlines. This new line is written to the
test file. This process continues until the user responds to the input
with a special function key to terminate it.
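The behavior of Single-Entry and Multiple-Entry can be sketched as follows (an illustrative Python sketch; the underline marker, the leading asterisk used to denote a repeating line, and the empty reply standing in for the terminating function key are assumed conventions, not the system's actual ones).

# Sketch of template filling: copy ordinary lines, prompt for any blanks.
def fill_template(template_lines, ask):
    # ask(prompt) returns the user's reply; "" stands in for the special
    # function key that terminates repetitive input.
    test_lines = []
    for line in template_lines:
        if "____" not in line:
            test_lines.append(line)                        # copy unchanged
        elif not line.startswith("*"):
            reply = ask(line)                              # Single-Entry
            test_lines.append(line.replace("____", reply, 1))
        else:
            while True:                                    # Multiple-Entry
                reply = ask(line)
                if reply == "":
                    break
                test_lines.append(line[1:].replace("____", reply, 1))
    return test_lines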
-60-
Test Consolidation
Test consolidation is a process very similar to program compilation
that is used to minimize the processing required at run time. In the
consolidation process, special processing is performed on SEARCH lists to
produce a table for item selection at run time, item identifiers are
converted to item addresses, and the CATL source statements are
converted into a shorthand form that can be quickly processed and is more
compact than the source test specification. Once test consolidation is
complete, the test is ready for administration. As such, it can, if desired,
be transferred to a smaller system for administration.
Consolidation is a three-stage process. The first stage converts the
CATL source statements of a test into the executable shorthand. It also
produces tables from the SEARCH lists. The second stage selects the
items that are used in a test from the global item bank (or banks) and
creates a file containing only those selected items. The third stage makes
a final pass on the executable shorthand code to resolve branching
addresses (i.e., to fill in addresses at the branch-initiation points that
were not known when that part of the source was consolidated on the first
pass). It also replaces item references with file addresses (from the file
created in stage two). The design for the test consolidator is depicted
graphically in the hierarchy chart of Figure 6.
Stage-One. Stage-One is a parser for CATL statements. To parse a
statement, the source statement from the test-specification file is
translated into the shorthand executable form. When the statement has
been parsed, the executable statement is written to the executable test
file. This parsing is accomplished through a combination of two
well-established parsing techniques. Most of the statements are parsed via
a recursive-descent procedure. The parser design follows very closely
the BNF (Backus-Naur Form, a system for formalizing syntax) definition of
the language. (See Appendix B for a partial BNF description of CATL.)
For example, a CATL test consists of the following sequence: (1) the
keyword TEST, (2) declarative statements, (3) executable statements, and
(4) the keyword ENDTEST. Since the outermost syntactic unit is a test,
the Stage-One parser attempts to find the four syntactic parts that make
up a test. First it must find the keyword TEST. Finding that,
Stage-One simply writes the appropriate keyword to the executable test
file.
Next must come the declarative statements (see section 8.5 of the
draft User's Manual in Appendix A). Since declarative statements can
themselves consist of many parts, it is appropriate that Stage-One call a
separate module to parse declarative statements. Hence,
Parse-Declaratives is required.
-61-
<statements>. Again this is a complex structure, so a separate module,
Parse-Statements, is called to parse <statements>.
The process described above follows directly from the BNF and
exemplifies the design for Stage-One and its subordinate routines. The
Parse-Declaratives and Parse-Statements modules can be designed in a like
manner: Keywords are handled within the module for the corresponding
syntactic element; more complex structures are handled by subordinate
modules.
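A recursive-descent parser of this kind, one procedure per BNF unit, might be sketched as follows (Python for exposition; the token handling is greatly simplified and the DECLARE keyword is a placeholder, so this is not the actual Stage-One code).

# Sketch of recursive descent over the BNF: a test is the keyword TEST,
# then declaratives, then statements, then the keyword ENDTEST.
class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0
        self.output = []                        # stand-in for the executable file

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def expect(self, keyword):
        if self.peek() != keyword:
            raise SyntaxError("expected " + keyword)
        self.output.append(keyword)             # write keyword to executable form
        self.pos += 1

    def parse_test(self):
        self.expect("TEST")
        self.parse_declaratives()
        self.parse_statements()
        self.expect("ENDTEST")

    def parse_declaratives(self):
        while self.peek() == "DECLARE":         # placeholder declarative keyword
            self.expect("DECLARE")

    def parse_statements(self):
        while self.peek() not in (None, "ENDTEST"):
            self.output.append(self.tokens[self.pos])   # pass statements through
            self.pos += 1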
The second form of parsing used within the first stage of the test
consolidator is operator-precedence parsing. This approach is used to
parse logical and arithmetic expressions. Operator precedence is used
here because it is especially well-suited to parsing arithmetic and logical
expressions -- it can use the precedence and associativity of the various
operators to guide the parse (Aho & Ullman, 1977).
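The way precedence and associativity guide such a parse can be shown with a small precedence-climbing routine, a common way of implementing operator-precedence parsing (Python for illustration; the operator set and precedences are assumptions, not CATL's actual ones).

# Sketch of precedence-guided parsing of a token list such as
# ["BSCORE", ">", "2.0", "AND", "NITEMS", "<", "30"].
PRECEDENCE = {"OR": 1, "AND": 2, ">": 3, "<": 3, "=": 3,
              "+": 4, "-": 4, "*": 5, "/": 5}

def parse_expression(tokens, pos=0, min_prec=1):
    # Returns (parse tree, next position); operands are left as strings.
    left = tokens[pos]
    pos += 1
    while pos < len(tokens) and PRECEDENCE.get(tokens[pos], 0) >= min_prec:
        op = tokens[pos]
        right, pos = parse_expression(tokens, pos + 1, PRECEDENCE[op] + 1)
        left = (op, left, right)                # binary operator node
    return left, pos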
-63-
test administration. This approach makes the executable test a wholly
separate entity, removing the restriction that the entire item bank be
on-line during test administration.
The Read-IDB procedure locates the specified item in the item data
base (IDB). It returns a file pointer/disc address for the location of that
item in the IDB. Copy-Item takes that address and copies the item into
the file of items for this test. It creates that file, if necessary (as on
the first call), and returns the address of the item in the new item file.
This process is repeated for all items in the table of item numbers.
When all items have been copied, Stage-Two is complete.
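Stage-Two's copy loop can be sketched as follows (illustrative Python; Read-IDB and Copy-Item are modeled with a dictionary and a list rather than the real file-pointer and disc-address handling).

# Sketch of Stage-Two: copy the referenced items into a test-specific item file.
def stage_two(item_numbers, item_data_base):
    # item_data_base maps item number -> item record (header plus body).
    test_item_file = []                          # the new file of selected items
    address_table = {}                           # item number -> address in new file
    for number in item_numbers:
        record = item_data_base[number]          # Read-IDB: locate the item
        address_table[number] = len(test_item_file)
        test_item_file.append(record)            # Copy-Item: copy it across
    return test_item_file, address_table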
Summary
-64-
file. The executable test file can be used with the test administration and
test analysis subsystems.
Item Banking
Test Administration
The Test Administration subsystem consists of the programs used to
administer a test to an examinee. At the heart of the test administration
software is the test executor, which interprets the test code, initiates item
displays, accepts responses, and invokes scoring procedures. That is, it
administers the tests created by the test consolidator, and through
interaction with the examinee, creates a file of test results. These results
can be used with the test analysis or test interpretation subsystems.
The executor software administers the test by executing the
executable test file produced by the test consolidator. Figure 7 shows
these functions as logical modules arranged in a hierarchy, with the
highest functions on top and subordinate functions below. As is the case
with all designs in this report, Figure 7 represents only the upper level
of the design.
Fetch-and-Dispatch
Nest-Modules
Each test must begin with a TEST instruction, which can in turn
signal the beginning of a subtest nested within the main test. When a
TEST instruction is encountered within a test, the status of the test
currently executing must be stored, and the set of variables appropriate
for the new test must be activated. When an ENDTEST instruction is
encountered, the opposite sequence must be performed: The variables of
the current test must be inactivated and the status of the previous test
must be restored. This is referred to as module nesting and is
accomplished by the procedure Nest-Modules.
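Module nesting amounts to pushing and popping test status, as the following sketch illustrates (Python for exposition; the contents of the status record are assumptions).

# Sketch of Nest-Modules: save the enclosing test's status on TEST and
# restore it on ENDTEST.
test_stack = []
current_status = None

def begin_test(new_variables):
    global current_status
    if current_status is not None:
        test_stack.append(current_status)        # store the enclosing test
    current_status = {"variables": dict(new_variables), "items_given": 0}

def end_test():
    global current_status
    finished = current_status                    # status of the completed subtest
    current_status = test_stack.pop() if test_stack else None
    return finished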
Assign
Assignment is accomplished by the Assign procedure, which is
invoked with the SET statement. A SET statement consists of an
expression and a variable name. The expression is evaluated and the
variable is set to the result by Assign.
Declare
Present
-67-
Override-Characteristics replaces the original values stored with the item
with the new values.
Sequence
Search
Search finds the item that will yield the most psychometric information
given the current score status. To do this on an exhaustive, sequential
basis would require too much computer time. Instead, the consolidator
organizes the items in the search pool by trait level. Each trait level
corresponds to a sequence of items. The executor's job is thereby
reduced to determining a trait-level category and calling the proper
sequence. The sequence used here is processed by calling the procedure
Sequence.
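The table-driven selection described above might look like the following sketch (Python for illustration; the way trait levels are bucketed and the table layout are assumptions about what the consolidator produces, not its actual output format).

# Sketch of Search: pick the next item from a table keyed by trait level.
def search_next_item(theta, search_table, administered):
    # search_table maps a trait-level category to item numbers ordered by the
    # consolidator from most to least informative for that category.
    category = max(-2, min(2, int(round(theta))))     # crude trait-level bucket
    for item_number in search_table[category]:
        if item_number not in administered:
            return item_number                        # first unused item
    return None                                       # category exhausted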
Skip
Conditional execution of sections of CATL statements is implemented
by the Skip procedure. Skip controls the flow of execution through the
IF, ELSEIF, and ENDIF statements. The conditional instructions are
followed by a boolean expression to be evaluated as true or false. If the
expression is true, execution continues on the next line; otherwise
execution is directed to another location. Evaluate-Expression evaluates
the expression, and Set-Next-Instruction sets the location at which
execution will continue.
Keep
The Keep procedure writes variables to a file for later use. The
KEEP statement includes a variable list to be preserved. When executed,
the Keep procedure writes this variable list to the file. To do this, it
-68-
calls Update-Scores to update any score it will use and calls
Format-And-Output to write the data to a file.
Test Interpretation
-69-
Open-Module is called when a left bracket is found, indicating the
beginning of the module. It reads the module name, if one is provided,
and puts it into a table along with the module's address in the executable
interpretation file so that it can be called by name by other modules.
Parse-Expression converts the logical expression at the beginning of
the module to RPN form and stores it in an expression buffer. If no
expression is provided with the module (an acceptable condition), a logical
expression equivalent to "true" is inserted in the buffer so the module will
be executed every time it is encountered.
-71-
file is flagged to prevent it from being executed. Otherwise, the file is
ready to be executed.
Interpretation Executor
The Interpretation Executor, diagrammed in Figure 9, is somewhat
simpler than the compiler. The main procedure, Execute-Interpretation,
has three subordinate procedures: Open-Module, Process-Module, and
Close-Module. The internal logic of Execute-Interpretation causes it to
execute these three procedures sequentially. Since the modules can call
each other, however, Execute-Interpretation may be called recursively
from the procedure Process-Module.
[Figure 9: hierarchy chart with Execute-Interpretation at the top and
Open-Module, Process-Module, and Close-Module below.]
-72-
Open-Module. To allow the remainder of a module to be skipped if
the associated expression is evaluated to be false, the location of the end
of the module must be known when execution of the module begins. This
information is provided in the executable interpretation file by the
compiler. The nature of the interpretation language provided allows
nested modules, however. When a left bracket is encountered indicating
the beginning of a module, the end location of the module currently
executing must be stacked so that the module can be returned to. The
function of Open-Module is to stack this information from the current
module, if one is executing, and to initialize the next one. Complete
stacking of the module is not required because the information prior to the
module call has already been processed and will not be needed again.
Process-Module. Process-Module sequentially processes the
interpretive information in the module. It uses two subordinate
procedures to accomplish this. Evaluate-Expression evaluates the module's
logical expression. If it is true, Process-Text is called. If it is false,
control returns to Execute-Interpretation to close the module.
Process-Text has two subordinate procedures: Output-Text and
Call-Module. Output-Text processes the text character by character.
Standard characters are printed. Output control characters are translated
into their appropriate functions. Score-printing characters cause a score
to be printed. Call-Module generates a recursive call to the main
procedure, Execute-Interpretation. Stacking of the required information is
handled in that procedure.
Close-Module. Close-Module performs the reverse function of
Open-Module. It unstacks the information from the previously executing
module and continues where that one left off.
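The recursive character of the executor can be illustrated with a small interpreter for modules like the #ABL001 example shown earlier (Python for exposition; the module storage, the condition functions, and the score record are simplifications, not the actual executable interpretation file format).

# Sketch of Execute-Interpretation over pre-parsed interpretation modules.
modules = {
    "#ABL001": {"condition": lambda scores: scores["BSCORE"] > 2.0,
                "text": "High Ability",
                "calls": ["#ABL002"]},
    "#ABL002": {"condition": lambda scores: True,   # no expression: always true
                "text": "Additional interpretive text.",
                "calls": []},
}

def execute_interpretation(name, scores):
    module = modules[name]                  # Open-Module: locate the module
    if module["condition"](scores):         # Evaluate-Expression
        print(module["text"])               # Output-Text
        for called in module["calls"]:      # Call-Module: recursive call
            execute_interpretation(called, scores)
    # Close-Module: returning restores the calling module's context

execute_interpretation("#ABL001", {"BSCORE": 2.4})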
Summary. The interpretation system designed for the proposed
system consists of a compiler and an executor. A very simple
module-processing language is used to specify the interpretive output that
is produced from a test record. This language is compiled to executable
form and executed by the interpretation executor. The output produced
may contain textual and numerical information about an examinee's test
scores.
Test Analysis
-73-
the system. They are basically statistical programs, and no system details
for these programs are provided here.
Summary
-74-
IV. SELECTION OF COMPUTER HARDWARE
-75-
Images on computer screens are composed of many small dots called
pixels (short for picture elements). Color images are composed of
combinations of pixels in the three primary colors. On black-and-white
computer screens, shaded images may be displayed by using pixels of
different intensities; this is called a gray scale.
Hardware Requirements
Hardware requirements were drawn from two sources. The first was
the review of existing Item types and testing strategies. The second was
the survey of potential system uses. Both sources of information were
used to determine the minimum hardware requirements and the optional
hardware configurations desired for an adaptive testing system.
Cost
The minimum hardware configuration was designed to cost less than
$3,000. List prices were used to compare the costs of different types and
brands of hardware. Because most hardware manufacturers offer quantity
discounts or governmental and institutional discounts, and because the
cost of computer hardware is decreasing rapidly, a twenty-percent factor
was added to the $3,000. In the evaluation of hardware, it was assumed
that any computer hardware available at the time of the evaluation with a
single-unit list price lower than $3,600 would be available for $3,000 by
the time a prototype system was developed, especially if discounts were
applied.
Display
Several survey questions were designed to determine the type of
display required for adaptive testing systems. Eighty percent of those
responding indicated that they needed a system to display black-and-white
line drawings in addition to text. Less than one third were interested in
color displays, moving displays, or a system that could only display text.
Forty percent were interested in shaded drawings, and 53% were interested
in a videotape or videodisc interface. Of the 30 testing strategies
reviewed, 24 required a character display and 10 required a graphic
display. A videotape or videodisc interface would be required for the
three strategies that require dynamic display. To meet these needs, the
display on the minimum configuration should include a text and graphics
capability with a resolution from 256 x 256 to 512 x 512 pixels (dots).
Options should include gray-scale display for shading and a video
interface.
Input and Output Devices
Several survey questions dealt with the input devices needed for the
system. Seventy-one percent of the researchers wanted a simplified
keyboard, 32% wanted a joystick or trackball, and 50% wanted a touch
-76-
screen. There was very little interest in voice recognition as an input
device. Most of the researchers indicated that the system would receive
unusual wear and tear or abuse by its users. Sixty-seven percent
thought that the buttons would be pressed harder than necessary. Since
users have more contact with the input devices than with any other part
of the system, the input devices should be especially durable.
Twenty-eight of the testing strategies reviewed require a keyboard,
although a touch screen could be substituted for a keyboard in most
strategies. Two strategies require a joystick. Voice recognition was
considered desirable, but not necessary, for any strategies. A standard
keyboard would be required, at least for test development. Special
function keys (e.g., a numeric keypad) may be able to simulate or replace a
simplified keyboard, but a joystick may still be required on many systems
for analog input. The minimal configuration should include a durable
keyboard with special function keys. Options should include joystick,
touchscreen, and simplified keyboard.
Network Capability
Secondary-Storage Requirements
-77-
STORAGE = (max. items x (words per item x 6 + parameters))
+ fixed storage
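As a worked example of this formula (all counts below are invented for illustration; the units are taken to be bytes, with an average word length of six characters):

# Hypothetical figures, for illustration of the storage formula only.
max_items = 1000           # items in the bank
words_per_item = 50        # average words of item text
parameters = 25            # bytes of header and parameter data per item
fixed_storage = 50_000     # bytes of fixed overhead
storage = max_items * (words_per_item * 6 + parameters) + fixed_storage
print(storage)             # 375000 bytes, roughly 375K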
Performance Requirements
Summary
1. Character display
2. Black-and-white graphics
3. Durable keyboard
4. Simplified keyboard or function keys
5. Ability to run compilers and other general software
6. Ability to connect to a proctor's station
7. Disk storage approaching one megabyte
-78-
system costing less than $3,000 per user, although this implementation
would make systems for a small number of users more expensive. Large
systems would also be very expensive for test development where only one
console is required. A microprocessor-based system capable of supporting
a single user can cost less than $3,000. Microprocessor-based systems are
portable, and they do not require the air conditioning and special facilities
that larger computers require. Furthermore, the addition of testing
stations does not degrade the performance of the system, nor does it
affect the other stations if one console breaks down. A review of the
state-of-the-art in computer hardware by Croll (1980) showed that both
microprocessors and minicomputers were capable of supporting CAT
interactive testing and monitoring functions.
Thus, the option to build from components has very high overhead
and a long lead time, and results in a product that is functionally
equivalent to existing products. In addition, the product would have a
lifetime of only a few years before it became obsolete, unless funds were
allocated to constantly redesign it.
The single-board computers that are sold in a case with disc drives
usually do not contain any kind of bus. If a system has a bus that is
accessible through a non-standard connector, then only the company
manufacturing that board can provide expansions. Some companies offer
many options to expand their system, but many companies do not.
-80-
data base processing, and graphics) are the same as those available on
general-purpose microprocessor systems. There are very few items that
could be eliminated for an adaptive testing system. In addition, many
personal-computer manufacturers have established service centers in major
cities across the country, so repairs can be made quickly.
-81-
Webster's Microcomputer Buyer's Guide (Webster, 1981) has a very
good list of systems and very detailed information about their options
and costs. However, there were two problems with the information in this
guide. First, because it was published in 1981, many of the newest computer
systems were not listed. This is a serious shortcoming, because the
newer systems are more powerful and less expensive than the older
systems. Second, some of the information listed was already out of date;
this was especially true of prices.
One of the most critical requirements for the hardware was that it
cost less than $3,000. Because the price of computer systems is
decreasing, and because rules of supply and demand often cause price
fluctuations, systems within 20% of $3,000 were included for initial
consideration. Although an attempt was made to seek information only on
those systems that appeared to cost less than $3,600, many of the systems
about which information was received were clearly too expensive. (For a
complete list of the systems eliminated because of cost, see Appendix C.)
-82-
1024 x 768 pixel (dot) screen. It includes graphics software and has both
joystick and light pen options. This system could be used to administer
tests with almost photographic illustrations in the items. Unfortunately,
very few testing applications can justify the cost of $20,000 per station.
-83-
least two manufacturers. The minimum cut-off points used for storage
were 48K for RAM and 300K for disc storage.
Many of the systems with more than one configuration had an
unacceptable configuration for less than $3,600 and an acceptable
configuration for more than $3,600. In these cases the acceptable
configuration was considered and eventually eliminated because of cost.
Several systems were eliminated because they lacked serviceability or
flexibility. Most of the systems eliminated in this category can be
characterized by two properties. First, they are marketed directly by the
manufacturer and do not have dealers or local service centers. If any
piece of the system fails, it has to be shipped back to the factory for
repair. One manufacturer recommended that four weeks be allowed for all
repairs. Systems of this type might be acceptable for organizations that
do their own repair work. Second, to minimize costs, most of these
systems have all the electronics on one board, and they do not have a
bus. A bus allows optional RAM, discs, or other devices to be added to
the system. The lack of a bus decreases the flexibility of the system by
limiting the number of available options.
However, many of these systems have a better cost-per-value ratio
than most of the products on the list of available computer systems. This
is because their specific inflexible design allows a high optimization by
removing all unnecessary parts. Were it not for the great need for
flexibility and serviceability, one of these systems might have been
recommended for acceptance.
Systems That Were Acceptable
The systems in this category met all the minimum requirements for
adaptive testing applications. All of these systems were compared to
choose the ones with features that are most desirable for adaptive testing
applications.
Of these acceptable systems, some were found to be less desirable
than others. The Radio Shack Microcomputer Technology Mod III has only
16 lines of text display and lacks the sophisticated graphics capabilities of
the other systems. The Xerox 820 was designed to be a word processor
and therefore lacks the flexibility of some of the other systems considered
in this category. The Apple II computer has only 48K of memory, a
slower microprocessor, and limited disc space compared to the others.
The Televideo Systems TS802 is adequate in many respects, but lacks the
flexibility and graphics capabilities of the systems finally selected.
Three systems remained that met all or most of the criteria described
as important by the survey respondents. These systems are described in
detail below. (The order of the subsections does not indicate any ranking
or preference.)
-84-
Intertec Data Systems SuperBrain. The Intertec SuperBrain, like
many small computers made by large manufacturers, is available in a
number of configurations. The configuration of greatest interest to this
project is the SuperBrain QD, which, like most systems examined, contains
64K RAM. It also has 700K disc space on floppy disc, a distinct
advantage. The SuperBrain uses two Z80 microprocessors that will also
run programs written for the 8080 microprocessor. The SuperBrain comes
with the CP/M-80 operating system and utilities. More general-purpose
software is available for the CP/M operating system than for all other
microprocessor operating systems combined.
The SuperBrain also has several hard disc options from which the
user can choose, including the SuperFive and SuperTen, which have five
and ten megabyte Winchester hard discs, respectively. Larger hard discs
for the system are available from Corvus.
Because it has the S-100 bus adaptor, it can use any of hundreds of
external devices. Since many of the users surveyed wanted various
options, this could be a great advantage for expansion. The standard
configuration has two serial ports, and any device requiring such an
interface could be connected.
The list price for the system is $3595. There is also a configuration
with 350K of disc space for $2695. CMC International, a wholesaler for
this system, lists government/institution prices of $2950 for the first
configuration (700K disc space) and $2565 for the second (350K disc space).
-85-
IBM Personal Computer. The IBM Personal Computer is based on the
8088 microprocessor, which makes it much faster than the Intertec and
other eight-bit systems. It also permits configuration with four times as
much memory, if needed. The standard configuration of the IBM Personal
Computer has 64K RAM and 320K disc space. The screen holds 24 lines
by 80 columns of text. It supports two types of graphics: four-color
graphics (resolution of 320 x 200 pixels) and black-and-white graphics
(resolution of 640 x 200 pixels). This provides the kind of line drawing
capability in which potential users expressed interest.
Hard discs could be purchased through several companies; Corvus
Systems sells them in 6MB, 11MB, and 20MB sizes. The IBM Personal
Computer also has a special interface for game adaptations which may
provide a low-cost joystick option.
The NEC APC has most of the advantages of the IBM Personal Computer.
RAM in the initial configuration is 128K and disc space exceeds one
megabyte. It is based on the 8086 microprocessor, which is slightly
faster than the 8088 IBM used. But the two are totally software-
compatible, so that applications designed for the IBM can be adapted to
the APC.
-86-
This system offers much more RAM and disc space for a lower price
than the IBM Personal Computer. No information has been released about
its bus, however. The primary mechanism for adding options to this
system is through its two serial ports. These can be used for joystick,
digitizer, or for connecting test stations to a proctor station.
The standard system with black and white display is $3298. With two
megabytes of disc space, the price is $3998, and with color, $4998.
Conclusions
-87-
V. RECOMMENDATIONS FOR FURTHER DEVELOPMENT
Accomplishments in Phase I
Phase I of this project began with four objectives:
1. To evaluate the need for a microcomputer-based CAT system
and to specify the system requirements to meet that need.
2. To develop a preliminary system configuration to meet that
need.
3. To develop a preliminary design for the software required by
this system.
-88-
Recommendations for Phase II
When the design and acceptance testing are complete, the software
should be developed and implemented on the most promising of the
systems. Basic hardware options should be interfaced and the necessary
interface software should be developed.
-89-
REFERENCES
Bell & Howell. GENIS I User's Manual, Bell & Howell, 1979.
-90-
4
Control Data Corporation. Control Data PLATO Author Language Reference
Manual. St. Paul: Author, 1978. (a)
-91-
Hunter, D. R. Research on computer-based perceptual testing. In D.
J. Weiss (Ed.), Proceedings of the 1977 computerized adaptive
testing conference. Minneapolis: University of Minnesota,
Department of Psychology, Psychometric Methods Program, July
1978.
-92-
Myers, G. J. Reliable software through composite design. New York:
Petrocelli, 1975.
-93-
I.
Shuford, E. H., Albert, A., & Massengill, H. E. Admissible probability
measurement procedures. Psychometrika, 1966, 31, 125-145.
-94-
Weiss, D. J., & Betz, N. E. Ability measurement: conventional
or adaptive? (Research Report 73-1). Minneapolis: University
of Minnesota, Department of Psychology, Psychometric Methods Program,
1973.
Wesman, A. G. Writing the test item. In R. L. Thorndike (Ed.), Educational
Measurement. Washington, DC: American Council on Education,
1971.
Whitely, S. E. Measuring aptitude processes with multicomponent
latent trait models. Journal of Educational Measurement, 1981,
18, 67-84.
-95-
APPENDIX A. DRAFT USER'S MANUAL
This appendix contains a draft of the User's Manual for the proposed
system. This draft was sent to four experts in the computerized testing
field for review. They were asked to comment on the system described,
suggesting additions or deletions, and to comment on the clarity of
the User's Manual, suggesting possible changes. Three of the experts
provided critiques in time for inclusion in this report. The following
manual is essentially the same version as was sent to the reviewers.
Their comments will be incorporated into the final version.
1. One reviewer felt that menu systems are much easier to use
than systems requiring commands and languages. Specifically,
he suggested that all operating-system commands be entered
from a menu.
8. Two reviewers felt that the ARCHIVE facility provided too many
opportunities for the user to destroy or misinterpret data.
Several suggestions for making it foolproof were given.
-96-
9. One reviewer suggested that the item calibration routine should
have item-parameter linking capabilities.
-97-
COMPUTERIZED ADAPTIVE TESTING SYSTEM
USER'S MANUAL
-- DRAFT -
Do Not Cite or Distribute
-99-
Preface
It is assumed that the reader is familiar with the basics of testing and
statistics. A brief introduction to CAT is provided, but this is not
intended to thoroughly familiarize the reader with the technology. No
previous computer experience is necessary or assumed; such experience is
undoubtedly helpful, however.
-100-
TABLE OF CONTENTS
CHAPTER 7 -- TEST PRE-EVALUATION
7.1. Introduction
7.2. Running EVALUATE
7.3. Interpreting the Output
7.4. Summary
CHAPTER 8 -- DESIGNING NEW TESTING STRATEGIES
8.1. Introduction
8.2. Module Delimiters
8.3. Basic Statements
8.4. Variables
8.5. Declarative Statements
8.6. Output Control
8.7. Conditional Statements
8.8. Adaptive Statements
8.9. Menu Statements
8.10. Summary
-102-
CHAPTER 1. SYSTEM OVERVIEW
1.1. Introduction
One of the major advances in the field of psychological testing during the
past thirty years is computerized adaptive testing (CAT). CAT offers a
means of providing more efficient, less threatening human assessment than
was possible with conventional tests. Using a CAT strategy, test items are
tailored to the ability of each individual taking the test: each examinee
receives the subset of test items most appropriate for his/her personal
level of ability. (The term "ability" is used throughout this manual to
refer to the examinee's trait level. The methods are general, however, and
can be used for non-ability traits as well.) Measurement is thus more
efficient, because time is not wasted with items of inappropriate
difficulty, and less threatening, because items that are too difficult
for the examinee are not administered.
Although CAT technology has been available for at least a decade, only
recently has the cost of computerized delivery systems become low enough
to make the implementation of CAT cost effective when compared to
conventional paper-and-pencil test administration. The computer software
to implement the technology still remains formidable, however, and cost-
effective implementation can rarely be achieved when the software must be
custom developed for a specific application.
-103-
The Item Banking subsystem, described in Chapter 4, provides procedures
through which test items, including the item text and relevant item
characteristics, can be entered into a random-access database and can
be refined and updated after they are entered.
The Item and Test Analysis subsystem, described in Chapter 6, uses the
data collected during test administration to calibrate test items. It
also provides estimates of test reliability and other indices of the
psychometric performance of the test. The Test Evaluation subsystem,
described in Chapter 7, uses item characteristics computed from previous
administrations to allow the test developer to evaluate various testing
methods and item sets for a specific application. Good estimates of test
performance can thus be obtained without actually administering the test,
and deficiencies can be corrected before valuable time is wasted on an
unacceptable test.
1.4 Summary
The system hardware and software combine to give the user of the
computerized adaptive testing system a powerful tool for test administration
and scoring. Even more versatile are the features which allow the researcher
to design and implement new testing strategies without having to program
them in a standard programming language. Details of these procedures are
discussed in succeeding chapters.
-104-
CHAPTER 2. AN INTRODUCTION TO COMPUTERIZED ADAPTIVE TESTING
2.1. Introduction
The solution to the problem of selecting items without knowing the ability
level has been approached intuitively from several directions. The simplest
solution was to create a hierarchy of tests, to use scores from one test
to estimate the examinee's ability level, and then to appropriately select
subsequent tests on the basis of the score on the first one. One of the
earliest of the hierarchical strategies was the two-stage strategy in which
all examinees first responded to a common routing test. The score on this
test was then used to assign each examinee to a second-stage measurement
test. Responses to both tests were then considered in arriving at a score.
A problem with the two-stage strategy was that errors in measurement at the
first stage resulted in a misrouting to an inappropriate measurement test.
-105-
test. In a pyramidal test, everyone starts at a common item and branches to
an easier item after each incorrect response and to a more difficult item
after each correct response. A diagram of a pyramidal test is shown in
Figure 2-1.
[Figure 2-1: triangular branching diagram beginning at ITEM001, with an
axis labeled "Increasing Difficulty".]
Several such mechanical branching mechanisms were evaluated over the years
but no clear winners, in the psychometric sense, emerged. Solutions to the
second problem, that of test scoring, did produce some superior strategies,
however. While a few of the strategies above could be scored in simple,
meaningful ways, many could not. The general solution to the item selection
and test scoring problems lay in the basic theory that provided much of the
original motivation for adaptive test administration, item response theory
(IRT).
-106-
Figure 2-2. Diagram of a Stradaptive Test
b = -2   b = -1   b = 0   b = 1   b = 2
-107-
b = 0.0, and c = 0.2. The slope of the curve at any point is related
to a; high values of a make it steeper in the middle. The location along
the theta axis is a function of the b parameter; higher positive values
shift the curve to the right. The lower left asymptote corresponds to a
probability, c, of 0.2; high values of c raise the lower asymptote. The
ICC shown with a dashed line is for an item with a = 2.0, b = 1.0, and
c = 0.2. The midpoint of the curve has shifted to a theta of 1.0. The
curve is steeper at points near this value. The lower asymptote remains
at 0.2.
[Figure: item characteristic curves plotted against theta from -3 to 3.]
The guessing parameter, c, causes the most difficulty and can be eliminated
if the items cannot be answered correctly by guessing. Recall-type items
that do not provide an opportunity for guessing can be used with the
reduced model. The model with c assumed to be zero is called the two-
parameter logistic model.
-108-
Several IRT models expand upon the utility of the dichotomous-response
models for other response formats. Polychotomous models, for instance,
are appropriate where multiple response categories are scored on each item.
One example of this situation is a multiple choice item in which the
incorrect alternatives are weighted as a function of how "incorrect" they
are. Another example is an interest-test item with ordered like,
indifferent, and dislike responses. Still another is a performance rating
scale on which performance is rated in ordered categories.
Another family of IRT models assumes a slightly different shape of the ICC.
All models discussed to this point have assumed that the response
probabilities follow a logistic ogive (i.e., a specific shape of the
response characteristics curve). Early in the development of IRT, several
models were based on a normal ogive rather than a logistic ogive. The
normal ogive model arose from the widespread use of the normal curve for
statistical models in psychology. The shape of the ICCs is nearly the
same for both models and it is difficult to say which better fits reality.
The logistic ogive is more attractive, mathematically, because of its
simplicity and has replaced the normal model in most practical
implementations.
2.3.2. Scoring and general item selection methods. The ICCs shown in
Figure 2-3 represent the probabilities of a correct response to each of the
items. Complementary curves to each of these exist, representing the
probability of an incorrect response. IRT allows the curves corresponding
to the correct and incorrect responses in an examinee's response pattern
to be multiplied together to yield a likelihood function. The likelihood
function indicates the probability of observing the entire vector of
responses at each level of ability. From this likelihood function, an
estimate of the examinee's ability can be obtained. Conceptually, this
can be done by assuming that the best estimate of an examinee's ability
is the level of ability that would most likely produce the vector of
responses observed. This score is called the maximum-likelihood estimate
of ability.
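The computation sketched in this paragraph can be illustrated as follows (Python, using the three-parameter logistic model; the item parameters and the simple grid search are illustrative assumptions, not the scoring routine of the proposed system).

import math

# Sketch: form the likelihood of a response vector and take the theta that
# maximizes it as the maximum-likelihood estimate of ability.
def p_correct(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

def likelihood(theta, items, responses):
    # items: list of (a, b, c); responses: 1 for correct, 0 for incorrect.
    value = 1.0
    for (a, b, c), u in zip(items, responses):
        p = p_correct(theta, a, b, c)
        value *= p if u == 1 else (1.0 - p)
    return value

def ml_estimate(items, responses, step=0.01):
    grid = [-3.0 + step * i for i in range(int(6.0 / step) + 1)]
    return max(grid, key=lambda t: likelihood(t, items, responses))

# Example: two items answered correctly and one incorrectly.
items = [(1.2, -0.5, 0.2), (1.0, 0.0, 0.2), (1.5, 0.8, 0.2)]
print(ml_estimate(items, [1, 1, 0]))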
Figure 2-4 shows three response characteristic curves, in dotted lines, for
three items, two answered correctly and one answered incorrectly. The solid
curve shows the product of these curves multiplied together (i.e., the likelihood
function). The point at which the likelihood function reaches its peak is at a
theta value of -----. The value of ----- is thus taken as the maximum-
likelihood estimate of the examinee's ability.
-109-
Figure 2-4. A Likelihood Function With Two Items Correct
[Plot of P(X|theta) against theta: three dotted item response curves and a
solid curve for their product, the likelihood function.]
Maximum-likelihood and modal-Bayesian estimates of ability level can be used with
any item-selection strategy when the items are calibrated according to an
IRT model. These scoring methods have suggested some new and improved
item-selection strategies. Intuitively, it can be seen that the more
peaked the likelihood function, the more accurate the estimate of ability.
It thus makes good sense to explicitly select test items that will sharpen
-110-
Figure 2-5. A Likelihood Function With All Items Correct
[Plots of P(X|theta) against theta.]
the peak of the function. Although the ideal Bayesian and maximum-
likelihood selection strategies differ slightly, very good item selection
can be accomplished by selecting for both, using the maximum-likelihood
ideal of selecting for maximum item information.
2.4. Summary
Running a sample test is a good way to become familiar with the system.
Three sample tests have been included in the system for you to administer
and to study. This chapter contains the instructions you will need to
administer these sample tests. After running the introductory tests, you
will be ready to proceed to Chapters 4 and 5 which describe how to
enter test items and how to revise the sample tests to include your own
items. Chapter 5 will also describe how to create tests using other
pre-defined testing strategies provided with the system.
Insert the disc marked "CATL Testing System - Main Disc" and press the B
key on the keyboard. The disc will click and the following message will
appear on the screen:
The system is now running and ready for you to tell it what to do.
As you recall from Chapter 1, there are basically three types of tests:
sample tests (in which everything is specified and ready to go), tests with
pre-defined strategies (you provide only the items to be administered), and
tests which you specify entirely (you write the strategy in CATL and provide
the items). Three sample tests have been supplied as examples. These tests
are titled SAMCONV (conventional-sample), SAMPYR (pyramidal-sample), and
SAMSTRAD (stradaptive-sample). To run a sample test, remove the Main Disc
and load the disc marked "Sample Tests (Chapter 3).'
The general command to see the names of the tests on the disk is SHOW. If
you enter the command by itself, it will provide a list of everything on the
disc. To get a list of only the tests on the disc, you should enter the
command:
SHOW *.TST
To get a list of only the sample tests on the disc, you should enter the
command:
SHOW SAM*.TST
If you enter the latter command, the list in Figure 3-1 should be displayed:
ADMIN
This command will cause the test administration program to run. The first
thing it will ask for is the name of the test to run. The example in
Figure 3-2 shows the administration of the sample pyramidal test. The
parts that you have to type have been underlined. The symbol <CR> is used
to indicate a carriage return.
ADMIN <CR>
3.3 Summary
• -114-
CHAPTER 4. ENTERING AND EDITING ITEMS
4.1. Introduction
This chapter begins by describing the item formats available in the system.
Then, the procedure for creating a new item is described. Finally, these
procedures are applied to the editing of an existing item. When you have
completed this chapter, you should be able to enter new test items and
modify existing ones.
-115-
Figure 4-1. Dichotomous Item Screen Display
[Blank item-entry form: item characteristic fields followed by text lines
numbered 1: through 20:.]
-116-
The text of the item appears on the 20 lines following the characteristics
in exactly the format it will be presented on the screen. The first column
on the screen is the column immediately following the colon. An example of
what a completed item might look like is shown in Figure 4-2.
This item, labeled DEMO01, has five alternatives. The item-text display
will be presented for a maximum of 10 minutes or until the examinee
responds. The examinee will be given no time to respond after the display
disappears. The screen will be cleared before the text is presented. The
correct response is "A." The a, b, and c parameters are 1.322, -0.301, and
0.214.
A graded item has a similar format but the parameter line is different, as
shown in Figure 4-3.
A nominal item has two lines of parameters, one a parameter and one b
parameter, for each response category. The item characteristic lines for a
nominal item look like the sample in Figure 4-4.
B1: -1.222  B2: -0.454  B3: 1.123  B4: -0.802  B5: _____  B6: _____
In this example the item has four alternatives, each with corresponding a
and b parameters.
4.2.2 Graphic Items. Graphic items may have any of the three formats
available to the textual items. A graphic item is displayed in the same
general format except that the textual area of the display is replaced by
a graphic display. As will be discussed in the sections that follow, only
the item characteristics of a graphic item may be edited with EDIT, and
special procedures are used to enter the graphic portions.
4.3. Creating a New Item
EDIT <CR>
If you enter an item number at this point, the item will be fetched and
displayed. To enter a new item, a format should be specified. EDIT
recognizes items in three formats: dichotomous, graded, and nominal. The
three corresponding titles are DICHOT, GRADED, and NOMIN. If any one of
these is entered, a blank item format will appear on the screen. This
format can then be filled in using the operations described below.
The cursor (the white square on the screen) can be moved to any location
where changes or additions are to be made. You can move the cursor around
the screen with the four arrow keys at the upper left of the keyboard.
In Change mode, any character typed will replace the character under the
cursor. In this mode, you can make character-by-character changes in the
text or item characteristics by simply typing over whatever characters
presently exist. Formatting characters (e.g., the characteristic labels) are
protected and typing over them will not change anything.
In Insert mode, characters are inserted into a line directly to the left of
the cursor. Characters to the left of the cursor can be deleted by
backspacing over them. Characters to the right of the cursor can be
deleted by moving the cursor to the right of them and then backspacing over
them. A new line can be inserted into the text by pressing the return key
while in Insert mode. The new line will appear immediately following the
cursor and the cursor will move to the beginning of that line.
Control-A clears the screen and asks for another item without making any
of the changes permanent.
Control-Q causes the editor to make the changes permanent, clear the screen,
and ask for another item. You can terminate the editor at this point by
entering QUIT instead of an item format or an item number.
An existing item can be edited in a manner very similar to the way a new
item is entered. Instead of specifying a format when EDIT asks for an item
number or format, you simply enter the number of the existing item. This
item appears on the screen with the appropriate format and you can edit it
using the basic edit operations described above.
You will not be able to edit the graphic portion of a graphic item. You
can, however, redigitize the graphic portion of the item without altering
the characteristic portion. To do this, simply enter control-G and
proceed with the digitization as before.
4.5 Summary
Items form the core of psychological testing. To use any of the tests
subsequently described you must have a bank from which to draw items. The
system facilitates item entry, editing, and retrieval, making it possible
to maintain and change a large bank of items.
-119-
5.1 Introduction
Several testing strategies have been provided with the system. These
strategies are provided as test templates. A test template is a general
test specification in an author language (discussed in Chapter 8) that
has several blanks in it allowing you to fill in the items to administer
as well as certain test characteristics. The templates provide an
administration framework, allowing you to construct tests by simply
specifying the items to be used. You can use items in the database for
any of these pre-defined strategies.
To create a test from a template, you need to do two things. First you
need to select and fetch a template. Then you need to fill in the data
required by the template.
A list of the templates provided with the system along with descriptions of
them can be found in Appendix (4). Each description details the testing
strategy.
In order to use any of the templates, first enter the command "SPECIFY."
The system will respond
If you answer Y, the system will display a list of the templates available.
(Again, to see a complete list of the template descriptions, refer to
Appendix (4).) At the end of the list, the system will say
-120-
If you answer N, the system will respond
without giving you a list of the available templates. HELP will result in
an explanation of the question and directions to refer to the manual for
more details.
After you enter the name of the template you want, the system will ask
If you have specified the wrong template, you can answer N at this time
and enter the correct template name. Otherwise, a Y response will begin
the processing of the template.
The template directs construction of the test. The system will ask you for
a name for the test. A mnemonic (perhaps including a date or version number
of the test you are about to create) is usually most appropriate. Next, the
system will typically ask for item numbers, specifying their sequence
automatically.
Each template is different; some are more complicated than others. For
illustration, the sequence for specifying a fixed length conventional
test is shown in Figure 5-1. Your responses are underlined and
capitalized.
When you have completed specification of the items, your test will exist
in two or more parts. The test specification will be on the file created
by SPECIFY and the items themselves will be in one or more banks. If your
system has small discs, the item banks may be on several different discs.
To make tests execute efficiently, it is helpful to consolidate all of the
test information on one file and to do as much preprocessing as possible.
This is done by the program CONSOLIDATE.
-121-
Figure 5-1. Specification of a Fixed-Length Conventional Test
SPECIFY <CR>
:Y <CR>
:FXDLENCN <CR>
Conventional fixed length test -- Is this the template you wish to specify?
:Y <CR>
:FXDVER1.SPC <CR>
#ITM102 <CR>
#ITM209 <CR>
#ITM210 <CR>
#ITM211 <CR>
#ITM329 <CR>
# <CR>
:FXDVER1.SPC <CR>
-122-
Figure 5-2. Specification of a Variable-Length Conventional Test
:N <CR>
:VARLENCN <CR>
:Y <CR>
:VARLVER1.SPC <CR>
Enter test item numbers, one at a time. When you have entered them all,
enter a carriage return alone.
#ITM602 <CR>
#ITM671 <CR>
#ITM772 <CR>
#ITM778 <CR>
#ITM792 <CR>
#ITM809 <CR>
#ITM851 <CR>
# <CR>
:VARLVER1.SPC <CR>
-123-
To run the consolidator, enter the command:
CONSOLIDATE
You should enter the name of the test you created using SPECIFY. If
you name a test that does not exist, the consolidator will respond with:
The test you named does not exist.
Enter the name of the test specification file:________
At this point, you should either enter the proper file name or STOP to
stop. When the consolidator finds the specification file, it will respond:
Enter a name for the executable test file:
You should enter the name that you would like the test to have. This is
the name that you will use when you administer the test. If the file
already exists, the consolidator will give you a warning:
If you respond with an N, you will be asked to rename the file. If you
respond with a Y, the old file will be erased. When an acceptable file
name has been entered, the consolidator will respond:
If you insert the proper disc and type GO, the consolidation will continue.
If the proper bank does not exist, the consolidation cannot continue and
you should enter STOP to end it.
When the consolidation is complete, the consolidator will issue the message:
At this point, the executable test file has been created and the test is
ready to administer.
-124-
5.4 Summary
Templates provided with the system can be completed using the system
program SPECIFY. This produces a complete test specification. Completed
test specifications need to be consolidated before they are administered.
This is done using the program CONSOLIDATE.
-125-
CHAPTER 6 - CALIBRATING TEST ITEMS
6.1. Introduction
Many useful adaptive testing strategies are based on item response theory
(IRT), discussed in Chapter 2. To use IRT, the test items must be
calibrated. That is, the parameters that describe the statistical behavior
of the items as a function of ability must be estimated.
There are four steps required to calibrate items: data collection, item
calibration, interpretation, and item bank updating.
ARCHIVE condenses all data files with the file qualifier DTA. For example,
the data files SB11376.DTA, SB48853.DTA, and SB97301.DTA will all be
archived onto a specified archive file.
The individual examinee data files read by ARCHIVE must contain single
conventional tests and all of the files to be archived must contain the same
test. The template CALIBCON is provided to assist you in constructing such
tests. Figure 6-1 shows an example of a test created using the template.
The test administration program will create a file with a file name composed
of the examinee's identification and the appropriate file qualifier.
-126-
Figure 6-1 -- Specification of a Calibration Test
:SPECIFY <CR>
:N <CR>
:CALIBCON <CR>
:Y <CR>
#TST035 <CR>
#TST098 <CR>
#TST132 <CR>
#TST004 <CR>
0
0
0
# <CR>
To archive the data in all of the DTA files into a single file, the archive
program can be run by entering the command:
ARCHIVE
When the archive file name is entered, ARCHIVE will collect data from all
of the DTA files and write them to the end of the archive file. If the
named archive file has not been created, ARCHIVE will create it. If it
has been created and contains data from previous archive runs, the new
data will be added to the end of it.
-127-
After running ARCHIVE, the original examinee files should be erased to
conserve space. The system command for erasing all of the DTA files is:
ERA *.DTA
If the files are not erased, they should be renamed with a file qualifier
other than DTA before ARCHIVE is run again. This is necessary because
ARCHIVE collects data from all DTA files. If ARCHIVE is run twice with
files having the DTA qualifier still on the disc, the data will be
archived twice and the archive file will contain duplicate data.
To estimate the item parameters, run the calibration program by entering
the command:
CALIBRAT
You should respond with the name of the file onto which the individual
examinee files were archived. If the named file is nonexistent or empty,
CALIBRAT will respond with an appropriate message and will again ask for
the input file name. If you wish, you can abort the program by entering
STOP. If the named file contains valid data, CALIBRAT will respond by
asking for the name of the output file:
You should respond by entering the name of a file where you would like the
item parameters to be written. If the file named already exists, CALIBRAT
will ask if it is all right to erase it:
If you enter Y, the file will be erased and CALIBRAT will continue. If
you enter N, CALIBRAT will again ask you to name the output file. The
final file needed by CALIBRAT is the file for the summary output:
-128-
You should respond by entering the name of a file where you would like the
item analysis summary to be written. If the file named already exists,
CALIBRAT will ask if it is all right to erase it:
If you enter Y, the file 'will be erased and CALIBRAT will continue. If
you enter N, CALIBRAT will again ask you to name the file.
You can put bounds on each of the three item parameters. This feature can
be used to keep item parameters within reasonable limits when the data
suggest otherwise. It can also be used to fix any parameter for a
restricted model. A parameter can be fixed by simply setting both the
upper and lower bounds to the same value. CALIBRAT always asks for bounds
for the parameters. If you do not wish to bound the parameters, simply
press the return key after each request without entering anything. To get
the bounds, CALIBRAT will ask:
Enter the upper and lower bounds for the "a" parameter
separated by a comma:
If two values are entered and the second is greater than or equal to the
first, those values will become the parameter bounds. If only a return
is entered, the default bounds of 0.0 and 2.5 will be used. If anything
else is entered, CALIBRAT will refuse to accept it and will repeat the
request. When the a parameter bounds are set, the b parameter bounds
will be requested:
Enter the upper and lower bounds for the "b" parameter
separated by a comma:
If two values are entered and the second is greater than or equal to the
first, those values will become the parameter bounds. If only a return
is entered, the default bounds of -2.5 and 2.5 will be used. If anything
else is entered, CALIBRAT will refuse to accept it and will repeat the
request. When the b parameter bounds are set, the c parameter bounds
will be requested:
Enter the upper and lower bounds for the "c" parameter
separated by a comma:
If two values are entered and the second is greater than or equal to the
first, those values will become the parameter bounds. If only a return is
entered, the default bounds of 0.0 and 0.5 will be used. If anything else
is entered, CALIBRAT will refuse to accept it and will repeat the request.
To set a parameter to a constant value, the lower and upper bounds must
both be set equal to that value.
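For reference, the three parameters enter the item response function as
follows under the three-parameter logistic model commonly assumed in IRT.
The sketch below is an illustration only, not the estimation code used by
CALIBRAT, and the function name is hypothetical.

import math

def prob_correct(theta, a, b, c):
    # Three-parameter logistic (3PL) probability of a correct response.
    # a: discrimination, b: difficulty, c: lower asymptote ("guessing").
    # The usual scaling constant D = 1.7 is assumed here.
    return c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))

For example, an item with a = 1.2, b = 0.0, and c = 0.2 has a probability
of .60 of being answered correctly by an examinee whose ability equals the
item difficulty (theta = 0.0).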
CALIBRAT will then begin to calibrate the items. During the process of
calibration, CALIBRAT will provide data to the terminal to indicate the
-129-
status of the calibration run. CALIBRAT makes several passes through the
data, refining the parameter estimates on each pass. At the end of each
pass, CALIBRAT will display the number of the current pass, the time
required to complete that pass, and after the third pass, the estimated
number of passes and estimated time to completion. The terminal display
of a calibration run might look like the one in Figure 6-2.
:CALIBRAT <CR>
Name the file on which to write the item parameters: ITEMPARS.CAL <CR>
Name the file on which to write the item-analysis summary: TEMP.OUT <CR>
Enter the upper and lower bounds for the "a" parameter
separated by a comma: 0.6, 2.5 <CR>
Enter the upper and lower bounds for the "b" parameter
separated by a comma: <CR>
Enter the upper and lower bounds for the "c" parameter
separated by a comma: <CR>
-130-
If the estimates fail to converge, CALIBRAT terminates, reports the current
estimates, and provides a message that the process did not converge.
Upon completion of the run, the parameter estimates are written to the
appropriate output file. The file begins with a title line including
the input file name and the date of the run. All data for each item are
written on single lines in the file. Each line begins with the item
reference number and continues with the three parameters a, b, and c,
followed by their respective standard errors. An example of the first
few lines of such a file is shown in Figure 6-3.
0
0
0
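Should you need to process the parameter file with your own programs, a
rough sketch of reading it follows, assuming the layout described above (a
title line, then one item per line with its reference number, a, b, c, and
the three standard errors). The exact field positions are an assumption,
and the function name is hypothetical.

def read_parameter_file(path):
    # Return a dictionary mapping item reference numbers to (a, b, c),
    # skipping the title line and ignoring the standard-error fields.
    params = {}
    with open(path) as f:
        next(f)                      # title line: input file name and date
        for line in f:
            fields = line.split()
            if len(fields) >= 7:
                params[fields[0]] = tuple(float(x) for x in fields[1:4])
    return params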
An extended item analysis is written to the file named for the summary
file. Both classical and IRT item and test analyses are provided on this
file. An abbreviated listing of a sample of this file is shown in Figure
6-4.
-13 1-
Figure 6-4. Item and Test Analysis
Mean a : 1.237
Mean b : .242
Mean c : .189
Total information XX.XXX
Expected information XX.XXX
(Plot of the test information function versus THETA, from -3.0 to 3.0.)
Figure 6-4 (Continued)
Item analysis
0
0
0
As Figure 6-4 shows, the summary output is divided into four sections.
The first section provides documentation for the run indicating the data
file used, the date of the run, the number of examinees included, and the
number of items calibrated.
The second section provides classical test analysis statistics. The KR-20
internal consistency reliability is computed along with the mean number-
correct score, the standard deviation, and the standard error of
measurement.
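For reference, these classical statistics can be computed from a matrix of
dichotomous (0/1) responses roughly as shown in the sketch below; this is
an illustration, not the code used by CALIBRAT, and the function name is
hypothetical.

def classical_statistics(responses):
    # responses: one list of 0/1 item scores per examinee.
    n = len(responses)                     # number of examinees
    k = len(responses[0])                  # number of items
    totals = [sum(r) for r in responses]   # number-correct scores
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n
    # Sum of item variances p(1 - p), needed for KR-20.
    pq = 0.0
    for i in range(k):
        p = sum(r[i] for r in responses) / n
        pq += p * (1.0 - p)
    kr20 = (k / (k - 1.0)) * (1.0 - pq / variance)
    sem = variance ** 0.5 * (1.0 - kr20) ** 0.5   # standard error of measurement
    return mean, variance ** 0.5, kr20, sem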
The third section presents IRT test analyses. First the means of the three
IRT item parameters are presented. Then, the total-test information is
provided. This corresponds to the total area under the test information
curve. The expected information is the expected value of the test
information curve weighted by a standard normal density function. The
graphic plot is the total available test information plotted as a function
of theta. It is evaluated at 61 equally spaced theta values between -3.0
and 3.0.
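A minimal sketch of how these two quantities could be computed, assuming
the usual three-parameter logistic information function and the 61-point
grid described above, follows; the function names are hypothetical, and the
system's internal method may differ.

import math

def item_information(theta, a, b, c):
    # 3PL item information at ability theta (scaling constant D = 1.7).
    p = c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))
    return (1.7 * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def test_information(items, mean=0.0, sd=1.0):
    # items: list of (a, b, c) triples.  Returns the total area under the
    # test information curve and the information weighted by a normal
    # ability density, both approximated on 61 points from -3.0 to 3.0.
    thetas = [-3.0 + 0.1 * i for i in range(61)]
    info = [sum(item_information(t, a, b, c) for (a, b, c) in items)
            for t in thetas]
    dens = [math.exp(-0.5 * ((t - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))
            for t in thetas]
    total = sum(0.1 * (info[i] + info[i + 1]) / 2.0 for i in range(60))
    expected = sum(0.1 * (info[i] * dens[i] + info[i + 1] * dens[i + 1]) / 2.0
                   for i in range(60))
    return total, expected

With the calibration group the weighting density is the standard normal
(mean 0.0, standard deviation 1.0); Chapter 7's EVALUATE applies the same
idea with the mean and standard deviation you specify for the target group.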
The final section presents the individual item characteristics, one line
per item. The first column contains the item reference number. Columns
two through four contain the three item parameter estimates. The next
three columns contain the standard errors of the parameter estimates.
After the items have been calibrated, the parameters should be entered into
the item bank for use in future test construction. There are two ways by
-133-
which this can be done. The first is to manually take the data from a
printout of the output from CALIBRAT and then enter it into the bank using
the item editor. A much more efficient way is to use the utility, UPDATE.
UPDATE reads the CALIBRAT output file and transfers all of the parameters
to the appropriate items in the bank. To run UPDATE, you should enter the
command:
UPDATE
You should enter the name of the parameter file. UPDATE will then store
all of the parameters in the item bank along with the items. A log of
the items updated will be displayed on the terminal. If for any reason
the parameters cannot be added for an item, an error message will be
displayed and the updating will continue with the next item. An example
of an UPDATE run is shown in Figure 6-5.
:UPDATE
TST035 Updated
TST098 Updated
TST132 Item not in bank -- Not updated
TST004 Updated
0
0
0
6.6 Summary
IRT item parameters are required for most adaptive testing methods. For
items scored dichotomously (e.g., right/wrong), these parameters can be
estimated using the facility CALIBRAT. Calibration requires four steps:
(1) creation of a conventional test containing the experimental items,
(2) estimation of the parameters, (3) evaluation of the results, and
(4) insertion of the parameters into the item bank. When this has been
accomplished, the items are ready for inclusion in an adaptive test.
-134-
CHAPTER 7. TEST PRE-EVALUATION
7.1. Introduction
EVALUATE can be used with both conventional and adaptive tests. The
information provided shows the test's characteristics in the limiting case
where all items are to be administered.
EVALUATE is run by entering the command:
EVALUATE
If you enter the name of a valid executable test file, EVALUATE will
continue. If the file does not exist or is not a valid executable test
file, EVALUATE will respond with an appropriate error message and will
again ask you to name the test file. You can terminate EVALUATE at this
point by entering STOP.
-135-
EVALUATE will then ask for an output file:
If the file already exists, EVALUATE will ask you whether to erase it:
If you respond with a "Y," EVALUATE will erase the file and proceed. If
you type "N," EVALUATE will ask you for another file name.
When you have entered the file name, EVALUATE will ask you for the mean
and standard deviation of ability of the group to which the test will be
administered.
The group on which the items are calibrated is assumed to have a mean of
0.0 and a standard deviation of 1.0, so the new group should be described
relative to the calibration group. EVALUATE assumes that the distribution
of ability is normal with these parameters.
A long test with many subtests can produce a large volume of output.
EVALUATE therefore asks whether you want all of the output. The high-
volume components of the output are the test and subtest information
functions. EVALUATE will ask if you want them.
After you answer these two questions, EVALUATE will analyze the test
file. The output is written to the file you specified.
A sample output file for a simple test consisting of two conventional subtests
is shown in Figure 7-1. Only the test information function was requested.
The output contains reports on the two subtests individually and combined
as a single test. In this example, the first subtest (OLDTEST) contained
15 items, had an estimated reliability of .677 in a population with ability
distributed normally with the specified mean of 0.5 and standard deviation
of 1.0. The a, b, and c parameter means are as shown. The total test
information is the total area under the test information curve. The
expected information is the average of the test information function
-136-
Figure 7-1. Sample EVALUATE Report
POPULATION CHARACTERISTICS
Mean: 0.500
Standard deviation : 1.000
SUBTEST OLDTEST
Number of items: 15
Estimated reliability : .677
Average a parameter: 1.222
Average b parameter: .131
Average c parameter: .178
Total test information : XX.XXX
Expected test information : XX.XXX
SUBTEST NEWTEST
Number of items: 20
Estimated reliability : .774
Average a parameter: 1.311
Average b parameter: .126
Average c parameter: .185
Total test information : XX.XXX
Expected test information : XX.XXX
TOTAL TEST
Number of items: 35
Estimated reliability : .XXX
Average a parameter: 1.273
Average b parameter: .128
Average c parameter: .182
Total test information : XX.XXX
Expected test information : XX.XXX
-137-
Figure 7-1 (Continued)
(Plot of the test information function versus THETA, from -3.0 to 3.0.)
The reports on the second subtest and on the total test can be interpreted
similarly.
7.4 Summary
The system allows you to predict the test characteristics through IRT using
the EVALUATE program. EVALUATE prompts you to enter necessary information
and provides statistical pre-evaluation of the test specified.
-138-
CHAPTER 8. DESIGNING NEW TESTING STRATEGIES
8.1. Introduction
The following statements in CATL are described in this chapter. They are
listed in the order in which they are presented.
-139-
Tests can be nested, one within the other, to a depth of ten. To ensure
that each TEST has a corresponding ENDTEST, it is a good idea to include
the optional ENDTEST name to allow the test consolidator to detect
unmatched TEST-ENDTEST pairs. Figure 8-1 shows a sample TEST
specification. There is a major TEST, SAMPLE1, which includes all of the
subtests. On the next level, there are three subtests, SUB1, SUB2, and
SUB3. On the third level, under SUB3, there is test SUB3.1.
Figure 8-1. Test With Indentation and ENDTEST Names
TEST SAMPLE1
   0
   0
   0
   TEST SUB1
      0
      0
      0
   ENDTEST SUB1
   TEST SUB2
      0
      0
      0
   ENDTEST SUB2
   TEST SUB3
      TEST SUB3.1
         0
         0
         0
      ENDTEST SUB3.1
      0
      0
      0
   ENDTEST SUB3
   0
   0
   0
ENDTEST SAMPLE1
-140-
Figure 8-2. Test Without Indentation or ENDTEST Names
TEST SAMPLE2
0
0
0
TEST SUB1
0
0
0
ENDTEST
TEST SUB2
0
0
0
ENDTEST
TEST SUB3
0
TEST SUB3.1
0
0
0
ENDTEST
0
ENDTEST
ENDTEST
The tests in Figure 8-2 are the same as those in Figure 8-1 but are much
more difficult to read. It is unclear in Figure 8-2, for instance, which
ENDTEST corresponds to each TEST statement.
TEST BASIC
#ITM001
#ITM002
#ITM003
ENDTEST BASIC
When executed, TEST BASIC will administer items ITM001, ITM002, and
ITM003 from the item bank.
Comments may seem unimportant, since a test stripped of all comments will
execute exactly as before. They are important, however, because they make
programs easier to read and help in test verification and maintenance.
TEST COMMENTS
! This is a sample test, created to display
! comments in a test, 82/04/16, T. Smythe.
#ITM004
#ITM005
#ITM006   ! Item 006 added in 82/09/12, R. Carothers
ENDTEST COMMENTS
TEST INCLUDE
! This test will execute a pool of items as well
! as the other individual items specified.
#ITM007
#ITM008
*POOL402
ENDTEST INCLUDE
The test above will consist of items ITM007 and ITM008 as well as
all of the items listed on the file named POOL402.
-142-
8.4. Variables
CATL provides both local and global variables and a SET statement for
setting them. Local variables are used only within the TEST block in
which they are declared. The "scope" of a local variable is that test,
and any attempt to reference it outside of that test will fail. A local
variable is NOT passed to subtests.
Global variables can be declared anywhere in the test and can be referenced
anywhere. Global variable names begin with an at sign (@).
All CATL variables are initialized to zero (default), but you can assign
any value to any variable using SET. More than one variable can be SET at
a time if you separate the declarations with a comma. In Figure 8-6, the
variables MEAN and VAR are used.
TEST VARIABLE
SET @MEAN = 1.0
! Set global variable @MEAN to 1.0
#ITM009
#ITM010
TEST SUBTEST
SET LMEAN = @MEAN, VAR = 3.0
! Local LMEAN set to global @MEAN whose value is 1.0.
! Notice that two variables are set here.
#ITM011
*POOL002
ENDTEST SUBTEST
ENDTEST VARIABLE
Scoring mechanisms are provided with the system. They calculate scores
such as the time it takes the examinee to respond, Bayesian ability
estimates, and percentage correct. Appendix (SCORE) lists all available
scoring mechanisms and the required variables for each. To use one of
these scoring mechanisms, you must declare SETSCORE, the name of the score,
and the names of the variables (in the correct order) that you will use.
-143-
The format for SETSCORE is:
TEST BAYES
SETSCORE BAYESM(MEAN,VAR)
! Bayesian scoring mechanism
! requires mean and variance
! variables
SET MEAN = @PRIORMN, VAR = @PRIORVAR
! Note that @PRIORMN and @PRIORVAR are global variables
! set in another test outside TEST BAYES.
#ITM012
#ITM013
#ITM014
#ITM015
#ITM016
#ITM017
ENDTEST BAYES
-144-
+ arithmetic add
- arithmetic subtract
* arithmetic multiply
/ arithmetic divide
( opening parenthesis
) closing parenthesis
TEST TERMEXM
SETSCORE PCORR(PROPCORR), NADMIN(N)
! Specify the variables to be used in conjunction with
! scoring mechanisms PCORR and NADMIN.
TERMINATE (PROPCORR < .3 OR PROPCORR > .8 OR N = 10)
! Execute the following items until one of the
! conditions above is met.
#ITM016
#ITM017
0
0
0
#ITM029
ENDTEST TERMEXM
TEST TERMEND
SETSCORE BAYES(MEAN,VAR), TIME(ET)
! Both Bayesian and timing scores will be used
#ITM016
#ITM017
#ITM018
*POOL004
TERMINATE (VAR < 0.001 OR ET > 1200)
! TEST TERMEND will execute the items listed above,
! including the items in POOL004, until the
! variance value reaches 0.001 OR the elapsed time (ET)
! is greater than 1200 tenths of seconds.
ENDTEST TERMEND
-145-
The TERMINATE declarative will function in the same way here as it did
in the previous figure.
The AUTOKEEP declarative is described in the next section along with its
executable form, KEEP.
There are two simple output facilities in CATL: the KEEP and the AUTOKEEP
statements. KEEP writes the variables and strings of characters to the
disc for storage. It is important to KEEP any data (e.g., scores) needed
to analyze the test results.
Although writing scores and other data to a disc is adequate for storing
the data, numbers in isolation can be difficult to interpret later. It is
a good idea to precede each data KEEP with an identifying character string.
More than one variable or string may be kept on the same line if they are
separated by commas.
TEST KEEPSCOR
! This test consists of two subtests, one using
! conventional items, one incorporating a pool
! of items. Scores from each are kept on the disc.
TEST CONVENTL
SETSCORE TIME(TENTHS)
! Assigning the variable TENTHS to the
! elapsed-time function.
#ITM009
#ITM010
#ITM011
KEEP "Length of time for three responses was:", TENTHS
! The character string will identify the
! numerical value in the data file.
ENDTEST CONVENTL
TEST POOLED
SET MEAN = 1.0, VAR = 2.0
SETSCORE BAYES(MEAN,VAR)
! Initialize local MEAN and VAR and specify
! Bayesian scoring with MEAN and VAR
*POOL003
KEEP "Bayesian MEAN score for POOL003 was:", MEAN
ENDTEST POOLED
ENDTEST KEEPSCOR
-146-
AUTOKEEP is similar in format and function to KEEP. It is different in
two respects: (1) it is automatically executed every time an item is
administered, and (2) it is a declarative statement, which means that it
is either on or off for the entire TEST block in which it is used.
AUTOKEEP is useful whenever item response information must be kept. It
is particularly useful in keeping data for item calibration. Figure 8-11
shows AUTOKEEP used in the template CALIBCON, which is used to set up a
conventional test containing trial items for pre-calibration administration.
TEST CALIB
INSTRUCT Calibration template -- Use to collect data for item calibration.
INSTRUCT
ENDTEST
IF (logic)
Figure 8-12 shows an example of a simple use of IF, ELSEIF, and ENDIF.
-147-
Figure 8-12 -- Simple IF, ELSEIF, and ENDIF
The SEARCH statement searches a pool of test items and determines which
unadministered item has the most psychometric information at an ability
equal to a value specified. The SEARCH statement includes a variable or
constant specifying the search value. The item pool is delimited by the
SEARCH statement and an ENDSEARCH statement. An example of a Bayesian
test is shown in Figure 8-13.
-148-
Figure 8-13. A Searched Bayesian Test
TEST BAYES
SETSCORE BAYES(MEAN,VAR)
KEEP MEAN,VAR
ENDTEST BAYES
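Expressed outside CATL, the selection rule that SEARCH applies at each step
can be sketched roughly as follows, again assuming a three-parameter
logistic information function; the function names and data layout are
hypothetical.

import math

def item_information(theta, a, b, c):
    # 3PL item information at ability theta (scaling constant D = 1.7).
    p = c + (1.0 - c) / (1.0 + math.exp(-1.7 * a * (theta - b)))
    return (1.7 * a) ** 2 * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2

def select_item(pool, administered, theta):
    # pool: list of (item_id, a, b, c); administered: set of item_ids.
    # Returns the unadministered item with maximum information at theta.
    candidates = [item for item in pool if item[0] not in administered]
    return max(candidates, key=lambda item: item_information(theta, *item[1:]))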
A SEQUENCE statement can be used much like an item for branching. When
it is branched to, it administers an item and then follows the specified
logic to branch to another item or label outside the sequence. The
sequence is terminated with an ENDSEQUENCE. Figure 8-14 gives an example
of a two-stratum stradaptive test implemented using sequences.
-149-
Figure 8-14. Use of the Sequence Statement
TEST STRAT
#ITM001
#ITM002
0
0     ! Branch logic will be ignored
0     ! within the sequence itself.
#ITM010
ENDSEQUENCE
-150-
8.9. Menu Statements
Now all the tools necessary to write CATL programs that will execute tests
have been introduced. With a few more menu statements, CATL templates
(such as the pre-defined ones described in Chapter 5) can be written.
Several versions of the same test can be easily created with these
templates, without duplicating the CATL code each time. The only additional
statements necessary are INSTRUCT and two companion menu statements.
8.10 Summary
CATL, like any other programming language, requires rigor in use. However,
unlike most languages you may have experience using, CATL has been
specifically designed for use in computerized adaptive testing. CATL frees
the user from the time-consuming, lower-level details of implementation.
TEST
This KEEP will put the scores the user wants to keep
out to the disc.
ENDTEST
This ends the test specified by the user.
-152-
CHAPTER 9. DISPLAYING TEST RESULTS
9.1. Introduction
After a test has been administered you will undoubtedly want to see the
results. Previous chapters have discussed how to develop a test, how to
administer it, and how to keep test scores on a file. The system also
contains a facility for translating the raw numbers into any of several
forms. Two programs, DISPGEN and DISPEX, constitute this facility.
The scores to be displayed must be read by the display program from the
file produced by the test administrator. The scores must be recorded
unambiguously on that file. This is easy if only one line of scores is
kept. If several lines are kept, recording is still easy if the number of
lines is constant and the same set of scores is kept for all examinees. In
either of these cases, the sequence and location of scores is unambiguous.
However, if the number and/or sequence of scores can vary, the scores must
be identified.
The test specification shown in Figure 9-1 will always produce two lines
on the KEEP file and all of the scores will be in a known location.
When an examinee is tested, the test administrator will create a data file
that includes two lines of data. The proportion correct will always be in
the first eight spaces of the first line, and the Bayesian mean and variance
will be in the first sixteen spaces of the second line.
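As an illustration, a program reading such a fixed-format data file might
look like the sketch below; the eight-character field width follows the
description above, and the function name is hypothetical.

def read_constant_keep_file(path):
    # Line 1: PROPORTION in the first eight spaces.
    # Line 2: Bayesian MEAN and VAR in the first sixteen spaces.
    with open(path) as f:
        lines = f.readlines()
    proportion = float(lines[0][:8])
    mean = float(lines[1][:8])
    var = float(lines[1][8:16])
    return proportion, mean, var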
-153-
Figure 9-1. A Constant Data File
!
! Conventional test with Bayesian scoring
TEST CONBAYES
KEEP PROPORTION
The score correspondence section begins with the first line of the
specification and continues until the beginning of the interpretation
section. There are two ways in which correspondences can be made.
First, scores can be listed on a line-by-line basis without labels with a
one-to-one correspondence between lines in the file and lines in the
-154-
Figure 9-2. A Labeled Data File
!
! Bayesian test
TEST CONBAYES
SEARCH MEAN
#ITM001
#ITM002
#ITM003
#ITM004
#ITM005
#ITM006
#ITM007
#ITM008
#ITM009
#ITM010
ENDSEARCH
-155-
correspondence section of the specification. For example, the following
correspondence would work well with the conventional test specified in
Figure 9-1:
PROPORTION
MEAN, VAR
Second, scores can be identified by labels. A labeled correspondence using
the label "MEANVAR" with the score names MEAN and VAR, for example, would
search for a line in the Keep file beginning with the label "MEANVAR" and
would assign the values found in the two fields following the label to
the variables MEAN and VAR, respectively. The Keep file should contain
only one line with the specified label. If several lines are so labeled,
unpredictable results may occur. Labeled and unlabeled correspondences
should not be mixed in the specification because the results are
unpredictable.
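A corresponding sketch for the labeled case, in which the program scans the
Keep file for the single line that begins with the label and binds the
following fields to the named scores; the file layout and function name are
assumptions.

def read_labeled_scores(path, label, names):
    # e.g. read_labeled_scores("SB11376.DTA", "MEANVAR", ["MEAN", "VAR"])
    with open(path) as f:
        for line in f:
            if line.startswith(label):
                fields = line[len(label):].split()
                return dict(zip(names, (float(x) for x in fields[:len(names)])))
    return None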
-156-
< less than
<=   less than or equal to
=    equal to
><   not equal to
>=   greater than or equal to
> greater than
+ arithmetic add
- arithmetic subtract
* arithmetic multiply
/ arithmetic divide
( opening parenthesis
) closing parenthesis
The display statements can consist of a combination of text and three other
operators. The text is printed. If a # is found, the program expects a
module number to follow and that module is inserted in its entirety. If a
question mark is found, a score name is expected to follow and it is
printed in the text, occupying eight spaces with a decimal and three digits
to the right of the decimal. An @ forces the beginning of a new line of
text.
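A rough sketch of the kind of expansion these operators imply is shown
below; the tokenization and function are illustrative assumptions, and the
real DISPEX logic may differ (for example, in how operators adjoin text).

def expand(text, scores, modules):
    # '#n'    -> insert module n in its entirety (recursively expanded)
    # '?NAME' -> print score NAME in an 8-character field with 3 decimals
    # '@'     -> force the beginning of a new line
    out = []
    for token in text.split():
        if token.startswith('#'):
            out.append(expand(modules[token[1:]], scores, modules))
        elif token.startswith('?'):
            out.append('%8.3f' % scores[token[1:]])
        elif token == '@':
            out.append('\n')
        else:
            out.append(token)
    return ' '.join(out)

For example, expand("The Bayesian mean was ?MEAN @", {"MEAN": 0.413}, {})
produces the text "The Bayesian mean was" followed by the value 0.413
right-justified in an eight-character field and a new line.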
Other modules may be embedded within a module. They are executed as they
are encountered. An embedded module is treated the same as a called
module.
The display program will sequentially execute only the first module on the
display specification. This means that additional modules can be included
at the end of the specification and called when they are needed. They will
not be executed unless they are called.
Figure 9-3 shows a sample display specification for the conventional test
with Bayesian scoring discussed above.
Another display specification for the same data is shown in Figure 9-4.
In the first example, the display only lists the scores. In the second,
interpretive statements are made regarding the scores.
-157-
Figure 9-3. Display Specification for a Conventional Test
PROPORTION
MEAN, VAR
The proportion correct score achieved was ?PROPORTION @
@
The Bayesian mean was ?MEAN and the Bayesian posterior variance
was ?VAR @
Figure 9-4. Display Specification with Interpretive Statements
PROPORTION
MEAN, VAR
!
^(MEAN < -1.0)^
The examinee did not do well on the test.
The display specification is translated by running the program DISPGEN,
which is run by entering the command:
DISPGEN
You should respond by entering the name of the file that contains the
display specification. The system will then respond with:
-158-
You should enter the file on which to write the translated output. DISPGEN
will then translate the input file and write it to the output file. If
any errors in the logic or the text are discovered, the line with the error
and a pointer to the location will be printed on the terminal in the
following format:
In this example, the closing caret was omitted from the logic line and the
error was detected when the # character was read.
The display is executed by running the program DISPEX, which is run by entering
the command:
DISPEX
You should respond by entering the name of the output file produced by
DISPGEN. Next, the system will ask where the display should be printed:
Display to terminal (T) or printer (P) ? _
You should respond with a T or a P. The system will then respond by asking
for a data file:
When you enter the file name, the display will be produced at the specified
terminal. Upon completion, it will ask for another examinee's data file.
You can terminate DISPEX by entering STOP.
-159-
APPENDIX B. PARTIAL BNF DESCRIPTION FOR CATL
-160-
<statement> ::= <IF statement> |
                <item statement> |
                <KEEP statement> |
                <SEARCH statement> |
                <SEQUENCE statement> |
                <SET statement>
-161--
APPENDIX C. LIST OF SYSTEMS CONSIDERED
The following systems were too expensive. This was due, in part,
to the fact that they were sixteen-bit microprocessors. Sixteen-bit
microprocessors are much more powerful than the eight-bit processors;
however, they are still very new and fairly expensive. Sixteen-bit
microprocessor-based systems usually contain more memory than eight-bit
systems which also increases the cost. These systems would be logical
choices for testing applications in which high speed is more important
than low cost or where multi-user systems are desired.
System 8305
Dual Systems
720 Channing Way, Berkeley, CA 94710
Processor: 68000 microprocessor
Memory: 256K RAM
Discs: Two dual-density single-sided eight inch floppies
Bus: S-100
Operating System: UNIX
Price: $8,295
AMPEX Dialogue 80
Computex
5710 Drexel Avenue, Chicago, IL 60637
Processor: Z8000 microprocessor
Memory: 256K RAM
Discs: Two eight inch floppies
Bus: Multibus compatible
Operating System: Custom multiuser
Price: $7,053
CD 100
Callan Data Systems
2637 Townsgate Rd., Westlake, CA 91361
Processor: LSI-11 or Z80 microprocessor
Memory: 64K RAM
Discs: Two 5 1/4 inch floppies
Bus: Multibus or Q-bus
Operating System: RT-11, CP/M
Price: $7,465
TEC-86
Tecmar, Inc.
23600 Mercantile Rd., Cleveland, OH 44122
Processor: 8086 microprocessor
Memory: 64K RAM
Discs: Two eight inch floppies
Bus: S-100
Operating System: CP/M 86
Price: $4,390; no terminals
-162-
System 2
Seattle Computer
1114 Industry Dr., Seattle, WA 98188
Processor: 8086 microprocessor
Memory: 128K RAM
Discs: dual density floppy controller; no drive
Bus: S-100
Operating System: MS-DOS
Price: $4,785.50; no terminal
DS990
Texas Instruments
P.O. Box 202129, Dallas, TX 75220
Processor: 9900 microprocessor
Memory: N/A
Discs: Two eight inch floppies (1.2M)
Bus: N/A
Operating System: UCSD Pascal
Price:
ERG68-696
Empirical Research Group
P.O. Box 1176, Milton, WA 98354
Processor: 68000 microprocessor
Memory: 64K RAM
Discs: two eight inch dual-density floppies
Bus: S-100 (IEEE)
Operating System: FORTH, IDRIS
Price: $7,395
Model 235
Zendex
6644 Sierra Lane, Dublin, CA 94566
Processor: 8088 or 8085 microprocessor
Memory: 64K RAM
Discs: two floppies
Bus: Multibus
Operating System: MP/M II
Price: $7,595; no terminals
-163-
ACS8000
Altos Computer Systems
2360 Bering Dr., San Jose, CA 95131
Processor: Z80 or 8086 microprocessor
Memory: 64K RAM
Discs: Two eight inch floppies (1 M)
Bus: Multibus
Operating System: CP/M, OASIS, MPMII, CP/M-86, MP/M-86,
OASIS-16, XENIX
Price: $3,650; no terminals
MP/100
Data General
Westboro, Mass. 01581
Processor: mN602 microprocessor
Memory: 64K RAM
Discs: Two 5 1/4 inch floppies (630K)
Bus: N/A
Operating System: MP/OS, DOS, RTOS
Price: $9,490
SuperSixteen
Compupro/Godbout Electronics
Oakland Airport, CA 94614
Processor: 8088 and 8085 microprocessor
Memory: 128K RAM
Discs: no drives
Bus: S-100 (IEEE)
Operating System: CP/M, CP/M-86
Price: $3,495; no terminals
VICTOR-9000
Victor Business Products
3900 North Rockwell St, Chicago, IL 60618
Processor: 8088 microprocessor
Memory: 128K RAM
Discs: Two eight inch single-sided floppies (1.2M)
Bus: N/A
Operating System: CP/M-86, MS-DOS
Price: !A,200
-164-
The following systems have sixteen-bit microprocessors and special
features not available in most systems. Although these special features
make the systems too expensive (more than $3,000), these systems would be
excellent choices where their special features are required.
CGC-7900
Chromatics
2558 Mountain Industrial Blvd., Tucker, GA 30084
Processor: 68000 or Z80 microprocessor
Memory: 128K RAM
Discs: Two eight inch floppies (1M)
Bus: N/A
Operating System: IDRIS, DOS, CP/M
Price: $19,995
DISCOVERY
Action Computer Enterprises
55 West Del Mar Blvd., Pasadena, CA 91105
Processor: 8086 and Z80 microprocessors
Memory: 192K RAM
Discs: Two eight inch floppies
Bus: S-100
Operating System: DPC/OS, CP/M
Price: $6,000 + $1,395 per user
EAGLE II
AVL
503A Vandell Way, Campbell, CA 95008
Processor: Z80 microprocessor
Memory: 64K RAM
Discs: Two 5 1/4 inch dual-sided double-density (1M)
Bus: N/A
Operating System: CP/M
Price: $3,995
M System
Business Operating Systems
2835 East Platte Ave., Colorado Springs, CO 80909
Processor: Z80 microprocessor
Memory: 64K RAM
Discs: two eight inch double-density floppies
Bus: S-100
Operating System: CP/M, MP/M, CPNET, BOS
Price: $5,000
-165-
Model 500
Columbia Data Products
8990 Route 108, Columbia, MD 21045
Processor: Z80 microprocessor
Memory: 64K RAM
Discs: two 5 1/4 inch single-sided double-density drives (320K)
Bus: N/A
Operating System: CP/M, FDOS
Price: $4,495
Z2D
Cromemco
280 Bernardo Ave., Mountain View, CA 94040
Processor: Z80A microprocessor
Memory: 64K RAM
Discs: Two 5 1/4 inch floppies (780K)
Bus: S-100
Operating System: CROMIX, CDOS
Price: $3,990
System 80
Exidy Systems
1234 Elko Dr., Sunnyvale, CA 94086
Processor: Z80 microprocessor
Memory: 48K RAM
Discs: two quad-density 5 1/4 inch floppies (616K)
Bus: S-100
Operating System: CP/M
Price: $4,490
T-200
Toshiba America
2441 Michelle Dr., Tustin, CA 92680
Processor: 8085A microprocessor
Memory: 64K RAM
Discs: Two 5 1/4 inch floppies (560K)
Bus: N/A
Operating System: CP/M, BASIC
Price: $4,500
-166-
DB8/4
Dynabyte
1005 Elwell Ct., Palo Alto, CA 94303
Processor: Z80 microprocessor
Memory: 48K RAM
Discs: Two eight inch drives (2M)
Bus: S-100
Operating System: CP/M, MPM
Price: $4,495; no terminal
Model 525
Ithaca Intersystems
1650 Hanshow Rd., Ithaca, NY 14850
Processor: Z80 microprocessor
Memory: 128K RAM
Discs: two single-sided 5 1/4 inch floppies
Bus: S-100
Operating System: Custom, CP/M, MP/M, COHERENT
Price: $5,595
Advantage, Horizon
Northstar Computers
14440 Catalina St., San Leandro, CA 94577
Processor: Z80
Memory: 64K RAM
Discs: two double-density floppies
Bus: S-100
Operating System: CP/M
Price: $3,996 (Advantage); $3,830 (Horizon)
QDP-100
Quasar Data Products
10330 Brecksville Rd., Cleveland, OH 44141
Processor: Z80 microprocessor
Memory: 64K RAM
Discs: two eight inch double-sided double-density floppies (2.4M)
Bus: S-100
Operating System: CP/M, CBASIC
Price: $4,695
-167-
System 2812
Systems Group
1601 Orangewood Ave., Orange, CA 92668
Processor: Z80 microprocessor
Memory: 64K RAM
Discs: Two single-sided floppies
Bus: S-100
Operating System: CP/M, MPM, OASIS
Price: $5,035
SPRINT 68
Wintek
1801 South St., Lafayette, IN 47904
Processor: 6800 microprocessor
Memory: 48K RAM
Discs: Two eight inch floppies
Bus: N/A
Operating System: WIZRD
Price: $3,949
S-11
Zobex
P.O. Box 1847, San Diego, CA 92112
Processor: Z80A microprocessor
Memory: 64 K RAM
Discs: two eight inch floppies
Bus: S-100
Operating System: CP/M, MPM
Price: $4,900; no terminal
P300-01
California Computer Systems
250 Caribbean Dr., Sunnyvale, CA 94086
Processor: Z80 microprocessor
Memory: 64K RAM
Discs: 1.2 megabytes of floppy
Bus: S-100
Operating System: CP/M, OASIS
Price: $5,695
56K B System
Gimix
1337 W. 37th Place, Chicago, IL 60609
Processor: 6809 microprocessor
Memory: 56K RAM
Discs: no disc drives included
Bus: SS-50
Operating System: OS-9, FLEX
Price: $2,988.59; no terminal
-168-
Model 5000
IMS International
2800 Lockhead Way, Carson City, NV 89701
Processor: Z80A microprocessor
Memory: 64K RAM
Discs: Two 5 1/4 inch dual-sided double-density floppies
Bus: S-100
Operating System: CP/M, MPM, TurboDOS
Price: $3,695
H -89
Heath
Dept 334-846, Benton Harbor, MI 49022
Processor: dual Z80 microprocessors
Memory: 64K RAM
Discs: three single-sided 5 1/4 inch drives (480K)
Bus: N/A
Operating System: CP/M, HDOS
Price: $4,095
YX3200
Sharp Electronics
10 Keyston Place, Paramus, NJ 07652
Processor: N/A
Memory: 64K RAM; 32K ROM
Discs: Two 5 1/4 inch floppies (570K)
Bus: N/A
Operating System: CP/M, FDOS
Price: $4,495
8052/8053
Intelligent Systems
225 Technology Park, Norcross, GA 30092
Processor: 8080A microprocessor
Memory: 8K RAM
Discs: Two 5 1/4 inch (8052, 160K) or eight inch (8053, 590K) floppies
Bus: N/A
Operating System: CP/M
Price: $4,795 (8052); $5,870 (8053)
-169-
Millie
MicroDaSys
2811 Wilshire Blvd., Santa Monica, CA 90403
Processor: Z80A microprocessor
Memory: 64K RAM
Discs: two eight inch single-sided double-density floppies (1M)
Bus: S-100
Operating System: CP/M
Price: $9,995
Mariner
Micromation
1620 Montgomery St., San Francisco, CA 94111
Processor: Z80 microprocessor
Memory: 64K RAM
Discs: Two double-sided double-density floppies (2M)
Bus: S-100
Operating System: CP/M, MP/M
Price: $5,500
Zeus
OSM Computer
2364 Walsh Ave., Santa Clara, CA 95051
Processor: Z80A microprocessor
Memory: 64K RAM
Discs: Two eight inch floppies (1.2M)
Bus: S-100
Operating System: MUSE (CP/M)
Price: $5,900 + $3,400 per user
The following systems can be configured with two floppy discs and a
terminal (or built-in keyboard and screen) for less than $3,000. However,
all of the following systems lack disc space, memory space, or a good
software base.
CBM 8032
Commodore Computer Systems
681 Moore Rd., King of Prussia, PA 19406
Processor: 6502 microprocessor
Memory: 32K RAM
Discs: Two 5 1/4 inch drives (340K)
Bus: N/A
Operating System: Custom
Price: $3,095 (340K); $3,590 (1M)
-170-
Mod 5
Compucolor
P.O. Box 569, Norcross, Georgia 30071
Processor: 8084 microprocessor
Memory: 32K RAM
Discs: Two single-sided single-density floppies (102.4K)
Bus: N/A
Operating System: BASIC
Price: $2,490
C8PDF
Ohio Scientific
1333 S. Chillicothe Rd., Aurora, OH 44202
Processor: 6502A microprocessor
Memory: 32K RAM
Discs: dual eight inch floppies
Bus: Custom
Operating System: OS65U
Price: $3,495
TI-99/4
Texas Instruments
P.O. Box 202129, Dallas, TX 75220
Processor: 9900 microprocessor
Memory: 16K RAM
Discs: three 5 1/4 inch single-sided, single-density floppies (270K)
Bus: N/A
Operating System: UCSD Pascal, COBOL, LOGO
Price: $2,825
-171-
The following systems fulfill all the requirements for adaptive
testing applications. Furthermore, they offer a better cost-to-value
ratio than most of the other systems evaluated. They are all produced
by small companies and must be shipped back to the manufacturer for
repair. Unfortunately, they do not have a standard bus and are thus
less expandable than many other systems.
ACT-85 System
Autocontrol
11744 Westline Ind. Dr., St. Louis, MO 63141
Processor: 8085 microprocessor
Memory: 64K RAM
Discs: Two eight inch double-sided floppies
Bus: N/A
Operating System: CP/M
Price: $2,750
EXO Z80
Micro Business Associates
500 Second Street, San Francisco, CA 94107
Processor: Z80 microprocessor
Memory: 64K RAM
Discs: Two eight inch floppies (1.2M)
Bus: N/A
Operating System: Custom
Price: $2,995
LNW 80
LNW Research
2620 Walnut St., Tustin, CA 92680
Processor: Z80 microprocessor
Memory: 48K RAM
Discs: one floppy
Bus: N/A
Operating System: CP/M, CDOS, TRS-DOS
Price: $1,915
MOD III
Microcomputer Technology
3304 W. Macarthur, Santa Ana, CA 92704
Processor: Z80 microprocessor
Memory: 48K RAM
Discs: dual double-sided double-density floppies (1.5M)
Bus: none
Operating System: CP/M, RS-DOS
Price: $2,799
-172-
Model 500T
Quay
P.O. Box 783, 527 Industrial Way West, Eatontown, NJ 07724
Processor: Z80A microprocessor
Memory: 64K RAM
Discs: Two double-density single-sided 5 1/4 inch floppies (400K)
Bus: N/A
Operating System: CP/M
Price: $2,995
The following systems have sufficient memory, sufficient disc space
for most testing applications, an ample supply of existing software, a
national maintenance service, and cost less than $3,600. They all are
packaged as stand-alone systems with screens and keyboards included.
They are thus all candidates for an adaptive testing system.
Apple II Plus
Apple Computer
10260 Bandley Dr., Cupertino, CA 95014
Processor: 6502 microprocessor
Memory: 48K RAM
Discs: Two 5 1/4 inch floppies (320K)
Bus: Apple
Operating System: DOS
Price: $2,700
SuperBrain
Intertec Data Systems
2300 Broad River Rd., Columbia, SC 29210
Processor: Z80 microprocessor
Memory: 64K RAM
Discs: Two 5 1/4 inch floppies (320K)/(700K)
Bus: S-100 Interface
Operating System: CP/M
Price: $2,895 (350K)/$3,595 (700K)
-173-
Model 8000
NEC America
1401 Estes Ave., Elk Grove Village, IL 60007
Processor: UPD780c-1 (Z80) microprocessor
Memory: 32K/64K RAM
Discs: dual 5 1/4 inch floppies (286.72 K)
Bus: N/A
Operating System: CP/M
Price: $2,700 (32K)/$3,300 (64K)
APC
NEC America
1401 Estes Ave., Elk Grove Village, IL 60007
Processor: 8086 microprocessor
Memory: 256K RAM
Discs: quad-density 8 inch floppy (OM)
Bus: N/A
Operating System: MS-DOS, CP/M-86
Price: $3,200
MBC-2000
Sanyo Business Systems
52 Joseph St., Moonachie, NJ 07074
Processor: dual 8085 microprocessors
Memory: 64K RAM
Discs: one or two 5 1/4 inch floppies (287K)/(574K)
Bus: Multibus interface
Operating System: CP/M, TS/DOS
Price: $1,995 (287K)/$3,495 (574K)
TS-802
Televideo Systems
1170 Morse Ave., Sunnyvale, CA 94086
Processor: Z80 microprocessor (4 MHz)
Memory: 64K RAM
Discs: dual 5 1/4 inch floppies (1M)
Bus: N/A
Operating System: CP/M, MmmOST
Price: $3,495
-174-
Model 820
Xerox
1341 West Mockingbird Lane, Dallas, TX 75247
Processor: Z80 microprocessor
Memory: 64K
Discs: Two 8 inch floppies (400K)
Bus: N/A
Operating System: CP/M
Price: $3,200